Stacking Streams: A Rate-Limited File Transfer in Java
Here’s a deceptively simple problem: you have one big file and a crowd of clients who all want it, but only so much uplink bandwidth to give them. Hand it out too fast and you saturate the pipe and starve everything else on the box; hand it out naively and one greedy client hogs the link. What you actually want is a budget — a hard ceiling on total throughput, shared fairly — plus a little compression to make the most of every byte. That’s JunkTransfer, a rate-limited file-transfer system I wrote in Java, and it turned into a small lesson in how elegantly orthogonal concerns can stack.
The setup: chunks and threads
The server splits the file into fixed 1 MB chunks and gives each connected client one chunk, served by its own thread. The actual byte-pumping leans entirely on Guava’s ByteStreams — skip to this client’s offset, cap the read at one chunk, and copy:
ByteStreams.skipFully(fileBuffered, chunkSize * chunkId); // seek to my chunk
InputStream chunk = ByteStreams.limit(fileBuffered, chunkSize); // bound it
long sent = ByteStreams.copy(chunk, socketOutputStream); // pump bytes
That last line — copy — is doing nothing clever. It just reads bytes and writes them. All the interesting behavior is hidden inside what socketOutputStream actually is.
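For readers without Guava on the classpath, the same skip, limit, and copy sequence can be approximated with the plain JDK. This is a minimal sketch and not the project's code: `ChunkCopy` and `copyChunk` are hypothetical names, and the 8 KB buffer is an arbitrary choice.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: copies one fixed-size chunk of a stream using only the JDK. */
final class ChunkCopy {
    /** Copies chunk number chunkId (0-based) of chunkSize bytes from in to out; returns bytes copied. */
    static long copyChunk(InputStream in, OutputStream out, long chunkSize, long chunkId)
            throws IOException {
        long toSkip = chunkSize * chunkId;           // long arithmetic, safe past 2 GB
        while (toSkip > 0) {                         // skip() may skip fewer bytes than asked
            long skipped = in.skip(toSkip);
            if (skipped <= 0) {
                if (in.read() < 0) return 0;         // stream ended before this chunk starts
                skipped = 1;                         // read() consumed exactly one byte
            }
            toSkip -= skipped;
        }
        byte[] buf = new byte[8192];
        long remaining = chunkSize;
        long copied = 0;
        while (remaining > 0) {                      // never read past the chunk boundary
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n < 0) break;                        // the final chunk may be short
            out.write(buf, 0, n);
            remaining -= n;
            copied += n;
        }
        return copied;
    }
}
```

The return value plays the same role as `sent` above: for every chunk but the last it equals the chunk size, and for the last it is whatever was left in the file.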
The gem: a stack of streams
This is the part I still find satisfying. Java’s FilterOutputStream lets you wrap one stream in another, each transforming the bytes on their way through. So the server’s output stream isn’t a socket — it’s a stack:
OutputStream out =
    new CompressedBlockOutputStream(1024,           // 1) compress in 1 KB blocks
        new SpeedLimitedOutputStream(8, bucket,     // 2) then throttle to the budget
            socket.getOutputStream()));             // 3) then write to the wire
A byte written at the top gets compressed, then rate-limited, then sent — and ByteStreams.copy is blissfully unaware that any of it is happening. Compression doesn’t know about throttling; throttling doesn’t know about sockets; the copy loop doesn’t know about either. Three independent concerns, composed like Lego, each readable on its own. The decorator pattern gets used as a textbook example so often that it’s easy to forget how genuinely nice it is when the bytes are real.
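The same layering can be tried with nothing but JDK classes. The sketch below is an illustration under stated assumptions, not JunkTransfer's code: `ByteCountingStream` and `StackDemo` are hypothetical names, `GZIPOutputStream` stands in for the block compressor, and a `ByteArrayOutputStream` stands in for the socket.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical filter: counts the bytes that pass through to the wrapped stream. */
class ByteCountingStream extends FilterOutputStream {
    private long count;

    ByteCountingStream(OutputStream out) { super(out); }

    @Override public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);   // forward the whole range in one call
        count += len;
    }

    long count() { return count; }
}

final class StackDemo {
    /** Writes payload through compress -> count -> sink; returns how many bytes reached the sink. */
    static long compressedSize(byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();   // stands in for the socket
        ByteCountingStream counter = new ByteCountingStream(sink);
        try (OutputStream top = new GZIPOutputStream(counter)) {
            top.write(payload);   // the caller only ever sees an OutputStream
        }                         // closing the top flushes and closes every layer below
        return counter.count();
    }
}
```

The counting layer sees only compressed bytes because it sits below the compressor, which is the same reason the throttle in the stack above meters what actually goes on the wire rather than the raw file bytes.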
The governor: a token bucket
The throttling itself is a classic token bucket. The bucket holds a number of tokens equal to your byte-per-second budget; sending a byte costs a token; and once per one-second period the bucket is reset to full. If you’ve run out, you block until the next refill. Because a refill resets the count rather than adding to it, tokens that would overflow a full bucket are simply lost (the same overflow rule as the leaky bucket it’s modeled on), so an idle second never banks extra burst for the next one:
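public void consume(long tokens) {
    // Note: a request larger than numTokens can never be satisfied and would wait forever.
    while (true) {
        synchronized (this) {
            long now = Ticker.systemTicker().read();    // nanoseconds
            if (now > nextRefillTime) {                 // a new period began
                nextRefillTime = now + periodDuration;  // = 1 second
                size = numTokens;                       // refill to the budget
            }
            // Check and spend under the same lock, so two threads can't spend the same tokens.
            if (size >= tokens) { size -= tokens; return; }  // spend and go
        }
        Uninterruptibles.sleepUninterruptibly(1, NANOSECONDS); // else wait, outside the lock
    }
}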
There’s a real trade-off buried in that periodDuration, and it’s the kind of knob QoS people obsess over. A long refill period lets a big burst through at once — cheap on CPU, but bursty traffic that’s murder for real-time or multimedia flows. A short period smooths the traffic into a fine drip, but you pay for it in CPU as threads wake up constantly to check the clock. Smooth or cheap: pick one.
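To make the period knob concrete without waiting on a real clock, here is a minimal non-blocking sketch. It is not the project's class: `SketchTokenBucket`, `tryConsume`, and the injectable `LongSupplier` clock are hypothetical, introduced only so the refill rule can be exercised deterministically.

```java
import java.util.function.LongSupplier;

/** Hypothetical non-blocking token bucket with an injectable nanosecond clock. */
final class SketchTokenBucket {
    private final long capacity;        // tokens granted per period
    private final long periodNanos;     // length of one refill period
    private final LongSupplier clock;   // for example System::nanoTime
    private long size;
    private long nextRefillTime;

    SketchTokenBucket(long capacity, long periodNanos, LongSupplier clock) {
        this.capacity = capacity;
        this.periodNanos = periodNanos;
        this.clock = clock;
        this.size = capacity;
        this.nextRefillTime = clock.getAsLong() + periodNanos;
    }

    /** Spends tokens if available; returns false instead of blocking when the bucket is short. */
    synchronized boolean tryConsume(long tokens) {
        long now = clock.getAsLong();
        if (now - nextRefillTime >= 0) {      // subtraction keeps the comparison overflow-safe
            nextRefillTime = now + periodNanos;
            size = capacity;                  // reset, so unused tokens do not accumulate
        }
        if (size < tokens) return false;
        size -= tokens;
        return true;
    }
}
```

With this shape, `new SketchTokenBucket(1000, 1_000_000_000L, System::nanoTime)` and `new SketchTokenBucket(10, 10_000_000L, System::nanoTime)` both average 1000 tokens per second, but the first can release 1000 tokens in a single burst while the second never releases more than 10 at once, at the cost of a hundred times as many refills.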
The trick: one bucket, many threads
Here’s the detail that makes the whole thing work as a system rather than a per-connection limiter: every client thread shares the same bucket.
bucket = new TokenBucket(maxTransferRate);               // one bucket...
for (int i = 0; i < clientsCount; i++) {
    final int id = i;                                    // a lambda can only capture effectively final locals
    new Thread(() -> transfer(clients[id], id)).start(); // ...shared by every thread
}
Because the budget lives in one shared bucket, the server’s total egress is capped at the user’s limit no matter how many clients connect — ten clients don’t get ten times the bandwidth, they split the one pipe. And it self-balances: if a client drops mid-transfer, its thread stops drawing tokens and the survivors naturally absorb the slack. The bandwidth ceiling is a property of the bucket, not of any one connection.
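The claim that the ceiling belongs to the bucket rather than to any connection can be checked in miniature. The following sketch makes simplifying assumptions: `SharedBudgetDemo` and its nested `Budget` are hypothetical names, and the budget is a fixed pool with no refill, so the total granted across all threads is exactly predictable.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SharedBudgetDemo {
    /** Hypothetical stand-in for the shared bucket: a fixed budget with no refill. */
    static final class Budget {
        private long remaining;

        Budget(long tokens) { remaining = tokens; }

        synchronized boolean tryTake(long n) {
            if (remaining < n) return false;
            remaining -= n;
            return true;
        }
    }

    /** Starts `workers` threads that each attempt `attempts` one-token draws; returns the total granted. */
    static long run(long budgetTokens, int workers, int attempts) throws InterruptedException {
        Budget shared = new Budget(budgetTokens);        // one budget...
        AtomicLong granted = new AtomicLong();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {              // ...shared by every thread
                for (int a = 0; a < attempts; a++) {
                    if (shared.tryTake(1)) granted.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();               // wait for every worker to finish
        return granted.get();
    }
}
```

Four workers asking for 1000 tokens each from a budget of 1500 are granted 1500 in total, while a single worker asking for 1000 from the same budget gets all 1000. How the 1500 is split among the four depends on thread scheduling; only the aggregate is guaranteed.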
The rate-limited stream that draws on it is itself tiny — a FilterOutputStream that spends tokens before it writes, in configurable burst-sized gulps:
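public void write(byte[] b, int off, int len) throws IOException {
    int end = off + len;                     // len is a count, not an end index
    for (; off + burst <= end; off += burst) {
        bucket.consume(burst);               // pay first...
        // out.write, not super.write: FilterOutputStream's bulk write forwards one byte at a time
        out.write(b, off, burst);            // ...then send
    }
    // ...and the remainder
}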
That burst is the same smooth-versus-cheap dial as the bucket period, one level down: consume one token per byte and the shaping is exquisite but the CPU cries; consume in bigger gulps and you trade a little burstiness for a lot of headroom.
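For completeness, here is one way the whole stream, remainder included, might look. This is a sketch under assumptions rather than the project's class: `ThrottledOutputStream` is a hypothetical name, and a `LongConsumer` stands in for the bucket's `consume` so the example compiles on its own.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.function.LongConsumer;

/** Hypothetical throttled stream: pays for each gulp of bytes before forwarding it. */
class ThrottledOutputStream extends FilterOutputStream {
    private final int burst;              // bytes paid for and sent per gulp
    private final LongConsumer payment;   // stands in for bucket::consume, may block

    ThrottledOutputStream(int burst, LongConsumer payment, OutputStream out) {
        super(out);
        if (burst <= 0) throw new IllegalArgumentException("burst must be positive");
        this.burst = burst;
        this.payment = payment;
    }

    @Override public void write(int b) throws IOException {
        payment.accept(1);                // a single byte costs a single token
        out.write(b);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        int end = off + len;
        while (off < end) {
            int n = Math.min(burst, end - off);  // the last gulp carries the remainder
            payment.accept(n);                   // pay first...
            out.write(b, off, n);                // ...then send
            off += n;
        }
    }
}
```

Writing 20 bytes with a burst of 8 produces three payments of 8, 8, and 4 tokens. Folding the remainder into the loop with `Math.min` avoids a separate tail branch.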
Watching it actually work
The fun part of a bandwidth limiter is that you can see whether it’s lying. I spun up multiple clients on a single machine using loopback interfaces with distinct IPs, ran the whole thing across Vagrant VMs to mimic distributed nodes, and then stared at iftop, tcpdump, and Wireshark to confirm the line held at the number I asked for. The most satisfying test was the cruel one: kill a client mid-transfer and watch the remaining flows climb to reclaim its share, with the aggregate never once poking above the ceiling.
What it taught me
JunkTransfer is an older project — Java 8, Ant, one thread per client, IPv4 only — and I wouldn’t architect a transfer service this way today. But the ideas in it have aged well. Composing compression, rate-limiting, and I/O as a stack of independent filters is still the cleanest way I know to keep orthogonal concerns from tangling. And a token bucket is still the whole of traffic shaping in about thirty lines: a count, a clock, and the discipline to pay before you send. Sometimes the most durable thing you build is a good metaphor with a tight loop around it.
The code is on GitHub at github.com/arazmj/JunkTransfer.