Reimplementing S3 from Scratch: An Object Store in Rust

The S3 API looks like the simplest thing in the world. You PUT an object, you GET it back, you DELETE it. Three verbs and a bucket name. I’ve used it for years without once wondering what it would take to be the thing on the other end of those requests — and that itch is exactly the kind I can’t leave alone. So I sat down and rebuilt it. R3 is an S3-compatible object store I wrote from scratch in Rust, and it taught me that the simplicity of S3’s front door is a very elegant lie.

A simple core — objects on disk — where the work is multipart assembly and version chains.

The simple core is genuinely simple

The first afternoon was almost suspiciously easy. With actix-web handling the HTTP side, a bucket is a directory and an object is a file. Upload writes bytes to disk; download reads them back. That’s the whole storage engine:

#[post("/{bucket}/{object}")]
pub async fn create_object(path: web::Path<ObjectPath>, payload: web::Payload)
    -> Result<impl Responder, Error> {
    let bytes = payload.to_bytes().await?;
    let mut f = path.into_file()?;            // bucket dir + object file
    f.write_all(&bytes)?;

    let etag = format!("{:x}", md5::compute(&bytes)); // S3's content fingerprint
    versioning::create_version(&path.bucket, &path.object, &bytes, &etag)?;
    Ok(HttpResponse::Created().insert_header(("ETag", etag)).finish())
}
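Stripped of the HTTP layer, the bucket-as-directory idea fits in a few lines of standard-library Rust. The following is a minimal sketch, not R3's actual code: `object_path`, `put_object`, and `get_object` are hypothetical names, and the sketch assumes flat keys (no `/` inside a key) so that a name can never climb out of the storage root.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: maps (bucket, key) onto `<root>/<bucket>/<key>`.
/// Assumption: keys are flat, so any separator or dot-segment is rejected.
fn object_path(root: &Path, bucket: &str, key: &str) -> io::Result<PathBuf> {
    let bad = |s: &str| {
        s.is_empty() || s == "." || s == ".." || s.contains('/') || s.contains('\\')
    };
    if bad(bucket) || bad(key) {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "invalid bucket or key",
        ));
    }
    Ok(root.join(bucket).join(key))
}

/// Upload: create the bucket directory if needed, then write the bytes.
fn put_object(root: &Path, bucket: &str, key: &str, body: &[u8]) -> io::Result<()> {
    let path = object_path(root, bucket, key)?;
    fs::create_dir_all(root.join(bucket))?;
    fs::write(path, body)
}

/// Download: read the bytes back from the same path.
fn get_object(root: &Path, bucket: &str, key: &str) -> io::Result<Vec<u8>> {
    fs::read(object_path(root, bucket, key)?)
}
```

Calling `put_object(root, "photos", "cat.jpg", b"meow")` and then `get_object(root, "photos", "cat.jpg")` round-trips the bytes, while a key such as `../escape` is refused before it touches the filesystem.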

Add registration and login on top — passwords hashed with bcrypt, nothing fancy — and you have something that answers to a few curl commands and feels like a storage service. For about a day, I thought this was going to be easy. Then I started implementing the features that make S3 actually S3.

Multipart upload: where the bookkeeping lives

The first real wall is multipart upload. It exists for a good reason: you don’t want to push a five-gigabyte file in a single request that dies at 99% and starts over. So S3 lets you cut a big object into parts, upload them independently (in parallel, even), and stitch them together at the end. The dance has three steps — initiate hands you an uploadId, you upload each numbered part, and you complete the upload to seal it.
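The three steps can be modelled as a small state machine before any HTTP or disk I/O is involved. This is a sketch under stated assumptions rather than R3's implementation: the `Uploads` type, the counter-based upload IDs, and keeping part bodies in memory are all illustrative choices. It deliberately leaves out validation of missing parts.

```rust
use std::collections::{BTreeMap, HashMap};

/// In-flight uploads, keyed by upload ID. Part bodies live in memory here
/// purely for illustration; a real store would write them to disk.
#[derive(Default)]
struct Uploads {
    next_id: u64,
    in_flight: HashMap<String, BTreeMap<u32, Vec<u8>>>,
}

impl Uploads {
    /// Step 1: initiate. Hands back a fresh upload ID (a simple counter here).
    fn initiate(&mut self) -> String {
        self.next_id += 1;
        let id = format!("upload-{}", self.next_id);
        self.in_flight.insert(id.clone(), BTreeMap::new());
        id
    }

    /// Step 2: upload one numbered part. Parts may arrive in any order;
    /// re-sending a part number replaces the earlier body.
    fn upload_part(&mut self, id: &str, number: u32, body: Vec<u8>) -> Result<(), String> {
        let parts = self.in_flight.get_mut(id).ok_or("unknown uploadId")?;
        parts.insert(number, body);
        Ok(())
    }

    /// Step 3: complete. The BTreeMap iterates in part-number order, so the
    /// concatenation is sorted for free. The upload is forgotten afterwards.
    fn complete(&mut self, id: &str) -> Result<Vec<u8>, String> {
        let parts = self.in_flight.remove(id).ok_or("unknown uploadId")?;
        Ok(parts.into_values().flatten().collect())
    }
}
```

Uploading part 2 before part 1 and then completing still yields the bytes in part-number order, which is the property that makes parallel uploads safe.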

What surprised me is that the actual byte-shuffling is the boring part. The interesting part is the accounting: remembering which parts have arrived, under which part numbers, and with which ETags. Each part lands on disk as part-1, part-2, and so on, with its own MD5. Completing the upload is then just a careful, sorted concatenation, once you have checked that no part is missing:

// every part from 1..=N must be present, no gaps
let expected: Vec<u32> = (1..=upload_info.parts.len() as u32).collect();
let mut got: Vec<u32> = upload_info.parts.keys().copied().collect();
got.sort_unstable();
if got != expected {
    return Err(ErrorBadRequest("Not all parts are present"));
}

// concatenate the parts in order into the final object, then clean up
for n in &got {
    let mut part = File::open(format!("{}/{}/part-{}", bucket, upload_id, n))?;
    std::io::copy(&mut part, &mut final_file)?;
}
fs::remove_dir_all(&upload_dir).ok();

That “no gaps” check is the kind of thing you skip in a toy and regret in anything real — a client that uploads parts 1, 2, and 4 and then asks you to finish should get an honest error, not a silently corrupt object. Getting that right felt like the moment R3 stopped being a file dump and started being a protocol.
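An honest error is more useful when it names the gap. The sketch below is one way to do that, assuming part numbers are plain `u32` values starting at 1; `missing_parts` and `check_complete` are hypothetical helpers, not functions from R3.

```rust
use std::collections::BTreeSet;

/// Returns every part number in 1..=highest_received that never arrived.
fn missing_parts(received: &[u32]) -> Vec<u32> {
    let have: BTreeSet<u32> = received.iter().copied().collect();
    let highest = have.iter().next_back().copied().unwrap_or(0);
    (1..=highest).filter(|n| !have.contains(n)).collect()
}

/// Decides whether an upload may be completed, with a message that says why not.
fn check_complete(received: &[u32]) -> Result<(), String> {
    if received.is_empty() {
        return Err("no parts uploaded".to_string());
    }
    if received.contains(&0) {
        return Err("part numbers start at 1".to_string());
    }
    let missing = missing_parts(received);
    if missing.is_empty() {
        Ok(())
    } else {
        Err(format!("missing parts: {:?}", missing))
    }
}
```

With parts 1, 2, and 4 uploaded, `check_complete(&[1, 2, 4])` returns `Err("missing parts: [3]")`; with 3, 1, 2 in any arrival order it returns `Ok(())`.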

Versioning, and the beautiful idea of a delete marker

Then I reached versioning, and that’s where I fell a little in love with how S3 is designed. Turn versioning on, and every write to a key keeps the old contents around under a unique version ID instead of clobbering them. My store for this is just a nested map — bucket, to key, to a stack of versions:

// bucket -> key -> ordered list of versions
versions: Mutex<HashMap<String, HashMap<String, Vec<VersionInfo>>>>

fn add_version(&self, bucket: &str, key: &str, info: VersionInfo) {
    let list = /* ... entry for bucket/key ... */;
    for v in list.iter_mut() { v.is_latest = false; } // demote the old head
    list.push(info);                                   // new latest on top
}
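For readers who want the elided lookup spelled out, here is one way the nested map could be filled in using the standard `entry` API. It is a self-contained sketch, not the code from the repository: the `version_id` and `etag` fields on `VersionInfo` are assumptions, and the sketch simply panics if the mutex is poisoned.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Assumed shape of a version record; only `is_latest` appears in the text above.
#[derive(Clone, Debug, PartialEq)]
struct VersionInfo {
    version_id: String,
    etag: String,
    is_latest: bool,
}

/// bucket -> key -> ordered list of versions (oldest first, latest last).
#[derive(Default)]
struct VersionStore {
    versions: Mutex<HashMap<String, HashMap<String, Vec<VersionInfo>>>>,
}

impl VersionStore {
    fn add_version(&self, bucket: &str, key: &str, mut info: VersionInfo) {
        let mut map = self.versions.lock().unwrap();
        // Create the bucket and key entries on first use.
        let list = map
            .entry(bucket.to_string())
            .or_default()
            .entry(key.to_string())
            .or_default();
        for v in list.iter_mut() {
            v.is_latest = false; // demote the old head
        }
        info.is_latest = true;
        list.push(info); // new latest on top
    }

    fn get_latest_version(&self, bucket: &str, key: &str) -> Option<VersionInfo> {
        let map = self.versions.lock().unwrap();
        let latest = map.get(bucket)?.get(key)?.last().cloned();
        latest
    }
}
```

Returning a clone from `get_latest_version` keeps the lock scope short, at the cost of copying the record on every read.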

But the part that genuinely delighted me was how S3 handles deletion of a versioned object. It doesn’t delete anything. Instead it pushes a special, empty version on top of the stack — a delete marker — with a flag I named is_delete_marker. A plain GET now walks to the top of the stack, finds the tombstone, and returns a 404 as if the object were gone:

if let Some(version) = VERSION_STORE.get_latest_version(&bucket, &object) {
    if version.is_delete_marker {
        return Err(ErrorNotFound("Object is deleted"));
    }
}

And yet every byte is still sitting right there, one version down, fully recoverable by asking for its version ID. “Delete” becomes a reversible, append-only event rather than destruction. The first time I deleted an object, watched it 404, and then pulled it back out by version ID, I actually grinned at my terminal. That single idea, that a deletion is just another version, is what lets S3 undo an accidental delete or overwrite.
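The whole delete-then-recover cycle can be reproduced on a single key's version stack. The sketch below is illustrative only: `KeyHistory`, the sequential `v1`, `v2` IDs, and the single `NotFound` error are assumptions made for brevity, and it collapses every failure into a 404-style result.

```rust
/// One entry in a key's version stack. `data` is None for a delete marker.
#[derive(Debug, Clone, PartialEq)]
struct Version {
    version_id: String,
    is_delete_marker: bool,
    data: Option<Vec<u8>>,
}

#[derive(Default)]
struct KeyHistory {
    stack: Vec<Version>, // oldest first, latest last
}

#[derive(Debug, PartialEq)]
enum GetError {
    NotFound,
}

impl KeyHistory {
    /// Hypothetical ID scheme: v1, v2, ... in push order.
    fn push(&mut self, is_delete_marker: bool, data: Option<Vec<u8>>) -> String {
        let version_id = format!("v{}", self.stack.len() + 1);
        self.stack.push(Version { version_id: version_id.clone(), is_delete_marker, data });
        version_id
    }

    fn put(&mut self, data: &[u8]) -> String {
        self.push(false, Some(data.to_vec()))
    }

    /// "Delete" appends a tombstone; nothing is removed.
    fn delete(&mut self) -> String {
        self.push(true, None)
    }

    /// A plain GET looks at the top of the stack; a versioned GET looks up one entry.
    fn get(&self, version_id: Option<&str>) -> Result<&[u8], GetError> {
        let version = match version_id {
            Some(id) => self.stack.iter().find(|v| v.version_id == id),
            None => self.stack.last(),
        };
        match version {
            Some(Version { is_delete_marker: false, data: Some(bytes), .. }) => {
                Ok(bytes.as_slice())
            }
            _ => Err(GetError::NotFound),
        }
    }
}
```

After `put`, `delete`, a plain `get(None)` reports `NotFound`, while `get(Some("v1"))` still returns the original bytes, and the stack holds two entries rather than zero.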

The dirty secret is XML

There’s a less glamorous lesson hiding in all of this: a huge part of being “S3-compatible” is emitting exactly the right XML. Real S3 clients don’t want JSON; they want a <CompleteMultipartUploadResult> shaped precisely the way the SDK’s parser expects, down to the element names. R3 overloads a single URL with query strings the way S3 does — ?uploads begins a multipart upload, ?uploadId=… addresses one in flight, ?versionId=… reaches into history — and each one answers in the dialect the client is listening for. Compatibility, it turns out, is mostly empathy for someone else’s parser.
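Both halves of that job, reading the query string and answering in XML, are small enough to sketch with the standard library. Treat the following as an illustration under assumptions, not R3's router: `Op`, `classify`, and `complete_result_xml` are hypothetical names, the query parser does no percent-decoding, and the `partNumber` parameter and the element set (`Location`, `Bucket`, `Key`, `ETag`) follow the public S3 API rather than anything quoted from the repository.

```rust
/// What a request to a single object URL is actually asking for.
#[derive(Debug, PartialEq)]
enum Op<'a> {
    InitiateMultipart,
    UploadPart { upload_id: &'a str, part_number: u32 },
    CompleteMultipart { upload_id: &'a str },
    GetVersion { version_id: &'a str },
    Plain,
}

/// Looks up one query parameter; a bare flag such as `uploads` yields "".
fn param<'a>(query: &'a str, name: &str) -> Option<&'a str> {
    query.split('&').find_map(|pair| {
        let (k, v) = pair.split_once('=').unwrap_or((pair, ""));
        (k == name).then_some(v)
    })
}

fn classify<'a>(method: &str, query: &'a str) -> Op<'a> {
    let upload_id = param(query, "uploadId");
    let part = param(query, "partNumber").and_then(|p| p.parse::<u32>().ok());
    match (method, upload_id, part) {
        ("POST", None, _) if param(query, "uploads").is_some() => Op::InitiateMultipart,
        ("PUT", Some(id), Some(n)) => Op::UploadPart { upload_id: id, part_number: n },
        ("POST", Some(id), _) => Op::CompleteMultipart { upload_id: id },
        ("GET", None, _) => match param(query, "versionId") {
            Some(v) => Op::GetVersion { version_id: v },
            None => Op::Plain,
        },
        _ => Op::Plain,
    }
}

/// Escapes the characters that would otherwise break element content.
fn xml_escape(s: &str) -> String {
    s.replace('&', "&amp;")
        .replace('<', "&lt;")
        .replace('>', "&gt;")
        .replace('"', "&quot;")
}

/// Renders the body a client expects after completing a multipart upload.
fn complete_result_xml(location: &str, bucket: &str, key: &str, etag: &str) -> String {
    format!(
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\
         <CompleteMultipartUploadResult xmlns=\"http://s3.amazonaws.com/doc/2006-03-01/\">\
         <Location>{}</Location><Bucket>{}</Bucket><Key>{}</Key><ETag>{}</ETag>\
         </CompleteMultipartUploadResult>",
        xml_escape(location),
        xml_escape(bucket),
        xml_escape(key),
        xml_escape(etag)
    )
}
```

Escaping matters more than it looks: a key like `a&b.txt` emitted raw produces a document that a strict SDK parser will reject outright.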

What it actually taught me

R3 is unapologetically a learning implementation: the metadata lives in in-memory maps, the backend is the local filesystem, and it runs on a single node. It is not going to replace anyone’s storage tier. But that was never the point. The point was to discover where the difficulty in an “obvious” API actually lives, and the answer was clarifying: it isn’t in reading and writing bytes. It’s in the metadata: the parts that can arrive in any order but must be assembled in sequence, the versions that must never clobber each other, the deletes that aren’t deletes. S3’s real genius is putting a three-verb surface over all of that and making it feel effortless. Rebuilding the inside is the best way I know to appreciate the outside.

The code is on GitHub at github.com/arazmj/r3 if you want to poke around the parts that aren’t bytes.