Why We Rewrote the Agent in Rust

Moving the Portabase agent from Python to Rust removed entire classes of runtime failure, improved streaming efficiency, and simplified deployment across heterogeneous infrastructure.


Written by Charles Gauthereau

Feb 15, 2026


The Portabase agent handles backups end-to-end: it reads data, chunks it, compresses it, encrypts it, and streams it to remote storage. It runs close to the data and has to stay stable under load, on machines we don’t fully control.

The first version was written in Python. That was the right call at the time.

The Python version: good for building fast, less good for long-term stability

Python helped us move quickly. We could wire the full pipeline in a short time, iterate on chunking and compression strategies, and validate the system behavior without getting stuck on low-level details.

That early version did its job: it proved the architecture worked.

The issues showed up later, once we started running it at scale.

Under load, concurrency behavior was not fully predictable. That is not because Python is “bad”; it is a consequence of how the runtime works: the GIL, thread scheduling, and I/O coordination all introduce variability that becomes visible once you push throughput.

Memory behavior was another pain point. During large backups, compression buffers and streaming overlap in ways that make usage hard to reason about precisely. Most of the time it works fine, but under pressure you get spikes that are hard to anticipate.

What Rust actually changes

Rust doesn’t magically make code better. What it does is remove a lot of ambiguity before the program runs.

The ownership model forces you to be explicit about who owns data and how long it lives. That sounds theoretical, but in practice it removes entire categories of bugs: accidental sharing, dangling references, unexpected mutation.

For the agent, this matters because the pipeline is inherently concurrent. Multiple stages run at the same time: reading data, compressing it, encrypting it, uploading it. In a dynamic runtime, you have to be careful not to introduce subtle race conditions or hidden shared state.

Rust makes many of those mistakes fail to compile in the first place. The borrow checker is not “safety as a feature”; it simply enforces rules that you would otherwise have to enforce by hand in code review and testing.
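As a minimal illustration of that point (the names and channel layout here are illustrative, not the agent's actual code): handing a chunk from one stage to the next transfers ownership, so the sending stage can no longer touch the data, and the compiler rejects any attempt to.

```rust
use std::sync::mpsc;
use std::thread;

// Move a chunk from a "reader" stage to the next stage over a channel.
// Ownership of the chunk transfers on `send`; the sending stage can no
// longer use it, and the borrow checker enforces that at compile time.
fn hand_off_chunk() -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u8>>();

    let reader = thread::spawn(move || {
        let chunk = vec![1u8, 2, 3];
        tx.send(chunk).unwrap();
        // Using `chunk` after this point would not compile:
        // println!("{}", chunk.len()); // error[E0382]: use of moved value
    });

    let received = rx.recv().unwrap();
    reader.join().unwrap();
    received.len()
}

fn main() {
    println!("received {} bytes", hand_off_chunk());
}
```

In a dynamic runtime, the equivalent mistake (two stages holding a reference to the same mutable buffer) runs fine until it corrupts data under load; here it is simply not a valid program.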

Why Rust fits stream processing naturally

The agent doesn’t process files as a whole. It works in streams.

Data is split into chunks, and each chunk moves through a pipeline: read → compress → encrypt → upload. This kind of workload is mostly I/O-bound rather than CPU-bound, and it benefits from predictable scheduling and low overhead.
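That staged layout can be sketched with std threads and channels (a simplified stand-in, not the agent's actual implementation; the compress and encrypt transforms are stubbed):

```rust
use std::sync::mpsc;
use std::thread;

// Placeholder stage transforms. The real agent's compression and
// encryption are not shown here; these stubs just keep the shape.
fn compress(chunk: Vec<u8>) -> Vec<u8> { chunk }
fn encrypt(mut chunk: Vec<u8>) -> Vec<u8> {
    for b in &mut chunk { *b ^= 0xAA; }
    chunk
}

// Each stage runs concurrently and owns a chunk only while working on it.
fn run_pipeline(chunks: Vec<Vec<u8>>) -> usize {
    let (to_compress, compress_rx) = mpsc::channel::<Vec<u8>>();
    let (to_encrypt, encrypt_rx) = mpsc::channel::<Vec<u8>>();
    let (to_upload, upload_rx) = mpsc::channel::<Vec<u8>>();

    let compressor = thread::spawn(move || {
        for chunk in compress_rx {
            to_encrypt.send(compress(chunk)).unwrap();
        }
    });
    let encryptor = thread::spawn(move || {
        for chunk in encrypt_rx {
            to_upload.send(encrypt(chunk)).unwrap();
        }
    });
    let uploader = thread::spawn(move || {
        // Stand-in for the upload stage: count the bytes "sent".
        upload_rx.into_iter().map(|c| c.len()).sum::<usize>()
    });

    for chunk in chunks {
        to_compress.send(chunk).unwrap();
    }
    drop(to_compress); // closing the channel lets each stage drain and exit

    compressor.join().unwrap();
    encryptor.join().unwrap();
    uploader.join().unwrap()
}

fn main() {
    let chunks = vec![vec![0u8; 4], vec![0u8; 8]];
    println!("uploaded {} bytes", run_pipeline(chunks));
}
```

Closing the input channel propagates shutdown stage by stage: each loop ends when its sender is dropped, so the pipeline drains cleanly without any explicit coordination flags.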

With Rust’s async model, each step can run efficiently without needing a heavy runtime. Tasks are lightweight, and you don’t pay a constant tax in abstraction layers.

More importantly, memory usage becomes easier to reason about. Buffers are scoped to specific stages of the pipeline and released deterministically. There is no garbage collector making decisions later, and no hidden retention caused by runtime references you forgot about.
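The deterministic release points can be made visible with a small sketch (the wrapper and labels are illustrative, using a drop log instead of real buffers):

```rust
use std::cell::RefCell;
use std::rc::Rc;

// A buffer wrapper that records the moment it is released, to make
// Rust's deterministic drop points observable.
struct StageBuffer {
    _data: Vec<u8>,
    label: &'static str,
    log: Rc<RefCell<Vec<&'static str>>>,
}

impl Drop for StageBuffer {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.label);
    }
}

fn process_chunk() -> Vec<&'static str> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _compress_buf = StageBuffer {
            _data: vec![0; 1024],
            label: "compress",
            log: Rc::clone(&log),
        };
        // ... the compression stage would use the buffer here ...
    } // the compress buffer is freed exactly here, at the end of its scope

    let encrypt_buf = StageBuffer {
        _data: vec![0; 1024],
        label: "encrypt",
        log: Rc::clone(&log),
    };
    // ... the encryption stage would use the buffer here ...
    drop(encrypt_buf); // release can also be forced explicitly

    log.borrow_mut().push("done");
    Rc::try_unwrap(log).unwrap().into_inner()
}

fn main() {
    println!("{:?}", process_chunk()); // ["compress", "encrypt", "done"]
}
```

Each buffer is gone at a known point in the code, not at some future collection pass, which is what makes peak memory usage tractable to reason about.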

The end result is simpler: when load increases, behavior degrades in a more predictable way.

Deployment: still Docker, just a much simpler container

Both versions run in Docker. That didn’t change.

What changed is what the container actually contains.

With Python, the container includes a full runtime: interpreter, dependency management, packages, sometimes native extensions. It works, but it means more moving parts inside every deployment.

With Rust, the container is basically just a binary. No interpreter, no package manager, no runtime dependency chain.
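A multi-stage build for such a container might look roughly like this (a sketch, not Portabase's actual build; the crate name `agent` and the musl-based static linking are assumptions):

```dockerfile
# Build stage: compile the binary. The Alpine image targets musl,
# which produces a statically linked executable by default.
FROM rust:1-alpine AS build
WORKDIR /src
COPY . .
RUN cargo build --release

# Runtime stage: nothing but the binary itself.
FROM scratch
COPY --from=build /src/target/release/agent /agent
ENTRYPOINT ["/agent"]
```

The runtime image has no shell, no package manager, and no libc to drift out of sync with the host fleet.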

This doesn’t remove Docker complexity, but it removes a whole layer of variability inside the container itself. When something breaks, you are no longer asking “which dependency version caused this”, but “what did this code do”.

That difference matters more than it sounds in production.

Conclusion: this was about predictability, not performance

Rewriting the agent in Rust was not about chasing benchmarks.

Python helped us validate the system quickly, and that was valuable. But once the system started running continuously and at scale, we needed something more stable in behavior.

Rust gives us that by making a lot of failure modes impossible at compile time, and by reducing runtime variability. Combined with Docker, it gives a deployment model that is simple, reproducible, and easier to debug under real load.

In practice, the main gain is not speed. It’s that the system behaves the same way more often, under more conditions, with fewer surprises in production.
