Distributed Systems Field Notes


April 30, 2026 · 7 min read

One Bucket, Three Jobs


The infrastructure you plan and the infrastructure you end up with are rarely the same thing.

distributed systems · infrastructure · s3 · product engineering · self-hosted

The last piece ended with an image leaving my hands and landing somewhere I cannot see. I described shipping NavEngine as pushing a container image to a registry and trusting that a process on a customer's server will eventually notice.

That is accurate. It is also incomplete.

So far, NavEngine exists in two states: the hosted version, whose database I am responsible for whether it fails at three in the afternoon or three in the morning, and the one a customer installs on their own infrastructure.

This piece is the story of what I built to hold it all together, and why it ended up being more than a place to put files.

The Decision I Didn't Know I Was Making

I had a lot of questions at the start of the project.

  • How would the customers download the product?
  • Where would it be stored?
  • How do companies like Canonical handle distribution?
  • What does distribution actually imply?

I had set out to solve a storage problem. I needed a place that was reliable, versioned, and reachable from anywhere I needed it to be. S3 was the obvious answer. Most things are obvious in retrospect. I did not think much about the decision at the time.

What I did not anticipate was that a storage decision is also an architecture decision, and an architecture decision at the distribution layer has a way of spreading.

Job One: The Product Suite

NavEngine started as an ISO. Before a customer could run anything on their infrastructure, they needed the artifact itself: somewhere to fetch it from, with integrity guarantees, across whatever network conditions exist on their end.

Indulge: This raised a wider constraint - multiple products would need the same guarantees.

I needed a place to store iterations as I built towards UAT (User Acceptance Testing). The registry handled images. The bucket handled everything else.

The decision to version the artifacts explicitly - not overwrite, not float - came from the same reasoning behind the floating tag in the update pipeline: I need to know exactly what was shipped.

Versioning without guarantees is just naming.

If v4.5.0 ever needed to be reproduced or rolled back to, it should already exist. Not rebuilt. Not reconstructed. Present.
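A minimal sketch of what that rule can look like in code, assuming Python with boto3 against an S3-compatible endpoint. The bucket, endpoint, and key scheme are illustrative placeholders, not the real layout:

```python
# Publish a release artifact under an explicit version, and refuse to
# overwrite a key that already exists. Immutability here is a convention
# enforced at write time, not an S3 feature.
import hashlib

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

def publish_artifact(bucket: str, product: str, version: str, path: str) -> str:
    filename = path.rsplit("/", 1)[-1]
    key = f"products/{product}/releases/{version}/{filename}"

    # If the key exists, that release was already shipped: bump the version.
    try:
        s3.head_object(Bucket=bucket, Key=key)
        raise RuntimeError(f"{key} already exists; releases are immutable")
    except ClientError as err:
        if err.response["Error"]["Code"] != "404":
            raise

    with open(path, "rb") as fh:
        data = fh.read()
    # Store the digest alongside the object so integrity is checkable later.
    s3.put_object(Bucket=bucket, Key=key, Body=data,
                  Metadata={"sha256": hashlib.sha256(data).hexdigest()})
    return key
```

Under this convention, v4.5.0 stays exactly as written; a new build gets a new version, never a rewrite of an old key.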

Indulge: The hardest part of distributing software you do not host is preserving the integrity of what you shipped. Tags drift. References move. The artifact you think you gave a customer can diverge from what they actually received. Versioned, immutable storage is the only reliable paper trail when something breaks on a system you cannot access.

Job Two: The Backup

We initially handled backups of NavEngine with weekly and monthly VM snapshots. It was tedious, human-dependent and error-prone.

Example: if an engineer needed to manually update a database record, they first had to create a backup by hand for that individual operation. That backup would then live on the same machine as the database.

This meant that while the backup existed, it shared the machine's fate: if anything hit the fan, it was game over.

I have been there. The only grace was that it happened on the staging environment. The story of how I got back is for another day.

The issue wasn’t that we lacked backups. It was that restoring them required the same precision that introduced the risk.

A backup that lives in a different system than your product artifacts is a backup that gets forgotten about during an incident, when you are moving fast and your attention is already split. Having everything in one place means one set of access credentials, one retention policy to reason about, one place to look when something is wrong.

It was anything but elegant. It was reliable.
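In outline, the consolidated path is small enough to read in one sitting. A sketch, assuming Python with boto3; the bucket, prefix, and Postgres-style dump command stand in for whatever your stack actually provides:

```python
# Dump the database and land the backup in the same bucket as the product
# artifacts, under its own prefix, keyed by UTC timestamp.
import datetime
import hashlib
import subprocess

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

def backup_database(bucket: str, database: str) -> str:
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"backups/{database}/{stamp}.sql.gz"

    # pg_dump is a placeholder for whatever dump tool your database ships with.
    dump = subprocess.run(
        f"pg_dump {database} | gzip", shell=True, check=True, capture_output=True
    ).stdout
    # Same digest convention as the release artifacts.
    s3.put_object(Bucket=bucket, Key=key, Body=dump,
                  Metadata={"sha256": hashlib.sha256(dump).hexdigest()})
    return key
```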

Indulge: Backup strategies fail at the moment of failure, not at the moment of setup. The question to ask is not "do I have backups" but "can I restore from them at two in the morning, under pressure, with the right people asking questions?"

If the answer is uncertain, the backup strategy is incomplete regardless of how often the job runs.
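Put as code, the two-in-the-morning test is roughly: does this function run to completion? A sketch under the same placeholder names as above:

```python
# Fetch the most recent backup for a database and verify its digest before
# it goes anywhere near a restore. A corrupt backup should fail here, loudly.
import hashlib

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

def fetch_latest_backup(bucket: str, database: str, dest: str) -> str:
    # Timestamped keys sort lexicographically, so max() picks the newest.
    listing = s3.list_objects_v2(Bucket=bucket, Prefix=f"backups/{database}/")
    latest = max(listing["Contents"], key=lambda obj: obj["Key"])["Key"]

    obj = s3.get_object(Bucket=bucket, Key=latest)
    data = obj["Body"].read()

    expected = obj["Metadata"].get("sha256")
    if expected and hashlib.sha256(data).hexdigest() != expected:
        raise RuntimeError(f"checksum mismatch on {latest}")

    with open(dest, "wb") as fh:
        fh.write(data)
    return latest
```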

Job Three: The Agnostic Product Registry

This is the job I had not planned for.

What if we had other products to ship that had the same constraints?

Beyond hosting versioned images of NavEngine, the storage layer needed to be agnostic enough to hold other products' artifacts while retaining the ability to version them too.

Somewhere in the process of building the first two jobs, I noticed that the structure I was laying down did not have to be NavEngine-specific. The bucket had opinions - a versioning scheme, an artifact layout, a naming convention - but those opinions were not tied to NavEngine. They were tied to the idea of a distributable product. Any product that needed the same things NavEngine needs - versioned artifacts, clean separation between releases, a retrievable history - could sit in the same bucket structure without the structure caring what the product was.
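In miniature, those opinions reduce to a key scheme in which the product name is just data. The scheme below is illustrative, not the actual layout:

```python
# Nothing below the product segment knows or cares what the product is.
def release_key(product: str, version: str, filename: str) -> str:
    return f"products/{product}/releases/{version}/{filename}"

release_key("navengine", "v4.5.0", "navengine-v4.5.0.iso")
release_key("businessai", "v0.1.0", "businessai-v0.1.0.tar.gz")
```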

I did not set out to build a product registry. I set out to store some files. The registry emerged from the constraints.

As the team started building BusinessAI, the need to account for portability became obvious. Different pricing. Different licensing. Same distribution problem.

This matters because distribution surfaces compound. When the second or third product needs the same infrastructure, the cost should be near-zero: the structure would already exist, the access policies would already be reasoned about, and the retention logic would already be in place.

I wanted to reuse the shape I had already committed to without building new infrastructure for a new product.

I am building around invariants, not specifics. This isn’t storage for NavEngine. It’s storage for products that require versioning, integrity and retrievability from unknown environments.

The Bucket That Became Infrastructure

At some point the bucket stopped being a storage decision and became infrastructure.

Storage holds data. Infrastructure carries consequences.

For this, I went with Garage - a self-hostable, distributed S3-compatible store. Migrations, such as our port from v2 to v3, no longer required moving 60GB+ of assets. The problem reduced to managing references - ASCII text pointing to stable objects.

Indulge: Before the bucket, moving from v2 to v3 meant that while the database rows were ported, re-pointing the assets for each organisation, including white-labelled data, was a manual process. It relied on knowing the filesystem storage path and identifying which asset belonged to which organisation.

This meant moving large files over a network from server A to server B - a process I would not like to do again.
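With the bucket in place, the same migration becomes a rewrite of references. A sketch of the idea with placeholder names; the manifest format is illustrative, not a Garage feature:

```python
# A manifest is a small JSON document mapping logical asset names to stable
# object keys. Migrating means rewriting this text, not moving bytes.
import json

import boto3

s3 = boto3.client("s3", endpoint_url="https://s3.example.internal")

def write_manifest(bucket: str, org: str, assets: dict) -> str:
    # e.g. assets = {"logo": "assets/org-42/logo.png", ...}
    key = f"orgs/{org}/manifest.json"
    s3.put_object(Bucket=bucket, Key=key,
                  Body=json.dumps(assets, indent=2).encode())
    return key

def resolve_asset(bucket: str, org: str, name: str) -> bytes:
    manifest = json.loads(
        s3.get_object(Bucket=bucket, Key=f"orgs/{org}/manifest.json")["Body"].read()
    )
    return s3.get_object(Bucket=bucket, Key=manifest[name])["Body"].read()
```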

Three jobs, one system - not because it was the cleanest architecture, but because the jobs turned out to share more than I initially gave them credit for: versioning, integrity, global access, and durability beyond any single system.

What I built was, inadvertently, the distribution layer for the company's technical output.

It holds what we ship, what we back up and what we might ship next.

That is more than I planned for when I sat down to store some files.


I later ported the same S3 infrastructure to my Homelabbing series, extending it with POSIX-compliant storage via versity.

This means that even as I develop software for my own use case, I can safely extend anything pre-existing to use the volumes I already have. That's it. No config. No workaround. And most importantly, no downtime.


What This Changes

As I dive deeper into product-agnostic infrastructure and look back at what this quarter has meant, my thinking about distribution has shifted.

Uptime isn’t just about keeping systems running. It’s about designing for the moment you lose control of them - backups, extensibility, storage.

The CI/CD piece ended with designing for absence - for the reality that once the software leaves your hands, you have no control over what happens to it. The S3 structure is the complementary question: what must never leave your control?

Artifacts. Backups. Release history - the source of truth.

What the customer runs is a copy. If the copy breaks, you come back to the source. If the source is solid, the copy is recoverable.

That is the job the bucket is actually doing.

Not storage. Infrastructure.
