Files and blobs
Files let an operator keep the opaque bytes that go with an estate (a firmware image, a config dump, a runbook, a packet capture) searchable and deduplicated: a file handle over a content-addressed blob store, behind the same Storage Gateway as everything else.
Two layers: the file handle and the blob
- file is indexable metadata: name, content-type, size, sha256, tags. It is the searchable handle an operator references and finds (a firmware image, a device config dump, a runbook doc, a screenshot, a packet capture). It owns no bytes; it points at a blob by hash.
- the blob store holds the bytes, content-addressed by sha256. The hash is the key, so identical bytes are one blob.
Splitting them means search and inventory operations (list, filter, tag) never touch bytes, and the same blob can back many file handles.
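A minimal sketch of the split, assuming an in-memory dict in place of the real blob store. The names `FileHandle`, `BlobStore`, and `upload` are hypothetical, and the key/value shape of `tags` is an assumption; only the field names (name, content-type, size, sha256, tags) come from the text above.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileHandle:
    """Searchable metadata only: it holds a hash, never the bytes."""
    name: str
    content_type: str
    size: int
    sha256: str
    tags: dict[str, str] = field(default_factory=dict)  # assumed shape


class BlobStore:
    """Content-addressed store: the sha256 hex digest is the key."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical bytes hash to the same key, so a repeat put stores nothing new.
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]

    def __len__(self) -> int:
        return len(self._blobs)


def upload(store: BlobStore, name: str, content_type: str, data: bytes) -> FileHandle:
    """Store the bytes, then return a handle that points at them by hash."""
    digest = store.put(data)
    return FileHandle(name=name, content_type=content_type, size=len(data), sha256=digest)


store = BlobStore()
a = upload(store, "fw-1.2.bin", "application/octet-stream", b"firmware-image")
b = upload(store, "fw-1.2-copy.bin", "application/octet-stream", b"firmware-image")
```

Here two handles with different names end up backed by a single stored blob, and listing or filtering the handles never needs to call `get`.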
file tags reuse the tag key registry (the same tenant-wide governed vocabulary, so category
means the same thing on a firmware image as on a component, config and credentials),
but bind as a flat per-file set: a file is not on the structural exclusive-arc, so there is no parent
to cascade from. The vocabulary is shared; the cascade is not.
Content-addressing earns four properties
A blob is keyed by the hash of its bytes, not a UUID, which buys:
- dedup: identical bytes collapse to one blob (two operators uploading the same firmware, the same raw payload seen twice);
- integrity: the hash verifies the bytes on read, tamper-evident by construction;
- immutability: bytes cannot change without changing the key, like the append-only ground-truth logs;
- backtest-stability: an event referencing a hash still resolves under a backtest, because the hash is derived only from the bytes and so is the same in the backtest as it was live.
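The integrity property in the list above can be sketched as a verify-on-read check. This is an illustration under stated assumptions, not the gateway's actual read path: `read_verified` and `IntegrityError` are hypothetical names, and a plain dict stands in for the blob store.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when stored bytes no longer match their content hash."""


def read_verified(blobs: dict[str, bytes], digest: str) -> bytes:
    """Fetch a blob and recompute its sha256 before returning it."""
    data = blobs[digest]
    actual = hashlib.sha256(data).hexdigest()
    if actual != digest:
        raise IntegrityError(f"blob {digest} failed verification")
    return data


good = hashlib.sha256(b"snmp walk").hexdigest()
blobs = {good: b"snmp walk"}
data = read_verified(blobs, good)

# Simulate tampering: the bytes change but the key does not.
blobs[good] = b"tampered"
try:
    read_verified(blobs, good)
    tamper_detected = False
except IntegrityError:
    tamper_detected = True
```

Because the key is the hash, no separate checksum column is needed for the check; the key itself is what the bytes are verified against.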
So rows reference a hash, never inline bytes. Inline bytea would kill the hash-ref stability
property and bloat the firehose row. Small structured values (a datapoint, its labels) stay inline
in the row’s jsonb; large or opaque payloads become a blob hash-ref (a dedicated indexed
blob_sha256 column on the referencing row, so GC can probe it, not buried in jsonb): a big log_datapoint
body, and especially a collection.failed event’s raw when the
wire payload is large (a full SNMP walk, a big HTTP body, a capture). Raw stays inline when small;
the size threshold is the switch.
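The threshold switch might look like the sketch below. The 8 KiB value, the function name `build_failed_event`, and the row layout are all assumptions for illustration; the source only says that raw stays inline when small and becomes a blob hash-ref in a dedicated blob_sha256 column when large.

```python
import hashlib

INLINE_MAX_BYTES = 8 * 1024  # hypothetical threshold, not taken from the source


def build_failed_event(raw: bytes, blobs: dict[str, bytes]) -> dict:
    """Build a collection.failed row with raw either inline or as a hash-ref."""
    if len(raw) <= INLINE_MAX_BYTES:
        # Small payload: keep it inline in the row's jsonb-style body.
        return {
            "event": "collection.failed",
            "blob_sha256": None,
            "body": {"raw": raw.decode("utf-8", errors="replace")},
        }
    # Large payload: store the bytes once and reference them by hash
    # in a dedicated column rather than inside the body.
    digest = hashlib.sha256(raw).hexdigest()
    blobs.setdefault(digest, raw)
    return {"event": "collection.failed", "blob_sha256": digest, "body": {}}


blobs: dict[str, bytes] = {}
small = build_failed_event(b"timeout after 5s", blobs)
large = build_failed_event(b"x" * 100_000, blobs)
```

Keeping the hash in its own field rather than inside the body mirrors the point above: a top-level column can be indexed and probed by GC.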
Dedup is database-scoped
The blob key is sha256, the bare content hash. There is no tenant_id: isolation is
per-database (a database per tenant), so each tenant’s blobs live in a separate database and dedup
is global within that database. One tenant can never detect another’s content by probing for a matching hash,
because the blobs never share a store. The efficiency cost of not sharing bytes across databases is
the right price for physical isolation.
Backends, swappable behind the gateway
The bytes live behind the Storage Gateway, so the backend swaps with no model change (the same seam as the columnar and object tiers):
- default: pgblobs (a dedicated Postgres blob table), the single-binary, no-external-dependency story;
- scale: an S3-compatible object store;
- disk for local and dev.
The file and the hash reference are identical across backends; only storage_ref resolution
differs.
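One way to picture the seam is a small interface with interchangeable implementations. The sketch below is an assumption about shape, not the gateway's real API: `BlobBackend`, `MemoryBackend`, and `DiskBackend` are hypothetical names, and the two-character directory fan-out is an arbitrary choice for the disk layout.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class BlobBackend(Protocol):
    """What a caller sees: bytes in, hash out; hash in, bytes out."""

    def put(self, data: bytes) -> str: ...
    def get(self, digest: str) -> bytes: ...


class MemoryBackend:
    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._blobs[digest]


class DiskBackend:
    def __init__(self, root: Path) -> None:
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def _path(self, digest: str) -> Path:
        # Backend-specific resolution from hash to physical location.
        return self.root / digest[:2] / digest

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self._path(digest)
        if not path.exists():
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
        return digest

    def get(self, digest: str) -> bytes:
        return self._path(digest).read_bytes()


with tempfile.TemporaryDirectory() as tmp:
    backends: list[BlobBackend] = [MemoryBackend(), DiskBackend(Path(tmp))]
    digests = [backend.put(b"runbook contents") for backend in backends]
    round_trips = [backend.get(d) for backend, d in zip(backends, digests)]
```

Both backends return the same digest for the same bytes, so a file handle written against one remains valid against the other; only `_path`, the stand-in for storage_ref resolution, is backend-specific.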
Reference-counted GC, not age-based
A blob is collectable only when no live reference points at its hash AND a grace or retention floor has passed. Age-based GC alone is wrong: dedup means a blob uploaded long ago can be the one a recent event references, so collecting by the blob’s own age would orphan a live hash. References come from:
- a file handle;
- a large log_datapoint body;
- a collection.failed raw hash-ref;
- an attach event (a state_datapoint or audit_log recording “this component was attached to this file at T”).
References disappear two ways: a file is deleted, or a referencing event ages out (a
retention partition drop). So GC is coupled to retention: dropping a partition releases its
references, after which a now-unreferenced blob past the grace floor is collectable.
Mechanism: index-probe mark-sweep by default. GC enumerates blobs past the grace floor and,
for each, probes the indexed hash-ref columns on the referencing tables; a blob with no live
reference is collected. A maintained refcount column or blob_ref table is a measured
optimization, earned only if the per-blob probes profile too expensive (the same
ship-the-simple-thing discipline as the storage projections). The grace floor is the safety
margin against an in-flight reference, so GC never races a just-written event.
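The index-probe mark-sweep described above can be sketched against an in-memory SQLite database standing in for Postgres. Several details are assumptions made for the example: the `created_at` column used to measure the grace floor, the reduced set of referencing tables, the index names, and the function name `collect_garbage`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE blob (sha256 TEXT PRIMARY KEY, bytes BLOB, created_at REAL);
CREATE TABLE file (id INTEGER PRIMARY KEY, name TEXT, sha256 TEXT);
CREATE TABLE log_datapoint (id INTEGER PRIMARY KEY, blob_sha256 TEXT);
CREATE INDEX file_sha256_idx ON file (sha256);
CREATE INDEX log_datapoint_blob_idx ON log_datapoint (blob_sha256);
"""

# Indexed hash-ref columns to probe (a hypothetical subset of the real list).
REF_COLUMNS = [("file", "sha256"), ("log_datapoint", "blob_sha256")]


def collect_garbage(db: sqlite3.Connection, now: float, grace_seconds: float) -> list[str]:
    """Delete blobs that are past the grace floor and have no live reference."""
    cutoff = now - grace_seconds
    candidates = [
        row[0]
        for row in db.execute("SELECT sha256 FROM blob WHERE created_at < ?", (cutoff,))
    ]
    collected = []
    for digest in candidates:
        # One indexed point lookup per referencing table; stop at the first hit.
        live = any(
            db.execute(
                f"SELECT 1 FROM {table} WHERE {column} = ? LIMIT 1", (digest,)
            ).fetchone()
            for table, column in REF_COLUMNS
        )
        if not live:
            db.execute("DELETE FROM blob WHERE sha256 = ?", (digest,))
            collected.append(digest)
    return collected


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO blob VALUES (?, ?, ?)",
    [("aaa", b"1", 0.0), ("bbb", b"2", 0.0), ("ccc", b"3", 990.0)],
)
db.execute("INSERT INTO file (name, sha256) VALUES ('fw.bin', 'aaa')")

first_pass = collect_garbage(db, now=1000.0, grace_seconds=100.0)
db.execute("DELETE FROM file WHERE sha256 = 'aaa'")  # the reference disappears
second_pass = collect_garbage(db, now=1000.0, grace_seconds=100.0)
```

In the first pass only the old, unreferenced blob is collected: one blob is old but still referenced by a file, and another is unreferenced but inside the grace window. Once the file row is deleted, the second pass collects the blob it pointed at.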
Storage
The handle and the content-addressed bytes; the physical layout (the gateway, GC) is described above and on the storage page.
| Table | Key columns | Notes |
|---|---|---|
| file | id, name, content_type, size, sha256, tags | searchable metadata handle; points at a blob by hash |
| blob | sha256, bytes / storage_ref, size, content_type | content-addressed bytes; dedup; backend pgblobs / S3 / disk behind the gateway; reference-counted GC |