
Files and blobs

Design

Files let an operator keep the opaque bytes that go with an estate (a firmware image, a config dump, a runbook, a packet capture) searchable and deduplicated: a searchable file handle sits over a content-addressed blob store, behind the same Storage Gateway as everything else.

  • file is indexable metadata: name, content-type, size, sha256, tags. It is the searchable handle an operator references and finds (a firmware image, a device config dump, a runbook doc, a screenshot, a packet capture). It owns no bytes; it points at a blob by hash.
  • the blob store holds the bytes, content-addressed by sha256. The hash is the key, so identical bytes are one blob.

Splitting them means search and inventory operations (list, filter, tag) never touch bytes, and the same blob can back many file handles.
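The split can be sketched in a few lines. This is a minimal in-memory illustration, not the actual schema: the `FileHandle` class, the `upload` and `find_by_tag` helpers, the `blobs` and `files` containers, and the `key=value` tag encoding are all hypothetical names chosen for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileHandle:
    """Searchable metadata only; the bytes live in the blob store."""
    name: str
    content_type: str
    size: int
    sha256: str
    tags: frozenset = frozenset()


blobs = {}   # hypothetical blob store: sha256 hex digest -> bytes
files = []   # hypothetical file index: a list of handles


def upload(name, content_type, data, tags=()):
    """Store the bytes once by hash, then record a handle pointing at them."""
    digest = hashlib.sha256(data).hexdigest()
    blobs.setdefault(digest, data)  # identical bytes collapse to one blob
    handle = FileHandle(name, content_type, len(data), digest, frozenset(tags))
    files.append(handle)
    return handle


def find_by_tag(tag):
    """Inventory query: reads handles only and never touches the blob store."""
    return [f for f in files if tag in f.tags]


image = b"\x7fFIRMWARE-IMAGE"
a = upload("fw-1.2.bin", "application/octet-stream", image, ["category=firmware"])
b = upload("fw-1.2-copy.bin", "application/octet-stream", image,
           ["category=firmware", "site=lab"])
```

Both handles carry the same `sha256`, so `blobs` holds a single entry, while `find_by_tag` answers from the handle list alone.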

file tags reuse the tag key registry (the same tenant-wide governed vocabulary, so category means the same thing on a firmware image as it does on a component, a config, or credentials), but bind as a flat per-file set: a file is not on the structural exclusive-arc, so there is no parent to cascade from. The vocabulary is shared; the cascade is not.

A blob is keyed by the hash of its bytes, not a UUID, which buys:

  • dedup: identical bytes collapse to one blob (two operators uploading the same firmware, the same raw payload seen twice);
  • integrity: the hash verifies the bytes on read, tamper-evident by construction;
  • immutability: bytes cannot change without changing the key, like the append-only ground-truth logs;
  • backtest-stability: an event referencing a hash still resolves under a backtest, because the hash depends only on the bytes and so is the same under a backtest as it was at write time.
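The dedup, integrity, and immutability properties fall out of the keying scheme, as the following sketch suggests. It assumes an in-memory dictionary as the store; the `BlobStore` class and `IntegrityError` exception are hypothetical names, not part of the design above.

```python
import hashlib


class IntegrityError(Exception):
    """Raised when stored bytes no longer hash to their key."""


class BlobStore:
    """Hypothetical content-addressed store keyed by sha256 hex digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        # Dedup: a second put of identical bytes finds the key already present.
        self._blobs.setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        data = self._blobs[key]
        # Integrity: re-hash on read, so tampering is detected, not trusted.
        if hashlib.sha256(data).hexdigest() != key:
            raise IntegrityError(key)
        return data

    def __len__(self):
        return len(self._blobs)


store = BlobStore()
k1 = store.put(b"firmware v1")
k2 = store.put(b"firmware v1")  # same bytes, same key, still one blob
k3 = store.put(b"firmware v2")  # changed bytes necessarily get a new key
```

Here `k1 == k2` and the store holds two blobs, not three. Overwriting the bytes under `k1` behind the store's back would make the next `get(k1)` raise `IntegrityError`.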

So rows reference a hash rather than carrying large bytes inline. Inline bytea would kill the hash-ref stability property and bloat the firehose row. Small structured values (a datapoint, its labels) stay inline in the row’s jsonb; large or opaque payloads become a blob hash-ref. The hash-ref is a dedicated indexed blob_sha256 column on the referencing row, not a value buried in jsonb, so GC can probe it. The two main cases are a big log_datapoint body and, especially, a collection.failed event’s raw when the wire payload is large (a full SNMP walk, a big HTTP body, a capture). Raw stays inline when small; the size threshold is the switch.
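A sketch of the threshold switch for a collection.failed row follows. The source does not state a threshold value, so `INLINE_MAX_BYTES` is an invented number, and `build_failed_event_row`, the row layout, and the dictionary standing in for the blob store are likewise hypothetical.

```python
import hashlib

# Hypothetical threshold; the design above does not give a value.
INLINE_MAX_BYTES = 8 * 1024


def build_failed_event_row(raw: bytes, blob_store: dict) -> dict:
    """Keep a small raw payload inline; turn a large one into a hash-ref."""
    row = {
        "event": "collection.failed",
        "payload": {},        # stands in for the row's jsonb
        "blob_sha256": None,  # stands in for the dedicated indexed column
    }
    if len(raw) <= INLINE_MAX_BYTES:
        row["payload"]["raw"] = raw.decode("utf-8", errors="replace")
    else:
        digest = hashlib.sha256(raw).hexdigest()
        blob_store.setdefault(digest, raw)  # bytes go to the blob store
        row["blob_sha256"] = digest         # the row keeps only the hash
    return row


blob_store = {}
small_row = build_failed_event_row(b"timeout after 5s", blob_store)
large_row = build_failed_event_row(b"x" * (INLINE_MAX_BYTES + 1), blob_store)
```

The small row carries its raw text inline and leaves `blob_sha256` empty; the large row carries only the hash, and the bytes sit in `blob_store` under that key.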

The blob key is sha256, the bare content hash. There is no tenant_id: isolation is per-database (a database per tenant), so each tenant’s blobs live in a separate database and dedup is global within that database. One tenant can never detect another’s content by probing for a hash that already exists, because the blobs never share a store. The efficiency cost of not sharing bytes across databases is the right price for physical isolation.

The bytes live behind the Storage Gateway, so the backend swaps with no model change (the same seam as the columnar and object tiers):

  • default: pgblobs (a dedicated Postgres blob table), the single-binary, no-external-dependency story;
  • scale: an S3-compatible object store;
  • disk for local and dev.

The file and the hash reference are identical across backends; only storage_ref resolution differs.
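The backend seam might look like the sketch below. Every name here (`BlobBackend`, `MemoryBackend`, `DiskBackend`, `StorageGateway`, and their methods) is hypothetical; the in-memory class merely stands in for a table-backed store such as pgblobs, and an S3-compatible backend would implement the same two methods.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Protocol


class BlobBackend(Protocol):
    """What the gateway needs from any backend (hypothetical interface)."""
    def write(self, sha256: str, data: bytes) -> None: ...
    def read(self, sha256: str) -> bytes: ...


class MemoryBackend:
    """Stand-in for a table-backed store; rows keyed by hash."""
    def __init__(self):
        self._rows = {}

    def write(self, sha256, data):
        self._rows.setdefault(sha256, data)

    def read(self, sha256):
        return self._rows[sha256]


class DiskBackend:
    """Local/dev backend: one file per blob, fanned out by hash prefix."""
    def __init__(self, root: Path):
        self._root = root

    def _path(self, sha256):
        return self._root / sha256[:2] / sha256

    def write(self, sha256, data):
        path = self._path(sha256)
        if not path.exists():  # identical bytes are already on disk
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)

    def read(self, sha256):
        return self._path(sha256).read_bytes()


class StorageGateway:
    """Callers see only hashes; storage resolution is the backend's job."""
    def __init__(self, backend: BlobBackend):
        self._backend = backend

    def put_blob(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._backend.write(key, data)
        return key

    def get_blob(self, key: str) -> bytes:
        return self._backend.read(key)


payload = b"show running-config"
mem = StorageGateway(MemoryBackend())
disk = StorageGateway(DiskBackend(Path(tempfile.mkdtemp())))
key_mem = mem.put_blob(payload)
key_disk = disk.put_blob(payload)
```

Both gateways return the same key for the same bytes, which is why a file handle or event row written against one backend needs no change when the backend is swapped.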

A blob is collectable only when no live reference points at its hash AND a grace or retention floor has passed. Age-based GC alone is wrong: dedup means a blob uploaded long ago can be the one a recent event references, so collecting by the blob’s own age would orphan a live hash. References come from:

  • a file handle;
  • a large log_datapoint body;
  • a collection.failed raw hash-ref;
  • an attach event (a state_datapoint or audit_log recording “this component was attached to this file at T”).

References disappear two ways: a file is deleted, or a referencing event ages out (a retention partition drop). So GC is coupled to retention: dropping a partition releases its references, after which a now-unreferenced blob past the grace floor is collectable.

Mechanism: index-probe mark-sweep by default. GC enumerates blobs past the grace floor and, for each, probes the indexed hash-ref columns on the referencing tables; a blob with no live reference is collected. A maintained refcount column or blob_ref table is a measured optimization, earned only if the per-blob probes profile too expensive (the same ship-the-simple-thing discipline as the storage projections). The grace floor is the safety margin against an in-flight reference, so GC never races a just-written event.
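The probe-based sweep can be sketched as below. Assumptions are labeled: the grace floor is measured here from the time the blob was last written, `GRACE_FLOOR` is an invented value, the table names are illustrative, and plain sets stand in for the indexed hash-ref columns on the referencing tables.

```python
from datetime import datetime, timedelta, timezone

GRACE_FLOOR = timedelta(hours=24)  # hypothetical value


def collectable_blobs(blob_last_written, reference_tables, now, grace=GRACE_FLOOR):
    """Return hashes that are past the grace floor AND have no live reference.

    blob_last_written: dict of sha256 -> datetime of the last write
    reference_tables:  dict of table name -> set of referenced sha256 values
                       (each set stands in for an indexed hash-ref column)
    """
    doomed = []
    for sha, written_at in blob_last_written.items():
        if now - written_at < grace:
            continue  # inside the grace window: a reference may be in flight
        if any(sha in refs for refs in reference_tables.values()):
            continue  # at least one live reference: keep, however old
        doomed.append(sha)
    return doomed


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
blob_last_written = {
    "aaa": now - timedelta(days=30),  # old, but still referenced by a file
    "bbb": now - timedelta(days=30),  # old and unreferenced
    "ccc": now - timedelta(hours=1),  # unreferenced, but inside the grace window
}
reference_tables = {
    "file": {"aaa"},
    "log_datapoint": set(),
    "collection_failed": set(),
    "attach_event": set(),
}

first_pass = collectable_blobs(blob_last_written, reference_tables, now)

# Deleting the file (or dropping a retention partition) releases the reference.
reference_tables["file"].discard("aaa")
second_pass = collectable_blobs(blob_last_written, reference_tables, now)
```

The first pass collects only `bbb`: age alone does not doom `aaa`, and the grace floor protects `ccc`. Once the last reference to `aaa` is released, the second pass collects it as well.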

The two tables are the handle and the content-addressed bytes; the physical layout (the gateway, GC) is covered above and on storage.

| Table | Key columns | Notes |
|---|---|---|
| file | id, name, content_type, size, sha256, tags | searchable metadata handle; points at a blob by hash |
| blob | sha256, bytes / storage_ref, size, content_type | content-addressed bytes; dedup; backend pgblobs / S3 / disk behind the gateway; reference-aware GC (index-probe mark-sweep by default) |