# Omniglass documentation (full text)

Omniglass is an open, AV-native observability and control plane for AV and IT estates: one Go binary over PostgreSQL. This file concatenates the whole documentation site as one machine-readable artifact for LLM tools (NotebookLM and the like).

This is a proposed, forward-looking architecture; per-page build status (Design / Partial / Built / Diverged) lives at /architecture/status/. Source: https://docs.omniglass.hyperscaleav.com/

# Why Omniglass

URL: /architecture/why/

What Omniglass is, what it is for, and why AV needs its own observability platform instead of an IT monitoring tool.

Run AV at scale and you know the feeling. You find out a room is broken when someone walks out
of it. The operation runs on tribal knowledge, escalations, and last-known-good guesses. Every
high-profile meeting is a potential resume-generating event, and when one goes wrong you are in
the postmortem with no data to back you up. It is the difference between a good night's sleep and
a 3am call.

Here is the part nobody says out loud: **that is not a reflection of you or your team.** AV is
genuinely, structurally hard to see, and the industry never shipped a right way to do it. Omniglass
exists to ship one.

## AV is not built to be observed

Most AV gear is built to be **controlled, not monitored**. You can tell a device what to do. Asking
it what it is *doing* is a different problem entirely, and the device was not designed to answer it.
So you cannot reliably say what is up, what is down, and what is about to fail. At a handful of rooms
you can brute-force that with grit. At a thousand rooms, it collapses.

It is hard for reasons that are real, and none of them are your fault:

- **It is agentless.** AV gear ships as firmware appliances. You cannot install an agent and be done; you
  have to ask the device from the outside and take whatever it is willing, or able, to give you.
- **There is no standard, and the APIs are uneven.** Control interfaces are usually decent;
  *management* data is an afterthought, when it exists at all. Different port, protocol, and format
  for every vendor, every product, sometimes every firmware revision. Every integration is bespoke.
- **The system is the hard part.** A room is not a device. It is a signal chain (a display, a video
  bar, the microphones, a DSP, a control processor, the UCC service in the cloud, the network) and
  "healthy" is a fact about the whole chain. Two of those mics might be redundant; the control
  processor speaks its own command dialect over a TCP port. Every unique combination of gear is its
  own health model, and there is no standard to lean on.
- **It is fragmented by design.** Each manufacturer portal sees only its own devices. Stack them up
  and you have a dozen panes of glass and still no single view of the room.

Put plainly: AV was never required to be observable, so it isn't. That is an industry problem, not
an operator one.

## Why an IT monitoring tool does not finish the job

The IT monitoring world (Zabbix, Prometheus, and the rest) is genuinely excellent at what it was
built for: a fleet of servers, an agent on each one, clean and standardized metrics. The
host-and-metric model is the right shape for a data center, and these tools are the best in the
world at it.

That model quietly assumes the three things AV does not have: **an agent to install, a standard API
to read, and a host that is the thing you actually care about.** Point it at a room and the gaps
show. There is no agent. There is no standard to read. And it has no idea what a "room" is, no
language for an AV control protocol, no concept of a redundant mic. It can tell you a host is up. It
cannot tell you the room is usable.

You *can* bend these tools to AV. Skilled people do it every day, scraping web interfaces,
automating CLI sessions, gluing middleware on the side to reach the gear the platform cannot. That
work is real and it is impressive. But you are doing the platform's job for it, by hand, forever,
and it still has no model of the room at the end.

## It is an architecture problem, not a tooling problem

The fix is not a better dashboard. It is a method: figure out **why** you monitor, then **what**
(model what "healthy" means), then **how** (go get the data, however the device will give it). That
is the [AV Observability Framework](https://hyperscaleav.com/framework), and its keystone is the
**health model**, the thing that answers one deceptively simple question:

> Is this room usable right now?

The health model always runs. The only question is whether it runs *as a system* against real
signal, or in the operator's head at 3am against half of it. Omniglass is the tool that runs it as a
system.

## What Omniglass is

Omniglass is an **open, self-hosted observability and control plane for AV (and IT) estates**, built
for the real world rather than the demo. It does three things an IT tool cannot, because they were
designed in from the start, not bolted on.

**It meets the devices where they are.** Agentless and protocol-diverse, it goes and gets the data
however the device will give it (SNMP, HTTP, SSH, a control processor's raw command dialect) and
normalizes every vendor's reading into one canonical signal, so a Sony display and a Samsung display
answer the same question the same way.

**It models your estate the way it actually nests.** Components, systems, rooms, buildings. The
room is a first-class system, not a tag, so health, alarms, and config attach at the level you
actually operate.

**It runs the health model.** Signals roll up the tree into "is the room working," and the rollup is
role-aware: a *required* display down takes the room down, a *redundant* mic only degrades it, an
*informational* sensor does not touch it. That is what turns a wall of red dots into one honest
answer, and it is what makes a real uptime SLA possible at all.

```d2
direction: up
classes: { node: { style.border-radius: 8 }; warn: { style: { border-radius: 8; bold: true } } }
c1: "Display: up" { class: node }
c2: "Video bar: not in call" { class: node }
c3: "Backup mic: down\n(redundant)" { class: node }
system: "Boardroom A\ndegraded" { class: warn }
floor: "Floor 3\n1 room degraded" { class: node }
c1 -> system
c2 -> system
c3 -> system
system -> floor
```

And then it acts: notify the right person, run remediate-verify-escalate (send the command, wait,
re-check the real signal, escalate if it did not take), open and close the ticket as the alarm opens
and clears.

It is flexible enough to handle the mess, and clear enough that you can actually run it across a
thousand rooms. Open source, self-hosted, vendor-agnostic, one server over a database you already
know how to run. And it is free.

## The architecture, as one journey

Every monitoring system is the same shape: **collect, evaluate, raise an event, hold it as an alarm,
act, and see it the whole time.** Omniglass is that shape, built AV-native, and the architecture
follows it end to end.

```d2
direction: right
classes: { node: { style.border-radius: 8 }; key: { style: { border-radius: 8; bold: true } } }
gear: "AV gear\nSNMP · HTTP · SSH · raw AV control" { class: node }
datapoint: "datapoint\none canonical signal" { class: key }
event: event { class: node }
alarm: "alarm\nroom degraded" { class: node }
act: "notify · remediate · ticket" { class: node }
config: "config\ndesired: input = HDMI1" { class: node }
gear -> datapoint: collect: functions, parse at the edge
datapoint -> event: evaluate: event_rule
event -> alarm: fire opens / clear resolves
alarm -> act: act
config -- datapoint: "drift?" { style.stroke-dash: 4 }
```

Read it as a journey, and each stop is a page:

1. **[Collection](/architecture/collection/)** goes and gets the data from gear that never wanted to
   give it, and parses it at the edge.
2. **[Datapoints](/architecture/datapoints/)** type every reading into one owned, canonical signal, the
   same measurement across every vendor.
3. **[Config](/architecture/variables/)** holds what a device *should* be, so drift becomes
   a signal you can see and a fix you can push.
4. **[Health](/architecture/health/)** rolls the signals up the system tree into the one answer that
   matters.
5. **[Alarms and actions](/architecture/alarms-actions/)** detect a condition, hold it until it
   resolves, and respond.

That journey is the whole architecture. The [overview](/architecture/) is the map of it.

## The point

We did not build Omniglass to add another monitoring tool to a crowded shelf. We built it so the
people who keep rooms working can finally **know their systems**: see them as systems, not as a pile
of hosts, and act before the 3am call.

An IT tool answers "is the host up?" Omniglass answers "is the room working?" The industry never
shipped a right way to see AV. So we did.

---

# Architecture

URL: /architecture/

The architecture told as one journey, following a single reading from the gear through its whole life to the answer and the action on it.

Monitoring, stripped down, is one shape: **collect the data, evaluate it, see it, act on it.** The
whole reason to do it is to **know your systems**, and the one question that matters most is
deceptively simple:

> Is this system working right now?

Omniglass is built around that question. This page follows a **single reading through its whole
life**, top to bottom, from the gear to the answer and the action on it. Each **bold word** is an
official term; the linked ones open their deep dive, and every one is defined in the
[glossary](/architecture/glossary/).

:::note[A proposed architecture]
This is a **proposed, forward-looking architecture**: where we intend to take Omniglass, written in
present tense as the target design, not a promise that every detail ships unchanged. Expect it to adjust
as we build. Each page carries a status badge, **Design** (specified, little or none built), **Partial**
(some capabilities shipped), **Built** (all shipped and tested), or **Diverged** (built, but the
implementation differs from this design, see the page's note); the badge is the page's floor. The
per-capability breakdown and what is actually shipped live on
[implementation status](/architecture/status/); undecided design points are flagged inline as
`Open question` asides. Every prose architecture page is also published as one machine-readable file at
[/llms-full.txt](/llms-full.txt) (with a curated index at [/llms.txt](/llms.txt)) for LLM tools (the interactive `.mdx` pages are not included).
:::

## The estate

Three nouns describe what you operate.

- A **[component](/architecture/core-entities/)** is a deployed device, app, or service: a display, a
  codec, a DSP, a control processor, a cloud UCC service.
- A **system** is a set of components that work together to do one job. A meeting room is a system.
  So is a classroom, a video wall, a broadcast chain. The word is deliberately universal: a system
  is the unit you actually care about, whatever shape it takes.
- A **location** ties systems and components to a physical place (campus, building, floor, room).

A component belongs to a system; a system sits in a location.

```d2
direction: down
classes: { node: { style.border-radius: 8 }; key: { style: { border-radius: 8; bold: true } } }
location: location { class: node }
system: system { class: key }
c1: component { class: node }
c2: component { class: node }
c3: component { class: node }
location -> system
system -> c1
system -> c2
system -> c3
```

## Something happens

A display drops off the network. A codec changes input. A meeting starts, or a fan stalls. The gear
changes state, and that change is what the rest of the architecture exists to catch and make sense
of.

## Collect

AV gear is **agentless**: you cannot install something inside it, so the reading has to come from
the outside. Sometimes the component **pushes** it to Omniglass; usually Omniglass **polls** for it
on an interval. Either way, a **[node](/architecture/nodes/)** running close to the gear reaches a
component over an **interface** (whatever the device speaks: SNMP, HTTP, SSH, a control processor's
own command language) and reads.

How to reach a class of device, and what to read from it, is declared once in the component's
**template**, the reusable device shape. The node runs that and, crucially, **parses the answer
right there at the edge**, turning a vendor's raw response into a normalized reading on the spot.

That normalized reading is a **datapoint**.
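
As a rough illustration of edge parsing, the sketch below normalizes two vendor replies into the same reading. Both payload formats (a JSON body and a `PWR=1` control reply) and all function names are hypothetical, chosen only to show the shape of the step; they are not taken from any real template.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Reading is a normalized reading for one canonical signal.
type Reading struct {
	Signal string // canonical signal name, e.g. "power.state"
	Value  string // normalized value: "on" or "off"
}

// normalizePower maps vendor spellings onto one vocabulary.
func normalizePower(v string) (Reading, error) {
	switch strings.ToLower(strings.TrimSpace(v)) {
	case "on", "1":
		return Reading{Signal: "power.state", Value: "on"}, nil
	case "off", "0":
		return Reading{Signal: "power.state", Value: "off"}, nil
	}
	return Reading{}, fmt.Errorf("unrecognized power value %q", v)
}

// parseVendorA handles a hypothetical JSON payload: {"power":"ON"}.
func parseVendorA(raw []byte) (Reading, error) {
	var body struct {
		Power string `json:"power"`
	}
	if err := json.Unmarshal(raw, &body); err != nil {
		return Reading{}, fmt.Errorf("vendor A: %w", err)
	}
	return normalizePower(body.Power)
}

// parseVendorB handles a hypothetical raw control reply: "PWR=1\r".
func parseVendorB(raw []byte) (Reading, error) {
	s := strings.TrimSpace(string(raw))
	if !strings.HasPrefix(s, "PWR=") {
		return Reading{}, fmt.Errorf("vendor B: unexpected reply %q", s)
	}
	return normalizePower(strings.TrimPrefix(s, "PWR="))
}

func main() {
	a, _ := parseVendorA([]byte(`{"power":"ON"}`))
	b, _ := parseVendorB([]byte("PWR=1\r"))
	// Two different wire formats, one comparable reading.
	fmt.Println(a, b, a == b)
}
```

The point of the design is that everything downstream of the node only ever sees `Reading`, never the vendor payload.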

## The datapoint

A **[datapoint](/architecture/datapoints/)** is one value of one **canonical signal** (`power.state`,
`audio.level`), owned by exactly one entity through the **exclusive arc**: one owner, a component or
a system or a location, never more than one. It carries a **provenance** (how we know it: **observed**
from the device, **calculated** by Omniglass, or **intended** by a command we sent) and a **source**
(which sensor or path told us).

The meaning of each signal (its kind, unit, and validation) lives in a governed **registry**, and
a template *references* a registered signal rather than inventing one. That is the whole trick: two
displays from different manufacturers answer the same question the same way, because the
**measurement** is named, not the device. One canonical name, one comparable signal across the whole
fleet.
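
One way to picture a datapoint and its exclusive arc is the sketch below. The struct layout and field names are assumptions for illustration, not the physical schema, and only the three owner kinds named in this section are modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// Provenance records how a value is known.
type Provenance string

const (
	Observed   Provenance = "observed"   // read from the device
	Calculated Provenance = "calculated" // derived by Omniglass
	Intended   Provenance = "intended"   // implied by a command that was sent
)

// Owner is the exclusive arc: exactly one of the IDs may be set.
type Owner struct {
	ComponentID string
	SystemID    string
	LocationID  string
}

// Validate enforces "one owner, never more than one".
func (o Owner) Validate() error {
	n := 0
	for _, id := range []string{o.ComponentID, o.SystemID, o.LocationID} {
		if id != "" {
			n++
		}
	}
	if n != 1 {
		return errors.New("exclusive arc: exactly one owner must be set")
	}
	return nil
}

// Datapoint is one value of one canonical signal (hypothetical layout).
type Datapoint struct {
	Signal     string // registered canonical name, e.g. "power.state"
	Value      string
	Owner      Owner
	Provenance Provenance
	Source     string // which sensor or path reported it
}

func main() {
	dp := Datapoint{
		Signal:     "power.state",
		Value:      "on",
		Owner:      Owner{ComponentID: "display-1"},
		Provenance: Observed,
		Source:     "snmp",
	}
	fmt.Println(dp.Signal, dp.Provenance, dp.Owner.Validate() == nil)
}
```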

## What it should be

Not every value is measured. Some are **declared**, set by an operator rather than read from a
device: a setting that should hold (this input should be HDMI1), or a value that rides down the tree
(this system polls every 30 seconds). A declared value is **[config](/architecture/variables/)** when
it is bound to a signal, or a plain **variable** when it just rides down the tree, both resolved down
a **[cascade](/architecture/cascade/)**: set once high, overridden exactly where it matters. Config
has an observed side, so the gap between intent and reality is **drift**, a signal you can alarm on or
a fix you can push back.
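
The cascade and drift ideas can be sketched as follows. This is a deliberately simplified model that assumes the most specific level on the path wins; the real cascade may resolve differently (for example by weight), and the key names are hypothetical.

```go
package main

import "fmt"

// Level is one layer on the path, ordered from broadest to most specific.
type Level struct {
	Name   string
	Values map[string]string
}

// resolve returns the most specific declared value for key and the level
// that set it. Assumption: nearest override wins.
func resolve(path []Level, key string) (value, from string, ok bool) {
	for i := len(path) - 1; i >= 0; i-- {
		if v, found := path[i].Values[key]; found {
			return v, path[i].Name, true
		}
	}
	return "", "", false
}

// drifted reports whether the observed value differs from the declared one.
func drifted(declared, observed string) bool { return declared != observed }

func main() {
	path := []Level{
		{Name: "building", Values: map[string]string{"poll.interval": "30s", "display.input": "HDMI1"}},
		{Name: "system", Values: map[string]string{"poll.interval": "10s"}},
		{Name: "component", Values: map[string]string{}},
	}
	v, from, _ := resolve(path, "poll.interval")
	fmt.Println("poll.interval =", v, "from", from)

	want, _, _ := resolve(path, "display.input")
	fmt.Println("drift:", drifted(want, "HDMI2"))
}
```

Returning the level that supplied the value is what would let a console explain why a given value won.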

## Detect

An **[event_rule](/architecture/alarms-actions/)** watches a datapoint and fires when its condition
is met, recording an **event**: our assertion, in our own words, that something happened. Pair a fire
with a clear and the two events open and resolve an **alarm**, the stateful incident, one row per
occurrence, the thing an operator works and a ticket binds to. An alarm can carry a **health impact**,
which is what turns a detection into a verdict on the system.

## Model health

A single alarm is rarely the point. The headline is **[health](/architecture/health/)**: a
first-class state carried as a calculated datapoint and owned by the **system**. A component goes
unhealthy when a health-impacting **alarm** opens on it, and the system **rolls its members up,
role-aware**. A *required* component down takes the system down; a *redundant* one only degrades it;
an *informational* one does not touch it. That is the answer to "is the system working?", and a
target on it over time is a real uptime **SLA**.

The rollup ships **opinionated by default**, a first-class model rather than a byproduct of the rules
engine, with an escape hatch for the systems the defaults get wrong. The health model always runs;
the only question is whether it runs *as a system* against real signal, or in the operator's head
against half of it.
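
A minimal sketch of the role-aware rollup, under stated assumptions: the three roles and their effect when a member is down come from the text above, while the treatment of a merely degraded required member (it degrades the system) is a guess made for the example.

```go
package main

import "fmt"

type Health int

const (
	Healthy Health = iota
	Degraded
	Down
)

type Role string

const (
	Required      Role = "required"
	Redundant     Role = "redundant"
	Informational Role = "informational"
)

type Member struct {
	Name   string
	Role   Role
	Health Health
}

// rollup folds member health into one system verdict.
func rollup(members []Member) Health {
	result := Healthy
	for _, m := range members {
		if m.Health == Healthy {
			continue
		}
		switch m.Role {
		case Required:
			if m.Health == Down {
				return Down // a required member down takes the system down
			}
			result = Degraded // assumption: a degraded required member degrades
		case Redundant:
			result = Degraded // a redundant member only degrades
		case Informational:
			// never affects the system
		}
	}
	return result
}

func main() {
	boardroom := []Member{
		{"display", Required, Healthy},
		{"video bar", Required, Healthy},
		{"backup mic", Redundant, Down},
	}
	fmt.Println("degraded:", rollup(boardroom) == Degraded)
}
```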

## Act

An **action_rule** subscribes to events and alarms and runs an **[action](/architecture/alarms-actions/)**.
An action can be one step (notify the right person) or many (remediate, wait, re-check the real
datapoint, escalate if it did not take; or open and close a ticket as the alarm opens and clears).
The loop closes where it started, at the gear.
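
The remediate, wait, re-check, escalate sequence might look roughly like this. The `Flow` type and its hooks are hypothetical; the hooks are injected so the loop can be exercised without real gear, and counting a failed send as a used attempt is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// Flow holds the injected steps of a remediate-verify-escalate action.
type Flow struct {
	Remediate func() error        // send the command
	Check     func() bool         // re-read the real datapoint; true when healthy
	Escalate  func(reason string) // hand off to a human
	Wait      time.Duration       // settle time before re-checking
	Attempts  int
}

// Run returns true when the re-check confirms the fix, false after escalating.
func (f Flow) Run() bool {
	for i := 0; i < f.Attempts; i++ {
		if err := f.Remediate(); err != nil {
			continue // assumption: a failed send counts as a used attempt
		}
		time.Sleep(f.Wait)
		if f.Check() {
			return true
		}
	}
	f.Escalate(fmt.Sprintf("not recovered after %d attempts", f.Attempts))
	return false
}

func main() {
	sent := 0
	f := Flow{
		Remediate: func() error { sent++; return nil },
		Check:     func() bool { return sent >= 2 }, // recovers on the second try
		Escalate:  func(reason string) { fmt.Println("escalate:", reason) },
		Wait:      10 * time.Millisecond,
		Attempts:  3,
	}
	fmt.Println("fixed:", f.Run(), "commands sent:", sent)
}
```

The verification step reads the real signal rather than trusting the command's acknowledgement, which is the distinction the text draws.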

## See it

The operator never queries raw tables. Reads go through **views** (a named query returning a uniform
`{columns, rows}`), rendered in the **[console](/architecture/ui/)**: the fleet-health grid, the
alarm drill-down, the "why did this value win" cascade explainer. The whole journey is visible the
entire time.

## The journey, end to end

```d2
direction: right

# Shape colors are deliberately omitted: the inline SVG is themed from the site's
# brand tokens in custom.css so it follows Starlight's light/dark toggle. Only
# structure (rounding, dashes, the highlighted key node) lives here.
classes: {
  node: { style.border-radius: 8 }
  key: { style: { border-radius: 8; bold: true } }
}

gear: gear { class: node }
datapoint: "datapoint\ncanonical signal" { class: key }
event: event { class: node }
alarm: alarm { class: node }
health: "health\nrolls up the system" { class: node }
action: "action\nnotify · remediate · ticket" { class: node }
config: "config\ndeclared" { class: node }
views: "views → console" { class: node }

gear -> datapoint: collect (node + edge parse)
datapoint -> event: event_rule
event -> alarm: fire / clear
alarm -> health: health impact
alarm -> action: action_rule
action -> gear: command { style.stroke-dash: 4 }
config -- datapoint: drift { style.stroke-dash: 4 }
datapoint -> views
alarm -> views
health -> views
```

## Underneath

The journey rides on a few foundations, named once:

- the **[Storage Gateway](/architecture/storage/)** is the one door to the database; every read and
  write goes through it, which is where **scope** ([identity and access](/architecture/identity-access/))
  is enforced: a permission on every route, a visibility filter on every query.
- the **[workers](/architecture/workers/)** are one machinery draining a few worklists (the rule
  engine, the outbox, the clock, reconcile); no bespoke loops.
- the **[audit](/architecture/audit/)** trail and the operational logs are immutable, append-only
  ground truth: the record of who changed what and what the platform did.
- **[time](/architecture/time/)** is the one primitive that turns the passage of time into events, so
  the rest of the pipeline stays purely event-driven.
- **[scaling and deployment](/architecture/scaling/)**: the single binary is a modular monolith with run
  modes, deployed as one container for a small estate or scaled out on Kubernetes with a distributed
  edge. One binary is the packaging, not a scale ceiling.

Datapoints are parsed and emitted at the edge, so they are not re-derived from a raw store. Raw
payloads are a debugging aid (a raw mode you turn on while developing, plus failure logging on
collection); how much of that to persist, and for how long, is still being settled.

## The invariants

A handful of patterns hold everywhere, and they are why the model stays coherent:

- **Exclusive-arc ownership**: every datapoint, event, and alarm names exactly one owner (component,
  system, location, node, or global), so system- and location-level signals are first-class.
- **Immutable template versions**: an instance pins a frozen template version (or tracks `latest`);
  editing mints a new version; re-pointing is explicit.
- **On-row lineage**: a derived row carries its own evidence; there is no separate execution table.
- **Scope and the `official` boolean**: the key registries (`datapoint_type`, `event_type`) carry a
  `scope` (template / org / official) deciding where a name is unique; the other registries and rule
  rows carry an `official` boolean (the same axis minus the template layer). `official` is the curated
  ship-with set, the rest is operator-authored and local to a deployment.
- **Views by default**: current-state reads are plain views, materialized only when a profile proves
  it necessary.
- **Not event-sourced**: stateful entities (alarm, action) hold their state directly.
- **Per-database isolation**: there is no tenant column; a tenant is a database.

## Look up any term

Every official term is defined once in the **[glossary](/architecture/glossary/)**. The deep
pages in the sidebar follow this same journey: collection, the device shape, the data model,
config and credentials, the cascade, health, alarms and actions, then the foundations underneath.

Omniglass is built greenfield, one vertical slice per PR; the physical schema lives in
[storage](/architecture/storage/).

---

# AI

URL: /architecture/ai/

AI as a governed capability acting through the same API, permission, and scope seams as any caller, marked and audited, with human-in-the-loop gating.

AI in Omniglass is a **capability that spans from assistive to operational**, governed exactly like any other actor: at the assistive end it enriches and explains, at the operational end it proposes and acts. Today an AI tool authenticates via **OAuth as a `human` or `service` principal** and acts with exactly that principal's grants, so it reaches the estate through the same seams every caller uses, never a private lane ([identity and access](/architecture/identity-access/)).

## The capability spectrum

What AI does, from the assistive end toward the operational end:

- **Enrichment.** Event and alarm enrichment: context, a likely cause, a suggested next step on an occurrence the operator is already looking at. Read-only, surfaced inline.
- **Diagnosis and reporting.** Troubleshooting support, root-cause analysis across correlated signals, and report generation (health summaries, incident write-ups, period reviews).
- **Natural-language surfaces.** NL business query ("which rooms had the most ghost meetings last month"), NL configuration (authoring dashboards, rules, and alarms from a description), and NL template development (drafting a component template from a device's behavior).
- **Operational actions.** Acting on the platform on an operator's behalf: room and meeting rebooking, and general platform configuration, under that operator's grants.
- **Closed-loop automation.** Diagnose-and-fix flows that close the loop on a known failure class. **Human-in-the-loop is the default**: a mutating action is gated until the class has earned looser handling.

## AI acts through the same seams as any principal

AI is **not a side channel**. It reaches the estate through the same three seams every actor uses:

- the **API** (no private back door, no direct database path),
- **IAM permissions** (the `<resource>:<action>` capability checked on every route), and
- the **Storage Gateway scope** (the ABAC visible-set injected on every applicable query).

The richest AI seam is the **generated [MCP server](/architecture/api/)**: an MCP tool call is a call to a real API operation, so an external model drives Omniglass through the same routes, permissions, scope, and [audit](/architecture/audit/) as the SPA or the CLI, carrying the **acting user or service principal's** credential, never a parallel surface. It is a generated client like the others (a curated tool catalog, the [views](/architecture/views/) exposed as search tools, not a raw one-method-per-tool dump).

If a permission or a scope would stop a human from doing something, it stops the AI doing it too. There is no elevated AI lane.

## Provenance and audit

Every AI-produced output (an enrichment, a calculated value, a configuration change) is **marked as AI-sourced and audited**. The marking is what keeps the capability assistive, not authoritative: a reader can always tell what came from AI, weigh it accordingly, and trace it. The audit half is native: the write attributes to the **acting principal** (the human or service the AI authenticated as) in [`audit_log`](/architecture/audit/), and the AI-sourced marking rides alongside, so the trail names a responsible actor on every move. Nothing AI touches is anonymous or unattributable.

## Human-in-the-loop gating

Mutating AI actions can require **operator sign-off**: the AI surfaces a proposed change, an operator approves it, then it executes, and the approval lands in the audit trail. Read and diagnostic actions run within the acting principal's scope without a gate. This is a **policy on AI-sourced mutations**, not a separate authorization model: the AI never exceeds the grants of the principal it acts as, and the gate is an extra confirmation on top of that boundary.
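
To make the layering concrete, here is a small sketch in which the permission check comes first and the sign-off gate only adds a confirmation on top. The `Gate` and `Request` types, the grants map, and the example permission string are all hypothetical.

```go
package main

import "fmt"

type Decision string

const (
	Allow         Decision = "allow"
	Deny          Decision = "deny"
	NeedsApproval Decision = "needs_approval"
)

// Request describes one attempted operation.
type Request struct {
	Principal  string // the human or service the caller authenticated as
	Permission string // "<resource>:<action>"
	Mutating   bool
	AISourced  bool
	ApprovedBy string // operator who signed off; empty if none
}

// Gate combines ordinary grants with the AI sign-off policy.
type Gate struct {
	Grants         map[string]map[string]bool // principal -> permission -> granted
	RequireSignoff bool                       // policy on AI-sourced mutations
}

// Decide checks the principal's grants first; the gate never widens them.
func (g Gate) Decide(r Request) Decision {
	if !g.Grants[r.Principal][r.Permission] {
		return Deny
	}
	if g.RequireSignoff && r.AISourced && r.Mutating && r.ApprovedBy == "" {
		return NeedsApproval
	}
	return Allow
}

func main() {
	g := Gate{
		Grants:         map[string]map[string]bool{"alice": {"component:update": true}},
		RequireSignoff: true,
	}
	req := Request{Principal: "alice", Permission: "component:update", Mutating: true, AISourced: true}
	fmt.Println(g.Decide(req))
	req.ApprovedBy = "bob"
	fmt.Println(g.Decide(req))
}
```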

---

# Alarms and actions

URL: /architecture/alarms-actions/

How Omniglass detects a condition with a stateful alarm and responds with an action, which is a flow when it has more than one step.

Alarms and actions are how Omniglass turns a detected condition into a held incident and then into a response, so an operator gets paged, a ticket opens, or a device gets fixed without anyone watching a dashboard. An **alarm** detects a condition and holds it; an **action** does
something about it. A simple action is a single step (a `notify`); a multi-step action is a
**flow** (it branches, waits, and runs steps in parallel). Both alarm and action are **stateful
entities that hold their state directly** (not event-sourced). The credentials an action uses to
reach a sink are covered under credentials; the Expr and Go-template machinery is covered in
[expressions](/architecture/expressions/).

## The alarm (a stateful entity)

**Metaphor: a morning alarm, not a pop-up.** A pop-up *alert* appears and is gone
(that is a `notify` action). An **alarm** goes off when its condition is met and
**stays high until it is interacted with**. The `alarm` row **holds its current
state directly** (`status`, `severity`, `opened_at`, `resolved_at`, `acked_by`); it
is **not** event-sourced. It is **one incident, a new row per open**, keyed by
`(event_rule, owner)` (the exclusive-arc owner, so a system- or location-owned datapoint
yields a system/location-owned alarm), the ITSM correlation anchor ([datapoints](/architecture/datapoints/)).
The open and resolve **events** carry the `alarm_id` and are the edge log; the alarm
row is the live state.

The alarm is **PG-first**: the firing `event_rule` consumer writes the stateful alarm
row in the **same Postgres transaction** as the event, with the alarm transition
**serialized per `(event_rule, owner)`** so an incident never double-opens. A
leader-elected CDC publisher (logical decoding of the WAL) then publishes the committed
open / resolve transition to JetStream, where `action_rule` consumers react. Born in the
commit, fanned out by CDC: there is no dual-write, and Postgres is never a message bus.
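
The "never double-opens" property can be illustrated with an in-memory stand-in. This is not the Postgres implementation: a single mutex plays the role of the per-`(event_rule, owner)` serialization, and the `Store` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// Key is the alarm grain.
type Key struct{ EventRule, Owner string }

type Alarm struct {
	ID     int
	Key    Key
	Status string // "open" or "resolved"
}

// Store keeps every alarm row plus an index of the open one per key.
type Store struct {
	mu     sync.Mutex
	nextID int
	open   map[Key]*Alarm
	rows   []*Alarm
}

func NewStore() *Store { return &Store{open: make(map[Key]*Alarm)} }

// Open returns the existing open alarm for the key, or creates a new row.
func (s *Store) Open(k Key) (*Alarm, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if a, ok := s.open[k]; ok {
		return a, false // already open: same incident, no new row
	}
	s.nextID++
	a := &Alarm{ID: s.nextID, Key: k, Status: "open"}
	s.open[k] = a
	s.rows = append(s.rows, a)
	return a, true
}

// Resolve closes the open alarm for the key, if there is one.
func (s *Store) Resolve(k Key) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	a, ok := s.open[k]
	if !ok {
		return false
	}
	a.Status = "resolved"
	delete(s.open, k)
	return true
}

func main() {
	s := NewStore()
	k := Key{"dsp-overtemp", "component:dsp-1"}
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); s.Open(k) }()
	}
	wg.Wait()
	fmt.Println("rows after 20 concurrent fires:", len(s.rows))
}
```

A later fire after a resolve produces a new row, matching "one incident, a new row per open".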

Transitions, by who drives them:

- **opened / resolved** are **rule-driven** (the `event_rule`'s `fire_criteria` /
  `clear_criteria`), each emitting an `event` carrying the `alarm_id`;
- **acked / snoozed** are **operator-driven**, recorded in `audit_log` (also carrying
  the `alarm_id`) and applied to the alarm row.

The full timeline assembles by `alarm_id` across events + audit; the alarm row is
never reconstructed from them.

Alarms are **terminal upstream**: they never write datapoints, so they cannot feed
back into the datapoint layer (see *Cycle safety*).

## The `event_rule`

An `event_rule` carries a required `fire_criteria` and an optional `clear_criteria`.
With a `clear_criteria` the fire event **opens** an alarm and the clear
event **resolves** it; without one the rule is momentary (a one-shot event, no
alarm). There is no separate `alarm_rule`.

```yaml
event_rule:
  scope: 'component.template == "polaris-dsp-16"'   # the shared selector (expressions)
  datapoint: dsp.temperature
  window: { reduce: avg, over: 10m }                 # machinery (optional)
  fire_criteria: "value > 65"                        # opens the alarm
  clear_criteria: "value < 60"                       # resolves it (defaults to !fire_criteria)
  for: 0                                             # fire-side sustain (optional)
  for_clear: 0                                       # clear-side sustain (optional)
  severity: average
  health: degraded                                   # optional: degrade the owner's health while open
```

`scope` selects the entities (fan-out, one alarm per match); `datapoint` is the
input; `window` / `for` / `for_clear` are the aggregation machinery; `fire_criteria` / `clear_criteria`
are the Expr leaves; `severity` names a level by id (below). A rule is **suppressible by name through
the cascade** ([cascade](/architecture/cascade/)): a high-weight group can remove a
false-firing rule without editing it (the firmware-bug workaround).

`for` and `for_clear` are **symmetric sustains** on the two edges. `for` is the
fire-side sustain: the `fire_criteria` must hold for `for` before the alarm opens.
`for_clear` mirrors it on the recovery edge: the `clear_criteria` must hold for
`for_clear` before the alarm resolves, so a source flapping at the cadence boundary
does not churn the alarm open and clear. Both default to `0` (immediate), and a rule
can set them independently (a long `for_clear` over a short `for` holds an incident
through a noisy recovery without delaying the page).

An event_rule also carries an optional **`health` impact** (`down` / `degraded`,
default none): while the alarm it opens is open, it moves its owner's
[health](/architecture/health/) by that much. Most rules carry none (an advisory
alarm); the few that do are the owner's failure conditions. This is what makes health
**alarm-sourced** rather than a parallel computation. Because a rule is scoped to
whichever arc owns its datapoint, the same machinery yields **component-level** and
**system-level** alarms: a system-scoped rule reads member data and fires a
system-owned alarm for a condition only the system cares about (a display on input 2
is fine for the display but wrong for the room), which is how system health sees what
no single component can.
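
A sketch of alarm-sourced health for a single owner follows. It assumes the worst impact among the owner's open alarms wins, which the text implies but does not spell out; the types are hypothetical.

```go
package main

import "fmt"

type Impact int

const (
	None Impact = iota // advisory alarm, no health effect
	Degraded
	Down
)

type Alarm struct {
	Owner  string
	Open   bool
	Impact Impact
}

// ownerHealth derives an owner's health purely from its open alarms.
// Assumption: the worst open impact wins.
func ownerHealth(owner string, alarms []Alarm) Impact {
	worst := None
	for _, a := range alarms {
		if a.Owner == owner && a.Open && a.Impact > worst {
			worst = a.Impact
		}
	}
	return worst
}

func main() {
	alarms := []Alarm{
		{Owner: "system:boardroom-a", Open: true, Impact: Degraded}, // display on the wrong input
		{Owner: "system:boardroom-a", Open: true, Impact: None},     // advisory only
		{Owner: "component:dsp-1", Open: false, Impact: Down},       // already resolved
	}
	fmt.Println(ownerHealth("system:boardroom-a", alarms) == Degraded)
}
```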

## Severity: a registry of named levels

Severity is a **registry of named levels**, not a bare integer. Each level has an
**`id`** (`info`, `warning`, `average`, `high`, `disaster`), a **`label`**, a **`color`**,
and an integer **`order`** used only for comparison. Operators and rules reference a level
**by id**; the order ranks them. Official defaults ship, **spaced** so a new level inserts
by picking an order between two others, no renumbering:

| id | label | color | order |
|---|---|---|---|
| info | info | gray | 10 |
| warning | warning | yellow | 20 |
| average | average | orange | 30 |
| high | high | red | 40 |
| disaster | disaster | dark red | 50 |

Severity is **distinct from health**: severity is alert importance, health is entity
operational state ([health](/architecture/health/)), different axes. Higher order is
more severe. The level set is **operator-customizable**: an operator can relabel to
P1/P2/P3, recolor, add a level between two others, or define three levels or seven, all
config, no code. Rules and `action_rule` predicates compare **by level**, resolved through
the order (`alarm.severity >= "high"` matches `high` and `disaster`); the UI renders the
label and color from the level.
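
Comparison by level, resolved through the order, could be sketched like this. The default levels are copied from the table above; the `Registry` type, the `AtLeast` helper, and the inserted `critical` level are illustrative assumptions.

```go
package main

import "fmt"

// Level is one named severity; Order is used only for comparison.
type Level struct {
	ID, Label, Color string
	Order            int
}

// Registry maps a level id to its definition.
type Registry map[string]Level

// Defaults returns the spaced default levels from the table.
func Defaults() Registry {
	r := Registry{}
	for _, l := range []Level{
		{"info", "info", "gray", 10},
		{"warning", "warning", "yellow", 20},
		{"average", "average", "orange", 30},
		{"high", "high", "red", 40},
		{"disaster", "disaster", "dark red", 50},
	} {
		r[l.ID] = l
	}
	return r
}

// AtLeast reports whether level `have` ranks at or above level `want`.
func (r Registry) AtLeast(have, want string) (bool, error) {
	h, ok := r[have]
	if !ok {
		return false, fmt.Errorf("unknown severity %q", have)
	}
	w, ok := r[want]
	if !ok {
		return false, fmt.Errorf("unknown severity %q", want)
	}
	return h.Order >= w.Order, nil
}

func main() {
	reg := Defaults()
	// Insert a level between two others without renumbering anything.
	reg["critical"] = Level{"critical", "critical", "crimson", 45}
	for _, id := range []string{"average", "high", "critical", "disaster"} {
		ok, _ := reg.AtLeast(id, "high")
		fmt.Println(id, ">= high:", ok)
	}
}
```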

:::caution[Open question]
Whether a severity level is purely a label, color, and order, or also carries policy such as a
default ack-timeout per level.
:::

## The action (a stateful entity)

What an `action_rule` raises and runs. Like an alarm it is **stateful** and holds
its own state directly (status, current step, delivery), not event-sourced:

- **kinds**: `notify` (in-app), `webhook`, `email`, `run` (execute a command; the
  edge realization is in [templates](/architecture/templates/) / [nodes](/architecture/nodes/)).
- a **simple** action carries delivery state (`queued / sent / failed / retried`,
  the at-least-once JetStream consumer);
- a **multi-step** action is a **flow**: it carries workflow state (current step, waiting,
  branches), exactly like an alarm's lifecycle.

The `action` row carries identity, kind, config, and current status.

## The `action_rule` (decoupled subscription)

Detection and response are kept **separate**, the discipline that avoids Zabbix's
action/operation tangle: the `event_rule` does not contain its response. Instead an
`action_rule` is a **NATS consumer** on the CDC-published event / alarm stream, selecting
with an Expr predicate, so one action rule can serve many alarms. Subscriptions are
**indexed by event key and label**, so dispatch evaluates only the predicates whose key
or label already matches, not every rule on every event; an action rule may carry
**multiple triggers** and fires if any matches (including a label or wildcard trigger,
e.g. any event labeled `room=boardroom-a`). It is a subscription, not a fourth
datapoint-pipeline rule family (the derivation rules, calc and event, produce data; the
`action_rule` wires the resulting events and alarms to actions):

```yaml
action_rule:
  on: alarm
  when: 'transition == "open" && alarm.severity >= "high" && component.type == "device"'
  action: pagerduty-notify
```

The **source is polymorphic** but guarded (see *Cycle safety*): an alarm transition
(open / resolve, each an `event` carrying the `alarm_id`), a scheduled fire (an
`event` with `origin=scheduled`, not a separate source), an operator (manual), and
the declarative runbook step-list. Bodies are **Go templates** and sink auth is a
**credential reference**, both per [expressions](/architecture/expressions/) and
[config and credentials](/architecture/variables/).

## Storm and dependency suppression

The alarm grain stays **`(event_rule, owner)`**: one upstream fault still fans out to
one alarm per affected owner. Two primitives keep that fan-out from becoming a page
storm without collapsing the grain.

**Dependency suppression** mutes a child alarm whose owner's **parent entity on the
[exclusive-arc](/architecture/datapoints/) structural tree** is itself down. When the
parent is in a `down` health state, the child alarms beneath it are held suppressed
(open, but not dispatched), so one upstream failure does not emit N child pages. It is
expressed over the exclusive-arc tree: the same arc that owns a datapoint and its
alarm gives the parent walk for free, no separate dependency graph.

**Action-level grouping** coalesces alarms sharing **owner / label / `correlation_id`**
into one **action dispatch**: one ticket with N members, not N tickets. The alarms stay
distinct rows at the `(event_rule, owner)` grain; grouping happens at the dispatch edge
in the `action_rule`, so a storm becomes one notification carrying the member list.

A **system-scoped `event_rule`** is the sanctioned upstream-cause dedup lever. Because a
system-scoped rule reads member data and fires a **system-owned** alarm for the
room-level cause (above), it names the actual fault once at the level that owns it.
Worked example: a switch reboot downs 20 endpoints. A system-scoped rule owns the
room-level cause as a single system-owned alarm; dependency suppression mutes the 20
child endpoint alarms whose parent (the room or switch) is down; action-level grouping
coalesces whatever child alarms remain into one dispatch. The operator sees the cause,
not 20 symptoms.
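
The worked example can be sketched in a few lines. This is an illustration under stated assumptions, not the engine: the `Entity` and `Alarm` types, the `suppressed` and `dispatches` helpers, and the entity names are hypothetical, the switch is modeled as the endpoints' parent, and the walk checks every ancestor rather than only the direct parent.

```go
package main

import "fmt"

// Entity is one node on the structural tree. Parent is "" at the root.
type Entity struct {
	Parent string
	Down   bool
}

// Alarm stays one row per (event_rule, owner); nothing here merges rows.
type Alarm struct {
	EventRule string
	Owner     string
}

// suppressed walks the owner's ancestors and reports whether any is down.
// The seen set only guards against a malformed tree with a cycle.
func suppressed(tree map[string]Entity, owner string) bool {
	seen := map[string]bool{owner: true}
	for p := tree[owner].Parent; p != "" && !seen[p]; p = tree[p].Parent {
		if tree[p].Down {
			return true
		}
		seen[p] = true
	}
	return false
}

// dispatches drops suppressed alarms, then groups what remains by key, so
// each map entry is one action dispatch carrying its member list.
func dispatches(tree map[string]Entity, alarms []Alarm, key func(Alarm) string) map[string][]Alarm {
	groups := map[string][]Alarm{}
	for _, a := range alarms {
		if suppressed(tree, a.Owner) {
			continue
		}
		k := key(a)
		groups[k] = append(groups[k], a)
	}
	return groups
}

// switchReboot builds the scenario: one room, one down switch, 20 endpoints.
func switchReboot() (map[string]Entity, []Alarm) {
	tree := map[string]Entity{
		"room-a":   {},
		"switch-1": {Parent: "room-a", Down: true},
	}
	alarms := []Alarm{{EventRule: "room-network-down", Owner: "room-a"}}
	for i := 1; i <= 20; i++ {
		id := fmt.Sprintf("endpoint-%02d", i)
		tree[id] = Entity{Parent: "switch-1"}
		alarms = append(alarms, Alarm{EventRule: "endpoint-unreachable", Owner: id})
	}
	return tree, alarms
}

func main() {
	tree, alarms := switchReboot()
	groups := dispatches(tree, alarms, func(a Alarm) string { return a.Owner })
	fmt.Println(len(alarms), "alarm rows,", len(groups), "dispatch")
}
```

The 21 alarm rows all still exist at their own grain; only the system-owned alarm on the room reaches the dispatch edge.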

## Durability and egress

Action state is **PG-first + CDC-out**: the action's step transition is written to the
`action` row in a Postgres transaction, and the leader-elected CDC publisher fans the
committed change onto JetStream. The **external send** is then a **JetStream consumer**
(retry with backoff, dead-letter), not a Postgres `SELECT ... FOR UPDATE SKIP LOCKED`
relay. External sends are **at-least-once**: either the sink tolerates duplicates, or we add an
**idempotency key** (alarm + action + transition) so the sink can deduplicate and the outcome is effectively exactly-once.
Pipeline order: **render the body, then apply auth over the rendered bytes** (HMAC signs
the rendered body), then send. **Egress safety** is always on: block internal / metadata
IPs, verify TLS, bound timeouts, control redirects.
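
The pipeline order can be sketched with the standard library. This is a hedged sketch, not the shipped code: `Send`, `prepare`, and `verify` are hypothetical names, HMAC-SHA256 with a hex signature is an assumed choice (the text only says HMAC), and the colon-joined key format is an assumption about how alarm + action + transition might be combined.

```go
package main

import (
	"bytes"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"text/template"
)

// Send is the rendered, signed payload handed to the sender.
type Send struct {
	Body           []byte
	Signature      string // hex HMAC-SHA256 over Body (assumed algorithm)
	IdempotencyKey string // alarm + action + transition
}

// prepare renders the body first, then signs the rendered bytes, so the
// signature always covers exactly what goes on the wire.
func prepare(tmpl string, data any, secret []byte, alarmID, actionID, transition string) (Send, error) {
	t, err := template.New("body").Parse(tmpl)
	if err != nil {
		return Send{}, err
	}
	var buf bytes.Buffer
	if err := t.Execute(&buf, data); err != nil {
		return Send{}, err
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(buf.Bytes())
	return Send{
		Body:           buf.Bytes(),
		Signature:      hex.EncodeToString(mac.Sum(nil)),
		IdempotencyKey: alarmID + ":" + actionID + ":" + transition,
	}, nil
}

// verify is what a receiving sink could do with the shared secret.
func verify(s Send, secret []byte) bool {
	mac := hmac.New(sha256.New, secret)
	mac.Write(s.Body)
	want, err := hex.DecodeString(s.Signature)
	return err == nil && hmac.Equal(mac.Sum(nil), want)
}

func main() {
	// text/template does no JSON escaping; the values here are safe literals.
	s, err := prepare(`{"alarm":"{{.Alarm}}","severity":"{{.Severity}}"}`,
		map[string]string{"Alarm": "a-1", "Severity": "high"},
		[]byte("demo-secret"), "a-1", "pagerduty-notify", "open")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(s.Body), s.IdempotencyKey, verify(s, []byte("demo-secret")))
}
```

Signing after rendering matters: a signature computed over the template, or over the data before rendering, would not match the bytes the sink receives.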

:::caution[Open question]
The dead-letter surface and the operator retry of failed actions.
:::

:::caution[Open question]
The observed-use auth-failure feedback from actions into credential health (paired with the
credential-health model in [config and credentials](/architecture/variables/)).
:::

## Cycle safety in the action layer

The `collection -> datapoint -> alarm` core is acyclic by construction (see *Cycle
safety*), and **only data authors events**: an `event_rule` over datapoints (plus the
clock's `origin=scheduled`) is the *only* way an event enters the log. Flows and
actions never manufacture events, so the response layer cannot inject into the event
graph at all. That leaves a single possible loop, the **data-mediated control loop**
(an action commands a device, the device's new state arrives as a datapoint, which
opens an alarm, which fires the action again), which three rules cut:

1. **Alarms are terminal upstream** (they never write datapoints), so detection cannot
   feed itself directly.
2. **ack / snooze transitions do not match `action_rule`s** (only open / resolve do),
   which breaks the `action -> alarm(ack) -> action` loop.
3. **The control loop is lineage-guarded at dispatch.** Before an `action_rule` runs
   an action, the engine walks the triggering event's **causation lineage** (carried on
   **NATS message headers** across the bus); if the same `(action, owner)` already
   appears upstream it is suppressed, with a depth bound as a backstop. Flows are finite
   by construction (a step list, per-open-alarm, gated on open, cancelled on resolve /
   ack).

The walk crosses the command-to-device round trip because the command **stamps its
originating `correlation_id` onto the intended write and onto the adaptive-poll's
observed datapoint** ([datapoints](/architecture/datapoints/)). The `event_rule` that
fires off that observed datapoint inherits the id, so the lineage walk follows a real
carried id across the device edge rather than an assumed lineage; the depth bound stays
as the backstop.

The carrier crosses the lane boundary; it is not one continuous header hop. The `event_rule`
writes the triggering datapoint's `correlation_id` **and a `caused_by_event_id` parent
edge** onto the `event` row it creates (the record lane, [events](/architecture/events/)),
and the CDC publisher re-emits both into the record-lane message header. So "carried on
NATS headers" is really header (data lane) -> PG column -> header (record lane): the walk
is unbroken because each hop copies the pair forward.

So no edge can close a loop: events come only from data, alarms are terminal toward
datapoints, the response layer cannot author events, operator transitions never
re-trigger, and the one real-world control loop is lineage-bounded.
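
The lineage guard at dispatch (rule 3) can be sketched as an upstream walk with a depth backstop. The `Hop` struct, the in-memory `lineage` map keyed by event id, the `allowed` function, and the bound of 16 are all assumptions for illustration; the real walk reads the carried `correlation_id` and `caused_by_event_id` pair.

```go
package main

import "fmt"

// Hop is one event in the causation lineage. The struct and its field names
// are illustrative assumptions.
type Hop struct {
	CausedBy string // parent event id, "" at the head of the chain
	Action   string // action whose effect produced this event, "" if none
	Owner    string // owner that action ran against
}

// maxDepth is the backstop bound; 16 is an arbitrary illustrative value.
const maxDepth = 16

// allowed walks upstream from the triggering event and reports whether
// dispatching (action, owner) is safe.
func allowed(lineage map[string]Hop, trigger, action, owner string) bool {
	id := trigger
	for depth := 0; id != ""; depth++ {
		if depth >= maxDepth {
			return false // backstop: lineage too deep to trust
		}
		hop, ok := lineage[id]
		if !ok {
			return true // lineage ends at an event we hold no record of
		}
		if hop.Action == action && hop.Owner == owner {
			return false // this action already ran upstream: suppress
		}
		id = hop.CausedBy
	}
	return true
}

func main() {
	lineage := map[string]Hop{
		"e1": {}, // the original datapoint-driven event
		"e2": {CausedBy: "e1", Action: "reboot", Owner: "codec-1"},
	}
	fmt.Println(allowed(lineage, "e1", "reboot", "codec-1")) // first dispatch
	fmt.Println(allowed(lineage, "e2", "reboot", "codec-1")) // would loop
	fmt.Println(allowed(lineage, "e2", "reboot", "codec-2")) // different owner
}
```

The match is on the `(action, owner)` pair, so the same action against a different owner is not suppressed, and a malformed self-referencing lineage still terminates at the depth bound.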

### Correlation id: the trace of a causal chain

The same causation lineage the cycle guard walks also powers a read-side **trace**: a
**correlation id** that threads a whole causal chain end to end. A datapoint fires an
**event**, which opens an **alarm**, which triggers a **flow / action**, which runs a
**function / command**, which may change a value and clear the alarm. The `alarm_id` links
one alarm's own open / clear events; the **correlation id** links the *entire* chain, the
originating event through every downstream event and action it caused. It is built on the
existing causation lineage (an id stamped at the head and propagated along each caused
edge, riding **NATS message headers** across the bus), pure DX and observability sugar: it
lets an operator see the chain at a glance and query "everything this one event set in
motion."

The caused edge that crosses the device is carried explicitly: when an action runs a
command, the command **stamps its `correlation_id` onto the intended write and onto the
adaptive-poll's observed datapoint** ([datapoints](/architecture/datapoints/)), so the
chain stays threaded through the device round trip and the `event_rule` that fires off
the returned datapoint inherits the id.

It is **not** a datapoint kind and **not** a stored span subsystem, just an id carried on
the chain and queryable. No new tables, no tracing backend.

## Flows: the multi-step action

A **flow** is a bounded multi-step action: a **DAG of steps** (`notify`, `command`, `wait`, `branch`,
`parallel`) over one alarm. It is **instantiated per open alarm**, gated on the alarm staying open,
**cancelled on resolve or ack**, **cycle-guarded** by the same causation-lineage walk documented
above, and **finite** by a depth / step cap. It depends on the durable per-incident timer
([time](/architecture/time/)) for its `wait` steps.

The canonical case is an **escalation**: remediate, `wait`, re-check the **real datapoint** the alarm
is built on, and escalate if it is unchanged ("run the fix, wait 10m, if the datapoint still trips
page a human, wait 1h, page the next tier"). Two more cases fall out of the same engine: a **time-bound
access grant** (grant, `wait`, revoke) and an **AI-troubleshooting flow** that fetches more data, has
the model analyze it, and routes on the verdict.

This is the platform's programmable layer, and it is deliberately a **bounded** workflow engine, not a
Turing-complete one: a finite step list, lineage-guarded (above), with a depth / step cap as defense.
A flow does **not** author events; it acts, and any effect it has on the world returns as ordinary
data (which is the only edge that could re-open its alarm, and that edge is lineage-bounded). A
drag-and-drop editor edits flows.

**The single-step and multi-step shapes are one model.** An action is one or many steps: a
single `notify` or `command` is the simple case, and a multi-step **flow** is the same step list
grown past one over the same engine. Nothing in the design distinguishes them but the length of the
list.

## Namespacing

Event rules, actions, and severity levels carry the same **`official` boolean**
and `UpsertOfficial` as the rest of the registries. `official: true` rows ship
vetted; `official: false` rows are operator-authored and central to component templates (the
concrete way to notify or to run a command against a given device class).

## Storage

`alarm` and `action` are **stateful entities that hold their current state directly** (not event-sourced); the physical layout (the owner arc, partitioning) lives on [storage](/architecture/storage/).

| Table | Key columns | Notes |
|---|---|---|
| `alarm` | **id**, event_rule, owner arc, **status, severity, opened_at, resolved_at, acked_by** | a stateful entity, **one incident, new row per open**; holds current state directly (not event-sourced); the ITSM anchor. History = events + audit by `id` |
| `action` | id, **steps (ordered: notify/command/wait/branch)**, status, current_step | a stateful entity; delivery and step state; driven by events/alarms |

A command is **not a table**: it is a `component_template_version.spec` declaration (the interface `commands` block); a command instance is an `action` row with `kind=command`. The `event_rule` / `action_rule` config rows live with the [rule families](/architecture/calculations/).

## Model-keyed command cascade

An abstract action (`reboot`) resolves to different concrete commands by component type / model
through the cascade. The edge dispatch and the `command` declaration live in
[templates](/architecture/templates/) / [nodes](/architecture/nodes/); the cascade is the
abstract-to-concrete resolution layer above them, so one `reboot` action targets a heterogeneous
fleet and each device runs the command its model declares.

---

# API

URL: /architecture/api/

The API contract: AIP-style resources and :verb methods, cursor lists, a problem+json error envelope, idempotent writes, and long-running operations carried by the action row.

The contract is **two typed surfaces, one source of truth**. The **public HTTP / OpenAPI contract** (this
page) is the north face: every operator action, every integration, the SPA, the CLI, and the
[MCP](#also-an-mcp-surface) server go through it, and it is the only caller of the
[Storage Gateway](/architecture/storage/). The **internal and edge transport is a sibling NATS subject
contract** (subjects, message schemas, request-reply, JetStream stream and consumer definitions), the
service-to-service and node wire; it is typed and versioned the same way and lives in
[messaging](/architecture/messaging/). This page is the **contract every HTTP route honors**. The doctrine
behind it (the API is the source of truth, the clients are generated from it) and the generation pipeline
live in [API first](/contributing/api-first/); this page is the conventions that doctrine points at.

## Shape: resources and `:verb` methods

Everything lives under `/api/v1`. The path shape is derivable, not special-cased:

- **Plural resource collections**, standard methods by primary key (AIP-style): `POST` creates (409 on
  PK collision), `GET` reads, `PATCH` partial-updates (AIP-134), `DELETE` removes. No upsert shortcuts.
- **Custom methods carry a colon**, `:verb` not `/verb`, for anything that is not CRUD:
  `/alarms/{id}:ack`, `/components/{name}:apply`, `/views/{id}:run`. The verb
  is also the **permission**: `:ack` is gated by `alarm:ack`, so the route and the
  [authorization](/architecture/identity-access/) check share one vocabulary.
- **Singular kind sub-segments** for the typed families: `/rules/calc`, `/datapoints/metric`,
  `/types/component`.

## Lists: filter, order, page

A list takes `filter`, `order_by`, `page_size` (capped by a server maximum), `page_token`, and `fields`:

- **Cursor pagination, never offset.** A list returns a `next_page_token`; the client echoes it on the
  next call. The token is opaque and stable under concurrent inserts, where an offset would skip or
  repeat rows.
- **`filter` is one [Omniglass expression](/architecture/expressions/)** over the resource's fields, the
  same language as rule scopes and dynamic groups, so an operator learns it once.
- **`filter`, `order_by`, and `fields` name fields, not raw SQL.** Every field resolves through the
  gateway's generated-column allow-list (an unknown field is a 400), and values are bound parameters, so
  none of the three can inject SQL ([storage](/architecture/storage/)).
- **Every list runs through the scoped gateway**, so results are already scope-filtered: a list never
  returns a row outside the caller's visible set, and the page count is over visible rows only.
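
Why a cursor stays stable where an offset does not can be shown with a keyset cursor over a sorted id. This is a sketch of the general technique, not the server's token format: the `Page` type, the `list` function, and the base64-of-last-id token are assumptions, and a real token would be opaque to the client in the same way.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"sort"
)

// Page is one list response.
type Page struct {
	Items         []string
	NextPageToken string
}

// encodeToken hides the last returned id behind an opaque string.
func encodeToken(lastID string) string {
	return base64.RawURLEncoding.EncodeToString([]byte(lastID))
}

// decodeToken recovers the cursor; an empty token means the first page.
func decodeToken(tok string) (string, error) {
	if tok == "" {
		return "", nil
	}
	b, err := base64.RawURLEncoding.DecodeString(tok)
	if err != nil {
		return "", fmt.Errorf("invalid page_token")
	}
	return string(b), nil
}

// list returns up to pageSize ids strictly after the cursor, in id order.
func list(ids []string, pageSize int, pageToken string) (Page, error) {
	after, err := decodeToken(pageToken)
	if err != nil {
		return Page{}, err
	}
	if pageSize < 1 {
		pageSize = 1
	}
	sorted := append([]string(nil), ids...)
	sort.Strings(sorted)
	// First index whose id sorts after the cursor.
	start := sort.Search(len(sorted), func(i int) bool { return sorted[i] > after })
	end := start + pageSize
	if end >= len(sorted) {
		return Page{Items: sorted[start:]}, nil // last page: no token
	}
	return Page{Items: sorted[start:end], NextPageToken: encodeToken(sorted[end-1])}, nil
}

func main() {
	ids := []string{"a", "c", "e"}
	p1, _ := list(ids, 2, "")
	ids = append(ids, "b") // a concurrent insert lands inside page one's range
	p2, _ := list(ids, 2, p1.NextPageToken)
	fmt.Println(p1.Items, p2.Items)
}
```

With an offset of 2, the second call would have returned `c` again after `b` was inserted; the cursor resumes strictly after the last id the client saw.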

## Partial responses: field masks

The `fields` parameter selects a subset of the response (a read field mask, AIP-157); the default is the
full resource. `PATCH` carries a **write mask implicitly**: only the fields present in the body change,
so a partial update never clobbers an omitted field.
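
The implicit write mask can be sketched by decoding the body into a map of raw fields first, so presence is explicit. The `Component` resource and its field names are hypothetical, and `applyPatch` is an illustration of the rule rather than the handler the API generates.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Component is a hypothetical resource used only for this illustration.
type Component struct {
	Name        string `json:"name"`
	Description string `json:"description"`
	PollSeconds int    `json:"poll_seconds"`
}

// applyPatch changes only the fields present in body. A field that is
// omitted keeps its current value; a field sent as "" or 0 is overwritten.
func applyPatch(cur Component, body []byte) (Component, error) {
	var present map[string]json.RawMessage
	if err := json.Unmarshal(body, &present); err != nil {
		return cur, err
	}
	next := cur
	for field, raw := range present {
		var err error
		switch field {
		case "description":
			err = json.Unmarshal(raw, &next.Description)
		case "poll_seconds":
			err = json.Unmarshal(raw, &next.PollSeconds)
		default:
			err = fmt.Errorf("field %q is unknown or not patchable", field)
		}
		if err != nil {
			return cur, err // leave the resource untouched on any error
		}
	}
	return next, nil
}

func main() {
	cur := Component{Name: "codec-1", Description: "Boardroom codec", PollSeconds: 60}
	next, err := applyPatch(cur, []byte(`{"poll_seconds": 30}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", next)
}
```

Decoding into raw fields is what separates "omitted" from "explicitly set to the zero value", which a plain struct decode cannot tell apart.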

:::caution[Open question]
Field-mask depth: top-level fields only, or nested paths (`a.b.c`), and whether a list's `fields` and a
get's `fields` share one grammar.
:::

## Errors: one problem+json envelope

Every error is **RFC 9457 `application/problem+json`**: `type`, `title`, `status`, `detail`, `instance`,
plus an Omniglass `code` (a stable machine string) and, for validation, a `violations` array of
`{field, message}`. One shape, so the generated client and the CLI render every failure uniformly. The
status mapping:

| Status | Meaning |
|---|---|
| 400 | malformed request (bad JSON, an undeclared param) |
| 401 | unauthenticated |
| 403 | **action denied on this target**: the principal lacks the capability entirely, or can read the target but not perform this action on it (below) |
| 404 | not found, **including out-of-read-scope** (below) |
| 409 | conflict: PK collision, a stale conditional write, or an idempotency replay mismatch |
| 422 | semantic validation (the `:apply` unmet-required-inputs case) |
| 429 | throttled |

**The 403/404 split is three-way, by where the target sits in the caller's
[per-action scope](/architecture/identity-access/).** (a) The action is in **no** grant the principal
holds: **403**, capability missing entirely. (b) The target is in the caller's **read-scope** but outside
`visible_set(P, action)` for the requested action (the principal can `GET` it but cannot `:ack` it):
**403**, which leaks nothing because the caller can already read the row. (c) The target is **outside the
caller's read-scope** entirely: **404**, so the API never discloses that an entity exists outside the
caller's visible set. Out-of-read-scope is the only 404 case; a readable-but-not-actionable target is a
403, never a 404.
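
The three-way split reduces to a small decision function. The `Access` struct and its three booleans are hypothetical inputs standing in for the grant lookup and the two `visible_set` checks; the sketch only fixes the order in which they are consulted.

```go
package main

import (
	"fmt"
	"net/http"
)

// Access summarizes where a target sits relative to the caller. The three
// fields are hypothetical stand-ins for the real grant and scope checks.
type Access struct {
	HasCapability bool // the action appears in at least one grant
	CanRead       bool // the target is in the caller's read-scope
	CanAct        bool // the target is in visible_set(P, action)
}

// status maps the three cases from the text onto an HTTP status.
func status(a Access) int {
	switch {
	case !a.HasCapability:
		return http.StatusForbidden // (a) capability missing entirely
	case !a.CanRead:
		return http.StatusNotFound // (c) outside read-scope: do not disclose
	case !a.CanAct:
		return http.StatusForbidden // (b) readable but not actionable
	default:
		return http.StatusOK
	}
}

func main() {
	fmt.Println(status(Access{}))                                        // (a)
	fmt.Println(status(Access{HasCapability: true}))                     // (c)
	fmt.Println(status(Access{HasCapability: true, CanRead: true}))      // (b)
	fmt.Println(status(Access{HasCapability: true, CanRead: true, CanAct: true}))
}
```

The capability check comes first because it depends only on the principal, so its 403 reveals nothing about any target.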

## Idempotency and concurrency

- **`Idempotency-Key`** is accepted on `POST` and on state-changing custom methods. The server records
  the key with its **effect** (the created or changed resource) for a retention window; a retry with the
  same key returns the original outcome, not a duplicate, so a flaky network never produces two components
  or a double `:ack`. **Only successful (2xx) outcomes are memoized.** An authorization result
  (401 / 403 / 404) is **never** stored against the key; it is re-evaluated against current grants on every
  call, so a denial issued before an access change is not re-served, and a success is never replayed
  after a grant is revoked: a replay **re-enters the authorization and gateway path** before the memoized
  effect is returned. Re-evaluation guards the replay, not the original effect, which already committed.
- **Optimistic concurrency**: a conditional update carries the resource version (an `ETag` / `If-Match`);
  a write against a stale version is a 409, never a silent last-writer-wins.
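
Both rules can be sketched against an in-memory stand-in. Everything here is hypothetical (the `Server` type, the per-principal memo key, the integer version in place of an `ETag`); the point is the ordering: authorization runs before the memo lookup, and only a 2xx outcome is ever stored.

```go
package main

import "fmt"

// Result is the outcome returned to the caller.
type Result struct {
	Status int
	Body   string
}

// Server is an in-memory stand-in for the write path.
type Server struct {
	grants  map[string]bool   // principal -> may create, re-read on every call
	memo    map[string]Result // principal + key -> successful outcome only
	created int
	version int // current resource version, standing in for the ETag
}

func NewServer() *Server {
	return &Server{grants: map[string]bool{}, memo: map[string]Result{}, version: 1}
}

// Create authorizes first, then consults the memo, then performs the effect.
func (s *Server) Create(principal, key string) Result {
	if !s.grants[principal] {
		return Result{Status: 403, Body: "denied"} // never memoized
	}
	memoKey := principal + "|" + key
	if r, ok := s.memo[memoKey]; ok && key != "" {
		return r // replay: the original outcome, no second effect
	}
	s.created++
	r := Result{Status: 201, Body: fmt.Sprintf("component-%d", s.created)}
	if key != "" {
		s.memo[memoKey] = r
	}
	return r
}

// Update is a conditional write: a stale version is a 409.
func (s *Server) Update(ifMatch int) Result {
	if ifMatch != s.version {
		return Result{Status: 409, Body: "stale version"}
	}
	s.version++
	return Result{Status: 200, Body: fmt.Sprintf("version-%d", s.version)}
}

func main() {
	s := NewServer()
	s.grants["alice"] = true
	fmt.Println(s.Create("alice", "k1"), s.Create("alice", "k1"), s.created)
	s.grants["alice"] = false
	fmt.Println(s.Create("alice", "k1")) // replay after revoke: denied
	fmt.Println(s.Update(1), s.Update(1))
}
```

Checking grants before the memo is what keeps a revoked principal from replaying an old success, and skipping the memo on denial is what lets a later grant take effect on the same key.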

:::caution[Open question]
The idempotency-key retention window, and whether it is uniform or per-method.
:::

## Long-running operations: the action is the handle

Some operations are not instantaneous: a `command` against a device, a reconcile `:enforce`, a
credential rotation, a multi-step flow. These do **not** block the request and do **not** introduce a
parallel `operations` resource. The custom method **returns an [`action`](/architecture/alarms-actions/)
row** (its id and status), the same stateful entity the response layer already uses, and the caller polls
`GET /actions/{id}` through `queued -> sent -> done` / `failed`. The action **is** the operation handle,
so "fire and follow" is one model whether the trigger was a rule or an API call. A fast operation may
inline its result when it finishes within the request, but the handle is always returned, so a slow
device never holds the connection open. The action row is ABAC-owned by its target's exclusive-arc owner,
so polling `GET /actions/{id}` is read-scoped to whoever can see the target, independent of the per-action
scope that launched it.

The HTTP method is the front door; the **dispatch is over NATS**. The command stays HTTP-exposed (returns
the handle, poll `GET /actions/{id}`), but the work is carried on the internal NATS contract: the action
fans out through [messaging](/architecture/messaging/) to the responsible consumer or node, and the result
flows back the same way to advance the row. The caller sees one model, the transport is the bus.

## Writes are audited and scoped

- Every write emits an [`audit_log`](/architecture/audit/) row in the **same transaction** as the
  change, a gateway responsibility, so it cannot be forgotten or bypassed.
- Every route **declares its permission** (checked before the handler runs) and every query **carries the
  caller's scope** (injected by the gateway). Both are [identity and access](/architecture/identity-access/)
  invariants, and the API is the gateway's only caller, so there is no unscoped path.

## Reads beyond one resource are views

A single resource reads through its typed `GET`. Anything richer, a dashboard, an explorer, the cascade
"why did this value win" view, goes through a **[view](/architecture/views/)**: a named query returning a
uniform `ViewResult` (`{columns, rows}`), bound by declared params at `/views/{id}:run`, executed through
the same scoped gateway. Views are part of the public API; an operator never gets raw SQL. A **live** read
(a tile that streams) may upgrade from polling `:run` to a **server-relayed [SSE](/architecture/messaging/)
stream** over the same scoped, permission-gated seam: the subscribe is **capability fast-rejected** at open
(a fast reject only; the open is not where authorization is decided), then the server holds the internal
subscription and re-runs the gateway scope per message, filtering by `visible_set(P, read)` against each
message's owner and pushing only visible deltas. The operator never connects to the bus, so the live path
adds no second authorization model.

## Versioning and evolution

The path carries the major version (`/api/v1`). Within a version, change is **additive only**: new
fields, new optional params, new resources, never a removal or a meaning change; a breaking change is a
new major version, not a silent edit. Because the [OpenAPI 3.1 document is generated](/contributing/api-first/)
from the Go structs and the clients are generated from that, the contract cannot drift from the
implementation: a drift check fails the PR if a route changed without regenerating.

## Also an MCP surface

The same OpenAPI document that generates the typed SPA client and the CLI also generates an **MCP
server**, one more [generated client](/contributing/api-first/) over the same gateway, so an AI
[agent](/architecture/ai/) drives the platform through the exact seams a human does: every tool call is
the same route permission, the same gateway scope, and the same [audit](/architecture/audit/) row written in the same transaction.
It is **not a side channel**.

The binding is mechanical, but the **tool catalog is curated, not a raw one-method-per-tool dump**:
task-oriented tools, the [views](/architecture/views/) exposed as search and query tools (the richest
reads), pagination and the problem+json errors shaped for a model to consume. The MCP server runs under
the **authenticated `human` or `service` principal's** credential
([identity and access](/architecture/identity-access/)), so its reach is exactly that principal's grants,
scoped and audited like any caller ([AI](/architecture/ai/)).

## The node path is the NATS contract

Nodes do **not** speak HTTP. The edge is a NATS client over the WAN: a node publishes telemetry to a
JetStream stream, consumes its commands from a durable server-side JetStream command queue, and is enrolled by a NATS
JWT/nkey, all on the sibling **NATS subject contract**, not this page's routes. The old node HTTP custom
methods (the heartbeat, the telemetry post) are gone; their wire is now subjects and message schemas. The
proto definitions survive **as the NATS message schema**, the typed shape on the bus. That contract,
subjects, request-reply, stream and consumer definitions, JWT-scoped subject permissions, is documented in
[messaging](/architecture/messaging/) and on the [node](/architecture/nodes/) page; the same AIP spirit,
error envelope, and idempotency described here carry across to it (the idempotency key per message, the
problem-shaped reply on request-reply).

## Self-describing

The running server serves `GET /api/v1/openapi.json`, `/openapi.yaml`, and a human reference page, so the
public contract is discoverable live against any deployment, not only in these docs. The internal NATS
subject contract is self-describing the same way: its subjects, message schemas, and stream and consumer
definitions are published from the running server, the sibling of OpenAPI for the bus.

Related: [API first](/contributing/api-first/) (the doctrine and the generation pipeline),
[messaging](/architecture/messaging/) (the sibling NATS subject contract and the bus),
[identity and access](/architecture/identity-access/) (permission + scope), [audit](/architecture/audit/)
(the write-time record), [UI](/architecture/ui/) (the views BFF and the renderer contract), and
[expressions](/architecture/expressions/) (the `filter` language).

---

# Audit

URL: /architecture/audit/

The who-did-what record, written once in the same transaction as the change it describes.

The audit log is how an operator answers "who changed this, and to what?" without trusting memory: every mutation is recorded once, at the source.

## The model

`audit_log` is **ground truth** (not derived): one row per mutation, carrying `actor`, `verb`,
`resource_kind`, `resource_id`, and the `old -> new` diff.

- **Write-time mandatory.** Every API write emits one `audit_log` in the **same transaction**
  as the data write, a storage-layer responsibility, not per-handler discipline, so it cannot
  be forgotten or bypassed.
- **The actor** is resolved by IAM ([identity and access](/architecture/identity-access/)): the
  human, service, or node.
- **An AI-accepted suggestion is one row.** An AI tool acts via OAuth as a `human` or `service`
  principal, so the actor is **that principal**, attributed and audited like any caller; the AI-sourced
  marking rides alongside the row ([AI](/architecture/ai/)).
- **Ground truth a backtest reads.** Operator-driven transitions and config changes are not
  recomputable from collected data, so the audit log is what a rule backtest reads for them: alarm ack and
  snooze ([alarms and actions](/architecture/alarms-actions/)), and every config change a
  reconcile consumes.

## Reads

- **Secret decrypts are always audited and never filterable.** Every read of secret material
  emits an `audit_log` (a credential decrypt), and that subset cannot be filtered away.
- **Other reads are not audited at the storage layer.** Optional read-audit is config-driven at
  the API layer (per-resource opt-in or a verbosity setting), off by default.

:::caution[Open question]
The read-audit granularity: per-resource opt-in versus a global verbosity setting.
:::

## Retention and integrity

Audit carries the **longest retention** of any ground-truth log (compliance), range-partitioned
by `ts` like the others. It is append-only by construction.

:::caution[Open question]
Tamper-evidence (a hash-chain or signed audit) for high-assurance deployments.
:::

## Who consumes it

- **Backtest**: a rule backtest reads operator transitions and config changes from here, since they are not recomputable.
- **Reconcile**: config changes arrive as `audit_log` rows, so reconcile reacts to them.
- **The alarm projection**: ack and snooze come from audit.

---

# Calculations

URL: /architecture/calculations/

The rule families that run server-side over typed datapoints, and calc_rule in detail: cross-key and system-level derivation.

Parsing a raw payload into datapoints is the **edge function** ([collection](/architecture/collection/)), not a server-side rule: a function extracts, keys, and normalizes on the node and emits resolved datapoints. The rules that run server-side over the typed datapoints are two derivation families plus a subscription, and this page is the home of the calc family.

The rule families run as **JetStream consumers on the data lane**: confined datapoints arrive on the NATS **trusted** `datapoints` stream (an admission consumer owner-confines raw ingress first, [messaging](/architecture/messaging/)), and the calc and event families consume them directly from NATS (rules never wait on Postgres). A calc consumer reads datapoints and **publishes** its derived datapoints back onto the trusted stream directly, as a trusted server producer (no admission pass; the calc owner comes from the validated `calc_rule` scope); an event consumer reads datapoints and, on fire, writes the event and alarm transition to Postgres in one transaction (the record lane), which CDC then publishes. The two lanes share the one JetStream bus; see [datapoints](/architecture/datapoints/) for the data lane and [events](/architecture/events/) for the record lane.

## Rules: calc, event, action

- **calc_rule**: datapoints to datapoint (calculated). The subject of this page (below).
- **event_rule**: datapoint change to event. Lives on [events](/architecture/events/) and [alarms and actions](/architecture/alarms-actions/): it carries a required `fire_criteria` and an optional `clear_criteria`, with the fire/clear pair opening and resolving an alarm.
- **action_rule**: a subscription wiring events and alarms to actions. Lives on [alarms and actions](/architecture/alarms-actions/).

An alarm is not produced by a different rule; it is an event rule whose events are paired (open, close), so there is no `alarm_rule` and no `condition_rule`. Ownership for a templated function is stamped at the edge (the component is known); shared-interface ingress is owner-bound server-side. A **`discovery_rule`** (observed data creates entities) rounds out the family; see the spine's rules section.

## calc_rule: cross-key and system-level derivation

A **calc_rule** runs as a calc consumer: it reads datapoints from the data lane (NATS) and publishes a datapoint back onto the trusted stream (provenance **calculated**, a trusted producer with no admission pass), where downstream calc and event consumers see it like any other datapoint. It owns inputs, a reduce (worst / majority / average / Expr), an output key, and a scope. It is for **cross-key** and **system-level** derivation: a 5-minute average, a system rollup, `room.in_use` derived from display power + codec call-state + occupancy. Same-key multi-source reconcile is the key's `fusion_policy`, not a calc (see [Fusion](/architecture/datapoints/#fusion)).

The calculated value it writes is parallel to observed: both are machine-derived, distinguished by the **`provenance` column**, both carrying `source_rule` + `source_rule_version` on the row. See [calculated](/architecture/datapoints/#calculated-derived-by-a-calc-rule) for how the row records its lineage.

Calc folds **every** instance of an input key into the reduce: a rule reading `fan.speed` from a component gets one candidate per fan, so `worst` / `average` / `count` / Expr aggregate across all of them. Calc **outputs** stay aggregate (`instance = ''`); per-instance outputs (one health per fan, a group-by) are a separate capability, and output owners default to the singleton. See [the instance dimension](/architecture/datapoints/#the-instance-dimension-many-values-of-one-key-on-one-owner) for the full instance model.
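
The fold over instances can be sketched as a reduce over candidates that always emits one aggregate row. The types, the `calc` function, and the output key are hypothetical, and `lowest` stands in for `worst` on the assumption that a slow fan is the bad case; which direction counts as worst is rule-specific.

```go
package main

import "fmt"

// Candidate is one instance of an input key on the owner, one per fan here.
type Candidate struct {
	Instance string
	Value    float64
}

// Datapoint is the derived output row, reduced to the fields that matter.
type Datapoint struct {
	Key        string
	Instance   string // "" marks the aggregate singleton
	Value      float64
	Provenance string
}

// Reduce folds all candidate values into one.
type Reduce func(values []float64) float64

func average(values []float64) float64 {
	sum := 0.0
	for _, v := range values {
		sum += v
	}
	return sum / float64(len(values))
}

// lowest stands in for "worst" when a low reading is the bad one.
func lowest(values []float64) float64 {
	low := values[0]
	for _, v := range values[1:] {
		if v < low {
			low = v
		}
	}
	return low
}

// calc folds every instance into the reduce and emits one aggregate output.
// It returns false when there is nothing to reduce.
func calc(outKey string, candidates []Candidate, reduce Reduce) (Datapoint, bool) {
	if len(candidates) == 0 {
		return Datapoint{}, false
	}
	values := make([]float64, 0, len(candidates))
	for _, c := range candidates {
		values = append(values, c.Value)
	}
	return Datapoint{Key: outKey, Instance: "", Value: reduce(values), Provenance: "calculated"}, true
}

func main() {
	fans := []Candidate{{"fan1", 1200}, {"fan2", 1200}, {"fan3", 300}}
	avg, _ := calc("fan.speed.avg", fans, average)
	low, _ := calc("fan.speed.worst", fans, lowest)
	fmt.Println(avg.Value, low.Value, avg.Instance == "")
}
```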

Calc is one half of [fusion](/architecture/datapoints/#fusion): cross-key / system-level fusion is the only fusion that authors a rule (a `calc_rule`), deriving a higher-order fact (a new key) rather than reconciling one key across sources.

## The DAG invariant

Calc rules read observed and calculated values as truth; they never treat an intended value as truth to infer a new fact. That is what keeps the pipeline acyclic. The invariant is stated in full on [datapoints](/architecture/datapoints/#the-dag-invariant).

## Storage

The three rule families share one config shape, versioned so a backtest can pin the rule version; the physical layout lives on [storage](/architecture/storage/).

| Table | Key columns | Notes |
|---|---|---|
| `calc_rule` / `event_rule` / `action_rule` | **(id, version)**, scope, spec (jsonb: Expr + params) | config, named for function; versioned so a backtest can pin the rule version. `calc_rule` = cross-key/system-level derivation; `event_rule` = fire_criteria + optional clear_criteria ([events](/architecture/events/), [alarms and actions](/architecture/alarms-actions/)); `action_rule` = a subscription (an Expr predicate over events). Parsing is the edge function, not a rule; `discovery_rule` rounds out the family |

Related: [datapoints](/architecture/datapoints/) (the data model calc reads and writes), [events](/architecture/events/) (the `event_rule`), [alarms and actions](/architecture/alarms-actions/) (the `action_rule` and the response layer), and [the glossary](/architecture/glossary/).

---

# Cascade

URL: /architecture/cascade/

How effective settings (config, variables, tags, rule-sets) are resolved for any entity and how the resolve view explains why a given value won.

The cascade lets an operator set a value once, high up, and have it apply everywhere below while still being overridable on any one entity, and then explain exactly why a given value won. It resolves the effective settings (config, variables, tags, rule-sets) for any entity.

## What it resolves

The effective **config and variables** ([config and credentials](/architecture/variables/)), **tags**, and **rule-sets** for any entity. A
first-class **resolve view** explains every effective value: the winning source
and what it shadowed. The order is deliberately hand-tuned (not derivable from a
single rule), so the resolve view is the safety net, not an afterthought.

## User stories

What the cascade is for, in operator terms:

- **Set once, override where it matters.** I set the standard poll interval and credential
  on the Room Kit template, but HQ needs its devices on a different credential, so I set
  that credential at the HQ location and it overrides the template default for everything
  there, with no template edit; a room three levels down just inherits the nearest setting.
  *(structural chain: deployment beats template defaults; deepest location wins)*
- **Composition tightens the part.** My "Standard Huddle Room" template polls codecs
  every 30s, tighter than the codec's own 60s default, and every codec placed in a
  huddle room picks that up. *(system_template beats component_template)*
- **"Why did it get this?"** RM204 polls every 5 minutes and I don't know why; the
  resolve view shows it is the "Old-firmware Room Kits" group (weight 450) shadowing
  the template's 30s. *(the effective-config resolve view)*

The cross-cutting cases (a fleet-wide fix that auto-clears, a broad policy as a floor, a hand-picked
set) are the [group](/architecture/groups/) stories, where the group is placed by weight on this same
scale.

## The structural chain

General defaults to specific deployment; most-specific (deepest) wins:

```text
global                fixed
component_template    the leaf entity's template defaults
system_template       the leaf's owning-system template defaults
location tree         campus -> building -> floor -> room      (deepest wins)
system tree           parent system -> subsystem -> ...        (deepest wins)
component tree        chassis -> card -> ...                   (deepest wins, = the leaf)
```

- Resolution runs **for the leaf** over its containment path; a non-leaf entity
  (a chassis, a floor) resolves over its own shorter subset.
- **Philosophy: deployment beats type/template defaults.** HQ's location credential
  overrides the Room Kit template default; "Standard Huddle Room" overrides the
  bare Room Kit default.
- **Templates are the leaf's base.** Ancestor nodes in the trees contribute their
  **instance** bindings, not their templates (a chassis hands a card its
  chassis-wide credential; the card keeps its own template).
- The three structural segments are **variable-depth trees** (parent-reference
  nesting, arbitrary depth); the deepest node wins. Weight-free, pure depth.

## Combinators (by what is resolved)

- **config / variables -> scalar override**: the deepest/highest source wins; one value.
- **tags -> union on name, override on value**: names accumulate; for a given
  name, the winning source's value wins.
- **rules** (`calc_rule` / `event_rule`) -> **additive
  accumulation + explicit suppression**: a leaf is governed by the union of rules
  from every layer; a layer removes one by name with a suppression.
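
A minimal sketch of the tag and rule combinators, assuming the layers arrive already ordered least-specific first. The `Layer` shape and the rule names `offline` and `fan_stall` are hypothetical; `high_memory` is the suppression used in the worked example on this page. Whether a suppression also removes a rule re-added by a more specific layer is not stated here, so this sketch simply applies each layer in order.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One cascade layer (hypothetical shape, for illustration only)."""
    name: str
    tags: dict = field(default_factory=dict)    # tag name -> value
    rules: set = field(default_factory=set)     # rule names this layer adds
    suppress: set = field(default_factory=set)  # rule names this layer removes


def fold_tags(layers):
    """Union on name, override on value: a later layer wins for a given name."""
    effective = {}
    for layer in layers:
        effective.update(layer.tags)
    return effective


def fold_rules(layers):
    """Additive accumulation plus explicit suppression by name."""
    effective = set()
    for layer in layers:
        effective |= layer.rules
        effective -= layer.suppress
    return effective


layers = [
    Layer("global", tags={"tier": "standard"}, rules={"offline", "high_memory"}),
    Layer("component_template", tags={"vendor": "cisco"}, rules={"fan_stall"}),
    Layer("group: Old-firmware Room Kits", tags={"tier": "legacy"},
          suppress={"high_memory"}),
]

print(fold_tags(layers))   # names accumulate; "tier" takes the later value
print(fold_rules(layers))  # union of rules minus the suppressed one
```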

## Groups overlay the cascade, placed by weight

The structural tree handles config by position and kind. **[Groups](/architecture/groups/)** handle
config by attribute or by a hand-picked set, cutting across the tree. The cascade does not define
groups (see [groups](/architecture/groups/) for the primitive); it consumes them by **weight** on the
one specificity scale.

**One specificity scale.** Structural layers auto-derive a specificity from position, weight-free, the
operator never tunes it: `global` lowest, then the templates, then the location / system / component
trees by depth, then the entity's own **instance** at the ceiling. A **group's weight is its
specificity on that same scale**, so a group sits wherever its weight lands relative to the structural
bands: a high weight beats deployment (a must-apply override), a low weight loses to it (a default that
deployment overrides). The instance ceiling beats any group; equal specificity breaks by creation
order. A typed group applies at its own level: a component-group to components directly; a system-group
reaches a component **through the system layer** of its cascade.

So however many groups an entity is in, the group band collapses to one weighted list on the shared
scale, fully predictable, and the resolve view names the winner.

**The comparison key is segmented, not a single number.** Precedence is a lexicographic key
`(segment_rank, depth, group_weight, creation_order)`, compared field by field in that order. The
`segment_rank` orders the structural bands (global, templates, location / system / component trees, the
instance ceiling); `depth` orders within a variable-depth tree (deepest wins); `group_weight` places a
group relative to the structural bands; `creation_order` breaks an exact tie. Because the segment is the
first field, a structural segment never overruns into another regardless of how deep a tree runs or how
many group weights stack: a deeper node or a heavier group raises a later field, never the leading
`segment_rank`. The single specificity numbers shown elsewhere on this page (e.g. `0`, `100`, the `300s`,
the `400s`) are a **presentation-only** flattening of that key for the resolve view, not the comparison
key itself.
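
A minimal sketch of the field-by-field comparison, assuming the key is held as a tuple. Python compares tuples lexicographically, which matches the ordering described above. The concrete `segment_rank` values are hypothetical, and the sketch only covers structural nodes (group weight left at `0`).

```python
from typing import NamedTuple


class Key(NamedTuple):
    segment_rank: int    # orders the structural bands (values here are made up)
    depth: int           # orders nodes within one variable-depth tree
    group_weight: int    # 0 for a purely structural binding in this sketch
    creation_order: int  # final tie-break


LOCATION, SYSTEM = 3, 4  # hypothetical ranks: system tree above location tree

campus = Key(LOCATION, depth=1, group_weight=0, creation_order=0)
room = Key(LOCATION, depth=4, group_weight=0, creation_order=0)
very_deep_room = Key(LOCATION, depth=250, group_weight=0, creation_order=0)
top_system = Key(SYSTEM, depth=1, group_weight=0, creation_order=0)

# Within one segment, the deeper node is more specific.
print(room > campus)

# Depth never overruns the leading field: even a very deep location node
# stays below the shallowest node of the next segment.
print(top_system > very_deep_room)
```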

A **`_type`** (device/app, AV-System, room) is not a cascade layer: it is a classification attribute,
resolved by a [group](/architecture/groups/) filter (`type == X`) placed by weight, never a tree
position. The tree is structural; attributes are groups.

## The registry is outside the cascade

`datapoint_type` defines **identity** (kind, unit, validation, fusion_policy)
for every datapoint key, which the cascade never overrides (policy, not ontology).
Ship-with **default policy** lives at `global`, the floor of the chain.

## Structural multi-membership (a component in N systems)

Distinct from group membership. On the resolution side, the **primary-system
pointer** is the single system chain that feeds the cascade for a component that
sits in more than one system; the genuine "config differs per system" case is
answered by **per-system effective views** computed on demand, not by merging
chains into the resolution. See [core entities](/architecture/core-entities/) for
the membership model.

## The resolve view

For a target entity and a key, return:

- the **effective value**;
- the **winning source** (a tree node, or a group + its weight);
- the **ordered shadowed bindings** it beat (source + value).

For rule-sets: the accumulated set, each rule tagged with its source and any
**suppressions**. One view explains both override (variables / tags) and accumulation
(rules).

## Worked example

RM204 codec (Room Kit Pro, fw 11.2) at Room RM204 -> Floor 3 -> HQ Building ->
HQ Campus, in the Huddle Room AV system. Member of two groups: **Old-firmware
Room Kits** (weight 450) and **PCI-scope** (weight 250). Specificity bands are
illustrative: `0` global, `100/200` templates, `300s` location by depth, `400s`
system by depth, `500` the instance. These single numbers are a presentation
flattening of the segmented key `(segment_rank, depth, group_weight, creation_order)`,
not the comparison itself.

```text
RM204 - cascade precedence            most-specific (highest) wins
===================================================================
 spec  source                              poll_interval   credential
 ----  ----------------------------------  -------------   ----------
 500   component RM204  (explicit)          -               -          <- ceiling
 450   group: Old-firmware Room Kits        5min  *         -
 440   system: Huddle Room AV system        -               -
 340   location: Room RM204                 -               -
 330   location: Floor 3                    -               vault-B  *
 320   location: HQ Building                -               -
 310   location: HQ Campus                  -               vault-A
 250   group: PCI-scope                     -               vault-C
 200   system_template: Std Huddle Room     -               -
 100   component_template: Room Kit Pro     30s             -
   0   global                               60s             -
===================================================================
 effective:  poll_interval = 5min    (group 450; shadowed template 30s, global 60s)
             credential    = vault-B (location Floor 3 @330; shadowed Campus @310, PCI-scope @250)
```

The two columns are the point of the shared scale:

- **`poll_interval`**: the **Old-firmware group (450)** sits *above* deployment, so
  its `5min` workaround beats the template / global defaults, which is what a
  fleet-wide bug fix needs.
- **`credential`**: the **PCI-scope group (250)** sits *below* deployment, so the
  specific **Floor 3 (330)** setting beats it, which is the case a fixed group band
  would get wrong.

`component RM204 (500)` would top everything if it set a value directly. Additive
rules accumulate down this same ladder, and a group can **suppress** one by name
(the Old-firmware group suppresses the false-firing `high_memory` alarm).

## Resolution, in one line

Build the entity's ordered layer path, place matching groups on it by weight, fold
variables (override) / tags (union + override) / rules (additive + suppress) down the
combined specificity order, and emit effective values with provenance.
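
The scalar-override part of that fold, with provenance, can be sketched as below. This is an illustration only: it sorts on the flattened, presentation-only specificity numbers from the worked example rather than on the segmented comparison key, and the `Binding` type and `resolve` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    spec: int    # flattened specificity, as shown in the worked example
    source: str
    value: str


def resolve(bindings):
    """Return (effective value, winning binding, shadowed bindings, highest first)."""
    ordered = sorted(bindings, key=lambda b: b.spec, reverse=True)
    if not ordered:
        return None, None, []
    return ordered[0].value, ordered[0], ordered[1:]


poll_interval = [
    Binding(0, "global", "60s"),
    Binding(100, "component_template: Room Kit Pro", "30s"),
    Binding(450, "group: Old-firmware Room Kits", "5min"),
]
credential = [
    Binding(310, "location: HQ Campus", "vault-A"),
    Binding(250, "group: PCI-scope", "vault-C"),
    Binding(330, "location: Floor 3", "vault-B"),
]

for key, bindings in (("poll_interval", poll_interval), ("credential", credential)):
    value, winner, shadowed = resolve(bindings)
    beaten = ", ".join(f"{b.source} @{b.spec}" for b in shadowed)
    print(f"{key} = {value} ({winner.source} @{winner.spec}; shadowed {beaten})")
```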

---

# Data collection

URL: /architecture/collection/

Data collection is built from functions: a versioned component template declares interfaces and a set of functions, each a trigger plus a DAG of steps that runs at the edge and parses on the spot.

Collection is built from **functions**. A versioned `ComponentTemplate` declares how to reach a
class of device (its interfaces) and a set of **functions**, each a discrete unit of device logic
that runs on the edge node, reaches the device over an interface, and parses the answer right
there into datapoints. One authoring schema covers everything from a single SNMP read to a
multi-step, cross-interface, branching procedure.

## Model overview

Three strongly decoupled levels, plus a typed parameter surface and template metadata:

```text
ComponentTemplate (apiVersion, kind, metadata.labels)
  inputs       typed parameters; required = the :apply gate
  interfaces   connections, declared once, may be persistent/stateful, own liveness
  functions    each = one trigger + a DAG of steps
    steps      typed operations (kind), gated by an interface, schema-validated
```

- **Authoring compiles to a runtime unit.** The hand-authored template is the contract. A
  compiler lowers it to the per-node execution unit the node runs: it resolves inputs and
  variables, validates the DAG, and bakes each datapoint's `kind` into the unit so the edge
  routes to the right table with no runtime registry lookup, the kind riding the published
  datapoint. The runtime unit is internal, never hand-authored.
- **Edge-local execution.** A function runs per component on the node, in one tick, with zero
  server round trips: every interface sits on the one component, all reachable from the node,
  so a step can branch on a value a prior step just collected, straight from node memory.
- **Two data planes, split by access pattern.** Timeseries [datapoints](/architecture/datapoints/)
  (observed and calculated) are append-heavy and history-bearing. Current-value config and
  credentials live in the separate [config and credentials](/architecture/variables/) store (sargable
  point-lookups); config is keyed to a datapoint as its observed side.
- **Kubernetes-style versioning.** `apiVersion: collection.omniglass.dev/v1alpha1` plus a
  `kind` (`ComponentTemplate`, with `SystemTemplate` / `LocationTemplate` reserved in the same
  apiVersion). The parser gates on `apiVersion` and converts older versions forward.

A **function** is the device-level unit. The platform-level workflow that *responds* to data,
the thing that opens tickets, notifies, and orchestrates, is a [flow](/architecture/alarms-actions/);
a flow can call a function, but the two live at different layers.

## Interfaces: connections, declared once

A top-level `interfaces` array, each a named connection. The connection is **decoupled from the
work**: a function's steps reference an interface by `id`; the interface owns the connection, not
the step. Declaring it once removes per-step duplication, and the decoupling lets a
**persistent session outlive any single function run**, so subscriptions and inbound streams
attach to a connection established once.

```yaml
interfaces:
  - id: snmp
    type: snmp                     # interface_type registry entry (param schema)
    host: ${input.ip}              # references INPUTS, not $var: directly (see Inputs)
    version: "2c"
    auth: ${input.snmp}            # snmp_community shape; community field is secret (masked, audited)
    liveness: { oid: 1.3.6.1.2.1.1.3.0 }   # reachability gate, per interface
  - id: cli
    type: ssh
    host: ${input.ip}
    credentialRef: ${input.ssh}    # ssh_credential shape, bound to a $var: at apply
    persistent: true               # stateful session, outlives function runs
```

- **Type is an `interface_type` registry entry**: the registry knows which protocol adapters exist
  and carries each one's connection-param schema. It covers `snmp`, `http`, `ssh`, `telnet`, `tcp`,
  `icmp`, `webhook`, `mqtt`, `syslog`, and `websocket`. The per-type schema is registry-driven, so
  config lints against exactly the adapter the registry holds.
- **`liveness`** is the per-interface reachability gate; it decides whether the interface's
  functions run. See [nodes](/architecture/nodes/).
- **`persistent: true`** keeps a session open across function runs (interface lifecycle contains
  a function run, which contains a step). Scheduled functions borrow it to send; listen functions
  wake on its inbound.
- **Codec and framing.** Raw-TCP AV control planes wrap payloads non-trivially (line
  terminators, length prefixes, NUL framing, JSON-RPC or TTP envelopes). An interface carries
  encode/decode controls that lock raw to shape: the codec frames outbound payloads and parses
  inbound ones to the declared envelope, so a step sees structured content, not wire bytes.
- **Node placement is not declared here.** It is server-assigned from the component's location.

## Functions: a trigger plus a step DAG

A top-level `functions` array. Each **function** is one trigger and a DAG of steps, from trivial
(one SNMP step reading 20 OIDs) to a multi-step branching procedure. A function is a discrete
unit of device logic: it does one thing to or for a component.

A function's **trigger** is one of three kinds, and the three unify what used to be separate
primitives (a poller, a listener, and a command):

| Kind | Fires when | This is |
|---|---|---|
| `schedule` | an interval elapses, or `onStart` once when the interface comes up | a poll (and `onStart` arms subscriptions) |
| `listen` | inbound data arrives on a `source` (`webhook` / `trap` / `syslog`, or `subscribe` / `stderr` / `session-line` on a persistent interface) | a listener for pushed data |
| `command` | invoked on demand, by an operator or by a [flow](/architecture/alarms-actions/) | an action you run against the device (`reboot`, `set-input`) |

A `command` function takes typed `args` and is the imperative path: it is how the platform *acts*
on a device, and how a reconcile pushes a declared config back (the **set** function, see [config](/architecture/variables/)).
A function normally has exactly one trigger. The field is still modeled as a list, `triggers`, to
admit the multi-trigger case: a scheduled function that is also command-invocable for a targeted refetch.

### Two axes: task mode and interface transport

A **task** is a node's unit of collection work. Two independent axes describe it, and keeping them
separate is what keeps the model clean.

- **Task mode** (a property of the task): **poll** (we ask for each datum) or **listen** (we wait
  for it to arrive). Stated from *our* perspective on purpose: "pull/push" inverts depending on
  whose frame you take, because the component pushes exactly when we pull. `poll` and `listen` are
  verbs *we* perform.
- **Transport** (a property of the interface): **stateless** (a throwaway connection per shot) or
  **stateful** (a held-open connection, which becomes a `session` and emits `session_log` rows for
  connect/auth/drop/reconnect).

These are orthogonal. All four cells are real:

| | **poll** (we ask) | **listen** (we wait) |
|---|---|---|
| **stateless** | SNMP get, HTTP GET | webhook, SNMP trap, syslog |
| **stateful** | SSH-exec or xAPI `xStatus` on a held session | MQTT subscribe, xAPI feedback |

Waiting for a frame is a single mode (**listen**) regardless of transport; a held-open connection
is a property of the interface, not a separate mode. So there are two task modes, and statefulness
lives on the interface.

**Native push.** First-class data pushed by smart senders (control-system programmers instrumenting
directly) is self-describing (it carries its key), so its edge parse is a near-identity
pass-through, marked `shape=native`. As with any function, a failed parse keeps the raw on a
`collection.failed` event.

## Built interface types and their config

This section lists the poll types and listeners in the `interface_type` registry and the operator config they read.
The node translates each stored task + interface into a poller the collection engine runs; how the
node *executes* these (tick scheduling, reachability gating, the task queue) is [nodes](/architecture/nodes/).

### Built poll protocols and their config

| interface type | shape | host/target | per-task params | datapoints |
|---|---|---|---|---|
| `icmp` | inline probe | `task.params.target` | `count`, `timeout` | `icmp.reachable`, `icmp.rtt_avg` (fixed) |
| `tcp` | inline probe | `task.params.target` (`host:port`) | `timeout` | `tcp.open`, `tcp.connect_time` (fixed) |
| `snmp` | held connection | `interface.endpoint` (`host[:port]`, port defaults 161) | `task.params.oids` (comma-separated `name=oid`); `interface.params.version` (default `2c`), `interface.params.community` | one datapoint per OID, `name` = the datapoint key |
| `http` | held connection | `interface.endpoint` (base URL) | `task.params.path` (joined onto the base URL), `method` (default `GET`), `timeout` (default `5s`), `body`, `extract` (comma-separated `name=json:<dot.path>`); `interface.params.header_*` (request headers, prefix stripped) | `http.reachable`, `http.status_code`, `http.response_time` (fixed) + one per `extract` entry |
| `raw-tcp` | held connection | `interface.endpoint` (`host:port`) | `task.params.command` (sent verbatim + line ending), `timeout`, `extract` (comma-separated `name=re:<pattern>`); `interface.params.line_ending` (default `\r\n`), `read_delim` (default `\n`), `connect_timeout`, `read_timeout` | `rawtcp.reachable`, `rawtcp.response_time` (fixed) + one per `extract` entry |
| `telnet` | held connection | `interface.endpoint` (`host:port`) | as `raw-tcp`, plus `interface.params.username`/`password` (drive the default `login:` / `Password:` chain; `login_expect`/`password_expect` override the prompts) | `telnet.reachable`, `telnet.response_time` (fixed) + one per `extract` entry |
| `ssh` | held connection | `interface.endpoint` (`host:port`) | as `raw-tcp` (the command runs as a one-shot `exec`), plus `interface.params.username` and `password` and/or `private_key` (inline PEM) | `ssh.reachable`, `ssh.response_time` (fixed) + one per `extract` entry |

`icmp`/`tcp` are inline probes (the target rides the task); `snmp`, `http`, and
the text transports (`raw-tcp`/`telnet`/`ssh`) are held connections, so the
connection (host/port/version/community for snmp, base URL + headers for http,
address + framing + auth for the text family) lives on the interface and the task
names what to read.

Every fixed built-in name (`icmp.reachable`/`icmp.rtt_avg`, `tcp.open`/
`tcp.connect_time`, `udp.open`, `snmp.reachable`, `http.reachable`/
`http.status_code`/`http.response_time`, and `<proto>.reachable`/`<proto>.response_time`
for the text family) is a **registered canonical `datapoint_type`** in the ship-with registry,
so probe/liveness results persist as datapoints, not only as raw wire
bytes. They are owner-agnostic measurements like any other: if they were unregistered,
reject-not-project would drop them at ingest. `registry.seed_validation_test`'s
`liveness_builtins_present` locks the registry to exactly the names the node
emits, so a rename on either side fails the build instead of silently going
un-derived.

For `snmp`, each OID is carried in its **native SNMP type**: numeric OIDs as
numbers, string OIDs (OctetString / IPAddress / OID) as text, so a string-valued
OID (an enum or label) lands as a `state` datapoint and a numeric one as
`metric`. The owning table is decided at ingest from the key's `datapoint_type`
kind. Per-OID declared typing and richer collection specs live on the component
template (the template declares the OID set, demoting `task.params.oids` to an
override). SNMP runs v2c with a plaintext community or v3 with auth/priv; the
community resolves from the interface params directly or through an `auth_secret`
credential.

Every extract spec (`oids`, the http/text `extract`) shares one name grammar: a
name may carry a trailing bracketed instance suffix, **`key[instance]`**, to distinguish several values
of the *same* canonical key on one owner (`fan.speed[intake]=<oid>`,
`fan.speed[exhaust]=<oid2>`). The bracket is stripped into the datapoint's
reserved `instance` label, so the canonical registry still matches the bare key
and the value lands in the `instance` column ([the instance dimension](/architecture/datapoints/#the-instance-dimension-many-values-of-one-key-on-one-owner)). A name without a bracket is a singleton (`instance = ''`).
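
A minimal sketch of that name grammar, assuming a spec is a comma-separated list of `name=locator` entries as in `task.params.oids`. The function names are hypothetical and the OIDs in the usage are placeholders. Splitting on commas is only safe for locators that contain none, which holds for OIDs.

```python
import re

# key, then an optional trailing [instance] suffix
_NAME = re.compile(r"^(?P<key>[^\[\]=]+?)(?:\[(?P<instance>[^\[\]]*)\])?$")


def split_name(name):
    """Split 'key[instance]' into (key, instance); no bracket means instance ''."""
    m = _NAME.match(name.strip())
    if m is None:
        raise ValueError(f"malformed extract name: {name!r}")
    return m.group("key"), m.group("instance") or ""


def parse_spec(spec):
    """Parse 'name=locator,name=locator' into (key, instance, locator) triples."""
    out = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        name, sep, locator = entry.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in extract entry: {entry!r}")
        key, instance = split_name(name)
        out.append((key, instance, locator))
    return out


spec = ("fan.speed[intake]=1.3.6.1.4.1.55540.3.1.0,"
        "fan.speed[exhaust]=1.3.6.1.4.1.55540.3.2.0,"
        "device.uptime=1.3.6.1.2.1.1.3.0")
for key, instance, locator in parse_spec(spec):
    print(key, repr(instance), locator)
```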

For `http`, `http.reachable` is `1` whenever the request completes a round trip
(`0` on a transport failure: DNS, refused, timeout, TLS), and `http.status_code`
carries the HTTP status separately, so reachability and a `>= 500` status are
distinct alarm signals (a non-2xx response is still reachable). `extract` pulls
values from a JSON body by dot-path (`name=json:data.0.temp`): a number or bool
leaf becomes a `metric`, a string leaf a `state`; a missing path, a
container/`null` leaf, or an unreachable endpoint yields no datapoint. Auth rides
as `header_*` interface params (e.g. `header_authorization: Bearer ...`), resolved
from a plaintext param or an `auth_secret` credential. Carry auth in
`header_*`, never in the URL or body: the request `body` param is **not** stamped
as a datapoint label, and the `target` label is the request URL with its query
string (and any userinfo) stripped, so a token placed in the path query does not
leak into attributes (but is still a bad idea). `method`/`body` support POST/PUT.
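
The dot-path extraction and its typing rules can be sketched as follows. This is an illustration under assumptions, not the node's extractor: the function names are hypothetical, and mapping a bool to `1.0` / `0.0` is a guess at how a bool leaf becomes a `metric`.

```python
import json


def walk(doc, dot_path):
    """Follow a dot-path; an all-digit segment indexes a list. Returns (found, leaf)."""
    node = doc
    for seg in dot_path.split("."):
        if isinstance(node, dict) and seg in node:
            node = node[seg]
        elif isinstance(node, list) and seg.isdigit() and int(seg) < len(node):
            node = node[int(seg)]
        else:
            return False, None
    return True, node


def extract(body, spec):
    """Apply 'name=json:<dot.path>' entries to a body; returns name -> (kind, value)."""
    try:
        doc = json.loads(body)
    except ValueError:
        return {}  # a non-JSON body makes every declared extraction absent
    out = {}
    for entry in spec.split(","):
        name, _, locator = entry.strip().partition("=")
        if not locator.startswith("json:"):
            continue
        found, leaf = walk(doc, locator[len("json:"):])
        if not found:
            continue  # missing path: no datapoint
        if isinstance(leaf, bool):  # check bool before int: bool is an int subclass
            out[name] = ("metric", 1.0 if leaf else 0.0)
        elif isinstance(leaf, (int, float)):
            out[name] = ("metric", float(leaf))
        elif isinstance(leaf, str):
            out[name] = ("state", leaf)
        # dict, list, or None leaf: no datapoint
    return out


body = '{"data": [{"temp": 21.5, "mode": "cool", "on": true, "extra": null}]}'
spec = ("temp=json:data.0.temp,mode=json:data.0.mode,on=json:data.0.on,"
        "gone=json:data.1.temp,nul=json:data.0.extra,box=json:data")
print(extract(body, spec))
```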

For the **text family** (`raw-tcp`/`telnet`/`ssh`), the poll is one ephemeral
round trip: connect, optionally authenticate, send `task.params.command` followed
by the line ending, read the reply (to the `read_delim` for raw-tcp/telnet, to
EOF for ssh's `exec`, bounded by `read_timeout`), extract, close. `<proto>.reachable`
is `1` once the transport opened and the command round-tripped (`0` on a transport
failure: refused, timeout, or rejected credentials, which are connection health, not
errors), and `<proto>.response_time` is absent when unreachable.

`extract` pulls values by **regex named capture** (`name=re:<pattern>`, parallel to http's
`json:`): each named group routes to the datapoint of the same name, or to the lone
datapoint when the pattern has exactly one group; a captured value that parses as a
number becomes a `metric`, otherwise a `state`; a non-matching pattern (or an
unreachable endpoint) yields no datapoint, while a pattern that fails to compile is
a configuration error.

Auth resolves from interface params (telnet/ssh `password`,
ssh inline `private_key`) or an `auth_secret` credential, the same posture as snmp's
community and http's `header_*`, and ssh pins the host key. Credentials live on the
interface and are never labelled; the `target` label is the command.

The transport is
swappable behind one boundary, so a `raw-udp` request/response poll (datagram in,
reply out) slots in as a fourth kind without new machinery; UDP **listen**
(unsolicited inbound: syslog, snmp-trap) is a different shape and belongs to the
listener runtime. A held session (the stateful transport) carries the same text
family over a persistent connection, with multi-line prompt-expect beyond the first
delimiter, command echo handling, and Q-SYS-style frame/checksum framing; ssh runs
its commands as a one-shot `exec`.
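
A minimal sketch of the regex extraction rules, written with Python's `re` module and its `(?P<name>...)` group syntax. It reads "the lone datapoint" as the spec's own `name` when the pattern has exactly one unnamed group; that reading, and the function names, are assumptions.

```python
import re


def typed(text):
    """A capture that parses as a number is a metric, otherwise a state."""
    try:
        return ("metric", float(text))
    except ValueError:
        return ("state", text)


def extract(reply, name, pattern):
    """Apply one 'name=re:<pattern>' entry to a reply; returns key -> (kind, value)."""
    rx = re.compile(pattern)  # re.error propagates: a configuration error
    m = rx.search(reply)
    if m is None:
        return {}  # non-matching pattern: no datapoint
    if rx.groupindex:
        # Each named group routes to the datapoint of the same name.
        return {g: typed(v) for g, v in m.groupdict().items() if v is not None}
    if rx.groups == 1 and m.group(1) is not None:
        # Exactly one (unnamed) group: it feeds the entry's own name.
        return {name: typed(m.group(1))}
    return {}


print(extract("fan (rpm) : 4200", "fan.speed", r"fan \(rpm\)\s*:\s*(\d+)"))
print(extract("temp=41 mode=auto", "unused",
              r"temp=(?P<temp>\d+) mode=(?P<mode>\w+)"))
```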

### Built listeners and their config

A **listener** is inbound: rather than us polling, **we wait for pushed data**
(`mode: listen`). That data can arrive several ways, a webhook POST, an
MQTT/subscribe stream, an SNMP trap or syslog line, or a line on a held stateful
session; a webhook is one transport, not the definition. A `webhook` listener is
**server-hosted**: `placement: central` makes the server the
endpoint for inbound external webhooks, so a webhook listen-task is **server-executed
and unassigned** (`node_name IS NULL`); the server's `POST /webhooks/{path}` route is
its runtime, not a node tick.

| field | where | meaning |
|---|---|---|
| `path` | `interface.params.path` | the opaque, unguessable token in the inbound URL (`/webhooks/{path}`); a bearer locator, not the interface name |
| `secret` | `interface.params.secret` | shared secret the sender presents in the `X-Omniglass-Token` header (or `?token=`), constant-time compared |
| `component` | `interface.component` | when set, datapoints pre-bind to that component (trivial owner); when empty, shared-interface ingress is owner-bound server-side by labels |
| `extract` | `task.params.extract` | comma-separated `name=json:dot.path`; number/bool -> metric, string -> state (same extractor as the http poller) |
| `raw_log` | `task.params.raw_log` | optional key to store the whole raw frame under (as JSON when the body parses, else text), the holding-pen an event_rule can later promote |

One or more `mode: listen` tasks bind to a webhook interface; each inbound POST
runs every enabled one, parsing its points under that task's id and **publishing**
them to the JetStream [datapoints](/architecture/datapoints/) stream, the same data
lane the node publishes to (so owner attribution resolves server-side, and the rule
engine and calc rollups react from the stream). The ingest is **owner-confined by the
admission consumer** against the interface's **declared owner**, keyed off the trusted
server-set `interface` label (not a payload claim), so a leaked path secret can publish
only under that interface's owner, never an arbitrary one
([identity and access](/architecture/identity-access/)).

**Response contract** (webhook senders retry on non-2xx): **202** = durably
accepted; **401** bad/absent secret, **404** unknown path, **413** body over the
1 MiB cap (4xx = sender fault, don't retry); **5xx** = our fault, please retry. A
`GET`/`HEAD` to the path answers the endpoint-verification ping some providers
send, echoing a `?challenge=` value. The body cap, JSON-only parsing, and
"non-JSON body makes declared extractions absent (not an error)" mirror the http
poller.
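
The response contract can be sketched as a single status decision. This is an illustration under assumptions: the function and parameter names are hypothetical, the order of the checks is a guess, and `503` stands in for the unspecified 5xx. The constant-time comparison uses the standard library's `hmac.compare_digest`.

```python
import hmac

MAX_BODY = 1 << 20  # the 1 MiB body cap


def webhook_status(secrets_by_path, path, token, body, accept):
    """Return the HTTP status for one inbound POST.

    secrets_by_path maps an interface's path token to its shared secret;
    accept(body) durably hands the frame on and raises on failure.
    """
    secret = secrets_by_path.get(path)
    if secret is None:
        return 404  # unknown path
    if token is None or not hmac.compare_digest(token.encode(), secret.encode()):
        return 401  # bad or absent secret
    if len(body) > MAX_BODY:
        return 413  # body over the cap
    try:
        accept(body)
    except Exception:
        return 503  # our fault: the sender should retry
    return 202  # durably accepted


accepted = []
secrets_by_path = {"k3Jx-opaque-token": "s3cret"}
print(webhook_status(secrets_by_path, "k3Jx-opaque-token", "s3cret",
                     b'{"temp": 21}', accepted.append))
```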

**Auth and spoofing**: the shared secret resolves from a plaintext `interface.params`
value or an `auth_secret` credential (same posture as snmp's community), and the
sender may instead present an HMAC signature verified behind the auth seam. The route
stamps a trusted, server-set `interface` label on every datapoint and copies body
fields into attributes **only** via the declared `extract` set, so a body field cannot
impersonate another interface; shared-interface ingress should scope on
`event.labels.interface`, and per-component interfaces (server-assigned owner)
are preferred for high-trust sources. A listener also runs node-hosted for
LAN-local sources, with idempotency/dedup and form-encoded bodies.

## Steps: the DAG

A step is a typed **operation**: a `kind` (the operation it performs) that runs on a referenced
`interface` and, for a read, produces datapoints through a typed extractor.

```yaml
steps:
  - id: poll
    interface: snmp
    kind: snmp.get                  # gated by the interface type; schema-validated
    when: "$dp.power.state == 'on'" # optional guard = explicit branch
    datapoints:
      oid:
        - { key: cpu.utilization, oid: 1.3.6.1.4.1.55540.2.1.0, value: "raw / 100.0" }
```

- **Dependencies are data references, not array order.** A step reads `$steps.<id>.*`
  (ephemeral scratch: a session id, a token, a list element, never emitted) or `$dp.<key>` (a
  real measurement, emitted and readable for branching). The set of references *is* the DAG;
  array order is cosmetic, so a function editor can round-trip the graph.
- **`when`** is the explicit branch: an expression guard over the in-scope context. A false
  guard skips the step and its dependents.
- **`forEach`** is the step-level fan-out: a step iterates a located collection, the element
  bound as `$steps.<id>.item`, and downstream steps run per element (a list-then-detail chain).
  Distinct from an extractor's `each`, which fans one response's array into many datapoints
  inside a single step.
- **`kind` is interface-gated and registry-driven.** Valid kinds depend on the target
  `interface_type` (`snmp.get`, `snmp.walk`, `http.request`, `ssh.send`, `ssh.subscribe`, the
  interface-agnostic `extract` and `blend`). Each kind's param schema lives in the registry,
  one entry per adapter.
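
A minimal sketch of "the references are the DAG": it scans each step's fields for `$steps.<id>` references and derives a run order with the standard library's `graphlib` (Python 3.9+). Only `$steps` references are followed here; `$dp.<key>` dependencies are left out for brevity. The step fields other than `id`, `interface`, and `kind` are hypothetical.

```python
import re
from graphlib import TopologicalSorter

STEP_REF = re.compile(r"\$steps\.([A-Za-z0-9_-]+)")


def references(value):
    """Collect every $steps.<id> reference found anywhere in a step's fields."""
    if isinstance(value, str):
        return set(STEP_REF.findall(value))
    if isinstance(value, dict):
        value = list(value.values())
    if isinstance(value, list):
        found = set()
        for item in value:
            found |= references(item)
        return found
    return set()


def run_order(steps):
    """Order steps by their data references; array order is ignored."""
    ids = {s["id"] for s in steps}
    graph = {}
    for s in steps:
        deps = references(s) - {s["id"]}
        unknown = deps - ids
        if unknown:
            raise ValueError(f"step {s['id']!r} references unknown steps: {sorted(unknown)}")
        graph[s["id"]] = deps  # node -> the steps it must wait for
    # static_order() raises graphlib.CycleError if the references form a cycle.
    return list(TopologicalSorter(graph).static_order())


# Deliberately listed out of order.
steps = [
    {"id": "detail", "interface": "api", "kind": "http.request",
     "channel": "$steps.list.item"},
    {"id": "list", "interface": "api", "kind": "http.request",
     "token": "$steps.login.token"},
    {"id": "login", "interface": "api", "kind": "http.request"},
]
print(run_order(steps))
```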

In-scope reference namespaces within a function run: `$var:<key>` (config and secret values,
resolved through the [cascade](/architecture/cascade/)), `$dp.<key>` (datapoints), `$steps.<id>.*`
(ephemeral scratch), `$event` (the inbound payload of a `listen` function), and the
extractor-local inputs a step prepares for its `value` leaf (`raw`, `groups`, `node`, `item`).

### Extractors: locate, then optionally transform

Each extractor is a typed section that locates a raw value with its protocol-specific field,
then optionally transforms it with a single [Expr](/architecture/expressions/) expression in
`value` (default identity).

**One interpolation convention.** Wherever a config, label, or template field could hold either a
computed value or a fixed one, an **interpolated** value (an expression evaluated against the
in-scope context) is wrapped `${...}`, and a **literal** is a bare string. So `${node.index}` reads
the current element's index, while `"main-display"` is the literal text. The `value` leaf is always
an Expr expression by definition, so it needs no wrapper.

```yaml
datapoints:
  oid:
    - { key: device.uptime, oid: 1.3.6.1.2.1.1.3.0, value: "raw / 100.0" }  # centiseconds to seconds
  regex:
    - { key: fan.speed, match: 'fan \(rpm\)\s*:\s*(\d+)', value: "int(groups[1])" }
  jsonpath:
    - { key: channel.gain, each: "$.channels[*]", value: "node.gain",
        labels: { channel: "${node.index}", name: "${node.name}", role: "main-display" } }
```
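
The interpolation convention can be sketched as below. A plain dotted lookup stands in for the Expr evaluator, so this handles only references such as `${node.index}`; the function names are hypothetical.

```python
import re

WRAPPED = re.compile(r"^\$\{(.+)\}$")


def lookup(context, dotted):
    """Resolve a dotted reference like 'node.index' against nested dicts."""
    node = context
    for part in dotted.split("."):
        node = node[part]
    return node


def resolve_field(raw, context):
    """A ${...} value is interpolated; anything else is a literal."""
    m = WRAPPED.match(raw)
    if m is None:
        return raw
    return lookup(context, m.group(1).strip())


context = {"node": {"index": 2, "name": "Lectern", "gain": -6.0}}
labels = {"channel": "${node.index}", "name": "${node.name}", "role": "main-display"}
print({k: resolve_field(v, context) for k, v in labels.items()})
```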

The extractor names a `key`. What that key *means* (kind, value type, unit, validation,
fusion) lives on the [`datapoint_type`](/architecture/datapoints/#the-datapoint_type-registry) registry at some
[scope](/architecture/datapoints/#key-scope-template-org-official): a template declares its own keys at
**template** scope (no registry friction), or references an **org** / **official** key. Compile-time
validation resolves every key to a reachable scope (template keys self-resolve; referenced org/official
keys must exist); an unresolved key is reject-not-project at ingest, so a template never silently
collects a measurement no scope knows.

## Inputs: the template's typed parameters

A template takes typed `inputs`: shape-typed parameters it references internally, never a
hardcoded `$var:`. That is the decoupling: a template needs an `ssh_credential`, not specifically
`$var:crestron.ssh`. At `:apply` each input is **bound** to a value, either a literal or a
[variable](/architecture/variables/) reference (`$var:<name>`), with an optional default the
template ships. Required inputs are the apply gate; the UI renders the form.

```yaml
inputs:
  - group: connectivity
    fields:
      - { key: ip,   type: ipv4,           required: true, label: "IPv4 address" }
  - group: auth
    fields:
      - { key: snmp, type: snmp_community, required: true, label: "SNMP community" }
      - { key: ssh,  type: ssh_credential, default: $var:crestron.ssh, label: "SSH login" }
```

The template body references `${input.snmp}` / `${input.ssh}`; the bindings resolve at apply
and are overridable per component. So `$var:` lives at the **binding layer** (apply, and input
defaults), not scattered through the template body, and the template stays reusable with any
value of the right shape. Each input `type` is a `variable_type`, so per-field secrecy comes
from the shape.
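
A minimal sketch of the binding step at apply, assuming a flat list of declared fields and an in-memory variable store. `bind_inputs` and the `site.snmp` variable are hypothetical; type checking against the `variable_type` shape and per-component overrides are left out.

```python
def bind_inputs(declared, bindings, variables):
    """Bind each declared input to a literal or a $var:<name> reference.

    An explicit binding wins over the template's shipped default. A required
    input with neither is the apply gate. A reference to a variable missing
    from the store raises KeyError.
    """
    resolved, missing = {}, []
    for field in declared:
        key = field["key"]
        raw = bindings.get(key, field.get("default"))
        if raw is None:
            if field.get("required"):
                missing.append(key)
            continue
        if isinstance(raw, str) and raw.startswith("$var:"):
            raw = variables[raw[len("$var:"):]]  # resolve the variable reference
        resolved[key] = raw
    if missing:
        raise ValueError(f"required inputs not bound: {missing}")
    return resolved


declared = [
    {"key": "ip", "type": "ipv4", "required": True},
    {"key": "snmp", "type": "snmp_community", "required": True},
    {"key": "ssh", "type": "ssh_credential", "default": "$var:crestron.ssh"},
]
variables = {"crestron.ssh": {"username": "admin"}, "site.snmp": "public"}
bound = bind_inputs(declared, {"ip": "10.0.0.5", "snmp": "$var:site.snmp"}, variables)
```

The `ssh` input is never bound explicitly, so it falls back to the default the template ships.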

## Execution: parse at the edge

A function runs the parse at the **edge**, not server-side:

- **Function steps parse, extract, and normalize on the node** and **publish** resolved
  datapoints to the JetStream [datapoints](/architecture/datapoints/) stream (the data lane),
  not to the typed tables directly. The node is a NATS client publishing observed datapoints;
  a [persistence consumer](/architecture/datapoints/) batch-writes them to the typed tables as
  an async sink, idempotent on `(series, ts)`, while the [rule engine](/architecture/alarms-actions/)
  consumes the same stream live. The compiler still bakes each datapoint's `kind` into the
  runtime unit, so the routing to `metric_datapoint` versus `state_datapoint` is decided at the
  edge with no runtime registry lookup, and rides on the published message.
- **Raw payloads are not stored**; the datapoint is the source. A dev raw-mode taps the wire bytes
  live while developing, and a parse or validation failure emits a `collection.failed` event
  carrying the raw. There is no telemetry table.
- **Owner attribution:** a single-owner function lands its datapoints on its own component,
  identity stamped at the edge (the component is known, the function runs for it). A function
  that reports for many devices (a management platform) publishes datapoints for multiple owners,
  resolved server-side from the emitted identity labels (below).
- **Placement-scoped writes.** A node publishes only for the owners in its **placement visible_set**
  (the owners of the tasks assigned to it). That visible_set expresses as **NATS subject
  permissions** on the node's account, the `node` gateway mode in
  [identity and access](/architecture/identity-access/). At ingest, an emitted owner label
  **outside** that visible_set is **never an authoritative write**: it is treated as an
  **orphan / discovery candidate** and feeds the `discovery_rule` stream (below), so a
  compromised node cannot manufacture writes for owners it was never placed on. The
  perspectives / `disagree` model is the backstop for the other case, a legitimately-placed but
  compromised node reporting bad values for owners it **does** cover; bounding the visible_set
  and corroborating across perspectives are complementary, not the same defense.
- Because parsing is the edge step, there is **no separately authored transform rule**. Routing
  is the template's fan-out, and cross-entity rollups are [calc](/architecture/calculations/)
  datapoints on system and location templates. The server-side work that remains is
  shared-interface owner-binding and untemplated raw ingress.
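
The placement-scoped write rule can be sketched as a partition of an emitted batch. This is an illustration only: `route_batch` is a hypothetical helper, and in the design the first line of enforcement is NATS subject permissions rather than application code.

```python
def route_batch(datapoints, visible_set):
    """Split emitted datapoints into authoritative writes and discovery candidates.

    An owner outside the node's placement visible_set is never written as
    authoritative; it is handed to the orphan / discovery stream instead.
    """
    writes, candidates = [], []
    for datapoint in datapoints:
        if datapoint["owner"] in visible_set:
            writes.append(datapoint)
        else:
            candidates.append(datapoint)
    return writes, candidates


visible_set = {"component:dsp-boardroom-3", "component:codec-boardroom-3"}
batch = [
    {"owner": "component:dsp-boardroom-3", "key": "audio.level", "value": -18.0},
    {"owner": "component:display-lobby-1", "key": "power.state", "value": "on"},
]
writes, candidates = route_batch(batch, visible_set)
```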

### Raw sampling: an opt-in re-parsable window

The default is that raw payloads are not retained. An opt-in **`raw_sample`** policy keeps a
bounded window of raw frames so a corrected extractor can re-derive its datapoints over that
window, without reintroducing a telemetry table.

`raw_sample` is **cascade-resolved**, settable on an interface, a task, or a template, and
resolves to one of three values:

- **`off`** (default): no raw retained.
- **`all`**: every frame the matched task collects is buffered.
- **`1-in-N`**: one frame in every N is buffered (sampled), bounding volume on a high-cadence
  source.
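
A sketch of how a resolved policy value might gate buffering per frame. Both helpers are hypothetical, and two details are assumptions: the sampled policy is spelled with a concrete number (for example `1-in-10`), and sampling is counter-based rather than random.

```python
def sample_interval(policy):
    """Translate a resolved raw_sample value into a frame interval (0 means never)."""
    if policy == "off":
        return 0
    if policy == "all":
        return 1
    prefix = "1-in-"
    if policy.startswith(prefix) and policy[len(prefix):].isdigit():
        interval = int(policy[len(prefix):])
        if interval >= 1:
            return interval
    raise ValueError(f"unknown raw_sample policy: {policy!r}")


def should_buffer(policy, frame_seq):
    """Decide whether the frame with this sequence number is kept in the window."""
    interval = sample_interval(policy)
    return interval > 0 and frame_seq % interval == 0


kept = [seq for seq in range(7) if should_buffer("1-in-3", seq)]
```

With `1-in-3`, frames 0, 3, and 6 are kept and the rest are dropped, which is what bounds volume on a high-cadence source.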

The kept frames carry the **immutable function version** that parsed them, so the buffer is
**re-parsable against that exact version**: a corrected extractor re-runs over the retained
window and re-derives the datapoints, retroactively correcting them. The residual is stated
honestly. **Outside** the kept window a wrong-but-conforming parse (one that produced a valid
datapoint from a misread frame) is **forward-fixable only**: the fix applies to new collection,
the already-parsed history is not retroactively corrected because the raw is gone.

The buffer preserves the no-telemetry-table economics: off by default, bounded, sampled, and
short-lived. It is a short-TTL holding pen, range-partitioned and cold-tierable like the
[metric](/architecture/datapoints/) partitions and the [storage](/architecture/storage/) layout
describe, not a parallel history of record.

## Shared-API collection: one component, many owners

Some sources describe **many entities at once**: a SaaS / UCC platform (Zoom, Teams), a controller
fronting many devices, a building gateway. Modeling each described entity as its own component is the
legacy-platform reflex. Here the API is **one component** (one interface, one credential) and its data
**fans out** to the entities it describes.

- The API component's function pulls the batch (all rooms, all devices) in one call and **labels each
  emitted datapoint with the external identity** it belongs to (a Zoom Room ID).
- The function does **not** stamp the owner: it is the conduit, not the owner. Ownership is **resolved
  server-side**: the identity is matched against a declared **identity config** (`zoom.room_id` on the
  target) and the datapoint is bound to that entity. This is the same shared-ingress owner-binding the
  model uses for webhooks and traps; a pull-side batch is the same shape.
- **The owner can be a system, not only a component.** SaaS state that is telemetry *of* a room (no
  physical device) maps to **system-owned datapoints** directly. Reserve a virtual component for the
  genuine *member* case (its own node in the topology, a `health_role`, a lifecycle). Rule of thumb:
  **member -> component, telemetry -> system.**
- **Unmatched identities are orphans**, each a discovery candidate. The `discovery_rule` is the
  onboarding win: point it at the API and it auto-creates the entities and sets their identity, so you
  never hand-map.

**Best practice.** Map SaaS / cloud telemetry to **system-owned datapoints**, and **wire it into
system health** with a system-scoped `event_rule`. Treat the vendor's own status as an **input to that
judgment, not the verdict**: a UCC platform reporting "offline" is one source's opinion, so corroborate
it (against the codec, occupancy) before downing the room. See [health](/architecture/health/).

### Identity binding: the value-to-owner index

A multiplexed source emits a row tagged with an external identity (a Zoom Room ID, a controller's
slot number); binding that row to an Omniglass owner is a lookup against a **value-to-owner index**.
The index is an **identity arc** on identity config: a `(datapoint_type, value) -> owner` mapping,
where `datapoint_type` is the **match key** (the canonical identity key, e.g. `zoom.room_id`) and
`value` is the external identity the source emitted. The index resolves **in the cascade scope** the
identity config is set at, so an identity declared at a system or location scope binds the rows of
every member below it.

Two sides can supply the match value, and **precedence** is explicit:

- A **declared identity config value** (an identity the operator set on the target) **wins**.
- It falls back to the **observed identity datapoint** that shares the same key (a value the device
  itself reported under that `datapoint_type`).

So ownership resolution reads the **resolved identity** for the key (declared over observed), matches
the emitted `(datapoint_type, value)` against the index, and binds the row to the owner the index
names. The [datapoints](/architecture/datapoints/) ownership-resolution machinery reads this same
index.
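
The precedence and lookup can be sketched with plain dictionaries. This is a simplified model under stated assumptions: the function names and entity names are hypothetical, and the cascade scope of the identity config is not modeled.

```python
def resolved_identity(key, declared, observed):
    """Declared identity config wins; otherwise fall back to the observed datapoint."""
    if key in declared:
        return declared[key]
    return observed.get(key)


def build_index(entities):
    """Build the (datapoint_type, value) -> owner index from resolved identities."""
    index = {}
    for owner, identity in entities.items():
        for key in set(identity["declared"]) | set(identity["observed"]):
            value = resolved_identity(key, identity["declared"], identity["observed"])
            index[(key, value)] = owner
    return index


def bind_row(index, datapoint_type, value):
    """Return the owner for an emitted identity, or None when the row is an orphan."""
    return index.get((datapoint_type, value))


entities = {
    "system:boardroom-3": {
        "declared": {"zoom.room_id": "zr-100"},
        "observed": {"zoom.room_id": "zr-999"},  # loses to the declared value
    },
    "system:huddle-1": {"declared": {}, "observed": {"zoom.room_id": "zr-200"}},
}
index = build_index(entities)
```

Because the declared value wins, `zr-999` never enters the index, so a row carrying it comes back as an orphan rather than binding to the boardroom.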

### discovery_rule: orphans become candidates

A `discovery_rule` turns the **orphan / unmatched stream** into proposed entities. Its **input** is
every emitted identity that the value-to-owner index does **not** resolve: an unmatched
`(datapoint_type, value)` from a shared-API batch, plus the **out-of-placement labels** a node emits
for owners outside its placement visible_set (above). Pointing a `discovery_rule` at a source is the
onboarding win: it auto-creates the entities and sets their identity, so you never hand-map.

- **What it creates.** Candidate components or owners, each seeded with the identity that surfaced it
  (the `(datapoint_type, value)` becomes the new entity's identity arc), so the next batch from the
  same source resolves through the index instead of orphaning.
- **Idempotent on re-discovery.** Re-seeing an identity the rule already materialized does **not**
  create a duplicate: the rule keys on the `(datapoint_type, value)` it already bound, so a steady
  stream of the same orphan resolves to one candidate.
- **Scope and standing.** A `discovery_rule` carries a cascade **scope** and an `official` / private
  standing like the other rule families (`event_rule`, calc), so a ship-with `official` rule and an
  operator's private rule compose without colliding.
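
Idempotent re-discovery amounts to keying candidates on the identity that surfaced them. A minimal sketch, with a hypothetical `DiscoveryRule` class holding candidates in memory (scope, standing, and entity creation are out of frame):

```python
class DiscoveryRule:
    """Turns unmatched (datapoint_type, value) identities into candidates, once each."""

    def __init__(self):
        self.candidates = {}

    def observe(self, datapoint_type, value):
        """Record an orphan identity; return True only when a new candidate is created."""
        identity = (datapoint_type, value)
        if identity in self.candidates:
            return False  # already materialized: no duplicate
        # The identity that surfaced the orphan seeds the candidate's identity arc.
        self.candidates[identity] = {"identity": identity, "status": "candidate"}
        return True


rule = DiscoveryRule()
created = [
    rule.observe("zoom.room_id", "zr-300"),
    rule.observe("zoom.room_id", "zr-300"),  # same orphan seen again
    rule.observe("zoom.room_id", "zr-301"),
]
```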

## Storage

These tables hold the connection registry, the declared connections, and the node's units of work; the physical layout lives on [storage](/architecture/storage/).

| Table | Key columns | Notes |
|---|---|---|
| `interface_type` | name, **built**, direction (in/out), param_schema (jsonb) | the protocol-and-style registry (`ssh`, `http`, `snmp`, `mqtt`, `webhook`, ...); generates the template config schema |
| `interface` | name (per component), interface_type, **component** (nullable: set = pre-bound, null = shared/match-key), params (jsonb), **node** (server-assigned placement) | the connection, declared once ([nodes](/architecture/nodes/)) |
| `task` | **id = content hash**, interface, **mode (poll/listen)**, spec (jsonb), enabled | a node's unit of collection work; dedupes identical work. Parsing to datapoints is the **edge function**, not the task's job |
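
To show why a content-hash id dedupes identical work, here is a hedged sketch. The choice of SHA-256, the canonical JSON encoding, and the set of fields fed into the hash are all assumptions for illustration; the source only states that the task id is a content hash.

```python
import hashlib
import json


def task_id(interface, mode, spec):
    """Derive a task id from its content so identical work hashes to one id.

    Assumption: interface, mode, and spec are the hashed content, serialized
    as canonical JSON (sorted keys, no insignificant whitespace).
    """
    canonical = json.dumps(
        {"interface": interface, "mode": mode, "spec": spec},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = task_id("dsp-boardroom-3/snmp", "poll", {"interval_s": 30, "oids": ["1.3.6.1.2.1.1.3.0"]})
# Same work, keys authored in a different order.
second = task_id("dsp-boardroom-3/snmp", "poll", {"oids": ["1.3.6.1.2.1.1.3.0"], "interval_s": 30})
other = task_id("dsp-boardroom-3/snmp", "listen", {"interval_s": 30, "oids": ["1.3.6.1.2.1.1.3.0"]})
```

Sorting keys before hashing is what makes two differently ordered but equivalent specs collapse to one task row.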

---

# Core entities

URL: /architecture/core-entities/

The estate model: component, system, location, and node as the structural entities, the variable-depth trees, and the exclusive-arc owner.

Core entities are the things an operator actually manages: the component, system, location, and node. Giving each its own identity is what lets every datapoint, event, alarm, and config name exactly one of them as its owner. This page covers the structural entities, how they
nest, and how everything else names one of them as owner. The shapes these entities pin are [templates](/architecture/templates/); the data they own is
[datapoints](/architecture/datapoints/); the physical tables are [storage](/architecture/storage/).

## The estate: four structural entities

Three nouns describe what you operate, plus the edge process that collects for them.

- A **component** is a deployed device, app, or service: a display, a codec, a DSP, a control
  processor, a cloud UCC service. It owns datapoints, pins a `component_template_version`, and is
  classified by `component_type`.
- A **system** is a set of components that work together to do one job. A meeting room is a system.
  So is a classroom, a video wall, a broadcast chain. The word is deliberately universal: a system
  is the unit you actually care about, whatever shape it takes. It pins a `system_template_version`,
  is located at a location, and is classified by `system_type`.
- A **location** ties systems and components to a physical place (campus, building, floor, room).
  It is classified by `location_type` and, unlike component and system, has **no template**: for a
  location the type is the only shape-definer.
- A **node** is the edge process (`omniglass --mode node`) that pulls work, reaches components over
  interfaces, and ships results ([nodes](/architecture/nodes/)). It is structural because it is a
  first-class **owner**: a node owns its own self-health telemetry and can carry a node-owned alarm.

A component belongs to a system; a system sits in a location.

```d2
direction: down
classes: { node: { style.border-radius: 8 }; key: { style: { border-radius: 8; bold: true } } }
location: location { class: node }
system: system { class: key }
c1: component { class: node }
c2: component { class: node }
c3: component { class: node }
location -> system
system -> c1
system -> c2
system -> c3
```

Above the four sits the singleton **`global`** estate root: the top owner above every location where
estate-wide health and KPIs roll up, and the top of the [cascade](/architecture/cascade/). One per
deployment, no FK.

| Entity | What it is | Key columns |
|---|---|---|
| `component` | a deployed instance (`dsp-boardroom-3`) | name (unique), type, **parent_id** (self-ref tree), display_name; pins a `component_template_version`; classified by `component_type` |
| `system` | a composition of components / subsystems (the service tree) | name (unique), type, **parent_id** (self-ref tree), display_name; pins a `system_template_version`; carries `location_id`; classified by `system_type` |
| `location` | a place tree | name (unique), type, **parent_id** (self-ref tree), display_name; no template (the `location_type` is the only shape-definer) |
| `node` | the edge process | name (the identity); carries labels, last_heartbeat_at, and its bound credential ([identity and access](/architecture/identity-access/)) |

## The variable-depth trees

`component`, `system`, and `location` are each a **variable-depth tree**: a `parent_id` self-reference
that nests to arbitrary depth (campus -> building -> floor -> room; parent system -> subsystem; chassis
-> card). The trees are the structural backbone of the [cascade](/architecture/cascade/): resolution
runs over an entity's containment path and the **deepest node wins**, weight-free, pure depth.

```d2
direction: right
classes: { node: { style.border-radius: 8 }; key: { style: { border-radius: 8; bold: true } } }
component: component { class: node }
component_type: component_type { class: node }
location: location { class: node }
system: system { class: node }
component -> component_type: classified by (N:1)
component -> component: parent (tree)
system -> location: located at (N:1)
system -> system: parent (tree)
location -> location: parent (tree)
```

A non-leaf node in a tree (a chassis, a floor, a parent system) contributes its **instance**
bindings down the cascade, not its template: a chassis hands a card its chassis-wide credential while
the card keeps its own template ([cascade](/architecture/cascade/)).
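
Depth-wins resolution over a `parent_id` tree can be sketched as a walk from the entity toward the root that stops at the first binding found. This is a simplified model: the helper names are hypothetical, it covers a single tree only (the full cascade also spans system and location), and it assumes the tree has no cycles.

```python
def containment_path(entity, parent_of):
    """Return the entity and its ancestors, deepest first."""
    path = [entity]
    while parent_of.get(path[-1]) is not None:
        path.append(parent_of[path[-1]])
    return path


def resolve(setting, entity, parent_of, bindings):
    """Return the binding from the deepest node on the path that sets it, else None."""
    for node in containment_path(entity, parent_of):
        if setting in bindings.get(node, {}):
            return bindings[node][setting]
    return None


parent_of = {"card-2": "chassis-1", "chassis-1": None}
bindings = {
    "chassis-1": {"ssh_credential": "chassis-cred", "poll_interval_s": 60},
    "card-2": {"poll_interval_s": 10},
}
credential = resolve("ssh_credential", "card-2", parent_of, bindings)
interval = resolve("poll_interval_s", "card-2", parent_of, bindings)
```

The card inherits the chassis-wide credential because it sets none of its own, while its own poll interval overrides the chassis value.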

### Sub-components and sub-systems

The `parent_id` self-reference is **same-kind nesting**: a **system may have a parent system**
(sub-system nesting) and a **component a parent component** (sub-component nesting), both over the same
variable-depth `parent_id` trees. A chassis with line cards is a parent component over child
components; a building-wide AV system composed of room subsystems is a parent system over child
systems.

This nesting feeds two mechanisms. It feeds the **cascade**, where the **deeper node wins** down the
component and system trees (a sub-system's bindings override its parent's, a sub-component's override
the chassis'). And it feeds the **health rollup**: a **sub-system's health rolls into its parent
system**, and a **sub-component's into its parent component**, the same role-aware composition that
runs up the rest of the tree.

The **practical starting depth is 3 levels** (parent / child / grandchild) for both trees, a guidance
default, **not a hard cap**: the `parent_id` trees support arbitrary depth, and we revisit the
guidance if a use case needs more. The depth-resolution and rollup semantics themselves live in
[cascade](/architecture/cascade/) and [health](/architecture/health/).

## Ownership: the exclusive-arc

Everything observed, asserted, or set in Omniglass attaches to exactly one structural entity, through
the **exclusive-arc**. Every datapoint table, plus `event`, `alarm`, and `variable`, carries:

- an **`owner_kind`** enum, plus
- the **matching typed FK** (`component_id` / `system_id` / `location_id` / `node_id`, or none for the
  singleton `global`), plus
- a **CHECK** that exactly the column matching `owner_kind` is set (or all null for `global`).

This makes **system-, location-, node-, and global-level datapoints first-class** (e.g. `health` is a
`state_datapoint` owned by a system, estate-wide availability is owned by `global`, and a node's
self-health is owned by the node), the fix for a monitoring tool that can only put state on a single
host. The same arc owns the `event` and `alarm` rows a datapoint produces, so a system-owned datapoint
yields a system-owned alarm. The full pattern and the storage DDL are on [storage](/architecture/storage/).
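
The CHECK's logic, restated in application code as a sketch. The real constraint lives in the storage DDL; `arc_is_valid` is a hypothetical helper that models a row as a dictionary.

```python
# Which typed FK column each owner_kind requires (None for the singleton global).
FK_FOR_KIND = {
    "component": "component_id",
    "system": "system_id",
    "location": "location_id",
    "node": "node_id",
    "global": None,
}
FK_COLUMNS = ("component_id", "system_id", "location_id", "node_id")


def arc_is_valid(row):
    """True when exactly the FK matching owner_kind is set (all null for global)."""
    kind = row.get("owner_kind")
    if kind not in FK_FOR_KIND:
        return False
    expected = FK_FOR_KIND[kind]
    for column in FK_COLUMNS:
        is_set = row.get(column) is not None
        if is_set != (column == expected):
            return False
    return True


system_row = {"owner_kind": "system", "system_id": 42}
global_row = {"owner_kind": "global"}
double_owner = {"owner_kind": "system", "system_id": 42, "component_id": 7}
```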

## Structural multi-membership (a component in N systems)

A shared device legitimately belongs to more than one system, which would make the system layer a DAG.
Keep it a **tree with a primary-system pointer** (which system chain feeds the cascade); a truly shared
device **skips the system layer**. The genuine "config differs per system" case is answered by
**per-system effective views** on demand, not by merging chains into the resolution
([cascade](/architecture/cascade/)).

The binding itself is the **`system_member`** table: the **instance assignment** that ties a
`component` to a `system` under a specific role, satisfying a `system_template_member` from the frozen
`system_template_version` (key columns: `system_id`, `component_id`, `role`, plus the pin to the
`system_template_member` it satisfies).

A `system_template_member` declares, per role, a **requirement** (the canonical datapoints and commands a
member must provide) plus its `health_role`; any component whose template meets the requirement can fill
the role, validated on assignment. Detailed on [templates](/architecture/templates/).

## Operational mode: active, maintenance, disabled

Every entity has an **operational mode**, a cascade-resolved state that says how much the platform backs
off, set on the entity or inherited from a parent ([cascade](/architecture/cascade/)):

- **active** (default): collect, evaluate, act, enforce drift, count toward SLA. Normal.
- **maintenance**: **keep collecting**, but **suppress the consequences** of what is seen. Planned work:
  watch, but do not act.
- **disabled**: **stop collecting and interacting** with the device entirely (the Zabbix host-disable).
  The entity stays in the model, dormant, until re-enabled.

Maintenance and disabled are the **same suppression**, differing on one knob, **collection**: maintenance
suppresses consequences but keeps watching (so an operator can verify the work); disabled also goes dark
(no polling, no commands). Both suppress the same four consequences:

- **action dispatch is held**: an alarm may still open (you see it), but no `action_rule` pages or opens a
  ticket ([alarms and actions](/architecture/alarms-actions/)).
- **drift is observed, not enforced**: the set function never fires, so a tech mid-swap is not fought
  ([config](/architecture/variables/)). The device-swap case (a brief declared-identity authority,
  [datapoints](/architecture/datapoints/)) is just maintenance suppressing drift.
- **health rolls up no impact**: a member in maintenance or disabled does not sink its parent's
  [health](/architecture/health/); it surfaces as "down (maintenance)", the truth plus the mode, not a
  fifth health value.
- **SLA does not count it**: the window is excluded from availability and the SLO.

Maintenance is **time-bound**: a window (start / end, [time](/architecture/time/)) that **auto-exits**, or
open-ended until cleared, with a recurring window expressed as a schedule. Disabled is held until
re-enabled. Entering or exiting either is an **audited** operator action ([audit](/architecture/audit/)),
so "no page at 2am, it was in the patch window" is always explainable. Because the mode is
cascade-resolved, maintenance on a system covers its components.
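
The three modes reduce to one collection knob plus a shared set of suppressed consequences. A sketch of that mapping, with hypothetical flag names and cascade resolution of the mode itself left out:

```python
def consequences(mode):
    """Map an effective operational mode to what the platform does for the entity."""
    if mode not in ("active", "maintenance", "disabled"):
        raise ValueError(f"unknown operational mode: {mode!r}")
    suppressed = mode != "active"
    return {
        "collect": mode != "disabled",         # the one knob that differs
        "dispatch_actions": not suppressed,    # alarms may open, but nothing pages
        "enforce_drift": not suppressed,       # drift is observed, not enforced
        "impacts_parent_health": not suppressed,
        "counts_toward_sla": not suppressed,
    }


maintenance = consequences("maintenance")
disabled = consequences("disabled")
```

Maintenance and disabled differ only in the `collect` flag; the other four flags are identical between them.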

## Decommission and delete

"Delete" is **decommission** by default, a **soft delete**: the entity is tombstoned, drops out of
placement, worklists, and default views, but its **history is retained** (datapoints, events, alarms,
audit), attributed to the tombstone and aging out by [retention](/architecture/storage/). An observability
and control plane must not let "remove this projector" erase the incident record, and a decommissioned
entity can be **re-commissioned** if the device returns. **Purge** is the privileged, audited hard erase
for a genuine mistake (a test component); for a high-volume entity it runs as a background job over the
partitions, while the cheap path is decommission plus letting retention age the firehose out.

Decommissioning runs the **in-flight cleanup**, reusing mechanisms that already exist: collection stops
(the worklist drops it, sessions close), **open alarms auto-resolve** (reason "decommissioned"),
**pending commands and running flows cancel** (the durable command queue, and flows are already gated on
their alarm staying open), and config / tag / credential / group bindings drop. The entity leaves its
parent's health rollup.

The cascade is **not** "delete everything below", because containers do not own their members:

- a **component** (leaf) decommissions as above;
- a **system** delete **unbinds its members** (the `system_member` rows) but **does not delete the
  components**; they become standalone, re-homeable;
- a **location** delete is **refused by the API while occupied** (it returns what is placed there); the
  console offers re-homing before the delete (move the systems and components, then delete the empty
  location);
- a **node** delete **re-places its tasks** (to the server or another node, or surfaces the components as
  uncollected) and revokes the node credential ([identity and access](/architecture/identity-access/));
  `node.*` history is retained.

---

# Datapoints

URL: /architecture/datapoints/

The core data model: datapoints and their three kinds, provenance, the registries, key scope, divergence, fusion, and how a value reads back.

This is the heart of the authoritative data model: what a datapoint is, the two axes that define it, how we know a value (provenance), and how values reconcile, diverge, and read back. The physical layout (tables, partitioning, the lineage CHECK, tiering) lives in storage; the spine is [the architecture overview](/architecture/). Events, calc rules, and the response layer get their own pages: [events](/architecture/events/), [calculations](/architecture/calculations/), and [alarms and actions](/architecture/alarms-actions/).

Datapoints are the **data lane**: observed and calculated datapoints are NATS-native, published to a JetStream `datapoints` stream and consumed live by the rule engine, with a persistence consumer batch-writing them to the PG tables as an async sink. Datapoints are the firehose and never wait on Postgres. Events, alarms, and actions are the **record/state lane**: born in a PG transaction and fanned out by change-data-capture (CDC). The two lanes share one bus (JetStream); this page is home for the data lane, and points at [events](/architecture/events/) and [alarms and actions](/architecture/alarms-actions/) for the record lane.

## Datapoints: one family, three kinds

A **datapoint** is an observation: a value of one key, on one owning entity (component, system, location, node, or the `global` root), at one time. The row shape is the same for all three kinds: `(owner, key, instance, ts, value, provenance, source, lineage)`. They are three physical tables only because they index and retain differently, not because they are different concepts.

- **metric** (`metric_datapoint`): numeric (float), carries a **unit**. Continuous, aggregatable. Has a current value.
- **state** (`state_datapoint`): categorical, text, or a structured object. Discrete, dwell-measurable. Has a current value (`last()` is meaningful).
- **log** (`log_datapoint`): a component's own words. The value is the log line (text or jsonb), keyed by log type (`log.system`, `log.os`, `log.app.<name>`). A stream, not a current value, but still an observation with a value at a time, so it is a datapoint, not a separate primitive. In practice only components emit logs.

Treating log as a datapoint removes the usual special case: an alarm on a log line is just an event rule whose condition matches a `log_datapoint` value, no different in shape from a metric threshold.

**An event is not a datapoint.** A datapoint is an observation (a value we recorded); an **event** is *our semantic assertion that something happened*, in our vocabulary. Datapoints are what rules read; events are what event rules produce. See [events](/architecture/events/).

### Ownership: the exclusive-arc

A datapoint attaches to a **structural entity**, not only a component. The owner is the **exclusive-arc**: an `owner_kind` enum plus the matching typed FK (`component_id` / `system_id` / `location_id` / `node_id`, or none for the singleton **`global`** estate root) with a CHECK that exactly the column matching `owner_kind` is set. The same arc owns `event` and `alarm` rows. This makes **system-, location-, node-, and global-level datapoints first-class** (e.g. `health` is a `state_datapoint` owned by a system, and estate-wide availability is owned by `global`), the fix for Zabbix's inability to put state on a group of hosts. See Ownership on the spine for the full pattern and the storage DDL.

### The instance dimension: many values of one key on one owner

One owner can hold several distinct values of the *same* canonical key: three fan speeds on a switch, per-port counters, per-channel audio levels. The canonical registry deliberately holds **one** `datapoint_type` per measurement (`fan.speed`, not `fan.speed.intake`), so the discriminator lives outside the key, as an `instance text NOT NULL DEFAULT ''` column on all three datapoint tables. Series identity is therefore **`(owner, datapoint_type, instance, provenance)`**: each instance is its own series, while a singleton (`instance = ''`) is the default. Aggregation stays clean (group by `key`, ignore `instance`); per-instance trends stay distinct.

The instance rides the pipeline as a reserved **`instance` label** on the collected datapoint: the collection extract spec authors it as a `key[instance]` suffix (`fan.speed[intake]=<oid>`, `fan.speed[exhaust]=<oid2>`), the parser strips the bracket into the label so `registryAllows` / `kindFor` still match the bare canonical key, and the derive step reads `instance` into the column. Calc folds **every** instance of an input key into the reduce: a rule reading `fan.speed` from a component gets one candidate per fan, so `worst` / `average` / `count` / Expr aggregate across all of them (a singleton key yields one candidate). An input filter can select one instance (`instance == "intake"`). Recompute needs no instance granularity: a calc consumer reacting to a `fan.speed` datapoint on the stream recomputes over the current state of every instance for `(owner, key)`, so two fan changes in close succession converge on one correct recompute. Calc **outputs** stay aggregate (`instance = ''`): output owners default to the singleton, and per-instance outputs (one health per fan, a group-by) are a separate capability rather than a silent gap.
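
A sketch of the bracket-stripping step and of an aggregate that ignores `instance`. Both helpers are hypothetical and handle only the `key[instance]` part of an authored entry (the `=<oid>` half is not parsed); the source's own `registryAllows` / `kindFor` functions are not reproduced.

```python
import re
from collections import defaultdict

# A bare key, optionally followed by one [instance] suffix.
_AUTHORED = re.compile(r"^(?P<key>[^\[\]]+)(?:\[(?P<instance>[^\[\]]*)\])?$")


def split_instance(authored):
    """Split 'fan.speed[intake]' into ('fan.speed', 'intake'); a singleton gets ''."""
    match = _AUTHORED.match(authored)
    if match is None:
        raise ValueError(f"malformed key: {authored!r}")
    return match.group("key"), match.group("instance") or ""


def average_by_key(readings):
    """Fold every instance of a key into one average, ignoring the instance label."""
    grouped = defaultdict(list)
    for authored, value in readings:
        key, _instance = split_instance(authored)
        grouped[key].append(value)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


readings = [("fan.speed[intake]", 1200.0), ("fan.speed[exhaust]", 1800.0)]
averages = average_by_key(readings)
```

Because the bracket is stripped before the registry lookup, both fans resolve to the one canonical `fan.speed` key and aggregate together.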

### The has-a-value-now razor (datapoint vs event)

A datapoint records a value; an event records an occurrence.

- `"input is 1"` is a value, so it is a **datapoint** (state).
- `"call started"` is an occurrence: "what is call-started now?" is meaningless, so it is an **event**. See [events](/architecture/events/).

A raw occurrence we have not normalized (a syslog line, a raw webhook frame) lands as a **`log_datapoint`** (observed, value = the line). An event rule can then **promote** it into a normalized event. So the log table is also the holding pen for un-normalized occurrences until a rule recognizes them.

## Kind and provenance: the two axes

Every datapoint sits on two independent axes:

- **Kind** answers *what kind of thing is this?* It is fixed per **key**, decided once when the key is defined (`power.state` is always a state), so kind is a property of the **key** and is one of the three kinds above (metric, state, log).
- **Provenance** answers *how do we know this particular value?* It varies per **row**: the same `power.state` can be observed or intended at different moments. (A *declared* desired value is not a provenance; it lives in [config](/architecture/variables/), keyed to the signal.) Provenance is a property of the **row**, detailed below.

Kind is set by the key, provenance by the row, and the two never depend on each other.

## The datapoint_type registry

A datapoint and an event are different shapes (a datapoint has a value; an event is an occurrence), so each gets a registry named for what it holds. The event half is [`event_type`](/architecture/events/); the datapoint half is `datapoint_type`. We do **not** force them into one universal registry; that would be the false unification the rest of this model avoids.

**`datapoint_type`** describes every datapoint key: `(name, scope, template_id?, kind, value_type, unit, precision, fusion_policy, validation)`, with the **`scope`** (`template` / `org` / `official`) deciding where the name is unique (see [Key scope](#key-scope-template-org-official)). One registry across all three datapoint kinds (metric/state/log). The kind is decided by the key, not at runtime: the compiler bakes each key's kind into the edge unit, so a value routes to the right table with no runtime lookup, the same way at every scope. `fusion_policy` is the key's read-time **default** for reducing multiple perspectives, a hint rather than a mandate (see [Fusion](#fusion)). A key names a **measurement, never its owner** (`temperature`, not `room.temperature`), with snake_case segments in a dot hierarchy and the **canonical unit** in the `unit` field (`fan.speed` + `unit: rpm`, not `fan_rpm`); the ship-with official set lives in `internal/registry/defaults.yaml`. Adding or naming one: the `canonical-datapoint` skill.

The naming convention is consistent: a `_type` registry defines what a thing *is*, named for the thing (`datapoint_type`, `event_type`, like `component_type`, `interface_type`). `datapoint_type` spans the three datapoint kinds, and events get their own registry because an event is a different shape.

**Datapoint key naming is owner-agnostic.** A key names a *measurement*, never its owner: `temperature` is a Celsius reading whether a codec's thermals or a room's ambient sensor produced it, and the owner (component / system / location / node / global) plus a template's labels and the function that collected it give it context. So there is no `system.` / `device.` / `room.` prefix; keys group by measurement domain (`cpu.utilization`, `power.state`, `video.input`, `audio.level`, `network.icmp.rtt`). This is the normalization the product hinges on: one canonical path means one comparable signal across every vendor, which is what makes cross-fleet dashboards and AI useful. The official set is seeded from `internal/registry/defaults.yaml` following OpenTelemetry semantic conventions for the IT leaves (`cpu.utilization`, `memory.usage`; semconv's own `system.` prefix is dropped to avoid colliding with the `system` entity type) and the [OpenAV minimum-device-functionality guidelines](https://github.com/OpenAVCloud/specifications/blob/main/min-device-functionality/OAVC-AV-Device-Minimum-Functionality-Guidelines.md) for AV signals. A template declares its datapoints at **template** scope, or references an **org** or **official** key: the distro mints **official** keys, the deployment mints **org** keys, and a template mints its own **template**-scoped keys (the Zabbix model, where two templates can both declare an `input` with no collision).

### One identity, three shapes

`datapoint_type` is **one registry, not three**. A key's **kind** is intrinsic and fixed (one key is one kind, forever), so identity, [scope](#key-scope-template-org-official), and the promotion ladder live on one row, and the scoped name (`(template_id, name)` at template scope, `name` at org or official) is unique across all kinds: a name is a metric or a state, never both. What differs by kind is the **shape** the row carries:

- **metric**: `value_type` (float), a **unit** and optional **precision**, and a numeric range (`validation: {min,max}`). The full numeric shape.
- **state**: a **value domain**, the allowed set (`validation: {values:[...]}`); no unit, no precision.
- **log**: almost nothing. There is **no `log_type`** worth the name: a log's "type" is its **key namespace** (`log.system`, `log.os`, `log.app.<name>`, the hardware / service families), plus a level. You never give it a unit, a domain, or fusion.

These shapes ride **inline** on the one row today: kind-conditional columns (unit / precision, on metrics) and the `validation` jsonb that reads as a range for a metric and a domain for a state. The kind is decided once and compiled into the edge unit, and the registry is cached in-process, so reading a type's shape is a map lookup, never a per-datapoint join.

If the metric and state shapes grow, they may later move to **1:1 per-kind sidecar tables** (`metric_type`, `state_type`) keyed by the `datapoint_type` id, exactly as the IAM [`principal`](/architecture/identity-access/) splits into per-kind `human` / `service` / `node` tables (`log` keeps no sidecar). That is a cold-path normalization, one cheap PK join when reading the shape, and the registry is cached either way, not a hot-path change: the firehose never joins the type registry.

**Validation on insert, under a policy.** Every datapoint is typed by a `datapoint_type` row (the FK is **non-null**: a template-scoped key is a real `datapoint_type` row at `scope=template`, not an inline-only shape), so insert checks two things, plus an optional third. **(1) The key resolves** to a `datapoint_type` at a reachable scope: a template-scoped key self-resolves; a referenced org or official key must exist, checked at template compile time. An unresolved key is **reject-not-project**: kept out of the typed series, a `datapoint.validation_failed` event raised, the raw carried on a `collection.failed` event so nothing is lost (backfillable once the registry or template is corrected). **(2) The value conforms** to the type's kind and domain: the type's `validation` (`{min,max}` for a metric, `{values:[...]}` for a state). Optionally **(3) the owner kind** must be one the type allows. All three are governed by the `validation_policy` config mode: **bypass** (skip), **audit** (the default: write the value but emit the event), or **enforce** (hold the value back from the typed series, emit the event). The point is visibility: an out-of-range or unmapped value means a template author declared a type the device disagrees with, so the violation surfaces as an owner-attributed event operators and admins see. The mode resolves per-entity down the cascade (global, location, system, component), so a noisy device class can run in `audit` while the rest of the fleet enforces.
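The two required checks and the three policy modes can be sketched as one decision function. This is an illustration under stated assumptions, not the shipped code path: `DatapointType`, `conforms`, and `admit` are hypothetical names, the optional owner-kind check (3) is omitted, and the sketch assumes an unresolved key is always kept out of the typed series (because the type FK is non-null) regardless of mode.

```python
from dataclasses import dataclass, field
from enum import Enum


class Policy(Enum):
    BYPASS = "bypass"    # skip the value check
    AUDIT = "audit"      # write the value, emit the event (the default)
    ENFORCE = "enforce"  # hold the value back, emit the event


@dataclass
class DatapointType:
    name: str
    kind: str  # "metric" | "state" | "log"
    validation: dict = field(default_factory=dict)


def conforms(dtype: DatapointType, value) -> bool:
    """Check (2): the value fits the type's kind and domain."""
    if dtype.kind == "metric":
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        lo = dtype.validation.get("min", float("-inf"))
        hi = dtype.validation.get("max", float("inf"))
        return lo <= value <= hi
    if dtype.kind == "state":
        allowed = dtype.validation.get("values")
        return allowed is None or value in allowed
    return True  # a log carries no unit, domain, or range


def admit(registry: dict, key: str, value, policy: Policy = Policy.AUDIT):
    """Return (write_to_typed_series, events) for one incoming datapoint."""
    dtype = registry.get(key)
    if dtype is None:
        # Check (1) failed: reject-not-project; the raw rides collection.failed.
        return False, ["datapoint.validation_failed", "collection.failed"]
    if policy is Policy.BYPASS or conforms(dtype, value):
        return True, []
    # Audit writes the value anyway; enforce holds it back. Both emit the event.
    return policy is Policy.AUDIT, ["datapoint.validation_failed"]
```

With a `temperature` type declared as `{min: -40, max: 125}`, a reading of `500` is written and flagged under `audit`, and held back and flagged under `enforce`.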

## Units: one canonical unit per key

**Unit is a metric concept.** Only a **metric** carries a unit (and the display `precision` below): a number needs a unit to mean anything, and even a dimensionless metric has one (`ratio` or `count`). A **state** (categorical) and a **log** (text) have **neither**. For metrics, **storage is canonical-at-rest**: every metric `datapoint_type` declares **one canonical unit** in its `unit` field (a registered unit, see below), and stored values are **always** in that unit. The firehose is single-unit, so every threshold, calc, and fusion compares like with like. We never store mixed units, and we never put the unit in the `instance` dimension: `instance` discriminates co-existing values of one key on one owner (three fan speeds), not the unit those values are expressed in. A genuinely different measurement is a different `datapoint_type`, not a unit variant: units only convert **within one family**.

**The `unit` registry.** Units live in a `unit` registry grouped by **family** (dimension): temperature, data-size, bitrate, ratio, and so on. Each family declares one **canonical unit** plus zero or more **alternate units**, and each alternate carries a **`to_canonical`** and a **`from_canonical`** transform: **affine** (a factor plus offset) for the common case, or an **Expr** for the rare nonlinear one (dB). The registry is **official / org scoped** like the other registries. Example: the temperature family is canonical `celsius`; `fahrenheit` carries `to_canonical: (v - 32) * 5/9` and `from_canonical: v * 9/5 + 32`.

**Dimensionless is still a unit.** A **ratio** is not "a number with no unit", it is the `ratio`
family: canonical `ratio` (`0..1`) with `percent` as an alternate (`ratio * 100`), so `cpu.utilization`
is **stored** as `0.9` and authored or shown as `90%` through the same convert path, never stored as a
percentage. A bare **count** (people, error tallies) is a cardinal `count`, distinct from a ratio. So the
`unit` field is exactly what separates a ratio from a quantity carrying a physical unit (`celsius`,
`rpm`, `bps`): both are `metric` kind, and the **unit (its family)** is the discriminator, dimensionless
or dimensioned. (`kind` answers *metric / state / log*; `unit` answers *which dimension, if any*.)

Conversion happens only at the two edges and in expressions; the rows in between stay canonical.

**Normalize-in at the edge.** When a device reports a non-canonical unit, the component template's **alignment value-transform** (the existing "align to a canonical key, plus an optional value transform") converts native to canonical **before** the datapoint is emitted. A Fahrenheit display's template emits `celsius`. The device's native unit is a [collection](/architecture/collection/)-time fact carried by the [template](/architecture/templates/), never a storage fact.

**Convert-out on read.** Showing a non-canonical unit to an operator is a **presentation** concern: the [UI](/architecture/ui/) and [views](/architecture/views/) convert canonical to the operator's display unit (a per-user / per-locale preference), looked up from the `unit` registry, exactly as a severity level's label and color resolve client-side. Storage is untouched: one operator reads Celsius, another reads Fahrenheit, off the same rows. Because the `datapoint_type` declares the canonical unit, this conversion is automatic.

**Display precision is part of the type.** Alongside the unit (and, like the unit, only on a metric), a
`datapoint_type` carries an optional **`precision`** (significant digits to render), a presentation default the same way the canonical unit is:
a temperature shows `21.5`, a utilization `90%`, a link `1.2 Gbps`. It governs **rendering only**. The
stored `value_type` (float8) keeps full precision, `precision` never truncates a stored value, and the
[UI](/architecture/ui/) or a locale can override the default. (Dropping noise at *ingest* is a separate
collection-time rounding, not the type's display precision.)

**Expressions: `convert(value, "<unit>")`.** A stdlib function in Omniglass [expressions](/architecture/expressions/). The **source unit is inferred** from the bound datapoint's canonical unit; the **target** is a registered unit that must be in the **same family** (a compile error otherwise); the conversion comes from the `unit` registry. So `convert(value, "fahrenheit") > 100` lets an operator author a threshold in Fahrenheit while storage stays Celsius. It is data-driven and general, chosen over a per-unit method like `value.toFahrenheit()` that would need a method per unit, and is available wherever expressions run: event rule and alarm criteria, calc leaves, and list filters.
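A sketch of how a registry-driven `convert` could behave, assuming affine transforms only (the Expr case for dB is left out). The `Unit` dataclass and the `UNITS` table are hypothetical, and the canonical source unit is passed explicitly here, whereas the text has it inferred from the bound datapoint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    name: str
    family: str
    factor: float = 1.0  # to_canonical(v) = v * factor + offset
    offset: float = 0.0


# Hypothetical registry contents; a canonical unit has factor 1 and offset 0.
UNITS = {
    "celsius": Unit("celsius", "temperature"),
    # (v - 32) * 5/9 rewritten as v * 5/9 - 160/9
    "fahrenheit": Unit("fahrenheit", "temperature", 5 / 9, -160 / 9),
    "ratio": Unit("ratio", "ratio"),
    "percent": Unit("percent", "ratio", 0.01),
}


def convert(value: float, canonical: str, target: str) -> float:
    """Convert a canonical-at-rest value to a same-family target unit."""
    src, dst = UNITS[canonical], UNITS[target]
    if src.family != dst.family:
        # The text makes this a compile error; here it is a plain exception.
        raise ValueError(f"{target} is not in the {src.family} family")
    # from_canonical is the inverse of the affine to_canonical transform.
    return (value - dst.offset) / dst.factor
```

A threshold such as `convert(value, "fahrenheit") > 100` then compares in Fahrenheit while the stored row stays in Celsius, and a stored ratio of `0.9` reads back as `90` percent through the same path.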

## Key scope: template, org, official

A datapoint key carries a **`scope`**, the axis that decides where its name is unique and where its trust comes from. Three layers:

| scope | identity (uniqueness) | trust | who defines it |
|---|---|---|---|
| **template** | `(template_id, name)` | local | the template author |
| **org** | `name`, unique within the deployment | local custom canonical | the org / operator |
| **official** | `name`, globally | shipped with the distro | the distro |

`official` is just `scope == official` (the prior pass's `official` boolean folds into this enum as its top value). **Conflicts are impossible** at template scope because a template-scoped key is identified by `(template_id, name)`: two templates can both declare an `input` datapoint with no collision (the Zabbix model). Trust still comes from **distribution**, not a label: an official key is trusted because it is **in the release**, the same `video.input` across every vendor, not spoofable. An org key is a deployment's own custom canonical, authoritative within that one database (per-database isolation makes it unambiguous, one database is one tenant). A template key is local to the template that minted it.

**Every datapoint is typed by a registry row, just at some scope.** The datapoint -> `datapoint_type` FK is **non-null**: template-scoped keys ARE `datapoint_type` rows (`scope=template`, with a `template_id`), not inline-only shapes, so there is no nullable type FK and no dual identity. Kind, unit, and validation live on the type row at **every** scope, so the edge compiler bakes the kind and routes to the right table (metric / state / log) the same way for all three layers. Series identity is `(owner, datapoint_type, instance, provenance)`.
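The identity rule in the table above can be written as a small function. A sketch only: `TypeIdentity` and `type_identity` are hypothetical names, and the real constraint would live in the database rather than in application code.

```python
from typing import NamedTuple, Optional


class TypeIdentity(NamedTuple):
    scope: str
    template_id: Optional[int]
    name: str


def type_identity(scope: str, name: str, template_id: Optional[int] = None) -> TypeIdentity:
    """Build the uniqueness key for a datapoint_type row at a given scope."""
    if scope == "template":
        if template_id is None:
            raise ValueError("a template-scoped key needs a template_id")
        return TypeIdentity(scope, template_id, name)
    if scope in ("org", "official"):
        # Unique by name alone within the deployment (org) or the release (official).
        return TypeIdentity(scope, None, name)
    raise ValueError(f"unknown scope: {scope}")
```

Two templates declaring `input` produce two distinct identities, while two attempts to mint an official `video.input` collapse to the same one, which is the collision the uniqueness rule is meant to catch.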

**The promotion ladder is template -> org -> official.** Each step is a cheap **re-scope or re-point**, not a migration: lift a template's `input` to an org-canonical `video.input` and re-point the template's datapoint at it; later it gets blessed **official** by being shipped in the distribution (the one real way trust is earned, not a flag an operator sets). Datapoints already collected keep resolving. This is the "shift to normalized over time" path.

**Normalization is therefore optional but encouraged.** A template ships using template-scoped keys with zero registry friction; aligning a key to org or official is what buys cross-fleet comparability, dashboards, and AI. The shipped official set covers the common AV/IT signals, so most templates align by just **referencing** one. Sharing happens at the **template level** (a repo or marketplace of templates): an imported template is **linked** (tracks upstream) or **copied** (forked, diverged), and the keys it introduces land at **template** or **org** scope, not as a federated signal trust tier.

**Governance is curation, not runtime enforcement.** Omniglass is a Postgres database an operator runs, so nothing stops a self-hoster inserting an org-scoped row or editing an official one in their own database. We vouch only for what we **ship**; you vet what you import, and you own the risk. Commands sit at **template** scope the same way (functions live on the template); a canonical command type follows the same promotion ladder (see [templates](/architecture/templates/)).

## Provenance: how we know a value

Provenance is the second axis, stamped per datapoint row. The same key, with the same value, can be known three ways. Each provenance points at the immutable ground-truth record that produced it (its **lineage**), and the lineage column populated is mutually exclusive per provenance, enforced by a CHECK constraint.

| Provenance | How we know it | Lineage points at |
|---|---|---|
| **observed** | measured from a component | on-row: `source_rule` (+ version), the edge function that parsed it |
| **calculated** | derived from other datapoints | on-row: `source_rule` (+ version), the calc_rule |
| **intended** | the declared effect of a command we issued, pending reconciliation | `event_id` (the command event) |

A value of any provenance is still a metric/state/log (the kind is fixed by the key); provenance only records *how it got there*. All three land in the same datapoint tables, side by side for the same key, which is what makes divergence detection free. Declared intent is the fourth value an operator can assert, but it lives in [config](/architecture/variables/), not in the datapoint tables, and can be compared against an observed datapoint for drift.

A separate **`source`** column records *which sensor or path* produced an observed value (`codec.cec` vs `display.lan` vs `control.system`). Source is distinct from provenance: provenance is *how we know* (observed), source is *which sensor told us*. Three sensors reporting one display's power are three observed rows on one key, differing only in source. This is what makes multi-source corroboration and [fusion](#fusion) possible.

**Trace columns, orthogonal to lineage.** Each datapoint table also carries a nullable **`correlation_id`** and an optional **`caused_by_event_id`**. These are **trace** columns, not lineage: they record *what causal thread this row belongs to*, not *what immutable record produced this value*, so they sit outside the mutually-exclusive lineage CHECK and never count toward it (a row may carry both its on-row `source_rule` lineage and a `correlation_id` with no conflict). An action's command **propagates the originating `correlation_id`** onto the adaptive-poll's observed datapoint, so the `event_rule` that fires off that observed value **inherits** the same id, and the cycle-guard walk crosses the command -> device -> observed-datapoint round trip on a real carried id rather than an assumed lineage. See [alarms and actions](/architecture/alarms-actions/) for the cycle-safety mechanic.
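The mutually exclusive lineage rule, and the way trace columns sit outside it, can be expressed as a predicate. This is a sketch in Python rather than the actual CHECK constraint; `lineage_ok` is a hypothetical name and rows are modeled as plain dicts.

```python
def lineage_ok(row: dict) -> bool:
    """True when exactly the lineage column matching the provenance is set."""
    has_rule = row.get("source_rule") is not None
    has_event = row.get("event_id") is not None
    if row["provenance"] in ("observed", "calculated"):
        return has_rule and not has_event
    if row["provenance"] == "intended":
        return has_event and not has_rule
    return False
    # correlation_id and caused_by_event_id are never inspected:
    # trace columns do not count toward the lineage rule.
```

An observed row may therefore carry both `source_rule` and a `correlation_id`, while an intended row that also sets `source_rule` is rejected.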

### observed: from a component, via an edge parse

"Measured from a component," not "from a device", every device is a component, but not every component is a device. The node parses the payload at the edge and **publishes the observed datapoint to the JetStream raw ingress subject** (admission confines its owner before it reaches the trusted stream); it does not write to Postgres. The observed datapoint carries its own lineage on the row: `source_rule` + `source_rule_version` (which function and template version made it, the backtest hinge). The verbatim payload it parsed is **not** kept (no telemetry table); raw surfaces only on a `collection.failed` event or a dev raw-mode tap, or is retained for a bounded window under the opt-in `raw_sample` policy ([collection](/architecture/collection/)), which is still not a telemetry table. There is no separate execution table, a derived datapoint is itself the evidence of the function's run, exactly as an event/alarm/action row self-describes.

### calculated: derived by a calc rule

A calculated value (a 5-minute average, a system rollup, a fused consensus) is parallel to observed: both are machine-derived. The difference is the input: an edge function parses a device payload, a calc rule reads **other datapoints**. A calc consumer reads datapoints **off the trusted JetStream `datapoints` stream** and **publishes its derived datapoint back onto it directly** (a trusted server producer, no admission pass), so calculated values re-enter the data lane exactly like observed ones (and are themselves available to downstream calc and to the rule engine). Both carry `source_rule` + `source_rule_version` on the row, so they are distinguished by the **`provenance` column** (an edge function versus a calc_rule), not by a pointer. The exact inputs a calc read are reconstructable from the rule version (that is what backtest does); if an immutable input snapshot is ever needed it is a nullable `inputs jsonb` column, not a table. The rule itself lives on [calculations](/architecture/calculations/).

### intended: the declared effect of a command

When the action layer issues a command, it records the command as an **event** and writes the **intended** state it expects, in one step. The command and its event are born in the record/state lane (PG-first, CDC-out); the intended datapoint **re-enters the data lane** on the `datapoints` stream **under the command's target owner** (the target was scope-checked at dispatch; the action layer is trusted server-internal, so it publishes to the trusted stream directly, no admission pass), so the command's expected effect rides the same stream as observed and calculated values and reconciles against the observed value that the device round trip produces. The intended datapoint's lineage is that command event. The name is deliberate: **intended vs observed** is the central razor, intent-in-progress versus measured reality.

```text
1. command issued:  "power on display-5"  -> recorded as an event
2. intended write:  display-5 power = on, provenance=intended, lineage=<the command event>
                    a bet: intended, not measured
3. adaptive poll:   the command triggers a poll sooner than the normal interval
4. observed arrives:
     observed = on  -> reconciled (the bet paid off)
     observed = off -> divergence (the command did not land)
```

There is no separate "mapping" primitive. Which state a command intends lives on the command definition. **Only commands set intended state** (intended's lineage is always a command event). An external event that implies a state ("meeting started, so the room is occupied") is not intended state: it is observed reality, so it lands as an **observed** state through the ordinary edge-parse path, not the command lane.
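The four-step flow above reduces to a three-way outcome. A minimal sketch, assuming that only an observation at or after the intended write counts toward reconciliation (the text does not state this ordering rule); `Row` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Row:
    ts: float
    provenance: str  # "intended" | "observed"
    value: str


def reconcile(intended: Row, observed: Optional[Row]) -> str:
    """Classify an intended write against the observation that follows it."""
    # Assumption: an observation older than the command cannot confirm it.
    if observed is None or observed.ts < intended.ts:
        return "pending"
    return "reconciled" if observed.value == intended.value else "divergence"
```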

Not every log-to-state path goes through a command. The split is measured fact vs pending intent:

| The source says | Means | Path |
|---|---|---|
| "eth0 **is** down" | a component reporting measured reality | edge parse, then **observed** state, directly |
| we sent "**power on**" | intent in progress, not yet confirmed | command, event, then **intended** state |

### declared values are config

MAC, IP, serial, locked input: anything an operator *sets* is declared intent, and declared intent is **not** a datapoint provenance. It lives in [config](/architecture/variables/): keyed to the same canonical signal as its observed side, resolved through the scope cascade, never in the datapoint tables. There is no separate property store: config is the declared side of a signal plus the cascade. Ownership resolution reads the resolved identity (a declared identity config value, or the observed identity datapoint that shares its key) to bind observed data to components, through the [identity-binding index](/architecture/collection/) (a `(datapoint_type, value) -> owner` arc) that collection maintains.

### Precedence: spec versus status lives in config

When declared intent and observed reality disagree, which one wins is a **per-config-item `reconcile` policy** ([config](/architecture/variables/#drift-and-reconcile)), not a per-key datapoint attribute:

- **observed wins** is `reconcile: observe` (or `warn`): the declaration was a hint or stale guess, reality is truth. A device reporting a different MAC than the declared one is a divergence to surface (silently under `observe`, as an alarm under `warn`); adopting the observed value as declared is a separate one-shot import action.
- **declared wins** is `reconcile: enforce`: the declaration is the spec, reality should conform. Observed input HDMI2 against a declared HDMI1 means the world is wrong, converge via the set function (self-healing, the Kubernetes spec-and-status pattern), and alarm if the set fails.

Among datapoint provenances there is no precedence contest: intended is a pending bet that observed confirms or refutes (reconciliation, see [intended](#intended-the-declared-effect-of-a-command)), and observed supersedes it on arrival. The spec-versus-status decision is config's reconcile policy, not a per-key datapoint attribute. Device-swap (where a declared MAC is briefly authoritative before the device reports it) is handled by a component's [maintenance mode](/architecture/core-entities/#operational-mode-active-maintenance-disabled), which suppresses drift.
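The reconcile policies can be summarized as a dispatch on the policy name. A sketch under assumptions: `on_drift` and the returned action labels are hypothetical, chosen only to name the outcomes described above, and maintenance mode is modeled as a simple flag.

```python
from typing import List


def on_drift(policy: str, declared, observed, maintenance: bool = False) -> List[str]:
    """Return the (illustrative) actions a declared/observed mismatch triggers."""
    if declared == observed or maintenance:
        return []  # no drift, or drift suppressed by maintenance mode
    if policy == "observe":
        return ["surface_divergence"]  # observed wins, surfaced silently
    if policy == "warn":
        return ["surface_divergence", "raise_alarm"]  # observed wins, as an alarm
    if policy == "enforce":
        return ["run_set_function"]  # declared wins; alarm only if the set fails
    raise ValueError(f"unknown reconcile policy: {policy}")
```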

## Ground truth versus derived

Distinguished by a property of the table, not a naming suffix.

- **Raw payload: not stored.** Datapoints are emitted at the edge, so the verbatim wire payload is **not persisted** (no `telemetry` table). Raw surfaces only on a **`collection.failed`** event when a parse or validation rejects (diagnosis, and the one backfill-after-fix case) and via a **dev raw-mode** tap; the datapoint is authoritative, its lineage is `source_rule` + version. The opt-in `raw_sample` policy ([collection](/architecture/collection/)) can retain raw for a bounded, sampled, short-lived window, off by default, still not a telemetry table.
- **Live on NATS, durable in PG.** The live datapoint is the message on the JetStream `datapoints` stream; the durable copy in the `metric_datapoint` / `state_datapoint` / `log_datapoint` tables is written by a **persistence consumer** that batch-writes off the stream as an **async sink**, idempotent on series identity. The sink never gates the rule engine: rules read datapoints from NATS, and a slow or paused persistence consumer holds up only the durable record, not the live signal. Datapoints are the firehose, so they reach Postgres through the sink and **do not go through CDC**, unlike the record/state lane (events, alarms, actions), which is born in a PG transaction and fanned out by CDC.
- **Ground truth, logs** (immutable, append-only, the actor's own record): **`log_datapoint`** (a component's words, a datapoint kind), **`audit_log`** (an operator), **`session_log`** (connection lifecycle, node-reported), **`internal_log`** (platform self-narration), and the **`collection_log`** / **`node_log`** companions. Each named for what it is. There is no separate rule-execution table: a derived row *is* the evidence of its rule's run, carrying `source_rule` + `source_rule_version` on the row itself.
- **Derived** (produced by rules, reconstructable in principle from ground truth): **`metric_datapoint`**, **`state_datapoint`**, **event**, **alarm**, **action**.

A datapoint's lineage is `source_rule` + version (the function that made it). The companions extend it: `collection_log` is the cheap per-run execution record (every run, including failures), `node_log` the node's operational narration. A failed parse rides a `collection.failed` event carrying the raw; there is no telemetry table in the chain. See the architecture overview on the spine.

## The DAG invariant

The pipeline must stay acyclic.

> A rule may **read** observed and calculated values as truth. It may **compare** an intended value, or config's declared value, against observed (drift). It may **not** treat an intended value *as truth* to infer a new fact.

This is what makes drift safe: a drift rule reads the *pair* (intended, observed) and emits when they disagree; the intended value is tested, not trusted. The one forward edge, command-to-intended-state, is terminal (nothing reads back from an intended value to produce more state). Because event rules read only observed and calculated values as truth, the static rule graph stays acyclic by construction, with no runtime cycle guard needed for it.

The command -> device -> observed-datapoint round trip is the one path where the acyclic structure cannot be read off the static graph (a command can provoke an observed value that fires the rule that issued the command). The propagated `correlation_id` closes that gap: because the command stamps its id onto the observed datapoint, the run that fires off that observed value carries the same thread, and the cycle-guard walk follows a **real carried id** across the round trip rather than inferring lineage. The DAG invariant is therefore enforced, not merely assumed, on the only edge that needs runtime help.
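One way a guard could use the carried id is sketched below. The actual mechanic is defined on the alarms and actions page and may differ: `CycleGuard`, `fire`, and the "one firing per rule per causal thread" rule are assumptions made for illustration.

```python
import uuid


class CycleGuard:
    """Refuse to let one rule fire twice on the same causal thread."""

    def __init__(self):
        self._fired = set()  # (correlation_id, rule_id) pairs already seen

    def fire(self, rule_id, correlation_id=None):
        """Return the id to stamp on the command, or None if this would loop."""
        if correlation_id is None:
            # No carried thread: this firing starts a new one.
            correlation_id = uuid.uuid4().hex
        if (correlation_id, rule_id) in self._fired:
            return None  # the round trip came back to the rule that started it
        self._fired.add((correlation_id, rule_id))
        return correlation_id
```

The first firing mints an id and stamps it on the command; when the adaptive poll's observed datapoint returns with that id and triggers the same rule, the guard sees the pair again and stops the loop.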

## disagree and divergence

Drift is a condition operator, **`disagree(A, B)`**, usable inside event rule conditions, comparing two provenances (or two sources) of one key:

- `disagree(intended, observed)`: the command did not land (reconciliation)
- `disagree(declared, observed)`: the world drifted from intent (config drift, device swap); the declared side is read from [config](/architecture/variables/)
- `disagree(observed, observed)` across `source`: sensors conflict (a failing sensor)

> Any two provenances of the same key that disagree = an anomaly. One detector.

Command reconciliation, configuration drift, sensor conflict, and hardware-swap detection are not separate features; they are one comparison applied to a key that can hold more than one provenance.
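The single detector can be sketched as one comparison function. Assumptions in this sketch: a missing side counts as unknown rather than as a disagreement, and numeric values compare under an optional `tolerance` parameter, which is not part of the `disagree(A, B)` signature given above.

```python
def disagree(a, b, tolerance: float = 0.0) -> bool:
    """True when two provenances (or sources) of one key hold different values."""
    if a is None or b is None:
        return False  # assumption: an absent side is unknown, not a conflict

    def is_number(x) -> bool:
        return isinstance(x, (int, float)) and not isinstance(x, bool)

    if is_number(a) and is_number(b):
        return abs(a - b) > tolerance
    return a != b
```

The same call covers `disagree(intended, observed)`, `disagree(declared, observed)`, and `disagree(observed, observed)` across sources; only where the two arguments are read from changes.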

## Fusion

When multiple sources report one signal, they land as **perspectives**: source-tagged observed rows differing only by `source`, **all preserved** (seeing multiple perspectives on one value is itself instructive). A reduce-on-read **policy** produces the effective value. Fusion splits by whether the inputs describe the same key:

- **same-key, many sources** keeps every perspective and reduces on read. The key's **`fusion_policy`** on `datapoint_type` is a **default/hint, not a mandate**: the right reduce often is not knowable a priori at the `datapoint_type` level (for `display-5.power` from codec CEC, display LAN, and the control system, you cannot know how to fuse the value before considering the actual sources). So a policy may **default from the type**, but can be source-weighted, per-instance, or **left open**: keep all perspectives and decide at read time, by an operator, or by AI. When a policy reduces (`mode`: priority / weighted / majority / worst / average / latest, plus tie-break and optional per-source weights), the reduced value is what `current_value` and event_rule evaluation read; the source-tagged perspectives stay, so "which source is wrong" remains queryable. `event_rule` evaluation reduces over the **latest-per-source perspective set** for the owner and key, held from the live `datapoints` stream (a bounded, in-memory set), never a firehose scan of the durable tables. This improves *confidence in a reading*. A `source` registry carries default trust weights, so the simplest case needs no config. Materialize a fused series only if a profile earns it.
- **cross-key / system-level** is a **`calc_rule`** (the only fusion that authors a rule): `room.in_use` derived from display power + codec call-state + occupancy. This *derives a higher-order fact*, a new key, not a same-key consensus. See [calculations](/architecture/calculations/).

Conflict detection (`disagree(observed, observed)` across sources) is the complementary operation: even when an effective value is usable, a perspective disagreeing beyond tolerance is itself a signal.
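A reduce-on-read over source-tagged perspectives might look like the sketch below. It implements four of the named modes (`latest`, `priority`, `majority`, `average`) and leaves out `weighted` and `worst`; `Perspective` and `fuse` are hypothetical names, "priority" is read as "highest trust weight wins", and the majority tie-break (latest timestamp) is an assumption.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Perspective:
    source: str
    value: object
    ts: float


def fuse(perspectives: List[Perspective], mode: str,
         weights: Optional[Dict[str, float]] = None):
    """Reduce the latest-per-source perspectives of one key to one value."""
    if not perspectives:
        return None
    weights = weights or {}
    if mode == "latest":
        return max(perspectives, key=lambda p: p.ts).value
    if mode == "priority":
        # Highest trust weight wins; unknown sources default to 0.
        return max(perspectives, key=lambda p: weights.get(p.source, 0.0)).value
    if mode == "majority":
        counts = Counter(p.value for p in perspectives)
        top = max(counts.values())
        tied = [p for p in perspectives if counts[p.value] == top]
        return max(tied, key=lambda p: p.ts).value  # tie-break: most recent
    if mode == "average":
        return sum(p.value for p in perspectives) / len(perspectives)
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

The input list is never mutated or discarded, matching the point that every perspective stays queryable after the reduce.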

## Reads: current value is a view

Current value (latest per owner / key / **instance** / **provenance**, reduced across the source perspectives per the effective `fusion_policy`) is a **view** over the persisted tables, correct and zero-maintenance. It is keyed per-provenance because "current observed power" and "current intended power" are different values for the same key, and the divergence model depends on seeing both. A materialized `current_value` table is a measured optimization, earned when a read profile proves the view too slow: the driver is **operator and fleet-dashboard reads**, not the rule engine, since the rule engine evaluates against datapoints live off the JetStream `datapoints` stream and never reads the view. The same view-by-default discipline as storage applies. Ownership resolution reads resolved identity config (the declared value, else the observed [identity datapoint](/architecture/collection/)) by targeted indexed lookup, not a full scan, so it does not by itself justify the materialized table.
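The keying of the view is the important part, and a sketch makes it concrete. This version assumes one source per series, so the fusion reduce across perspectives is skipped; `current_values` is a hypothetical name and rows are plain dicts.

```python
def current_values(rows):
    """Latest value per (owner, key, instance, provenance)."""
    latest = {}
    for row in rows:
        series = (row["owner"], row["key"], row["instance"], row["provenance"])
        if series not in latest or row["ts"] > latest[series]["ts"]:
            latest[series] = row
    return {series: row["value"] for series, row in latest.items()}
```

Because provenance is part of the key, "current observed power" and "current intended power" come back as two entries, which is what lets a drift rule compare them.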

## The datapoint tables

The three kinds are three physical tables only because they index and retain differently; the [physical layout, partitioning, and the lineage CHECK](/architecture/storage/) live on storage.

| Table | Key columns | Notes |
|---|---|---|
| `metric_datapoint` | id, ts, **owner_kind, component_id/system_id/location_id/node_id**, key, **instance**, **value float8**, provenance, source, **source_rule, source_rule_version, event_id**, **correlation_id?, caused_by_event_id?** | the firehose; BRIN on ts; numeric aggregation. `instance` (`''` default) discriminates many values of one canonical key on one owner. `correlation_id` / `caused_by_event_id` are nullable trace cols, outside the lineage CHECK |
| `state_datapoint` | id, ts, owner arc, key, instance, **value text/jsonb**, provenance, source, + same lineage and trace cols | sparse, transition-only; time-in-state and dwell. [Config](/architecture/variables/) is keyed to one as its observed side |
| `log_datapoint` | id, ts, owner arc, key, instance, **value text/jsonb (the line)**, level, provenance, source, + same lineage and trace cols | GIN / tsvector full-text; also the holding pen for un-normalized occurrences |

Common datapoint columns (all three kind-tables): `id`, `ts`, the **owner arc** (`owner_kind` plus `component_id` / `system_id` / `location_id` / `node_id`), `key, instance, provenance, source`, the on-row lineage `source_rule, source_rule_version, event_id`, and the nullable trace columns `correlation_id, caused_by_event_id` (outside the lineage CHECK); only the value column differs (float8 / text-jsonb / line), and `log_datapoint` adds `level`. A `datapoint` view UNIONs the common columns for "all datapoints for owner X".
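The UNION view can be tried out with an in-memory database. This sketch uses SQLite as a stand-in for PostgreSQL, reduces the owner arc to a single `component_id`, drops the lineage and trace columns, and adds a literal `kind` column to the view; all of those are simplifications for illustration, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metric_datapoint (ts INTEGER, component_id INTEGER, key TEXT,
    instance TEXT DEFAULT '', value REAL, provenance TEXT, source TEXT);
CREATE TABLE state_datapoint (ts INTEGER, component_id INTEGER, key TEXT,
    instance TEXT DEFAULT '', value TEXT, provenance TEXT, source TEXT);
CREATE TABLE log_datapoint (ts INTEGER, component_id INTEGER, key TEXT,
    instance TEXT DEFAULT '', value TEXT, level TEXT, provenance TEXT, source TEXT);

-- Only the common columns are projected; the value column differs per kind.
CREATE VIEW datapoint AS
    SELECT 'metric' AS kind, ts, component_id, key, instance, provenance, source
      FROM metric_datapoint
    UNION ALL
    SELECT 'state', ts, component_id, key, instance, provenance, source
      FROM state_datapoint
    UNION ALL
    SELECT 'log', ts, component_id, key, instance, provenance, source
      FROM log_datapoint;
""")

conn.execute("INSERT INTO metric_datapoint VALUES (?,?,?,?,?,?,?)",
             (1, 5, "temperature", "", 21.5, "observed", "display.lan"))
conn.execute("INSERT INTO state_datapoint VALUES (?,?,?,?,?,?,?)",
             (2, 5, "power.state", "", "on", "observed", "codec.cec"))
conn.execute("INSERT INTO log_datapoint VALUES (?,?,?,?,?,?,?,?)",
             (3, 5, "log.system", "", "link up", "info", "observed", "display.lan"))
conn.execute("INSERT INTO metric_datapoint VALUES (?,?,?,?,?,?,?)",
             (4, 6, "temperature", "", 30.0, "observed", "display.lan"))

# "All datapoints for owner X": one query across the three kind-tables.
rows = conn.execute(
    "SELECT kind, key FROM datapoint WHERE component_id = ? ORDER BY ts", (5,)
).fetchall()
```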

The key registry that types these tables is `datapoint_type` (one registry across all three kinds), detailed at [the datapoint_type registry](#the-datapoint_type-registry):

| Table | Key columns | Notes |
|---|---|---|
| `datapoint_type` | name, **scope** (template/org/official), **template_id?**, kind (metric/state/log), value_type, unit, **precision**, **fusion_policy**, validation (jsonb) | the one key registry across all datapoint kinds; `scope` decides where the name is unique (`(template_id, name)` at template scope, `name` at org/official); referenced by templates, which also mint their own template-scoped rows. `unit` is the **canonical** unit, a row in the `unit` registry below; `precision` is a display hint (significant digits), not a storage truncation. Both apply to **metrics**; a **state** or **log** has neither |
| `unit` | name, **family** (temperature/data-size/bitrate/...), **canonical** (bool), **to_canonical**, **from_canonical** (affine factor+offset, or Expr), **scope** (official/org) | the unit registry: one canonical unit per family plus alternates each carrying its conversion transforms; the `datapoint_type.unit` canonical unit references it, and `convert(value, "<unit>")` resolves same-family targets through it |

## The pipeline, end to end

```d2
direction: down
classes: {
  node: { style.border-radius: 8 }
  key: { style: { border-radius: 8; bold: true } }
  group: { style.border-radius: 8 }
}
edge: "Edge (node)" {
  class: group
  task: "task\npoll · listen\nstateless / stateful" { class: node }
  fn: "function\nextract → key → normalize" { class: node }
  task -> fn
}
raw: "raw ingress\nnode · webhook (untrusted)" { class: node }
admit: "admission consumer\nowner-confine per class\n(system mode)" { class: node }
ds: "JetStream\ntrusted datapoints stream" { class: node; shape: queue }
failed: "collection.failed\n(carries raw)" { class: node }
calc: "calc_rule consumer\ncross-key · system-level" { class: node }
erule: "event_rule consumer\nfire_criteria (+ optional clear_criteria)" { class: node }
persist: "persistence consumer\nbatch sink (async)" { class: node }
tables: "metric · state · log\ndatapoint tables" { class: node; shape: cylinder }
sched: "schedule + timer\n(leader-elected clock)" { class: node }
pg: "event · alarm\n(PG)" { class: node; shape: cylinder }
alarm: "alarm\none incident · new row per open\n(event_rule, owner)" { class: node }
cdc: "JetStream\nrecord/state lane" { class: node; shape: queue }
actions: "action_rule consumer\nnotify · command\nremediate-verify-escalate" { class: node }
itsm: "ITSM (action target)" { class: node }
operator: operator { class: node }
config: "config\ndeclared (spec)" { class: node }
audit: audit_log { class: key }
divergence: divergence { class: node; shape: hexagon }
edge.fn -> raw: "observed · lineage on row\n(source_rule)"
edge.fn -> failed: "parse / validation fail" { style.stroke-dash: 4 }
raw -> admit
admit -> ds: "confined"
ds -> calc
calc -> ds: "calculated · trusted producer\n(direct, no admission)"
ds -> erule
ds -> persist
persist -> tables: "durable copy"
sched -> erule: "origin=scheduled"
erule -> pg: "PG-first: event + alarm in one tx"
pg -> alarm: "alarm transition"
pg -> cdc: "CDC (logical decoding)\nleader-elected publisher" { style.stroke-width: 3 }
cdc -> actions
actions -> ds: "command's effect · provenance=intended\n(trusted, direct)" { style.stroke-dash: 4 }
actions -> itsm: "ITSM: open->ticket · update->comment · resolve->close" { style.stroke-dash: 4 }
actions -> edge.task: "command + adaptive poll" { style.stroke-dash: 4 }
operator -> config: "declares (PG-first)"
config -- tables: "links · drift" { style.stroke-dash: 4 }
operator -> audit: "audit" { style.stroke-dash: 4 }
cdc -- divergence: "disagree(A,B): drift / conflict" { style.stroke-dash: 4 }
```

Two lanes, one bus. The **data lane** is the JetStream **trusted** `datapoints` stream. Untrusted publishers (the edge node, an external webhook) publish to a **raw ingress** subject; an **admission consumer** owner-confines each datapoint against the publisher's placement (or the webhook interface's declared owner) and re-publishes only confined points to the trusted stream, so a forged owner is dropped before the live `event_rule` can act on it ([identity and access](/architecture/identity-access/)). **Trusted server producers** (calc output, a command's intended write) publish to the trusted stream directly, with no admission pass. The `event_rule` consumer evaluates against the trusted stream live, and a **persistence consumer** batch-writes the three datapoint tables (`metric_datapoint`, `state_datapoint`, `log_datapoint`) as an async sink (datapoints never go through CDC).

The **record/state lane** is PG-first: an `event_rule` fire writes the event and alarm transition to PG in one transaction, and a leader-elected **CDC publisher** (logical decoding of the WAL) fans those committed changes onto JetStream, where `action_rule` consumers react. A command's intended datapoint re-enters the data lane (the device round trip).

The bold node (class `key`) is `audit_log`, the ground-truth record of operator writes (including config changes). Observed and calculated rows carry `source_rule` on the row; an intended row points at the command `event` (via `event_id`). The raw payload is not stored; only a parse or validation failure carries it, on a `collection.failed` event. [config](/architecture/variables/) holds declared intent (PG-first), keyed to a state datapoint as its observed side.
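
The admission step on the data lane can be pictured as a filter over raw ingress. The following is a minimal sketch, not the Omniglass implementation: the `Datapoint` struct, the string owner encoding, and the `admit` function are all hypothetical names chosen for illustration, and the visible set is modelled as a plain map.

```go
package main

import "fmt"

// Datapoint is a pared-down stand-in for a raw-ingress message.
// Field names are illustrative, not the Omniglass wire format.
type Datapoint struct {
	Owner string // the owner the publisher claims, e.g. "component:42"
	Key   string
	Value float64
}

// admit splits raw points into those whose claimed owner is inside the
// publisher's visible set (re-published to the trusted stream) and those
// outside it (kept off the trusted stream, so no rule evaluates them).
func admit(raw []Datapoint, visible map[string]bool) (trusted, rejected []Datapoint) {
	for _, dp := range raw {
		if visible[dp.Owner] {
			trusted = append(trusted, dp)
		} else {
			rejected = append(rejected, dp)
		}
	}
	return trusted, rejected
}

func main() {
	visible := map[string]bool{"component:42": true}
	raw := []Datapoint{
		{Owner: "component:42", Key: "power.state", Value: 1},
		{Owner: "component:99", Key: "power.state", Value: 0}, // owner outside placement
	}
	trusted, rejected := admit(raw, visible)
	fmt.Println(len(trusted), len(rejected)) // 1 1
}
```

The point of the shape is that confinement happens once, before the trusted stream, so every downstream consumer can treat the owner on a trusted datapoint as already checked.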

Related: [events](/architecture/events/) (the event family and `event_type`), [calculations](/architecture/calculations/) (calc rules and the rule families), [config and credentials](/architecture/variables/) (declared config, drift, reconcile), [collection](/architecture/collection/) (how telemetry arrives), [alarms and actions](/architecture/alarms-actions/) (alarm lifecycle, actions), and [the glossary](/architecture/glossary/) (every term defined once).

---

# Events

URL: /architecture/events/

Events: our semantic assertion that something happened, the event_type registry, and the four ways an event arrives.

An **event** is *our semantic assertion that something happened*, in our vocabulary: a discrete, point-in-time occurrence the action layer reacts to, owned through the same exclusive-arc as a datapoint. It is **not** a datapoint (a datapoint records a value; an event records an occurrence, see [the has-a-value-now razor](/architecture/datapoints/#the-has-a-value-now-razor-datapoint-vs-event)). Datapoints are what rules read; events are what event rules produce. The rules that produce events live on [calculations](/architecture/calculations/); the alarms that paired events drive, and the actions that respond, live on [alarms and actions](/architecture/alarms-actions/).

## The event_type registry

A datapoint and an event are different shapes (a datapoint has a value; an event is an occurrence), so each gets a registry named for what it holds. The datapoint half is [`datapoint_type`](/architecture/datapoints/#the-datapoint_type-registry); the event half is `event_type`. We do **not** force them into one universal registry; that would be the false unification the rest of the model avoids.

**`event_type`** describes every event key: `(name, display_name, payload_schema, scope, ...)`, with the same **`scope`** (template / org / official) as the datapoint registry; a template can define a template-local event. Declaring event types (`call.started`, `cable.unplugged`, `command.sent`) is first-class and valuable: it gives events a known schema, makes them inspectable, and is what lets an event rule promote a raw log line into a *registered* event. An event key is registered here; an unregistered occurrence stays a `log_datapoint` line until a rule promotes it.

The naming convention is consistent: a `_type` registry defines what a thing *is*, named for the thing (`datapoint_type`, `event_type`, like `component_type`, `interface_type`). The `scope` axis works the same way as for datapoints: see [key scope](/architecture/datapoints/#key-scope-template-org-official).

## Events: caught, caused, derived, scheduled

An event arrives one of four ways; none is auto-manufactured from a state flip (a transition is already two consecutive datapoint rows, derivable by query).

1. **caught**: a structured occurrence arrives (xAPI Event channel, a webhook, a trap), or an event rule **promotes** a `log_datapoint` line into a normalized event.
2. **caused**: we issued a command, recorded as an event; this is what opens an [intended](/architecture/datapoints/#intended-the-declared-effect-of-a-command) datapoint.
3. **derived**: an event rule fuses signals into an operator-meaningful fact ("codec in-call + traffic spike + room booked, so meeting started"), inferred without instrumenting the control system.
4. **scheduled**: the clock fired a schedule. A schedule fire *is* an event with `origin=scheduled`, manufactured by the clock (a leader-elected singleton held via a NATS KV CAS lock, exactly one active, failing over on death); there is no separate schedule log table. So `action_rule` subscribes to events uniformly (**schedule to event to action**: digests, synthetic checks, SLA resets are all schedule fires an action subscribes to).

Caught/caused/derived/scheduled is the event's **origin**, a small vocabulary on the event table; it is not the same enum as datapoint provenance. The discipline that keeps an event-driven system from rotting is that events are declared (registered event keys) and rules are inspectable (the blast-radius preview in the UI).

## Storage

The `event` row is the semantic-occurrence log; `event_type` is its key registry. The physical layout (partitioning, the owner arc, lineage) lives on [storage](/architecture/storage/).

An event is **born in a Postgres transaction**, on the record lane. When an `event_rule` fires, the consumer writes the `event` row and its paired alarm transition to PG in one transaction (the alarm edge is serialized per `(event_rule, owner)`); the event is the durable record, the alarm is the stateful edge. The event is **not** published directly from the rule (no dual-write): a leader-elected CDC publisher (logical decoding of the WAL) fans the committed change out to JetStream, where the `action_rule` consumers react. Postgres is the system of record; JetStream carries the committed event onward. This is the opposite lane from datapoints, which live on NATS and sink to PG asynchronously (see [datapoints](/architecture/datapoints/)).

| Table | Key columns | Notes |
|---|---|---|
| `event` | id, ts, key, **origin** (caught/caused/derived/scheduled), owner arc, payload (jsonb), correlation_id, **caused_by_event_id** (nullable), **alarm_id** (nullable), + lineage | the semantic-occurrence log; a momentary event has null `alarm_id`, an alarm edge carries it; `caused_by_event_id` is the parent edge: the **durable, read-side** causation pointer (the live cycle guard at dispatch walks the NATS header chain; this column is its persisted form), while the flat `correlation_id` threads the chain. A schedule fire is an event with `origin=scheduled` (no separate schedule table) |
| `event_type` | name, display_name, **payload_schema (jsonb)**, **scope** | the event-key registry; lets an event_rule promote a raw log line into a registered event. `scope` (template / org / official) works the same way as `datapoint_type` |
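
Because `caused_by_event_id` is a parent edge, the read-side causation chain of any event can be rebuilt by following it to the root. The sketch below assumes the rows are already loaded into a map; the `Event` struct keeps only the columns it needs, the `chain` and `ptr` helpers are hypothetical, and the event keys in `main` are illustrative rather than registered types.

```go
package main

import "fmt"

// Event keeps only the columns this sketch needs from the `event` table.
type Event struct {
	ID              int64
	Key             string
	CausedByEventID *int64 // nil for a root event
}

// chain returns the causation path from the given event back to its root.
// It stops with an error if a parent row is missing or an id repeats (a cycle).
func chain(events map[int64]Event, id int64) ([]int64, error) {
	seen := map[int64]bool{}
	var path []int64
	for {
		if seen[id] {
			return path, fmt.Errorf("cycle at event %d", id)
		}
		ev, ok := events[id]
		if !ok {
			return path, fmt.Errorf("event %d not found", id)
		}
		seen[id] = true
		path = append(path, id)
		if ev.CausedByEventID == nil {
			return path, nil
		}
		id = *ev.CausedByEventID
	}
}

// ptr is a small helper for building nullable ids in the example.
func ptr(v int64) *int64 { return &v }

func main() {
	events := map[int64]Event{
		1: {ID: 1, Key: "cable.unplugged"},
		2: {ID: 2, Key: "command.sent", CausedByEventID: ptr(1)},
		3: {ID: 3, Key: "call.started", CausedByEventID: ptr(2)},
	}
	path, err := chain(events, 3)
	fmt.Println(path, err) // [3 2 1] <nil>
}
```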

Related: [calculations](/architecture/calculations/) (the `event_rule` that produces events), [alarms and actions](/architecture/alarms-actions/) (alarms and the response layer), [datapoints](/architecture/datapoints/) (the data events read), and [the glossary](/architecture/glossary/).

---

# Expressions

URL: /architecture/expressions/

Omniglass expressions: one engine built on Expr and extended with Omniglass functions, behind every operator-authored expression leaf.

Expressions let an operator reshape and judge collected values in plain text wherever the platform needs a small computation, and there is exactly one language to learn for all of them. Omniglass evaluates these small operator-authored expressions in many places: an extractor's
`value` leaf, a step's `when` guard, an `event_rule`'s fire/clear
criteria, a `calc_rule`'s reduce escape, a rule's `scope` predicate, a view/list `filter`,
and a dynamic group's membership filter. All of these go through **one engine, Omniglass
expressions**, built on **Expr** ([expr-lang/expr](https://github.com/expr-lang/expr)) and
**extended** with Omniglass functions.

## One engine, built on Expr and extended

There is one expression engine. It is **Expr** at the core, chosen for three reasons. It is
transform-oriented: an expression language with a rich built-in function and operator set
(arithmetic, string ops, slicing, mapping over arrays, null handling) well suited to the
reshaping that collection extractors do constantly, like `raw / 100.0`, `int(groups[1])`,
`node.gain`, `groups[2] == 'true'`. It compiles to a fast program. And it is straightforward
to sandbox.

On top of that base we add **Omniglass functions**: helpers the platform needs that Expr does
not ship, including frame **`encode` / `decode`** and the output-format helpers (**hex /
ascii / base64**) that binary and raw-TCP protocols need to pack and unpack wire bytes. The
engine is **not pluggable**: there is one dialect everyone authors in, and a compiled program
is cached by `(source, env-shape)` so compile cost is paid once. Keeping it to one engine is
deliberate (YAGNI on multiple engines); where an expression is not even needed, prefer a
straightforward native path over reaching for the engine at all.
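
The compile-once behaviour described above amounts to a cache keyed by the pair `(source, env-shape)`. A minimal sketch follows, under stated assumptions: `Program`, `Cache`, and the string form of the env-shape fingerprint are hypothetical, and the compile step is a stand-in rather than a call into Expr.

```go
package main

import (
	"fmt"
	"sync"
)

// Program stands in for a compiled expression. In the real engine this would
// be the compiled Expr program; here it only remembers its source.
type Program struct{ Source string }

// cacheKey pairs the expression text with a fingerprint of the environment's
// shape (the names and types bound), not the values bound at run time.
type cacheKey struct {
	source   string
	envShape string
}

// Cache memoizes compiled programs and counts how often compile really ran.
type Cache struct {
	mu       sync.Mutex
	programs map[cacheKey]*Program
	compiles int
}

func NewCache() *Cache { return &Cache{programs: map[cacheKey]*Program{}} }

// Get returns the cached program for (source, envShape), compiling on first use.
func (c *Cache) Get(source, envShape string) *Program {
	c.mu.Lock()
	defer c.mu.Unlock()
	k := cacheKey{source: source, envShape: envShape}
	if p, ok := c.programs[k]; ok {
		return p
	}
	c.compiles++
	p := &Program{Source: source} // stand-in for the real compile step
	c.programs[k] = p
	return p
}

func main() {
	c := NewCache()
	a := c.Get("raw / 100.0", "raw:float")
	b := c.Get("raw / 100.0", "raw:float") // same key: cache hit
	c.Get("raw / 100.0", "raw:int")        // same text, different env shape: new compile
	fmt.Println(a == b, c.compiles)        // true 2
}
```

Keying on the env shape as well as the text matters because the same source can type-check differently against different bindings, so a program compiled for one shape is not safely reusable for another.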

## Unit conversion: `convert(value, "<unit>")`

Stored values are always in their `datapoint_type`'s **canonical unit**, so an operator who
wants to author against a non-canonical unit converts at the expression. **`convert(value,
"<unit>")`** is the stdlib function for this: the **source unit is inferred** from the bound
datapoint's canonical unit, and the **target** is a registered unit that must be in the
**same family** (a compile error otherwise, since units only convert within one dimension).
The conversion itself comes from the [unit registry](/architecture/datapoints/#units-one-canonical-unit-per-key): the target's
`to_canonical` and `from_canonical` transforms, **affine** (a factor plus offset) for the
common case or an **Expr** for the rare nonlinear one. So an operator can write
`convert(value, "fahrenheit") > 100` while storage stays in canonical celsius: the threshold
reads in Fahrenheit, the firehose never changes unit. The function form is chosen over a
per-unit method like `value.toFahrenheit()` (which would need a method per unit); it is
data-driven and general, available wherever expressions run, including `event_rule` /
`alarm` criteria, `calc_rule` leaves, and view/list filters.
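
The affine case of `convert` reduces to a registry lookup, a family check, and one multiply-add. The sketch below is illustrative only: the `Unit` struct, the registry rows, and the Go `convert` signature are assumptions. It also passes the canonical unit explicitly, where the engine infers it from the bound datapoint, and it reports a family mismatch at run time, where the engine rejects it at compile time.

```go
package main

import (
	"errors"
	"fmt"
)

// Unit is a pared-down row of the unit registry. An alternate's
// from_canonical transform is modelled as affine: v*Factor + Offset.
type Unit struct {
	Family string
	Factor float64
	Offset float64
}

// registry holds illustrative rows; a canonical unit is the identity transform.
var registry = map[string]Unit{
	"celsius":    {Family: "temperature", Factor: 1, Offset: 0},
	"fahrenheit": {Family: "temperature", Factor: 1.8, Offset: 32},
	"kelvin":     {Family: "temperature", Factor: 1, Offset: 273.15},
	"megabyte":   {Family: "data-size", Factor: 1e-6, Offset: 0},
}

// ErrFamily is returned when the target is not in the source unit's family.
var ErrFamily = errors.New("target unit is not in the source unit's family")

// convert takes a value stored in canonicalUnit and returns it in target.
func convert(value float64, canonicalUnit, target string) (float64, error) {
	src, ok := registry[canonicalUnit]
	if !ok {
		return 0, fmt.Errorf("unknown unit %q", canonicalUnit)
	}
	dst, ok := registry[target]
	if !ok {
		return 0, fmt.Errorf("unknown unit %q", target)
	}
	if src.Family != dst.Family {
		return 0, ErrFamily
	}
	return value*dst.Factor + dst.Offset, nil
}

func main() {
	// Storage stays in celsius; the threshold is authored in fahrenheit.
	f, err := convert(40, "celsius", "fahrenheit")
	fmt.Println(f > 100, err)

	_, err = convert(40, "celsius", "megabyte")
	fmt.Println(err)
}
```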

## Where expressions are used

| Site | Leaf | What it evaluates |
|---|---|---|
| extractor | `value` | reshape a located raw value into the typed datapoint value |
| step | `when` | the explicit branch guard (a false guard skips the step and dependents) |
| `event_rule` | `fire_criteria`, `clear_criteria` | open/close an alarm-paired event off a datapoint change |
| `calc_rule` | `reduce` (escape), `filter` | the named reducers (`worst` / `majority` / `average`, plus windowed `time_in_state` for SLIs) and the Expr escape, with per-input filters |
| rule | `scope` | which instances a rule fires for (the Expr scope escape) |
| views / list | `filter` | the structured-query predicate operators compose |
| dynamic group | membership `filter` | recomputed membership |

Because `filter` is the same engine everywhere, an operator who can write a group filter can
write a list filter and a rule scope. One language across the surface.

## In-scope bindings

Within a function run, the engine environment exposes the documented namespaces: `$var:<key>`
(config/secret through the cascade), `$dp.<key>` (datapoints, emitted and readable for
branching), `$steps.<id>.*` (ephemeral scratch), `$event` (a listen payload), and the
extractor-local inputs a step prepares for its `value` leaf (`raw`, `groups`, `node`,
`item`). Rule and view contexts bind their own documented environments (the candidate
entity, the datapoint, the resource row).

## Safety

Expressions are **sandboxed**: no I/O, no network, no unbounded loops, bounded execution.
Operator-supplied configuration values are bound as **data in the environment**, never spliced
into expression text, so a hostile value is evaluated literally and never executed. Secret
fields rendered into a request are masked at interpolation time and never surface in a log
line, error string, or datapoint label.

---

# Files and blobs

URL: /architecture/files/

A searchable file handle over a content-addressed blob store, behind the Storage Gateway.

Files let an operator keep the opaque bytes that go with an estate (a firmware image, a config dump, a runbook, a packet capture) searchable and deduplicated: a **`file`** handle over a content-addressed **blob** store, behind the same Storage Gateway as everything else.

## Two layers: the file handle and the blob

- **`file`** is **indexable metadata**: name, content-type, size, `sha256`, tags. The searchable
  handle an operator references and finds (a firmware image, a device config dump, a runbook doc,
  a screenshot, a packet capture). It owns no bytes; it points at a blob by hash.
- **the blob store** holds the **bytes**, **content-addressed by `sha256`**. The hash is the key,
  so identical bytes are one blob.

Splitting them means search and inventory operations (list, filter, tag) never touch bytes, and
the same blob can back many file handles.

`file` tags reuse the `tag` **key** registry (the same tenant-wide governed vocabulary, so `category`
means the same thing on a firmware image as on a component; see [config and credentials](/architecture/variables/)),
but bind as a **flat per-file set**: a file is not on the structural exclusive-arc, so there is no parent
to cascade from. The vocabulary is shared; the cascade is not.

## Content-addressing earns four properties

A blob is keyed by the hash of its bytes, not a UUID, which buys:

- **dedup**: identical bytes collapse to one blob (two operators uploading the same firmware, the
  same `raw` payload seen twice);
- **integrity**: the hash verifies the bytes on read, tamper-evident by construction;
- **immutability**: bytes cannot change without changing the key, like the append-only
  ground-truth logs;
- **backtest-stability**: an event referencing a hash still resolves under a backtest, because the key
  depends only on the bytes, so re-deriving over the same window resolves the same blob.
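
The handle-over-blob split and the dedup and integrity properties can be seen in a few lines. This is a minimal in-memory sketch, not the `pgblobs` backend: `File`, `BlobStore`, `Put`, and `Get` are hypothetical names, and the handle carries only two of its metadata fields.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
)

// File is the metadata handle; it owns no bytes and points at a blob by hash.
type File struct {
	Name   string
	SHA256 string
}

// BlobStore is an in-memory stand-in for the blob backend, keyed by content hash.
type BlobStore struct{ blobs map[string][]byte }

func NewBlobStore() *BlobStore { return &BlobStore{blobs: map[string][]byte{}} }

// hashOf returns the lowercase hex sha256 of the bytes.
func hashOf(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Put stores the bytes under their hash. Identical bytes map to the same key,
// so a second upload of the same content adds nothing (dedup).
func (s *BlobStore) Put(b []byte) string {
	key := hashOf(b)
	if _, ok := s.blobs[key]; !ok {
		s.blobs[key] = append([]byte(nil), b...) // copy so callers cannot mutate it
	}
	return key
}

// Get re-hashes on read and refuses bytes that no longer match their key (integrity).
func (s *BlobStore) Get(key string) ([]byte, error) {
	b, ok := s.blobs[key]
	if !ok {
		return nil, errors.New("blob not found")
	}
	if hashOf(b) != key {
		return nil, errors.New("blob failed integrity check")
	}
	return b, nil
}

func main() {
	store := NewBlobStore()
	fw := []byte("firmware v1.2.3")
	a := File{Name: "codec-fw.bin", SHA256: store.Put(fw)}
	b := File{Name: "codec-fw-copy.bin", SHA256: store.Put(fw)}
	fmt.Println(a.SHA256 == b.SHA256, len(store.blobs)) // true 1
}
```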

So **rows reference a hash, never inline bytes.** Inline `bytea` would kill the hash-ref stability
property and bloat the firehose row. Small structured values (a datapoint, its labels) stay inline
in the row's jsonb; **large or opaque payloads become a blob hash-ref** (a dedicated **indexed**
`blob_sha256` column on the referencing row, so GC can probe it, not buried in jsonb): a big `log_datapoint`
body, and especially a **`collection.failed` event's raw** when the
wire payload is large (a full SNMP walk, a big HTTP body, a capture). Raw stays inline when small;
the size threshold is the switch.

:::caution[Open question]
The inline-versus-blob size threshold: one global cutoff, or per-kind (`raw` versus log body versus
operator upload).
:::

## Dedup is database-scoped

The blob key is **`sha256`**, the bare content hash. There is no `tenant_id`: isolation is
per-database (a database per tenant), so each tenant's blobs live in a separate database and dedup
is global *within* that database. One tenant can never detect another's content by probing for a matching hash,
because the blobs never share a store. The efficiency cost of not sharing bytes across databases is
the right price for physical isolation.

## Backends, swappable behind the gateway

The bytes live behind the Storage Gateway, so the backend swaps with no model change (the same
seam as the columnar and object tiers):

- **default: `pgblobs`** (a dedicated Postgres blob table), the single-binary,
  no-external-dependency story;
- **scale: an S3-compatible object store**;
- **disk** for local and dev.

The `file` and the hash reference are identical across backends; only `storage_ref` resolution
differs.

:::caution[Open question]
Chunking and streaming for very large blobs (firmware images, captures) on the `pgblobs` backend.
:::

## Reference-counted GC, not age-based

A blob is collectable **only when no live reference points at its hash AND a grace or retention
floor has passed**. Age-based GC alone is wrong: dedup means a blob uploaded long ago can be the
one a *recent* event references, so collecting by the blob's own age would orphan a live hash.
References come from:

- a **`file`** handle;
- a large `log_datapoint` body;
- a `collection.failed` raw hash-ref;
- an **attach event** (a `state_datapoint` or `audit_log` recording "this component was attached
  to this file at T").

References disappear two ways: a `file` is deleted, or a referencing **event ages out** (a
retention partition drop). So GC is **coupled to retention**: dropping a partition releases its
references, after which a now-unreferenced blob past the grace floor is collectable.

**Mechanism: index-probe mark-sweep by default.** GC enumerates blobs past the grace floor and,
for each, probes the indexed hash-ref columns on the referencing tables; a blob with no live
reference is collected. A **maintained refcount column or `blob_ref` table is a measured
optimization**, earned only if the per-blob probes profile too expensive (the same
ship-the-simple-thing discipline as the storage projections). The grace floor is the safety
margin against an in-flight reference, so GC never races a just-written event.

:::caution[Open question]
The grace-floor duration relative to the backtest window (long enough that a prospective backtest
re-deriving over the window cannot reference a collected blob).
:::

## Storage

Two tables: the handle and the content-addressed bytes. The physical layout (the gateway, GC) is described above and on [storage](/architecture/storage/).

| Table | Key columns | Notes |
|---|---|---|
| `file` | id, name, content_type, size, **sha256**, tags | searchable metadata handle; points at a blob by hash |
| `blob` | **sha256**, bytes / storage_ref, size, content_type | content-addressed bytes; dedup; backend pgblobs / S3 / disk behind the gateway; reference-counted GC |

---

# Glossary

URL: /architecture/glossary/

The authoritative glossary: every official term in the architecture, defined once.

This is the **authoritative glossary**: every official term in the architecture, defined once. The other pages introduce these terms in **bold** as the story reaches them; this is where you look any of them up.

| Term | Definition |
|---|---|
| **node** | Edge process (`--mode node`); pulls and runs tasks and commands over interfaces; carries placement, heartbeat, bound credential. |
| **node mode** | The Storage Gateway's placement-scoped write mode for node-driven ingest, between `scoped` and `system`: visibility is the node's placement-derived `visible_set`, not all-visibility. See [identity and access](/architecture/identity-access/). |
| **placement visible_set** | The owners a node may write, derived from its materialized worklist (the owners of the tasks assigned to it). A node's writes are confined to this set; an emitted owner outside it is an orphan / discovery candidate, never an authoritative write. See [collection](/architecture/collection/). |
| **function** | A trigger plus a DAG of steps, declared in a component template; the unit of edge collection. Triggered by a schedule (poll), incoming data (listen), or a command. See [collection](/architecture/collection/). |
| **flow** | A multi-step **action** (branching, parallel steps, waits); an escalation is the canonical case. See [alarms and actions](/architecture/alarms-actions/). |
| **task** | A node's unit of collection: **poll** (we ask) or **listen** (we wait), over a stateless or stateful (session) interface. Content-addressed. |
| **interface** | A connection to a component, declared once per protocol; transport stateless or stateful (to a session). |
| **interface_type** | Protocol-and-style registry (ssh, http, snmp, mqtt, webhook...); built-flag + param schema. |
| **session** | A stateful interface's live held-open connection; a current-state view over `session_log`. |
| **collection.failed** | The event emitted when a parse or validation rejects; carries the raw payload for diagnosis and backfill-after-fix. There is no stored telemetry table; raw is not otherwise persisted (a dev raw-mode taps it live). |
| **raw_sample** | An opt-in raw-retention policy, cascade-resolved on interface / task / template: `off` (default), `all`, or `1-in-N` (sampled). Short TTL, range-partitioned and cold-tierable like metric partitions. The kept window is re-parsable against the immutable function version, so a corrected extractor re-derives it; outside the window a wrong-but-conforming parse is forward-fixable only. Bounded, sampled, short-lived: not a telemetry table. See [collection](/architecture/collection/). |
| **datapoint** | An observation: a key's value on one owning entity at one time, with provenance + source + on-row lineage. Kinds: metric, state, log. |
| **metric_datapoint** | Numeric (float8) datapoint. Continuous, aggregatable. The firehose. |
| **state_datapoint** | Categorical/text/object datapoint. Discrete, dwell-measurable. [Config](/architecture/variables/) is keyed to one as its observed side. |
| **log_datapoint** | A component's own log lines; value = the line. A stream; also the holding pen for un-normalized occurrences. |
| **kind** | What a key is: metric, state, or log. Fixed per key at definition. |
| **key** | The identity of what is measured or asserted; registered in `datapoint_type`. |
| **canonical signal** | A registered, owner-agnostic measurement name (`power.state`, not `room.power`); one comparable signal across every vendor. |
| **owner / owner_kind** | A datapoint/event/alarm's subject, the exclusive-arc: `owner_kind` + the matching typed FK (`component_id`/`system_id`/`location_id`/`node_id`), or the singleton `global` (no FK), + CHECK. |
| **datapoint_type** | Registry for datapoint keys: name, `scope`, kind, value_type, unit, fusion_policy, validation. `scope` (template / org / official) decides where the name is unique: `(template_id, name)` at template scope, `name` at org/official. Every datapoint is typed by one (the FK is non-null). Promotes template -> org -> official by re-scope/re-point. |
| **canonical unit** | The one `unit` a `datapoint_type` stores in: stored values are always in it, so the firehose is single-unit and every threshold / calc / fusion compares like with like. Native unit is a collection-time fact (normalized in by the alignment value-transform), display unit a presentation fact (converted out on read); neither is stored. See [datapoints](/architecture/datapoints/). |
| **unit registry** | A `unit` registry grouped by family / dimension (temperature, data-size, bitrate...), each family one **canonical unit** plus alternates; each alternate carries a `to_canonical` / `from_canonical` transform, **affine** (factor + offset) or an **Expr** (the rare nonlinear case, dB). Official / org scoped. Drives both edge normalization and read-side display conversion. See [datapoints](/architecture/datapoints/). |
| **`convert(value, "<unit>")`** | The expression stdlib conversion fn: returns the value in a registered same-family unit (a compile error otherwise). Source unit inferred from the bound key's canonical unit, target looked up in the unit registry, so `convert(value, "fahrenheit") > 100` authors a threshold in F while storage stays C. Available wherever expressions run (event_rule / alarm criteria, calc leaves, list filters). See [expressions](/architecture/expressions/). |
| **scope** | A key's uniqueness-and-trust axis on `datapoint_type`: **template** (`(template_id, name)`, the template author's, local), **org** (`name` within the deployment, the operator's custom canonical), **official** (`name` globally, shipped with the distro). `official` = the top scope (folds in the prior `official` boolean). |
| **template-scoped / org-scoped** | A key minted at `scope=template` (local to one template, `(template_id, name)`) or `scope=org` (a deployment's own canonical, unique by `name`). The promotion ladder lifts template -> org -> official. |
| **event_type** | Registry for event keys: name, display_name, payload_schema, `scope`. Supports the same template / org / official `scope` as `datapoint_type` (a template can define a template-local event). |
| **provenance** | How we know a value: observed, calculated, intended. Per row. Declared intent is [config](/architecture/variables/). |
| **observed** | Measured from a component. On-row lineage: `source_rule` (+ version), the edge function. |
| **calculated** | Derived from other datapoints by a calc_rule. On-row lineage: `source_rule` (+ version), the calc_rule. Distinguished from observed by the `provenance` column. |
| **intended** | A command's declared effect, pending reconciliation. Lineage: the command `event_id`. Only commands set it. |
| **source** | Which sensor/path produced an observed value; distinct from provenance; enables multi-source rows + fusion. A `source` registry carries default weights. |
| **correlation_id (datapoint) / caused_by_event_id** | Nullable trace columns on the datapoint tables, orthogonal to the exclusive-lineage CHECK (not lineage pointers). A command propagates its originating `correlation_id` onto the adaptive-poll's observed datapoint, so the `event_rule` that fires off it inherits the id and the cycle-guard walk crosses the command -> device -> observed round trip. Distinct from the read-side [correlation id](/architecture/datapoints/) trace. See [datapoints](/architecture/datapoints/). |
| **perspectives** | The source-tagged observed rows for one signal: multiple sources reporting one value, all preserved; a reduce-on-read policy produces the effective value, while every perspective stays queryable. |
| **fusion_policy** | Per-key reduce-on-read **default/hint** for multi-source observations (mode + tie-break + source weights), not a mandate: a policy may default from the type but can be source-weighted, per-instance, or left to read time (keep all perspectives, decide on read). Applied on read. |
| **fusion** | Reading one effective value from multiple **perspectives** on a signal: same-key multi-source reduces by a policy (read-time, defaulting from the key's fusion_policy); cross-key/system-level = a calc_rule. Perspectives are always preserved. |
| **config** | The declared side of a canonical signal: an operator-set value keyed to a `datapoint_type`, reconciled against the observed datapoint via the template's get/set functions and a per-item `reconcile` policy. See [config and credentials](/architecture/variables/). |
| **credential** | An access secret with a structured shape, a pluggable `SecretProvider` (inline or external), and a lifecycle (refresh / rotation / expiry); read is `secret:read`-gated and every decrypt audited. Template-driven. |
| **variable** | A free interpolated value (a macro): `$var:<name>`, resolved global→template→instance down the cascade; org-keyed, not signal-bound, no observed side. |
| **drift** | The gap between config's declared value and its observed datapoint, on one signal key. |
| **reconcile** | Per-[config](/architecture/variables/) item policy for drift, one of three modes: `observe` (record drift, no alarm), `warn` (alarm at warning severity), `enforce` (call the set function to converge, alarm on set failure). Adopting the observed value as declared is a separate one-shot import action, not a mode. |
| **cascade** | Resolves the effective config / variable value (declared or template default): global, component_template, system_template, then the location / system / component trees (weight-free, pure depth); most-specific (deepest) wins. Type is not a layer (it resolves via a group filter); groups are placed by weight on the same specificity scale. |
| **segmented precedence key** | The cascade's precedence comparator as a segmented / lexicographic key `(segment_rank, depth, group_weight, creation_order)`, so a structural segment never overruns into another regardless of tree depth or stacked group weights. The presentation numbers (e.g. 0 / 100 / 300s / 400s) are presentation-only, not the comparison key. See [cascade](/architecture/cascade/). |
| **edge parse** | A function parses a raw payload into datapoints on the node, the edge half of [collection](/architecture/collection/). There is no server-side transform rule. |
| **calc_rule** | datapoint(s) to datapoint (calculated): cross-key / system-level derivation. (Same-key multi-source reconcile is the key's fusion_policy.) |
| **event_rule** | datapoint change to event: fire_criteria + optional clear_criteria (clear makes events alarm-paired); an optional `health` impact lets its alarm move the owner's health. No separate alarm or condition rule. |
| **for_clear** | A recovery sustain on an `event_rule`, mirroring the fire-side `for`: `clear_criteria` must hold for `for_clear` before the alarm resolves, so a source flapping at the cadence boundary does not churn open/clear. Default 0 (immediate). See [alarms and actions](/architecture/alarms-actions/). |
| **action_rule** | A subscription (Expr over events; alarms via edge events) wiring occurrences to actions. |
| **identity binding** | How a shared-API / multiplexed source's emitted rows bind to the right owner: a value->owner index `(datapoint_type, value) -> owner` (an identity arc on identity config), resolved in a cascade scope. Precedence: a declared identity config value wins, falling back to the observed identity datapoint sharing its key. See [collection](/architecture/collection/). |
| **discovery_rule** | observed data creates components/systems/locations + their identity config; carries the `official` boolean. Input is the orphan / unmatched stream (including out-of-placement labels), idempotent on re-discovery (re-seeing the same identity does not duplicate). See [collection](/architecture/collection/). |
| **event** | A discrete semantic occurrence the action layer reacts to. Keyed, point-in-time, owned via the arc. Not a datapoint. |
| **origin** | How an event arose: caught, caused, derived, scheduled. |
| **alarm** | One open-to-close incident: a stateful row driven by an event_rule's paired events; new row per open; keyed (event_rule, owner); optionally health-impacting while open. Not event-sourced. The ITSM anchor. |
| **dependency suppression** | Muting a child alarm whose owner's parent entity on the exclusive-arc structural tree is itself down, so one upstream failure does not emit N child pages. See [alarms and actions](/architecture/alarms-actions/). |
| **action grouping** | Coalescing alarms sharing owner / label / `correlation_id` into one action dispatch (one ticket, N members), so a storm is one notification, not N. See [alarms and actions](/architecture/alarms-actions/). |
| **severity** | An alarm's alert importance, set to a **severity level** by id; distinct from health (a different axis). Rules and action_rule predicates compare by level (resolved via the level's order). |
| **severity level** | A registry row: `id`, `label`, `color`, and an integer `order` (for comparison only). Official defaults ship spaced; an operator can add, relabel, or recolor. Carries the `official` boolean. |
| **action** | An ordered sequence of steps (`notify`, `command`, `wait`, `branch`). A single-step `notify` or `command` is the simple case; a multi-step shape (including remediate-verify-escalate) is a **flow**. |
| **command** | A `run`-action declaration in a component_template version (not a table); an instance is an `action` with `kind=command`. |
| **disagree(A,B)** | A condition operator comparing two provenances or sources of one key. Drift, config drift, conflict. Keeps the DAG. |
| **divergence** | Any two provenances or sources of one key that disagree. The universal anomaly signal. |
| **lineage (on-row)** | A derived row carries its own lineage; no execution table. The rule version is the backtest hinge. |
| **correlation id** | A read-side trace id threading one causal chain end to end: the originating event through every downstream event and action it caused (event -> alarm -> flow/action -> command). Built on the causation lineage; `alarm_id` links one alarm's open/clear events, the correlation id links the whole chain. DX/observability sugar, not a datapoint kind or a stored span subsystem. |
| **schedule** | Config: a recurring definition (cron/rrule + IANA tz + what it triggers). |
| **timer** | The clock singleton's pending-fire working set (schedule-tick / for-sustain / runbook-wait / watchdog); a Postgres table scanned by the leader-elected clock, each fire realized onto its lane; not history. |
| **component** | A deployed instance (device/app/service); owns datapoints; a variable-depth tree; pins a component_template_version; classified by component_type. |
| **component_type** | Classification + field schema + type-level defaults. Carries the `official` boolean. |
| **component_template / _version** | The device shape (collection, commands, datapoint_types, defaults, alarms); the **immutable version** instances pin. |
| **system** | A composition of components/subsystems (the service tree); pins a system_template_version; located at a location; classified by system_type. |
| **system_template / _version** | The system shape; the immutable version is the snapshot instances pin. Carries a frozen BOM: per role, its requirement (required canonical datapoints + commands) + health_role. |
| **template signature / attestation** | An optional author signature on a `template_version`, verified on import; authenticity (who authored it), distinct from the content-hash integrity (that it is unaltered). The hosted / marketplace path verifies signatures regardless of the self-host runtime stance. See [templates](/architecture/templates/). |
| **capability manifest** | A declaration on a template of which write-commands and credential shapes it exercises; shown and approved at `:apply`, and the gate behind which `latest` / channel auto-update for device-mutating templates requires an explicit operator re-pin. See [templates](/architecture/templates/). |
| **role requirement** | What a `system_template_member` declares for a role: the canonical datapoints and commands a member must provide (plus `health_role`). A component qualifies when its template aligns the required set; pairing filters to qualifiers and the API validates on assign. No allow-list of templates: declare what you need, any qualifying component fills it. See [templates](/architecture/templates/). |
| **location** | A place tree; classified by location_type; no template. |
| **global** | The singleton estate root: the top owner above every location where estate-wide health and KPIs roll up, and the top of the cascade. One per deployment, no FK. |
| **operational mode** | A cascade-resolved entity state: **active** / **maintenance** / **disabled**. Maintenance keeps collecting but suppresses consequences (no action dispatch, no drift enforce, no health rollup impact, no SLA count); disabled is the same suppression but also stops collecting (the Zabbix host-disable). Maintenance is windowed and audited. See [core entities](/architecture/core-entities/). |
| **decommission / purge** | Delete is **decommission** by default (soft delete: tombstone, retain history, re-commissionable, in-flight cleanup); **purge** is the privileged hard erase. The cascade does not delete members: a system delete unbinds members, an occupied location delete is refused (re-home first), a node delete re-places its tasks. See [core entities](/architecture/core-entities/). |
| **KPI** | A shipped derived datapoint (a calc / SLI) owned at system / location / global: availability (health over time) and the utilization family (occupancy, time, booking, ghost). An official default set with an escape hatch. |
| **SLI** | Service Level Indicator: a `time_in_state` calc datapoint over a window (e.g. `system.availability`). See [health](/architecture/health/). |
| **SLO** | Service Level Objective: the target config value the SLI must hold (availability >= 99.9%). See [health](/architecture/health/). |
| **SLA** | Service Level Agreement: meeting the SLO, an `event_rule` firing on breach; compliance over the window is itself an SLI. See [health](/architecture/health/). |
| **tag** | An operator `key: value` label. The key is a tenant-wide governed vocabulary (the `tag` registry; new keys need `tag:create`, autocompleted in the UI); values bind per entity (`tag_binding`) and resolve **union on key, override on value** down the cascade. See [config and credentials](/architecture/variables/). |
| **group** | A named set (component/system/location/principal), static or dynamic, weighted; a cascade overlay + access scope. A `principal_group` is the principal-subject case. |
| **health** | The first-class operational state of every entity (ok/degraded/down/unknown), carried as a *calculated* state_datapoint: `worst` over its open health-impacting alarms, rolled up the system tree role-aware. A model, not just a rule. See [health](/architecture/health/). |
| **health impact** | An optional `down`/`degraded` tag on an `event_rule`: while the alarm it opens is open, its owner's health is at least that bad (health is the `worst` over open impacts). This is what makes health alarm-sourced. |
| **health_role** | A member's role in its system's health rollup (required / redundant / informational), declared on the system_template_member; the knob for the built-in role-aware rollup. |
| **health coverage / uncovered** | Whether any health-impacting `event_rule` resolves against an entity's datapoint_types. Covered + none firing + data fresh resolves `ok`; uncovered (no health-impacting rule resolves) resolves `unknown`, not falsely green. See [health](/architecture/health/). |
| **unknown reason** | A discriminator carried as metadata on health `unknown`, leaving the ordered domain (`ok < degraded < down`, `unknown` off-order) unchanged: `stale` (had data, went stale), `uncovered` (no health-impacting rule resolves), `no-data` (covered but never reported). See [health](/architecture/health/). |
| **baseline reachability alarm** | A health-impacting reachability alarm seeded per collected component (via the collection / template default), so a freshly-collected device is covered immediately and resolves `unknown -> ok`/`down` on first poll; bare `unknown(uncovered)` is then the rare honest "you have not told me what failure looks like" state. See [health](/architecture/health/). |
| **view** | A named query returning a uniform `{columns, rows}`; the read side, executed through the scoped gateway. |
| **Storage Gateway** | The single door to the database; every read and write goes through it, and scope is injected here. |
| **audit_log** | Who-did-what ground truth; one row per operator write, same-tx; the lineage target for operator writes, including config changes. |
| **session_log** | Connection-lifecycle transitions (node-reported, diagnostic). |
| **internal_log** | Platform self-narration (startup, reconcile, migration, node-reg, config-sync). |
| **ground truth** | Immutable append-only records: log_datapoint, audit_log, session_log, internal_log. |
| **principal / role / grant** | A **principal** is the IAM subject (kind `human` / `service` / `node`; identity is an opaque uuid, never a name); a **role** is an RBAC capability set; a **grant** is a role crossed with a scope. The base `principal` holds identity + kind only; a human's `display_name` lives on the `human` per-kind table. A `principal_group` is a group of principals used as a grant subject. An AI tool acts via OAuth as a `human` / `service` principal (first-class agent identity is deferred). See [identity and access](/architecture/identity-access/). |
| **secret:read** | The IAM permission to read a credential in plaintext; gated per role, and every decrypt is audited. |
| **file / blob** | Searchable metadata over content-addressed bytes (pgblobs/S3/disk); dedup. |

---

# Groups

URL: /architecture/groups/

Named sets of components, systems, locations, or principals: static or dynamic membership, weighted, a cascade overlay and an access scope.

A **group** is a named set of entities that cuts across the structural trees. It lets an operator
gather entities that the trees keep apart, so you can configure or grant access to "all AV displays"
or "everything in this pilot" by attribute or by hand. The structural tree handles config by position
and kind; groups handle config by attribute or by a hand-picked set. One "set of entities" primitive
serves two jobs, a [cascade](/architecture/cascade/) overlay and an
[access](/architecture/identity-access/) scope, which an anonymous predicate never could.

## What a group is

A group:

- has a **kind**: component / system / location / principal (matching the structural levels, plus
  principals for access);
- has **static** membership (an explicit list) or **dynamic** membership (a filter, re-evaluated live
  as attributes change, so a device leaves the moment it stops matching);
- has a **weight** (its specificity on the shared scale, see *Placement*; the only weights in the
  system);
- carries **variable / tag / rule** bindings, with the same per-kind combinators the cascade uses;
- is also the unit of **access control**: a visibility / permission scope (see
  [identity and access](/architecture/identity-access/)).

| Table | Key columns | Notes |
|---|---|---|
| `group` | id, kind (component/system/location/principal), membership (static list or dynamic filter), **weight** | cascade band and access scope ([cascade](/architecture/cascade/), [identity and access](/architecture/identity-access/)) |

## Placement: one specificity scale

Structural layers auto-derive a specificity from position, weight-free, and the operator never tunes
it: `global` lowest, then the templates, then the location / system / component trees by depth, then
the entity's own **instance** at the ceiling. A **group's weight is its specificity on that same
scale**, so a group sits wherever its weight lands relative to the structural bands: a high weight
beats deployment (a must-apply override), a low weight loses to it (a default that deployment
overrides). The instance ceiling beats any group; equal specificity breaks by creation order. A typed
group applies at its own level: a component-group to components directly; a system-group reaches a
component **through the system layer** of its cascade.

## Multiple membership

An entity belongs to a flat **set** of groups. Collect all their bindings and fold by specificity
(weight): highest wins for variables and tag-values; rules accumulate, with weight resolving any
add-vs-suppress conflict; equal weights break by creation order. There is no second precedence axis.
So however many groups an entity is in, the group band collapses to one weighted list on the shared
scale, fully predictable, and the [resolve view](/architecture/cascade/) names the winner.
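
The fold described above can be sketched in a few lines. This is an illustration only, not the Omniglass implementation: the `Binding` shape, the group names, and the numeric weights are hypothetical, and the sketch assumes that on a weight tie the earlier-created binding wins (the text says only that creation order breaks the tie). It covers the group band for variables alone; the structural bands and the instance ceiling are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binding:
    group: str      # hypothetical group name
    weight: int     # the group's specificity on the shared scale
    created: int    # creation order (lower = created earlier)
    key: str        # variable name
    value: object


def resolve_variables(bindings):
    """Fold a flat set of group bindings: highest weight wins per key."""
    winners = {}
    for b in bindings:
        cur = winners.get(b.key)
        # ASSUMPTION: on equal weight, the earlier-created binding wins.
        if cur is None or (b.weight, -b.created) > (cur.weight, -cur.created):
            winners[b.key] = b
    return winners


bindings = [
    Binding("pci-scope", 10, 1, "poll_interval", "60s"),
    Binding("leaky-firmware", 900, 2, "poll_interval", "300s"),
    Binding("exec-rooms", 900, 3, "poll_interval", "30s"),
    Binding("pci-scope", 10, 1, "cpu_threshold", 80),
]
resolved = resolve_variables(bindings)
for key, winner in resolved.items():
    print(key, "=", winner.value, "from", winner.group)
```

Because the comparison key is a single tuple, the result does not depend on the order the bindings are collected in, which is the "one weighted list" property the section relies on.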

## No nesting

Groups do not contain groups. A dynamic filter already expresses a union (`type in (codec, display)`)
and multiple membership covers the rest, so nesting would earn only a narrow "DRY union of static
sets" case, not worth its transitive-membership and cycle-guard cost. Whether to add it is an open
question, gated on that case actually biting.

## Types are not layers

A `_type` (device/app, AV-System, room) is a classification attribute, resolved by a **group** filter
(`type == X`), never a tree position. The tree is structural; attributes are groups. This is the bridge
from the structural [cascade](/architecture/cascade/) to type-based policy: instead of a type layer in
the tree, you author a dynamic group filtered on type and place it by weight.

## What groups are for

In operator terms, these are the group-specific user stories among those the cascade serves:

- **A fleet-wide fix that auto-clears.** Cisco firmware 11.2 has a memory leak, so I make a dynamic
  group `model == "Room Kit Pro" && firmware < "11.5"` at high weight, slow its poll and suppress the
  false high-memory alarm; the 23 affected codecs across 6 floors get it at once, and each drops out
  the moment it upgrades. *(dynamic group above deployment, live membership, rule suppression)*
- **A broad policy as a floor, not a ceiling.** I put baseline stricter thresholds on a low-weight
  "PCI-scope" group, so a lab on Floor 3 that needs its own values still wins; the policy is a default
  the specific deployment overrides. *(low-weight group below deployment, the shared specificity scale)*
- **A hand-picked set no filter can name.** I drop the 5 executive briefing rooms into a static "Exec
  Rooms" group for premium escalation, and grant the exec-support team visibility to that same group.
  *(static membership; groups double as the access scope)*
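
A minimal sketch of the first story's dynamic membership, assuming devices are plain attribute dicts and that firmware versions compare numerically by dotted segment. How the real Expr language compares version strings is not stated here, so `parse_version` is a hypothetical helper, as are the device ids and the `Display X` model.

```python
def parse_version(text):
    """Turn '11.2' into (11, 2) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in text.split("."))


def leaky_firmware(device):
    # Mirrors the filter: model == "Room Kit Pro" && firmware < "11.5"
    return (
        device["model"] == "Room Kit Pro"
        and parse_version(device["firmware"]) < parse_version("11.5")
    )


def members(devices, predicate):
    """Dynamic membership: re-run the filter against the current attributes."""
    return {d["id"] for d in devices if predicate(d)}


devices = [
    {"id": "codec-1", "model": "Room Kit Pro", "firmware": "11.2"},
    {"id": "codec-2", "model": "Room Kit Pro", "firmware": "9.8"},
    {"id": "codec-3", "model": "Room Kit Pro", "firmware": "11.5"},
    {"id": "disp-1", "model": "Display X", "firmware": "1.0"},
]

before = members(devices, leaky_firmware)
devices[0]["firmware"] = "11.5"          # codec-1 upgrades
after = members(devices, leaky_firmware)
print(sorted(before), sorted(after))
```

The numeric comparison matters: as plain strings, `"9.8" < "11.5"` is false, so a naive string filter would miss `codec-2`.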

---

# Health, KPIs, and service levels

URL: /architecture/health/

Health as a first-class state rolled up to a global estate top, the KPIs every estate should track (availability, utilization), and SLI / SLO / SLA.

Health gives an operator the one answer that matters most, "is this system working right now?", as a
first-class state on every entity that rolls up the service tree, not something you have to assemble
out of raw rules.

Omniglass is **opinionated about health**: it is a **first-class capability**, not a byproduct of a
customizable rules engine. The *model* is deliberate (an ordered state, a health impact on alarms, a
role-aware rollup up the system tree); the *carrier* is the ordinary datapoint pipeline, so health is
stored, queried, trended, and alarmed on like any other signal, with no parallel subsystem.

## Health is a first-class model, carried as a datapoint

`health` is a built-in **state of every structural entity**, not a datapoint type an operator
happens to author. But its **representation** is an ordinary derived `state_datapoint`
(provenance=calculated), so it inherits the whole pipeline for free: stored, queried, projected to
current-value, trended, and able to raise alarms. The model is opinionated; the carrier is reused.
There is **no separate health store, no `health_event`, no parallel service subsystem**, because
pulling health off the datapoint stream is exactly the Zabbix services bolt-on this design rejects.

What is **first-class about the model** (not ordinary):

- **Intrinsic.** Every component, system, and location *has* health, automatically, moved by its
  open health-impacting alarms when it is covered and `unknown` until then (below). No entity is
  health-less, and none waits on someone authoring a rule.
- **A built-in ordered domain** (below), not a user-defined value type.
- **Alarm-sourced.** Health is computed from open **alarms**, not measured or extracted (below).
- **A built-in role-aware rollup** up the structural tree (below): engine behavior, not an editable
  reducer.

What is **reused from the carrier**: storage, history, `current_value` projection, the SLI
(`time_in_state` over health history), alarming (an `event_rule` on `health`), and backtest. An
operator who understands datapoints and alarms already understands health.

## Health is built from alarms

Health is a **state**, and a state must be built from something stateful. An **event is a stateless
edge**: it just happens, so health cannot hang off events. It hangs off the **alarm**, the stateful
PROBLEM that holds open as long as its condition does (the Zabbix-trigger model). The chain:

> datapoints -> an **`event_rule`** decides "something we care about" -> an **alarm** opens -> the
> alarm *optionally* carries a **health impact**.

An `event_rule` (the alarm's definition) declares an optional **`health` impact**: `down`,
`degraded`, or none (the default). Most alarms carry none, because a lamp-hours warning is worth an
alert but does not down the device; the few that do are the device's actual failure conditions.

- A **component's health** is the **worst** over its **open health-impacting alarms**. Reachability
  is just one such alarm (an "unreachable" trigger, impact `down`): everything is an alarm, with no
  parallel datapoint-calc.
- Health is **ack-independent**, because ack is not close. An alarm stays open (acknowledged) while
  its condition holds; only the **clear event** (the data recovered) closes it. Acking annotates; it
  never makes a down room look healthy.

So "twelve alarms open, system fine" falls out for free: if none of the twelve is health-impacting,
health never moved.
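
As a sketch of the `worst` reducer over open alarms: the alarm dict shape (`rule`, `health`, `acked`) is hypothetical, and the function assumes the component is covered and its data is fresh, so it never returns `unknown`.

```python
ORDER = {"ok": 0, "degraded": 1, "down": 2}


def component_health(open_alarms):
    """Worst health impact across open alarms.

    Alarms with no impact are ignored, and so is the ack flag:
    an acknowledged alarm is still open.
    """
    impacts = [a["health"] for a in open_alarms if a.get("health") in ("degraded", "down")]
    return max(impacts, key=ORDER.__getitem__, default="ok")


# Twelve open alarms, none health-impacting (e.g. lamp-hours warnings).
warnings = [{"rule": f"warning-{i}", "health": None, "acked": False} for i in range(12)]
unreachable = {"rule": "unreachable", "health": "down", "acked": True}
fan_slow = {"rule": "fan-slow", "health": "degraded", "acked": False}

print(component_health(warnings))                   # ok
print(component_health(warnings + [fan_slow]))      # degraded
print(component_health(warnings + [unreachable]))   # down, even though acked
```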

### No firing alarm splits by coverage

When **no** health-impacting alarm is open, the honest answer depends on whether anything would have
caught a failure. **Coverage** is the question "does any health-impacting `event_rule` resolve
against this entity's datapoint_types?":

- **covered, none firing, data fresh -> `ok`.** Something measures what "down" means here, it is
  watching, and it is silent: genuinely healthy.
- **not covered -> `unknown`.** No health-impacting rule resolves, so nothing here knows what failure
  looks like. Reporting `ok` would be a false green; the entity is `unknown`, not healthy.

`unknown` carries a **reason** discriminator as metadata, so an operator can tell a measurement gap
from a coverage gap. The ordered value domain (below) is **unchanged**: `unknown` stays off the
order, and the reason is descriptive metadata, not a new state. The reasons:

- **`stale`** -- the entity had data and it went stale (the no-data machinery in
  [time](/architecture/time/)).
- **`uncovered`** -- no health-impacting rule resolves against its datapoint_types (this concern).
- **`no-data`** -- a rule covers it, but it has never reported, so the rule has nothing to evaluate.

To keep `uncovered` the rare, honest resting state rather than the default, every **collected
component** is **seeded with a baseline reachability health-impacting alarm** (an "unreachable"
trigger, impact `down`, via the collection / template default). A freshly-collected device is
therefore covered the moment it is collected, and resolves `unknown -> ok` or `unknown -> down` on
its first poll. Bare `unknown(uncovered)` then means exactly one thing: "you have collected this, but
you have not told me what failure looks like beyond reachability," a deliberate gap to fill, not a
silent hole.
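
The split above can be read as a small decision function for the case where no health-impacting alarm is open. This is a sketch under assumptions: the three boolean inputs are hypothetical names, and the order in which the reasons are checked (`uncovered`, then `no-data`, then `stale`) is a choice made here, not something the text specifies.

```python
def resting_health(covered, has_reported, fresh):
    """Health when NO health-impacting alarm is open.

    Returns (state, reason); reason is None unless state is 'unknown'.
    """
    if not covered:
        return ("unknown", "uncovered")   # no health-impacting rule resolves
    if not has_reported:
        return ("unknown", "no-data")     # covered, but nothing to evaluate yet
    if not fresh:
        return ("unknown", "stale")       # had data, it went stale
    return ("ok", None)                   # covered, watching, and silent


print(resting_health(covered=False, has_reported=True, fresh=True))
print(resting_health(covered=True, has_reported=False, fresh=False))
print(resting_health(covered=True, has_reported=True, fresh=False))
print(resting_health(covered=True, has_reported=True, fresh=True))
```

The state stays one of the four health values; the reason travels beside it as metadata, which is why the ordered domain does not change.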

## The health-state vocabulary

Health is a **state** with a small **fixed ordered** value domain, declared as its datapoint_type:

```text
ok  <  degraded  <  down            unknown = no signal (not on the order)
```

It is **distinct from severity**. Severity is alert importance (a named level by id,
[alarms and actions](/architecture/alarms-actions/)); health is entity operational state. They map
(a `down` health typically raises a `high` or `disaster` alarm) but they are different axes, so health
is not a severity level. The order exists so the `worst` reducer can pick the worst member;
`unknown` is off the order (carrying a reason of `stale`, `uncovered`, or `no-data`, above) and is
surfaced, not silently folded as `down`.

## Rollup up the structural tree

Health composes recursively, always **up the structural tree** (which has no cycles):

```text
component health   (worst over the component's own open health-impacting alarms)
   -> system health      (role-aware rollup of members + the system's own health-impacting alarms)
      -> location health  (role-aware rollup of its systems)
         -> global health  (rollup of every location: the estate top)
```

The headline is the **system's `health`**, the rollup of its members, the service view operators
care about; the **`global`** rollup is the estate-wide view leadership cares about.

**One owner-agnostic `health` key.** Following the measurement-not-owner naming model
([datapoints](/architecture/datapoints/)), there is a single registered `health` datapoint_type; a
component owns its own `health`, a system the rollup of its members, a location the rollup of its
systems, and the singleton **`global`** owner the estate-wide rollup (the top of the tree, above
every location). The owner gives the reading its level, so the same key flows up the tree. The calc
engine routes a changed `health` by the owner's level against each rule's source (a component's
`health` only feeds the system rollup, a system's only the location, a location's only the global),
so the shared key never cross-triggers.

## System health: rollup plus the system's own alarms

A system's health is the **worst of two inputs**:

1. the **role-aware rollup** of its members' health (below), and
2. the system's **own open health-impacting alarms**, raised by **system-scoped `event_rule`s** over
   member data.

The second input is what lets a system see what no single component can. The canonical case: a
display sitting on **input 2** is a perfectly normal state *for the display* (no alarm), but in a
specific room it means the wrong source is on screen. A system-scoped event_rule ("this display must
be on input 1") opens a **system-level alarm** with a health impact, dropping system health while the
display's own health stays `ok`. The system template owns the conditions only the system cares about;
the component stays generic.

The same discipline governs **SaaS and vendor status** (a UCC platform like Zoom, mapped to
system-owned datapoints, [shared-API collection](/architecture/collection/)): a vendor's reported
"offline" or "in a meeting" is an *observed signal from one source*, not a verdict on the room. Author
the system condition over it, **corroborated** where you can (against the codec, occupancy), rather
than trusting it. The vendor's opinion is an input to health, not health itself, the same way no
single component's state is.

This is the symmetry: **component-level events and alarms** and **system-level events and alarms**,
the same machinery on each arc, distinguished by which entity owns the arc (the exclusive-arc owner,
[alarms and actions](/architecture/alarms-actions/)).

The acyclic discipline: an alarm that *feeds* health is impact-tagged; the "system is down" alarm
that fires *off* health (an `event_rule` watching the `health` datapoint) carries no impact. Inputs
are tagged, consequences are not, and health rolls up only, so there is no loop.

## Role-aware rollup: built-in, tuned by role

The rollup is **engine behavior, not an editable rule**. Health is opinionated, so the reducer is
built in and the same everywhere; what an operator tunes is **roles and thresholds**, never the
reducer's guts. Each member carries a **`health_role`**, and the rollup respects it:

- **required** member `down` -> system `down`;
- **required** member `unknown` -> system `unknown` (the system cannot be called healthy when a
  member it depends on is unmeasured);
- a **`stale`** required member folds to `unknown` under the lost-visibility policy (so the system
  goes `unknown`), or keeps its last value's health under the last-value-valid policy (per the
  datapoint_type's staleness tolerance, [time](/architecture/time/));
- **redundant** member `down` -> system `degraded` (only `down` if *all* redundant peers are down);
- **informational** member -> does not affect system health, including an `informational` member that
  is `unknown` (an unmeasured member that never mattered does not sink the parent).
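
One way to read the role rules above as code. This is a sketch, not the built-in reducer, and it fills several gaps with loudly labeled assumptions: all redundant members form a single pool, `down` outranks `unknown` when both apply, a `degraded` required member makes the system `degraded`, a redundant member that is `degraded` or `unknown` is ignored, and stale members have already been folded by the staleness policy before the rollup runs.

```python
def system_rollup(members):
    """members: list of (health_role, health) pairs for one system."""
    required = [h for role, h in members if role == "required"]
    redundant = [h for role, h in members if role == "redundant"]
    # Informational members are deliberately not read at all.

    if "down" in required:
        return "down"
    # ASSUMPTION: one redundant pool; the system is down only if every peer is down.
    if redundant and all(h == "down" for h in redundant):
        return "down"
    if "unknown" in required:
        return "unknown"
    # ASSUMPTION: a degraded required member degrades the system.
    if "degraded" in required or "down" in redundant:
        return "degraded"
    return "ok"


room = [("required", "ok"), ("redundant", "down"), ("redundant", "ok")]
print(system_rollup(room))   # a failed backup mic degrades the room, it does not down it
```

The same function shows why an `informational` member that is `unknown` never sinks the parent: it is filtered out before any rule is applied.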

:::caution[Open question]
The exact `redundant`-group semantics when a system has several independent redundant sets (per-set
quorum versus one pool), and whether `degraded` is one rung or graduated.
:::

A redundant deployment needs exactly this: a failed backup mic must not down the room. **Member
`health_role`** (required / redundant / informational) is declared on the **system_template_member**
(the frozen BOM, where the system template declares each role with its requirement and health_role), not on the
component itself, since the same device can be required in one system and redundant in another. The
instance assignment is the `system_member` row; the `health_role` rides the frozen template version,
so it never expires under an instance. It is shared with KPI calcs.

For the rare case the role logic still gets wrong, an **Expr override at the system-template level**
over the member health states is the escape hatch, reached for rarely.

:::caution[Open question]
Whether a system-template binding may narrow the built-in rollup (a scoped-precedence refinement),
or only the roles and the Expr override are the knobs.
:::

The rollup **runs over the calc engine** (no parallel evaluator) and is **seeded for every system,
location, and the global top**, so health rolls up out of the box without per-system authoring:
`system-health` reduces a system's members' `health`, `location-health` a location's systems',
`global-health` every location's into the estate top (each treated as required above the system
level: any down child sinks the parent). It is the model's behavior, not a rule operators rewrite.

:::caution[Open question]
A single `required` `unknown` member already makes the system `unknown` (above). The remaining
question is the all-`unknown` system with no forcing required member (every member `unknown`, or only
`informational` ones reporting): gray, or the parent's prior state. The `unknown` versus `stale`
distinction itself is settled in [time](/architecture/time/).
:::

## SLI: indicator over a window

A **Service Level Indicator** is a `time_in_state` calc over a window, emitted as its own datapoint.
`time_in_state(s)` is the fraction of the window the entity held state `s`, derived from the
health-history transitions (the temporal reducer, [expressions](/architecture/expressions/)):

```yaml
# availability = fraction of the last 30 days the system was ok
source: { datapoint: health, over: 30d }
reduce: time_in_state
when: "value.ok / value.total"        # an Expr leaf shapes it into a ratio
# -> emits system.availability
```

An SLI is therefore just another derived datapoint, queryable and trendable like any other.
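
To make the reducer concrete, here is a minimal sketch of `time_in_state` over a list of health transitions. It is not the engine's implementation: the input shape (a sorted list of `(timestamp, state)` changes whose first entry is at or before the window start) and the per-state seconds plus `total` output are assumptions chosen to mirror the `value.ok / value.total` shape in the YAML above. Time spent `unknown` counts toward `total`, which is also an assumption.

```python
from datetime import datetime, timedelta


def time_in_state(transitions, start, end):
    """Seconds spent in each state inside [start, end), plus the window 'total'."""
    totals = {}
    for i, (ts, state) in enumerate(transitions):
        seg_start = max(ts, start)
        # A state holds until the next transition, or until the window ends.
        seg_end = transitions[i + 1][0] if i + 1 < len(transitions) else end
        seg_end = min(seg_end, end)
        if seg_end > seg_start:
            totals[state] = totals.get(state, 0.0) + (seg_end - seg_start).total_seconds()
    totals["total"] = (end - start).total_seconds()
    return totals


start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
history = [
    (start - timedelta(days=1), "ok"),     # already ok when the window opens
    (start + timedelta(days=10), "down"),  # a three-day outage
    (start + timedelta(days=13), "ok"),
]

value = time_in_state(history, start, end)
availability = value["ok"] / value["total"]
print(round(availability, 4))   # 27 of 30 days ok
```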

## SLO and SLA: the target, and meeting it

Three terms, not two. The **SLI** is the *measured indicator* (the `system.availability` calc above).
The **SLO** (Service Level Objective) is the **target**: the number you intend to hold
(availability >= 99.9%), a [config](/architecture/variables/) value on the entity or template, not
machinery. The **SLA** (Service Level Agreement) is **meeting the SLO**: an `event_rule` fires when
the SLI breaches the target, and compliance over the contractual window (the fraction of the period
the SLO held) is itself an SLI.

```yaml
event_rule:
  scope: 'system.template == "standard-boardroom"'
  datapoint: system.availability
  when: "value < $var:availability.slo"   # the SLO target, a config value
  severity: high
```

So the target is config (the SLO), the breach is an event/alarm (the SLA edge), and compliance is a
calc (an SLI over the SLA). No new machinery. Windowing is the SLI's concern: a **rolling** window
(last 30d) for trends, or a **calendar** window (the billing month) for a contractual SLA; the
calendar reset is the one piece that leans on the time primitive.

:::caution[Open question]
The SLA calendar-window boundaries and timezone, co-designed with the time primitive.
:::

## KPIs: what every estate should track

A **KPI** is a derived datapoint (a calc or SLI), registered as a canonical `datapoint_type` and
owned at the level it describes (system, location, or **global**). It is no new primitive: a KPI is a
shipped calc the same way health is. Omniglass ships an opinionated **default set** so the data is
there out of the box, with the escape hatch to author your own.

**Availability** is health over time: the SLI `time_in_state(ok)` above. Health is the substance,
availability is its ratio, so it ships free at every level up to global.

**Utilization** is the AV-native family, over occupancy and booking data:

- **occupancy** -- current people / capacity (an instant ratio);
- **time-utilization** -- used vs idle minutes;
- **booking-utilization** -- booked vs unbooked minutes;
- **ghost** -- occupied vs booked: booked, but nobody showed (the wasted-room signal).

Both inputs are **ordinary components**, no special integration: an occupancy sensor (a component
template emitting `occupancy.*`) and the booking system (a component template whose interface is the
calendar / room-booking API, emitting `booking.*`). The KPIs are then `calc_rule`s over those
datapoints, owned at system / location / global like any rollup. A booking API is just an
interface; a ghost meeting is just `occupied < booked`.
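
A rough sketch of how the utilization family could fall out of the two inputs. The exact calcs are unsettled, so everything here is an assumption for illustration: the minute-set representation, the function name, and the output keys (`ghost_ratio` in particular) are hypothetical, not shipped KPI definitions.

```python
def utilization(booked, occupied, window_minutes):
    """booked / occupied: sets of minute indexes within one window."""
    ghost = booked - occupied            # booked, but nobody showed
    return {
        "booking_utilization": len(booked) / window_minutes,
        "time_utilization": len(occupied) / window_minutes,
        "ghost_minutes": len(ghost),
        "ghost_ratio": len(ghost) / len(booked) if booked else 0.0,
    }


# An 8-hour day: two one-hour bookings, one attended for 45 minutes,
# plus a 30-minute walk-in with no booking.
booked = set(range(0, 60)) | set(range(120, 180))
occupied = set(range(0, 45)) | set(range(200, 230))

kpis = utilization(booked, occupied, window_minutes=480)
print(kpis)
```

Both inputs are ordinary datapoints by the time they reach this step, which is the point of the passage: no booking-specific machinery is needed beyond a calc over two signals.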

The point is a small, opinionated set of the measurements every estate should watch, computed and
rolled up for free.

:::caution[Open question]
The full default KPI set and each one's exact calc. Availability and the utilization family are
named, but the precise reducers and windows are unsettled.
:::

:::caution[Open question]
The `occupancy.*` and `booking.*` canonical signals, and the occupancy-sensor and booking-system
component templates that feed the utilization KPIs.
:::

## Why this is the Zabbix service tree, done right

Zabbix bolts services, SLA, and the service tree on as a separate subsystem. Omniglass does the
opposite: health is **first-class but not separate**. The model is opinionated (an intrinsic state,
health-impacting alarms, a role-aware rollup) and it rides the one datapoint pipeline, so the
**system tree is the service tree**: health is a datapoint, the rollup is built-in, the SLI is a
calc, the SLA is an alarm. One model, composed, instead of a parallel feature. An operator who
understands datapoints and alarms already understands health and SLAs.

---

# Identity and access

URL: /architecture/identity-access/

How principals authenticate, how grants combine roles with scopes, and how the app enforces capability at the route and ABAC scope in the Storage Gateway.

Identity and access is how an operator controls who may call the platform and which slice of the estate each caller can see and act on, enforced entirely in the app so "forgot to filter" cannot happen. Enforcement is **two in-app layers**: the capability check (`<resource>:<action>`) runs as **API route middleware** before the handler, and the **ABAC scope** filter is injected by the **Storage Gateway** (the only path to the database), where a row-level filter holds by construction. Scope is built on the cascade's groups ([cascade](/architecture/cascade/)). This doc says what IAM **is**.

## The model in one breath

A **principal** is the polymorphic subject of authN/authZ. Identity is the principal's opaque uuid, never an email or name. Each principal has one or more **credentials** (how it authenticates). Each principal holds zero or more **grants**, each a `(role x scope)` pair: the role contributes the verbs, the scope contributes the entities. Permissions are **additive** across grants. The API middleware checks RBAC capabilities before the handler runs; the Storage Gateway injects ABAC scope on every query.

## Principal kinds

A principal carries a `kind` value; the same role machinery works across all kinds. Identity is uniform; authN methods and per-kind domain attributes differ.

| kind | what it represents | authN |
|---|---|---|
| `human` | a person | local password + session, OIDC, SAML |
| `service` | scripts, integrations, SDKs, bots | bearer token |
| `node` | the edge daemon running in the field | NATS JWT/nkey credential |

**AI acts as a user; a first-class `agent` principal is deferred.** An AI tool authenticates via **OAuth as a `human` or `service` principal** and acts with exactly that principal's grants, no separate identity. A dedicated `agent` principal kind may be added later; it is not in the initial architecture. Everywhere else AI is simply a scoped, audited user ([AI](/architecture/ai/)).

Each kind that needs structured domain attributes gets a **1:1 per-kind table** linked by `principal_id`: `human`, `service`, and `node`. The base `principal` table holds identity + kind only; the per-kind tables hold the rest, including the kind's human-facing label (a human's `display_name`, a service's label, the node's name).

## Credentials

One `credential` row per authN method per principal. A principal can hold many (a human with a password + an OIDC link; a service with a rotating token). `(method, identifier)` is the lookup key.

| method | identifier | secret_hash | who uses it |
|---|---|---|---|
| `password` | `principal.id` (uuid) | argon2id of the password | humans |
| `oidc` | `iss\|sub` (issuer + subject) | null (IdP verifies) | humans |
| `token` | `sha256(token)` | null (identifier IS the verifier) | service |
| `nats` | nkey public key | null (NATS verifies the signed nonce) | nodes |

The password identifier is the `principal.id` (not the username), so a username change does not invalidate the credential. Service bearer tokens are 256-bit `crypto/rand` payloads with a human-readable prefix (`ogs_`) for secret-scanners and audit clarity; the server only ever stores `sha256(token)`. Cleartext is returned exactly once at mint time. A `node` enrolls with a per-tenant **NATS JWT/nkey** instead: the credential row stores the nkey public key, NATS verifies a signed nonce, and the JWT carries the node's subject permissions (its placement-derived `visible_set`, see [The node path](#the-node-path)).
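
A minimal sketch of the service-token flow described above, using only the Go standard library. The base64url encoding of the payload, hashing the full prefixed string, and the `mintToken` / `identifierFor` / `verify` names are assumptions; the source fixes only the 256-bit `crypto/rand` payload, the `ogs_` prefix, and that the server stores `sha256(token)`.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

const tokenPrefix = "ogs_" // prefix named in the text

// identifierFor derives the stored credential identifier: hex(sha256(token)).
// Assumption: the hash covers the whole presented string, prefix included.
func identifierFor(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// mintToken returns the cleartext (to be shown exactly once) and the
// identifier that would be written to the credential row.
func mintToken() (token, identifier string, err error) {
	payload := make([]byte, 32) // 256 bits of randomness
	if _, err = rand.Read(payload); err != nil {
		return "", "", err
	}
	// Assumption: unpadded base64url; the source does not name an encoding.
	token = tokenPrefix + base64.RawURLEncoding.EncodeToString(payload)
	return token, identifierFor(token), nil
}

// verify simulates the (method, identifier) lookup: the map stands in for
// credential rows with method = "token", keyed by identifier.
func verify(stored map[string]string, presented string) (principalID string, ok bool) {
	principalID, ok = stored[identifierFor(presented)]
	return principalID, ok
}

func main() {
	token, id, err := mintToken()
	if err != nil {
		panic(err)
	}
	stored := map[string]string{id: "principal-uuid-1"} // hypothetical row
	_, ok := verify(stored, token)
	fmt.Println(len(token), len(id), ok)
}
```

Because the identifier is itself the verifier, authentication is a single keyed lookup and the cleartext never needs to be persisted.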

:::caution[Open question]
OIDC delegates MFA to the IdP; whether to add a local-account TOTP path for installs not on OIDC is
undecided.
:::

## Subjects

Grant subjects are `human`, `service`, and `node` principals, plus **`principal_group`s**. Roles attach to principals regardless of kind; the same `principal_grant` rows mean the same thing whether the principal is a person, a service, a daemon, or an AI tool acting as one.

## Group kinds

The `group` membership mechanism (static list or dynamic filter) is shared across kinds, but the kinds are kept **distinct** (not one polymorphic primitive yet, because their usage differs):

- **`component` / `system` / `location` groups** are **entity-groups**: they carry config bindings (the cascade) and serve as ABAC **scopes**.
- **`principal_group`** is a collection of principals (SCIM-synced or local): a grant **subject**, carrying no config. It groups over principals, not just humans (members can be any principal kind); in practice it is humans synced from the IdP.

So `group` appears on **both sides of authZ**: `principal_group`s as subjects, entity-groups as object scopes.

:::caution[Open question]
Whether to unify the group kinds into a single polymorphic `group` primitive; revisit if their usage
converges.
:::

## Roles and the role hierarchy

A role is a **capability set**: permissions per `(resource, action)`. Roles live in a `role` table keyed by a globally unique `id`, each carrying an **`official` boolean**:

- **`official: true`**: ships with the binary, seeded during the boot phase. A release can patch a default permission via `ON CONFLICT DO UPDATE` on the seed.
- **`official: false`**: operator-created via the IAM API.

**No overrides**: a role id is globally unique across both kinds (the create paths refuse an `official: false` role whose id matches an `official: true` one, and the seed phase fails safe with a loud warning if it would collide with an existing operator role). This is a deliberate divergence from `datapoint_type` (where an org-scoped key may shadow an official one of the same name): role override risks lockout with no compensating use case, so a role id resolves to exactly one row.

### The four official roles

```
viewer    <-  operator  <-  admin  <-  owner
```

Linear inheritance (transitive): each role's effective permissions are the union of its own permissions and all transitively-inherited roles' permissions.

| role | what it can do |
|---|---|
| `viewer` | Read every operator-facing resource within scope. |
| `operator` | viewer + create/update on components, interfaces, tasks, rules, config; ack/snooze/resolve alarms. |
| `admin` | operator + delete on managed resources + manage IAM (principals, credentials, grants, custom roles) + curate registries (`<registry>:create`). IAM management is meaningful only from an `@ all` grant (a scoped `admin @ subtree` keeps the operator powers within its subtree but gets no IAM); registry curation is a plain capability, so a custom role can carry `<registry>:create` alone for a non-admin curator. Cannot delete `official` roles. |
| `owner` | god mode (`*:*`). The unkillable role: at least one active `owner @ all` grant must exist at all times (enforced by DB trigger). The bootstrap creates the first owner; only an owner can revoke another owner. |

### Custom roles

Operators create `official: false` roles via the IAM API with a chosen permission set, optionally inheriting from `viewer` (or any other role). Inheritance rules:

- An `official: true` role may inherit only from other `official: true` roles (enforced at seed time).
- An `official: false` role may inherit from any role.

Because of the no-override rule, `inherits: [viewer]` is unambiguous (every id resolves to exactly one role).

### Permission format

Permissions are strings: `<resource>:<action>`. One entry per resource per role; actions are comma-separated; wildcards stand alone.

```
component:read                <- single action
component:create,update       <- multiple actions, one resource
alarm:ack,snooze,resolve      <- domain verbs alongside CRUD
datapoint_type:create         <- a registry curator capability (tag/unit/event_type/severity_level/source likewise)
principal:*                   <- any action on this resource
*:*                           <- any action on any resource (owner only)
```

Actions are HTTP-aligned: `read` (GET), `create` (POST), `update` (PATCH/PUT), `delete` (DELETE), plus resource-specific verbs (`ack`, `snooze`, `resolve` for alarms; future kinds add their own). The aggregate `write` does not exist as an alias; `*` is the wildcard and reads more honestly.

Inheritance composes permissions **per resource by union of actions**:

```
parent: component:create,update
child:  component:delete
child effective:  component:{create, update, delete}
```

There are no negative permissions. To narrow a parent's capability set, define a fresh role rather than inherit.
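
The permission format and the union-by-inheritance rule can be sketched in a few lines of Go. `PermSet`, `Role`, and the function names are hypothetical shapes chosen for the example; the parsing rules (one `<resource>:<action>` entry, comma-separated actions, `*` as the only wildcard) follow the text above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PermSet maps resource -> set of actions.
type PermSet map[string]map[string]bool

// Role is a hypothetical in-memory shape for one row of the `role` table.
type Role struct {
	Perms    []string // entries such as "component:create,update"
	Inherits []string // parent role ids
}

// add parses one "<resource>:<action>[,<action>...]" entry into the set.
func (p PermSet) add(entry string) error {
	resource, actions, ok := strings.Cut(entry, ":")
	if !ok || resource == "" || actions == "" {
		return fmt.Errorf("malformed permission %q", entry)
	}
	if p[resource] == nil {
		p[resource] = map[string]bool{}
	}
	for _, a := range strings.Split(actions, ",") {
		p[resource][a] = true
	}
	return nil
}

// effective unions a role's own permissions with those of every
// transitively inherited role. seen prevents re-merging and cycles.
func effective(roles map[string]Role, id string, out PermSet, seen map[string]bool) error {
	if seen[id] {
		return nil
	}
	seen[id] = true
	r, ok := roles[id]
	if !ok {
		return fmt.Errorf("unknown role %q", id)
	}
	for _, e := range r.Perms {
		if err := out.add(e); err != nil {
			return err
		}
	}
	for _, parent := range r.Inherits {
		if err := effective(roles, parent, out, seen); err != nil {
			return err
		}
	}
	return nil
}

// allows reports whether resource:action is permitted, honoring "*".
func (p PermSet) allows(resource, action string) bool {
	for _, r := range []string{resource, "*"} {
		if acts := p[r]; acts[action] || acts["*"] {
			return true
		}
	}
	return false
}

// actions lists a resource's actions in sorted order.
func (p PermSet) actions(resource string) []string {
	var out []string
	for a := range p[resource] {
		out = append(out, a)
	}
	sort.Strings(out)
	return out
}

func main() {
	roles := map[string]Role{
		"parent": {Perms: []string{"component:create,update"}},
		"child":  {Perms: []string{"component:delete"}, Inherits: []string{"parent"}},
	}
	eff := PermSet{}
	if err := effective(roles, "child", eff, map[string]bool{}); err != nil {
		panic(err)
	}
	fmt.Println(eff.actions("component")) // [create delete update]
}
```

Since the merge only ever adds actions, there is no way to express a subtraction here, which mirrors the no-negative-permissions rule.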

:::caution[Open question]
Whether to add custom-role permission granularity beyond `(resource x action)` (e.g. a Zoom-style
data-claim suffix `<resource>:<action>:<modifier>`), pending a use case.
:::

## Authorization: grants = role x scope

A principal holds grants in `principal_grant`. Each grant is a `(role, scope_kind, scope_id)` triple. A principal can hold many grants; they are **additive**:

```
canDo(P, action, E)  iff  exists grant g in grants(P) such that
                            action in perms(g.role)
                            AND E in expand(g.scope_kind, g.scope_id)
```

**Action and scope bind per grant, not globally.** The `action` and the `E`-membership test are satisfied by the **same** grant `g`. It is **not** sufficient that the action appears in *some* grant and the entity in *some other* grant: a principal with `operator @ group-A` (which carries `alarm:ack`) and `viewer @ all` (read-only) can ack only alarms whose component falls in `group-A`, never estate-wide, because no single grant pairs `ack` with an all-scope. Flattening permissions into one global set and entities into one global visible set is **not** equivalent to `canDo` and over-permits; the enforcement layers below preserve the per-grant binding.

So the same role applied at different scopes composes naturally; mixing roles (e.g., `operator @ HQ` + `viewer @ all` for a site lead who needs read-only visibility outside their primary site) is the intended pattern. Grants from `principal_group` memberships compose the same way.
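
A minimal sketch of the `canDo` rule next to the flattened model it must not be confused with. The `World`, `Grant`, and `rolePerms` shapes are simplified stand-ins invented for this example (only `all` and `group` scopes are modelled); the two-grant scenario is the one described above.

```go
package main

import "fmt"

// Grant is a (role, scope_kind, scope_id) triple.
type Grant struct {
	Role      string
	ScopeKind string
	ScopeID   string
}

// World is a hypothetical, tiny estate: entity ids plus group membership.
type World struct {
	All    []string
	Groups map[string][]string
}

// rolePerms is an assumed, already-flattened role -> permission index.
var rolePerms = map[string]map[string]bool{
	"viewer":   {"alarm:read": true},
	"operator": {"alarm:read": true, "alarm:ack": true},
}

// expand realizes a grant's scope as a set of entity ids.
func (w World) expand(g Grant) map[string]bool {
	set := map[string]bool{}
	switch g.ScopeKind {
	case "all":
		for _, e := range w.All {
			set[e] = true
		}
	case "group":
		for _, e := range w.Groups[g.ScopeID] {
			set[e] = true
		}
	}
	return set
}

// canDo is true only if a SINGLE grant carries the permission AND covers
// the entity: action and scope bind per grant.
func canDo(w World, grants []Grant, perm, entity string) bool {
	for _, g := range grants {
		if rolePerms[g.Role][perm] && w.expand(g)[entity] {
			return true
		}
	}
	return false
}

// flattened is the over-permitting model: it tests the permission and the
// entity independently across all grants. Shown only for contrast.
func flattened(w World, grants []Grant, perm, entity string) bool {
	hasPerm, visible := false, false
	for _, g := range grants {
		hasPerm = hasPerm || rolePerms[g.Role][perm]
		visible = visible || w.expand(g)[entity]
	}
	return hasPerm && visible
}

func main() {
	w := World{
		All:    []string{"cmp-a", "cmp-b"},
		Groups: map[string][]string{"group-A": {"cmp-a"}},
	}
	grants := []Grant{{"operator", "group", "group-A"}, {"viewer", "all", ""}}
	fmt.Println(canDo(w, grants, "alarm:ack", "cmp-b"))     // false
	fmt.Println(flattened(w, grants, "alarm:ack", "cmp-b")) // true (wrong)
}
```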

### Scopes

| scope_kind | scope_id | expansion |
|---|---|---|
| `all` | null | every entity in the database |
| `location` | location id | subtree(L): L + its systems + their components + descendants |
| `system` | system id | subtree(S): S + its components + descendants |
| `component` | component id | exactly { C } |
| `group` | group id | members(G) at resolution time (dynamic groups re-resolve) |

`expand` realizes a scope to a **bound id set** the gateway injects as a parameterized `owner IN (...)` predicate (or a closure-table join for deep trees), never string-built. The structural-tree walk carries a cycle guard, and the set is **fleet-size-bounded** (entities), so it stays an indexed membership filter.

`scope_kind` is enumerated (`all`, `location`, `system`, `component`, `group`); adding a new kind requires a schema change (CHECK constraint) and a new case in the gateway's `expand` function. `scope_id` is operator data.
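
To make the expansion concrete, here is a small in-memory sketch of `expand` with a cycle guard, plus a helper that turns the resulting id set into a parameterized `owner IN (...)` predicate. The `Estate` type, the `$n` placeholder style, and the `FALSE` fallback for an empty set are assumptions for illustration; only placeholders are generated, and the ids travel as bound arguments.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Estate is a hypothetical in-memory stand-in for the structural tree.
type Estate struct {
	Children map[string][]string // parent id -> child ids
	Groups   map[string][]string // group id -> members at resolution time
	All      []string            // every entity id
}

// sorted returns a sorted copy so results are deterministic.
func sorted(ids []string) []string {
	out := append([]string(nil), ids...)
	sort.Strings(out)
	return out
}

// subtree walks root and everything beneath it; seen is the cycle guard.
func (e Estate) subtree(root string) []string {
	seen := map[string]bool{}
	var out []string
	stack := []string{root}
	for len(stack) > 0 {
		id := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[id] {
			continue
		}
		seen[id] = true
		out = append(out, id)
		stack = append(stack, e.Children[id]...)
	}
	return sorted(out)
}

// expand realizes (scope_kind, scope_id) as a bound id set.
func (e Estate) expand(kind, id string) ([]string, error) {
	switch kind {
	case "all":
		return sorted(e.All), nil
	case "location", "system":
		return e.subtree(id), nil
	case "component":
		return []string{id}, nil
	case "group":
		return sorted(e.Groups[id]), nil
	default:
		return nil, fmt.Errorf("unknown scope_kind %q", kind)
	}
}

// inPredicate builds the SQL fragment and its bound arguments.
func inPredicate(ids []string) (string, []any) {
	if len(ids) == 0 {
		return "FALSE", nil // an empty scope matches nothing
	}
	placeholders := make([]string, len(ids))
	args := make([]any, len(ids))
	for i, id := range ids {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = id
	}
	return "owner IN (" + strings.Join(placeholders, ", ") + ")", args
}

func main() {
	e := Estate{Children: map[string][]string{
		"hq":    {"sys-1"},
		"sys-1": {"cmp-1", "cmp-2"},
		"cmp-2": {"hq"}, // deliberately malformed edge to exercise the guard
	}}
	ids, _ := e.expand("location", "hq")
	fmt.Println(inPredicate(ids))
}
```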

:::caution[Open question]
Whether a scope may mix include and exclude (e.g. "all except group X").
:::

## Visibility cascades down the structural tree

A scope of entity E includes E **and everything structurally beneath it** (a location -> its systems -> their components -> their datapoints and alarms). The visible set is **parameterized by action**: `visible_set(P, action)` = the union, over **only the grants whose role carries `action`**, of each scope entity plus its descendants. There is no single global visible set. **`:read` is an implicit floor on every grant**: holding any grant on an entity confers `read` on it, so `visible_set(P, read)` is always the widest set and `visible_set(P, action)` is always a subset of it. The floor is realized as a **capability injection at role-index build** (next): every `<resource>:<action>` permission implies `<resource>:read`, so the implied reads are present in the fast-reject union, in `canDo`'s `perms`, and in `/auth/me.permissions`, not only in the scope layer. A verb-only role (`alarm:ack` without `alarm:read`, no `viewer` inheritance) is therefore **not** hard-403'd on the read. The asymmetry runs one way only: a principal can **read** an entity it cannot **act** on (in `visible_set(P, read)` but outside `visible_set(P, ack)`, via a read-only grant), but never the reverse. So there is no "actionable but not readable" case, and the status split below stays three-way. Dynamic-group scopes recompute as membership changes. Each per-action set is bounded by **fleet size (entities)**, not data volume.
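
The per-action visible set and the `:read` floor can be sketched as follows. This is a simplified model under stated assumptions: permissions are already one action per entry, scopes are pre-expanded into id lists keyed by a hypothetical `Scope` string, and the names `withReadFloor` and `visibleSet` are invented for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// Grant pairs a role with a key into a pre-expanded scope table.
type Grant struct {
	Role  string
	Scope string
}

// withReadFloor builds a role's permission set and injects the implied
// "<resource>:read" for every "<resource>:<action>" entry.
func withReadFloor(perms []string) map[string]bool {
	out := map[string]bool{}
	for _, p := range perms {
		out[p] = true
		if resource, _, ok := strings.Cut(p, ":"); ok {
			out[resource+":read"] = true
		}
	}
	return out
}

// visibleSet unions the scopes of ONLY the grants whose role carries perm.
func visibleSet(grants []Grant, roles map[string]map[string]bool,
	scopes map[string][]string, perm string) map[string]bool {
	set := map[string]bool{}
	for _, g := range grants {
		if !roles[g.Role][perm] {
			continue // this grant does not contribute to this action
		}
		for _, id := range scopes[g.Scope] {
			set[id] = true
		}
	}
	return set
}

func main() {
	roles := map[string]map[string]bool{
		"acker":  withReadFloor([]string{"alarm:ack"}), // verb-only role
		"viewer": withReadFloor([]string{"alarm:read"}),
	}
	scopes := map[string][]string{
		"group-A": {"cmp-a"},
		"all":     {"cmp-a", "cmp-b"},
	}
	grants := []Grant{{"acker", "group-A"}, {"viewer", "all"}}
	read := visibleSet(grants, roles, scopes, "alarm:read")
	ack := visibleSet(grants, roles, scopes, "alarm:ack")
	fmt.Println(len(read), len(ack)) // 2 1
}
```

Because the floor is injected into the role's permission set itself, every grant that contributes to an action set also contributes to the read set, which is why the action set can never escape it.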

## The owner invariant

At least one active `owner @ all` grant must exist at all times. Enforced as a deferrable constraint trigger in Postgres (fires at `COMMIT`, so the swap-owners pattern works in one transaction):

```
BEGIN;
  INSERT INTO principal_grant (... role='owner', scope_kind='all' ...);  -- new owner
  DELETE FROM principal_grant WHERE principal_id=<old> AND role='owner';  -- old
COMMIT;  -- trigger fires here, sees the new grant, passes.
```

Attempting to remove the last owner (by grant delete, principal delete, principal disable, or role change) raises a check-violation. The Gateway translates this into a 400 with a clear remediation message.

## Enforcement: where each check lives

There is **no RLS and no direct database access** (no PostgREST). The **Storage Gateway is the only door to the database** and the API is its only caller, so authz lives entirely in the app. A targeted mutation passes three checkpoints in order: the **capability fast-reject** at the route, the **`canDo` decision** in the handler, and the **per-action scope plus audit** injected by the gateway. Each is one code seam:

```d2
direction: down
classes: {
  node: { style.border-radius: 8 }
  group: { style.border-radius: 8 }
}
client: "Client: SPA / CLI / MCP" { class: node }
api: "API process (one binary)" {
  class: group
  mw: "Route middleware\nrbac.Require('alarm:ack')" { class: node }
  mwq: "action in\nANY grant?" { class: node; shape: diamond }
  e403a: "403 capability missing" { class: node }
  handler: "Handler" { class: node }
  hq: "canDo(P, ack, X) ?" { class: node; shape: diamond }
  e403b: "403 cannot act on target" { class: node }
  e404: "404 non-disclosing" { class: node }
  mw -> mwq
  mwq -> e403a: "no"
  mwq -> handler: "yes: fast-reject passed"
  handler -> hq
  hq -> e403b: "readable, not ack-scope"
  hq -> e404: "out of read-scope"
}
gwbox: "Storage Gateway: the only DB door" {
  class: group
  gw: "inject visible_set(P, ack)\nplus audit_log in one txn" { class: node }
  db: "Postgres" { class: node; shape: cylinder }
  ok: "200 plus action row" { class: node }
  gw -> db: "parameterized predicate"
  db -> ok: "1 row changed"
}
kv: "NATS KV cache\ngrants plus role index\nCDC-invalidated" { class: node; shape: cylinder }
client -> api.mw: "POST /alarms/X:ack"
api.hq -> gwbox.gw: "yes"
gwbox.db -> api.e403b: "0 rows: backstop fires"
kv -- api.handler: "composed per request" { style.stroke-dash: 4 }
kv -- gwbox.gw { style.stroke-dash: 4 }
```

The capability check is **necessary not sufficient** (it only rejects), the `canDo` check is the **authoritative decision**, and the gateway predicate is the **enforce-by-construction backstop**: handler and gateway return the same status for the same input, so a forgotten handler check cannot leak a write. The detail of each:

- **Capability (RBAC) in the API middleware is a FAST-REJECT, never an authorization.** It answers one necessary-but-not-sufficient question: does the action appear in **any** of the principal's grants? If not, 403 before the gateway is ever touched. Answered from an in-process cache (the flattened union of permissions across all grants). It never grants access: passing the fast-reject only means "not categorically forbidden", scope still decides. Routes declare their required permission with `rbac.Require("component:create")`.
- **Scope (ABAC) in the Storage Gateway is per-action.** Every query carries `visible_set(P, action)` for the **specific action** being performed (read for a list/get, ack for an `:ack`, command for a `:command`), and the gateway filters rows by their exclusive-arc owner against that action-specific set (the owning `component`/`system`/`location`). A read uses `visible_set(P, read)`; a write uses `visible_set(P, write-action)`, the union of scopes of **only** the grants whose role carries that write action, never the read set and never a global union. This is the enforce-by-construction backstop: an `:ack` whose target lies outside `visible_set(P, ack)` matches **0 rows** even if the handler forgot its up-front check. A gateway write whose action-scoped predicate affects 0 rows is **never a silent success**: the gateway reports the miss to the handler, which returns 404 (target also outside `visible_set(P, read)`, non-disclosing) or 403 (target readable but outside the action scope), matching the up-front `canDo` decision for the same input. A silent 200/no-op is a correctness bug and is forbidden. Each per-action set is bounded by **fleet size (entities), not data volume**, so it stays an indexed membership filter even on the firehose; and because it is an owner filter in app code, not a DB policy, it works identically on Postgres, the columnar tier, or object storage.
- The gateway has three query **modes**: **scoped** (an API request carrying a principal's visible set), **node** (a node-driven write confined to the node's placement-derived `visible_set`, the owners of the tasks assigned to it from its NATS subject grants), and **system** (trusted internal work: the CDC publisher, the datapoint persistence sink, reconcile / migrate / seed, all-visibility). Node mode sits between scoped and system: a node is trusted to write platform internals on behalf of itself, but only for the owners it actually covers, so a compromised node cannot write arbitrary owners intra-tenant. System mode is an explicit, audited choice, never the default. There is no fourth path: any storage caller is one of these three.
- **Targeted mutation on a known id evaluates `canDo` up front.** A custom method against a specific id (`POST /alarms/X:ack`) evaluates `canDo(P, action, X)` in the handler **before** dispatch, so the decision is clean and explicit, with the gateway per-action predicate as the backstop for a forgotten check. The status split is fixed and three-way, not binary: (a) action in **no** grant -> 403 at the middleware fast-reject (capability missing entirely); (b) target in `visible_set(P, read)` but **outside** `visible_set(P, action)` -> **403** (the principal can read X but cannot perform this action on this target; this 403 leaks no existence, because the caller can already read X); (c) target **outside** `visible_set(P, read)` -> **404**, non-disclosing, exactly as an out-of-scope read. The up-front check and the gateway backstop return the **same** status for the same input.
- **Scope is structural, not per-handler**: the principal's scope is a required input to the gateway's query layer, so no code path can query unscoped by accident. With no RLS backstop for in-database scope the gateway is the sole guarantor, so "forgot to filter" must be impossible by construction, not by discipline.
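
The three-way status split and the gateway backstop can be sketched together. The `Alarm` type, the in-memory map standing in for the table, and the function names are hypothetical; the point is only that the up-front decision and the zero-rows translation agree for the same input.

```go
package main

import (
	"fmt"
	"net/http"
)

// Alarm is a hypothetical row: its owner plus the field an :ack mutates.
type Alarm struct {
	Owner string
	Acked bool
}

// decide is the up-front status split for a targeted mutation.
func decide(hasCapability, inRead, inAction bool) int {
	switch {
	case !hasCapability:
		return http.StatusForbidden // (a) capability missing entirely
	case !inRead:
		return http.StatusNotFound // (c) non-disclosing
	case !inAction:
		return http.StatusForbidden // (b) readable, but not actionable
	default:
		return http.StatusOK
	}
}

// gatewayAck simulates the action-scoped UPDATE: the row changes only if
// its owner is inside the ack-specific visible set. Returns rows changed.
func gatewayAck(alarms map[string]*Alarm, ackSet map[string]bool, id string) int {
	a, ok := alarms[id]
	if !ok || !ackSet[a.Owner] {
		return 0
	}
	a.Acked = true
	return 1
}

// backstop translates a gateway result into a status; 0 rows is never a
// silent success.
func backstop(rows int, readable bool) int {
	switch {
	case rows > 0:
		return http.StatusOK
	case readable:
		return http.StatusForbidden
	default:
		return http.StatusNotFound
	}
}

func main() {
	alarms := map[string]*Alarm{"X": {Owner: "cmp-b"}}
	readSet := map[string]bool{"cmp-a": true, "cmp-b": true}
	ackSet := map[string]bool{"cmp-a": true}

	owner := alarms["X"].Owner
	upFront := decide(true, readSet[owner], ackSet[owner])
	rows := gatewayAck(alarms, ackSet, "X")
	fmt.Println(upFront, backstop(rows, readSet[owner])) // 403 403
}
```

The case ordering in `decide` relies on the `:read` floor: anything inside the action set is also inside the read set, so checking readability first cannot hide an actionable target.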

**Worked example (per-grant binding denies estate-wide ack).** Principal P holds two grants: `operator @ group-A` (role carries `alarm:ack`) and `viewer @ all` (read-only). Alarm X is owned by a component in **group-B**. P calls `POST /alarms/X:ack`:

1. **Middleware fast-reject**: `alarm:ack` appears in *a* grant (the `operator @ group-A` one), so it passes. (This is why fast-reject is necessary-not-sufficient: it cannot see that the ack-carrying grant does not cover X.)
2. **Up-front `canDo(P, ack, X)`**: the only grant whose role carries `ack` is `operator @ group-A`; X is not in `expand(group-A)`. `viewer @ all` does not carry `ack`. So `canDo` = **false**.
3. **Status**: X is in `visible_set(P, read)` (via `viewer @ all`) but outside `visible_set(P, ack)`. Branch (b): **403**, "cannot ack this alarm", not a 404 (P can already `GET /alarms/X`, so non-disclosure does not apply).
4. **Backstop**: had the handler skipped step 2, the gateway's `:ack` write carries `visible_set(P, ack)`, X is outside it, the UPDATE matches 0 rows, and the gateway returns the same 403, never a silent success.

The flattened-set model would have wrongly allowed this: `ack` is "in the permission set" and X is "in the global visible set", so the per-grant binding is exactly what stops estate-wide ack.

- **Non-entity resources** have no entity `E`, so `canDo` cannot scope by owner. Two governance classes:
  - **IAM subjects** (`principal`, `role`, `principal_grant`, and a principal's **login credential** create/delete): the action must appear in a grant whose `scope_kind` is `all`. A scoped grant confers **no** IAM capability, so `role:create` carried by an `operator @ HQ` grant does not let you create roles. Typically `owner @ all` / `admin @ all`. (Device secrets are a different resource: a **credential variable** is entity-scoped, so its `secret:read` plaintext decrypt and its rotation are ordinary scoped actions against the credential's owner, [config and credentials](/architecture/variables/).)
  - **Data registries** (`datapoint_type`, `tag`, `unit`, `event_type`, `severity_level`, source): governed by a distinct **`<registry>:create` curator capability** (`datapoint_type:create`, `tag:create`, `unit:create`, `event_type:create`, `severity_level:create`, `source:create`). A registry entry has no owner entity, so the grant's `scope_kind` is irrelevant: the check is simply whether the principal holds the capability. Granting it to a curator role lets a principal mint registry entries **without** IAM admin; a minted entry carries its own `scope` (an org-scoped entry shadows an official one, the [namespace-shadow pattern](/architecture/datapoints/#key-scope-template-org-official)), and `official`-scoped entries are reserved to `owner` and the boot seed.

  The fast-reject still only rejects; for these resources the authorization is the grant-class check (an `all`-scoped grant for IAM, the `<registry>:create` capability for registries), the one place the decision is capability-shaped because there is no entity to scope.
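
A sketch of the grant-class check for non-entity resources, under simplifying assumptions: the `iam-helper` and `curator` roles are hypothetical custom roles, the role index is a hard-coded map, and the login-credential case is left out for brevity.

```go
package main

import "fmt"

// Grant is a (role, scope_kind, scope_id) triple.
type Grant struct {
	Role      string
	ScopeKind string
	ScopeID   string
}

// rolePerms is an assumed, already-flattened role -> permission index.
var rolePerms = map[string]map[string]bool{
	"admin":      {"role:create": true, "principal:create": true},
	"iam-helper": {"role:create": true}, // hypothetical custom role
	"curator":    {"tag:create": true},  // hypothetical custom role
}

// iamResources are the IAM subjects that demand an all-scoped grant.
var iamResources = map[string]bool{
	"principal":       true,
	"role":            true,
	"principal_grant": true,
}

// authorizeNonEntity applies the grant-class rule: IAM subjects need the
// permission on a grant whose scope_kind is "all"; registries need only
// the capability, whatever the grant's scope.
func authorizeNonEntity(grants []Grant, resource, action string) bool {
	perm := resource + ":" + action
	for _, g := range grants {
		if !rolePerms[g.Role][perm] {
			continue
		}
		if iamResources[resource] && g.ScopeKind != "all" {
			continue // a scoped grant confers no IAM capability
		}
		return true
	}
	return false
}

func main() {
	scoped := []Grant{{"iam-helper", "location", "HQ"}, {"curator", "location", "HQ"}}
	global := []Grant{{"admin", "all", ""}}
	fmt.Println(authorizeNonEntity(scoped, "role", "create")) // false
	fmt.Println(authorizeNonEntity(scoped, "tag", "create"))  // true
	fmt.Println(authorizeNonEntity(global, "role", "create")) // true
}
```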

Both layers operate **within one database**. Tenant isolation is **per-deployment**: a tenant is one database plus one **NATS account** plus one deployment, so per-database isolation (storage) and per-account isolation (messaging) are the same boundary. There is no `tenant_id` column anywhere, so the cross-tenant boundary is the database / account boundary itself, not a row predicate. Intra-database scope (above) is the only app-enforced layer; there is no RLS backstop.

:::caution[Open question]
Whether to add a **third authorization lever**: a declarative **tenant-level policy** layer, evaluated at
the **highest priority** above RBAC and ABAC, expressing **negative guardrails** an admin declares
centrally, the things that must **never** happen. A grant plus scope might permit `system:delete`, yet a
tenant policy ("no member of the `integrator` group may ever delete a system") **denies** it, and the
deny wins. This is where negative authorization would live, keeping [roles](#roles-and-the-role-hierarchy)
additive and positive (a role still carries no negative permissions). Open: whether to add it at all, the
policy shape (deny rules over resource + action + subject / scope conditions), the evaluation order, and
whether it is deny-only or can also force-allow.
:::

## Caching strategy

The hot path must not hit the DB for RBAC. Three layers, in-process, no persisted "effective permissions" projection (which would invite the stale-join class of cache-coherence bug; the grant and role caches below still carry a bounded staleness, the contract for which is stated at the end):

1. **Role index**: at boot, the `role` table is loaded into a Go map with `inherits` resolved transitively, wildcards expanded, and the **`:read` floor injected** (each `<resource>:<action>` adds the implied `<resource>:read`, so the floor is in the flattened union the fast-reject reads, not only the scope layer). Refreshed on a NATS KV watch keyed on `role` changes.
2. **Principal cache**: at session establish (or first token-auth), the principal's **grants** and the `role -> perms` index are cached by `principal_id`; the flattened `Set[resource:action]` (used only for the fast-reject and `/auth/me`) is derived from them. Invalidated on a NATS KV watch keyed on `principal_grant`, `principal`, or `role` changes. **Group membership is resolved live in-query** (no materialized member-set cache), so a dynamic group's expansion is always current.
3. **Per-request**: the per-action authorization is **composed at request time** from the cached grants + `role -> perms`. The middleware does an O(1) Set-membership fast-reject on the flattened permissions; the gateway builds `visible_set(P, action)` for the **specific action** by unioning the scopes of only the grants whose role carries it. The flattened set never authorizes; it only fast-rejects. Both are O(1), with a small constant factor, in the common case.

The DB is the source of truth; caches are derived views with explicit invalidation events. The principal/permission cache, config, and distributed locks live in **NATS KV** (not Postgres `LISTEN/NOTIFY`): a committed change to `role` / `principal` / `principal_grant` reaches NATS through the leader-elected CDC publisher, which updates the KV keys those watches observe. The same KV contract holds whether the design runs single-binary (embedded NATS) or against an external NATS cluster at scale.

**Staleness contract.** Both the handler `canDo` and the gateway predicate read the **same** cached grants, so the gateway backstops a *forgotten* check, not a *stale* one: a revoked-but-not-yet-invalidated grant authorizes at both layers. The grant cache therefore carries a **bounded max-staleness**, a TTL floor independent of CDC invalidation, so a CDC-publisher outage or failover cannot extend the revoke-lag window unbounded. For **high-sensitivity mutations** (IAM changes and deletes of IAM objects) the gateway **re-resolves grants in the transaction** against source-of-truth, trading a round trip for zero revoke-lag; that round trip is off the read and firehose **hot path** (which never hits the DB for RBAC). Other control-plane mutations (`:ack`, `:command`, a config `PATCH`) take the cached path and so accept a **bounded revoke-lag** (the TTL floor above): documented and bounded, not closed. An open SSE session **re-checks on every grant-cache invalidation** for its principal (next section's relay) and closes if `:read` is lost. The freshness asymmetry is deliberate: grant membership (the **subject** side) is cached and is the binding staleness constraint, while group membership (the **object** side) is resolved live in-query, so it can only tighten, never loosen, a stale grant.
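
The TTL floor in the staleness contract can be illustrated with a small cache whose entries expire on their own clock even if no invalidation event ever arrives. `GrantCache`, `fakeClock`, the loader callback, and the 30-second figure are assumptions for this sketch, not the platform's actual types or settings.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fakeClock is an injectable clock so the sketch is deterministic.
type fakeClock struct{ t time.Time }

func (c *fakeClock) now() time.Time { return c.t }
func (c *fakeClock) advance(seconds int) {
	c.t = c.t.Add(time.Duration(seconds) * time.Second)
}

type entry struct {
	grants   []string
	loadedAt time.Time
}

// GrantCache caches grants per principal with a bounded max-staleness.
type GrantCache struct {
	mu       sync.Mutex
	maxStale time.Duration
	now      func() time.Time
	load     func(principalID string) []string // source of truth
	entries  map[string]entry
}

func NewGrantCache(maxStaleSeconds int, now func() time.Time,
	load func(string) []string) *GrantCache {
	return &GrantCache{
		maxStale: time.Duration(maxStaleSeconds) * time.Second,
		now:      now,
		load:     load,
		entries:  map[string]entry{},
	}
}

// Get returns cached grants, reloading when the entry is missing or has
// aged past maxStale, independent of any invalidation event.
func (c *GrantCache) Get(id string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[id]
	if !ok || c.now().Sub(e.loadedAt) >= c.maxStale {
		e = entry{grants: c.load(id), loadedAt: c.now()}
		c.entries[id] = e
	}
	return e.grants
}

// Invalidate is the fast path, driven by a change event when one arrives.
func (c *GrantCache) Invalidate(id string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.entries, id)
}

func main() {
	clock := &fakeClock{}
	source := []string{"operator@HQ"}
	cache := NewGrantCache(30, clock.now, func(string) []string { return source })
	cache.Get("p1")
	source = nil // revoked; pretend the invalidation event was lost
	clock.advance(10)
	fmt.Println(len(cache.Get("p1"))) // 1: still cached inside the TTL
	clock.advance(25)
	fmt.Println(len(cache.Get("p1"))) // 0: the TTL floor forced a reload
}
```

The revoke-lag window in this model is at most `maxStale`, which is the bound the contract asks for; an invalidation event only shortens it.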

## The /auth/me contract

The web app (and any CLI client) gets the principal + their effective permissions in one call:

```json
GET /api/v1/auth/me
{
  "principal": { "id": "...", "kind": "human" },
  "human":     { "username": "jordan", "email": "jordan@example.com", "display_name": "Jordan Rivera", ... },
  "permissions": [
    "component:read", "component:create", "component:update",
    "alarm:read", "alarm:ack", "alarm:snooze", "alarm:resolve",
    ...
  ],
  "grants": [
    { "role": "operator", "scope_kind": "location", "scope_id": "HQ" },
    { "role": "viewer",   "scope_kind": "all",      "scope_id": null }
  ]
}
```

`permissions` is flat and wildcard-expanded, ready for O(1) `useCan(...)` checks in the web app. It is a **fast-reject / UI hint only**, the union over all grants: it answers "could this principal ever do X anywhere", never "can it do X to **this** entity". List visibility likewise (a row in `GET /alarms` is read-scoped) does **not** imply per-action authority on that row. Per-row action affordances (the ack/snooze button on a specific alarm) must be computed against `visible_set(P, action)` for that target, which the `grants` array drives: `grants` is the source for advanced UI logic (scope chips, deciding per-row actionability, explaining why a button is or is not shown). The server is the only authority regardless; the flat list and the list view are hints, the scoped gateway decides.

## The node path

Nodes do not use general role x scope. A node authenticates with a per-tenant **NATS JWT/nkey** credential bound to its `node.name` and is authorized only to **its own assignments**: publish telemetry, heartbeat, consume the commands addressed to it. It is an identity-scoped narrow path, and the scope is carried by **NATS subject permissions**, not a route authorizer:

- A node is a NATS client over the WAN (outbound only). The connection resolves the principal (kind=`node`) from the nkey, and the JWT's subject permissions are the node's placement-derived `visible_set`: it may publish only to its own ingress and report subjects and consume only from its own durable command queue. The general RBAC permission matrix does not apply.
- Datapoints land on the JetStream **raw ingress** subject (the admission consumer confines owner to the trusted stream); the node receives commands from a durable, server-side JetStream command queue rather than polling a route. Placement (the [cascade](/architecture/cascade/)) compiles directly into the account's subject grants, so a node can address only the owners it actually covers.
- A node's published datapoints are owner-bound at **stream-consume time, ahead of any evaluation**, by the **admission consumer** at the head of the data lane: for a node it checks the payload owner against the node's placement-derived `visible_set`; for a central webhook, against the interface's declared owner (the per-class confinement is specified in [messaging](/architecture/messaging/)). It re-publishes only confined datapoints to the trusted stream the rule engine, calc, and persistence sink consume; an owner outside the set is an orphan / discovery candidate, never an authoritative datapoint (see [collection](/architecture/collection/)). The fence cannot live only at the durable write, because the rule engine consumes the stream **live**: a forged owner must be caught **before** it can open an alarm or fire an action. **Trusted server-internal producers** (calc, the action layer's intended write) publish to the trusted stream directly, no admission pass. The admission consumer itself runs in **system mode** (its owner lookup is a system-mode gateway read); the persistence sink is then a trusted **system mode** `COPY` relying on confined owners upstream, with no per-row scope predicate of its own.

A `node` credential whose subject permissions do not cover a subject is rejected by NATS at publish/subscribe time; a non-`node` principal cannot hold a node account's subject grants.
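
The owner fence applied by the admission consumer reduces to a membership test ahead of any evaluation. The sketch below assumes a hypothetical `Datapoint` shape and an `admit` function; real subjects, stream names, and the webhook confinement path are not modelled.

```go
package main

import "fmt"

// Datapoint is a hypothetical payload; Owner is what the fence checks.
type Datapoint struct {
	Owner string
	Key   string
	Value float64
}

// admit splits a node's batch into datapoints confined to its
// placement-derived visible set and orphan / discovery candidates.
// Only the first slice would be re-published to the trusted stream.
func admit(visible map[string]bool, batch []Datapoint) (trusted, orphans []Datapoint) {
	for _, dp := range batch {
		if visible[dp.Owner] {
			trusted = append(trusted, dp)
		} else {
			orphans = append(orphans, dp)
		}
	}
	return trusted, orphans
}

func main() {
	visible := map[string]bool{"cmp-a": true} // owners this node covers
	batch := []Datapoint{
		{Owner: "cmp-a", Key: "power.state", Value: 1},
		{Owner: "cmp-z", Key: "power.state", Value: 1}, // forged or unknown
	}
	trusted, orphans := admit(visible, batch)
	fmt.Println(len(trusted), len(orphans)) // 1 1
}
```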

## One model, never duplicated

Authorization is **two in-app layers, each enforced in one place and re-derived nowhere else**: the `<resource>:<action>` **capability** check runs as API route middleware before the handler, and the **ABAC scope** filter is injected by the Storage Gateway on every query (a row filter belongs at the data path, where it holds by construction; the gateway also writes the in-transaction `audit_log`). The gateway owns **scope and audit**, not capability. The invariant is that no third surface re-implements either:

- **The live UI relay calls these, it does not copy them.** Operators never connect to NATS. The SSE subscribe is a normal route, **capability fast-rejected** at open (not authorized there); the server-side [SSE relay](/architecture/messaging/) then runs each candidate message through the **same** gateway scope a read uses, filtering by `visible_set(P, read)` against each message's exclusive-arc owner, so a live tile gets exactly the rows the operator could have fetched. The session **re-checks on every grant-cache invalidation** for its principal and closes if `:read` is lost, so a mid-stream scope shrink tears the stream down rather than leaking.
- **Node subject permissions gate the subject; the admission consumer gates the owner.** A node's NATS grants are mechanically derived from its placement as a coarse transport gate on the WAN edge. But subject permissions constrain the subject **string**, while a datapoint's owner lives in the **payload** (a multi-owner function resolves owner from labels server-side), so the subject grant is **not** a redundant copy of the owner fence: the **admission consumer** (above) is the authoritative owner fence, checking the payload owner against placement at consume time. Subject perms keep a node off subjects it has no business on; the admission consumer keeps a forged owner label out of the trusted stream. The bus carries no operator (`kind=human`) clients at all; an AI tool acting as one reaches the platform only through the API.

## Encryption in transit

TLS protects the HTTP API (terminated at the binary when given a cert + key, or at the operator's reverse proxy) and the NATS connection that carries node telemetry and commands. **BYO PKI.** "TLS off" is a deliberate dev-mode flag, never a silent default.

## Audit

Every API operation records the resolved **actor** (the principal id) in `audit_log`. Secret decrypts are always audited, never filterable. Node-mode writes record the node principal as actor; system-mode writes record `actor = 'system'` (or `'bootstrap'` for the seed phase) so the audit trail distinguishes operator action from platform internals. An AI tool acts via OAuth as a `human` or `service` principal, so its writes record that principal as actor.

## Bootstrap

The first install runs `og iam create-owner --username ops --email ops@example.com`. This creates the first operator as a `human` principal, a password credential (argon2id), and an `owner @ all` grant in one transaction. That operator logs in via the web UI or CLI and begins minting other principals. There is no implicit default principal; the bootstrap is the only path to the first owner.

## Worked example

Sam is an AV support tech. SCIM syncs Sam into the **`AV-Support`** `principal_group` (or Sam is a local `human` principal). The group holds two grants, one per role x scope pair: `operator @ "AV-devices" (component-group), viewer @ "HQ" (location)`. Result:

- Sam can **operate** (create / update / ack alarms) on AV devices fleet-wide (the cross-cutting entity-group), and **read** everything at HQ (the location node + its subtree).
- The gateway's scope filter hides every row outside those scopes; the API middleware blocks Sam from, say, creating a principal (no `principal:create` capability in `operator`).
- The day a device joins the `AV-devices` dynamic group, it enters Sam's scope; the day Sam leaves `AV-Support` in the IdP, SCIM removes the membership, and the group's grants stop applying to Sam.
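
The two layers in Sam's case can be sketched as follows. This is a minimal Python illustration, not the platform's code: the capability names other than `principal:create`, the `ROLE_CAPS` table, and the `row_scopes` argument (the set of scopes a row belongs to) are assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical capability sets; only `principal:create` is named in the text.
ROLE_CAPS = {
    "viewer": {"alarm:read", "component:read"},
    "operator": {"alarm:read", "alarm:create", "alarm:update", "alarm:ack",
                 "component:read"},
}


@dataclass(frozen=True)
class Grant:
    role: str
    scope: str  # a structural node, an entity-group, or "all"


SAM_GRANTS = [Grant("operator", "AV-devices"), Grant("viewer", "HQ")]


def has_capability(grants, capability):
    """Layer 1 (route middleware): does any grant carry the capability at all?"""
    return any(capability in ROLE_CAPS[g.role] for g in grants)


def visible_scopes(grants, capability):
    """Layer 2 (gateway scope): the scopes in which the capability is granted."""
    return {g.scope for g in grants if capability in ROLE_CAPS[g.role]}


def can(grants, capability, row_scopes):
    """Allowed only if the capability exists and one granting scope covers the row."""
    if not has_capability(grants, capability):
        return False
    scopes = visible_scopes(grants, capability)
    return "all" in scopes or bool(scopes & set(row_scopes))
```

Under these assumptions, `can(SAM_GRANTS, "alarm:ack", {"AV-devices"})` passes, while `can(SAM_GRANTS, "alarm:ack", {"HQ"})` does not: the HQ grant is `viewer`, which reads but does not acknowledge.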

:::caution[Open question]
The SCIM mapping detail: which IdP attributes drive `principal_group` membership and grants.
:::

## Storage

The IAM subjects and their grants; the physical layout lives on [storage](/architecture/storage/).

| Table | Key columns | Notes |
|---|---|---|
| `principal` (+ per-kind `human` / `service` / `node`) | id, kind | base `principal` is identity (opaque uuid) + kind only; per-kind tables hold the rest, including each kind's label: `human.display_name` (the person's real name) + username + email, the `service` label, the `node` name (+ labels, last_heartbeat_at, bound credential) |
| `role` | id, **official**, permissions (jsonb: `<resource>:<action>`) | RBAC capability set; ship viewer/operator/admin/owner + custom |
| `principal_grant` | (principal_id, role, **scope**) | role x scope; scope = a structural node, an entity-group, or `all`; additive |

---

# Messaging

URL: /architecture/messaging/

The internal and edge NATS subject contract, the sibling to the public API: JetStream streams and consumers, the two lanes, request-reply, KV, the live UI relay, and per-tenant subject isolation.

Omniglass has **two typed contracts**. The [public API](/architecture/api/) is the north face (HTTP and
OpenAPI: operators, the SPA, the CLI, integrations, MCP). This is its sibling: the **internal and edge
transport**, a **NATS subject contract** over JetStream. Service-to-service traffic, the edge, and the
live UI ride it. **Postgres stays the system of record; NATS moves.** The deployment topology and the
inter-service diagram are on [scaling](/architecture/scaling/).

## Two lanes, one bus

Internal traffic splits by what is moving:

- **Data lane (NATS-native): datapoints.** Untrusted publishers (a node, an external webhook sender)
  publish to a **raw ingress subject**; an **admission consumer** at the head of the lane owner-confines
  each datapoint and re-publishes only confined ones to the **trusted** datapoints stream. The confinement
  set is **per publisher class**: a **node**'s payload owner is checked against its placement `visible_set`;
  a **central webhook**'s against the interface's declared owner (from the trusted server-set `interface`
  label). The republish copies the original `Nats-Msg-Id`, `correlation_id`, and `caused_by_event_id`
  headers verbatim, so dedup survives the hop. **Trusted server-internal producers publish straight to the
  trusted stream**, no admission pass: calc output (owner from the validated `calc_rule` scope) and the
  action layer's intended write (owner from the command target) are already inside the trust boundary. The
  rule engine consumes the trusted stream directly, and a **persistence consumer** batch-writes it to
  Postgres as an async sink. Confinement is at **consume time, ahead of evaluation**, because the rule
  engine reacts live: a forged owner must be dropped before it can open an alarm, not just before it is
  persisted. The admission consumer itself runs in **system mode** (its owner lookup is a system-mode
  gateway read; a dropped datapoint is logged as a discovery candidate,
  [identity and access](/architecture/identity-access/)). Datapoints do not go through CDC: they are
  already on the bus, idempotent on `(series, ts)`.
- **Record / state lane (Postgres-first, CDC-out): events, alarms, actions, operator mutations.** Born in
  a Postgres transaction (a firing `event_rule` writes the event plus the alarm transition atomically; the
  API writes config, ack, settings). A **leader-elected CDC publisher** (logical decoding of the WAL)
  publishes those committed changes to JetStream, where `action_rule`, reconcile, and projection consumers
  react. There is no dual-write: a change is born in the commit, and the CDC bridge fans it out.
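
The admission step at the head of the data lane can be sketched as a pure function. This is a minimal Python illustration under stated assumptions: the `Msg` shape, the `node_visible_sets` and `interface_owner` lookups, and the `"trusted"` / `"discovery"` return values are hypothetical stand-ins for the real streams and gateway reads. Only the three copied header names come from the text.

```python
from dataclasses import dataclass, field

# Headers the republish copies verbatim so dedup survives the hop.
COPIED_HEADERS = ("Nats-Msg-Id", "correlation_id", "caused_by_event_id")


@dataclass
class Msg:
    publisher_class: str  # "node" or "webhook" (hypothetical labels)
    publisher: str        # node name, or interface name for a webhook
    owner: str            # the owner claimed in the payload
    headers: dict = field(default_factory=dict)


def admit(msg, node_visible_sets, interface_owner):
    """Return ("trusted", headers) for a confined datapoint, else ("discovery", None)."""
    if msg.publisher_class == "node":
        # A node's payload owner must be inside its placement-derived visible set.
        confined = msg.owner in node_visible_sets.get(msg.publisher, set())
    elif msg.publisher_class == "webhook":
        # A central webhook's payload owner must equal the interface's declared owner.
        declared = interface_owner.get(msg.publisher)
        confined = declared is not None and declared == msg.owner
    else:
        confined = False
    if not confined:
        return ("discovery", None)
    headers = {k: msg.headers[k] for k in COPIED_HEADERS if k in msg.headers}
    return ("trusted", headers)
```

A forged owner never reaches the trusted stream in this sketch, so nothing downstream (rule engine, calc, persistence) needs its own owner check.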

## Streams and consumers

- **datapoints** (data lane): untrusted publishers (node, external webhook) publish to a **raw ingress**
  subject; the **admission consumer** owner-confines per publisher class and re-publishes to the **trusted**
  datapoints stream that the rule engine, calc, and the persistence consumer read. Trusted server producers
  (calc, the action layer's intended write) publish to the trusted stream directly. A **work-queue consumer
  group** scales horizontally (each message to exactly one consumer), so adding worker replicas adds
  throughput with no leader.
- **records** (events, alarms, actions): published by the CDC publisher from Postgres commits; consumed by
  `action_rule`, reconcile, and projection consumers.
- **commands**: a durable, per-node **command queue** the edge holds a consumer on ([nodes](/architecture/nodes/)).
- **telemetry** (control-plane, not the datapoint firehose, which lands on raw ingress above): the edge publishes `node.self`, `session_log`, and command results.

Durable consumers track their own position; delivery is at-least-once with `Nats-Msg-Id` dedup plus double
ack, which with the idempotent sinks (a datapoint on `(series, ts)`, an action transition on
`(alarm, action, transition)`, the CDC idempotency key) gives exactly-once **outcomes**. This triple
(`Nats-Msg-Id` dedup, double ack, idempotent sink) is the canonical exactly-once mechanism the other pages
refer to. The edge stamps `ts`, so the system is ts-authoritative and needs no strict ordering on the wire.
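
Why the combination yields exactly-once outcomes can be shown with a small in-memory model. This is a sketch, not the JetStream implementation: `seen_msg_ids` stands in for the stream's dedup window, `rows` stands in for the datapoint table, and the double ack is not modeled.

```python
class DatapointSink:
    """At-least-once delivery in, exactly-once outcome out (in-memory model)."""

    def __init__(self):
        self.seen_msg_ids = set()  # stand-in for the stream's Nats-Msg-Id dedup window
        self.rows = {}             # (series, ts) -> value, the idempotent sink

    def deliver(self, msg_id, series, ts, value):
        # First line of defence: a replay with a known message id is dropped.
        if msg_id in self.seen_msg_ids:
            return "duplicate"
        self.seen_msg_ids.add(msg_id)
        return self.upsert(series, ts, value)

    def upsert(self, series, ts, value):
        # Second line of defence: the same (series, ts) is the same point, so a
        # replay that slips past dedup rewrites the row instead of adding one.
        self.rows[(series, ts)] = value
        return "stored"
```

Either defence alone leaves a gap (a dedup window eventually expires, and a sink without dedup does redundant work), which is why the text treats them as one mechanism.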

## Subjects, accounts, and scope

Subjects are hierarchical and **scope is expressed in them**, not bolted on:

- **Tenant = one NATS account.** Per-account isolation (messaging) is the same boundary as the
  per-database isolation (storage): no shared subjects, no shared rows ([identity and access](/architecture/identity-access/)).
- **Subject permissions gate the subject string; the admission consumer gates the owner.** A node may
  publish and subscribe only the subjects for its placement; the grant is **mechanically derived from
  placement**, a coarse transport gate, not a second copy of the ABAC model. But a datapoint's owner lives
  in the **payload** (a multi-owner function resolves owner from labels), which subject permissions cannot
  see, so the **admission consumer** (above) is the authoritative owner fence, and authorization stays
  authoritative in the [Storage Gateway](/architecture/storage/). **Operators never connect to the bus**,
  so there is no operator subject-permission model to keep in sync (see the live UI relay below).

## Request-reply: service to service

Synchronous internal calls use **NATS request-reply**: an in-process call in single-binary mode, a
request over the bus when modes are split across pods. The public API never uses request-reply (it is
HTTP); request-reply is the east-west wire only.

## KV and object store

- **KV** holds config, **distributed locks and leader-election** (the CDC publisher and the clock are
  leader-elected singletons), and the principal and permission cache (replacing Postgres `LISTEN/NOTIFY`
  invalidation, [identity and access](/architecture/identity-access/)).
- **Object store** holds internal artifacts (a compiled per-node runtime unit, for example). User files
  stay on the content-addressed [blob store](/architecture/files/), not here.

## The live UI relay

The web UI gets real-time data by **subscribing to the server, not to the bus**, and never through a
polling loop on the API. **Operators do not connect to NATS** (the bus is internal-plus-nodes only), so the
live path introduces **no second authorization model**:

- **Server-side relay.** The SSE subscribe is a normal route, capability-checked before it opens. The
  server then holds the internal JetStream subscription, runs every candidate message through the **same
  Storage Gateway scope** a read would use (the one authoritative ABAC filter, in-process), and streams
  only what passes down to the browser. The scope filter executes in exactly one place; the live path
  **calls** it per message instead of re-encoding it as subject permissions.
- **Transport is SSE.** The browser opens a **Server-Sent Events** stream on the same authenticated,
  same-origin HTTP seam as the rest of the API (same cookie or bearer, same proxy, same TLS), and the
  server pushes. One-way fits a live read: subscribe is one request, data flows down, and mutations and
  commands keep their own paths (the API action row, the internal bus). Over HTTP/2 the stream
  multiplexes, so there is no connection-count ceiling. There is **no NATS-WebSocket path and no
  fallback**: SSE is the one live transport.
- **Seed then stream.** A [view](/architecture/views/) over HTTP paints current state; the SSE stream
  keeps it live with deltas. Bulk reads stay on the views BFF; live deltas come over the relay.
- **Where it shines:** a live fleet tile, the alarm console, and the **template-debug / dev-tap** surface,
  where an operator watches datapoints arrive in real time as a template runs (the learning-tool "render
  the real engine against live data" surface, [the learning tool](/contributing/learning-tool/)).
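
The per-message filtering of the relay can be sketched as a generator. This is a minimal Python illustration under assumptions: `messages` stands in for the internal JetStream subscription, `current_visible` is a hypothetical callable that returns the operator's read-visible owner set (or `None` once `:read` is lost), and each message is a dict with an `owner` key.

```python
def relay(messages, current_visible):
    """Yield only the messages the operator could have fetched with a normal read.

    The scope is looked up per message, so a grant change takes effect mid-stream.
    A `None` result models losing `:read`: the stream closes instead of leaking.
    """
    for msg in messages:
        visible = current_visible()
        if visible is None:
            return  # tear the stream down
        if msg["owner"] in visible:
            yield msg
```

The filter is called, not copied: the generator holds no scope logic of its own, which mirrors the single authoritative filter the text describes.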

Related: [API](/architecture/api/) (the public HTTP contract), [scaling](/architecture/scaling/) (the
deployment topology and the diagram), [nodes](/architecture/nodes/) (the edge as a NATS client),
[workers](/architecture/workers/) (the JetStream consumers), and [storage](/architecture/storage/)
(Postgres as the system of record).

---

# Nodes

URL: /architecture/nodes/

How the edge runtime pulls its worklist, runs tasks and commands, manages sessions, gates reachability, and ships telemetry.

A node is the edge runtime that lets an operator collect from and control gear no matter where it
sits, by pulling its worklist from the server, running it on the spot, and shipping results back.
This page covers how it gets its instructions and runs them: worklist pull, placement, executing
tasks and commands, sessions, inbound demux, the task queue, reachability, and shipping telemetry.
The declarative shape it executes lives in [templates](/architecture/templates/) and
[collection](/architecture/collection/).

## The node

The node is the edge process (`omniglass --mode node`), one per site, or the
**server itself** for work with no site-local edge (see *Placement*). Its identity
is **bound to `node.name`**; a compromised node cannot impersonate another (see
[identity-access](/architecture/identity-access/) for the node auth path). It holds no
config of its own: it pulls what to do, runs it, and ships results. A node's writes
are confined to its **placement-derived `visible_set`** (the owners of the tasks
assigned to it), so a node ingests in **node mode**, not all-visibility system mode
(see [identity-access](/architecture/identity-access/)).

## Getting its instructions

The node pulls a **worklist**: the tasks and commands resolved for the
components **placed on it**, over a NATS request-reply config pull. It **heartbeats**
separately, on its own subject (see [the protocol](#the-node-server-protocol)), so the
server tracks liveness independently of the pull. The server, not the template, decides placement (next), and
resolves the cascade (config / `$var:` values, effective `interval`, credentials)
before handing the node concrete work. The node never sees a template; it sees
materialized, resolved task and command instances.

The full wire contract, the channels, the command queue, delivery, buffering, credentials, and
enrollment, is **[the node-server protocol](#the-node-server-protocol)** below.

### Config propagation (declared change to running node)

An interface's connection config (endpoint, snmp community, http auth header) is a
**projection** of the component's declared config through its template. The node
re-pulls the worklist (tasks) every tick, but **caches interface config for its
process lifetime**, so a changed connection input must be propagated, not just
written:

- **Reconcile on the server.** Changing a declared input (via
  `/components/{name}:apply`, or a direct write to the component's config)
  re-renders the affected interfaces from the component's *current* declared
  config and upserts them, preserving placement. So the materialized interface
  always reflects the latest declared config, regardless of which path changed it.
- **Invalidate on the node.** The worklist reply carries a per-node **config
  generation**: the max `updated_at` across the interfaces the node polls. It is a
  `config_generation` field on the reply, not an HTTP header, because the node
  path is NATS. When it advances, an interface's rendered config changed, so the
  node drops its interface cache and re-fetches this tick. A steady generation
  serves from cache; a real change forces a refresh within one tick, no restart.

The generation moves at **operator-config pace, not telemetry pace**: it is a
read-side aggregate over interface config, and the high-volume datapoint-write
path never touches `interface.updated_at`. A no-op re-apply (identical rendered
config) does not advance it, so nodes are never woken for nothing.
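
Both halves of the propagation can be sketched together. This is a minimal Python illustration under assumptions: the `store` dict stands in for the `interface` table, timestamps are plain integers, and `upsert_interface`, `NodeInterfaceCache`, and the `fetch` callable are hypothetical names. Only `config_generation` and `updated_at` come from the text.

```python
def upsert_interface(store, name, rendered, now):
    """Server side: write the rendered config, bumping updated_at only on a real change."""
    current = store.get(name)
    if current is not None and current["config"] == rendered:
        return False  # no-op re-apply: updated_at stays put, no node is woken
    store[name] = {"config": rendered, "updated_at": now}
    return True


def config_generation(store, polled):
    """Max updated_at across the interfaces a node polls (0 when it polls none)."""
    return max((store[n]["updated_at"] for n in polled if n in store), default=0)


class NodeInterfaceCache:
    """Node side: keep interface config until the generation advances."""

    def __init__(self, fetch):
        self.fetch = fetch        # hypothetical callable returning all interface configs
        self.generation = None
        self.configs = None

    def on_worklist(self, generation):
        if self.generation is None or generation > self.generation:
            self.configs = self.fetch()   # drop the cache and re-fetch this tick
            self.generation = generation
        return self.configs
```

Because the generation is derived only from `updated_at`, the datapoint write path cannot move it, which is what keeps it at operator-config pace.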

## The node-server protocol

The edge is **outbound-only**: a node sits behind NAT at a site, so the server never dials it. A node is a
**NATS client over the WAN**: it opens one authenticated, outbound connection to the bus (an nkey/JWT
credential bound to `node.name`, [identity and access](/architecture/identity-access/)), and everything
server-to-node arrives as messages on subjects the node is permitted to consume. Three flows share that
connection:

- **Telemetry up** (node to server): the node **publishes** `Event` batches (`{datapoints, labels}` plus
  the `(task, ts)` envelope, [below](#shipping-datapoints)) to a **JetStream raw ingress subject**;
  JetStream acknowledges each publish (at-least-once), and a `Nats-Msg-Id` lets the server dedup a replay
  (the admission consumer preserves it when it republishes to the trusted stream). The firehose from the edge.
- **Control down** (server to node): the node holds a **durable JetStream consumer** on its
  **command queue** (commands to run) and subscribes to **worklist-change signals** (the config-generation
  bump, so the node re-pulls). Subjects the node may consume are scoped by its placement (next).
- **Control up** (node to server): heartbeat (liveness, feeding the `node.down` sweep), command-execution
  results (the `action`-row status), `session_log` transitions, and the `:report` self-telemetry, each
  published on its own subject rather than a separate HTTP path.

### Commands: a durable server queue, a stateless edge

A command is **issued server-side** (the action layer records it and writes intended state,
[alarms and actions](/architecture/alarms-actions/)) and dispatched onto a **durable server-side JetStream
command queue**. The **edge holds nothing durable**: the node is a worker that pulls the next command from
its durable consumer on that queue (and on reconnect resumes from its last ack, draining whatever the
queue still holds), runs it, and reports the result back up, which updates the `action` row. Durability
lives where the source of truth is, the server, so a node restart loses no command. The held consumer
delivers commands as they arrive, so there is no poll latency.

### Delivery: at-least-once, idempotent by nature

The node publishes **at-least-once** and reconnects by **resuming unacked publishes** (JetStream ack plus
`Nats-Msg-Id` dedup); the server makes replay safe **without a separate idempotency layer**, because
everything the edge ships is idempotent by its own key:

- **datapoints** dedup on **`(series, ts)`**: a replayed point at the same timestamp is the same point,
  an idempotent upsert. The edge stamps `ts`, so the server is **ts-authoritative**: out-of-order
  arrivals sort by their own timestamp, and there is **no strict-ordering requirement** on the wire.
- **command results** are an **idempotent status update** on a known `action` row (by id): applying
  "done" twice is "done".

**Events are not shipped from the edge**, so there is nothing to dedup for them: an event is **derived
server-side** (an `event_rule` over datapoints, or a `log_datapoint` promoted by a rule,
[events](/architecture/events/)). The edge produces datapoints (including log lines) and command status;
the server derives the events. "We do not re-raise the same event next poll" is the **alarm** model's job
(one stateful open alarm, fire and clear), not a delivery concern.

### Buffering and retention are cascade settings

When the server is unreachable the node **buffers in memory**, bounded; the buffer is **not durable at
the edge** (the edge is a worker, the durable side is the server). Both the **buffer** (size, shed
policy) and **retention** are **cascade-resolved** ([cascade](/architecture/cascade/)) with **global
defaults**, overridable down the tree, so a chatty site gets a bigger buffer and a sensitive class a
longer retention, tuned like any other setting rather than per-node flags. When the buffer fills the node
**sheds oldest metrics first and surfaces it** as a `node.buffer` datapoint (depth, drops), so shedding
is visible, never silent.
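
A bounded, shed-oldest buffer that reports its own drops can be sketched in a few lines. This is a minimal Python illustration, not the node's implementation: the `EdgeBuffer` class and its `stats` shape are assumptions, and it treats every buffered item alike. Only the `node.buffer` datapoint name and its depth and drops fields come from the text.

```python
from collections import deque


class EdgeBuffer:
    """In-memory, bounded buffer that sheds the oldest item when full."""

    def __init__(self, max_size):
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size  # would come from the cascade-resolved setting
        self.items = deque()
        self.drops = 0

    def push(self, item):
        if len(self.items) >= self.max_size:
            self.items.popleft()  # shed the oldest first
            self.drops += 1       # and count it, so shedding is never silent
        self.items.append(item)

    def stats(self):
        """The values a `node.buffer` datapoint would carry."""
        return {"name": "node.buffer", "depth": len(self.items), "drops": self.drops}

    def drain(self):
        """Hand everything to the publisher once the server is reachable again."""
        out = list(self.items)
        self.items.clear()
        return out
```

Pushing five items into a buffer of size three leaves the newest three and a drop count of two.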

### Credentials at the edge

The worklist materialization resolves credentials server-side, so **device secrets travel to the node**
(over TLS). They are held **decrypt-on-use**: in memory, or encrypted at rest in a scratch dir with the
key from the [`SecretProvider`](/architecture/variables/), **never persisted in plaintext**, scoped to
the node's placement, and re-fetched on the config-generation bump. A field node is physically less
trusted, so a secret never lands on edge disk in the clear.

### Enrollment

Day one, a node is **created server-side first** (its `node.name` and properties), and the UI mints a
**per-node enrollment token**; the token is handed to the edge deployment, and the node **claims its
identity** on first connect (the token is exchanged for its **NATS credential**, a per-node JWT signed for
its nkey, scoped to the subjects its placement allows, [identity and access](/architecture/identity-access/)).
Later, a **shared enrollment token** plus a **`discovery_rule`** can auto-enroll a fleet: the node's **own
properties** (stable facts, selected ENV) derive its name, editable server-side after deploy, so a rollout
mints no per-node token.

## Placement (ETL, cascaded)

Collection follows **ETL**: extract **and transform** (including the extractor's Expr
transform) default to the **edge**, then the shaped datapoints are **loaded** to the
server, where resolve / bind / calc / evaluate default to **central**. Placement is a **cascaded property** ([cascade](/architecture/cascade/)), not a
special mode: `placement: central` makes the **server itself the node target**, for
cloud APIs, SaaS pollers, and inbound webhooks from external sources. A listener
endpoint lives where placement puts it: the on-site node for LAN devices (lower
latency, survives a WAN outage), the server for cloud sources, which is why a
registered callback URL resolves to the placed listener's address, not a hardcode.

## Running tasks

For each task the node runs the protocol over the interface's connection,
then **normalizes at the edge**: it applies the locate + Expr extraction
([collection](/architecture/collection/)) to produce datapoints and stamps labels (cascading
union + override); it keeps the original wire bytes as `raw` only on a parse or validation
failure (for `collection.failed`) or under dev raw-mode, and drops them on success. A task
runs in one of **two modes** ([collection](/architecture/collection/)); a held-open connection
is a **stateful interface transport**, not a third task type:

- **poll**: we ask. On the resolved `interval`, send the command/request, read the
  response (SNMP get, HTTP GET, an SSH-exec or xAPI `xStatus` on a held session);
- **listen**: we wait. Receive data pushed to us, whether to an endpoint we expose
  (webhook, syslog) or as feedback on a held connection (MQTT subscribe, xAPI feedback on
  a stateful interface).

Both assemble the same telemetry payload (below).

The built interface types (poll protocols and listeners), their per-task params, and the fixed
datapoints each emits are the collection **type catalog**: see
[built interface types and their config](/architecture/collection/#built-interface-types-and-their-config).
This page covers how the node *executes* them; the rest of this section is the runtime that wraps that
catalog (reachability gating, sessions, the task queue, tick scheduling).

## Sessions

A stateful interface (`ssh`, `mqtt`, anything held open) becomes a **session** at
runtime: one connection keyed by `(node, interface)`, shared by every task and
command under it, so the handshake and auth are paid once and reused. A session pool
holds the connection open across poll ticks (reconnect, backoff, keepalive), and a
listener runtime wakes on its inbound. The live socket is ephemeral and lives on the
node; the node **reports lifecycle transitions as `session_log` rows** to the server,
where the `session` entity projects current state (a current-state view over
`session_log`, ground-truth side; see [storage](/architecture/storage/)).

:::caution[Open question]
The exact `session` lifecycle state enum and pooling parameters (idle timeout, max lifetime, pool
size per interface, a shared versus dedicated session for a stream).
:::

Generic lifecycle:

- **establish**: connect, authenticate, **subscribe** if a stream rides this
  session;
- **operate**: run pollers and receive stream events over the held connection,
  demuxed (next);
- **recover**: graceful retry on connect, **especially auth failures** (backoff,
  surface as a `session_log` error, never hammer, since hammering a rejected
  credential risks lockout; ties to credentials); a
  subscription is session-scoped, so a reconnect **re-subscribes**;
- **teardown**: on error or when told, exit cleanly and set up again.

**Where failures land.** `session_log` owns **connection health** (cannot connect,
auth rejected, dropped, timeout). The **data event owns parse health**: a parse
failure (connected, got bytes, the extraction did not match) emits a `collection.failed`
event carrying the `raw` (the caused `event` + the `action` row for commands), and surfaces
as a collection-health datapoint so it is alertable. A command timeout can touch both.

## Inbound handling on a shared connection

When one connection carries heterogeneous inbound frames (a session with pollers + a
stream, or one webhook taking many payload types), an arriving frame is **not**
self-evidently the response to the last command. Frames route through an **ordered
matcher set**:

- every task contributes a matcher (a poller's awaited-response shape, a
  listener/stream's `match:` predicate); each inbound frame is tested **in
  order**, first match routes it to that task's extraction;
- while a poll is **outstanding**, its response matcher is tried **first**, then the
  standing matchers in declared order, so an event arriving mid-poll falls through to
  its stream instead of being mis-eaten as the response;
- where the protocol **frames** responses vs events (xAPI tags `*r` vs `*e`, a
  request id correlates), framing drives routing and the regex only extracts within
  the matched frame; otherwise ordered content-matching is the fallback;
- an **unmatched** frame lands as `raw` (orphan, logged), so a
  missing matcher is a fixable gap that surfaces rather than failing silently.
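
The ordered matcher set can be sketched as a single routing function. This is a minimal Python illustration under assumptions: matchers are plain predicates over a frame string, task names are made up, and `None` stands for the unmatched case that the node would log as a `raw` orphan.

```python
def route(frame, standing, outstanding=None):
    """Return the task whose matcher accepts the frame first, or None if nothing does.

    standing:    list of (task, predicate) pairs in declared order.
    outstanding: optional (task, predicate) for a poll awaiting its response;
                 it is tried before the standing matchers.
    """
    candidates = ([outstanding] if outstanding is not None else []) + list(standing)
    for task, matches in candidates:
        if matches(frame):
            return task
    return None  # unmatched: lands as raw, logged as an orphan
```

With xAPI-style tags, an event frame (`*e`) arriving while a poll is outstanding fails the poll's response matcher (`*r`) and falls through to the stream's matcher, which is the mid-poll case the list describes.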

## The component task queue

The node's work is the **component task queue** (distinct from the central
**rule engine** that consumes datapoints off NATS and does derivation; see
[workers](/architecture/workers/)). It
holds **poll tasks** (produce datapoints) and **command tasks** (from `run`
actions, produce a caused `event` + `action`-row status), and splits work by shape:

- **discrete tasks** (pollers, commands): scheduled or triggered, request/response,
  **serialized into per-component lanes**. Component, not host, is the contention
  key: a server with two IPs is one component, and a reboot takes out both
  interfaces, so a per-host lane would run parallel work against the box you just
  rebooted. A shared poller that fans to many components runs once on its parent and
  fans out at binding.
- **standing receivers** (listen tasks): always-on, event-driven, **not
  lane-serialized**; they normalize as events arrive, sharing a held session with
  pollers (demuxed) or owning their connection.

**Smart-wait gate.** After a disruptive command, the lane blocks until reachability
reports the host back up, then releases the next task. The gate is a condition over
live reachability read from the node's **local** copy, not a round-trip to the
datapoint store; a fixed timeout is only the backstop.
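
The gate's shape, a condition over local reachability with a timeout as backstop, can be sketched like this. It is a minimal Python illustration under assumptions: `is_reachable` is a hypothetical callable that reads the node's local reachability copy, and the clock and sleep functions are injectable only so the sketch can be exercised without real waiting.

```python
import time


def smart_wait(is_reachable, timeout_s, poll_s=1.0,
               sleep=time.sleep, clock=time.monotonic):
    """Block a lane until the host reads reachable again, or the backstop expires.

    Returns "up" when reachability reports the host back, "timeout" otherwise.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if is_reachable():  # local read, no round-trip to the datapoint store
            return "up"
        sleep(poll_s)
    return "timeout"
```

The lane would release its next task on `"up"`; what it does on `"timeout"` is a policy choice this sketch leaves open.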

Tasks within a single interface run serially (one probe, then its tasks in order); only distinct
interfaces run concurrently.

:::caution[Open question]
Whether to add intra-interface concurrency, given that connection and order semantics differ per
protocol.
:::

The node-side queue is **not** durable: the edge is a stateless worker, and durability lives
**server-side** (the JetStream command queue, and the cascade-configurable telemetry buffer). On reconnect
the node re-pulls its worklist, resumes its durable consumer on the command queue, and replays its unacked
telemetry publishes (idempotent on `(series, ts)`). See [the node-server protocol](#the-node-server-protocol).

## Implicit reachability

Any interface with a host address gets reachability for free: the node pings the
host and checks the declared port(s) are listening, continuously and out of band.
Smart default, **bypassable per interface** (endpoints that drop ICMP or have no port
to check opt out or override the probe). The results come back as `reachable` /
`port_open` **datapoints** usable in rules and dashboards, and they feed the
smart-wait gate from the node's local copy, so the connection detector and the
dashboard signal are the same always-on probe.

**The layered availability gate.** The gate is an **OSI-layered**
set of cheap checks run as a **concurrent pre-pass** (its own high concurrency,
short timeouts) before a connection-interface's poll tasks. All applicable checks
run (they are cheap), each ships a built-in datapoint, instanced (the ping by
host, the rest by interface) and owned by the queried component, and the
interface's **`interface.reachable`** verdict is their AND. The pre-pass is
separate from the bounded poll phase, so a node pinned to `--workers 1` (to
trickle telemetry past the queue) still gates a large fleet in ~one wave.

| Layer | Check | Datapoint | Notes |
|---|---|---|---|
| L3 network | ICMP ping, **batched once per host** per tick | `icmp.reachable` / `icmp.rtt_avg` | **informational** (see verdict below); shared by every interface on the host |
| L4 transport | TCP connect (tcp-family) **or** UDP presence (snmp/UDP) | `tcp.open`/`tcp.connect_time` · `udp.open` | a closed UDP port answers ICMP port-unreachable, so absence of that is "present"; this is why SNMP's transport check is L4, not its auth-dependent get |
| L7 app | protocol handshake: SNMP `sysUpTime` get (**`snmp.reachable`**, default-on) · SSH handshake+auth · telnet login chain | (verdict) | the SNMP get is the **primary, default** SNMP liveness (ICMP-independent); SSH/telnet are **opt-in** (`ssh_check`/`telnet_check="on"`) because their liveness credential can differ from the device's |

**The verdict respects each layer's definitiveness.** A TCP connect and any L7
handshake (SSH/telnet auth, the SNMP get) are **definitive** proof of
reachability, so they stand on their own and the **ping is informational**: an
ICMP-filtered host (a hardened device or a cloud API that drops echo) still reads
up from its port/protocol check, instead of the whole interface going dark. A UDP
"present" is a **read timeout** (open|filtered) and so is **ambiguous**; the only
thing that disambiguates it is the ping. So a failed ping fails the verdict ONLY
for an SNMP interface that has *opted out* of the L7 get (`snmp_check=off`),
leaving the ambiguous UDP probe as its only signal (`pingGates`); by default the
SNMP get is the signal and the ping is informational. A definitively *down* layer
(TCP refused, UDP ICMP-unreachable, an L7 auth/no-answer) fails the verdict
regardless; an inconclusive probe (a setup/resolve error, a missing credential)
does not gate.
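
The verdict rules above reduce to a short function. This is a minimal Python sketch of the stated logic, not the node's code: the result labels (`"up"`, `"down"`, `"ambiguous"`, `"inconclusive"`) and the convention that `l7=None` means the L7 check is switched off are assumptions made for the example. `ping_gates` mirrors the `pingGates` condition named in the text.

```python
def verdict(ping, l4, l7=None):
    """Return True when the interface reads reachable.

    ping: "up" or "down" (informational unless it gates, see below).
    l4:   "up", "down", "ambiguous" (UDP open|filtered), or "inconclusive".
    l7:   "up", "down", "inconclusive", or None when the L7 check is off.
    """
    # A definitively down layer fails the verdict regardless of the others.
    if l4 == "down" or l7 == "down":
        return False
    # The ping gates only when an ambiguous UDP probe is the sole remaining signal,
    # that is, an SNMP interface that opted out of the L7 get.
    ping_gates = l4 == "ambiguous" and l7 is None
    if ping_gates and ping == "down":
        return False
    # Inconclusive probes do not gate; definitive "up" layers stand on their own.
    return True
```

An ICMP-filtered host with an open TCP port therefore reads up (`verdict("down", "up")`), while an SNMP interface with `snmp_check=off` and a failed ping reads down (`verdict("down", "ambiguous")`).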

**Off gates (the enable/disable convention).** Every check is toggled by
`params.<name>_check = "on" | "off"`, overriding its default; `params.liveness =
"off"` disables the whole gate. The default split is by **auth dependence**:

- **auth-independent layers default ON (opt-out):** `ping_check`, `port_check`
  (and `tls_check` when TLS lands). Cheap and credential-free, so safe to gate on.
- **`snmp_check` defaults ON** (opt-out), the one auth-dependent exception: the
  get reuses the *same* community the poll already needs, so a get failure means
  the device is genuinely unpollable, the right verdict, and it's the only
  ICMP-independent SNMP signal. Opt out to fall back to ping+UDP.
- **`ssh_check` / `telnet_check` default OFF** (opt-in): a service whose *liveness*
  credential differs from the device's must not read as down, so the operator
  opts in per interface.
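
The toggle convention can be sketched as a lookup with a default table. The function name `CheckEnabled` and the `checkDefaults` map are assumptions for illustration; the parameter names and the on/off defaults are the ones listed above (`tls_check` is left out because TLS has not landed).

```go
package main

import "fmt"

// checkDefaults mirrors the default split described in the text:
// auth-independent layers and the SNMP get default on, SSH/telnet default off.
var checkDefaults = map[string]bool{
	"ping":   true,
	"port":   true,
	"snmp":   true,
	"ssh":    false,
	"telnet": false,
}

// CheckEnabled resolves one check for an interface's params.
// An explicit "<name>_check" wins over the default, and liveness="off"
// disables the whole gate.
func CheckEnabled(params map[string]string, name string) bool {
	if params["liveness"] == "off" {
		return false
	}
	switch params[name+"_check"] {
	case "on":
		return true
	case "off":
		return false
	default:
		return checkDefaults[name]
	}
}

func main() {
	p := map[string]string{"ssh_check": "on", "snmp_check": "off"}
	for _, name := range []string{"ping", "port", "snmp", "ssh", "telnet"} {
		fmt.Printf("%s=%v\n", name, CheckEnabled(p, name))
	}
}
```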

The honest limit on SNMP status: a v2c wrong community is a **silent drop** (the
agent may send an authentication-failure trap to its *manager*, but it never answers
us), so a get failure alone can't separate down
from wrong-community. Cross-referencing the layers does: host pings + UDP not
refused + get silent ⇒ "reachable, SNMP not answering this community" (auth/ACL/
wedged), distinct from "host down"; with ICMP fully blocked that inference is lost
and it's honestly reported as "host down or fully filtered." SSH verifies auth (a
rejected handshake is down); telnet completes the `login:`/`Password:` chain
(service-up, not a verified shell). Override the SNMP probe OID with
`params.liveness_oid` when a community view excludes the system group.
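
The cross-referencing inference can be written as a small decision function. This is a sketch under assumptions: `DiagnoseSNMP` and its boolean inputs are hypothetical, and the "port unreachable" label for the refused case is an illustrative wording; the other two labels are quoted from the text.

```go
package main

import "fmt"

// DiagnoseSNMP cross-references the three signals for an SNMP interface.
// getAnswered: the L7 get returned; udpRefused: the UDP probe saw
// ICMP-unreachable; pingOK: the L3 ping succeeded.
func DiagnoseSNMP(pingOK, udpRefused, getAnswered bool) string {
	switch {
	case getAnswered:
		return "up"
	case udpRefused:
		// Assumed label: a refused UDP port is a definitive down layer.
		return "down: SNMP port unreachable"
	case pingOK:
		// Host answers ICMP, port not refused, get silent.
		return "reachable, SNMP not answering this community"
	default:
		// With ICMP blocked or the host dead, the inference is lost.
		return "host down or fully filtered"
	}
}

func main() {
	fmt.Println(DiagnoseSNMP(true, false, false))
	fmt.Println(DiagnoseSNMP(false, false, false))
}
```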

**Poller** tasks run only if the verdict is up; **listener** (`mode=listen`) tasks
are inbound and run ungated (and are never pinged); **inline probes** (`icmp`/`tcp`
with the host on the task, no interface endpoint) *are* the check and run ungated.
A down interface's gate datapoints all ship in **one** batched call. L5 (socket),
L6 (TLS), and further L7 handshakes slot in by extending the check stack: one
`append` in `ifaceChecks`, gated by its own `_check` param.

## Shipping datapoints

The node ships a native `Event`: `{ datapoints, labels }` plus an envelope
(`task`, batch `ts`), **published to the JetStream raw ingress subject** (protobuf-encoded
message, the proto surviving as the NATS message schema), buffered with
retry/backoff. On a parse or validation failure it also ships the **raw** wire bytes so the
server can emit a `collection.failed` event; on success raw is omitted (there is no telemetry
table), unless a **dev raw-mode** is on. An **OTLP adapter** at the edge accepts OTLP from
third-party tools and translates to the native shape.

```d2
direction: right
classes: { node: { style.border-radius: 8 } }
worklist: "pull worklist\n(placed tasks + commands)" { class: node }
execute: "execute:\nprotocol + locate/Expr extraction" { class: node }
normalize: "normalize: datapoints + labels\n(+ raw on failure)" { class: node }
ship: "buffer + publish\nraw ingress subject" { class: node }
admission: "admission: bind owner\n(consume time) → trusted" { class: node }
worker: "rule engine + persistence\n(trusted stream)" { class: node }
failed: "collection.failed\n(event, carries raw)" { class: node }
worklist -> execute
execute -> normalize
normalize -> ship
ship -> admission
admission -> worker
ship -> failed: "raw on failure" { style.stroke-dash: 4 }
```

The node has already produced the datapoints at the edge; an **admission consumer** binds
owner (registry lookup, owner attribution against the node's placement) at **consume time**
and republishes to the trusted stream the rule engine and persistence read, so a forged owner
is dropped before evaluation, not at the durable write. On a parse or
validation failure it emits a `collection.failed` event carrying the raw; on success there is
no raw to store. The server does not re-derive observed datapoints; only calc and event rules
derive. The node's job ends at the ship.

## Tick scheduling, concurrency, and self-observability

A tick groups the worklist **by interface** and runs in three phases: the L3 ping
pre-pass (batched per host), then the per-interface gate-verdict pre-pass, then
the poll phase. The two gate pre-passes run at a **high fixed concurrency**
(`gateConcurrency`, the checks are cheap short-timeout socket probes), while the
poll phase fans out across the **bounded poll pool** (default 16, `--workers`).
Splitting them is the point: the cheap gate is never throttled by a small
`--workers` (a node pinned to one poll worker still gates a large fleet in ~one
wave), and a node facing many dead or slow targets is bounded by concurrency, not
the serial sum of every probe timeout (a dead SNMP get costs `timeout *
(retries+1)`, configurable via `--snmp-timeout` / `--snmp-retries`; default 3s x2).
Each poll task additionally runs under a per-task deadline (`--task-deadline`,
default 30s).

:::caution[Open question]
Per-task schedule dispatch: the resolved `interval` exists, but honoring distinct per-task cadences
within one node tick is unsettled.
:::

The loop is **overrun-aware**: instead of a fixed ticker that silently drops
ticks when one runs long, it reschedules relative to each tick's finish. A tick
that exceeds its interval is flagged and the next fires immediately, so a node
falling behind **surfaces** the overrun rather than stalling its cadence
silently.

Each tick the node reports its own execution by publishing a `node.self` envelope: tick
duration, task attempted/ran/skipped/failed counts, interface probed/up/down counts, and the
`node.overrun` state. It is **not special-cased**: `node.self` is node-owned datapoints (the
seeded `node.*` types) that ride the **same raw-ingress -> admission -> trusted** path as any
other node datapoint, and the rule engine derives node-health from them like any other
datapoint. A node carries no operator-authored template; its self shape is **built into the
binary** (the seeded `node.*` datapoint types and node-health rules), and the `node.self`
shape selects that built-in template at derive time. The one node-specific piece is owner
resolution: the **admission consumer** binds `node.self` to the **reporting node**
(`owner_kind = node`, a `node` owner arc, the `node_id` arm of the exclusive arc alongside
component/system/location/global), the node-arc analogue of a per-component interface binding
its datapoints to its component. So node datapoints land node-owned, and the rule engine's
batching + concurrency + amortized rule refresh apply for free. This is the operator-visible
health of the collection layer itself. Self-telemetry is best-effort (a failed
report is logged, never fatal; it must not break collection).

A node that goes dark, though, reports nothing, so a degraded-but-alive signal
is not enough. A **node-liveness sweep** runs server-side alongside the rule engine: a node
whose last heartbeat (or its registration, if it has never checked in) predates
the staleness window (`OMNIGLASS_NODE_DOWN_AFTER`, default 90s) gets a node-owned
**`node.down` alarm**, auto-resolved the moment it heartbeats again. The alarm is
raised directly by the sweep (no event_rule: a dead node emits no datapoint to
evaluate), keyed by `(node.down, node owner)` so it is idempotent across sweeps.
This is why the node owner arc reaches `event` and `alarm`, not just datapoints:
"the node isn't working" is a first-class node-owned incident.

A degraded-but-alive node, by contrast, *does* report, so it alarms through the
ordinary **event_rule** path the rule engine runs over every arriving datapoint, no
node-specific evaluation: a rule on a `node.*` key opens a node-owned alarm. Two
are seeded by default: `node-overrun` (fires while `node.overrun` is true) and
`node-tasks-failing` (fires while `node.tasks.failed > 0`), both resolving
implicitly on the next clean tick. This works because the trigger engine is
owner-general: `Evaluate` opens and resolves alarms for the datapoint's actual
owner (component, system, location, or node), which also unlocks system- and
location-owned alarms.

---

# Scaling and deployment

URL: /architecture/scaling/

One binary that runs a laptop demo or a Kubernetes fleet: two run modes, embedded Postgres and NATS, the CDC bridge, horizontal scale, high availability, platform configuration, and per-database multi-tenancy.

Omniglass is **one Go binary**, and that is a packaging decision, not a scale ceiling. The same artifact
runs an all-in-one container on a laptop and a horizontally-scaled fleet on Kubernetes; you scale by
**topology**, not by swapping products. This page is the deployment and scale model: the two run modes,
the embedded services, what replicates, the coordination substrate, platform configuration, high
availability, and multi-tenancy.

## Two run modes, one binary

The binary is a **modular monolith**: one codebase, one artifact, modules behind clean seams (the Storage
Gateway is the only path to the database, coordination rides NATS, collection runs at the edge). It runs
two ways, the **same binary**, no fork:

- **All-in-one (the modular monolith).** One process runs every role, with **Postgres and NATS embedded**
  (below), against nothing external. The desktop, single-binary, small-estate case: download, run, done.
- **Split by run mode (Kubernetes).** The same binary launched **per mode** as separate Deployments,
  against an **external** Postgres and an external NATS cluster. A Helm chart wires it up, and each role
  scales independently.

Splitting a mode onto its own pods is a **deployment choice, not a rewrite**, because the modules already
talk over NATS and the gateway rather than in-process calls that would need untangling. The roles:

- **server**: the public HTTP API ([API](/architecture/api/)) and the views read path; it serves the
  **SPA embedded in the binary** (`go:embed`), so the web UI is not a separate service. Stateless.
- **worker**: the **JetStream consumers** (rule engine, reconcile, notify, [workers](/architecture/workers/)).
  Stateless competing consumers; add replicas for throughput.
- **controller**: the leader-elected **singletons** (the clock and the CDC publisher, below). A role, not
  necessarily its own pod.
- **node**: collection; at the edge it runs **at the sites** (outside the cluster) and connects back, with
  a central `node` for cloud-API and SaaS sources (`placement: central`, [nodes](/architecture/nodes/)).

## Embedded services (single-binary mode)

In all-in-one mode the binary brings its dependencies up in-process, so an operator runs **one container,
zero external setup**:

- **NATS + JetStream, embedded as a library** (`nats-server` in-process, file-backed). The app is always a
  NATS client; embedded versus external is a config flag, not a code path.
- **PostgreSQL, embedded as a managed subprocess** ([embedded-postgres](https://github.com/fergusstrange/embedded-postgres)):
  a **real** Postgres, so logical decoding (the CDC bridge below), JSONB, partitioning, and the
  exclusive-arc CHECK constraints behave identically to at-scale. Pinned to **Postgres 18.3.0 or newer**
  for ARM and x86. **Not SQLite**, which has no logical replication and would fork the data layer into a
  second, lesser architecture.

So "single binary" is the binary orchestrating a real Postgres and NATS for you, not a different
datastore. The data and coordination architecture is identical at any size.

## Coordination: NATS moves, Postgres remembers

The split is firm. **Postgres is the relational system of record** (entities, datapoints, events and
alarms, audit, and the queries the cascade, fusion, views, and scope need). **NATS (JetStream) is the
nervous system**: work distribution, the durable command queue, the telemetry buffer, and fan-out, plus
**KV** (config, locks, leader-election) and an object store for internal artifacts.

The two meet through **change data capture**: Postgres tells us *what changed* (logical decoding of the
WAL), and NATS carries the queue. A single **leader-elected CDC publisher** reads committed changes from a
replication slot and publishes them to JetStream (an idempotency key per change yields exactly-once
outcomes downstream). **Postgres is never a message bus**; it only emits its changes. The replication
**slot and publication are ensured idempotently in the boot phase, not a migration**, since dbmate
migrations run exactly once.

### Inter-service communication

Service-to-service traffic rides **two lanes on the one JetStream bus**, by what is moving:

- **Data lane (NATS-native).** Observed and calculated **datapoints** live on NATS. The edge and central
  nodes publish observed datapoints to a **raw ingress** subject; an **admission consumer** owner-confines
  them per publisher class and republishes to the **trusted** datapoints stream, which the rule engine
  consumes directly from NATS (calc publishes derived datapoints onto the trusted stream as a trusted producer). A
  **persistence consumer** batch-writes datapoints to the Postgres metric, state, and log tables as an async
  **sink**. Datapoints do not pass through CDC: they are already on NATS, are idempotent on `(series, ts)`, and
  are the firehose, so rules never wait on Postgres. Postgres is the durable record, NATS is the live signal.
- **Record and state lane (Postgres-first, CDC-out).** **Events, alarms, actions, and operator mutations**
  (config, ack, snooze, settings, manual commands) are **born in a Postgres transaction**: when an
  `event_rule` fires, the consumer writes the event record and the alarm transition (serialized per
  `(event_rule, owner)`) in one transaction, and the API writes config, ack, and settings the same way. The
  leader-elected CDC publisher then fans those committed changes out to JetStream, where `action_rule`,
  reconcile, and projection consumers react. **No dual-write**: born in the commit, CDC fans out.

```d2
direction: down
classes: {
  node: { style.border-radius: 8 }
  key: { style: { border-radius: 8; bold: true } }
  group: { style.border-radius: 8 }
}
north: "North plane: public API (HTTP / AIP)" {
  class: group
  direction: right
  c1: "SPA" { class: node }
  c2: "CLI" { class: node }
  c3: "MCP / AI agent" { class: node }
  c4: "integrations · webhooks" { class: node }
}
binary: "one Omniglass binary: modular monolith (1..N replicas)" {
  class: group
  api: "API / server (per-replica)" { class: node }
  gw: "Storage Gateway (the only DB path)" { class: node }
  wk: "JetStream consumers: rule engine · reconcile · notify · persistence (per-replica, competing)" { class: node }
  clk: "clock (singleton)" { class: node }
  cdc: "CDC publisher: WAL to JetStream (singleton)" { class: node }
  nats: "embedded NATS: JetStream · KV · Object store" { class: key }
}
pg: "PostgreSQL: system of record" { class: node; shape: cylinder }
edge_nodes: "edge nodes (distributed · NATS clients)" { class: node }
ext: "external NATS cluster (optional BYO at scale)" { class: node }
north -> binary.api: "HTTPS"
binary.api -> binary.gw
binary.gw <-> pg
pg -> binary.cdc: "WAL (logical decoding)"
binary.cdc -> binary.nats: "publish committed changes"
binary.nats -> binary.wk: "east-west: work + events"
binary.wk -> binary.gw
binary.clk -> binary.nats: "schedule fires"
binary.nats <-> edge_nodes: "South: telemetry up · commands down" { style.stroke-width: 3 }
binary.nats -- binary.clk: "KV: config · locks · leader-elect" { style.stroke-dash: 4 }
binary.nats -- ext: "swap embedded for BYO" { style.stroke-dash: 4 }
```

## Horizontal scale: what replicates

- **server** is **stateless**: replicate it behind a load balancer; state lives in Postgres.
- **workers** are **JetStream consumers**: a work-queue stream delivers each message to exactly one
  consumer, so adding replicas adds throughput with no leader and no cross-worker chatter (NATS is the
  coordinator, [workers](/architecture/workers/)).
- **edge nodes**: distribution is the design, one or many per site, connecting back; adding sites adds
  nodes ([nodes](/architecture/nodes/)).
- **singletons** (the clock and the CDC publisher) are **leader-elected via a NATS KV lock**: exactly one
  active, the rest stand by and take over on failure. One mechanism, no separate election service.

## Platform configuration

Configuration is **two tiers**, and platform settings are deliberately **centralized**, not scattered
across dozens of tables and APIs:

- **Bootstrap (env, optional).** The irreducible minimum needed before the database exists: the Postgres
  DSN, the NATS embed-or-external choice and address, the `SecretProvider` key, the run mode, and the
  listen address. In all-in-one mode these have working defaults, so a desktop run needs **no configuration
  at all**; env vars override when you need them.
- **The platform settings store (one place).** Everything else lives in a single, audited **settings
  store**: feature flags, the buffer and retention defaults, CDC routing, integration settings, UI
  defaults, official-registry overrides. It is materialized in Postgres (the runtime authoritative copy,
  changeable through the API and audited), and **seeded declaratively from a settings file**
  (`settings.json` or YAML) reconciled on every boot (the idempotent boot-seed phase, `ON CONFLICT DO
  UPDATE`). The file is GitOps-friendly and mounts cleanly as a **Kubernetes ConfigMap** (and a future
  operator), so the same declarative source drives a laptop and a fleet.

This is distinct from estate [config and variables](/architecture/variables/), which describe the
*estate* and resolve down the cascade. The settings store describes the **platform itself**, and there is
exactly one home for it, the single source of truth core settings deserve.

## Vertical scale and high availability

Replicas are the **HA** story: the server and worker tiers have no single point of failure (any replica
can serve or consume), the singletons fail over by re-electing on the NATS KV lock, Postgres HA is the
database's concern (CNPG, a managed cluster), NATS HA is the JetStream cluster's, and the **edge survives a
WAN outage on its own** (the bounded buffer plus the durable command queue, [nodes](/architecture/nodes/)).
Vertical scale is the simple first lever (a bigger Postgres, more worker CPU); horizontal removes the
ceiling.

## Multi-tenancy: per database, per account, per deployment

Tenant isolation is **physical, not a row predicate**: a tenant is **one database, one NATS account, and
one deployment**. There is no `tenant_id` column anywhere, no shared row store, and no shared subjects, so
per-database isolation (storage) and per-account isolation (messaging) are the **same boundary**. The data
model stays single-tenant-shaped; multi-tenancy lives at the orchestration layer (CNPG-per-tenant). One
noisy or compromised tenant cannot reach another because there is nothing shared to reach across
([identity and access](/architecture/identity-access/)).

## The one-binary promise

The same binary and the same code paths run the demo and the fleet. You do not adopt a different product
to scale: you run more roles, on more pods, against an external Postgres and NATS, with more edge nodes.
Simplicity at the small end, a real horizontal ceiling at the large end, one artifact across the range.

---

# Storage

URL: /architecture/storage/

How storage works: the Storage Gateway, views by default, per-database isolation, append-only partitioning and tiering, and the on-row lineage pattern.

Storage is the set of patterns every entity in Omniglass lands on, so an operator can trust that scope, audit, retention, and lineage behave the same way no matter which table the data lives in. This page describes **how storage works**,
not a per-table column dump.

Postgres is the **relational system of record**: it holds the entities, events, alarms, actions,
audit, config, and the platform settings store. It is the record/state/intent lane. It is **never a
message bus**: the live signal travels on NATS JetStream, and Postgres earns its place as the durable
record. Two write paths land here, and only one is the request path. **Operator mutations and the
record/state/intent lane** (config, ack/snooze, settings, manual commands, plus the `event` and
`alarm` rows an `event_rule` consumer commits in one transaction) are written synchronously through
the Storage Gateway. **The datapoint tables are an async SINK**: a NATS **persistence consumer**
batch-writes datapoints off the data lane ([datapoints](/architecture/datapoints/)), idempotent on
`(series, ts)`, so the rule engine never waits on a datapoint reaching Postgres. Committed changes on
the record lane are fanned out by a leader-elected **CDC publisher** (logical decoding of the WAL) to
JetStream; there is no dual-write, the change is born in the commit and CDC carries it. The column schemas live
with each owning feature: [datapoints](/architecture/datapoints/#the-datapoint-tables) (the three
kind-tables), [events](/architecture/events/#storage) (the `event` row), [alarms and
actions](/architecture/alarms-actions/#storage) (`alarm` / `action`), [config and
credentials](/architecture/variables/#storage) (`variable` / config / tags), [core
entities](/architecture/core-entities/) and [templates](/architecture/templates/) (the structural and
template tables), [collection](/architecture/collection/#storage) (interfaces and tasks),
[calculations](/architecture/calculations/#storage) (the rule families), [files](/architecture/files/),
[time](/architecture/time/#storage), and [identity and access](/architecture/identity-access/#storage).

## Conventions

- **No `tenant_id`.** Isolation is per-database (a database per tenant); there is no tenant column
  anywhere. The key registries `datapoint_type` and `event_type` carry a **`scope`** (template / org /
  official) deciding where the name is unique ([key scope](/architecture/datapoints/#key-scope-template-org-official)),
  and the non-template registries (`interface_type`, `component_type`, `variable_type`) carry an
  **`official` boolean**, the same axis minus the template layer: `official: true` rows are the
  ship-with canonical set distributed with the binary, and `official: false` rows are operator- or
  org-authored, local to this deployment.
- **Three storage shapes.** **Ground-truth records** are append-only and immutable, each named for
  what it is: `log_datapoint` (a datapoint kind), `audit_log` (operator actions), and the standing
  `*_log` ground-truth logs (`session_log`, `internal_log`, plus the `collection_log` /
  `node_log` companions). There is **no `telemetry` table**: datapoints are published to the
  JetStream data lane, not synchronously inserted, so the raw payload is not persisted in steady
  state; the persistence consumer sinks the typed datapoint, and raw appears only on a
  `collection.failed` event or a dev raw-mode tap ([datapoints](/architecture/datapoints/)). A
  schedule fire is not a record here: it is an `event` with `origin=scheduled`.
  There is no separate rule-execution table: derived rows carry their lineage on the row.
  **Datapoints** (`metric_datapoint` / `state_datapoint` / `log_datapoint`) are the typed
  observation firehose. **Stateful entities and projections** (`alarm`, `action`, current-value)
  hold state directly or are rebuildable read models, **views by default**. The model is **not
  event-sourced**.
- **Provenance and lineage on every datapoint**: `provenance` (observed / calculated / intended),
  `source` (which sensor or path, for observed), and a lineage pointer. Observed and calculated both
  carry `source_rule` (+ version), the function or calc_rule that produced the row; intended carries
  `event_id` (the command). A CHECK enforces the pointer per provenance; **observed vs calculated is
  the `provenance` value itself**, not a column-presence trick. Declared config is not a datapoint
  provenance; it lives in [config](/architecture/variables/), keyed to the same signal.
- **Ownership is the exclusive-arc** on every datapoint table, `event`, `alarm`, and `variable`:
  `owner_kind` enum plus the matching typed FK (`component_id` / `system_id` / `location_id` /
  `node_id`, or none for the singleton `global`) plus a CHECK that exactly the matching column is set
  (or all null for `global`). System-, location-, node-, and global-level datapoints are first-class.
  The full pattern is on [core entities](/architecture/core-entities/#ownership-the-exclusive-arc).
- **Keys**: datapoints and events use a surrogate id plus `ts`; the key registry `datapoint_type`
  carries a **`scope`** (template / org / official) deciding where the name is unique (`(template_id, name)`
  at template scope, `name` at org/official); structural entities are name-keyed; a `task` is **content-addressed**
  (`hash(interface, kind, schedule, params)`); a `node` by name.
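
A content-addressed task id can be sketched as a hash over a canonical encoding of the four inputs. The choice of SHA-256, the JSON encoding, and the names `taskKey` and `TaskID` are assumptions for illustration; the text only specifies `hash(interface, kind, schedule, params)`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// taskKey is the hashed content. encoding/json writes struct fields in
// declaration order and sorts map keys, so the encoding is deterministic.
type taskKey struct {
	Interface string            `json:"interface"`
	Kind      string            `json:"kind"`
	Schedule  string            `json:"schedule"`
	Params    map[string]string `json:"params"`
}

// TaskID returns a hex digest that is stable for identical content.
func TaskID(iface, kind, schedule string, params map[string]string) (string, error) {
	if params == nil {
		params = map[string]string{} // nil and empty params must hash the same
	}
	b, err := json.Marshal(taskKey{Interface: iface, Kind: kind, Schedule: schedule, Params: params})
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	x, _ := TaskID("iface-1", "snmp", "60s", map[string]string{"oid": "1.3.6.1.2.1.1.3.0", "version": "2c"})
	y, _ := TaskID("iface-1", "snmp", "60s", map[string]string{"version": "2c", "oid": "1.3.6.1.2.1.1.3.0"})
	fmt.Println(x == y) // true: same content, same id
}
```

Because identity is derived from content, re-declaring an unchanged task is a no-op, while any change to schedule or params yields a different task.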

## How the records relate

The relationships, not the columns. The columns of each table live on its owning leaf (linked above).

```d2
direction: right
classes: { node: { style.border-radius: 8 } }
metric: metric_datapoint { class: node }
state: state_datapoint { class: node }
event: event { class: node }
alarm: alarm { class: node }
action: action { class: node }
current: current_value { class: node }
variable: variable { class: node }
metric -> metric: calc_rule
state -> event: event_rule
event -> alarm: fire opens · clear resolves
event -> action: action_rule
alarm -> action
metric -> current: view: latest per key+provenance
state -> variable: linked_state (observed side)
```

The structural and template entities (`component` / `system` / `location` and the `*_template` /
`*_template_version` / `system_template_member` / `system_member` families) relate as shown on
[core entities](/architecture/core-entities/) and [templates](/architecture/templates/); the
collection entities (`interface_type` / `interface` / `task`) on
[collection](/architecture/collection/#storage).

## Two lanes land in Postgres differently

Every row in Postgres arrives on one of two lanes, and the lane decides how the row is written and
how the rest of the platform learns it changed.

- **The data lane (a sink).** Observed and calculated datapoints live on the JetStream data lane.
  The rule engine consumes them directly off NATS; Postgres is the durable record, not the live
  signal. The **persistence consumer** is a durable JetStream consumer that batch-writes the
  `metric_datapoint` / `state_datapoint` / `log_datapoint` tables as an async sink, idempotent on
  `(series, ts)`, so a redelivery lands the same row and the firehose never blocks on the database.
  Datapoints do **not** flow through CDC: they are already on NATS.
- **The record/state/intent lane (PG-first, CDC-out).** Events, alarms, actions, and operator
  mutations (config, ack/snooze, settings, manual commands) are born in a **Postgres transaction**.
  When an `event_rule` consumer fires, it writes the `event` row and the `alarm` transition in one
  transaction (the alarm transition is serialized per `(event_rule, owner)`); the API writes config,
  acks, and settings the same way. There is no row-lock single-fire worklist and no
  `LISTEN`/`NOTIFY` fan-out: the change is committed once, and the **CDC publisher** carries it
  outward.

The CDC publisher is **leader-elected** (exactly one active, fail over on death) via a NATS KV
CAS lock, the same singleton pattern the clock uses ([time](/architecture/time/)). It reads the WAL
by logical decoding and publishes each committed change to JetStream, where `action_rule`,
reconcile, and projection consumers react. The replication **slot** and **publication** it reads are
**ensured in the idempotent boot phase** (the same phase that upserts ship-with reference data),
**not** a run-once migration: boot creates them if absent and leaves them untouched if present, so a
fresh database and an existing one converge to the same state. Delivery is at-least-once with an
idempotency key per change, so a consumer that sees a change twice treats the repeat as a no-op.
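
At-least-once delivery plus an idempotency key can be sketched as a consumer that remembers which keys it has applied. This is an in-memory illustration only: the `Change` and `Consumer` types and the `lsn-100` key format are hypothetical, and a real consumer would need a durable record of applied keys.

```go
package main

import "fmt"

// Change is a hypothetical CDC message: a committed change and its key.
type Change struct {
	IdempotencyKey string
	Table          string
}

// Consumer applies each change at most once, however often it is delivered.
type Consumer struct {
	seen    map[string]struct{}
	Applied int
}

func NewConsumer() *Consumer {
	return &Consumer{seen: make(map[string]struct{})}
}

// Handle reports whether the change was applied (true) or was a repeat (false).
func (c *Consumer) Handle(ch Change) bool {
	if _, dup := c.seen[ch.IdempotencyKey]; dup {
		return false // redelivery: no-op
	}
	c.seen[ch.IdempotencyKey] = struct{}{}
	c.Applied++
	return true
}

func main() {
	c := NewConsumer()
	deliveries := []Change{
		{IdempotencyKey: "lsn-100", Table: "alarm"},
		{IdempotencyKey: "lsn-100", Table: "alarm"}, // redelivered
		{IdempotencyKey: "lsn-101", Table: "event"},
	}
	for _, d := range deliveries {
		c.Handle(d)
	}
	fmt.Println(c.Applied) // 2
}
```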

## Ground-truth records

The immutable, append-only records, each named for what it is. They are the lineage targets and what
a backtest reads; none is derived. The detailed columns of `audit_log` live on
[audit](/architecture/audit/), `session_log` on [nodes](/architecture/nodes/#sessions); the rest is a
compact list here because storage is their natural architectural home:

- **`log_datapoint`** (a component's own words, a datapoint kind, [datapoints](/architecture/datapoints/));
- **`audit_log`** (operator actions: actor, verb, resource, `old -> new`; the lineage target for
  operator writes; secret decrypts always recorded, [audit](/architecture/audit/));
- **`session_log`** (connection-lifecycle transitions, node-reported; the connection log,
  [nodes](/architecture/nodes/#sessions));
- **`internal_log`** (platform self-narration: startup / reconcile / migration / node-reg /
  config-sync, [workers](/architecture/workers/));
- the **`collection_log`** / **`node_log`** companions (the cheap per-run execution record
  and the node's operational narration).

There is **no separate rule-execution table**: a derived row *is* the evidence of its rule's run,
carrying its lineage on the row (below).

## The lineage CHECK (the pattern)

Lineage lives on the derived row, no separate execution table. This is the **pattern** every derived
row follows: `source_rule` (+ version) is set for observed and calculated (the function or calc_rule
that produced the row); intended carries the command `event_id`. The pointer per provenance is enforced
so e.g. "intended with no command event" is impossible at the storage layer. One example, the datapoint
tables:

```sql
CHECK (
     (provenance IN ('observed','calculated') AND source_rule IS NOT NULL AND event_id IS NULL)
  OR (provenance = 'intended'                 AND event_id IS NOT NULL AND source_rule IS NULL)
)
```

Observed and calculated both carry `source_rule` (an edge function for observed, a calc_rule for
calculated); they are distinguished by the **`provenance` column**, not a pointer-presence trick. The
intended split is the one the CHECK enforces. This is one of three layers: the CHECK enforces *which
pointers are populated*, foreign keys enforce *the ids are real*, and the app enforces *the value type
matches the key's kind*.
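
The same pointer rule can be mirrored in application code, for example to reject a bad row before it reaches the database. This is a sketch, not the gateway's code: the `Lineage` struct and `LineageOK` are hypothetical, and the sample rule and event ids are made up. It assumes `provenance` is never null.

```go
package main

import "fmt"

// Lineage holds the columns the CHECK inspects; nil models SQL NULL.
type Lineage struct {
	Provenance string
	SourceRule *string
	EventID    *string
}

// LineageOK mirrors the CHECK: observed/calculated need source_rule and no
// event_id; intended needs event_id and no source_rule.
func LineageOK(l Lineage) bool {
	switch l.Provenance {
	case "observed", "calculated":
		return l.SourceRule != nil && l.EventID == nil
	case "intended":
		return l.EventID != nil && l.SourceRule == nil
	default:
		return false // unknown provenance is rejected
	}
}

func main() {
	rule, cmd := "calc-avg-temp@3", "evt-42" // illustrative ids
	fmt.Println(LineageOK(Lineage{Provenance: "calculated", SourceRule: &rule})) // true
	fmt.Println(LineageOK(Lineage{Provenance: "intended"}))                      // false
	fmt.Println(LineageOK(Lineage{Provenance: "intended", EventID: &cmd}))       // true
}
```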

The datapoint tables also carry nullable **`correlation_id`** and **`caused_by_event_id`** trace
columns. These are orthogonal to the lineage pointers above: they are not lineage pointers, so they
do not participate in the exclusive-lineage CHECK. They carry causation across the command -> device
-> observed-datapoint round trip so the cycle guard walks a real id ([datapoints](/architecture/datapoints/),
[alarms and actions](/architecture/alarms-actions/)). On the wire these ride in **NATS message
headers**: a datapoint published to the data lane carries its `correlation_id` / `caused_by_event_id`
in the message header alongside the `Nats-Msg-Id` dedup key, and the persistence consumer lands them
into these columns, so the trace is unbroken from the live signal to the durable record.

## Current value and projections: views by default

`alarm` and `action` are **stateful entities** that hold their own current state in a real table
(not event-sourced). Everything else that is "current state" is a **read model**, and the default is
a **plain SQL view** (always-correct, never stale, zero maintenance). A worker-maintained table is a
**measured optimization**, earned only when a read profile shows a view too slow.

| Read model | Of | Shape | Notes |
|---|---|---|---|
| `current_value` | latest datapoint per (owner, key, **instance**, **provenance**), fused across sources per the key's `fusion_policy` | **view** | the dashboard read; per-provenance so observed and intended are both visible (the divergence model needs both), per-instance so siblings of one key stay distinct, fusion applied on read. The one table candidate if a profile earns it, metric kind only |
| `session` | `session_log` | **view** | low-volume; node, interface, status, opened_at, last_activity_at, command/error counts |

**When the view stops scaling.** A latest-per-key view's cost scales with the number of **distinct
keys** (a loose index scan), not total rows. Point and scoped reads ("current value of X on Y") are
a covering-index probe, fast at any size. A full-fleet "every current value" is O(distinct keys):
comfortable to hundreds of thousands, painful past a few million. A naive `DISTINCT ON` scans the
whole log and dies on the firehose; never that plan.

So only `current_value` for the **metric** firehose is even a table candidate, and only when
frequent full-fleet reads meet low-millions-plus distinct keys. The sparse kinds (`state` / `log`)
stay views indefinitely. A worker-maintained table costs **one upsert per datapoint write** (write
amplification, hot-key contention) and reintroduces a staleness window; that cost must be earned by
a read profile, not assumed. **Never a materialized view**: a PG MV is stale between refreshes and
has no incremental refresh, so a refresh is a full firehose recompute. The choice is plain view
(default) versus inline table (profiled).

:::caution[Open question]
If `current_value` is ever materialized, is it one wide table or a table per kind, keyed per (owner,
key, instance, provenance)?
:::

## Partitioning and retention

- **Append-only tables are range-partitioned by `ts`** (native declarative partitioning;
  `pg_partman` where the provider permits, else a documented manual roll). The firehose
  (`metric_datapoint`) is the partitioning-critical one.
- **Retention is per table**, set by policy, not one global TTL: `metric_datapoint` short,
  `state_datapoint` / `log_datapoint` longer, `audit_log` longest (compliance), `internal_log`
  short. On-row lineage ages out with its datapoint. The per-table defaults are **cascade-resolved**
  ([cascade](/architecture/cascade/)) with global defaults, so a class or entity can hold longer or
  shorter without a global change.
- **The `raw_sample` buffer** (the opt-in raw-retention policy, [collection](/architecture/collection/))
  is range-partitioned by `ts` and cold-tierable like the metric partitions, on a short retention. It
  is bounded, sampled, and short-lived; it is not a telemetry table.
- **Views are not partitioned** (bounded by fleet size, not time) and are computed from the
  underlying tables, never the source of truth.

:::caution[Open question]
The index strategy per datapoint table beyond the obvious (BRIN on metric `ts`, GIN on log body),
tuned against real volume.
:::

:::caution[Open question]
The append-only id type under partitioning: bigint identity versus uuid v7.
:::

## The Storage Gateway and tiering

The **Storage Gateway is the only door to the database** (no direct access, no
PostgREST); it is also where IAM scope is injected, **per action**: every query carries
`visible_set(P, action)` for the specific action it performs, so a read filters by read-scope and an
`:ack` write filters by ack-scope. A write whose action-scoped predicate matches **0 rows** is surfaced to
the handler as a 403 or 404, never a silent success, matching the up-front `canDo` decision
([identity and access](/architecture/identity-access/)). Isolation is per-database (one database per
tenant, paired one-to-one with one NATS account, [datapoints](/architecture/datapoints/)), so there
is no tenant context to set. Every read and write lands here: the synchronous request path runs in
**scoped** mode, and the persistence-consumer datapoint sink and the CDC publisher run in **system**
mode (trusted internal work, all-visibility), the same three-mode contract identity and access
describes. The CDC publisher reads committed changes by **logical decoding of the WAL**, a
replication-protocol stream beneath the table surface; that is how it learns of a change without
re-querying, not a second application path around the Gateway. Because every
application read and write goes through the Gateway, the physical backend is swappable beneath it:

- **default**: Postgres for everything (datapoints, ground-truth records, views, registries). In
  single-binary mode the one binary embeds a real Postgres (the same code path runs an external
  Postgres at scale); the data lane's persistence consumer and the record lane's CDC publisher both
  target this one backend.
- **tiering**: the firehose does not stay in hot Postgres forever. Aged
  `metric_datapoint` / `log_datapoint` partitions tier out to a **columnar or object
  store** (Parquet on S3-compatible, or an embedded columnar engine) behind the same gateway, so
  historical queries fan across hot and cold with no model change. The cold tier is partitioned by
  `ts`.
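
The "0 rows is never a silent success" rule above can be sketched as a small mapping from a scoped write's outcome to a status. This is a minimal illustration, not the gateway's code: the function name `writeStatus` and the `readable` flag are hypothetical, and the way 403 and 404 are told apart (visible under read scope versus not visible at all) is an assumption the source does not spell out.

```go
package main

import "fmt"

// writeStatus maps the outcome of an action-scoped write to an HTTP status.
// rowsAffected is what the statement matched under visible_set(P, action);
// readable says whether the same row is inside the principal's read scope
// (a hypothetical second lookup, not something the source specifies).
func writeStatus(rowsAffected int64, readable bool) int {
	switch {
	case rowsAffected > 0:
		return 200 // the scoped predicate matched: the write happened
	case readable:
		return 403 // visible, but outside this action's scope
	default:
		return 404 // not visible at all: do not reveal that it exists
	}
}

func main() {
	fmt.Println(writeStatus(1, true), writeStatus(0, true), writeStatus(0, false))
}
```

Whatever the real split turns out to be, the property worth keeping is that a zero-row write always produces an explicit refusal.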

:::caution[Open question]
Which cold engine backs the tier, what triggers tier-out (age versus a partition-detach hook), how
queries federate across hot and cold, and whether projections ever tier.
:::

## Query construction: typed, parameterized, generated

The gateway builds every query with **[jet](https://github.com/go-jet/jet)**, a type-safe SQL builder
whose column and table types are **generated from the dbmate-managed schema** (dbmate stays the single
schema authority; jet regenerates after `migrate`). The shape is dynamic (the per-action scope predicate,
the [filter expression](/architecture/expressions/), order, pagination compose at runtime) but the safety
is **structural, not by discipline**:

- **Values are always bound parameters**, never interpolated into SQL text.
- **Identifiers (columns, tables) are typed constants** from the generated schema, so a wrong column
  name in gateway code is a **compile error**, never a string. A runtime-supplied field name (the filter
  language's, including an attacker's) must resolve against those same generated columns before it
  becomes a predicate; a name that does not resolve is rejected, never spliced into SQL text.
- **Operators are a closed set.**

A wrong column or type fails the build, so the compiler and tests catch a bad query before runtime, which
is what keeps the gateway safe to evolve and safe for an AI to edit. Because all dynamic construction
lives in this one module, the injection-safe discipline is a single reviewable chokepoint. The one
carve-out is the high-volume datapoint insert (the persistence consumer), which may use `pgx` `COPY` for
throughput, still inside the gateway. It runs in all-visibility **system mode**, not per-row scoped: its
safety rests on the typed column targets plus the upstream **admission consumer** having already confined
owners ([identity and access](/architecture/identity-access/)), not on a per-write scope predicate.
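
The three structural rules above (bound values, typed identifiers, a closed operator set) can be illustrated without jet. The sketch below is a standard-library stand-in, not jet's API: the `column` constants, the `fields` and `operators` tables, and `buildWhere` are all hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// column is a typed identifier; only the constants below can appear in SQL text.
type column string

const (
	colOwner column = "owner_id"
	colKey   column = "key"
	colTS    column = "ts"
)

// fields resolves filter-language names to columns (hypothetical names).
var fields = map[string]column{"owner": colOwner, "key": colKey, "ts": colTS}

// operators is the closed operator set; anything else is rejected.
var operators = map[string]string{"eq": "=", "gt": ">", "lt": "<"}

type term struct {
	Field string
	Op    string
	Value any
}

// buildWhere returns SQL text with $n placeholders plus the bound values.
// User input only ever reaches the args slice, never the SQL string.
func buildWhere(terms []term) (string, []any, error) {
	var parts []string
	var args []any
	for _, t := range terms {
		col, ok := fields[t.Field]
		if !ok {
			return "", nil, fmt.Errorf("unknown field %q", t.Field)
		}
		op, ok := operators[t.Op]
		if !ok {
			return "", nil, fmt.Errorf("unknown operator %q", t.Op)
		}
		args = append(args, t.Value)
		parts = append(parts, fmt.Sprintf("%s %s $%d", col, op, len(args)))
	}
	return strings.Join(parts, " AND "), args, nil
}

func main() {
	sql, args, err := buildWhere([]term{{"key", "eq", "display.power"}, {"ts", "gt", 1700000000}})
	fmt.Println(sql, args, err)
	_, _, err = buildWhere([]term{{"key; DROP TABLE alarm", "eq", 1}})
	fmt.Println(err)
}
```

The hostile field name never reaches the SQL string: it fails the lookup and the request is refused before any query text exists.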

---

# Templates

URL: /architecture/templates/

The immutable, versioned shapes that instances pin: the component_template (the device shape) and the system_template (the composition shape with its frozen BOM).

Templates let an operator define a device or system class once and stamp it onto many instances, with each instance pinned to a frozen version so its keys and roles never shift underneath it. Templates are the **immutable, versioned, content-hashable shapes**
that instances pin. A [component](/architecture/core-entities/) pins a `component_template_version`;
a [system](/architecture/core-entities/) pins a `system_template_version`. Editing a template mints a
**new version**; an instance pins one frozen version, or tracks `latest`, or follows a channel
(`stable` / `beta`), and re-pointing is explicit, so the keys and roles never change under a pinned
instance.
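
The pin / `latest` / channel choice can be sketched as a small resolver. This is an illustration under assumptions: the `tracking` and `templateState` types and the `effectiveVersion` function are hypothetical, and versions are modeled as plain integers.

```go
package main

import "fmt"

// tracking is how one instance follows its template (hypothetical shape).
type tracking struct {
	Mode    string // "pin", "latest", or "channel"
	Version int    // used when Mode == "pin"
	Channel string // used when Mode == "channel"
}

// templateState is what the template currently offers.
type templateState struct {
	Latest   int            // newest minted version
	Channels map[string]int // channel name -> promoted version
}

// effectiveVersion resolves the version an instance runs right now.
// A pinned instance ignores everything the template has minted since.
func effectiveVersion(t tracking, s templateState) (int, error) {
	switch t.Mode {
	case "pin":
		return t.Version, nil
	case "latest":
		return s.Latest, nil
	case "channel":
		v, ok := s.Channels[t.Channel]
		if !ok {
			return 0, fmt.Errorf("channel %q has no promoted version", t.Channel)
		}
		return v, nil
	}
	return 0, fmt.Errorf("unknown tracking mode %q", t.Mode)
}

func main() {
	state := templateState{Latest: 4, Channels: map[string]int{"stable": 3, "beta": 4}}
	for _, t := range []tracking{
		{Mode: "pin", Version: 2},
		{Mode: "latest"},
		{Mode: "channel", Channel: "stable"},
	} {
		v, err := effectiveVersion(t, state)
		fmt.Println(t.Mode, v, err)
	}
}
```

Minting version 5 would change only `Latest` (and a channel once promoted); the pinned instance keeps resolving to 2 until an operator re-points it.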

## The component_template: the device shape

A **`component_template` is the direct mirror of a Zabbix template**: it bundles, as one versioned
unit, everything needed to monitor and control a class of device. Where a Zabbix template ships items,
triggers, macros, and tags, ours ships:

- **collection** authored as [functions](/architecture/collection/) (inputs, interfaces, functions),
  below;
- **commands** (command-triggered functions the device supports, e.g. `reboot`, `set-input`), detail
  in [collection](/architecture/collection/);
- **`datapoint_type`s** (kind / unit / validation live on the registry, see
  [datapoints](/architecture/datapoints/#the-datapoint_type-registry); a template declares its keys at
  **template** scope, or references an **org** / **official** key, see [Template-scoped keys](#template-scoped-keys-and-optional-alignment));
- required **[config](/architecture/variables/)** and defaults, and the **credential shapes** it needs
  (see [config and credentials](/architecture/variables/));
- default **tags**;
- default **alarms / health** (the trigger mirror; [alarms and actions](/architecture/alarms-actions/)
  owns the detail).

A template is authored once and **assigned to an existing component**; the node then executes the
result.

| Family | What it is | Examples |
|---|---|---|
| `component_type` | classification | device, app, cloud-api |
| `component_template` | the **device shape**: everything about a class of device | Polaris DSP 16, Cisco Room Kit Pro, Q-SYS Core |
| `component` | a deployed instance | `dsp-boardroom-3` |

### Collection is built from functions

A template's collection is authored as [functions](/architecture/collection/): `inputs` (typed
parameters), `interfaces` (connections declared once, possibly persistent), and `functions` (each a
trigger plus a DAG of steps that parse at the edge and emit datapoints). A command is a
command-triggered function in the same model. See [collection](/architecture/collection/) for the full
schema; this page covers the rest of the device shape.

### Template-scoped keys and optional alignment

A template declares its datapoints **and** commands at **template scope** by default: auto-discoverable,
no registry friction, identified by `(template_id, name)` so two templates can both declare an `input`
with no collision ([key scope](/architecture/datapoints/#key-scope-template-org-official)). It may
**optionally align** each datapoint to an org or official canonical key. Alignment is just
**referencing** a canonical `datapoint_type` (plus an optional value transform), which is what buys
cross-fleet comparability, dashboards, and AI; the shipped official set covers the common signals, so
most templates align by referencing one. That value transform is also where the device's **native
unit** is normalized to the key's **canonical unit** before the datapoint is emitted (a Fahrenheit
display's template emits celsius), so storage stays single-unit ([datapoints](/architecture/datapoints/)). Commands are template-scoped (the functions live on
the template); a canonical **command type** (the abstract `reboot` to per-model layer) follows the same
promotion ladder.

:::caution[Open question]
The `args` typing vocabulary for commands (which scalar and structured arg types it admits) and how
command results beyond `success-when` map to the `action` row fields.
:::

### The rest of the shape

- **Config.** The template declares the [config](/architecture/variables/) a component *requires*
  (connection and inventory facts, e.g. `ip-addr`, `serial`) and their defaults. Effective values
  resolve through the cascade ([cascade](/architecture/cascade/)).
- **Credential shapes.** The template declares the *kinds* of credential the device needs
  (`basic_auth`, `snmp_community`, `bearer_token`); these are
  [`variable_type`](/architecture/variables/) shapes, bound to actual secret values at assignment
  (credentials).
- **Tags.** Default org labels seeded onto the component (`category: audio-dsp`).
- **Alarms / health.** Default `event_rule`s the template ships, the conditions worth catching for
  its device class (the Zabbix-trigger mirror: a fan-stall on a DSP template, a person-entered event
  on an occupancy template). The alarming policy lives on the **template**, not on the
  `datapoint_type`: a `datapoint_type` is pure identity (kind / unit / domain / validation / fusion)
  and carries no event rules. A *truly universal* default (e.g. `cpu.utilization > 0.9` everywhere) is
  an official **rule-set scoped by a group or key filter**, resolved through the cascade (the rule
  accumulation mechanism), not a `datapoint_type` attribute. Owned in detail by
  [alarms and actions](/architecture/alarms-actions/).
- **Function trigger params are cascade bases.** A function's `interval: 30s` is the base layer of the
  cascade (the template default), overridable by a location, group, or the instance (the
  `poll_interval` example in [cascade](/architecture/cascade/)), not a hard value.

:::caution[Open question]
Whether a template's default `event_rule`s are declared inline on the version or referenced (the
policy is template-authored either way; this is the storage shape, co-designed with the
alarms-and-actions model).
:::

### Deploy: assign a template to an existing component

Assigning a template to a component materializes its collection in one action: it binds the template's
required [`inputs`](/architecture/collection/#inputs-the-templates-typed-parameters) (the `:apply`
gate, a 422 lists any unmet required fields), writes the supplied inputs as the component's
[config](/architecture/variables/) (declared, audited), resolves the interfaces, and compiles the
functions to the per-node runtime unit at the server-chosen node. Re-applying converges. The 80% case
is one action, as cheap as "add host".
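
The `:apply` gate's input binding can be sketched as below. It is a simplified model, not the API's implementation: `inputSpec`, `applyInputs`, and the rule that a declared default satisfies a missing input are assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// inputSpec is one declared template input (hypothetical shape).
type inputSpec struct {
	Name     string
	Required bool
	Default  string
}

// applyInputs binds supplied values against the declared inputs.
// It returns 422 plus the unmet required fields, or 200 plus the bound config.
func applyInputs(declared []inputSpec, supplied map[string]string) (int, []string, map[string]string) {
	bound := map[string]string{}
	var unmet []string
	for _, in := range declared {
		if v, ok := supplied[in.Name]; ok {
			bound[in.Name] = v
			continue
		}
		if in.Default != "" {
			bound[in.Name] = in.Default // assumed: a default satisfies the input
			continue
		}
		if in.Required {
			unmet = append(unmet, in.Name)
		}
	}
	if len(unmet) > 0 {
		sort.Strings(unmet)
		return 422, unmet, nil
	}
	return 200, nil, bound
}

func main() {
	declared := []inputSpec{
		{Name: "ip-addr", Required: true},
		{Name: "serial", Required: true},
		{Name: "port", Default: "443"},
	}
	status, unmet, bound := applyInputs(declared, map[string]string{"ip-addr": "10.0.0.12"})
	fmt.Println(status, unmet, bound)
}
```

Because binding is a pure function of the declared inputs and the supplied values, running it again with the same arguments yields the same config, which is the property "re-applying converges" relies on.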

### Integrity, authenticity, and the capability gate

A template carries two distinct trust properties, and they are not the same property.

- **Integrity** is the **content hash** every version already carries: it answers "is this the exact
  bytes that were authored", and it makes a `component_template_version` a stable, addressable artifact.
  It says nothing about *who* authored those bytes.
- **Authenticity** is a separate, **optional author signature / attestation** on the
  `component_template_version`. The signature is over the content hash, so a verified signature binds an
  authoring identity to those exact bytes. The signature is **verified on import**: an unsigned template
  imports as unattributed (the operator owns the risk), a signed template imports with its author
  identity and verification result recorded, and a signature that does not verify against the content it
  claims to cover is rejected.
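
The three import outcomes can be sketched with the standard library. The source does not name a hash or signature scheme, so SHA-256 and Ed25519 here are assumptions, as are the function names and the three result strings.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// contentHash is the integrity property: a digest of the exact authored bytes.
// SHA-256 is an assumption; the source only says "content hash".
func contentHash(content []byte) []byte {
	sum := sha256.Sum256(content)
	return sum[:]
}

// verifyImport classifies an import. The signature covers the content hash,
// so a valid signature binds the author's key to these exact bytes.
func verifyImport(content, sig []byte, pub ed25519.PublicKey) string {
	if len(sig) == 0 {
		return "unattributed" // unsigned: imported, the operator owns the risk
	}
	if len(pub) == ed25519.PublicKeySize && ed25519.Verify(pub, contentHash(content), sig) {
		return "verified"
	}
	return "rejected" // signed, but not over this content
}

// demo signs one template and imports it unsigned, signed, and tampered.
func demo() (unsigned, signed, tampered string) {
	pub, priv, err := ed25519.GenerateKey(rand.Reader)
	if err != nil {
		panic(err)
	}
	content := []byte("component_template_version spec bytes")
	sig := ed25519.Sign(priv, contentHash(content))
	unsigned = verifyImport(content, nil, pub)
	signed = verifyImport(content, sig, pub)
	tampered = verifyImport(append(content, '!'), sig, pub)
	return
}

func main() {
	fmt.Println(demo())
}
```

The tampered case shows why the two properties are separate: the hash changes with the bytes, and the signature no longer covers the new hash.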

A template also declares a **capability manifest**: the set of **write-commands** it exercises and the
**credential shapes** it requires (the `reboot` / `set-input` commands it issues, the `basic_auth` /
`snmp_community` / `bearer_token` shapes it binds). The manifest is derived from the template, not
operator-asserted, so it cannot understate what the template does. At [`:apply`](#deploy-assign-a-template-to-an-existing-component)
the manifest is **shown and approved**: an operator sees exactly which device-mutating commands and
credential shapes they are authorizing before the template materializes onto a component. Approving the
manifest is the consent record for that capability set.

A **device-mutating** template (one whose manifest declares any write-command) does **not** silently
follow `latest` or a channel into a new capability set. Tracking `latest` or a channel still moves a
read-only template forward automatically, but a new version of a device-mutating template that changes
its capability manifest is gated behind an **explicit operator re-pin**: the operator re-approves the
manifest at the new version before it takes effect. Auto-update never expands what commands run against a
device without a human approving the expansion.
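
A minimal sketch of the auto-update gate follows. The `manifest` type and `needsRepin` are hypothetical, and the rule is deliberately conservative: it gates whenever the manifest changed and either the approved or the new version declares a write-command, which is one reading of the paragraph above rather than a confirmed rule.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// manifest is the derived capability set of one template version.
type manifest struct {
	WriteCommands    []string
	CredentialShapes []string
}

// canonical sorts a copy so ordering differences do not count as changes.
func canonical(items []string) string {
	c := append([]string(nil), items...)
	sort.Strings(c)
	return strings.Join(c, ",")
}

func (m manifest) fingerprint() string {
	return canonical(m.WriteCommands) + "|" + canonical(m.CredentialShapes)
}

// needsRepin reports whether moving from the approved manifest to the next
// one must wait for an explicit operator re-approval.
func needsRepin(approved, next manifest) bool {
	mutating := len(approved.WriteCommands) > 0 || len(next.WriteCommands) > 0
	return mutating && approved.fingerprint() != next.fingerprint()
}

func main() {
	readOnly := manifest{CredentialShapes: []string{"snmp_community"}}
	v1 := manifest{WriteCommands: []string{"reboot"}, CredentialShapes: []string{"basic_auth"}}
	v2 := manifest{WriteCommands: []string{"reboot", "set-input"}, CredentialShapes: []string{"basic_auth"}}
	fmt.Println(needsRepin(readOnly, readOnly), needsRepin(v1, v1), needsRepin(v1, v2), needsRepin(readOnly, v1))
}
```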

The **hosted / marketplace** path verifies author signatures and enforces the capability gate on every
import regardless of the self-host runtime stance. Self-hosters still own the risk of what they import
(governance is curation, [collection](/architecture/collection/)): the runtime does not refuse an
unsigned self-hosted template, but the signature state and the approved capability manifest are recorded
either way.

## The system_template: the composition shape

A **`system_template`** is first-class and **fully parallel to the component_template**: it declares a
system's composition, its members and their roles, and the system-level rules and KPIs that only the
system can see. Like a component template, the template itself is mutable while every version is frozen: it is **versioned** and **content-hashable** (each
`system_template_version` is a stable artifact, addressable by a content hash); **editing mints a new
version** and a system pins the **immutable** `system_template_version` snapshot.

- **The frozen bill of materials.** A `system_template_version` carries a **`system_template_member`**
  for each role: the role, its **requirement** (the canonical datapoints and commands a member must
  provide), and the `health_role` (how it counts in the rollup). The role list, the requirements, and
  the health roles are **frozen into the version**; what an instance actually assigns is any component
  whose template meets the role's requirement, so the role validates against the system's frozen version
  and an assignment never expires under it.
- **Pin, latest, or a channel.** A system (the instance) either pins a **specific
  `system_template_version`**, or tracks **`latest`**, or follows a **channel** (e.g. `stable` /
  `beta`). This is the same pin-vs-channel choice a component makes.
- **Edits never silently change a pinned system.** An edit **mints a new version** and does **not**
  move a pinned system or its frozen role requirements; only instances tracking `latest` or a channel
  pick the change up (and a channel only when the new version is promoted into it). The immutability
  guarantee is explicit: a frozen system and its frozen BOM stay exactly as pinned until an operator
  re-points them.
- **`health_role` rides the frozen version.** Each member declares `required` / `redundant` /
  `informational`, the knob for the built-in role-aware health rollup ([health](/architecture/health/)).
  It lives on the `system_template_member` (not on the component) because the same device can be
  required in one system and redundant in another.
- **System-level rules, flows, and KPIs.** The system template owns the conditions only the system
  cares about: system-scoped `event_rule`s over member data (a display on input 2 is fine for the
  display but wrong for the room), the [flows](/architecture/alarms-actions/) that respond, and the
  system-owned [KPIs](/architecture/health/) (availability, the utilization family). These are stated
  here briefly; the rule and KPI detail live on [alarms and actions](/architecture/alarms-actions/)
  and [health](/architecture/health/).
- **System-level config.** Config declared on a system template (or a role slot) resolves onto
  whichever component fills the role, so `video.input = HDMI1` for the main-display role applies to
  whatever display is assigned ([config and credentials](/architecture/variables/)).

### Role requirements

A role declares **what a member must provide**, in canonical terms; any component whose template meets it
can fill the role:

```yaml
role: main-display
requires:
  datapoints: [display.power, video.input]   # canonical datapoint_types
  commands:   [set-input, power]             # canonical command types
health_role: required
```

- **A checklist, not a matching engine.** A component's template **qualifies** when it aligns the
  required canonical [datapoint_types](/architecture/datapoints/) and command types (its set is a
  superset of the requirement). The requirement is stated in **canonical** keys, because it only means
  the same thing across templates when it names a canonical signal, not a template-local one.
- **Qualify, then assign.** Pairing a component to a role **filters the picker to qualifying templates**,
  and the [API](/architecture/api/) **validates on assign** (a clean 422 if the component's template is
  missing a required datapoint or command). Mixed-vendor falls out for free: an LG and a Sony display
  both qualify for `display` if both align the required signals, with nothing to enumerate.
- **No allow-list of templates.** A role names *what it needs*, never *which templates*. Declare the
  requirement on the system template, and any qualifying component, today's or a future vendor's, fills
  it.
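
The checklist above is a superset test, sketched here under assumptions: `requirement`, `alignedTemplate`, `missing`, and `assign` are hypothetical names, and a template's aligned canonical keys are modeled as plain sets.

```go
package main

import "fmt"

// requirement is what a role needs, stated in canonical keys.
type requirement struct {
	Datapoints []string
	Commands   []string
}

// alignedTemplate is the canonical set one component template aligns.
type alignedTemplate struct {
	Datapoints map[string]bool
	Commands   map[string]bool
}

// mainDisplay mirrors the main-display role requirement shown earlier.
var mainDisplay = requirement{
	Datapoints: []string{"display.power", "video.input"},
	Commands:   []string{"set-input", "power"},
}

// missing lists every required key the template does not align.
func missing(req requirement, tpl alignedTemplate) []string {
	var out []string
	for _, d := range req.Datapoints {
		if !tpl.Datapoints[d] {
			out = append(out, "datapoint:"+d)
		}
	}
	for _, c := range req.Commands {
		if !tpl.Commands[c] {
			out = append(out, "command:"+c)
		}
	}
	return out
}

// assign is the validate-on-assign check: 200, or 422 with what is missing.
func assign(req requirement, tpl alignedTemplate) (int, []string) {
	if m := missing(req, tpl); len(m) > 0 {
		return 422, m
	}
	return 200, nil
}

func main() {
	full := alignedTemplate{
		Datapoints: map[string]bool{"display.power": true, "video.input": true, "display.temp": true},
		Commands:   map[string]bool{"set-input": true, "power": true},
	}
	partial := alignedTemplate{
		Datapoints: map[string]bool{"display.power": true, "video.input": true},
		Commands:   map[string]bool{"set-input": true},
	}
	fmt.Println(assign(mainDisplay, full))
	fmt.Println(assign(mainDisplay, partial))
}
```

Nothing in the check names a vendor or a template, which is why a future vendor's display qualifies without any change to the role.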

**Removing a capability is gated at adoption, not authoring.** Dropping a required datapoint from a
component template means **minting a new `component_template_version`** without it, which is never
blocked. The old version is **immutable**, so every live assignment pinned to it keeps working. The new
version simply **no longer qualifies** for any role requiring the dropped signal, so it cannot be adopted
into that role (the same validate-on-assign check fires at re-point), and a member filling that role while
tracking `latest` or a channel will not auto-jump to it. Removal surfaces at adoption against frozen versions, never as a silent
break. (Deleting an org-canonical `datapoint_type` a requirement references is a registry-governance warn
or block, the same surfaced-not-silent pattern.)

**The runtime backstop.** Validate-on-assign is prevention; detection covers anything it misses (an
optional input, a drifted alignment). A system calc that reads a datapoint it cannot find **fails
loudly**, a `calc.failed` event and a health-impacting "misconfigured" alarm, never a silent empty
result. Two layers: gate at assignment, shout at runtime.

This resolves **flexibility versus reliability** without trading either: templates evolve freely and any
qualifying component fills a role (flexibility), while the requirement gates adoption against immutable
versions with a noisy backstop (reliability).

| Table | Key columns | Notes |
|---|---|---|
| `system_template` | name, type, **spec (jsonb)** | the mutable system shape; editing mints a new version |
| `system_template_version` | (template, **version**), frozen **spec** | the **immutable** snapshot a system pins; roles never change under it |
| `system_template_member` | (system_template_version, **role**, **requires** (canonical datapoints + commands), **health_role**) | the frozen **role requirement**: role -> the canonical datapoints and commands a member must provide + health role (required / redundant / informational, [health](/architecture/health/)). Any component whose template meets it can fill the role, validated on assign ([role requirements](#role-requirements)) |

```d2
direction: right
classes: { node: { style.border-radius: 8 } }
ct: component_template { class: node }
ctv: component_template_version { class: node }
st: system_template { class: node }
stv: system_template_version { class: node }
stm: system_template_member { class: node }
dt: datapoint_type { class: node }
component: component { class: node }
system: system { class: node }
ct -> ctv: versions
st -> stv: versions
stv -> stm: frozen BOM (role + health_role)
dt -> stm: required by role
component -> ctv: pins
system -> stv: pins
```

Locations have no template: the `location_type` is the only shape-definer
([core entities](/architecture/core-entities/)).

:::caution[Open question]
Whether a `LocationTemplate` (`kind` is reserved in the collection apiVersion) is ever introduced, or
locations stay template-less.
:::

---

# Time

URL: /architecture/time/

The one primitive that manufactures events from the passage of time, so the rest of the pipeline stays purely event-driven.

Time lets an operator alarm on things that produce no event of their own, "10 minutes elapsed", "it is 8am Monday", "the data stopped", by turning the passage of time into events the rest of the pipeline consumes.

## Why time needs a primitive

Everything else is **push-driven**: an event arrives, rules fire. Time is the one input that
**arrives as nothing**. "10 minutes elapsed," "it is 8am Monday," and especially "the data
*stopped*" produce no inbound event, so nothing would ever fire on them. This primitive's whole
job is to turn the passage of time into events the normal pipeline consumes.

## The pair: schedule, timer

- **`schedule`** (config): a recurring definition, a cron or rrule plus an IANA timezone and what
  it triggers. Config, like a rule.

  :::caution[Open question]
  The recurrence surface a `schedule` accepts: a full iCalendar rrule, or a cron subset plus calendar
  anchors like month-start and month-end.
  :::
- **`timer`** (mechanism, working-set): every *pending* fire, kind-discriminated
  (`schedule-tick | for-sustain | runbook-wait | watchdog`), with a `fire_at` and a pointer to
  what it is for. A PG row, the durable working set. The clock singleton scans due rows and
  realizes each fire on its lane (a record-lane fire is written to PG and CDC fans it out to
  JetStream; a watchdog's staleness enters the data lane as a derived datapoint); rows are then
  consumed and rescheduled. A mutable working-set, like the outbox, **not** a history log.

A schedule fire is **not** a separate log table: it is an ordinary **`event` with
`origin=scheduled`**, manufactured by the clock into the `event` log. The event is born in a PG
transaction (record plus any alarm transition) the same as any other event, never published
directly (no dual-write), and the history of schedule fires lives in the `event` log alongside
caught, caused, and derived events. The leader-elected CDC publisher fans the committed event out
to JetStream, where an `action_rule` consumer reacts to it exactly as it reacts to any other event.

## One mechanism, three patterns

All time behavior is the one `timer` table scanned by the clock singleton (sorted by `fire_at`,
woken by a ticker with a crash-recovery backstop), each due row's fire realized on its lane (a
record-lane fire born in PG and CDC-fanned to JetStream, a watchdog's staleness onto the data lane):

- **recurring** (a schedule): reschedule the next `fire_at` after firing. Digests, synthetic
  checks, SLA calendar resets.
- **armed and cancellable** (a relative one-shot): armed by an event, fires later, cancelled if
  the condition clears. The `for`-duration sustain, runbook waits, escalation delays.
- **reset-on-arrival** (a watchdog): pushed to `now + tolerance` on each datapoint, fires if it
  lapses. No-data and staleness.

Fires are durable (the working set is a table, so it survives restart) and single-fire across
replicas: the clock is a leader-elected singleton, exactly one active at a time, held by a NATS KV
CAS lock and failed over on death, so no replica races another to claim a row.
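
The three patterns can be sketched over one in-memory working set. This is a toy model: the real working set is a PG table scanned by the leader, `fire_at` is a `timestamptz` rather than integer seconds, and the `Every` field is a hypothetical stand-in for a cron or rrule recurrence.

```go
package main

import (
	"fmt"
	"sort"
)

// timer is one pending fire.
type timer struct {
	ID     string
	Kind   string // schedule-tick | for-sustain | runbook-wait | watchdog
	FireAt int64  // seconds; stands in for the timestamptz fire_at
	Every  int64  // recurrence period for schedule-tick, 0 otherwise
}

type workingSet map[string]*timer

// arm adds a pending fire (recurring, or an armed one-shot).
func (w workingSet) arm(t timer) { w[t.ID] = &t }

// cancel drops an armed one-shot whose condition cleared.
func (w workingSet) cancel(id string) { delete(w, id) }

// touch is reset-on-arrival: each datapoint pushes the watchdog out.
func (w workingSet) touch(id string, now, tolerance int64) {
	w[id] = &timer{ID: id, Kind: "watchdog", FireAt: now + tolerance}
}

// fireDue returns the due rows in fire_at order. Recurring rows are
// rescheduled past now; every other row is consumed.
func (w workingSet) fireDue(now int64) []string {
	var due []*timer
	for _, t := range w {
		if t.FireAt <= now {
			due = append(due, t)
		}
	}
	sort.Slice(due, func(i, j int) bool {
		if due[i].FireAt != due[j].FireAt {
			return due[i].FireAt < due[j].FireAt
		}
		return due[i].ID < due[j].ID
	})
	ids := make([]string, 0, len(due))
	for _, t := range due {
		ids = append(ids, t.ID)
		if t.Kind == "schedule-tick" && t.Every > 0 {
			for t.FireAt <= now {
				t.FireAt += t.Every
			}
		} else {
			delete(w, t.ID)
		}
	}
	return ids
}

func main() {
	w := workingSet{}
	w.arm(timer{ID: "digest", Kind: "schedule-tick", FireAt: 100, Every: 60})
	w.arm(timer{ID: "sustain", Kind: "for-sustain", FireAt: 120})
	w.touch("wd", 90, 30)
	w.cancel("sustain")   // the condition cleared before the sustain elapsed
	w.touch("wd", 110, 30) // a datapoint arrived: the deadline moves out
	fmt.Println(w.fireDue(125), w.fireDue(145))
}
```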

:::caution[Open question]
Whether a runbook's per-step waits each get their own `timer` row, or one row is advanced per step.
:::

:::caution[Open question]
The clock singleton's wake strategy: wake-on-insert for near-term fires plus a coarse backstop
ticker, so a far-future schedule needs no frequent ticks.
:::

## A fire is recorded once, on the log of what it produces

The `timer` table is mechanism; the **event is the product**. Each fire lands on the log of
whatever it drives, never twice:

| Timer kind | Produces | Logged on |
|---|---|---|
| schedule-tick | a trigger | an `event` (`origin=scheduled`) |
| for-sustain | the alarm opens | an `event` (alarm edge) |
| runbook-wait | the action advances | the `action` row |
| watchdog | the datapoint goes stale | `datapoint` |

So every schedule fire is an `event` with `origin=scheduled`, and every other timer fire is on
the entity it advances. No untracked fires, no double-logging, and the high-churn watchdog never
floods an event log with its resets.

## The backtest split

Time divides cleanly across the backtest boundary:

- **Schedules and armed timers are ground truth.** The wall clock genuinely advanced and a digest
  genuinely went out at 8am; a backtest does not re-run the clock, it reads the recorded
  `origin=scheduled` events as-is.
- **No-data is derived.** The gap is *already in the recorded data* (the absence of datapoint rows
  in a window), so a backtest re-detects the same gaps and would re-emit the same staleness, no clock
  needed. At runtime it needs a real watchdog (you cannot know data is missing until the deadline
  passes), but logically it is a `calc_rule` reading arrival times.
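
Because the gap is already in the recorded data, no-data can be re-derived from arrival times alone. A minimal sketch, assuming sorted arrival timestamps in seconds and a single fixed tolerance; `staleWindows` is a hypothetical name, not the `calc_rule` itself.

```go
package main

import "fmt"

// staleWindows re-derives staleness from recorded arrival times.
// arrivals must be sorted ascending; end closes the backtest window.
// Each result is [staleFrom, staleUntil]: the signal went stale one
// tolerance after an arrival and stayed stale until the next one (or end).
func staleWindows(arrivals []int64, tolerance, end int64) [][2]int64 {
	var out [][2]int64
	for i, at := range arrivals {
		next := end
		if i+1 < len(arrivals) {
			next = arrivals[i+1]
		}
		if next-at > tolerance {
			out = append(out, [2]int64{at + tolerance, next})
		}
	}
	return out
}

func main() {
	// A 30s poller that went quiet between t=60 and t=200, then again after t=230.
	fmt.Println(staleWindows([]int64{0, 30, 60, 200, 230}, 45, 300))
}
```

An empty arrival list yields no windows, which matches the model: with no first arrival there is nothing to go stale.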

## A schedule fire is the `origin=scheduled` event

An `action_rule` consumer reacts to a schedule fire exactly as it reacts to an alarm, so
`origin=scheduled` is the uniform "rules consume events" model, not special wiring:

```yaml
action_rule:
  on: event
  when: 'origin == "scheduled" && schedule == "daily-digest"'
  action: email-open-alarms-summary
```

A synthetic check, an SLA window reset, and a digest are all schedules whose fire an action (or a
check) subscribes to.

## No-data: stale vs unknown

Absence of data is two conditions, and the why matters:

- **`stale`**: we *had* a value and it has aged past its expected cadence. The watchdog's product
  (it can only arm after a first arrival). The last value and its **age are retained**; usually
  **actionable**, because a signal that stopped most often means lost visibility (the source
  died). The watchdog emits a derived staleness datapoint (`X stale at T`, and `fresh again` on
  resume).
- **`unknown`**: **never** observed. No baseline, no last value. A static "not monitored yet"
  condition (a fresh device, a datapoint_type never reported), detected by "no observations
  exist," not by a watchdog. Gray, not actionable.

`current_value` carries `value, as_of_ts, freshness (fresh | stale)`; staleness is a quality of
the datapoint with the last value preserved. **[Health](/architecture/health/) treats the two
absences differently**: a *stale required member* defaults to health `unknown` (lost visibility, so
the system rolls to `unknown`), while a *never-observed member* is gray and does not down the
system. Whether stale means "last value still valid" (a slow config signal) or "lost visibility,
alarm" (a liveness signal) is **per-datapoint-type policy**: the datapoint_type declares its
staleness tolerance.

These two absences surface on the [health](/architecture/health/) side as `unknown` reasons:
a went-stale datapoint is the `stale` reason, and a covered-but-never-reported datapoint is the
`no-data` reason (distinct from `uncovered`, where no health-impacting rule resolves at all).

**Cadence is inferred for pollers, declared for heartbeats.** A poller's expected interval is its
`interval` times a tolerance. A listen-triggered function is **opt-in**: watched only if it declares
an expected heartbeat interval (an MQTT keepalive, a source that pings); silence on a listener
with no declared heartbeat is normal and unwatched.
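
The stale / unknown split and the cadence rule can be combined in one sketch. The `source` type and both functions are hypothetical; the multiplier is passed in because its default is still an open question, and applying it to a declared heartbeat as well as to a poll interval is an assumption.

```go
package main

import "fmt"

// source describes how one function is triggered (hypothetical shape).
type source struct {
	Trigger      string // "poll" or "listen"
	IntervalSec  int64  // a poller's interval
	HeartbeatSec int64  // a listener's declared heartbeat, 0 if none
}

// deadlineSec returns the maximum tolerated age, or false when the source
// is unwatched (a listener with no declared heartbeat).
func deadlineSec(s source, multiplier float64) (int64, bool) {
	switch {
	case s.Trigger == "poll" && s.IntervalSec > 0:
		return int64(float64(s.IntervalSec) * multiplier), true
	case s.Trigger == "listen" && s.HeartbeatSec > 0:
		return int64(float64(s.HeartbeatSec) * multiplier), true
	}
	return 0, false
}

// freshness classifies one datapoint: never observed is unknown, an
// observed value past its deadline is stale, anything else is fresh.
func freshness(s source, multiplier float64, seen bool, lastSeen, now int64) string {
	if !seen {
		return "unknown"
	}
	limit, watched := deadlineSec(s, multiplier)
	if watched && now-lastSeen > limit {
		return "stale"
	}
	return "fresh"
}

func main() {
	poller := source{Trigger: "poll", IntervalSec: 30}
	listener := source{Trigger: "listen"}
	fmt.Println(freshness(poller, 3, true, 0, 80))
	fmt.Println(freshness(poller, 3, true, 0, 100))
	fmt.Println(freshness(poller, 3, false, 0, 100))
	fmt.Println(freshness(listener, 3, true, 0, 100000))
}
```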

:::caution[Open question]
The watchdog tolerance defaults (the multiplier on a poller's `interval`) and whether to debounce a
missed-poll burst before declaring stale.
:::

## Timezones

Every stored instant is a **`timestamptz`** (UTC, tz-aware), universal everywhere. A **`schedule`
additionally carries an IANA timezone** (`America/New_York`) for computing recurrence and calendar
boundaries, because DST means "8am" and "the 1st of the month" cannot be precomputed as fixed
offsets. The resolved `fire_at` is a `timestamptz`; the recurrence is computed in the schedule's
timezone.
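
A short illustration of why the recurrence has to be computed in the schedule's zone: the same local "8am" lands on different UTC instants on either side of a DST change. The helper names are hypothetical; `time.LoadLocation`, `time.Date`, and the `time/tzdata` embed are standard library.

```go
package main

import (
	"fmt"
	"time"
	_ "time/tzdata" // embed the IANA database so LoadLocation works anywhere
)

// fireAtUTC resolves "hour:00 local time on the given date" in the
// schedule's IANA zone and returns the instant in UTC.
func fireAtUTC(zone string, year int, month time.Month, day, hour int) (time.Time, error) {
	loc, err := time.LoadLocation(zone)
	if err != nil {
		return time.Time{}, err
	}
	return time.Date(year, month, day, hour, 0, 0, 0, loc).UTC(), nil
}

// utcHour is a convenience wrapper returning only the UTC hour, or -1 on error.
func utcHour(zone string, year int, month time.Month, day, hour int) int {
	at, err := fireAtUTC(zone, year, month, day, hour)
	if err != nil {
		return -1
	}
	return at.Hour()
}

func main() {
	// US daylight saving time began on 2024-03-10.
	before, _ := fireAtUTC("America/New_York", 2024, time.March, 9, 8)
	after, _ := fireAtUTC("America/New_York", 2024, time.March, 11, 8)
	fmt.Println(before.Format(time.RFC3339))
	fmt.Println(after.Format(time.RFC3339))
}
```

A fixed offset stored with the schedule would be wrong for one of the two dates; storing the zone name and resolving each `fire_at` individually avoids that.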

## Digests

A digest is a **schedule that fires an aggregating action**: the `origin=scheduled` event triggers
an `action_rule` whose action queries (open alarms, the day's events), renders a Go-template body
([alarms and actions](/architecture/alarms-actions/)), and sends. No new machinery: schedule plus
action, composed.
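
The rendering step of a digest might look like the sketch below, using the standard `text/template` package. The `alarm` fields, the template text, and `renderDigest` are hypothetical; in the real action the rows would come from a query through the gateway, not from literals.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// alarm is one row the digest query returned (hypothetical fields).
type alarm struct {
	Name     string
	Severity string
	Owner    string
}

// digestBody is an example Go-template body for the email.
const digestBody = `{{len .Alarms}} open alarm(s) as of {{.AsOf}}
{{range .Alarms}}- [{{.Severity}}] {{.Name}} on {{.Owner}}
{{end}}`

// renderDigest executes the template over the queried rows.
func renderDigest(asOf string, alarms []alarm) (string, error) {
	tmpl, err := template.New("digest").Parse(digestBody)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	data := struct {
		AsOf   string
		Alarms []alarm
	}{asOf, alarms}
	if err := tmpl.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	body, err := renderDigest("08:00", []alarm{
		{Name: "fan-stall", Severity: "critical", Owner: "dsp-boardroom-3"},
	})
	fmt.Print(body, err)
}
```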

## Storage

Two tables back this primitive: the recurring trigger config and the clock singleton's pending-fire working set. The physical layout lives on [storage](/architecture/storage/).

| Table | Key columns | Notes |
|---|---|---|
| `schedule` | id, rrule/cron, **tz (IANA)**, target, enabled | config: a recurring trigger |
| `timer` | id, **fire_at (timestamptz)**, kind (schedule-tick / for-sustain / runbook-wait / watchdog), ref, payload | the clock singleton's pending-fire **working-set** (the durable PG working set, mutable, scanned for due rows and the fire realized on its lane: a record-lane fire born in PG and CDC-fanned to JetStream, a watchdog's staleness onto the data lane), not a history log; fires are logged on the entity they produce |

---

# UI

URL: /architecture/ui/

The operator console: one renderer library in two composition modes, reads through views, and an identity-based information architecture.

The UI is where an operator actually does the work, so it is built as one renderer over the same views the rest of the platform reads, with an information architecture organized around the entities you care about. This page covers the renderer / page / dashboard model and the information architecture. The stack, the typed client, the build pipeline, and
the concrete reusable primitives are the [design system](/contributing/design-system/).

## The renderer contract: ViewResult and the views BFF

The whole console rests on one contract. **All UI reads go through [views](/architecture/views/)**
(the read-side BFF), CRUD for writes; the operator never queries raw tables. Every view returns a
uniform **`ViewResult`** (`{columns, rows}`), and the SPA renders any view through **one renderer per
shape**: adding a view does not add a bespoke renderer. This is what decouples the render layer from any
specific query and keeps the read contract uniform whether a page is coded or a dashboard widget is
configured.

The **dense-ops layout is an architectural pattern**, not a one-off page: list surfaces follow one
shape (a summary of donut facets over the full set, then a keyboard chip filter, then a group-by table,
then a click-row detail drawer plus a full detail page), and the facets drive the filter while the
summary stays whole so click-to-filter is stable. The concrete extracted primitives that realize the
pattern (`DensePage`, `FilterBar`, `Donut`, `SummaryFacet`, `Drawer`, `HealthBadge`, `Actor`,
`Sparkline`) live in the [design system](/contributing/design-system/); the pattern is the model.

## One renderer library, two composition modes

The factoring avoids both "every screen is hand-coded" and "everything must be a dashboard":

- **Renderer library** (coded once): `stat`, `table`, `status-grid`, `timeline`, `heatmap`,
  `line` / `area`. Each takes a **view result plus a field-mapping** (which column is the value /
  label / time / series key), so a renderer is decoupled from any specific view, and any view of the
  right shape can feed it. The set is closed but grown reactively, the same discipline as the
  reducer vocabulary.

  :::caution[Open question]
  The field-mapping contract between a view result and each renderer (the column roles per renderer
  type).
  :::
- **Coded pages** compose renderers plus custom interaction: the built-in information architecture
  (overview, drill-downs, config forms, exploration).
- **Composable dashboards** (config-driven): operator-built grids where each
  **widget = a view ref + a renderer + a field-mapping + params**, no code per dashboard.
  Dashboard-level params flow into widget view-params, so one "system overview" dashboard works for
  any system.

  :::caution[Open question]
  The composable-dashboard schema: the widget placement grid, the view binding, and the dashboard
  params.
  :::

  :::caution[Open question]
  Whether dashboards are themselves resources (carrying the `official` boolean, saved like views) or
  a thin layer over saved views.
  :::

The contract underneath both: **all UI reads go through [views](/architecture/views/)**, CRUD
for writes. The renderer library serves coded pages and dashboard widgets identically; the only
difference is whether the composition is code or config.
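
The field-mapping contract is still an open question, so the following is only a sketch of the idea: a `stat`-style renderer that knows nothing about a specific view and is told which columns play the label and value roles. Every type here (`ViewResult`, `Column`, `FieldMapping`, `Stat`) is a hypothetical Go rendering of the shapes described above, and the column names are invented.

```go
package main

import "fmt"

// Column and ViewResult mirror the documented {columns, rows} shape.
type Column struct{ Name, Type string }

type ViewResult struct {
	Columns []Column
	Rows    [][]any
}

// FieldMapping names the columns that fill each role (assumed contract).
type FieldMapping struct{ Label, Value string }

// Stat is what a stat renderer would draw: one labelled number.
type Stat struct {
	Label string
	Value float64
}

// columnIndex finds a column by name in the result.
func columnIndex(vr ViewResult, name string) (int, error) {
	for i, c := range vr.Columns {
		if c.Name == name {
			return i, nil
		}
	}
	return -1, fmt.Errorf("view result has no column %q", name)
}

// renderStats maps any view of the right shape onto stats.
func renderStats(vr ViewResult, fm FieldMapping) ([]Stat, error) {
	li, err := columnIndex(vr, fm.Label)
	if err != nil {
		return nil, err
	}
	vi, err := columnIndex(vr, fm.Value)
	if err != nil {
		return nil, err
	}
	stats := make([]Stat, 0, len(vr.Rows))
	for n, row := range vr.Rows {
		if li >= len(row) || vi >= len(row) {
			return nil, fmt.Errorf("row %d is shorter than the column list", n)
		}
		label, okLabel := row[li].(string)
		value, okValue := row[vi].(float64)
		if !okLabel || !okValue {
			return nil, fmt.Errorf("row %d does not match the mapped column types", n)
		}
		stats = append(stats, Stat{Label: label, Value: value})
	}
	return stats, nil
}

func main() {
	firing := ViewResult{
		Columns: []Column{{"system", "string"}, {"open_alarms", "number"}},
		Rows:    [][]any{{"room-101", 3.0}, {"room-202", 0.0}},
	}
	stats, err := renderStats(firing, FieldMapping{Label: "system", Value: "open_alarms"})
	if err != nil {
		panic(err)
	}
	fmt.Println(stats)
}
```

A coded page would pass the mapping as a literal; a dashboard widget would load the same mapping from its config. The renderer cannot tell the difference.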

## Coded pages and dashboards share one view layer

Coded pages give the complete operator console; composable dashboards are the customization layer on
top (a grid editor, widget config, and the view-binding UI), and the view layer is what makes them
cheap. A built-in page **queries a default view, not a raw resource** (the Alarms page reads the
`firing-now` view, not `GET /alarms` directly), so the read contract is uniform and the same view
backs a dashboard widget unchanged.

## Live updates: polling by default

Live data is **query polling** (a refetch interval; slow-changing config uses a long stale time). A
read can also **stream over the view layer (a server-side SSE relay)** where latency or fan-out
earns it, the same earn-it-with-a-profile discipline. Presentation that depends on config (a severity
level's id to its label and color) resolves client-side from the config view. A datapoint
value resolves the same way: on read, the UI converts the canonical value to the operator's preferred
display unit, looked up in the unit registry from the `datapoint_type`'s canonical unit
([datapoints](/architecture/datapoints/)), so storage stays single-unit while one operator sees
Celsius and another Fahrenheit.
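
A minimal sketch of that read-time conversion, written in Go for consistency with the rest of these docs even though the console does this client-side. The registry layout, the unit names, and the fall-back-to-canonical behavior are assumptions for illustration.

```go
package main

import "fmt"

// unitPair keys a conversion from a canonical unit to a display unit.
type unitPair struct{ From, To string }

// registry is a hypothetical slice of the unit registry.
var registry = map[unitPair]func(float64) float64{
	{"celsius", "fahrenheit"}: func(c float64) float64 { return c*9/5 + 32 },
	{"celsius", "kelvin"}:     func(c float64) float64 { return c + 273.15 },
}

// toDisplay converts a stored canonical value to the preferred unit.
// When no conversion is registered it returns the canonical value unchanged
// (an assumed fallback, not documented behavior).
func toDisplay(value float64, canonical, preferred string) (float64, string) {
	if canonical == preferred {
		return value, canonical
	}
	if convert, ok := registry[unitPair{canonical, preferred}]; ok {
		return convert(value), preferred
	}
	return value, canonical
}

func main() {
	stored := 21.5 // one stored value, always in the canonical unit
	for _, pref := range []string{"celsius", "fahrenheit"} {
		v, unit := toDisplay(stored, "celsius", pref)
		fmt.Printf("%.1f %s\n", v, unit)
	}
}
```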

:::caution[Open question]
Which high-frequency surfaces move from polling to the SSE relay, and what latency earns it.
:::

## Configuration UIs

CRUD forms over the typed resource API, one per primitive (components, templates, rules, config,
tags, groups, schedules, severity levels, and the IAM resources). Editing a setting is editing
**[config](/architecture/variables/)**, an audited mutation, not a separate prop store
([audit](/architecture/audit/)). The standout is the **rule-authoring
page**:

- an **Expr editor** for the predicate or condition, with the prepared-input contract surfaced
  ([expressions](/architecture/expressions/));
- a **live blast-radius preview** (which entities a scope selects, which datapoints a rule would
  have fired on), so a rule is validated against reality before it is saved;
- the **AI-suggestion seam** ([AI](/architecture/ai/)): AI may propose a rule pre-filled with
  provenance; the operator edits and approves, and approval is the ordinary audited create. AI never
  saves a rule itself.

## Exploration UIs

Coded pages with rich interaction, all reading through views:

- **The cascade resolve view** (the standout): "why did this value win", rendered from the
  [cascade](/architecture/cascade/) resolve output: the effective value, the winning source, and the
  ordered shadowed bindings it beat. The feature that makes an opinionated cascade explainable.
- **Datapoint history**: a `line` or `heatmap` over a chosen time range, with the stale / unknown
  distinction surfaced ([time](/architecture/time/)).
- **Alarm drill-down**: the alarm, its triggering datapoint and history, the actions it fired, and
  ack / snooze / resolve controls.
- **Inventory and topology**: the location / system / component trees, navigable, with
  [health](/architecture/health/) (`status-grid`) at each level.
- **Event exploration**: query the event log by entity / time / category, with the audit trail.

## Information architecture

The IA has two layers, deliberately decoupled:

1. **Routes are flat and identity-based.** Every entity page is a top-level path (`/systems`,
   `/components`, `/templates`, `/config`); a page's URL addresses the *entity*, never its place in
   the menu. This is the contract we refuse to churn: bookmarks, deep links, and cross-links stay
   stable however the menu is later reorganized. There are no taxonomy-nested routes and no redirects
   to maintain.
2. **The sidebar groups those flat routes into clusters for browsing**: Home, Dashboards, Alarms,
   Inventory (systems, components, locations, interfaces, nodes, tasks), Catalog (templates, types,
   tags, rules), Explore, Settings (config, secrets, identity, audit). Grouping is pure
   presentation: a cluster is not a destination and carries no route of its own. It can be
   rearranged, and is user-customizable, without touching a single route.

**Home is distinct from Dashboards.** Dashboards monitor the *fleet* (datapoint views over the
inventory). Home monitors the *monitor*: the operator and admin situation room for config lifecycle
(stale or out-of-date templates), control-plane health (rules failing to evaluate, datapoints
dropped with no matching rule), and proactive suggestions. A dashboard cannot model that, so Home
earns its own slot; "Overview" is the name of the default dashboard, not the landing.

The theme is **dark-first** (the NOC aesthetic) on the brand palette (teal `#21CAB9`, navy
`#080c16`), semantic tokens only, no hardcoded colors in components.

---

# Config, credentials, and variables

URL: /architecture/variables/

Three kinds of operator-set value resolved by one cascade: config keyed to a signal, credentials with a lifecycle, and free variables.

Everything an operator **sets** resolves the same way: a typed value, owned at a scope, resolved
most-specific-wins down the [cascade](/architecture/cascade/) on every poll and every tick. Three
kinds share that resolution but differ in what they are keyed to and what lifecycle they carry:

- **config**: a device setting you declare. Keyed by a **canonical signal** (a `datapoint_type`),
  so it has an observed side and can be reconciled.
- **credential**: an access secret. Its own keyspace, a pluggable storage provider, and a
  lifecycle (refresh, rotation, expiry).
- **variable**: a free interpolated value (a macro). Not bound to a signal, just resolved and
  spliced into functions and interfaces.

| | **config** | **credential** | **variable** (macro) |
|---|---|---|---|
| what it is | a declared device setting | an access secret | a free interpolated value |
| keyed by | a canonical signal (`datapoint_type`) | its own template/interface-local name | an org config key (cascade namespace) |
| has an observed side? | yes, a datapoint via a get function | its **validity**, not the secret value | no |
| lifecycle | drift → reconcile (a set function) | provider + refresh + rotation + expiry | none; resolved and interpolated |
| example | `video.input = HDMI1` | an `ssh_credential`, an `oauth2` token | `poll_interval = 30s`, a base URL, a label |
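
Most-specific-wins resolution can be sketched in a few lines. This assumes a simple linear specificity order of `global < template < location < system < component` and ignores groups and weights, which the [cascade](/architecture/cascade/) page owns; `Binding`, `Resolution`, and `resolve` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
)

// Scope is ordered least to most specific (an assumed linear order).
type Scope int

const (
	Global Scope = iota
	Template
	Location
	System
	Component
)

var scopeNames = [...]string{"global", "template", "location", "system", "component"}

func (s Scope) String() string { return scopeNames[s] }

// Binding is one value set at one scope on the entity's ancestry.
type Binding struct {
	Scope Scope
	Owner string
	Value string
}

// Resolution carries the effective value plus what it beat, which is the
// raw material for a "why did this value win" explanation.
type Resolution struct {
	Effective string
	Winner    Binding
	Shadowed  []Binding
}

// resolve picks the most specific binding; ok is false when nothing is set.
func resolve(bindings []Binding) (res Resolution, ok bool) {
	if len(bindings) == 0 {
		return Resolution{}, false
	}
	ordered := append([]Binding(nil), bindings...)
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].Scope > ordered[j].Scope })
	return Resolution{Effective: ordered[0].Value, Winner: ordered[0], Shadowed: ordered[1:]}, true
}

func main() {
	res, _ := resolve([]Binding{
		{Global, "org", "60s"},
		{Template, "display-v3", "30s"},
		{Component, "room-101/display", "10s"},
	})
	fmt.Printf("poll_interval = %s (from %s %s)\n", res.Effective, res.Winner.Scope, res.Winner.Owner)
	for _, b := range res.Shadowed {
		fmt.Printf("  shadowed: %s at %s %s\n", b.Value, b.Scope, b.Owner)
	}
}
```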

The common thread is the cascade and an exclusive-arc scope (exactly one of
`global | template | location | system | component`): the same exclusive-arc ownership as
datapoints, plus a `template`-scoped default the datapoint arc lacks (and unlike datapoints,
config is not `node`-owned). The three are not three subsystems; they are three uses of one
"set a value, resolve it down a scope" idea.

## config: declared device state, keyed to a signal

A **config** item is the **declared side of a canonical signal**. `video.input` is one key with two
sides: the **observed** value the device reports (a `state_datapoint`, provenance=observed) and the
**declared** value you set. They share the **key** but not the **storage**: the declared value lives
in the config table, resolved down the cascade, and is **never a datapoint row**. Same name, opposite
direction, the observed side flowing *up* from the device and the declared side flowing *down* from
the operator. This is not a "declared provenance" (there are no declared rows in the datapoint
tables); it is one signal with two homes, and their gap is **drift**.

Keying config to the signal registry instead of a private name is what removes the import problem: a
component template **brings no keys, it references registered ones**, exactly as it does for the
datapoints it reads. Two display templates that both touch `video.input` are two references to one
governed key, not a collision. Config reuses the `datapoint_type`'s value domain, so a declared value
is validated against the same `{values: […]}` the observed side uses.

**The template is the source of truth for configurability.** A signal becomes settable on a device
class when that class's [component template](/architecture/templates/) binds a **get** function (an
ordinary collection function that emits the observed datapoint) and a **set** function (a
command-triggered function that writes it). The registry may carry a soft `settable` hint, but the
binding is authoritative: no set function, not enforceable here.

Each piece of a config item has one home, joined by the canonical key:

| Piece | What it holds | Lives in |
|---|---|---|
| signal definition | key, kind, value domain, unit | `datapoint_type` (the registry) |
| get / set binding | how this device class reads and writes the signal | the **component_template** version |
| declared value | the intent (`HDMI1`), plus the per-item `reconcile` policy | the **config table** (cascaded) |
| observed value | what the device reports (`HDMI2`) | `state_datapoint` rows (observed) |
| drift | declared ≠ observed | **computed on read**, not stored |

### Drift and reconcile

When a config item has both a declared and an observed value, their gap is **drift**: the same
[`disagree(declared, observed)`](/architecture/datapoints/#disagree-and-divergence) comparison used
everywhere, with the declared side sourced from config. A per-item `reconcile` policy turns drift
into action:

- **`observe`** (default): record the drift, raise **no** alarm. Log that it differs and keep
  collecting; drift stays visible through [`disagree`](/architecture/datapoints/#disagree-and-divergence)
  and the config view, without surfacing anything unprompted.
- **`warn`**: raise an alarm for the drift, at **warning** severity. Surface it, change nothing.
- **`enforce`**: declared wins. Call the template's **set** function to push the value back; that
  issues a command, writes an [`intended`](/architecture/datapoints/#intended-the-declared-effect-of-a-command)
  datapoint, and reconciles against the next observation (desired-state convergence, the controller
  half of spec-and-status). If the set **fails**, raise a real alarm (enforcement failure).

Adopting the observed value as the declared one (reality becomes intent) is **not** an ongoing mode;
it is a separate **one-shot import action** an operator runs deliberately.
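
The three policies reduce to a small decision over (drift, policy, capability). The sketch below assumes drift is plain string inequality (the real `disagree` comparison may be richer) and reports an `enforce` item with no bound set function as a distinct `not-enforceable` outcome, since the source only says such an item is not enforceable. All names are illustrative.

```go
package main

import "fmt"

type Policy string

const (
	Observe Policy = "observe"
	Warn    Policy = "warn"
	Enforce Policy = "enforce"
)

type Action string

const (
	NoAction       Action = "none"
	RecordDrift    Action = "record-drift"
	RaiseWarning   Action = "raise-warning-alarm"
	CallSet        Action = "call-set-function"
	NotEnforceable Action = "not-enforceable"
)

// ConfigItem is a hypothetical in-memory view of one config cell.
type ConfigItem struct {
	Key         string
	Declared    string
	Observed    string
	HasObserved bool   // false until the device has reported a value
	Reconcile   Policy // empty means the default, observe
	HasSetFn    bool   // whether the template binds a set function
}

// drifted is computed on read, never stored.
func drifted(it ConfigItem) bool {
	return it.HasObserved && it.Declared != it.Observed
}

// decide maps drift plus policy onto the next step.
func decide(it ConfigItem) Action {
	if !drifted(it) {
		return NoAction
	}
	switch it.Reconcile {
	case Warn:
		return RaiseWarning
	case Enforce:
		if !it.HasSetFn {
			return NotEnforceable
		}
		return CallSet
	default:
		return RecordDrift
	}
}

func main() {
	item := ConfigItem{Key: "video.input", Declared: "HDMI1", Observed: "HDMI2", HasObserved: true, HasSetFn: true}
	for _, p := range []Policy{Observe, Warn, Enforce} {
		item.Reconcile = p
		fmt.Printf("%-8s -> %s\n", p, decide(item))
	}
}
```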

:::caution[Open question]
The `reconcile: enforce` execution (the set-function push and the enforcement-failure alarm) and the
separate one-shot import action (observed-becomes-declared): the controller shape behind the reserved
seam.
:::

The power here is that **remediation needs no rule**. You do not author an `event_rule` or a flow to
fix a setting; you declare the value, set the policy to `enforce`, and the cascade plus drift plus
the set function close the loop. Reconcile runs **per item**, so one reconciled setting is better than
none; the capability of any item is simply which of its get/set functions the template has bound (get
only gives observe or warn on drift; a set too makes it enforceable). The data-mediated loop (set → device →
observe → drift clears) is the one guarded at action dispatch
([alarms and actions](/architecture/alarms-actions/)), with a per-item backoff so a device that
refuses a write is not hammered with retries.

### Declaring at the system level

Because config rides the standard cascade, you rarely declare on the device. Declare
`video.input = HDMI1` for the **main-display role** on the system template, and the cascade resolves it
onto whichever display fills that role; the display's own template declared nothing. Drift and
reconcile then *just happen*, no per-device authoring.

:::caution[Open question]
Resolving a value scoped to a role slot (the `system_template_member` where `health_role` already
lives) may need a new cascade level between system and component, alongside the per-item get/set
binding shape on the template.
:::

## credential: a secret with a lifecycle

A **credential** is an access secret. Sensitivity alone (mask + encrypt) is just a flag any value can
carry; a credential is its own primitive because it has a **lifecycle** that a plain config value never has:
a storage provider, refresh, rotation, and expiry. That lifecycle is surfaced on the **component
template**, because that is where the interface and its auth logic live.

**Shape.** A credential has a structured `variable_type` **shape**: `bearer_token`, `basic_auth`,
`ssh_credential {username, password | private_key}`, `snmp_community`, `oauth2 {client_id,
client_secret, access_token?, refresh_token?, expires_at?}`, `tls_cert`, with **secrecy per field**
(`oauth2.client_secret` is secret, `client_id` is not). An interface consumes a shape directly:
`credentialRef: ${input.ssh}` binds an `ssh_credential`, and the SSH adapter uses `{username,
private_key}`.

**Pluggable storage provider.** Secret fields are encrypt-at-write, decrypt-on-use, with the key
supplied by a pluggable **`SecretProvider`**: an env-var key by default, with **KMS, Vault, or an
external secrets manager** behind the same seam and no model change (the off-platform-storage case is
just an external provider). It is the same seam pattern as the
[storage backend](/architecture/files/#backends-swappable-behind-the-gateway): one interface, a
swappable implementation, no caller changes when the provider does.
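
One possible shape for that seam, as a sketch. The interface method, the `EnvProvider` and `StaticProvider` types, the hex encoding of the env-var key, and the choice of AES-GCM are all assumptions made for the example; the source fixes only "encrypt-at-write, decrypt-on-use, key from a pluggable provider".

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
)

// SecretProvider supplies the data key; KMS or Vault would implement the same method.
type SecretProvider interface {
	Key() ([]byte, error)
}

// EnvProvider reads a hex-encoded key from an environment variable.
type EnvProvider struct{ Var string }

func (e EnvProvider) Key() ([]byte, error) {
	raw := os.Getenv(e.Var)
	if raw == "" {
		return nil, fmt.Errorf("%s is not set", e.Var)
	}
	return hex.DecodeString(raw)
}

// StaticProvider holds a key in memory; useful for tests and this demo only.
type StaticProvider []byte

func (s StaticProvider) Key() ([]byte, error) { return s, nil }

func newGCM(p SecretProvider) (cipher.AEAD, error) {
	key, err := p.Key()
	if err != nil {
		return nil, err
	}
	block, err := aes.NewCipher(key) // key must be 16, 24, or 32 bytes
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// seal encrypts at write time and prefixes the random nonce to the output.
func seal(p SecretProvider, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(p)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// open decrypts on use; it fails if the key or the ciphertext is wrong.
func open(p SecretProvider, sealed []byte) ([]byte, error) {
	gcm, err := newGCM(p)
	if err != nil {
		return nil, err
	}
	if len(sealed) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, body := sealed[:gcm.NonceSize()], sealed[gcm.NonceSize():]
	return gcm.Open(nil, nonce, body, nil)
}

func main() {
	var p SecretProvider = StaticProvider(make([]byte, 32)) // all-zero demo key, never a real key
	sealed, err := seal(p, []byte("snmp-community"))
	if err != nil {
		panic(err)
	}
	plain, err := open(p, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Printf("sealed %d bytes, opened %q\n", len(sealed), plain)
}
```

Callers only ever see `seal` and `open`; swapping `StaticProvider` for `EnvProvider{Var: "..."}` or an external manager changes no call site.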

**Read is permissioned and audited.** Sometimes a secret must be read in plaintext; that is a
privileged, audited action. `secret:read` is an [IAM](/architecture/identity-access/) permission you
grant to roles, and **every decrypt writes an [audit](/architecture/audit/) row**. The
machine-acquired token cache (below) is exempt to avoid audit-on-read noise on every request.

**Lifecycle, built from the primitives we already have.** None of this is a new subsystem; each
behavior is a template-declared use of functions, time, and flows:

- **refresh** (an `oauth2` access token) is **lazy**: refreshed on use when within a skew window of
  expiry, coordinated across replicas by a NATS KV lock (CAS on the credential key). Idle credentials never
  refresh; the refreshed token is a separate encrypted cache, not an operator secret.
- **rotation** (a password on a schedule) is a **flow**: generate → set on the device (a set function)
  → update the store → verify → invalidate the old, driven by the [time](/architecture/time/) primitive.
- **expiry and reminders** are an expiry timestamp plus a **watchdog** that fires an event and an
  **alarm** before the credential lapses.
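
The lazy-refresh rule is just a predicate evaluated at the moment of use. A minimal sketch, with the cross-replica NATS KV lock deliberately left out; the `Token` type, the two-minute skew, and the function names are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// Token is a hypothetical cached access token.
type Token struct {
	AccessToken string
	ExpiresAt   time.Time
}

// defaultSkew is an illustrative window, not a documented default.
const defaultSkew = 2 * time.Minute

// needsRefresh is true once now is inside the skew window before expiry.
func needsRefresh(tok Token, now time.Time, skew time.Duration) bool {
	return !now.Before(tok.ExpiresAt.Add(-skew))
}

// ensureFresh is called on use: it refreshes only when needed, so an idle
// credential never triggers a refresh.
func ensureFresh(tok Token, now time.Time, skew time.Duration, refresh func() (Token, error)) (Token, error) {
	if !needsRefresh(tok, now, skew) {
		return tok, nil
	}
	return refresh()
}

// mustTime parses an RFC 3339 instant and panics on bad input (demo helper).
func mustTime(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	tok := Token{AccessToken: "old", ExpiresAt: mustTime("2025-01-01T12:00:00Z")}
	refresh := func() (Token, error) {
		return Token{AccessToken: "new", ExpiresAt: mustTime("2025-01-01T13:00:00Z")}, nil
	}
	for _, now := range []string{"2025-01-01T11:30:00Z", "2025-01-01T11:59:00Z"} {
		got, err := ensureFresh(tok, mustTime(now), defaultSkew, refresh)
		if err != nil {
			panic(err)
		}
		fmt.Printf("used at %s -> token %q\n", now, got.AccessToken)
	}
}
```

In a multi-replica deployment the `refresh` callback is where the lock would be taken, so only one replica performs the exchange.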

**Credential health is its validity, not its value.** A credential's observable is whether it still
works: **intrinsic expiry** (an `oauth2` token, a `tls_cert` `notAfter`) warns proactively, and
**observed-use failure** flips it unhealthy after N consecutive auth failures consumers report. Both
surface through the ordinary datapoint-to-alarm pipeline, so a credential gets a health story without
being a device signal.
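
The observed-use half can be sketched as a consecutive-failure counter. The threshold value and the choice to recover on the first successful auth are assumptions; the source specifies only "unhealthy after N consecutive auth failures".

```go
package main

import "fmt"

// CredentialHealth tracks consecutive auth failures reported by consumers.
type CredentialHealth struct {
	threshold   int
	consecutive int
}

// NewCredentialHealth builds a tracker that flips unhealthy at n failures in a row.
func NewCredentialHealth(n int) *CredentialHealth {
	return &CredentialHealth{threshold: n}
}

// Report records one auth outcome; any success resets the streak
// (assumed recovery behavior).
func (h *CredentialHealth) Report(authOK bool) {
	if authOK {
		h.consecutive = 0
		return
	}
	h.consecutive++
}

// Healthy is false once the failure streak reaches the threshold.
func (h *CredentialHealth) Healthy() bool {
	return h.consecutive < h.threshold
}

func main() {
	h := NewCredentialHealth(3)
	for _, ok := range []bool{false, false, true, false, false, false} {
		h.Report(ok)
		fmt.Printf("auth ok=%-5v healthy=%v\n", ok, h.Healthy())
	}
}
```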

**Shared versus per-device is just scope.** A fleet-wide SNMPv3 user is a credential set high in the
cascade; a unique-per-device secret is the same shape set at component scope. No shared-versus-unique
split to model; it is the cascade, like everything else.

## variable: free interpolated values (macros)

A **variable** is the leftover, and the most familiar: a value you splice into behavior that is **not**
a device signal and carries no lifecycle, like a poll interval, a base URL, an environment label, or a
tuning constant. These are Zabbix-style **macros**, resolved `global → template → instance` down the
same cascade and interpolated as `$var:<name>` into functions, interface definitions, and rule
scopes.

- **Names are org-specific config keys, not canonical signals** (the one place the
  "operator-defined, not curated" namespace genuinely applies). There is no registry authority and no
  pre-registration; sprawl is controlled by a creation **role-gate** ([IAM](/architecture/identity-access/))
  and by every variable being **surfaced in the tree** as it is added.
- **Global and template-local are the same primitive at different scopes.** A global macro
  (a company-wide NTP server) and a template-local one (a device class's default poll interval) differ
  only by where on the cascade they sit.
- A variable has **no observed side and no reconcile**; nothing on a device mirrors a poll interval.
  That absence is exactly what separates it from config.

Scalar shapes (`string`, `int`, `float`, `bool`, `json`) cover the common case; a variable may be
flagged secret (a free secret like a webhook signing token) without being a full credential, since it
has no lifecycle.
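
A sketch of `$var:<name>` interpolation with secret masking. The name grammar (letters, digits, underscore), the `****` mask, and treating an unresolved name as an error are assumptions; the real resolver would also walk the cascade rather than read a flat map.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// varRef matches $var:<name>; the allowed name characters are an assumption.
var varRef = regexp.MustCompile(`\$var:([A-Za-z_][A-Za-z0-9_]*)`)

// Variable is an already-resolved value plus its secret flag.
type Variable struct {
	Value  string
	Secret bool
}

// interpolate returns the string to send (rendered) and the string that is
// safe to log (masked). An unresolved name is an error.
func interpolate(tmpl string, vars map[string]Variable) (rendered, masked string, err error) {
	var missing []string
	render := func(mask bool) string {
		return varRef.ReplaceAllStringFunc(tmpl, func(ref string) string {
			name := strings.TrimPrefix(ref, "$var:")
			v, ok := vars[name]
			if !ok {
				missing = append(missing, name)
				return ref
			}
			if mask && v.Secret {
				return "****"
			}
			return v.Value
		})
	}
	rendered, masked = render(false), render(true)
	if len(missing) > 0 {
		return "", "", fmt.Errorf("unresolved variable %q", missing[0])
	}
	return rendered, masked, nil
}

func main() {
	vars := map[string]Variable{
		"host":          {Value: "av.example.com"},
		"signing_token": {Value: "abc123", Secret: true},
	}
	rendered, masked, err := interpolate("https://$var:host/hook?sig=$var:signing_token", vars)
	if err != nil {
		panic(err)
	}
	fmt.Println("send:", rendered)
	fmt.Println("log: ", masked)
}
```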

## tag: a normalized label vocabulary

A **tag** is an operator **`key: value`** label attached to an entity to organize, filter, and scope by
dimensions Omniglass does not model natively (`category: audio-dsp`, `environment: prod`,
`cost_center: 4021`). A tag is not a signal and carries no lifecycle; it rides the cascade with a
**union-on-key, override-on-value** combinator, so keys accumulate down the tree while the
most-specific binding wins each value.

**The key is a tenant-wide governed vocabulary.** A tag **key** is a row in the `tag` registry, shared
across the whole tenant (one registry per database, which is the tenant boundary). Minting a new key is
**permissioned**: it takes a `tag:create` grant ([identity and access](/architecture/identity-access/)),
an admin or curator action. *Setting a value* on an existing key is the ordinary entity write
(`component:update` and friends), open to operators. That split is the point: the vocabulary stays
**normalized** (no one inventing `env` beside `environment` beside `Environment`) while binding values
stays routine. The UI **autocompletes keys from the registry** as you type, so you reach for the
existing key instead of coining a near-duplicate.

**Values bind down the cascade.** A `tag_binding` sets a value for a key at any scope
(`global | template | location | system | component`) and through [groups](/architecture/groups/),
exactly like config and variables. Keys **union** (an entity surfaces every tag bound at or above it);
values **override** most-specific-wins. A [template](/architecture/templates/) seeds default tags onto
its component (`category: audio-dsp`). Because resolution is cascaded, you tag a location once and every
system and component beneath it inherits it, which is what makes tags a practical scoping dimension: a
high-weight [group](/architecture/groups/) can key a rule-set off `compliance: pci`, an action can read
a `maintenance_window`.
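
The union-on-key, override-on-value combinator is easy to see in code. This sketch assumes the same linear scope order as the scope list above and leaves group bindings out; `TagBinding` here is an in-memory stand-in, not the table shape.

```go
package main

import (
	"fmt"
	"sort"
)

// Scope is ordered least to most specific (an assumed linear order).
type Scope int

const (
	Global Scope = iota
	Template
	Location
	System
	Component
)

// TagBinding is one key: value set at one scope on the entity's ancestry.
type TagBinding struct {
	Scope Scope
	Key   string
	Value string
}

// resolveTags unions keys across scopes and lets the most specific
// binding win each value.
func resolveTags(bindings []TagBinding) map[string]string {
	ordered := append([]TagBinding(nil), bindings...)
	sort.SliceStable(ordered, func(i, j int) bool { return ordered[i].Scope < ordered[j].Scope })
	tags := make(map[string]string)
	for _, b := range ordered {
		tags[b.Key] = b.Value // a later, more specific binding overrides
	}
	return tags
}

func main() {
	tags := resolveTags([]TagBinding{
		{Location, "environment", "prod"},
		{Location, "compliance", "pci"},
		{Template, "category", "audio-dsp"},
		{Component, "environment", "staging"},
	})
	fmt.Println(tags)
}
```

The component ends up with all three keys, and `environment` resolves to `staging` because the component-scope binding is the most specific.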

:::caution[Open question]
Value-domain normalization. Key normalization is settled (the governed registry plus the `tag:create`
gate). The open part is the **value** side: whether a tag key may **constrain** its values (an enum or
`value_type` on the key, so `environment` accepts only its allowed set, validated and autocompleted like
a `datapoint_type` domain), and whether it may **normalize** them on input through an Expr transform
(lowercase, trim whitespace, fold synonyms) so `Prod`, `prod `, and `PROD` resolve to one value.
Free-text values ship either way; the question is how much governance a key places on its values.
:::

## What's shared

- **The cascade.** Config, credentials, and variables resolve most-specific-wins down
  `global → … → component`, with a template-scoped value as a shipped default; **tags** resolve down
  the same cascade with a union-on-key, override-on-value combinator. One resolver
  ([cascade](/architecture/cascade/)).
- **The exclusive-arc scope.** Each value is owned at exactly one scope: the same exclusive-arc
  ownership as datapoints, plus a `template`-scoped default the datapoint arc lacks (and config is
  not `node`-owned).
- **`variable_type` shapes** back credentials (structured secrets) and variables (scalars); config
  instead borrows the `datapoint_type`'s domain, because its key *is* a signal.
- **`$var:` interpolation** renders variables and credential fields into requests; config is read by
  key like a datapoint. Secrets are **masked at interpolation time** and never surface in a log line,
  error string, or datapoint label.

The observed side of config is maintained by one **event-driven worker** (the one-worker-plus-stages
model): when a `state_datapoint` lands whose `(owner, key)` a config item is keyed to, it refreshes
that item's cached observed value, reverse-indexed so "is this datapoint a config's observed side?" is
a sargable lookup, not a scan. It is the one controlled, one-directional crossing from the timeseries
back into current-value config.
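
In miniature, the worker's lookup looks like the sketch below. A Go map keyed by `(owner, key)` stands in for the sargable reverse index, and the type and method names are hypothetical; the real worker reads from the stream and writes to Postgres.

```go
package main

import "fmt"

// ownerKey is the reverse-index key: which entity, which signal.
type ownerKey struct{ Owner, Key string }

// ConfigItem holds declared intent and the cached observed value.
type ConfigItem struct {
	Declared    string
	Observed    string
	HasObserved bool
}

// Drifted is computed on read from the two sides.
func (it ConfigItem) Drifted() bool {
	return it.HasObserved && it.Declared != it.Observed
}

// ObservedIndex answers "is this datapoint a config's observed side?"
// with a direct lookup instead of a scan.
type ObservedIndex map[ownerKey]*ConfigItem

// OnStateDatapoint refreshes the cached observed value when the datapoint
// backs a config item, and reports whether it did.
func (ix ObservedIndex) OnStateDatapoint(owner, key, value string) bool {
	item, ok := ix[ownerKey{owner, key}]
	if !ok {
		return false // not a config's observed side; nothing to do
	}
	item.Observed = value
	item.HasObserved = true
	return true
}

func main() {
	ix := ObservedIndex{
		ownerKey{"display-7", "video.input"}: &ConfigItem{Declared: "HDMI1"},
	}
	fmt.Println(ix.OnStateDatapoint("display-7", "video.input", "HDMI2")) // a config-backed signal
	fmt.Println(ix.OnStateDatapoint("display-7", "temperature", "41"))    // an ordinary datapoint
	fmt.Println(ix[ownerKey{"display-7", "video.input"}].Drifted())
}
```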

```d2
direction: right
classes: { node: { style.border-radius: 8 }; key: { style: { border-radius: 8; bold: true } } }
operator: operator { class: node }
declared: "config\ndeclared (spec)" { class: key }
device: device { class: node }
state: state_datapoint { class: node }
observed: "config\nobserved (status)" { class: key }
command: "command (intended)" { class: node }
operator -> declared: declares (cascade)
device -> state: observed (get fn)
state -> observed: observed-value worker
declared -- observed: "disagree = drift" { style.stroke-dash: 4 }
declared -> command: reconcile: enforce (set fn)
command -> device
```

## How this changes provenance

Modeling declared state as **config** (and secrets as **credentials**) keeps **declared** out of the
datapoint provenances. Datapoints carry three ([observed, calculated,
intended](/architecture/datapoints/#provenance-how-we-know-a-value)); declared intent lives in config,
keyed to the same signal but stored down the cascade rather than as a row. The `state` **kind** is
unchanged: an observed `power.state = on` is still a `state_datapoint`, and a config item is keyed to
it. What moved is the *declared* value, out of the datapoint tables and into config resolved by the
cascade. There is no separate property or vault store; config, credentials, and variables are one
resolution model, and the spec-and-status loop gets a real home instead of overloading datapoint
provenance with operator intent.

## Storage

The shape registry, the config / variable cell, and the operator-label tables; the physical layout (the owner arc, the cascade key) lives on [storage](/architecture/storage/).

:::caution[Open question]
Whether config, credentials, and variables are one table with a discriminator or three; they share
the cascade and scope either way.
:::

| Table | Key columns | Notes |
|---|---|---|
| `variable_type` | name, schema (fields + **per-field secret**), refresh, validation, **official** | the **shape** registry (a scalar, or structured like `oauth2` / `ssh_credential` / `snmp_community`); the `official` boolean marks shipped-canonical versus org-local |
| `variable` | (name, **owner arc**), type, **declared_value** (secret fields encrypted), **linked_state** (-> state_datapoint, nullable), **observed_value**, reconcile | the config cell and the `$var:` cascade key; scope is the exclusive arc (template/component/system/location/global). Holds declared intent, optionally mirrors an observed datapoint for drift |
| `tag` | name, applies_to, propagates | the **tenant-wide governed key vocabulary**; minting a key needs `tag:create` ([identity and access](/architecture/identity-access/)). No `_type`, no namespace; values bind via `tag_binding` |
| `tag_binding` | (scope_kind, scope_id, tag), value | the `key: value` binding: **union on key, override on value** down the [cascade](/architecture/cascade/), bindable at any scope and via groups |

---

# Views

URL: /architecture/views/

The read side: a view is a named, parameterized, scope-checked query returning a uniform ViewResult, the backend-for-frontend every read goes through.

Writes go through typed resource CRUD; **everything read goes through a view**. A view is a named query
that returns a uniform **`ViewResult`** (`{columns, rows}`) and executes through the scoped
[Storage Gateway](/architecture/storage/), so the read side is a safe backend-for-frontend that the
console, the [API](/architecture/api/), and an AI agent all hit without ever touching raw tables or
writing SQL. This page is the read contract; the [API](/architecture/api/) is the surface that exposes
it, the [UI](/architecture/ui/) is the renderer that consumes it, and [API first](/contributing/api-first/)
is the doctrine behind both.

## Why a view layer

- A single resource reads through its typed `GET` (the [API](/architecture/api/) standard methods).
  Anything richer, a cross-entity aggregate, a fleet-health grid, the cascade "why did this value win"
  explainer, is a **view**: a named query the platform ships or an operator saves, not a bespoke
  endpoint per page.
- **One shape, one renderer.** Every view returns `ViewResult` (`{columns, rows}`), so one renderer per
  shape serves every view ([UI](/architecture/ui/)); adding a view never adds a bespoke renderer or a
  raw query path.
- **One safety boundary.** A view runs through the **scoped gateway**, so a caller sees only the rows in
  its visible set, exactly as for any read. The read side can be a public BFF precisely because no view
  ever runs unscoped or as raw SQL.

## What a view is

A `view` carries an id, a typed **params schema**, the query it runs, a **default / private** flag, and
the `official` boolean:

- **Default views** ship with the binary (curated, PR-governed, optionally backed by a Postgres view).
  They are the read surface the console's coded pages query: the Alarms page reads the `firing-now`
  view, not `GET /alarms` directly, so the read contract stays uniform and the same view backs a
  dashboard widget unchanged.
- **Private views** are operator-saved **structured** queries (filter + order + fields + params),
  **never raw SQL**. They follow the official / private
  [namespace shadow](/architecture/datapoints/#key-scope-template-org-official) like the registries.
- A view is **parameterized**: it declares typed params bound at run time. The
  [API](/architecture/api/) runs one at `/views/{id}:run?param=`; an undeclared or missing-required
  param is a clean 400.
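
The "clean 400" rule can be sketched as a validation pass over the declared schema before the query runs. `ParamSpec`, `ParamSchema`, the `"int"` type check, and the parameter names are illustrative assumptions; only the undeclared and missing-required cases come from the text above.

```go
package main

import (
	"fmt"
	"net/http"
	"sort"
	"strconv"
)

// ParamSpec declares one typed view parameter.
type ParamSpec struct {
	Type     string // "string" or "int" in this sketch
	Required bool
}

// ParamSchema is a view's declared params, keyed by name.
type ParamSchema map[string]ParamSpec

func sortedKeys[V any](m map[string]V) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// validateParams rejects undeclared, mistyped, and missing-required params.
func validateParams(schema ParamSchema, got map[string]string) error {
	for _, name := range sortedKeys(got) {
		spec, declared := schema[name]
		if !declared {
			return fmt.Errorf("undeclared param %q", name)
		}
		if spec.Type == "int" {
			if _, err := strconv.Atoi(got[name]); err != nil {
				return fmt.Errorf("param %q must be an int", name)
			}
		}
	}
	for _, name := range sortedKeys(schema) {
		if _, present := got[name]; schema[name].Required && !present {
			return fmt.Errorf("missing required param %q", name)
		}
	}
	return nil
}

// statusFor maps a validation outcome onto the HTTP status.
func statusFor(err error) int {
	if err != nil {
		return http.StatusBadRequest
	}
	return http.StatusOK
}

func main() {
	schema := ParamSchema{"system": {Type: "string", Required: true}, "limit": {Type: "int"}}
	for _, params := range []map[string]string{
		{"system": "room-101", "limit": "50"},
		{"limit": "50"},
		{"system": "room-101", "bogus": "1"},
	} {
		err := validateParams(schema, params)
		fmt.Println(statusFor(err), err)
	}
}
```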

## ViewResult: the uniform shape

`ViewResult` is `{columns, rows}`: each column carries a name and type (plus role hints a renderer maps),
and rows are the records. The shape is uniform so the renderer library is decoupled from any specific
view; a **field-mapping** tells a renderer which column is the value, label, time, or series key
([UI](/architecture/ui/)).

- **Cursor-paginated** like any [API](/architecture/api/) list (`page_token`), over the already-scoped
  result.
- **Views by default, materialized only when earned**: most views are live queries; a hot view becomes
  a materialized projection only when a read profile proves the live query too slow (the same discipline
  as [storage](/architecture/storage/)).

## Scope and safety

- Every view runs in the gateway's **scoped mode**: the caller's `visible_set`
  ([identity and access](/architecture/identity-access/)) filters the rows, on every view, with no
  per-view code. A private view an operator saves **cannot widen their scope**: it resolves against
  their visible set at run time, so a saved query is never a privilege escape.
- A view is **read-only** by construction: it never writes and has no side effects, which is what makes
  exposing views broadly (to the API, an MCP tool, a shared dashboard) safe.
- Presentation that depends on config (a severity level's label and color) resolves client-side from the
  config view, not baked into the result.
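
To make the "cannot widen scope" property concrete, here is an in-memory sketch of scoped mode: whatever the saved query selects, rows outside the caller's visible set are dropped. The real gateway presumably applies this in the query itself; the types and the entity-column convention are assumptions.

```go
package main

import "fmt"

// ViewResult is a simplified {columns, rows} shape with string cells.
type ViewResult struct {
	Columns []string
	Rows    [][]string
}

// scoped keeps only rows whose entity column is in the caller's visible set.
func scoped(vr ViewResult, entityColumn string, visible map[string]bool) (ViewResult, error) {
	idx := -1
	for i, c := range vr.Columns {
		if c == entityColumn {
			idx = i
			break
		}
	}
	if idx < 0 {
		return ViewResult{}, fmt.Errorf("no column %q to scope on", entityColumn)
	}
	out := ViewResult{Columns: vr.Columns}
	for _, row := range vr.Rows {
		if idx < len(row) && visible[row[idx]] {
			out.Rows = append(out.Rows, row)
		}
	}
	return out, nil
}

func main() {
	// What the saved query would return if it ran unscoped.
	all := ViewResult{
		Columns: []string{"system", "open_alarms"},
		Rows:    [][]string{{"room-101", "3"}, {"room-202", "0"}, {"boardroom", "1"}},
	}
	// The caller's visible set is resolved at run time, not at save time.
	visible := map[string]bool{"room-101": true, "room-202": true}
	got, err := scoped(all, "system", visible)
	if err != nil {
		panic(err)
	}
	fmt.Println(got.Rows)
}
```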

## How views are consumed

One read contract, three consumers:

- **The console** renders a view through the renderer library ([UI](/architecture/ui/)): coded pages and
  dashboard widgets both bind `view ref + renderer + field-mapping + params`.
- **The API** exposes every view at `/views/{id}:run` ([API](/architecture/api/)); views are part of the
  public contract.
- **An AI agent** reads through view-backed tools on the [MCP surface](/architecture/api/) (the agent's
  search and query tools *are* views), scoped and audited like any caller. The read side is one contract
  whether a human, a script, or an agent asks.

## Live updates

A view read is **query-polling by default** (a refetch interval; slow-changing config uses a long stale
time). A view may **stream** over a server-side [SSE](/architecture/messaging/) relay where latency or
fan-out earns it, the same earn-it-with-a-profile discipline ([UI](/architecture/ui/),
[time](/architecture/time/)).

## Versioning

A default view evolves **additively** within the API version (new columns, new optional params, never a
removal or a meaning change); a breaking change to a shipped view is a new view. A private view is
operator-owned data.

:::caution[Open question]
The structured view-definition grammar for private views (filter + order + fields + params), shared with
the [API](/architecture/api/) list filter language ([expressions](/architecture/expressions/)).
:::

Related: [API](/architecture/api/) (the surface and `/views/{id}:run`), [UI](/architecture/ui/) (the
renderer and the field-mapping), [identity and access](/architecture/identity-access/) (the scope a view
runs in), [storage](/architecture/storage/) (materialize-when-earned), and
[API first](/contributing/api-first/) (the doctrine).

---

# Workers

URL: /architecture/workers/

One worker machinery over several JetStream consumers, plus the backtest capability and the reconcile desired-state loop.

Workers are how Omniglass does the steady background work (deriving datapoints, sending actions, firing timers, reconciling drift) on one machinery instead of a pile of bespoke loops, so the operator gets crash recovery and exactly-once outcomes everywhere for free.

## One machinery, several consumers

There is one worker machinery: a **JetStream work-queue consumer** over a configurable concurrency
pool. Each worker pulls a message, does the work, and acks. Delivery is at-least-once; `Nats-Msg-Id`
dedup plus an idempotent sink are what let it inherit crash recovery, exactly-once outcomes, and
event-time semantics for free. It is instantiated over several consumers rather than separate loops:

- **the admission consumer**: owner-confines raw-ingress datapoints (node and webhook) against the
  publisher's placement, preserving `Nats-Msg-Id`, and republishes to the **trusted** datapoints stream,
  so the rule engine and persistence read only confined points (system mode, [messaging](/architecture/messaging/));
- **the rule engine** (datapoint consumers): consume arriving datapoints from the **trusted**
  JetStream datapoints stream, apply `calc_rule`s and `event_rule`s, publish derived datapoints back
  onto the trusted stream (a trusted producer, no admission pass), and write events and alarm transitions
  to Postgres in one transaction;
- **the action sender** ([alarms and actions](/architecture/alarms-actions/)): consumes
  action work fanned out by CDC, sends at-least-once, advances action step state (PG-first, CDC-out);
- **the persistence consumer**: a batch sink that consumes the **trusted** datapoints stream and writes
  datapoints to the Postgres metric/state/log tables asynchronously, so rules never wait on PG;
- **the clock** ([time](/architecture/time/)): fires schedules and armed timers (a leader-elected
  singleton, below);
- **reconcile**: the desired-state loop (below).
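
Why at-least-once delivery can still produce exactly-once outcomes is easiest to see in a toy model. The sketch below is an in-memory illustration only, with no NATS client involved: `Msg.ID` stands in for the `Nats-Msg-Id` header, and `Sink` is a hypothetical idempotent sink that remembers which IDs it has applied.

```go
package main

import "fmt"

// Msg is a hypothetical stand-in for a JetStream message: ID plays the role
// of the Nats-Msg-Id header, Value is the payload.
type Msg struct {
	ID    string
	Value int
}

// Sink is an idempotent sink: applying the same message ID twice changes
// nothing, so a redelivery after a crash still yields one outcome.
type Sink struct {
	seen  map[string]bool
	total int
}

func NewSink() *Sink { return &Sink{seen: make(map[string]bool)} }

// Apply records the message once and reports whether it did any work.
func (s *Sink) Apply(m Msg) bool {
	if s.seen[m.ID] {
		return false // redelivery: safe to ack again, nothing to do
	}
	s.seen[m.ID] = true
	s.total += m.Value
	return true
}

func (s *Sink) Total() int { return s.total }

func main() {
	sink := NewSink()
	// "b" arrives twice, as it would if a worker died before acking it.
	for _, m := range []Msg{{"a", 1}, {"b", 2}, {"b", 2}, {"c", 3}} {
		sink.Apply(m)
	}
	fmt.Println(sink.Total()) // 6
}
```

In the real machinery the "seen" set would live in the stream's dedup window and in the sink's own keys, not in process memory; the point here is only that the outcome depends on the set of message IDs, not on how many times each was delivered.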

Each consumer is the "produces new work, needs independent durability" exception applied: a
subsystem that consumes the same message is **a stage, not a second loop**. Competing consumers in a
group scale horizontally with no leader: JetStream hands each message to exactly one member, and
adding instances just adds throughput. Alongside the consumers, a **node-liveness sweep** runs on its
own ticker. Unlike a consumer it is a *poll*, not a drain: a down node produces no message, so it is
found by scanning heartbeat freshness, raising and resolving the node-owned `node.down` alarm
idempotently (the one-open index). There is no separate projector either: current state is **views by
default** ([storage](/architecture/storage/)), and `alarm` / `action` hold their state directly.
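
The liveness sweep described above can be sketched as a scan over heartbeat ages. This is a minimal illustration under stated assumptions: `Freshness`, `Sweeper`, and the in-memory `open` map are hypothetical names, and the map stands in for the one-open index that keeps at most one open `node.down` alarm per node.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Freshness maps a node name to the time since its last heartbeat.
type Freshness map[string]time.Duration

// Sweeper is a hypothetical liveness sweep with at most one open alarm per node.
type Sweeper struct {
	StaleAfter time.Duration
	open       map[string]bool // node -> has an open node.down alarm
}

func NewSweeper(staleAfter time.Duration) *Sweeper {
	return &Sweeper{StaleAfter: staleAfter, open: make(map[string]bool)}
}

// Sweep raises an alarm for each newly stale node and resolves it once the
// node is fresh again. Running it twice on the same input changes nothing.
func (s *Sweeper) Sweep(ages Freshness) (raised, resolved []string) {
	for node, age := range ages {
		stale := age > s.StaleAfter
		switch {
		case stale && !s.open[node]:
			s.open[node] = true
			raised = append(raised, node)
		case !stale && s.open[node]:
			delete(s.open, node)
			resolved = append(resolved, node)
		}
	}
	sort.Strings(raised)
	sort.Strings(resolved)
	return raised, resolved
}

func main() {
	s := NewSweeper(30 * time.Second)
	ages := Freshness{"node-a": 5 * time.Second, "node-b": 2 * time.Minute}
	raised, _ := s.Sweep(ages)
	fmt.Println(raised) // [node-b]
	raised, _ = s.Sweep(ages)
	fmt.Println(len(raised)) // 0
	_, resolved := s.Sweep(Freshness{"node-b": time.Second})
	fmt.Println(resolved) // [node-b]
}
```

The second pass raising nothing is the idempotence the text relies on: the sweep can run on any ticker interval without duplicating alarms.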

## Consumer groups versus singletons

Most of the machinery is competing consumers, but two pieces must run as exactly one active instance:
the **CDC publisher** (logical decoding of the WAL, fanning committed events, alarms, actions, and
operator mutations out to JetStream) and the **clock** (firing schedules and armed timers). These are
**leader-elected singletons** via a **NATS KV CAS lock**: each candidate races to compare-and-set a
KV key, the winner holds the lease, and on its death the lease expires and another candidate takes
over. Same pattern for both, no separate election service and no SKIP-LOCKED row claim. A singleton
that produces work still publishes onto the bus, where the competing consumers scale it out.
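
The election pattern can be sketched without a NATS dependency. The code below is an in-memory model, not the NATS KV API: `LeaseKV`, `CompareAndSet`, and `TryLead` are hypothetical names, time is a plain logical counter, and lease expiry is modeled by hand where a real bucket would rely on a key TTL.

```go
package main

import (
	"fmt"
	"sync"
)

// LeaseKV models one KV key holding a lease: a holder, a revision that
// bumps on every write, and a logical expiry time.
type LeaseKV struct {
	mu      sync.Mutex
	holder  string
	rev     uint64
	expires int64
}

// Get returns the live holder ("" if free or expired) and the current revision.
func (kv *LeaseKV) Get(now int64) (holder string, rev uint64) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if now >= kv.expires {
		return "", kv.rev
	}
	return kv.holder, kv.rev
}

// CompareAndSet writes the holder only if nobody wrote since expectRev was read.
func (kv *LeaseKV) CompareAndSet(holder string, expectRev uint64, now, ttl int64) bool {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if kv.rev != expectRev {
		return false // lost the race to another candidate
	}
	kv.holder, kv.rev, kv.expires = holder, kv.rev+1, now+ttl
	return true
}

// TryLead is one election round: claim a free lease or renew our own.
func TryLead(kv *LeaseKV, candidate string, now, ttl int64) bool {
	holder, rev := kv.Get(now)
	if holder != "" && holder != candidate {
		return false
	}
	return kv.CompareAndSet(candidate, rev, now, ttl)
}

func main() {
	kv := &LeaseKV{}
	fmt.Println(TryLead(kv, "clock-a", 0, 10))  // true: first claim wins
	fmt.Println(TryLead(kv, "clock-b", 5, 10))  // false: lease still held
	fmt.Println(TryLead(kv, "clock-b", 20, 10)) // true: lease expired, takeover
}
```

The revision check is what makes the race safe: two candidates that both see a free lease read the same revision, and only the first write at that revision succeeds.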

## Re-entry, not one mega-pass

The pipeline `datapoint -> alarm -> action` is **not one transaction**. A datapoint arrives on the
datapoints stream; `event_rule`s evaluate it (the stateless then stateful stages below); two edges
re-enter: **calc** (a `calc_rule` produces *new* datapoints) re-enters by publishing the derived
datapoints back onto the data lane, where the consumers pick them up again, and **actions** are born
when an `event_rule` writes the event and alarm to PG in one transaction, after which CDC fans the
committed change out to the action sender. So the rule engine never recurses unboundedly in one
transaction; a cross-producing stage hands off to the bus, which is also what makes it independently
durable. Calc re-entry **terminates by write-on-change** (a recompute that lands the same value
publishes nothing, the fixpoint) with a depth cap as a cyclic-rule backstop, carrying a rollup
(component -> system -> location health) one hop per pass. Parsing into datapoints is **not** a
worker stage; it happens at the edge ([collection](/architecture/collection/)).
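
How calc re-entry terminates can be shown with a small queue simulation. This is a sketch under stated assumptions: `Point`, `CalcRule`, and `Run` are hypothetical, the queue stands in for the data lane, and the `state` map stands in for the last stored value per key. A derived value is published only when it differs from the stored one, and `maxDepth` is the backstop for a cyclic rule set.

```go
package main

import "fmt"

// Point is a hypothetical datapoint on the data lane.
type Point struct {
	Key   string
	Value int
	Depth int // how many calc hops produced this point
}

// CalcRule derives one output key from one input key.
type CalcRule struct {
	In, Out string
	Fn      func(int) int
}

// Run feeds seed through the rules, re-entering each derived point, and
// returns how many derived points were published.
func Run(seed Point, rules []CalcRule, state map[string]int, maxDepth int) int {
	state[seed.Key] = seed.Value
	queue := []Point{seed}
	published := 0
	for len(queue) > 0 {
		p := queue[0]
		queue = queue[1:]
		if p.Depth >= maxDepth {
			continue // depth cap: stop a cyclic rule set
		}
		for _, r := range rules {
			if r.In != p.Key {
				continue
			}
			v := r.Fn(p.Value)
			if old, ok := state[r.Out]; ok && old == v {
				continue // write-on-change: same value, publish nothing
			}
			state[r.Out] = v
			queue = append(queue, Point{Key: r.Out, Value: v, Depth: p.Depth + 1})
			published++
		}
	}
	return published
}

func main() {
	same := func(v int) int { return v }
	rules := []CalcRule{
		{"component.health", "system.health", same},
		{"system.health", "location.health", same},
	}
	state := map[string]int{}
	fmt.Println(Run(Point{Key: "component.health", Value: 1}, rules, state, 8)) // 2
	fmt.Println(Run(Point{Key: "component.health", Value: 1}, rules, state, 8)) // 0
}
```

The first run carries the rollup one hop per pass (component to system, then system to location); the second run reaches the fixpoint immediately because nothing changed.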

## The stateless / stateful fork

This is the axis that decides almost everything else about a subsystem.

- **Stateless** (owner resolution, calc): output is a pure function of (input, rules, snapshot).
  Order-free, safe to backtest for free, no cross-event state. Write pattern: **append** (a batched
  multi-row INSERT).
- **Stateful** (the alarm lifecycle): maintains persisted state across events (the open alarm), so
  open and resolve depend on prior state. Consequences:
  - **Order-sensitive.** JetStream does not promise strict ordering (the server is ts-authoritative)
    and competing consumers can hand same-key messages to different members, so a stateful subsystem
    must either be idempotent and tolerate reorder (an as-of conflict rule) or serialize per state
    key. The alarm transition is serialized per `(event_rule, owner)`: that ordered write lands in
    the same PG transaction as the event record.
  - Write pattern: **guarded conditional upsert** (`INSERT ... ON CONFLICT` / `UPDATE ... WHERE`),
    with a **partial unique index** as the concurrency-correctness backstop.
  - **Backtest is harder**: it must process each entity's series in order.
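
The stateful side can be illustrated with an in-memory analogue of the guarded upsert. This sketch shows the "idempotent and tolerate reorder" option, using an as-of guard; it is not the schema or the SQL. `AlarmKey`, `Alarms`, and `Apply` are hypothetical names, and the mutex stands in for the serialization the database would provide per `(event_rule, owner)`.

```go
package main

import (
	"fmt"
	"sync"
)

// AlarmKey is the state key from the text: one alarm per (event_rule, owner).
type AlarmKey struct{ Rule, Owner string }

type alarmState struct {
	open bool
	asOf int64 // event time of the last applied observation
}

// Alarms holds at most one state per key, like a unique index would enforce.
type Alarms struct {
	mu    sync.Mutex
	byKey map[AlarmKey]alarmState
}

func NewAlarms() *Alarms { return &Alarms{byKey: make(map[AlarmKey]alarmState)} }

// Apply records whether the rule is firing as of ts and reports whether the
// alarm opened or resolved. Observations at or before the stored as-of time
// are dropped, so duplicates and late arrivals cannot flip the state.
func (a *Alarms) Apply(k AlarmKey, firing bool, ts int64) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	cur, seen := a.byKey[k]
	if seen && ts <= cur.asOf {
		return false
	}
	a.byKey[k] = alarmState{open: firing, asOf: ts}
	return cur.open != firing
}

// IsOpen reports whether the key currently has an open alarm.
func (a *Alarms) IsOpen(k AlarmKey) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.byKey[k].open
}

func main() {
	alarms := NewAlarms()
	k := AlarmKey{Rule: "temp-high", Owner: "amp-1"}
	fmt.Println(alarms.Apply(k, true, 10))  // true: opened
	fmt.Println(alarms.Apply(k, true, 10))  // false: duplicate delivery
	fmt.Println(alarms.Apply(k, false, 5))  // false: late clear is ignored
	fmt.Println(alarms.Apply(k, false, 20)) // true: resolved
}
```

Contrast this with the stateless stages: there the output depends only on the input, so no guard and no per-key ordering are needed.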

## Lineage the engine stamps

Every derived datapoint carries its lineage **on the row** (a `provenance`, `source_rule` plus
version, and the one provenance pointer; see [storage](/architecture/storage/),
[datapoints](/architecture/datapoints/)). There is no separate execution table: a derived row is itself
the evidence of its rule's run, and a fan-out (one execution to N datapoints) stamps the same
`source_rule` on each. The rule version is the hinge for backtest.

## Backtest: re-run a changed rule over retained datapoints

The model is **not event-sourced**: current state lives in the datapoint tables and the `alarm` /
`action` rows directly, never reconstructed from a log. Omniglass does **not** re-run history to rebuild
events or state. But a changed `calc_rule` or `event_rule` can be **backtested**: a read-only
what-if that re-runs the new rule version over the **retained datapoints** and diffs its output
against what the old version produced, purely as DX sugar, without writing a new event or touching
live state. Only the **calculated** and **event-derived** slices are server-rule-derived, so only
they re-derive. Everything else does not:

- **observed** datapoints are parsed at the edge and are not re-derived server-side (the raw payload
  is not stored, so there is no server-side re-parse);
- **operator alarm transitions** (ack, snooze) come from `audit_log`;
- **action delivery status** comes from the action rows (the real-world send is not re-done);
- **no-data staleness** re-derives from the datapoint gaps ([time](/architecture/time/)).

Two modes, switched by the `source_rule` version: **historical** uses the original rule versions
recorded on each derived row (showing what the system actually computed, for audit), and
**prospective** uses the current rule versions (re-deriving as if today's rules had always applied,
for testing a rule change). **A backtest writes to a shadow, never live**: promoting a result to live is
a separate, explicit, audited step. A prospective backtest is **windowed by default** (over the last 30
days), with whole-history the explicit, heavier option.
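
A windowed, read-only what-if can be sketched for the simplest case, a stateless rule. The types here (`Datapoint`, `EventRule`, `Change`, `Backtest`) are hypothetical, and the sketch simplifies in one respect: it recomputes the old rule's output, where the text diffs against what the old version actually recorded on each derived row. A stateful rule would also need each entity's series processed in order.

```go
package main

import "fmt"

// Datapoint is a hypothetical retained observation.
type Datapoint struct {
	Entity string
	TS     int64 // event time, in seconds
	Value  float64
}

// EventRule stands in for one version of an event_rule: does this point fire?
type EventRule func(Datapoint) bool

// Change is one point where the two rule versions disagree.
type Change struct {
	Point    Datapoint
	Old, New bool
}

// Backtest re-runs newRule over retained points at or after since and
// returns where it disagrees with oldRule. It reads only; nothing is written.
func Backtest(retained []Datapoint, oldRule, newRule EventRule, since int64) []Change {
	var changes []Change
	for _, p := range retained {
		if p.TS < since {
			continue // outside the backtest window
		}
		o, n := oldRule(p), newRule(p)
		if o != n {
			changes = append(changes, Change{Point: p, Old: o, New: n})
		}
	}
	return changes
}

func main() {
	retained := []Datapoint{{"amp-1", 100, 70}, {"amp-1", 200, 85}, {"amp-1", 300, 95}}
	over90 := func(p Datapoint) bool { return p.Value > 90 }
	over80 := func(p Datapoint) bool { return p.Value > 80 }
	for _, c := range Backtest(retained, over90, over80, 0) {
		fmt.Printf("t=%d value=%v old=%v new=%v\n", c.Point.TS, c.Point.Value, c.Old, c.New)
	}
}
```

Lowering the threshold from 90 to 80 changes the outcome only for the point at `t=200`, which is the kind of diff an operator would review before promoting the rule.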

## Reconcile: the desired-state control loop

Reconcile is another JetStream consumer: it projects **declared desired state** onto the things that
drift, the system-level form of [config](/architecture/variables/)'s `reconcile: enforce`
policy.

- **Inputs**: the desired declarations (templates, component assignments, config
  declared values) plus the observed state. Config changes are operator mutations born in a PG
  transaction; CDC publishes the committed change to JetStream
  ([audit](/architecture/audit/)), so reconcile is a CDC consumer plus the current
  projections.
- **Output**: it asserts the delta as **node config** (which tasks and commands each node runs,
  derived from placements) and as **reconciled `run` actions** (the desired-state commands that must
  stay asserted, for example a codec's feedback registration).
- **Idempotent**: assert-equals-observed is a no-op; it acts only on drift. Each run writes an
  `internal_log` entry, and it reuses the same worker machinery rather than a bespoke loop.
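
The "act only on drift" property comes down to a diff between two maps. The sketch below is illustrative: `Drift` is a hypothetical helper, and desired and observed state are assumed to flatten to string key/value pairs, which the real declarations (templates, assignments, declared config values) may not.

```go
package main

import (
	"fmt"
	"sort"
)

// Drift returns the keys whose observed value is missing or differs from the
// desired one, in sorted order. Equal state yields an empty plan, so a
// reconcile pass with nothing to fix does nothing.
func Drift(desired, observed map[string]string) []string {
	var drifted []string
	for key, want := range desired {
		if got, ok := observed[key]; !ok || got != want {
			drifted = append(drifted, key)
		}
	}
	sort.Strings(drifted)
	return drifted
}

func main() {
	desired := map[string]string{"codec-1/feedback": "registered", "node-a/tasks": "poll,ping"}
	observed := map[string]string{"codec-1/feedback": "unregistered", "node-a/tasks": "poll,ping"}
	fmt.Println(Drift(desired, observed)) // [codec-1/feedback]
}
```

Only the codec's feedback registration has drifted, so only that command would be re-asserted.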

Open: the reconcile cadence (continuous versus on-audit-change versus a periodic full sweep) and
backoff on a flapping target.

---

# API first

URL: /contributing/api-first/

The Go API is the single integration contract; the SPA, CLI, and YAML tooling are generated clients of it.

The Go HTTP API is the **single integration contract**. The SPA, the CLI, the node
worklist, and the YAML authoring tooling are all **generated clients** of it. Nothing but
the API talks to the database, and the API is described by one machine-readable spec that
cannot drift from the implementation.

## The source of truth is the Go API

Request/response types are Go structs (Huma). The OpenAPI 3.1 document is *generated* from
them, server-less, and committed. Everything downstream is generated from that document.
This is the rule: **you change a Go route or shape, you regenerate, you commit the derived
artifacts.** A drift check in CI fails the PR if the committed artifacts are stale.

## The generation pipeline

| Generator | Input | Output | Consumer |
|---|---|---|---|
| `cmd/openapigen` | Huma Go structs | `api/openapi.json` (+ `.yaml`) | everything below |
| `web pnpm gen:api` | `openapi.json` | `web/src/api/schema.gen.ts` | typed `openapi-fetch` SPA client |
| `cmd/cligen` | `openapi.json` | `internal/cli/api_gen.go` (cobra) | the CLI, patched via `api_hooks.go` |
| `cmd/mcpgen` | `openapi.json` | the MCP server (a curated tool catalog) | AI agents over the [API contract](/architecture/api/) |
| `cmd/schemagen` | authoring structs | `schema/*.schema.json` | YAML editor validation (VSCode) |
| `gen-proto` | `proto/og/v1/*.proto` | committed `*.pb.go` | the gRPC ingest path |

One command runs them all (`make gen`); each has a focused target (`make gen-api`,
`gen-cli`, `gen-schema`, `gen-proto`). The committed `*.pb.go` and JSONSchema let a
contributor build without protoc or a running server.

## Conventions (AIP-style)

These are the conventions a route follows while you write it; the complete [API
contract](/architecture/api/) (the error envelope, idempotency, long-running operations,
versioning, and the authorization status mapping) is the architecture of record.

Every operation lives under `/api/v1/*`. The path shape is derivable, not special-cased:

- **Plural collections**, standard CRUD by primary key: `POST` creates (409 on PK
  collision), `GET` reads, `PATCH` updates by PK (AIP-134, partial), `DELETE` removes.
  No upsert/register shortcuts.
- **`:verb` (not `/verb`) for non-CRUD custom methods**: `/alarms/{id}:ack`,
  `/nodes/{name}:heartbeat`, `/rules/calc:validate`, `/components/{name}:apply`,
  `/views/{id}:run`.
- **Singular kind sub-segments**: `/rules/calc`, `/datapoints/metric`,
  `/types/component`, `/types/event`.
- **official / private namespace** on every registry and rule family (below).
- **List conventions** (AIP-132 target): `filter` / `orderBy` / `pageSize` + `pageToken`
  (cursor, never offset) / `fields`. The `filter` runs through the one pluggable
  expression engine ([Expr by default](/architecture/expressions/)), the same language
  across rule scopes, dynamic groups, and list filters.
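
The `:verb` convention is mechanical enough to show in a few lines. This is a sketch of the path shape only, not how Huma or the Go router matches routes; `SplitCustomMethod` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// SplitCustomMethod separates an AIP-style ":verb" suffix from the resource
// path. A colon only counts when it sits in the last path segment.
func SplitCustomMethod(path string) (resource, verb string) {
	slash := strings.LastIndex(path, "/")
	colon := strings.LastIndex(path, ":")
	if colon > slash && colon < len(path)-1 {
		return path[:colon], path[colon+1:]
	}
	return path, ""
}

func main() {
	for _, p := range []string{"/alarms/42:ack", "/rules/calc:validate", "/alarms/42"} {
		resource, verb := SplitCustomMethod(p)
		fmt.Printf("%s -> resource=%s verb=%q\n", p, resource, verb)
	}
}
```

Because the verb hangs off the resource with a colon, `/alarms/42:ack` can never collide with a sub-resource named `ack`, which is the reason to prefer it over `/verb`.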

The API is **self-describing**: the running server serves `GET /api/v1/openapi.json`,
`/openapi.yaml`, and a human reference page.

## The read side is views (backend-for-frontend)

Writes go through resource CRUD (each emitting an `audit_log` row in the same transaction).
**Reads beyond a single resource go through views**, and views are part of the public API:

- a **view** is a named query backing a page or widget, returning a uniform `ViewResult`
  (`{columns, rows}`) so one renderer contract serves every view;
- **default views** ship with the binary (curated, may be Postgres-view-backed,
  PR-governed); **private views** are operator-saved *structured* queries (filter + order +
  fields), never raw SQL;
- `GET /views/{id}:run?param=` binds declared params; undeclared or missing-required
  params are a clean 400;
- views execute through the **scoped Storage Gateway**, so IAM scope applies to a view's
  results exactly as to any read. This is the safety boundary that lets the read side be a
  public BFF without handing operators raw SQL.
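
The param-binding rule (declared params only, required ones present, anything else a 400) can be sketched as one validation function. The names are hypothetical: `ParamSpec`, `BindParams`, and `ErrBadParams` are not from the codebase, and the query is simplified to one string value per param.

```go
package main

import (
	"errors"
	"fmt"
)

// ParamSpec is a hypothetical declaration of one view parameter.
type ParamSpec struct {
	Name     string
	Required bool
}

// ErrBadParams is the error a handler would map to HTTP 400.
var ErrBadParams = errors.New("bad view params")

// BindParams accepts only declared params and insists on required ones.
func BindParams(declared []ParamSpec, query map[string]string) (map[string]string, error) {
	known := make(map[string]bool, len(declared))
	for _, d := range declared {
		known[d.Name] = true
	}
	for name := range query {
		if !known[name] {
			return nil, fmt.Errorf("%w: undeclared param %q", ErrBadParams, name)
		}
	}
	bound := make(map[string]string, len(query))
	for _, d := range declared {
		v, ok := query[d.Name]
		if !ok {
			if d.Required {
				return nil, fmt.Errorf("%w: missing required param %q", ErrBadParams, d.Name)
			}
			continue // optional and absent: leave unbound
		}
		bound[d.Name] = v
	}
	return bound, nil
}

func main() {
	declared := []ParamSpec{{Name: "location", Required: true}, {Name: "severity"}}
	// "severty" is a typo for a declared param, so the request is rejected.
	_, err := BindParams(declared, map[string]string{"severty": "major", "location": "hq"})
	fmt.Println(err)
}
```

Rejecting the undeclared param, instead of silently ignoring it, is what turns a typo into a clean error and not an unfiltered result.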

## The per-route gate

Every typed route carries a per-route coverage test (an `openapi_coverage_test.go`-style
gate) and the CLI-covers-every-route test, so the generated clients never fall behind the
API. After any route change: `make gen-api && make gen-cli`, add the per-route test, keep
the coverage tests green.

---

# UI and the design system

URL: /contributing/design-system/

The SolidJS and daisyUI console, a generated typed client over the ViewResult renderer contract.

The operator console is a **SolidJS** SPA styled with **daisyUI 5** on **Tailwind CSS 4**. It
is a generated client of the API (typed via `openapi-fetch` off the committed `openapi.json`)
and a renderer over the views BFF. The same surfaces are also the **learning surfaces** (see
[the learning-tool restriction](/contributing/learning-tool/)).

:::note[What shipped]
Styling is **daisyUI 5 component classes + Tailwind utilities**, with two brand themes defined
through the daisyUI plugin (`omniglass-dark` default, `omniglass-light`) from the "Omniglass
Design System" tokens. Bespoke CSS is kept to what daisyUI has no slot for: the domain
severity/health colors, the type-system (`mixed`/`mono`) and density (`comfortable`/`compact`)
levers via `html` data-attributes, and the live pulse. Accessible **interactive** widgets
(dialog, combobox, select, popover, tabs, tooltip, toast) are built with **Kobalte**
(Solid-native, styled by daisyUI), pulled in primitive-first when the first one is needed; the
Tweaks panel is the first candidate to move there.
:::

## The stack

| Concern | Choice |
|---|---|
| Framework | SolidJS (`solid-js`, `@solidjs/router`) |
| Components / theme | daisyUI 5 on Tailwind CSS v4 (the `omniglass-dark` / `omniglass-light` themes) |
| Interactive primitives | Kobalte (selective: dialog, combobox, select, popover, tabs, toast), styled by daisyUI |
| Data fetching | `@tanstack/solid-query` over a typed `openapi-fetch` client |
| Tables | `@tanstack/solid-table` (group-by, sub-rows) |
| Flow / graph viz | `solid-flow` (collection functions, pipelines, DAGs) |
| Dashboards | `gridstack` (12-column widget grid) |
| Build / test | Vite, Vitest, `@solidjs/testing-library` |

The typed client is generated, never hand-written: `openapi-typescript` turns
`openapi.json` into `schema.gen.ts`, so a route or shape change surfaces as a TypeScript
error in the SPA.

## Core UI contracts

- **One renderer per view.** Every view returns `ViewResult` (`{columns, rows}`); the SPA
  renders any view through one contract, so adding a view does not add a bespoke renderer.
- **`useCan(...)` from `/auth/me`.** The console reads the principal's flat,
  wildcard-expanded `permissions` once and gates UI affordances with O(1) checks; `grants`
  drive scope chips and "why is this hidden" explanations.
- **The dense ops layout / `DensePage` primitive.** List pages follow one shape: summary
  (donut facets over the full set) then filter (keyboard chip `FilterBar`) then a group-by
  table then a click-row detail `Drawer` plus a full detail page. Facets drive the filter;
  the summary stays whole so click-to-filter is stable. The extracted primitives
  (`DensePage`, `FilterBar`, `Donut`, `SummaryFacet`, `Drawer`, `HealthBadge`,
  `Actor`, `Sparkline`) are the reuse target.
- **Learning surfaces ride the real engine.** A concept page (a collection function, an
  edge parse step, a calc rollup, an alarm lifecycle) renders the actual pipeline against real or
  lab-simulated data, not a static diagram. `solid-flow` is the workhorse for rendering the
  DAGs the engine actually runs.

## Build and embed

The SPA builds with Vite and is embedded into the Go binary (served under `/web`); the
docs/learning site is embedded and served under `/docs`. One artifact serves the API, the
console, and the docs. Component-level tests (Vitest) run in CI; user-observable behavior
gets an e2e (browser-driven) test per the test-first doctrine.

## How this relates to the UI architecture

This page is the **build and dev guide** for the console: the stack, the generated typed client, the
reusable primitives, and the build-and-embed pipeline. The **architecture** the console implements,
the `ViewResult` renderer contract, the views BFF (read side), one renderer per view, the dense-ops
layout as a pattern, the information architecture, and the live-update model, is
[UI](/architecture/ui/) on the architecture spine. Build mechanics live here; the model lives there.

---

# Docs with everything

URL: /contributing/docs-with-everything/

A feature is not done until the docs that teach it ship in the same PR.

Omniglass ships its documentation *as part of the product*. The docs are not an
afterthought in a separate wiki; they are Astro Starlight content under `docs/`, compiled
to a static site and published at docs.omniglass.hyperscaleav.com (and, in time, embedded
into the Go binary to serve at `/docs`). The architecture is
published ahead of the code, so the design is visible (and reviewable) before, or
alongside, the feature that implements it.

## The rule

**A feature is not done until the docs that teach it ship in the same PR.**

Concretely, a user-facing PR must do one of:

- change `docs/` to add or update the page(s) that explain the new behavior, or
- carry the `no-docs` label with a one-line justification (pure refactor, internal-only
  change, etc.).

CI enforces the docs-touched gate. The justification path exists so the gate never blocks
a genuine internal change, not as a routine escape hatch.

## What "the docs" means here

- **Architecture pages** (`/architecture/`) hold the model: the spine plus leaf
  documents, and the current decisions. Each official term is defined once in the
  [glossary](/architecture/glossary/) and not redefined in the leaves.
- **Concept and learning pages** teach a concept interactively (see
  [the learning-tool restriction](/contributing/learning-tool/)). When a feature introduces a concept
  an operator must understand, the teaching surface ships with it.
- **Contributor pages** (`/contributing/`) are this doctrine set.

## Style

- No em dashes. Use commas, colons, periods, or parentheses.
- No AI/assistant attribution.
- Write for someone learning the system, not someone who already built it. The same page
  serves the operator using the product and the contributor extending it.

## Publishing

Docs build in CI on every PR (so a broken docs build fails the PR) and are embedded into
the binary at release. The published site is docs.omniglass.hyperscaleav.com.

---

# The learning-tool restriction

URL: /contributing/learning-tool/

Every operator surface should also teach the concept it operates on, against real or simulated data.

Omniglass is two things at once, by design: **a functional tool and a learning tool.**
This is a standing design restriction, not a nice-to-have. It shapes what we build and how
we judge it done.

## The restriction

**Every operator surface should also teach the concept it operates on.** Where it makes
sense, a page is not just a control panel over data; it is an interactive explanation of
the concept and the data flow behind it, driven by real or simulated data.

A feature that introduces a concept (a collection function, an edge parse step, a calc rollup, an
alarm lifecycle) should ship a surface where a learner can *see the concept happen*:

- the function or pipeline rendered, not just described,
- real or simulated data moving through it,
- the ability to poke it and watch the result change.

## What it teaches, and what it does not

The audience is **AV and IT systems integrators and operators**, and the subject is **monitoring**: what
it is, how to do it well, and how Omniglass models and monitors an estate, so an operator understands the
data they get and the judgment behind it. It teaches the **AV Observability discipline** made concrete,
the Align / Measure / Instrument / Practice layers as explorable artifacts rather than a PDF.

It is **not** a software-engineering tutorial. It does not teach how to write software, how to architect
a platform, or how Omniglass is built internally. The learner is operating an estate and learning
**monitoring**, not reading source. "Teach the concept it operates on" means the *monitoring* concept (an
edge parse, a calc rollup, an alarm lifecycle, a health rollup), never the code that implements it.

## Why

The product is also the teaching artifact for the AV Observability discipline it
implements. The Measure and Instrument layers should be concrete, explorable artifacts
rather than blueprints in a PDF. A user who operates Omniglass should come away
understanding *how* it models their estate, because the tool taught them while they used
it.

## Real or simulated data

Teaching surfaces must work without a live fleet. A simulated/lab data source (the
emulated estate) backs the interactive pages so a learner, or a CI run, gets the same
explorable behavior as a live deployment. "Works against the lab emulator" is part of
done for a learning surface.

## How it interacts with the other doctrines

- **Docs with everything:** the teaching surface is part of the docs that ship with a
  feature. A concept-introducing PR that has no learning surface should say why in its
  docs note.
- **Test first:** the interactive surface is user-facing behavior, so it carries e2e
  coverage like any other surface.
- **Functional and pedagogical:** the learning surface rides on the *real*
  implementation and real (or lab-simulated) data. It is not a mock diagram detached
  from the engine; it is the engine, made legible.

---

# Primitive first

URL: /contributing/primitive-first/

Build the reusable primitive, then consume it. The shared engines (expression, ViewResult, gateway, cascade, timer) the rest of the system is written against.

Build the **reusable primitive first, then consume it.** A behavior that more than one feature
needs is a primitive: define it once, test it once, and write every consumer against it. Do not
inline a one-off where a primitive belongs, and do not grow a second variant of something that
already exists.

## Why

- **One tested thing, not N.** A primitive carries its own full test set, so every consumer
  inherits correctness. N inlined copies are N places to drift and N places a bug can hide.
- **Consistency by construction.** When one engine backs every site, a contributor (and an
  operator) learns it once: the `filter` you write for a dynamic group is the `filter` you write
  for a list.
- **The learning tool renders the real engine.** Operator surfaces teach a concept by running the
  primitive against real data ([learning tool](/contributing/learning-tool/)), so a primitive is
  the teaching artifact, not a diagram.

## The primitives the system is written against

| Primitive | One model for | Doc |
|---|---|---|
| **Expression engine** | list `filter`, rule `scope`, dynamic-group membership, `fire_criteria`, calc `reduce` | [expressions](/architecture/expressions/) |
| **`ViewResult` renderer contract** | every read beyond one resource (`{columns, rows}`, one renderer) | [views](/architecture/views/), [UI](/architecture/ui/) |
| **Storage Gateway** | the only DB door: scope and in-transaction audit by construction | [storage](/architecture/storage/), [identity and access](/architecture/identity-access/) |
| **Cascade** | resolving config, credentials, and variables down one tree | [cascade](/architecture/cascade/) |
| **Admission consumer and the two lanes** | the one owner fence and the one data / record split on the bus | [messaging](/architecture/messaging/) |
| **Timer and clock** | schedule, watchdog, for-duration, and runbook-wait, all one durable model | [time](/architecture/time/) |
| **The `action` row** | every long-running operation's handle, rule-fired or API-called | [API](/architecture/api/), [alarms and actions](/architecture/alarms-actions/) |
| **`datapoint_type` registry** | one registry across metric, state, and log | [datapoints](/architecture/datapoints/) |

## How to apply

- **Reach for the primitive before writing the one-off.** Need a filter, a read, a scheduled fire,
  a scoped query? It already exists, consume it. If you are about to hand-roll one, stop.
- **Extract on the second use, not the third.** The moment a pattern is copied it is a primitive
  that has not been named yet. Pull it out, give it tests, point both callers at it.
- **A primitive lands with its tests and its first consumer in the same slice** (vertical, not
  horizontal): build the primitive, prove it with one real consumer, ship both. The
  `/add-collection-primitive` and `/canonical-datapoint` skills are this doctrine made procedural.
- **Do not fork an engine.** A second filter language, a second DB path, a second timer model is
  the anti-pattern this doctrine exists to prevent.

This composes with the others: the [API](/contributing/api-first/) is generated from the
primitives, [test-driven](/contributing/test-driven/) tests each primitive once, and the
[learning tool](/contributing/learning-tool/) renders the real one.

---

# Slice workflow

URL: /contributing/slice-workflow/

How a feature ships: one vertical slice per PR, through a fixed lifecycle of define, build test-first, document, validate, review, and a ship-review the architect approves.

A feature is **one vertical slice**: a thin cut through the whole stack (schema to API to docs)
that delivers a user-observable outcome, not a horizontal layer. Each slice is one PR, built
through a fixed lifecycle so quality is a process, not a hope.

## The lifecycle

| Stage | Practice | Gate |
|---|---|---|
| **Define** | a [feature issue](https://github.com/hyperscaleav/omniglass/issues/new/choose) under an epic: the outcome, the thin cut, the deferrals, the test plan, and the permission and scope it touches | **hard gate**: issue filed and scope approved before any branch |
| **Design** | read the [architecture spine](/architecture/) (the docs are the spec); locate the seam; name the thin cut | the cut is explicit |
| **Branch** | a git worktree off `origin/main` under `.claude/worktrees/`, never a commit on `main` | only after Define is approved |
| **Build** | [test-first](/contributing/test-driven/): the failing test, then the feature, committing each increment. A slice cuts every entry point it touches, **API + CLI + UI**: the CLI command is generated from the OpenAPI (`make gen`); the UI view is built where the entity is live, or rendered as an honest stub where its backend does not exist yet | RED then GREEN; all three surfaces present (stub allowed) |
| **Document** | the teaching [docs ship with it](/contributing/docs-with-everything/), plus a build-progress note on the status page | docs in the same PR |
| **Validate** | `make test` green (run fresh), `make gen` clean, no drift | green, fresh |
| **Review** | a reviewer pass over the diff, findings addressed; a security lens when it touches authz, secrets, the edge, or an invariant | findings cleared |
| **Ship** | the ship-review (below), then squash-merge | architect approves |
| **Log** | record what shipped, the decisions, and the follow-ups | logged |

The first six stages are the [five doctrines](/) in motion; the last three are how the work
becomes externally visible and approvable.

## What "validated" means

Each gate is a check, not a vibe:

- **The ticket is the contract, and a hard gate.** The issue states the outcome, the thin cut,
  the deferred items (each its own issue), and the authorization surface (the permission checked
  and the scope injected). **No worktree or branch is created until the issue exists and the
  architect has approved its scope,** so the boundary is agreed before any code, not discovered
  at review.
- **Tests are tiered and fresh.** Unit (pure, fast), integration (real Postgres via
  testcontainers, no mocking the database), and end-to-end (drive the entry point as the user).
  `make test` is the gate, run without a cache: a cached pass or a `-short` run hides the
  database-backed behavior, and a green claim is not evidence until the tier actually executed.
- **Docs ship with the feature.** The page that teaches the concept lands in the same PR, the
  architecture-of-record stays consistent, and any divergence is stated, never silent.
- **The API cannot drift.** `make gen` regenerates the OpenAPI and the clients (the cobra CLI and the
  typed SPA client) from the Go; a non-empty diff fails the slice until committed.
- **Every entry point is covered.** A slice that adds or changes an operation surfaces it in all three
  entry points: the API route, the generated CLI command, and a UI view (live where the entity exists,
  an honest stub where its backend does not yet). Each is exercised as the user would drive it.
- **Review verifies behavior to the outcome,** not just the call site.

## The thin-cut discipline

A slice ships the smallest honest increment. A **thin cut** is a deliberate simplification (the
first auth slice did bearer tokens only, and resolved the owner scope to all); a **deferral** is
work moved to a later slice. Both are explicit: a thin cut is documented in the slice, a deferral
is a filed issue. The opposite, a silent gap, is the failure this discipline prevents.

## The ship-review (the approval artifact)

At PR-ready, the slice is presented as one **ship-review**, front-loaded so the architect approves
in seconds or redirects. The `/ship-slice` skill runs the pre-ship checklist and emits it:

```
SHIP REVIEW - <type>: <slice>   (PR #N, closes #M)

Outcome:   <one user-observable line>
Verdict:   ready | ready-pending-your-call

Scope:     in / thin cut / deferred (#issues)
Proof:     make test green (fresh, N packages); the load-bearing behaviors; tiers; make gen clean
Docs:      what shipped; arch-of-record consistent or a divergence note
Review:    findings and how addressed; security note if relevant

Decisions I made (your veto window): the judgment calls that bound the design
Decisions I need from you:           open forks, or none

Diff / Risk: size, PR link; outward-facing? invariant-changing? reversible?
```

Approval means squash-merge (the conventional-commit PR title drives the release). A redirect
adjusts the slice. The two lines that matter most are **Decisions I need from you** and **Risk**.

## Lessons held

- **Commit per increment.** A slice is built as a sequence of green commits, not one batch at the
  end. Work that is not committed is work that can be lost.
- **Verify fresh.** Re-run the database-backed tests before claiming green; do not trust a cache
  or a delegated agent's report.
- **Approve at the boundary.** Scope is agreed at the ticket and again at the ship-review, so a
  surprise never lands in `main`.

---

# Test-driven, always

URL: /contributing/test-driven/

Build the failing test before the feature; each change carries the tier that proves it.

The loop, in order, for every behavior change:

1. **Define the behavior.** State what the feature does and how it is observed, as an
   assertion, not a vibe.
2. **Write the failing test.** It must fail for the right reason before any machinery
   exists. A bug fix starts with a test that reproduces the bug.
3. **Build the minimal machinery** to make the test pass. Nothing more.
4. **Refactor** with the test green.

A change that adds or alters behavior is incomplete without a test that failed before it
and passes after. Each change carries the right tier(s): **unit** for logic,
**integration** (real Postgres) for anything touching storage, **e2e** (API, CLI, UI) for
user-facing behavior. The regression test written for a bug fix stays in the suite.
`make test` is the gate: green before commit and before merge. Validate locally; do not
lean on CI to find what a local run would have caught.

## The spike carve-out

A spike to learn whether something is *possible* may precede tests, but it must be labeled
a spike and either deleted or stabilized with tests before it merges. "Spike" is not a
standing excuse to skip the failing test.

## The capability-primitive carve-out

When a unit wraps an environment-risky capability (raw sockets, ICMP, privileged
syscalls, an external protocol), a fake-based unit test is necessary but not sufficient.

- Commits may be incremental: a fake-green seam is a legitimate checkpoint commit.
- The real-implementation integration test is required to **close the increment** and is
  an absolute gate before any merge. It is never dropped, only sequenced within the
  increment.

The environment risk is the point of the primitive. A green fake with the real path
unproven proves nothing about the capability.
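
A minimal sketch of the fake-green seam described above, under stated assumptions: the
`Prober` interface, `fakeProber`, and `Status` are hypothetical names invented for this
example, not Omniglass code. The logic is tested against a fake that never touches the
network, which is enough for a checkpoint commit and nothing more.

```go
package main

import (
	"errors"
	"fmt"
)

// Prober is a hypothetical seam around an environment-risky capability
// (for example an ICMP echo). Name and signature are illustrative only.
type Prober interface {
	Probe(host string) error
}

var errUnreachable = errors.New("unreachable")

// fakeProber is the test double: it answers from a fixed table and never
// opens a socket.
type fakeProber struct {
	down map[string]bool
}

func (f fakeProber) Probe(host string) error {
	if f.down[host] {
		return errUnreachable
	}
	return nil
}

// Status is the logic under test: it maps a probe result to a state string.
func Status(p Prober, host string) string {
	if err := p.Probe(host); err != nil {
		return "down"
	}
	return "up"
}

func main() {
	p := fakeProber{down: map[string]bool{"10.0.0.9": true}}
	fmt.Println(Status(p, "10.0.0.1")) // prints "up"
	fmt.Println(Status(p, "10.0.0.9")) // prints "down"
}
```

This proves only the mapping in `Status`. A real `Prober` that opens a socket still needs
its own integration test in the target environment before the increment can close.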

## Tiers

- **Unit:** pure logic, fast, no I/O. Expression compile/eval, decode, request shaping,
  mapping.
- **Integration:** real Postgres, no mocking the database. `testcontainers-go` gives each
  run an ephemeral instance on a random port; never bind a fixed host port.
- **End-to-end:** emulate the user at each entry point against the running stack: API
  (drive the contracts as a client), CLI (run the real commands), UI (browser-drive the
  SPA). Assert the user-observable outcome, not internals.

No mocking the system under test. No tests-within-tests.
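
The API leg of the end-to-end tier can be sketched with the standard library alone. This
is an illustration, not the project's harness: the `/healthz` route and the stub handler
are assumptions made so the example runs by itself, and a real end-to-end test would
start the actual server in the handler's place. `httptest.NewServer` listens on an
OS-assigned port, which follows the same rule as the integration tier: never bind a fixed
host port.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newStubServer starts an HTTP server on an OS-assigned port. The handler
// is a stand-in (hypothetical /healthz route) so the sketch is runnable.
func newStubServer() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"status":"ok"}`)
	})
	return httptest.NewServer(mux)
}

// fetch drives the server as a client would and returns what the user
// observes: the status code and the response body.
func fetch(url string) (int, string, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, string(body), nil
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	code, body, err := fetch(srv.URL + "/healthz")
	if err != nil {
		panic(err)
	}
	fmt.Println(code, body)
}
```

The assertion targets the response a client sees, not any internal state of the handler.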

---

# The CLI

URL: /guides/cli/

The omniglass CLI: a generated client of the HTTP API, with a stable seam for hand-written commands.

The `omniglass` binary is both the server and the client. Its data commands are
**generated from the OpenAPI** (`make gen`, via `cmd/cligen`), so the CLI cannot drift
from the API: a new route is a new command on the next regeneration. A small set of
commands (the run modes and the trusted bootstrap) are hand-written and compose with the
generated tree on the same root.

## Running the full stack locally

`make dev` brings up the whole stack for a browser session: a dev Postgres (docker
compose, matching the default DSN), the migrations and boot seed, a bootstrapped `dev`
owner whose token is printed once, and the server with the operator console at
`http://localhost:8080/web`. Ctrl-C stops the server; `make down` stops Postgres (the
named volume persists data between runs; `docker compose down -v` wipes it, so the next
run mints a fresh token). `make up` / `make down` manage just the database. Tests never
touch this stack: they spin up their own ephemeral Postgres via testcontainers.

## Connecting

Every generated command is a client of a running server and takes two shared flags, each
with an environment default:

| Flag | Env | Default |
|---|---|---|
| `--server` | `OMNIGLASS_SERVER` | `http://localhost:8080` |
| `--token` | `OMNIGLASS_TOKEN` | (none) |

The token is a bearer credential (mint the first one with `omniglass bootstrap`). The
server enforces the same capability and scope for the CLI as for any caller: the CLI is
just another client, with no privileged path.

```sh
export OMNIGLASS_SERVER=https://omniglass.example.com
export OMNIGLASS_TOKEN=ogp_...
omniglass location list
omniglass location create --name hq --location-type campus
omniglass location get hq
```

Output is JSON. A non-2xx response prints the server's error body and exits non-zero, so
the CLI is safe in scripts.

## Generated versus hand-written

- **Generated** (`internal/cli/api_gen.go`, do not edit): one command per API operation.
  The resource and verb come from the AIP-style path (`POST /locations` is `location
  create`, `GET /locations/{name}` is `location get <name>`, a `:verb` custom method is
  `<resource> <verb> <id>`); path parameters are positional args, the request body is
  `--flags`, and `--help` plus the example come from the operation's summary and
  description.
- **Hand-written** (`internal/cli/api_hooks.go` and the run-mode files): the client
  runtime the generated tree calls, plus the commands that are not API operations: the
  `server` and `migrate` run modes and `bootstrap` (the trusted direct-DB owner lane).
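
The naming rule for generated commands can be illustrated with a small function. This is
a sketch of the documented mapping only, not the logic in `cmd/cligen`: `commandFor` is a
hypothetical name, the singular form is produced by naively trimming a trailing `s`, the
`:archive` verb is invented for the example, and treating `GET /locations` as `location
list` is an assumption drawn from the usage shown earlier.

```go
package main

import (
	"fmt"
	"strings"
)

// commandFor maps an HTTP method and an AIP-style path to CLI words.
// It covers only the cases the guide documents and reports false otherwise.
func commandFor(method, path string) (string, bool) {
	segs := strings.Split(strings.Trim(path, "/"), "/")
	if segs[0] == "" {
		return "", false
	}
	resource := strings.TrimSuffix(segs[0], "s") // naive: locations -> location

	switch {
	case len(segs) == 1 && method == "GET":
		return resource + " list", true // assumed collection read
	case len(segs) == 1 && method == "POST":
		return resource + " create", true
	case len(segs) == 2:
		// "{name}:verb" splits into the path parameter and a custom verb.
		param, verb, custom := strings.Cut(segs[1], ":")
		arg := "<" + strings.Trim(param, "{}") + ">"
		if custom {
			return resource + " " + verb + " " + arg, true
		}
		if method == "GET" {
			return resource + " get " + arg, true
		}
	}
	return "", false
}

func main() {
	for _, op := range [][2]string{
		{"POST", "/locations"},
		{"GET", "/locations/{name}"},
		{"POST", "/locations/{name}:archive"}, // hypothetical custom method
	} {
		cmd, _ := commandFor(op[0], op[1])
		fmt.Println(cmd)
	}
}
```

Running it prints `location create`, `location get <name>`, and `location archive <name>`,
matching the pattern in the list above.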

To add a hand-written command, write a `newXxxCmd()` returning a `*cobra.Command` and add
it in `newRoot`, exactly as `bootstrap` does. Regenerating the API commands never touches
it.