Conformance suite — off-app QA

The Hub is QA'd from outside itself, by a Cowork plugin (helm-qa-sweep) driving a real browser session against a real Helm instance. The Hub itself ships no self-test code, no diagnostic tab, no in-app conformance runner. The substrate the off-app sweep reads against is the Process Library (BIKE.L<n>-NNNN designated processes + realizes(spec→binding) edges, migrations 130–132).

Hub-side QA is deliberately removed

v0.6.464 shipped a single in-app "Run Diagnostic" that replaced three earlier QA viewers. v0.6.465 made it canon-driven so that it ships clean to any vertical (migration 129 added diagnostic_checks). v0.6.466 then removed it from the Hub entirely, dropping the diagnostic_checks table via migration 133 and stripping the Health tab. v0.6.467 cleaned up the remaining doc orphans, including a draft conformance spec.

The decision: a system cannot honestly grade itself. The Hub could compute its own pass/fail and claim conformance against a canon the Hub itself ships, but that loop is closed: the same code authors and grades. Off-app QA is the only honest reading. Do NOT re-add diagnostic/conformance code to the Hub.

Architecture

+---------------------------+              +-----------------------+
|  Cowork "helm-qa-sweep"   |  drives →    |  Real browser tab     |
|  plugin                   |              |  pointed at Helm QA   |
|  SKILL.md + scenarios     |              |  instance             |
+-------------+-------------+              +----------+------------+
              |                                       |
              | reads the canon                       | observes outcomes
              v                                       v
+---------------------------+              +-----------------------+
|  Process Library          |              |  helm-qa-sweep        |
|  software_process,        |  ←── ←── ←── |  dashboard            |
|  realizes_links,          |              |  (verdicts + audit    |
|  BIKE.L<n>-NNNN catalog   |              |   chain diff)         |
+---------------------------+              +-----------------------+

The plugin follows the precedent set by gm-dealer-audit: a SKILL.md driving Claude-in-Chrome through scenarios and posting verdicts to a dashboard. The Hub plays no part in the verification loop; it just gets exercised.

The canon the sweep reads against

The Process Library is the source of truth for what the Hub is supposed to do. The sweep does not invent scenarios; it walks the canon and verifies behaviour matches.

| Layer | Stored in | What the sweep does |
| --- | --- | --- |
| L1 (Code) | `software_process WHERE layer='L1'` + `code_anchor` pointing at `src/index.js:apiXxx` | Exercises the API path via the UI and asserts the audit chain entry lands |
| L2 (Vertical canon) | `software_process WHERE layer='L2'` + designated `BIKE.L2-NNNN` | Asserts the business rule holds end-to-end (e.g. "walk-in service ticket requires deposit") |
| L3 (SME / per-shop lore) | `software_process WHERE layer='L3'` + per-shop variants | Runs against a staging replica of the actual shop's D1; asserts shop-specific behaviour holds |
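As a rough sketch of what "walking the canon" by layer could look like, the snippet below selects rows per layer and attaches the action from the table above. Only the table name `software_process`, the `layer` values and `code_anchor` come from this page. The `designation` and `title` columns, the in-memory SQLite database standing in for D1, the `plan_for` helper and the sample anchor are assumptions for illustration.

```python
import sqlite3

# Hypothetical minimal schema; the real software_process table may differ.
SCHEMA = """
CREATE TABLE software_process (
    designation TEXT PRIMARY KEY,  -- e.g. BIKE.L2-0017 (column name assumed)
    layer       TEXT NOT NULL,     -- 'L1', 'L2' or 'L3'
    title       TEXT NOT NULL,     -- assumed column
    code_anchor TEXT               -- only meaningful for L1 rows
);
"""

# What the sweep does with a row of each layer (paraphrasing the table above).
ACTIONS = {
    "L1": "exercise the API path via the UI; assert the audit chain entry lands",
    "L2": "assert the business rule holds end-to-end",
    "L3": "assert shop-specific behaviour holds",
}


def plan_for(conn, layer):
    """Return (designation, title, action) for every canon row in one layer."""
    rows = conn.execute(
        "SELECT designation, title FROM software_process "
        "WHERE layer = ? ORDER BY designation",
        (layer,),
    ).fetchall()
    return [(designation, title, ACTIONS[layer]) for designation, title in rows]


# Illustrative data only; the anchor value is a guess at the apiXxx pattern.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO software_process VALUES (?, ?, ?, ?)",
    [
        ("BIKE.L1-0042", "L1", "apiSalesPost code path", "src/index.js:apiSalesPost"),
        ("BIKE.L2-0017", "L2", "walk-in service ticket requires deposit", None),
        ("BIKE.L3-0008", "L3", "chainring-scenario awaiting_customer flip", None),
    ],
)

for layer in ("L1", "L2", "L3"):
    for designation, title, action in plan_for(conn, layer):
        print(f"{designation}: {title} -> {action}")
```

The point of the sketch is that the scenario list is derived from canon rows rather than written by hand, which matches the rule that the sweep does not invent scenarios.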

Permanent designations (migration 130, v0.6.466)

Every Process Library row is keyed by a BIKE.L<n>-NNNN designation; Tom calls the roster the Dr. Strangelove roster. The designation is permanent and stable across renames. The sweep keys verdicts by designation, so a process that has been re-titled doesn't lose its history. Examples:

BIKE.L1-0042 — the apiSalesPost code path
BIKE.L2-0017 — the walk-in-service-requires-deposit rule
BIKE.L3-0008 — Swicked-specific chainring-scenario awaiting_customer flip
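A minimal sketch of keying verdicts by designation, assuming the format is exactly what the examples show (an uppercase vertical prefix, a layer digit from 1 to 3, and a four-digit serial). The regular expression, `parse_designation`, `record_verdict` and the in-memory `history` store are hypothetical names, not part of helm-qa-sweep.

```python
import re
from collections import defaultdict

# Assumed shape, inferred from the examples above: BIKE.L2-0017
DESIGNATION_RE = re.compile(
    r"^(?P<vertical>[A-Z]+)\.L(?P<layer>[1-3])-(?P<serial>\d{4})$"
)


def parse_designation(text):
    """Split a designation into (vertical, layer, serial) or raise ValueError."""
    match = DESIGNATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a designation: {text!r}")
    return match["vertical"], int(match["layer"]), int(match["serial"])


# Verdict history is keyed by designation, never by title.
history = defaultdict(list)


def record_verdict(designation, title, passed):
    parse_designation(designation)  # reject malformed keys early
    history[designation].append((title, passed))


# The same process before and after a rename lands in one history list.
record_verdict("BIKE.L2-0017", "walk-in-service-requires-deposit", True)
record_verdict("BIKE.L2-0017", "deposit required for walk-in service", False)
```

Because the title is stored as a value rather than used as a key, a rename only changes what later entries display; the run of verdicts under `BIKE.L2-0017` stays continuous.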

The connective seam (migrations 131 + 132, Step A + extend)

realizes_links (migration 131) is the join between spec (L2/L3 narrative) and binding (L1 code anchor). Before Step A there were 148 spec keys and 239 binding keys, of which only 2 were shared, and 0 cross-layer realizes edges. The seam was missing, which meant a canon-driven runner had nothing to traverse.

Step A (migration 131) — initial seed of realizes(spec→binding) edges
Step A extend (migration 132) — 40 more edges

Each realizes_links row says "this L2/L3 narrative is realized by this L1 code anchor." That's the join the off-app sweep needs to translate a canon row into an executable scenario.
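The traversal could be sketched as a plain SQL join. The column names `spec_designation` and `binding_designation`, the in-memory SQLite database standing in for D1, and both helper functions are assumptions. The single edge seeded here is illustrative only and does not claim that BIKE.L2-0017 is actually realized by BIKE.L1-0042.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical schemas; only the table names come from this page.
    CREATE TABLE software_process (
        designation TEXT PRIMARY KEY,
        layer       TEXT NOT NULL,
        code_anchor TEXT
    );
    CREATE TABLE realizes_links (
        spec_designation    TEXT NOT NULL REFERENCES software_process(designation),
        binding_designation TEXT NOT NULL REFERENCES software_process(designation)
    );
    INSERT INTO software_process VALUES
        ('BIKE.L1-0042', 'L1', 'src/index.js:apiSalesPost'),
        ('BIKE.L2-0017', 'L2', NULL),
        ('BIKE.L3-0008', 'L3', NULL);
    INSERT INTO realizes_links VALUES ('BIKE.L2-0017', 'BIKE.L1-0042');
    """
)


def bindings_for(conn, spec):
    """L1 code anchors that realize one L2/L3 spec row."""
    return conn.execute(
        """
        SELECT b.designation, b.code_anchor
        FROM realizes_links AS r
        JOIN software_process AS b ON b.designation = r.binding_designation
        WHERE r.spec_designation = ? AND b.layer = 'L1'
        ORDER BY b.designation
        """,
        (spec,),
    ).fetchall()


def unrealized_specs(conn):
    """L2/L3 rows with no realizes edge: nothing for a runner to traverse."""
    rows = conn.execute(
        """
        SELECT s.designation
        FROM software_process AS s
        LEFT JOIN realizes_links AS r ON r.spec_designation = s.designation
        WHERE s.layer IN ('L2', 'L3') AND r.spec_designation IS NULL
        ORDER BY s.designation
        """
    ).fetchall()
    return [designation for (designation,) in rows]
```

`unrealized_specs` is the query that would have returned nearly every spec row before Step A; a sweep could report those rows as "no executable scenario" rather than silently skipping them.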

What the sweep does — and what it does NOT do

| Does | Does NOT |
| --- | --- |
| Walk the canon and exercise behaviours through the real UI | Read the Hub's own self-grading |
| Post pass/fail verdicts to its own dashboard | Trust any "diagnostic OK" result emitted by the Hub |
| Diff `audit_events.chain_hash` before and after each scenario as a tamper-check | Auto-gate the Hub's deploy (verdicts inform Tom; they don't block prod) |
| Run against the QA Helm instance (`mockup-only-swicked-helm`) for vertical-level regressions | Run against Swicked's prod D1 directly (L3 lore runs against a staging replica) |
| Cite the canon row (`BIKE.L2-0017`) in every verdict so a failure traces back to a spec | Invent scenarios from operator memory without a canon row to anchor them |
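One way to read the chain diff and the canon citation together is sketched below. How `audit_events.chain_hash` is computed is not described on this page, so the sketch never recomputes hashes. It only compares two ordered snapshots of the column and assumes the chain is append-only. `diff_chain`, `verdict` and the verdict fields are hypothetical.

```python
def diff_chain(before, after):
    """Compare two snapshots of audit_events.chain_hash, oldest entry first.

    Assumes an append-only chain: every hash seen before the scenario must
    still be present, unchanged and in the same position, afterwards.
    """
    if after[: len(before)] != before:
        # An existing entry was rewritten, reordered or removed.
        return {"tampered": True, "appended": []}
    return {"tampered": False, "appended": after[len(before):]}


def verdict(designation, scenario_passed, before, after):
    """Build a verdict that cites its canon row and carries the chain diff."""
    diff = diff_chain(before, after)
    return {
        "designation": designation,  # e.g. BIKE.L2-0017
        "passed": scenario_passed and not diff["tampered"],
        "chain_tampered": diff["tampered"],
        "new_chain_entries": len(diff["appended"]),
    }


# Clean run: one new audit entry was appended during the scenario.
clean = verdict("BIKE.L2-0017", True, ["h1", "h2"], ["h1", "h2", "h3"])

# Suspicious run: an earlier hash changed, so the verdict fails even
# though the UI-level scenario itself passed.
dirty = verdict("BIKE.L2-0017", True, ["h1", "h2"], ["h1", "hX", "h3"])
```

The verdict is data for the sweep's own dashboard; nothing in this sketch calls back into the Hub or gates a deploy.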

The chain of doctrine that landed here

v0.27 — Process Library three-layer model (migration 085). Code / vertical canon / SME knowledge, joined by concept_key.
v0.28 — QA Tool runner (in-Hub) on the qa-tool branch. Phase 1 shipped, Phase 2 partial. Replaced by:
v0.6.464–v0.6.466 — Hub QA collapse + removal. Three QA viewers consolidated into one Run Diagnostic, made canon-driven, then removed entirely. The reasoning: a self-grading system isn't honest QA.
Off-app via Cowork — helm-qa-sweep, pattern modelled on the gm-dealer-audit plugin (agentic SKILL.md driving Claude-in-Chrome with verdicts to a dashboard).
Connective seam first. Step A (migrations 131 + 132) lays the realizes(spec→binding) edges that any canon-driven sweep needs.

The line that survives: the Hub authors the canon and the code; something else grades how well they match.
