Conformance suite — off-app QA
The Hub is QA'd from outside itself, by a Cowork plugin (helm-qa-sweep) driving a real browser session against a real Helm instance. The Hub itself ships no self-test code, no diagnostic tab, no in-app conformance runner. The substrate the off-app sweep reads against is the Process Library (BIKE.L<n>-NNNN designated processes + realizes(spec→binding) edges, migrations 130–132).
v0.6.464 shipped one in-app "Run Diagnostic" (replacing three earlier QA viewers); v0.6.465 made it canon-driven and ships-clean-to-any-vertical (migration 129 added diagnostic_checks); v0.6.466 then removed it from the Hub entirely, dropping the migration's table via migration 133 and stripping the Health tab. v0.6.467 cleaned up the remaining doc orphans (including a draft conformance spec).
The decision: a system cannot honestly grade itself. The Hub may compute its own pass/fail and claim conformance against a canon the Hub itself ships, but that loop is closed — the same code authors and grades. Off-app QA is the only honest reading. Do NOT re-add diagnostic/conformance code to the Hub.
Architecture
+---------------------------+              +-----------------------+
| Cowork "helm-qa-sweep"    |   drives →   | Real browser tab      |
| plugin                    |              | pointed at Helm QA    |
| SKILL.md + scenarios      |              | instance              |
+-------------+-------------+              +----------+------------+
              |                                       |
              | reads the canon                       | observes outcomes
              v                                       v
+---------------------------+              +-----------------------+
| Process Library           |              | helm-qa-sweep         |
| software_process,         | ←── ←── ←──  | dashboard             |
| realizes_links,           |              | (verdicts + audit     |
| BIKE.L<n>-NNNN catalog    |              | chain diff)           |
+---------------------------+              +-----------------------+
The plugin follows the precedent set by gm-dealer-audit: a SKILL.md driving Claude-in-Chrome through scenarios and posting verdicts to a dashboard. The Hub plays no part in the verification loop; it just gets exercised.
The canon the sweep reads against
The Process Library is the source of truth for what the Hub is supposed to do. The sweep does not invent scenarios; it walks the canon and verifies behaviour matches.
| Layer | Stored in | What the sweep does |
|---|---|---|
| L1 — Code | software_process WHERE layer='L1' + code_anchor pointing at src/index.js:apiXxx | Exercises the API path via the UI and asserts the audit chain entry lands |
| L2 — Vertical canon | software_process WHERE layer='L2' + designated BIKE.L2-NNNN | Asserts the business rule holds end-to-end (e.g. "walk-in service ticket requires deposit") |
| L3 — SME / per-shop lore | software_process WHERE layer='L3' + per-shop variants | Runs against the actual shop's D1; asserts shop-specific behaviour holds |
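As a rough illustration of "walking the canon", the sketch below plans one check per software_process row, chosen by layer. It is not the plugin's code. It uses an in-memory SQLite database as a stand-in for D1, and the `designation` and `title` columns, the sample rows, and the `plan_sweep` helper are assumptions made for the example; only `layer` and `code_anchor` are named in this document.

```python
import sqlite3

# Hypothetical minimal schema. Only `layer` and `code_anchor` appear in this
# document; `designation` and `title` are assumed column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE software_process ("
    "designation TEXT PRIMARY KEY, layer TEXT, title TEXT, code_anchor TEXT)"
)
conn.executemany(
    "INSERT INTO software_process VALUES (?, ?, ?, ?)",
    [
        # Illustrative rows only; the anchor string format is assumed.
        ("BIKE.L1-0042", "L1", "apiSalesPost code path", "src/index.js:apiSalesPost"),
        ("BIKE.L2-0017", "L2", "walk-in service ticket requires deposit", None),
        ("BIKE.L3-0008", "L3", "Swicked awaiting_customer flip", None),
    ],
)

# What the sweep does per layer, paraphrasing the table above.
CHECK_BY_LAYER = {
    "L1": "exercise the API path via the UI; assert the audit chain entry lands",
    "L2": "assert the business rule holds end-to-end",
    "L3": "assert shop-specific behaviour holds",
}


def plan_sweep(conn):
    """Return (designation, check) pairs, one per canon row, in stable order."""
    rows = conn.execute(
        "SELECT designation, layer FROM software_process ORDER BY designation"
    )
    return [(designation, CHECK_BY_LAYER[layer]) for designation, layer in rows]


plan = plan_sweep(conn)
for designation, check in plan:
    print(f"{designation}: {check}")
```

The point of the shape is that the plan is derived entirely from canon rows: a behaviour with no row in software_process never shows up in the plan.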
Permanent designations (migration 130, v0.6.466)
Every Process Library row is keyed by BIKE.L<n>-NNNN; Tom calls the roster the Dr. Strangelove roster. The designation is permanent and stable across renames; the sweep keys verdicts by designation so a process that's been re-titled doesn't lose its history. Examples:
- BIKE.L1-0042 — the apiSalesPost code path
- BIKE.L2-0017 — the walk-in-service-requires-deposit rule
- BIKE.L3-0008 — Swicked-specific chainring-scenario awaiting_customer flip
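A minimal sketch of why keying by designation preserves history across renames. The regular expression assumes the shape seen in the examples above (an uppercase vertical prefix, a layer digit from 1 to 3, and a four-digit serial); the `parse_designation` and `record_verdict` helpers and the in-memory `history` store are hypothetical, not part of the plugin.

```python
import re
from collections import defaultdict

# Assumed shape: vertical prefix, layer digit 1-3, four-digit serial.
DESIGNATION_RE = re.compile(
    r"^(?P<vertical>[A-Z]+)\.L(?P<layer>[123])-(?P<serial>\d{4})$"
)


def parse_designation(text):
    """Split e.g. 'BIKE.L2-0017' into (vertical, layer, serial); raise on bad input."""
    match = DESIGNATION_RE.match(text)
    if match is None:
        raise ValueError(f"not a valid designation: {text!r}")
    return match["vertical"], int(match["layer"]), int(match["serial"])


# Verdict history keyed by designation, never by title.
history = defaultdict(list)


def record_verdict(designation, title, passed):
    """Append a verdict under the permanent designation."""
    parse_designation(designation)  # reject malformed keys early
    history[designation].append((title, passed))


record_verdict("BIKE.L2-0017", "walk-in service requires deposit", True)
# Same process after a hypothetical re-title: the key does not change.
record_verdict("BIKE.L2-0017", "deposit required on walk-in tickets", False)

print(parse_designation("BIKE.L2-0017"))  # ('BIKE', 2, 17)
print(len(history["BIKE.L2-0017"]))       # 2
```

Both verdicts land under the same key even though the title changed between runs, which is the property the permanent designation is meant to give.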
The connective seam (migrations 131 + 132, Step A + extend)
realizes_links (migration 131) is the join between spec (L2/L3 narrative) and binding (L1 code anchor). Before Step A the library held 148 spec keys and 239 binding keys, with only 2 shared between them and 0 cross-layer realizes edges. The seam was missing, which meant a canon-driven runner had nothing to traverse.
- Step A (migration 131) — initial seed of realizes(spec→binding) edges
- Step A extend (migration 132) — 40 more edges
Each realizes_links row says "this L2/L3 narrative is realized by this L1 code anchor." That's the join the off-app sweep needs to translate a canon row into an executable scenario.
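The join described above might be traversed roughly as follows. This is a sketch under stated assumptions: the column names `spec_designation` and `binding_designation`, the `designation` and `title` columns on software_process, and the `scenarios_for` helper are all hypothetical, and the single edge seeded here is invented for illustration rather than taken from the real catalog. SQLite in memory stands in for D1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed schemas; the real migrations may differ.
CREATE TABLE software_process (
    designation TEXT PRIMARY KEY, layer TEXT, title TEXT, code_anchor TEXT
);
CREATE TABLE realizes_links (
    spec_designation TEXT, binding_designation TEXT
);
INSERT INTO software_process VALUES
    ('BIKE.L1-0042', 'L1', 'apiSalesPost code path', 'src/index.js:apiSalesPost'),
    ('BIKE.L2-0017', 'L2', 'walk-in service ticket requires deposit', NULL),
    ('BIKE.L3-0008', 'L3', 'Swicked awaiting_customer flip', NULL);
-- Illustrative edge only: "this L2 narrative is realized by this L1 anchor".
INSERT INTO realizes_links VALUES ('BIKE.L2-0017', 'BIKE.L1-0042');
""")


def scenarios_for(conn, spec_designation):
    """Follow realizes edges from a spec row to the code anchors that bind it."""
    rows = conn.execute(
        """
        SELECT spec.designation, spec.title, binding.designation, binding.code_anchor
        FROM realizes_links AS link
        JOIN software_process AS spec
          ON spec.designation = link.spec_designation
        JOIN software_process AS binding
          ON binding.designation = link.binding_designation
        WHERE link.spec_designation = ?
        ORDER BY binding.designation
        """,
        (spec_designation,),
    )
    return [
        {"spec": s, "rule": title, "binding": b, "code_anchor": anchor}
        for s, title, b, anchor in rows
    ]


scenarios = scenarios_for(conn, "BIKE.L2-0017")
unlinked = scenarios_for(conn, "BIKE.L3-0008")  # no edge seeded, so nothing to run
print(scenarios)
print(unlinked)
```

A spec row with no realizes edge yields an empty list, which mirrors the pre-Step-A situation: without edges there is nothing for a canon-driven runner to traverse.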
What the sweep does — and what it does NOT do
| Does | Does NOT |
|---|---|
| Walk the canon and exercise behaviours through the real UI | Read the Hub's own self-grading |
| Post pass/fail verdicts to its own dashboard | Trust any "diagnostic OK" result emitted by the Hub |
| Diff audit_events.chain_hash before and after each scenario as a tamper-check | Auto-gate the Hub's deploy (verdicts inform Tom; they don't block prod) |
| Run against the QA Helm instance (mockup-only-swicked-helm) for vertical-level regressions | Run against Swicked's prod D1 directly (L3 lore runs against a staging replica) |
| Cite the canon row (BIKE.L2-0017) in every verdict so a failure traces back to a spec | Invent scenarios from operator memory without a canon row to anchor them |
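One way to picture the before/after chain diff and the canon citation together is the sketch below. It assumes the sweep can read audit_events.chain_hash as an ordered list and that the chain is append-only; it does not assume how the Hub computes each hash. The `diff_chain` and `verdict` helpers, the verdict fields, and the placeholder hash values are hypothetical.

```python
def diff_chain(before, after):
    """Compare two ordered snapshots of audit_events.chain_hash values.

    Returns the hashes appended during the scenario. Raises ValueError if any
    entry that existed before the scenario is missing or changed afterwards.
    """
    if len(after) < len(before):
        raise ValueError("audit chain shrank during the scenario")
    for index, (old, new) in enumerate(zip(before, after)):
        if old != new:
            raise ValueError(f"audit chain entry {index} changed: {old} -> {new}")
    return after[len(before):]


def verdict(designation, before, after, expected_new_entries):
    """Build a verdict that always cites the canon row it was run for."""
    try:
        appended = diff_chain(before, after)
    except ValueError as error:
        return {"canon_row": designation, "passed": False, "reason": str(error)}
    passed = len(appended) == expected_new_entries
    reason = f"{len(appended)} new audit entries, expected {expected_new_entries}"
    return {"canon_row": designation, "passed": passed, "reason": reason}


# Placeholder hash strings, not real chain values.
before = ["h1", "h2"]
after_ok = ["h1", "h2", "h3"]
after_tampered = ["h1", "hX", "h3"]

ok = verdict("BIKE.L2-0017", before, after_ok, expected_new_entries=1)
bad = verdict("BIKE.L2-0017", before, after_tampered, expected_new_entries=1)
print(ok)
print(bad)
```

Treating the earlier snapshot as a required prefix of the later one catches both rewritten and deleted entries, while the `canon_row` field keeps every failure traceable to a spec.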
The chain of doctrine that landed here
- v0.27 — Process Library three-layer model (migration 085). Code / vertical canon / SME knowledge, joined by concept_key.
- v0.28 — QA Tool runner (in-Hub) on the qa-tool branch. Phase 1 shipped, Phase 2 partial. Replaced by:
- v0.6.464–v0.6.466 — Hub QA collapse + removal. Three QA viewers consolidated into one Run Diagnostic, made canon-driven, then removed entirely. The reasoning: a self-grading system isn't honest QA.
- Off-app via Cowork — helm-qa-sweep, pattern modelled on the gm-dealer-audit plugin (agentic SKILL.md driving Claude-in-Chrome with verdicts to a dashboard).
- Connective seam first. Step A (migrations 131 + 132) lays the realizes(spec→binding) edges that any canon-driven sweep needs.
The line that survives: the Hub authors the canon and the code; something else grades how well they match.
See also
- Process Library — the L1/L2/L3 canon the sweep reads against
- Audit-everything principle — the chain the sweep diffs as oracle
- migrations/130_process_designations.sql — BIKE.L<n>-NNNN designations
- migrations/131_realizes_links.sql — initial spec→binding edges
- migrations/132_realizes_links_extend.sql — Step A extend (40 more edges)
- migrations/133_drop_diagnostic_checks.sql — Hub-side QA removed
- Cowork plugin helm-qa-sweep (separate repo) — the off-app sweep itself