Offline architecture
The shop's internet will drop. Helm has to keep ringing sales and taking in bikes through the outage. This page is the design.
Slice 1 (connection detection + UI grey-out, 2026-05-12) and Slice 2 (idempotency-key middleware + idempotency_records table, 2026-05-13) are live. Slices 3–7 (LocalCache, MutationQueue, per-screen offline writes) follow incrementally. Full working spec lives at Helm/offline_architecture.md in the repo.
The promise
When the shop's internet is down, Helm keeps running for what a customer standing at the counter physically needs:
- Process cash sales
- Take in bikes (drop-off tickets)
- Look up customers
- Add new customers at point-of-sale
When the internet returns, every offline operation syncs to the cloud automatically.
This is the same pattern Square, Shopify POS, and Lightspeed use in practice. It is not "full offline-first, replicate everything": that is a different architecture with a much higher build cost, and it isn't necessary for the bike-shop use case. See "Why this scope" at the end.
What's online-required vs offline-capable
Offline-capable
| Surface | Offline behavior |
|---|---|
| Sales / Ring-Up (when built) | Cash sales only. Card terminal is blocked anyway because Stripe Terminal needs internet. Banner: "Offline — cash only." |
| Service → New drop-off | Full create flow. Pick customer, pick bike, type reported issues, set ready-by date. Ticket gets a client UUID; syncs on reconnect. |
| Customers → Search | Reads the local cached customer index. |
| Customers → Quick-add | Create new customer locally. Syncs on reconnect. |
| Service → Ticket detail | Read-only from cache. No edits, no status changes, no line adds. |
| Inventory → Variant detail | Read-only from cache. SKU + price + stock-as-of-last-sync visible; no edits. |
| Service → Kanban | Read-only. Last-known state. |
Online-required (gracefully unavailable)
| Surface | Why |
|---|---|
| Today dashboard | Aggregated real-time metrics |
| Reports | Real-time analytics need live data |
| Settings (all sections) | Configuration touches shared state |
| Audit log / Beta Comments queue | Server-side by definition |
| Inventory → receiving / + New SKU / Adjust stock | Mutates shared inventory state; would diverge across devices |
| Rentals | Time-window inventory locking needs server arbitration |
| Trade-in evaluations | Real-time pricing API |
| Online orders | Server-side fulfillment queue |
| Card sales | Stripe Terminal needs internet |
| SMS receipts | Twilio needs internet |
When the operator opens an online-required surface offline, the screen renders greyed-out with a centered overlay:
Offline — this section needs internet. Try again when you're back online, or use a section that works offline (Sales, Service drop-off, Customer lookup).
Architecture
Three runtime primitives.
1. ConnectionMonitor (Slice 1 — live)
A small JS module loaded once at app startup. Responsibilities:
- Listen for browser `online`/`offline` events
- Ping `GET /api/health` every 30 seconds (and immediately on an `online` event). The browser sometimes lies about online state, so the ping is authoritative
- Maintain `window.helmConnection = { state, last_check_at, queue_size, last_error }`
- Dispatch a `helm:connection-changed` event for any other code that wants to react
- Toggle the `body.helm-offline` class; CSS does the heavy lifting from there
States: online | offline | syncing.
A connection pill lives in the topnav-right:
- `● Online` (green): the default
- `● Offline` (amber): `navigator.onLine === false` or `/api/health` failed
- `● Syncing · N` (blue): queue drain in progress (Slice 5+)
- `● Online · Pending: N` (amber): online but the queue has items (Slice 5+)
Click → small panel showing last sync, queue size, last error. Shift-click → dev-only toggle to simulate offline.
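The state and pill rules above can be condensed into two pure functions. This is a minimal sketch rather than the shipped module: `resolveConnectionState`, `pillFor`, and the `draining` flag are hypothetical names, and it assumes "Syncing" is shown only while a queue drain is actually running.

```javascript
// Hypothetical helpers; the names and the `draining` flag are assumptions.

// Both signals must be healthy to count as online; a drain in progress
// turns "online" into "syncing".
function resolveConnectionState({ navigatorOnline, healthOk, draining }) {
  if (!navigatorOnline || !healthOk) return "offline";
  return draining ? "syncing" : "online";
}

// Maps a state plus queue size to the pill label and color described above.
function pillFor(state, queueSize) {
  if (state === "offline") return { label: "Offline", color: "amber" };
  if (state === "syncing") return { label: `Syncing · ${queueSize}`, color: "blue" };
  if (queueSize > 0) return { label: `Online · Pending: ${queueSize}`, color: "amber" };
  return { label: "Online", color: "green" };
}
```

Keeping this logic free of DOM access would make it easy to unit-test separately from the event listeners and the health ping.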
2. LocalCache (Slice 3 — planned)
IndexedDB read replica of the reference data needed for offline ops. Hydrated on first sign-in; refreshed every 5 min while online.
Stores:
- `customers_index`: id, account_number, display_name, primary_phone, primary_email, popup_note, is_tax_exempt, pricing_tier
- `products_index`: id, sku, display_name, manufacturer_name, retail_price_cents, tax_category_id, is_serialized
- `staff_index`: id, display_name, initials, role_name
- `shop_config_snapshot`: single-row mirror of `shop_config`
Size estimate at Swicked-scale: ~5–10 MB total.
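At this size, offline customer lookup can plausibly be a plain in-memory filter over the hydrated `customers_index` rows. The sketch below is illustrative only: `searchCustomers` and `normalizeDigits` are hypothetical names, and the matching rules (substring match on name, email, and account number; digit-only match on phone with at least three digits) are assumptions, not the spec.

```javascript
// Hypothetical search over rows shaped like customers_index.
// Matching rules are assumptions made for this sketch.

// Strips everything except digits so "(555) 555-0100" matches "5550100".
function normalizeDigits(text) {
  return String(text ?? "").replace(/\D/g, "");
}

function searchCustomers(index, query, limit = 20) {
  const q = query.trim().toLowerCase();
  if (q === "") return [];
  const qDigits = normalizeDigits(q);
  return index
    .filter((c) => {
      const name = (c.display_name ?? "").toLowerCase();
      const email = (c.primary_email ?? "").toLowerCase();
      const account = String(c.account_number ?? "").toLowerCase();
      const phone = normalizeDigits(c.primary_phone);
      return (
        name.includes(q) ||
        email.includes(q) ||
        account.includes(q) ||
        // Require a few digits so a stray "1" does not match every phone.
        (qDigits.length >= 3 && phone.includes(qDigits))
      );
    })
    .slice(0, limit);
}
```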
3. MutationQueue (Slice 5+ — planned)
IndexedDB store of pending writes. Each row:
{
id, // client-generated UUID (also the idempotency key)
endpoint, // e.g. /api/customers, /api/tickets
method, // POST | PUT | DELETE
body_json,
enqueued_at,
optimistic_local_id, // local ID the UI used before server confirmation
last_attempt_at,
attempt_count,
last_error
}
Drains FIFO when online. Idempotency keys mean retries are safe (ADR-0015).
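A minimal sketch of how a row with those fields might be built and ordered for the drain. `makeQueueRow` and `fifoOrder` are hypothetical helpers; the ID and clock sources are injectable so the sketch runs outside a browser, and the default assumes `globalThis.crypto.randomUUID()` is available (true in secure browser contexts and recent Node releases).

```javascript
// Hypothetical helper: builds a MutationQueue row with the fields listed above.
// newId and now are injectable; the defaults are assumptions for this sketch.
function makeQueueRow(
  { endpoint, method, body, optimisticLocalId = null },
  newId = () => globalThis.crypto.randomUUID(),
  now = () => new Date().toISOString()
) {
  return {
    id: newId(), // doubles as the Idempotency-Key when the row is drained
    endpoint,
    method,
    body_json: JSON.stringify(body),
    enqueued_at: now(),
    optimistic_local_id: optimisticLocalId,
    last_attempt_at: null,
    attempt_count: 0,
    last_error: null,
  };
}

// FIFO drain order: oldest enqueued_at first.
// ISO-8601 UTC strings from toISOString() compare correctly as plain strings.
function fifoOrder(rows) {
  return [...rows].sort((a, b) =>
    a.enqueued_at < b.enqueued_at ? -1 : a.enqueued_at > b.enqueued_at ? 1 : 0
  );
}
```

Storing the serialized body alongside the endpoint keeps each row a complete record of what should be sent, independent of whatever UI state existed when it was enqueued.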
Slice 2 — Server-side idempotency (live)
Slice 2 ships the server half of the idempotency contract: a new idempotency_records table (migration 014_idempotency_records.sql) and a withIdempotency(request, env, endpoint, handler) wrapper in src/index.js. The first endpoint to use it is POST /api/audit/manual; further mutating endpoints opt in by wrapping their handlers.
Table shape (see migration 014 for the canonical SQL):
CREATE TABLE idempotency_records (
id INTEGER PRIMARY KEY AUTOINCREMENT,
idempotency_key TEXT NOT NULL UNIQUE, -- client-minted UUID
endpoint TEXT NOT NULL, -- 'POST /api/sales', etc.
request_body_sha256 TEXT NOT NULL, -- guard against key-reuse-with-different-body
response_status INTEGER NOT NULL, -- first run's HTTP status
response_body TEXT NOT NULL, -- first run's exact JSON body
created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
expires_at TEXT, -- NULL = never; default policy: created_at + 7 days
actor_staff_id INTEGER REFERENCES staff(id) ON DELETE SET NULL
);
Wrapper semantics:
| Request | Wrapper behaviour |
|---|---|
| No Idempotency-Key header | Runs the handler directly (header optional today for backward-compat) |
| Header + key not seen | Runs the handler; caches (status, body, body-hash) in idempotency_records; returns the response |
| Header + key seen + same body hash | Returns cached response with Idempotency-Replay: true header — handler is not re-run |
| Header + key seen + different body hash | HTTP 409 — the client reused a key with new data |
| Race: two requests with the same key arrive simultaneously | The UNIQUE index serializes them; the loser reads the winner's row and returns the same response with Idempotency-Replay: race |
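The first four rows of the table can be sketched as an in-memory decision function. This is not the code in `src/index.js`: `createIdempotencyStore` and the `handle` signature are hypothetical, a `Map` stands in for the `idempotency_records` table, the body hash is assumed to be computed by the caller, the 409 body text is invented for illustration, and the race row is not modeled.

```javascript
// In-memory sketch of the decision table. All names here are hypothetical;
// the real wrapper persists to idempotency_records instead of a Map.
function createIdempotencyStore() {
  const records = new Map(); // key -> { bodyHash, status, body }

  return function handle({ key, bodyHash }, handler) {
    // No header: run the handler directly, nothing is cached.
    if (!key) return { ...handler(), replay: false };

    const seen = records.get(key);
    if (!seen) {
      // Key not seen: run the handler once and cache its response.
      const { status, body } = handler();
      records.set(key, { bodyHash, status, body });
      return { status, body, replay: false };
    }
    if (seen.bodyHash !== bodyHash) {
      // Key seen with a different body: reject. Error text is an assumption.
      return { status: 409, body: { error: "idempotency key reused with a different body" }, replay: false };
    }
    // Key seen with the same body: return the cached response, handler not re-run.
    return { status: seen.status, body: seen.body, replay: true };
  };
}
```

In the real wrapper, the `replay` flag would correspond to setting the `Idempotency-Replay` response header.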
Retention is 7 days — long enough for the longest plausible "queue stuck offline then drained" window, short enough to keep the table small.
This is what unblocks a real POST /api/sales: the cashier hits Charge, the network blips, the client retries with the same minted key, and the server processes the sale exactly once. The same primitive will serve the MutationQueue drain from Slice 5 onward.
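On the client side, the retry described above only works if every attempt carries the same key. A hedged sketch follows: `postWithRetry` is a hypothetical helper, the fetch function is injected so the example runs without a network, and treating any status below 500 as a final answer is an assumption.

```javascript
// Hypothetical client helper: retries a POST, reusing one Idempotency-Key.
// fetchImpl is injected (e.g. window.fetch) so the sketch is testable offline.
async function postWithRetry(fetchImpl, url, body, idempotencyKey, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const response = await fetchImpl(url, {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "Idempotency-Key": idempotencyKey, // same key on every attempt
        },
        body: JSON.stringify(body),
      });
      // Assumption: 2xx and 4xx are final answers; only 5xx is retried.
      if (response.status < 500) return response;
      lastError = new Error(`server error ${response.status}`);
    } catch (err) {
      lastError = err; // network blip: try again with the same key
    }
  }
  throw lastError;
}
```

The key is minted once by the caller, before the first attempt, so a retry after a lost response is recognized by the server as a replay rather than a second sale.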
Sync protocol
Startup (every sign-in) — Slice 3+:
- `GET /api/cache/customers_index` → full customer index → write to IndexedDB
- `GET /api/cache/products_index`
- `GET /api/cache/staff_index`
- `GET /api/cache/shop_config`
Each endpoint returns a compact JSON payload of just the fields needed for offline use.
Background refresh — every 5 min while online: re-fetch the same four endpoints, replace local cache atomically.
Online detection:
- `navigator.onLine` is the primary signal
- `GET /api/health` ping confirms (the browser sometimes lies)
- Both must agree
Offline write flow — Slice 5+:
Reconnect drain:
- ConnectionMonitor detects online
- Reads MutationQueue in FIFO order
- POSTs each with its idempotency key
- On `2xx` → remove from queue
- On `4xx` (validation error) → mark `dead`, surface in a "Needs attention" list (one-tap merge/edit/discard)
- On `5xx` → retry with exponential backoff (2s, 4s, 8s, … up to 5 min)
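The per-response rules and the backoff schedule above can be sketched as two small functions. `classifyResponse` and `backoffMs` are hypothetical names, and treating statuses outside 2xx/4xx/5xx as retryable is an assumption the source does not address.

```javascript
const BASE_DELAY_MS = 2_000;     // first retry after 2s
const MAX_DELAY_MS = 5 * 60_000; // cap at 5 min

// attemptCount is the number of failed attempts so far:
// 1 -> 2s, 2 -> 4s, 3 -> 8s, doubling until the 5 min cap.
function backoffMs(attemptCount) {
  return Math.min(BASE_DELAY_MS * 2 ** (attemptCount - 1), MAX_DELAY_MS);
}

// Decides what the drain does with a queue row after one attempt.
function classifyResponse(status) {
  if (status >= 200 && status < 300) return "remove"; // synced: drop from queue
  if (status >= 400 && status < 500) return "dead";   // needs operator attention
  return "retry"; // 5xx, plus (by assumption) anything unexpected
}
```

With these constants the delay reaches the cap on the ninth failed attempt (2s × 2⁸ = 512s, clamped to 300s).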
Conflict rules
For the bike-shop single-operator-per-shift use case, conflicts are vanishingly rare.
- Sales (transactions): append-only. Each sale has a unique client UUID. Never conflicts.
- Service tickets: created with unique IDs. Never conflicts on creation. Edits during offline are append-mostly; safe in order.
- New customers: created with client UUID. If the server later finds a duplicate (same name + phone), surface in the existing merge UI (slice 2).
- Customer edits: last-write-wins per field. Server records both before+after in `audit_mutations` per audit-everything.
- Inventory writes: blocked offline. No conflicts possible.
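The last-write-wins-per-field rule for customer edits can be illustrated with a small merge function. This is a sketch under assumptions: `applyFieldEdits` is a hypothetical name, and the `{ field, before, after }` shape only approximates what an `audit_mutations` row might hold, since its real schema is not given here.

```javascript
// Hypothetical merge: applies queued field edits over the server's current row
// and returns before/after pairs in the spirit of audit_mutations.
function applyFieldEdits(current, edits) {
  const next = { ...current };
  const audit = [];
  for (const [field, value] of Object.entries(edits)) {
    if (current[field] === value) continue; // unchanged: nothing to record
    audit.push({ field, before: current[field], after: value });
    next[field] = value; // last write wins, for this field only
  }
  return { next, audit };
}
```

Because the merge is per field, an offline edit to a phone number does not overwrite an email that someone changed on the server in the meantime.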
UI primitives
Connection pill — see ConnectionMonitor above.
Toast on transition:
- Going offline: "Offline — only cash sales and drop-offs are available."
- Going online: "Reconnected. Syncing N pending operations…" (then) "All synced."
Persistent offline banner — a thin amber strip across the top of every screen when offline: "OFFLINE MODE · Cash and drop-offs only · Reconnecting…"
Screen-level gating via data-offline-capable="true" attribute:
- Screens without the attribute are greyed when offline
- Greyed screens show a centered overlay: "This needs internet. Try Sales or Service drop-off."
Nav-level dimming via data-online-required="true":
- Nav tabs for online-required screens go to 35% opacity and `pointer-events: none` when offline
Section-level gating via data-offline-readonly="true":
- Marked sections render but are uneditable when offline
- Buttons within are disabled with `title="Reconnect to edit"`
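The three attributes above could be resolved to a single mode per element, as in the sketch below. `gatingFor` is a hypothetical function and the precedence order is an assumption. Per the ConnectionMonitor section, CSS keyed off `body.helm-offline` does most of this work, so this illustrates the rules rather than describing the implementation. The `dataset` argument mirrors `HTMLElement.dataset`, where `data-offline-capable` is exposed as `offlineCapable`.

```javascript
// Hypothetical resolver: turns the three data-* attributes into one mode.
// Precedence between the attributes is an assumption made for this sketch.
function gatingFor(dataset, isOffline) {
  if (!isOffline) return "active";
  if (dataset.onlineRequired === "true") return "dimmed";    // nav tab: 35% opacity, no clicks
  if (dataset.offlineReadonly === "true") return "readonly"; // section renders, edits disabled
  if (dataset.offlineCapable === "true") return "active";    // screen works offline
  return "greyed"; // screens without the attribute are greyed when offline
}
```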
Build sequence
| Slice | Topic | Status |
|---|---|---|
| 1 | Connection detection + UI grey-out | Built (2026-05-12) |
| 2 | Idempotency-Key header support + idempotency_records table + withIdempotency wrapper | Built (2026-05-13) |
| 3 | LocalCache (customers, products, staff, shop_config) | Planned |
| 4 | Refactor Customers search to read from cache | Planned |
| 5 | MutationQueue + offline-capable Sales (when Sales is built) | Planned |
| 6 | Service drop-off offline | Planned |
| 7 | Service kanban + ticket detail (read-only) offline | Planned |
Each slice is independently shippable. Slice 1 alone gives operators the visible behavior (flip the network off and watch Helm grey out exactly the surfaces that need internet), which is the credibility primitive. Slices 2–7 fill in the functionality behind that visibility.
What we deliberately don't build
- Bidirectional sync (server pushing changes to client during a multi-device deployment) — single-shop, single-device assumption holds for v1
- Encryption of the local cache — the threat model is "operator's family member opens the file in a text editor," not state actors; sensitive data with real threat exposure lives on the server with audit-logged access
- Conflict resolution UI — conflicts are rare enough to surface as audit-log warnings rather than build a merge interface
- Multi-tab sync — single-tab assumption for v1
- Web Worker for sync — main thread is fine at this scale
- Stripe Terminal store-and-forward — Square does this; it's its own slice with merchant-specific config
Forward-compatibility checklist
- ☐ Every mutating endpoint accepts `Idempotency-Key` (backward-compatible; header optional today)
- ☐ Every screen carries `data-offline-capable` so the rule is explicit and reviewable in code
- ☐ LocalCache schema is versioned so future migrations can detect a stale store and re-hydrate
- ☐ MutationQueue rows store the intent (endpoint + body), not the outcome, so endpoint payload changes don't strand queued items
- ☐ ConnectionMonitor's API (`window.helmConnection`, `helm:connection-changed` event) is the only surface other code couples to, so the implementation can swap (poll / SSE / WebSocket) without breaking callers
Why this scope
The temptation is to do "real" offline-first — every screen reads exclusively from IndexedDB, writes go through a queue regardless of connectivity, server is just a sync target. That's a different product. Cost-of-build is 3–4× this scope; it would mean rewriting every existing screen.
What the shop owner actually needs is for the till to keep working in an outage. Everything else (receiving inventory, building reports, adding SKUs, configuring tax rules) can wait until the connection is back. That's the realistic mode of use.
The bonus: the small scope means we can ship the UI behavior (Slice 1) while the data layer is still being built. The operator sees "offline" appear when they yank the cable even if no screen actually writes offline yet. That visibility is the trust primitive; Slices 2–7 fill in the functionality behind it incrementally.
See also
- Fail quietly, recover loudly — the broader principle the offline mode lives under
- ADR-0015: Idempotency keys on external writes — also applies to client-to-server retries here
- Slice 5 — Transactions & Payments — built natively on the queue when it ships
- Slice 4 — Service Tickets — drop-off goes offline-capable in Slice 6
- C4 — Component — ConnectionMonitor sits between the Router and the Worker shell
- Working spec: `Helm/offline_architecture.md` in the Helm repo