Offline architecture

The shop's internet will drop. Helm has to keep ringing sales and taking in bikes through the outage. This page is the design.

Slice 1 + Slice 2 shipped · slices 3–7 planned

Slice 1 (connection detection + UI grey-out, 2026-05-12) and Slice 2 (idempotency-key middleware + idempotency_records table, 2026-05-13) are live. Slices 3–7 (LocalCache, MutationQueue, per-screen offline writes) follow incrementally. Full working spec lives at Helm/offline_architecture.md in the repo.

The promise

When the shop's internet is down, Helm keeps running for what a customer standing at the counter physically needs:

Process cash sales
Take in bikes (drop-off tickets)
Look up customers
Add new customers at point-of-sale

When the internet returns, every offline operation syncs to the cloud automatically.

This is the same pattern Square, Shopify POS, and Lightspeed use in practice. It is not "full offline-first, replicate everything": that is a different architecture with a much higher build cost, and it isn't necessary for the bike-shop use case. See "Why this scope" at the end.

What's online-required vs offline-capable

Offline-capable

Surface	Offline behavior
Sales / Ring-Up (when built)	Cash sales only. Card terminal is blocked anyway because Stripe Terminal needs internet. Banner: "Offline — cash only."
Service → New drop-off	Full create flow. Pick customer, pick bike, type reported issues, set ready-by date. Ticket gets a client UUID; syncs on reconnect.
Customers → Search	Reads the local cached customer index.
Customers → Quick-add	Create new customer locally. Syncs on reconnect.
Service → Ticket detail	Read-only from cache. No edits, no status changes, no line adds.
Inventory → Variant detail	Read-only from cache. SKU + price + stock-as-of-last-sync visible; no edits.
Service → Kanban	Read-only. Last-known state.

Online-required (gracefully unavailable)

Surface	Why
Today dashboard	Aggregated real-time metrics
Reports	Real-time analytics need live data
Settings (all sections)	Configuration touches shared state
Audit log / Beta Comments queue	Server-side by definition
Inventory → receiving / + New SKU / Adjust stock	Mutates shared inventory state; would diverge across devices
Rentals	Time-window inventory locking needs server arbitration
Trade-in evaluations	Real-time pricing API
Online orders	Server-side fulfillment queue
Card sales	Stripe Terminal needs internet
SMS receipts	Twilio needs internet

When the operator opens an online-required surface offline, the screen renders greyed-out with a centered overlay:

Offline — this section needs internet. Try again when you're back online, or use a section that works offline (Sales, Service drop-off, Customer lookup).

Architecture

Three runtime primitives.

1. ConnectionMonitor (Slice 1 — live)

A small JS module loaded once at app startup. Responsibilities:

Listen for browser online / offline events
Ping GET /api/health every 30 seconds (and immediately on an online event). The browser sometimes misreports its online state, so the ping is authoritative
Maintain window.helmConnection = { state, last_check_at, queue_size, last_error }
Dispatch helm:connection-changed event for any other code that wants to react
Toggle body.helm-offline class — CSS does the heavy lifting from there

States: online | offline | syncing.
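As a rough illustration of how the two signals and the drain flag could collapse into one of the three states, here is a minimal sketch. The names resolveState, createMonitor, and the injected emit callback are hypothetical and not taken from the Helm source. It assumes the caller feeds in the result of each browser event and each /api/health ping.

```javascript
// Sketch only: resolveState, createMonitor, and emit are hypothetical names.
function resolveState({ navigatorOnline, healthOk, draining }) {
  // Either signal failing means offline; a failed ping overrides an optimistic browser.
  if (!navigatorOnline || !healthOk) return "offline";
  return draining ? "syncing" : "online";
}

function createMonitor({ emit }) {
  // Same shape as window.helmConnection in the list above.
  const connection = { state: "online", last_check_at: null, queue_size: 0, last_error: null };

  // Call after every browser online/offline event and after every health ping.
  function report({ navigatorOnline, healthOk, draining = false, error = null, now = Date.now() }) {
    const next = resolveState({ navigatorOnline, healthOk, draining });
    connection.last_check_at = now;
    connection.last_error = error;
    if (next !== connection.state) {
      connection.state = next;
      emit("helm:connection-changed", { ...connection });
    }
    return connection;
  }

  return { connection, report };
}

// Example run with an emit stand-in that records events.
const events = [];
const monitor = createMonitor({ emit: (name, detail) => events.push([name, detail.state]) });
monitor.report({ navigatorOnline: true, healthOk: false, error: "health ping failed" });
monitor.report({ navigatorOnline: true, healthOk: false }); // unchanged state, no event
monitor.report({ navigatorOnline: true, healthOk: true, draining: true });
```

In a browser, emit would presumably dispatch a CustomEvent on window and toggle the body.helm-offline class. Injecting it keeps the state logic testable outside a page.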

A connection pill lives in the topnav-right:

● Online (green) — default
● Offline (amber) — navigator.onLine === false or /api/health failed
● Syncing · N (blue) — queue drain in progress (slice 5+)
● Online · Pending: N (amber) — online but queue has items (slice 5+)

Click → small panel showing last sync, queue size, last error. Shift-click → dev-only toggle to simulate offline.
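The four pill variants can be read as a function of the connection state and the queue size. The sketch below is one way to express that mapping. The function name pillFor and the returned object shape are assumptions made for illustration.

```javascript
// Sketch only: pillFor and its return shape are hypothetical.
function pillFor({ state, queue_size }) {
  if (state === "offline") return { label: "Offline", color: "amber" };
  if (state === "syncing") return { label: `Syncing · ${queue_size}`, color: "blue" };
  // Online, but writes are still waiting in the queue.
  if (queue_size > 0) return { label: `Online · Pending: ${queue_size}`, color: "amber" };
  return { label: "Online", color: "green" };
}

const idlePill = pillFor({ state: "online", queue_size: 0 });
const pendingPill = pillFor({ state: "online", queue_size: 3 });
const syncingPill = pillFor({ state: "syncing", queue_size: 2 });
const offlinePill = pillFor({ state: "offline", queue_size: 5 });
```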

2. LocalCache (Slice 3 — planned)

IndexedDB read replica of the reference data needed for offline ops. Hydrated on first sign-in; refreshed every 5 min while online.

Stores:

customers_index — id, account_number, display_name, primary_phone, primary_email, popup_note, is_tax_exempt, pricing_tier
products_index — id, sku, display_name, manufacturer_name, retail_price_cents, tax_category_id, is_serialized
staff_index — id, display_name, initials, role_name
shop_config_snapshot — single row mirror of shop_config

Size estimate at Swicked-scale: ~5–10 MB total.
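To make the "compact index" idea concrete, the sketch below trims a full customer record down to the customers_index fields listed above and runs a simple in-memory search over the result. The helper names, the sample records, and the choice of searchable fields are all assumptions. The real store would live in IndexedDB rather than in an array.

```javascript
// Field list copied from the customers_index store description.
const CUSTOMER_INDEX_FIELDS = [
  "id", "account_number", "display_name", "primary_phone",
  "primary_email", "popup_note", "is_tax_exempt", "pricing_tier",
];

// Hypothetical helper: keep only the indexed fields, defaulting missing ones to null.
function toIndexRow(record, fields = CUSTOMER_INDEX_FIELDS) {
  const row = {};
  for (const field of fields) row[field] = record[field] ?? null;
  return row;
}

// Hypothetical helper: case-insensitive substring match over a few text fields.
function searchCustomers(index, query) {
  const q = query.trim().toLowerCase();
  if (!q) return [];
  return index.filter((c) =>
    [c.display_name, c.primary_phone, c.primary_email, c.account_number]
      .some((value) => value != null && String(value).toLowerCase().includes(q))
  );
}

// Made-up sample records; internal_notes stands in for a field that is not cached.
const fullRecords = [
  { id: 1, account_number: "A-1001", display_name: "Dana Reyes", primary_phone: "555-0101",
    primary_email: "dana@example.com", popup_note: null, is_tax_exempt: false,
    pricing_tier: "retail", internal_notes: "not cached" },
  { id: 2, account_number: "A-1002", display_name: "Sam Okafor", primary_phone: "555-0102",
    is_tax_exempt: true, pricing_tier: "retail" },
];
const customersIndex = fullRecords.map((record) => toIndexRow(record));
const byName = searchCustomers(customersIndex, "okafor");
const byPhone = searchCustomers(customersIndex, "555-01");
```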

3. MutationQueue (Slice 5+ — planned)

IndexedDB store of pending writes. Each row:

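{
  id,                      // client-generated UUID (also the idempotency key)
  endpoint,                // e.g. /api/customers, /api/tickets
  method,                  // POST | PUT | DELETE
  body_json,               // serialized request body
  enqueued_at,             // when the write was queued
  optimistic_local_id,     // local ID the UI used before server confirmation
  last_attempt_at,         // time of the most recent send attempt
  attempt_count,           // number of send attempts so far
  last_error               // most recent failure, if any
}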

Drains FIFO when online. Idempotency keys mean retries are safe (ADR-0015).
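A minimal sketch of how a queue row might be built and later turned back into a request follows. The helper names, the injected newId and now parameters, and the local- prefix on optimistic_local_id are assumptions for illustration. In a browser, newId would most likely be crypto.randomUUID.

```javascript
// Sketch only: buildQueueRow and requestFor are hypothetical helpers.
function buildQueueRow({ endpoint, method, body, newId, now }) {
  const id = newId();
  return {
    id,                                   // doubles as the idempotency key
    endpoint,
    method,
    body_json: JSON.stringify(body),
    enqueued_at: now,
    optimistic_local_id: `local-${id}`,   // assumed format, not from the spec
    last_attempt_at: null,
    attempt_count: 0,
    last_error: null,
  };
}

// Rebuild the HTTP request from the stored intent; the key never changes across retries.
function requestFor(row) {
  return {
    url: row.endpoint,
    method: row.method,
    headers: { "Content-Type": "application/json", "Idempotency-Key": row.id },
    body: row.body_json,
  };
}

const queuedRow = buildQueueRow({
  endpoint: "/api/customers",
  method: "POST",
  body: { display_name: "Dana Reyes" },   // made-up payload
  newId: () => "11111111-2222-4333-8444-555555555555",
  now: 1000,
});
const firstTry = requestFor(queuedRow);
const retry = requestFor({ ...queuedRow, attempt_count: 1 });
```

Because the key is derived from the row rather than generated per attempt, every retry of the same row carries the same Idempotency-Key.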

Slice 2 — Server-side idempotency (live)

Slice 2 ships the server half of the idempotency contract: a new idempotency_records table (migration 014_idempotency_records.sql) and a withIdempotency(request, env, endpoint, handler) wrapper in src/index.js. The first endpoint to use it is POST /api/audit/manual; further mutating endpoints opt in by wrapping their handlers.

Table shape (see migration 014 for the canonical SQL):

CREATE TABLE idempotency_records (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  idempotency_key TEXT NOT NULL UNIQUE,    -- client-minted UUID
  endpoint TEXT NOT NULL,                  -- 'POST /api/sales', etc.
  request_body_sha256 TEXT NOT NULL,       -- guard against key-reuse-with-different-body
  response_status INTEGER NOT NULL,        -- first run's HTTP status
  response_body TEXT NOT NULL,             -- first run's exact JSON body
  created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  expires_at TEXT,                         -- NULL = never; default policy: created_at + 7 days
  actor_staff_id INTEGER REFERENCES staff(id) ON DELETE SET NULL
);

Wrapper semantics:

Request	Wrapper behaviour
No `Idempotency-Key` header	Runs the handler directly (header optional today for backward-compat)
Header + key not seen	Runs the handler; caches `(status, body, body-hash)` in `idempotency_records`; returns the response
Header + key seen + same body hash	Returns cached response with `Idempotency-Replay: true` header — handler is not re-run
Header + key seen + different body hash	HTTP 409 — the client reused a key with new data
Race: two requests with the same key arrive simultaneously	The UNIQUE index serializes them; the loser reads the winner's row and returns the same response with `Idempotency-Replay: race`

Retention is 7 days — long enough for the longest plausible "queue stuck offline then drained" window, short enough to keep the table small.
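The first four rows of the wrapper table can be modelled in a few lines. The sketch below is a simplified, synchronous stand-in and not the withIdempotency code in src/index.js. It uses a Map instead of the idempotency_records table, plain objects instead of real requests and responses, and a small FNV-1a fingerprint in place of SHA-256 so the example has no dependencies. The simultaneous-request race is not modelled.

```javascript
// Stand-in for SHA-256 (FNV-1a, 32-bit). Not cryptographic; illustration only.
function fingerprint(text) {
  let h = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    h ^= text.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

// Sketch only: records is a Map standing in for the idempotency_records table.
function withIdempotencySketch(request, records, endpoint, handler) {
  const key = request.headers["Idempotency-Key"];
  if (!key) return handler(request);                 // header is optional today

  const bodyHash = fingerprint(request.body);
  const seen = records.get(key);
  if (seen) {
    if (seen.bodyHash !== bodyHash) {
      // Same key, different payload: refuse rather than guess.
      return { status: 409, headers: {}, body: JSON.stringify({ error: "key reused with a different body" }) };
    }
    // Same key, same payload: replay the first response without re-running the handler.
    return { status: seen.status, headers: { "Idempotency-Replay": "true" }, body: seen.body };
  }

  const response = handler(request);
  records.set(key, { endpoint, bodyHash, status: response.status, body: response.body });
  return response;
}

// Example: a made-up handler that counts how often it actually runs.
let handlerRuns = 0;
const saleHandler = () => {
  handlerRuns += 1;
  return { status: 201, headers: {}, body: JSON.stringify({ sale_id: handlerRuns }) };
};
const records = new Map();
const keyed = (body) => ({ headers: { "Idempotency-Key": "key-1" }, body });

const first = withIdempotencySketch(keyed('{"amount_cents":1200}'), records, "POST /api/sales", saleHandler);
const replay = withIdempotencySketch(keyed('{"amount_cents":1200}'), records, "POST /api/sales", saleHandler);
const conflict = withIdempotencySketch(keyed('{"amount_cents":1300}'), records, "POST /api/sales", saleHandler);
```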

This is what unblocks a real POST /api/sales: the cashier hits Charge, the network blips, the client retries with the same minted key, and the server processes the sale exactly once. The same primitive will serve the MutationQueue drain from Slice 5 onward.

Sync protocol

Startup (every sign-in) — Slice 3+:

GET /api/cache/customers_index → full customer index → write to IndexedDB
GET /api/cache/products_index
GET /api/cache/staff_index
GET /api/cache/shop_config

Each endpoint returns a compact JSON payload of just the fields needed for offline use.

Background refresh — every 5 min while online: re-fetch the same four endpoints, replace local cache atomically.
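"Replace atomically" is taken here to mean that a refresh either swaps all four stores together or leaves the previous snapshot untouched. The sketch below shows that rule on plain objects. The applyRefresh name, the refreshed_at field, and the in-memory snapshot are assumptions. With IndexedDB the same effect would typically come from a single readwrite transaction spanning the four stores.

```javascript
// Store name → endpoint, as listed in the startup sequence.
const CACHE_ENDPOINTS = {
  customers_index: "/api/cache/customers_index",
  products_index: "/api/cache/products_index",
  staff_index: "/api/cache/staff_index",
  shop_config_snapshot: "/api/cache/shop_config",
};

// Sketch only: returns a new snapshot if every store was fetched, else the old one.
function applyRefresh(current, fetched, now) {
  const stores = Object.keys(CACHE_ENDPOINTS);
  const complete = stores.every((name) => fetched[name] !== undefined);
  if (!complete) return current;          // a partial fetch never overwrites good data
  const next = { refreshed_at: now };
  for (const name of stores) next[name] = fetched[name];
  return Object.freeze(next);
}

const previous = Object.freeze({
  refreshed_at: 0,
  customers_index: [{ id: 1 }],
  products_index: [],
  staff_index: [],
  shop_config_snapshot: { shop_name: "sample" },   // made-up value
});

// One endpoint failed, so the payload for products_index is missing.
const afterPartial = applyRefresh(previous, {
  customers_index: [{ id: 1 }, { id: 2 }],
  staff_index: [],
  shop_config_snapshot: { shop_name: "sample" },
}, 300000);

const afterFull = applyRefresh(previous, {
  customers_index: [{ id: 1 }, { id: 2 }],
  products_index: [{ id: 10 }],
  staff_index: [],
  shop_config_snapshot: { shop_name: "sample" },
}, 300000);
```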

Online detection:

navigator.onLine is the first signal
GET /api/health ping confirms it (the browser sometimes misreports)
Both must agree before the state is online; if either fails, the state is offline

Offline write flow — Slice 5+:

Reconnect drain:

ConnectionMonitor detects online
Reads MutationQueue in FIFO order
Sends each with its stored method and its idempotency key
On 2xx → remove from queue
On 4xx (validation error) → mark dead, surface in a "Needs attention" list (one-tap merge/edit/discard)
On 5xx → retry with exponential backoff (2s, 4s, 8s, … up to 5 min)
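The per-response rules above reduce to a small decision function plus a capped doubling delay. The sketch below assumes the queue row shape from the MutationQueue section. The names backoffDelayMs and applyResult, the action labels, and the retry_at field are hypothetical. Anything that is neither 2xx nor 4xx, including a network failure reported as status 0, is treated as retryable here.

```javascript
const BASE_DELAY_MS = 2000;               // first retry after 2s
const MAX_DELAY_MS = 5 * 60 * 1000;       // cap at 5 minutes

// attemptCount = number of attempts made so far (1 → 2s, 2 → 4s, 3 → 8s, ...).
function backoffDelayMs(attemptCount) {
  return Math.min(BASE_DELAY_MS * 2 ** (attemptCount - 1), MAX_DELAY_MS);
}

// Sketch only: decide what to do with a queue row after one send attempt.
function applyResult(row, status, now) {
  const attempted = { ...row, last_attempt_at: now, attempt_count: row.attempt_count + 1 };
  if (status >= 200 && status < 300) {
    return { action: "remove", row: attempted };
  }
  if (status >= 400 && status < 500) {
    // Validation-style failure: retrying will not help, so park it for the operator.
    return { action: "dead", row: { ...attempted, last_error: `HTTP ${status}` } };
  }
  return {
    action: "retry",
    row: { ...attempted, last_error: `HTTP ${status}` },
    retry_at: now + backoffDelayMs(attempted.attempt_count),
  };
}

const pendingRow = { id: "row-1", attempt_count: 0, last_attempt_at: null, last_error: null };
const onSuccess = applyResult(pendingRow, 201, 10000);
const onRejected = applyResult(pendingRow, 422, 10000);
const onServerError = applyResult(pendingRow, 503, 10000);
const onSecondError = applyResult(onServerError.row, 503, 12000);
```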

Conflict rules

For the bike-shop single-operator-per-shift use case, conflicts are vanishingly rare.

Sales (transactions) — append-only. Each sale has a unique client UUID. Never conflicts.
Service tickets — created with unique IDs. Never conflicts on creation. Edits during offline are append-mostly; safe in order.
New customers — created with client UUID. If the server later finds a duplicate (same name + phone), surface in the existing merge UI (slice 2).
Customer edits — last-write-wins per field. Server records both before+after in audit_mutations per audit-everything.
Inventory writes — blocked offline. No conflicts possible.
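Last-write-wins per field means an edit only overwrites the fields it actually changes, so two edits touching different fields both survive. A minimal sketch of that merge follows, assuming each edit arrives as a set of changed fields. The function name applyCustomerEdit and the audit entry shape are hypothetical and do not reflect the real audit_mutations schema.

```javascript
// Sketch only: apply one edit on top of the current server row, field by field.
function applyCustomerEdit(serverRow, edit) {
  const before = {};
  const after = {};
  const merged = { ...serverRow };
  for (const [field, value] of Object.entries(edit.changes)) {
    if (serverRow[field] === value) continue;   // unchanged field, nothing to record
    before[field] = serverRow[field];
    after[field] = value;
    merged[field] = value;                      // the later write wins for this field
  }
  return { merged, audit: { customer_id: serverRow.id, before, after } };
}

// Made-up row and edits, applied in the order they reach the server.
const serverCustomer = { id: 7, display_name: "Dana Reyes", primary_phone: "555-0101", popup_note: null };
const phoneEdit = applyCustomerEdit(serverCustomer, { changes: { primary_phone: "555-0199" } });
const noteEdit = applyCustomerEdit(phoneEdit.merged, { changes: { popup_note: "Prefers text" } });
const secondPhoneEdit = applyCustomerEdit(noteEdit.merged, { changes: { primary_phone: "555-0142" } });
```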

UI primitives

Connection pill — see ConnectionMonitor above.

Toast on transition:

Going offline: "Offline — only cash sales and drop-offs are available."
Going online: "Reconnected. Syncing N pending operations…" (then) "All synced."

Persistent offline banner — a thin amber strip across the top of every screen when offline: "OFFLINE MODE · Cash and drop-offs only · Reconnecting…"

Screen-level gating via data-offline-capable="true" attribute:

Screens without the attribute are greyed when offline
Greyed screens show a centered overlay: "This needs internet. Try Sales or Service drop-off."

Nav-level dimming via data-online-required="true":

Nav tabs for online-required screens go to 35% opacity and pointer-events: none when offline

Section-level gating via data-offline-readonly="true":

Marked sections render but are uneditable when offline
Buttons within are disabled with title="Reconnect to edit"
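The three gating attributes can be summarised as small rules over an element's dataset and the offline flag. The sketch below models those rules as plain functions for clarity. The function names and return shapes are assumptions, and the page states that CSS does most of this work in practice. The dataset keys follow the standard DOM mapping from data-offline-capable to dataset.offlineCapable.

```javascript
// Screen level: screens without data-offline-capable="true" are greyed when offline.
function gateScreen(dataset, offline) {
  if (!offline) return "normal";
  return dataset.offlineCapable === "true" ? "normal" : "greyed";
}

// Nav level: tabs marked data-online-required="true" are dimmed and unclickable offline.
function gateNavTab(dataset, offline) {
  const blocked = offline && dataset.onlineRequired === "true";
  return blocked
    ? { opacity: 0.35, pointerEvents: "none" }
    : { opacity: 1, pointerEvents: "auto" };
}

// Section level: data-offline-readonly="true" sections render but cannot be edited offline.
function gateSection(dataset, offline) {
  const readonly = offline && dataset.offlineReadonly === "true";
  return { disabled: readonly, title: readonly ? "Reconnect to edit" : "" };
}

const reportsScreen = gateScreen({}, true);
const dropOffScreen = gateScreen({ offlineCapable: "true" }, true);
const reportsTab = gateNavTab({ onlineRequired: "true" }, true);
const ticketSection = gateSection({ offlineReadonly: "true" }, true);
const ticketSectionOnline = gateSection({ offlineReadonly: "true" }, false);
```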

Build sequence

Slice	Topic	Status
1	Connection detection + UI grey-out	Built (2026-05-12)
2	`Idempotency-Key` header support + `idempotency_records` table + `withIdempotency` wrapper	Built (2026-05-13)
3	LocalCache (customers, products, staff, shop_config)	Planned
4	Refactor Customers search to read from cache	Planned
5	MutationQueue + offline-capable Sales (when Sales is built)	Planned
6	Service drop-off offline	Planned
7	Service kanban + ticket detail (read-only) offline	Planned

Each slice is independently shippable. Slice 1 alone gives operators the visible behavior (flip the network off and watch Helm grey out exactly the surfaces that need internet), which is the credibility primitive. Slices 2–7 fill in the functionality behind that visibility.

What we deliberately don't build

Bidirectional sync (server pushing changes to client during a multi-device deployment) — single-shop, single-device assumption holds for v1
Encryption of the local cache — the threat model is "operator's family member opens the file in a text editor," not state actors; sensitive data with real threat exposure lives on the server with audit-logged access
Conflict resolution UI — conflicts are rare enough to surface as audit-log warnings rather than build a merge interface
Multi-tab sync — single-tab assumption for v1
Web Worker for sync — main thread is fine at this scale
Stripe Terminal store-and-forward — Square does this; it's its own slice with merchant-specific config

Forward-compatibility checklist

☐ Every mutating endpoint accepts Idempotency-Key (backward-compatible — header optional today)
☐ Every screen carries data-offline-capable so the rule is explicit and reviewable in code
☐ LocalCache schema is versioned so future migrations can detect a stale store and re-hydrate
☐ MutationQueue rows store the intent (endpoint + body) not the outcome — so endpoint payload changes don't strand queued items
☐ ConnectionMonitor's API (window.helmConnection, helm:connection-changed event) is the only surface other code couples to — implementation can swap (poll / SSE / WebSocket) without breaking callers

Why this scope

The temptation is to do "real" offline-first — every screen reads exclusively from IndexedDB, writes go through a queue regardless of connectivity, server is just a sync target. That's a different product. Cost-of-build is 3–4× this scope; it would mean rewriting every existing screen.

What the shop owner actually needs is for the till to keep working in an outage. Everything else (receiving inventory, building reports, adding SKUs, configuring tax rules) can wait until the connection is back. That's the realistic mode of use.

The bonus: the small scope means we can ship the UI behavior (Slice 1) while the data layer is still being built. The operator sees "offline" appear when they yank the cable even if no screen actually writes offline yet. That visibility is the trust primitive; Slices 2–7 fill in the functionality behind it incrementally.
