
Offline architecture

The shop's internet will drop. Helm has to keep ringing sales and taking in bikes through the outage. This page is the design.

Slice 1 + Slice 2 shipped · slices 3–7 planned

Slice 1 (connection detection + UI grey-out, 2026-05-12) and Slice 2 (idempotency-key middleware + idempotency_records table, 2026-05-13) are live. Slices 3–7 (LocalCache, MutationQueue, per-screen offline writes) follow incrementally. Full working spec lives at Helm/offline_architecture.md in the repo.

The promise

When the shop's internet is down, Helm keeps running for what a customer standing at the counter physically needs:

  • Process cash sales
  • Take in bikes (drop-off tickets)
  • Look up customers
  • Add new customers at point-of-sale

When the internet returns, every offline operation syncs to the cloud automatically.

This is the same pattern Square, Shopify POS, and Lightspeed use in practice. It is not "full offline-first, replicate everything": that is a different architecture with a much higher build cost, and it isn't necessary for the bike-shop use case. See "Why this scope" at the end.

What's online-required vs offline-capable

Offline-capable

Surface | Offline behavior
--- | ---
Sales / Ring-Up (when built) | Cash sales only. Card terminal is blocked anyway because Stripe Terminal needs internet. Banner: "Offline — cash only."
Service → New drop-off | Full create flow. Pick customer, pick bike, type reported issues, set ready-by date. Ticket gets a client UUID; syncs on reconnect.
Customers → Search | Reads the local cached customer index.
Customers → Quick-add | Create new customer locally. Syncs on reconnect.
Service → Ticket detail | Read-only from cache. No edits, no status changes, no line adds.
Inventory → Variant detail | Read-only from cache. SKU + price + stock-as-of-last-sync visible; no edits.
Service → Kanban | Read-only. Last-known state.

Online-required (gracefully unavailable)

Surface | Why
--- | ---
Today dashboard | Aggregated real-time metrics
Reports | Real-time analytics need live data
Settings (all sections) | Configuration touches shared state
Audit log / Beta Comments queue | Server-side by definition
Inventory → receiving / + New SKU / Adjust stock | Mutates shared inventory state; would diverge across devices
Rentals | Time-window inventory locking needs server arbitration
Trade-in evaluations | Real-time pricing API
Online orders | Server-side fulfillment queue
Card sales | Stripe Terminal needs internet
SMS receipts | Twilio needs internet

When the operator opens an online-required surface offline, the screen renders greyed-out with a centered overlay:

Offline — this section needs internet. Try again when you're back online, or use a section that works offline (Sales, Service drop-off, Customer lookup).

Architecture

Three runtime primitives.

1. ConnectionMonitor (Slice 1 — live)

A small JS module loaded once at app startup. Responsibilities:

  • Listen for browser online / offline events
  • Ping GET /api/health every 30 seconds (and immediately on an online event). The browser sometimes misreports online state, so the ping is the authoritative check
  • Maintain window.helmConnection = { state, last_check_at, queue_size, last_error }
  • Dispatch helm:connection-changed event for any other code that wants to react
  • Toggle body.helm-offline class — CSS does the heavy lifting from there

States: online | offline | syncing.

A connection pill lives in the topnav-right:

  • ● Online (green) — default
  • ● Offline (amber) — navigator.onLine === false or /api/health failed
  • ● Syncing · N (blue) — queue drain in progress (slice 5+)
  • ● Online · Pending: N (amber) — online but queue has items (slice 5+)

Click → small panel showing last sync, queue size, last error. Shift-click → dev-only toggle to simulate offline.
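The state and pill rules above can be read as two small pure functions. This is a minimal sketch, not the shipped module: resolveState, pillLabel, and the input field names (navigatorOnline, healthOk, draining) are hypothetical names chosen for illustration.

```javascript
// Sketch only: function and parameter names are assumptions, not Helm's API.
function resolveState({ navigatorOnline, healthOk, draining }) {
  // Either signal failing means offline; both must pass to count as online.
  if (!navigatorOnline || !healthOk) return "offline";
  return draining ? "syncing" : "online";
}

function pillLabel(state, queueSize) {
  if (state === "offline") return "● Offline";
  if (state === "syncing") return `● Syncing · ${queueSize}`;
  // Online with queued work still shows the backlog to the operator.
  return queueSize > 0 ? `● Online · Pending: ${queueSize}` : "● Online";
}
```

Keeping the decision separate from the DOM makes the four pill variants easy to check without a browser.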

2. LocalCache (Slice 3 — planned)

IndexedDB read replica of the reference data needed for offline ops. Hydrated on first sign-in; refreshed every 5 min while online.

Stores:

  • customers_index: id, account_number, display_name, primary_phone, primary_email, popup_note, is_tax_exempt, pricing_tier
  • products_index: id, sku, display_name, manufacturer_name, retail_price_cents, tax_category_id, is_serialized
  • staff_index: id, display_name, initials, role_name
  • shop_config_snapshot: single-row mirror of shop_config

Size estimate at Swicked-scale: ~5–10 MB total.
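To illustrate how a compact index row keeps the cache small, here is a sketch that trims a full customer record down to the customers_index fields listed above. The helper toIndexRow and the constant name are hypothetical; only the field names come from this page.

```javascript
// Field list copied from the customers_index store described above.
const CUSTOMER_INDEX_FIELDS = [
  "id", "account_number", "display_name", "primary_phone",
  "primary_email", "popup_note", "is_tax_exempt", "pricing_tier",
];

// Hypothetical helper: keep only the indexed fields of a full record.
function toIndexRow(record, fields) {
  const row = {};
  for (const field of fields) {
    // Missing fields become null so every cached row has the same shape.
    row[field] = record[field] ?? null;
  }
  return row;
}
```

The same helper would work for products_index and staff_index with their own field lists.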

3. MutationQueue (Slice 5+ — planned)

IndexedDB store of pending writes. Each row:

{
  id,                   // client-generated UUID (also the idempotency key)
  endpoint,             // e.g. /api/customers, /api/tickets
  method,               // POST | PUT | DELETE
  body_json,
  enqueued_at,
  optimistic_local_id,  // local ID the UI used before server confirmation
  last_attempt_at,
  attempt_count,
  last_error
}

Drains FIFO when online. Idempotency keys mean retries are safe (ADR-0015).
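A row like the one above could be built as follows. This is a sketch under assumptions: buildQueueRow and drainOrder are hypothetical names, the ID and clock sources are injected, and enqueued_at is assumed to be a numeric timestamp so FIFO ordering is a plain sort.

```javascript
// Hypothetical helper; newId and now are injected so the sketch runs anywhere.
function buildQueueRow({ endpoint, method, body, optimisticLocalId }, { newId, now }) {
  return {
    id: newId(),                      // doubles as the idempotency key on replay
    endpoint,
    method,
    body_json: JSON.stringify(body),  // the intent, not the outcome
    enqueued_at: now(),               // assumed numeric (ms since epoch)
    optimistic_local_id: optimisticLocalId ?? null,
    last_attempt_at: null,
    attempt_count: 0,
    last_error: null,
  };
}

// FIFO order: oldest enqueued_at first. Returns a copy; input is untouched.
function drainOrder(rows) {
  return [...rows].sort((a, b) => a.enqueued_at - b.enqueued_at);
}
```

In a browser, newId could be `() => crypto.randomUUID()` and now could be `Date.now`.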

Slice 2 — Server-side idempotency (live)

Slice 2 ships the server half of the idempotency contract: a new idempotency_records table (migration 014_idempotency_records.sql) and a withIdempotency(request, env, endpoint, handler) wrapper in src/index.js. The first endpoint to use it is POST /api/audit/manual; further mutating endpoints opt in by wrapping their handlers.

Table shape (see migration 014 for the canonical SQL):

CREATE TABLE idempotency_records (
  id                  INTEGER PRIMARY KEY AUTOINCREMENT,
  idempotency_key     TEXT NOT NULL UNIQUE,   -- client-minted UUID
  endpoint            TEXT NOT NULL,          -- 'POST /api/sales', etc.
  request_body_sha256 TEXT NOT NULL,          -- guard against key-reuse-with-different-body
  response_status     INTEGER NOT NULL,       -- first run's HTTP status
  response_body       TEXT NOT NULL,          -- first run's exact JSON body
  created_at          TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
  expires_at          TEXT,                   -- NULL = never; default policy: created_at + 7 days
  actor_staff_id      INTEGER REFERENCES staff(id) ON DELETE SET NULL
);

Wrapper semantics:

Request | Wrapper behaviour
--- | ---
No Idempotency-Key header | Runs the handler directly (header optional today for backward-compat)
Header + key not seen | Runs the handler; caches (status, body, body-hash) in idempotency_records; returns the response
Header + key seen + same body hash | Returns cached response with Idempotency-Replay: true header; handler is not re-run
Header + key seen + different body hash | HTTP 409: the client reused a key with new data
Race: two requests with the same key arrive simultaneously | The UNIQUE index serializes them; the loser reads the winner's row and returns the same response with Idempotency-Replay: race

Retention is 7 days — long enough for the longest plausible "queue stuck offline then drained" window, short enough to keep the table small.
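The first four rows of the wrapper semantics can be modelled in memory. This is a sketch, not the withIdempotency code in src/index.js: handleOnce is a hypothetical name, a Map stands in for idempotency_records, the 409 body is invented for illustration, and the race case (which depends on the database's UNIQUE index) is not modelled.

```javascript
// In-memory model of the decision table. runHandler returns { status, body }.
function handleOnce(records, { key, bodyHash }, runHandler) {
  // No header: run directly, nothing is cached.
  if (!key) return { ...runHandler(), replay: null };

  const seen = records.get(key);
  if (!seen) {
    // First time this key is seen: run once and remember the response.
    const response = runHandler();
    records.set(key, { bodyHash, ...response });
    return { ...response, replay: null };
  }
  if (seen.bodyHash !== bodyHash) {
    // Same key, different payload: refuse rather than guess.
    return { status: 409, body: { error: "key reused with a different body" }, replay: null };
  }
  // Same key, same body: return the stored response without re-running.
  return { status: seen.status, body: seen.body, replay: "true" };
}
```

The replay field stands in for the Idempotency-Replay response header.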

This is what unblocks a real POST /api/sales: the cashier hits Charge, the network blips, the client retries with the same minted key, and the server processes the sale exactly once. The same primitive will serve the MutationQueue drain from Slice 5 onward.

Sync protocol

Startup (every sign-in) — Slice 3+:

  1. GET /api/cache/customers_index → full customer index → write to IndexedDB
  2. GET /api/cache/products_index
  3. GET /api/cache/staff_index
  4. GET /api/cache/shop_config

Each endpoint returns a compact JSON payload of just the fields needed for offline use.

Background refresh — every 5 min while online: re-fetch the same four endpoints, replace local cache atomically.
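One way to get the "replace atomically" property is to fetch all four payloads first and only then swap the cache. The sketch below assumes two injected functions that are not part of this page: fetchJson(url), which resolves to parsed JSON, and replaceAll(snapshot), which would write every store in a single IndexedDB transaction.

```javascript
const CACHE_ENDPOINTS = [
  "/api/cache/customers_index",
  "/api/cache/products_index",
  "/api/cache/staff_index",
  "/api/cache/shop_config",
];

// fetchJson and replaceAll are hypothetical injected dependencies.
async function refreshCache(fetchJson, replaceAll) {
  let payloads;
  try {
    // Fetch everything before touching the cache.
    payloads = await Promise.all(CACHE_ENDPOINTS.map((url) => fetchJson(url)));
  } catch (err) {
    // Any failed fetch leaves the previous cache fully in place.
    return { refreshed: false, error: String(err) };
  }
  const snapshot = {};
  CACHE_ENDPOINTS.forEach((url, i) => { snapshot[url] = payloads[i]; });
  await replaceAll(snapshot); // one swap for all four stores
  return { refreshed: true, error: null };
}
```

A partial refresh would be worse than a stale one, which is why a single failed endpoint aborts the whole swap.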

Online detection:

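  • navigator.onLine is the first signal
  • GET /api/health ping confirms it (the browser sometimes misreports online state)
  • Both must agree before Helm treats the connection as online; if either fails, the state is offline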

Offline write flow — Slice 5+:

Reconnect drain:

  1. ConnectionMonitor detects online
  2. Reads MutationQueue in FIFO order
  3. Sends each request with its stored method, endpoint, and idempotency key
  4. On 2xx → remove from queue
  5. On 4xx (validation error) → mark dead, surface in a "Needs attention" list (one-tap merge/edit/discard)
  6. On 5xx → retry with exponential backoff (2s, 4s, 8s, … up to 5 min)
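The per-response rules in the drain steps reduce to a status classifier and a capped backoff. A minimal sketch, with hypothetical function names; treating anything that is neither 2xx nor 4xx as retryable is an assumption, since the steps above only name 5xx.

```javascript
const BASE_DELAY_MS = 2000;           // first retry after 2 s
const MAX_DELAY_MS = 5 * 60 * 1000;   // capped at 5 min

// attempt is 1 for the first retry, 2 for the second, and so on.
function backoffDelayMs(attempt) {
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}

function classifyResponse(status) {
  if (status >= 200 && status < 300) return "remove"; // synced, drop from queue
  if (status >= 400 && status < 500) return "dead";   // goes to "Needs attention"
  return "retry";                                     // 5xx (others: assumption)
}
```

With these constants the delay reaches the 5-minute cap on the ninth retry.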

Conflict rules

For the bike-shop single-operator-per-shift use case, conflicts are vanishingly rare.

  • Sales (transactions) — append-only. Each sale has a unique client UUID. Never conflicts.
  • Service tickets — created with unique IDs. Never conflicts on creation. Edits during offline are append-mostly; safe in order.
  • New customers — created with client UUID. If the server later finds a duplicate (same name + phone), surface in the existing merge UI (slice 2).
  • Customer edits — last-write-wins per field. Server records both before+after in audit_mutations per audit-everything.
  • Inventory writes — blocked offline. No conflicts possible.
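Per-field last-write-wins for customer edits can be sketched as below. The helper name applyFieldEdits and the shape of the audit entries are assumptions made for illustration; the page only states that before and after values are recorded in audit_mutations.

```javascript
// Hypothetical helper: apply an edit that carries only the changed fields.
function applyFieldEdits(current, changes) {
  const next = { ...current };
  const audit = [];
  for (const [field, after] of Object.entries(changes)) {
    const before = current[field];
    if (before === after) continue;   // unchanged fields are not audited
    next[field] = after;              // the later write wins for this field
    audit.push({ field, before, after });
  }
  return { next, audit };
}
```

Because only the edited fields are overwritten, two edits that touch different fields of the same customer both survive.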

UI primitives

Connection pill — see ConnectionMonitor above.

Toast on transition:

  • Going offline: "Offline — only cash sales and drop-offs are available."
  • Going online: "Reconnected. Syncing N pending operations…" (then) "All synced."

Persistent offline banner — a thin amber strip across the top of every screen when offline: "OFFLINE MODE · Cash and drop-offs only · Reconnecting…"

Screen-level gating via data-offline-capable="true" attribute:

  • Screens without the attribute are greyed when offline
  • Greyed screens show a centered overlay: "This needs internet. Try Sales or Service drop-off."

Nav-level dimming via data-online-required="true":

  • Nav tabs for online-required screens go to 35% opacity and pointer-events: none when offline

Section-level gating via data-offline-readonly="true":

  • Marked sections render but are uneditable when offline
  • Buttons within are disabled with title="Reconnect to edit"
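The three gating attributes map to three small decisions. The sketch below expresses them as pure functions over an element's dataset (the DOM exposes data-offline-capable as dataset.offlineCapable, and so on). The function names and return shapes are hypothetical; the page says CSS does most of this work in the shipped build.

```javascript
// Screen-level: screens without data-offline-capable="true" grey out offline.
function screenMode(dataset, offline) {
  if (!offline) return "normal";
  return dataset.offlineCapable === "true" ? "normal" : "greyed";
}

// Nav-level: online-required tabs dim to 35% and stop receiving clicks.
function navTabStyle(dataset, offline) {
  const dim = offline && dataset.onlineRequired === "true";
  return dim ? { opacity: "0.35", pointerEvents: "none" } : {};
}

// Section-level: read-only sections keep rendering but lock their buttons.
function sectionButtonProps(dataset, offline) {
  const locked = offline && dataset.offlineReadonly === "true";
  return locked ? { disabled: true, title: "Reconnect to edit" } : { disabled: false };
}
```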

Build sequence

Slice | Topic | Status
--- | --- | ---
1 | Connection detection + UI grey-out | Built (2026-05-12)
2 | Idempotency-Key header support + idempotency_records table + withIdempotency wrapper | Built (2026-05-13)
3 | LocalCache (customers, products, staff, shop_config) | Planned
4 | Refactor Customers search to read from cache | Planned
5 | MutationQueue + offline-capable Sales (when Sales is built) | Planned
6 | Service drop-off offline | Planned
7 | Service kanban + ticket detail (read-only) offline | Planned

Each slice is independently shippable. Slice 1 alone gives operators the visible behavior: flip the network off and watch Helm grey out exactly the surfaces that need internet. That visibility is the credibility primitive. Slices 2–7 fill in the functionality behind it.

What we deliberately don't build

  • Bidirectional sync (server pushing changes to client during a multi-device deployment) — single-shop, single-device assumption holds for v1
  • Encryption of the local cache — the threat model is "operator's family member opens the file in a text editor," not state actors; sensitive data with real threat exposure lives on the server with audit-logged access
  • Conflict resolution UI — conflicts are rare enough to surface as audit-log warnings rather than build a merge interface
  • Multi-tab sync — single-tab assumption for v1
  • Web Worker for sync — main thread is fine at this scale
  • Stripe Terminal store-and-forward — Square does this; it's its own slice with merchant-specific config

Forward-compatibility checklist

  • ☐ Every mutating endpoint accepts Idempotency-Key (backward-compatible — header optional today)
  • ☐ Every screen carries data-offline-capable so the rule is explicit and reviewable in code
  • ☐ LocalCache schema is versioned so future migrations can detect a stale store and re-hydrate
  • ☐ MutationQueue rows store the intent (endpoint + body) not the outcome — so endpoint payload changes don't strand queued items
  • ☐ ConnectionMonitor's API (window.helmConnection, helm:connection-changed event) is the only surface other code couples to — implementation can swap (poll / SSE / WebSocket) without breaking callers
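As an example of coupling only to the public surface named in the last item, a caller could subscribe like this. It is a sketch under assumptions: watchConnection is a hypothetical helper, target stands in for whichever object Helm dispatches the event on (assumed here to be window), and getConnection stands in for `() => window.helmConnection`. The listener reads state from the connection object rather than from the event, since the event's payload is not specified on this page.

```javascript
// Hypothetical helper: react to connection changes via the public surface only.
function watchConnection(target, getConnection, onChange) {
  const listener = () => onChange(getConnection().state);
  target.addEventListener("helm:connection-changed", listener);
  // Return an unsubscribe function so callers can clean up.
  return () => target.removeEventListener("helm:connection-changed", listener);
}
```

Because callers never see how the state was detected, the monitor could move from polling to SSE or WebSocket without touching them.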

Why this scope

The temptation is to do "real" offline-first — every screen reads exclusively from IndexedDB, writes go through a queue regardless of connectivity, server is just a sync target. That's a different product. Cost-of-build is 3–4× this scope; it would mean rewriting every existing screen.

What the shop owner actually needs is for the till to keep working in an outage. Everything else (receiving inventory, building reports, adding SKUs, configuring tax rules) can wait until the connection is back. That's the realistic mode of use.

The bonus: the small scope means we can ship the UI behavior (Slice 1) while the data layer is still being built. The operator sees "offline" appear when they yank the cable even if no screen actually writes offline yet. That visibility is the trust primitive; Slices 2–7 fill in the functionality behind it incrementally.

See also