Skip to main content

Audit everything

If a row in any table changes, an audit row gets written. The audit log is its own table family, hash-chained for tamper-evidence, and complete enough that a forensic reconstruction of any past state of the database is possible by replaying the log backwards from the current state.

v0.3 — wiring live + build identity stamped

Schema + per-endpoint wiring (recordMutation helper) are both live. Every mutating endpoint writes one audit_events row + one audit_mutations row. Migration 013 added build_version to every audit row — see build versioning. The remaining piece is the daily chain-hash verification cron.

The shortest version

Every INSERT, UPDATE, DELETE on a business-relevant table goes through a helper that:

  1. Captures the row state before (or null, on insert)
  2. Applies the change
  3. Captures the row state after (or null, on delete)
  4. Writes one audit_events row (the index) and one audit_mutations row (the snapshot)
  5. Updates audit_events.chain_hash to include the previous event's chain hash

The chain breaks if anyone backdoors the database. The daily cron verifies the chain. Investigations have ground truth.

What gets audited

Every mutation to these tables:

  • customers, customer_bikes, customer_notes, customer_marketing_prefs
  • service_tickets, service_ticket_lines, service_ticket_messages, service_ticket_status_history
  • transactions, transaction_lines, transaction_payments, transaction_refunds
  • inventory_skus, inventory_variants, inventory_adjustments, purchase_orders, purchase_order_lines
  • rentals, rental_bookings, rental_returns
  • trade_ins, consignments
  • staff, roles, role_permissions, staff_screen_permissions, staff_sessions
  • shop_config, service_categories

What does NOT get audited:

  • ai_conversations, ai_messages — these have their own privacy semantics (see AI integration)
  • The audit tables themselves (would loop)
  • Internal Cloudflare bindings or runtime state
  • Search indexes (derived data — rebuildable from the audit log)

The schema

Two tables make up the audit subsystem:

audit_events (the index)

CREATE TABLE audit_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
prev_chain_hash TEXT,
chain_hash TEXT NOT NULL,
at TEXT NOT NULL DEFAULT (datetime('now')),
staff_id INTEGER REFERENCES staff(id),
staff_label TEXT,
action TEXT NOT NULL,
entity_type TEXT NOT NULL,
entity_id TEXT NOT NULL,
ip TEXT,
user_agent TEXT,
request_id TEXT,
build_version TEXT -- added migration 013; see Build versioning
);

chain_hash = sha256(prev_chain_hash || canonical_json(this_row_excluding_chain_hash)).

The first row (id=1) has prev_chain_hash = NULL and a known seed for chain_hash. Every subsequent row's chain_hash depends on all rows before it.

audit_mutations (the snapshot)

CREATE TABLE audit_mutations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
event_id INTEGER NOT NULL REFERENCES audit_events(id),
table_name TEXT NOT NULL,
row_id TEXT NOT NULL,
before_json TEXT,
after_json TEXT,
summary TEXT
);

before_json is the row before the mutation (or NULL on insert). after_json is the row after (or NULL on delete). summary is a human-readable description: "Updated customer's phone from 250-555-1234 to 250-555-5678."

Why two tables instead of one

The index/snapshot split serves two purposes:

  • Index queries ("show me everything Robbie did yesterday") read audit_events only — small, fast.
  • Investigations ("what did customer record 4521 look like before the merge?") join audit_eventsaudit_mutations and reconstruct row history.

A single fat audit table would make the common index queries slow because each row would carry the JSON snapshots inline.

The chain hash

A simple linked-list of hashes. Each row's chain_hash includes:

  • The previous row's chain_hash
  • All other fields of this row, serialized in a canonical order

If a database administrator (or a bad actor with table access) edits a past audit_events row, the rest of the chain no longer verifies. They'd have to recompute every subsequent hash to hide the tamper.

The daily cron job (cron/daily.js) recomputes the chain from row 1 and reports any break to the shop owner via email. Cost: a few seconds of CPU per shop, daily.

For high-stakes investigations, the chain hash is also periodically anchored externally (planned: post the latest hash daily to a Cloudflare KV value or an external timestamping service). Anchoring makes "the entire chain was rewritten" detectable, not just "one row was edited."

The audit helper

Every mutation flows through a wrapper named recordMutation (in src/index.js). It captures the row before/after, derives a human-readable summary, and writes both audit_events and audit_mutations rows together.

// Actual shape (simplified)
async function recordMutation({ env, ctx, actor, action, target_table, target_id, before, after, summary }) {
const prev = await env.DB.prepare(
`SELECT chain_hash FROM audit_events ORDER BY id DESC LIMIT 1`
).first();
const chain_hash = sha256(prev?.chain_hash + canonicalJson({ actor, action, target_table, target_id, at: now }));
const event = await env.DB.prepare(`
INSERT INTO audit_events (prev_chain_hash, chain_hash, actor_id, actor_label, action,
target_table, target_id, ip, user_agent, request_id, summary)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) RETURNING id
`).bind(prev?.chain_hash, chain_hash, actor.id, actor.label, action, target_table, target_id,
ctx.ip, ctx.ua, ctx.request_id, summary).first();
await env.DB.prepare(`
INSERT INTO audit_mutations (event_id, table_name, row_id, before_json, after_json, diff_json)
VALUES (?, ?, ?, ?, ?, ?)
`).bind(event.id, target_table, target_id, JSON.stringify(before), JSON.stringify(after), JSON.stringify(diff(before, after))).run();
}

Endpoints call recordMutation after the mutation succeeds. If the audit write fails (very rare), the endpoint fails too — we do not commit mutations whose audit row could not be written.

This is conservative. The alternative ("commit the mutation, retry the audit later") would mean some mutations exist with no audit row, breaking the chain's completeness. We accept the small availability hit.

Client-only flows: POST /api/audit/manual

Some flows live entirely in the browser today. To keep those in the audit chain, the Worker exposes POST /api/audit/manual which accepts a normalized event shape (event_type, summary, operation, target, context, detail) and routes it through the same recordMutation helper. The hash chain stays intact; the timeline of client actions lives alongside server mutations.

This is the bridge for any future client-only feature that needs to be auditable before it has a server counterpart.

What investigations look like

A typical question and what the audit log enables:

"Did Robbie or James cancel that ticket?"

SELECT e.at, e.staff_label, m.summary
FROM audit_events e JOIN audit_mutations m ON m.event_id = e.id
WHERE m.table_name = 'service_tickets'
AND m.row_id = '2506'
AND m.summary LIKE '%cancel%'
ORDER BY e.at;

"Show me everything done to customer 4521 in March"

SELECT e.at, e.staff_label, m.summary
FROM audit_events e JOIN audit_mutations m ON m.event_id = e.id
WHERE m.table_name LIKE 'customer%'
AND (m.row_id = '4521' OR m.after_json LIKE '%"customer_id":4521%')
AND e.at BETWEEN '2026-03-01' AND '2026-04-01'
ORDER BY e.at;

"What did the row look like before this change?"

SELECT m.before_json
FROM audit_mutations m
WHERE m.event_id = ?;

"Verify the chain" (run by daily cron):

let prev = null;
for await (const row of db.iterate('SELECT * FROM audit_events ORDER BY id')) {
const expected = sha256(prev + canonicalJson({...row, chain_hash: undefined}));
if (row.chain_hash !== expected) throw new ChainBreakError(row.id, row.at);
prev = row.chain_hash;
}

Cost shape

A busy shop does ~2,000 mutations/day (tickets + sales + inventory + messages). Each mutation writes 2 audit rows (event + mutation). 4,000 rows/day, ~1KB each, ~4MB/day. A year of audit data is ~1.5GB. D1 handles this comfortably.

Monthly cron archives audit rows older than 90 days to R2 (cold storage), trimming D1 to the live working set. Archived audit JSON is still queryable via wrangler r2 object get; the operator app doesn't need it for live reads.

Chain-replay recovery (2026-05-19)

The "not a write-ahead log" bullet below is mostly true — but we proved on a real-world loss that the audit chain can serve as recovery for client-only flows that already mirror to it.

A storage-layer issue lost a window of Beta Comments rows from feedback_items + feedback_clusters (migration 017). Because every Beta Comments mutation mirrors to the audit chain via POST /api/audit/manual (see Client-only flows in the audit helper section above), the data wasn't actually gone.

scripts/restore_feedback_from_audit.py ran end-to-end:

  1. SELECT * FROM audit_events WHERE event_type LIKE 'feedback.%' ORDER BY id
  2. For each event in chronological order, decode the after_json snapshot
  3. Reconstruct the feedback_items row + cluster grouping
  4. INSERT OR IGNORE back into the live tables, with restored_from_audit=1 flag set

Result: 49 items / 34 clusters recovered with full fidelity — text, author, timestamps, cluster groupings all intact. The flag lets downstream surfaces tag restored rows so the human knows their lineage.

Why this works (and when it works):

The audit chain captures before_json + after_json per mutation. For any mutation where the after_json snapshot captures the full row state (not just the diff), you can reconstruct the row from the most recent event referencing it. For mutations that only capture the change (e.g. "status moved from open to deferred"), you'd need the original creation event plus all subsequent diffs replayed in order — which the chain provides, but is more work to write the replayer.

Generalising the pattern:

For any client-only flow already mirroring to POST /api/audit/manual, a small per-flow Python script can replay the chain into the primary tables. The recipe is the same: filter events by event_type LIKE '<flow>.%', decode the snapshots, INSERT OR IGNORE back. The audit chain becomes both the forensic record AND the recovery surface, at no extra cost beyond what we already write.

For server-side mutations (where the primary tables ARE the source of truth and backups handle disaster recovery), prefer point-in-time backups over chain replay. Chain replay is for cases where the primary tables briefly disagreed with reality — bugs, misconfigurations, accidental DELETEs — not for ground-up disaster recovery.

What this principle is not

  • Not the primary recovery mechanism for server-side state. The primary tables are the source of truth; backups (D1 export → R2) are the disaster-recovery answer. The chain can serve as recovery for client-only flows already mirroring to it (see Chain-replay recovery above) — but for the audit-only-from-server side, audit is read-side reconstruction, not write-side recovery.
  • Not a backup. Audit is "what happened"; backup is "how to restore." They overlap (a daily snapshot of D1 + the audit since plays the same role) but they're separate concerns.
  • Not GDPR-incompatible. Erasure requests scrub PII fields in audit_mutations.after_json and audit_mutations.before_json for that customer — see data ownership. The audit ID stays; the PII goes.

See also