AI integration

The Claude API is a first-class component of Helm — not a chatbot bolted on, but a query layer that operators reach for when the UI doesn't have a button for what they need.

Drafted from planning · v0.1

The Claude adapter (src/lib/claude.js) and the AI Support bubble (slice 11) are specified but not yet wired. The schema for ai_conversations + ai_messages is in place.

The promise

An operator should be able to ask Helm any question whose answer exists in the system — the bible, the shop's D1 database, the operator's own conversation history — and get an answer in seconds. With citations. Without context-switching to a separate app.

Examples:

"What did this customer buy last spring?"
"Why isn't my deposit refunding for ticket T-N0042?"
"How do I onboard a second cash register?"
"Show me the receipts from yesterday with no customer attached."
"Translate this customer's note to French."

Components

The C4 — Component diagram shows the AI Support bubble talking to the Claude adapter. Zoomed in:

AI bubble UI (button bottom-right + conversation pane)
    │
    ▼
POST /api/ai/query
    │
    ├── Grounding builder (Worker)
    │     ├── Bible chunks (lexical search; vector when slice 11 ships)
    │     ├── D1 structured context (current customer, ticket, transaction)
    │     └── Recent conversation messages (rolling window)
    │
    ├── Claude adapter (Worker)
    │     ├── Prompt-cache the bible chunks (long, stable)
    │     ├── Stream response
    │     └── Tool-use loop (if Claude requests d1_query)
    │
    └── Persist (Worker → D1)
          ├── ai_conversations (one per AI bubble session)
          └── ai_messages (one row per message, role: user|assistant|tool)

RAG against the bible

The bible itself is corpus material — markdown files in this Docusaurus site, all canonical truth about Helm. The build pipeline produces a JSON index of every doc chunked by heading. The Worker ships this index as a Cloudflare KV value (~1MB compressed).

On query:

Lexical search (BM25 across heading + body) returns top-20 chunks
Each chunk has its file path + heading + body
Claude system prompt includes the top-20 with citations
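
The lexical step above can be sketched as a small BM25 scorer over the chunk index. This is a minimal illustration, not the shipped search: the chunk shape (path, heading, body) follows the list above, while the function names, the tokenizer, and the k1/b values (conventional BM25 defaults) are assumptions.

```javascript
// Minimal BM25 over heading-chunked docs. All names here are hypothetical.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

function buildIndex(chunks) {
  const docs = chunks.map((chunk) => {
    // Heading and body are searched together, as described in the text.
    const terms = tokenize(`${chunk.heading} ${chunk.body}`);
    const tf = new Map(); // term frequency within this chunk
    for (const t of terms) tf.set(t, (tf.get(t) || 0) + 1);
    return { chunk, tf, length: terms.length };
  });
  const df = new Map(); // number of chunks containing each term
  for (const doc of docs) {
    for (const t of doc.tf.keys()) df.set(t, (df.get(t) || 0) + 1);
  }
  const avgLength = docs.reduce((sum, d) => sum + d.length, 0) / (docs.length || 1);
  return { docs, df, avgLength };
}

function searchBibleIndex(index, question, topK = 20, k1 = 1.2, b = 0.75) {
  const queryTerms = [...new Set(tokenize(question))];
  const n = index.docs.length;
  const scored = index.docs.map((doc) => {
    let score = 0;
    for (const t of queryTerms) {
      const f = doc.tf.get(t) || 0;
      if (f === 0) continue;
      const nt = index.df.get(t);
      const idf = Math.log(1 + (n - nt + 0.5) / (nt + 0.5));
      const norm = f + k1 * (1 - b + (b * doc.length) / index.avgLength);
      score += idf * ((f * (k1 + 1)) / norm);
    }
    return { ...doc.chunk, score };
  });
  // Keep only chunks that matched at least one term, best first.
  return scored
    .filter((r) => r.score > 0)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

Each result keeps its path and heading, which is what the citation in the system prompt would point at.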

Vector search is the planned upgrade (slice 11): semantic relevance beats lexical matching when the operator's wording differs from the bible's wording. Until then, lexical search works for the typical operator-question shape.

The system prompt is prompt-cached at the Claude API. Bible content rarely changes, so the expected cache hit rate is ~95%, which works out to roughly a 70% cost reduction on RAG-heavy queries. Both figures are planning estimates until the adapter is wired.
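
As a hedged sketch of what the cached request might look like: the Anthropic Messages API accepts `system` as an array of text blocks, and a `cache_control: { type: 'ephemeral' }` marker on a block asks the API to cache the prompt prefix up to and including that block. The function name, the `instructions` parameter, and the citation tag format are assumptions; the model id is passed in by the caller rather than guessed here.

```javascript
// Builds a Messages API request body with the bible chunks as a cached
// system block. buildClaudeRequest is a hypothetical helper name.
function buildClaudeRequest({ model, instructions, bibleChunks, question, maxTokens = 1024 }) {
  // Cache hits require an identical prefix, so keep chunk order stable.
  const stable = [...bibleChunks].sort((a, b) =>
    `${a.path}#${a.heading}`.localeCompare(`${b.path}#${b.heading}`));
  const corpus = stable
    .map((c) => `[${c.path} > ${c.heading}]\n${c.body}`)
    .join('\n\n');
  return {
    model,
    max_tokens: maxTokens,
    stream: true,
    system: [
      { type: 'text', text: instructions },
      // Long, stable content goes last in the cached prefix.
      { type: 'text', text: corpus, cache_control: { type: 'ephemeral' } },
    ],
    // The per-query part stays outside the cached prefix.
    messages: [{ role: 'user', content: question }],
  };
}
```

One design consequence worth noting: the cache only hits when the prefix matches exactly, so two queries that retrieve different chunk sets will not share a cache entry even with stable ordering.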

Structured grounding against shop D1

Lexical search against the bible is one prong. The other is structured: when the operator is on a customer's screen and asks a question, the Worker pre-fetches that customer's record + recent transactions + open tickets and includes them in the user message. This is RAG-style grounding against the structured operational data.

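// Sketch: the four lookups are independent, so they run in parallel.
const [customer, recent_transactions, open_tickets, bible_chunks] = await Promise.all([
  fetchCustomer(customer_id),
  fetchRecentTxns(customer_id, 10), // last 10 transactions
  fetchOpenTickets(customer_id),
  searchBible(question, 20),        // top-20 bible chunks
]);
const grounding = {
  current_screen: 'customer',
  customer,
  recent_transactions,
  open_tickets,
  bible_chunks,
};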

With the data already in the prompt, Claude's job is to reason over the data, not to find it. That improves answer quality and reduces hallucination risk.

Tool use

For questions whose answer requires an arbitrary D1 query, Claude can request it:

{
  "name": "d1_query",
  "input": { "sql": "SELECT * FROM transactions WHERE staff_id = ? AND at > ?", "params": [3, "2026-04-01"] }
}

The Worker validates the SQL (no DDL, no DML, parameterized only, capped at 1000 rows) before executing. If validation fails, the tool returns an error and Claude reformulates.
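
A minimal sketch of that validation step, assuming a keyword deny-list rather than a real SQL parser. The function name, the error strings, and the quote-rejection heuristic for "parameterized only" are assumptions; a production validator would likely want something stricter.

```javascript
// Heuristic validator for d1_query tool calls. Not a SQL parser.
const MAX_ROWS = 1000;
const FORBIDDEN =
  /\b(insert|update|delete|replace|drop|alter|create|attach|detach|pragma|vacuum|reindex)\b/i;

function validateQuery({ sql, params = [] }) {
  // Allow one optional trailing semicolon, nothing after it.
  const text = String(sql).trim().replace(/;\s*$/, '');
  if (!/^select\b/i.test(text)) {
    return { ok: false, error: 'Only SELECT statements are allowed.' };
  }
  if (text.includes(';')) {
    return { ok: false, error: 'Multiple statements are not allowed.' };
  }
  if (FORBIDDEN.test(text)) {
    return { ok: false, error: 'DDL and DML keywords are not allowed.' };
  }
  // Conservative: any quote character is treated as an inline literal.
  if (/['"]/.test(text)) {
    return { ok: false, error: 'Inline literals are not allowed; use ? placeholders.' };
  }
  const placeholders = (text.match(/\?/g) || []).length;
  if (placeholders !== params.length) {
    return { ok: false, error: `Expected ${placeholders} params, got ${params.length}.` };
  }
  // Enforce the row cap regardless of what the model wrote.
  return { ok: true, sql: `SELECT * FROM (${text}) LIMIT ${MAX_ROWS}`, params };
}
```

Returning the error as data rather than throwing matches the flow described above: the message goes back to Claude as the tool result so it can reformulate.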

Tool-use loops are bounded to 5 iterations per query to prevent runaway loops.
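
The bound can be sketched as a plain counted loop. The message shapes follow the Anthropic Messages API (`stop_reason: 'tool_use'`, `tool_use` blocks answered by `tool_result` blocks in a user message); `callModel` and `runTool` are hypothetical injected functions standing in for the Claude adapter and the validated D1 executor.

```javascript
// Bounded tool-use loop. callModel and runTool are injected so the
// loop itself stays testable; both names are assumptions.
const MAX_TOOL_ITERATIONS = 5;

async function runToolLoop({ callModel, runTool, messages }) {
  const history = [...messages];
  for (let i = 0; i < MAX_TOOL_ITERATIONS; i++) {
    const response = await callModel(history);
    history.push({ role: 'assistant', content: response.content });
    if (response.stop_reason !== 'tool_use') {
      return { status: 'done', response, history };
    }
    const results = [];
    for (const block of response.content) {
      if (block.type !== 'tool_use') continue;
      let output;
      let isError = false;
      try {
        output = await runTool(block.name, block.input);
      } catch (err) {
        // Validation failures go back to the model as an error result.
        output = String((err && err.message) || err);
        isError = true;
      }
      results.push({
        type: 'tool_result',
        tool_use_id: block.id,
        content: typeof output === 'string' ? output : JSON.stringify(output),
        is_error: isError,
      });
    }
    history.push({ role: 'user', content: results });
  }
  // The model was still asking for tools after the last allowed call.
  return { status: 'iteration_limit', history };
}
```

What the bubble shows on `iteration_limit` is left open here; the sketch only guarantees that at most five model calls happen per query.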

"Ask the Hub" — natural-language reporting (mockup live 2026-05-18)

A second AI surface, distinct from the Support bubble: an ad-hoc reporting card where the operator types a question in plain English ("how many bikes did we sell to first-time customers last month?") and gets a table or chart back.

The pattern (mockup card in public/index.html describes the eventual real flow):

Operator question
    ↓
LLM (Claude) + read-only D1 schema for this shop
    ↓
LLM writes SQL
    ↓
SQL runs against this shop's D1 (read-only, validated, parameterized)
    ↓
Results render as table or chart in the card

PII protection. The LLM never sees raw customer data — only aggregated or sampled rows when needed for context-building. The schema itself is included in the prompt; the data is not.

Audit. Every Ask-the-Hub query writes an audit row: question text, generated SQL, row count, staff, timestamp. The owner can review what was asked, what got run, and what was returned at any point.

Cost. ~$0.003 per question for the typical Hub-sized schema on Claude. At ~3 questions/day × 30 days = ~90 questions, that is roughly $0.27/month per shop. Negligible.

Status. The card is a mockup today — UI is in place, the LLM-to-SQL flow is not yet wired. The eventual implementation reuses the Support bubble's Claude adapter + tool-use loop (above), plus the read-only-schema-as-context prompt construction.
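
Since the flow is not wired yet, the following is only a sketch of what "read-only schema as context" could look like: the prompt carries table and column definitions and the operator's question, and no rows. The function name, the instruction wording, and the shape of the `tables` argument are all assumptions.

```javascript
// Builds a schema-only prompt for the LLM-to-SQL step. Hypothetical helper.
function buildSchemaPrompt(tables, question) {
  const ddl = tables
    .map((t) => {
      const cols = t.columns.map((c) => `  ${c.name} ${c.type}`).join(',\n');
      return `CREATE TABLE ${t.name} (\n${cols}\n);`;
    })
    .join('\n\n');
  return {
    system: [
      'Write one read-only SQLite SELECT statement that answers the question.',
      'Use ? placeholders for every value and return the values separately.',
      'Schema:',
      ddl,
    ].join('\n'),
    messages: [{ role: 'user', content: question }],
  };
}

// Usage with an illustrative table definition (not the real Helm schema):
const prompt = buildSchemaPrompt(
  [{ name: 'transactions', columns: [
    { name: 'id', type: 'INTEGER' },
    { name: 'staff_id', type: 'INTEGER' },
    { name: 'at', type: 'TEXT' },
  ] }],
  'How many sales did we make last month?',
);
```

The SQL that comes back would still pass through the same validation as the Support bubble's tool calls before it touches D1.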

Why two surfaces, not one?

	Support bubble	Ask the Hub
Question shape	"Why did this customer's deposit not refund?" — operational	"How many of X did we sell last month?" — analytical
Grounding	Bible + customer record + ticket context	Schema only; LLM constructs the query
Output	Prose answer with citations	Table or chart
When used	Mid-operation, on a specific record	When wondering about shop-wide patterns

They share the Claude adapter + audit-log + cost budget; they differ in prompt construction and output rendering.

"Does it do this?" feature-list modal (live 2026-05-18)

Not AI, but adjacent: clicking the brand-mark SVG (top-left of every operator screen) opens a centered modal with the canonical "Does it do this?" feature list. It covers every capability of the Hub, one line per item, grouped by area (Sign in & access, Sales, Customers, Service, Inventory, Reports, Audit, Integrations, AI, Beta feedback, Offline behaviour, Migration, Platform). A live filter input lets you type a keyword (e.g. "tax," "GBP," "audit") to narrow the list.

The list is maintained verbatim in public/index.html under SECTIONS = [...] so the source-of-truth for "what does Helm do" lives in the same file as the operator UI it describes. Customer-facing sales conversations cite it directly.

Conversation memory

Each AI bubble session is an ai_conversations row. Each turn is an ai_messages row:

role: 'user' | 'assistant' | 'tool'
content_json: the message body (text + tool calls + tool results)
tokens_in / tokens_out / cost_cents
at

A conversation is visible only to the operator who started it. On the next sign-in, that operator sees their recent conversations in the bubble dropdown.

Per-customer AI opt-out

Each customers row has ai_optout (default 0). When set to 1:

Structured grounding payloads skip that customer's identifying fields (name, email, phone, address)
Conversations that reference that customer are logged with customer_id = NULL
Bible queries that don't touch customer data are unaffected

The opt-out is exposed in the customer card via a small toggle pill, written to customers.ai_optout, audited via audit_events. See data ownership.
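
A minimal sketch of how the opt-out might be applied when the grounding payload is assembled. The four identifying fields come from the list above; the function name, the `id` column, and the return shape are assumptions.

```javascript
// Strips identifying fields for opted-out customers and decides which
// customer_id (if any) gets logged with the conversation.
const IDENTIFYING_FIELDS = ['name', 'email', 'phone', 'address'];

function applyAiOptOut(grounding) {
  const customer = grounding.customer;
  if (!customer || customer.ai_optout !== 1) {
    return { grounding, logCustomerId: customer ? customer.id : null };
  }
  const redacted = { ...customer }; // never mutate the fetched record
  for (const field of IDENTIFYING_FIELDS) delete redacted[field];
  return {
    grounding: { ...grounding, customer: redacted },
    logCustomerId: null, // logged with customer_id = NULL
  };
}
```

Other payload keys such as bible chunks pass through untouched, which matches the rule that bible-only queries are unaffected.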

Budget controls

Per-shop monthly cap (default $25/month, configurable in shop_config.ai_budget_cents). When the running total hits the cap:

New queries return a polite "AI is paused for this billing cycle" message
Owner gets an email (and a UI banner) once on cap-hit

Tracking: every API response writes the input/output tokens + computed cost to ai_messages.cost_cents. A nightly cron sums + writes to shop_config.ai_spend_cents_mtd.
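
The tracking arithmetic can be sketched as follows. Per-token prices are deliberately passed in as a parameter because they vary by model and change over time; the function names and the `pricing` shape are assumptions, and the 2500-cent default mirrors the $25/month cap above.

```javascript
// Cost of one API response in cents. pricing is cents per million tokens.
function costCents(tokensIn, tokensOut, pricing) {
  return (
    (tokensIn * pricing.inputCentsPerMTok + tokensOut * pricing.outputCentsPerMTok) /
    1_000_000
  );
}

// Month-to-date total from ai_messages rows, as the nightly cron would sum it.
function monthToDateCents(rows) {
  return rows.reduce((sum, row) => sum + row.cost_cents, 0);
}

// 'paused' once the running total reaches the cap.
function budgetStatus(spendCentsMtd, budgetCents = 2500) {
  return spendCentsMtd >= budgetCents ? 'paused' : 'active';
}
```

A single response usually costs a fraction of a cent, so whether `cost_cents` stores fractional values or rounds per row is a schema decision this sketch leaves open.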

Latency budget

Operator-facing latency budget for an AI answer: 2 seconds to first token, 8 seconds to completion. Most of that is Claude's time to first token (TTFT), so the grounding step has to be fast:

Lexical bible search: target < 50ms
D1 grounding queries: target < 200ms
Claude TTFT: ~500-1500ms (Anthropic's number, varies)

If grounding takes more than 500ms in total, the request goes to Claude without the late data and answer quality drops. The grounding lookups therefore run in parallel.
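
One way to sketch the 500ms cutoff, assuming each grounding source is a function that returns a promise: run them all in parallel and race each against a timer, so a slow or failing source yields null instead of delaying the Claude call. The helper names and the null-on-timeout convention are assumptions.

```javascript
// Resolves with fallback if the promise has not settled within ms.
function withTimeout(promise, ms, fallback) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// sources maps a payload key to a function returning a promise.
async function buildGrounding(sources, budgetMs = 500) {
  const names = Object.keys(sources);
  const values = await Promise.all(
    names.map((name) =>
      withTimeout(
        // Wrapping in then() also catches synchronous throws.
        Promise.resolve().then(() => sources[name]()).catch(() => null),
        budgetMs,
        null,
      ),
    ),
  );
  return Object.fromEntries(names.map((name, i) => [name, values[i]]));
}
```

Because the sources run concurrently, total grounding time is bounded by the slowest source or the budget, whichever is smaller, not by the sum of the targets above.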

What this is not

Not a customer chatbot. The bubble is for operators only.
Not autonomous. Every action that touches data (creating tickets, refunding, etc.) is suggested in the chat and confirmed by the operator before the Worker executes.
Not a code generator inside Helm. Code-generation tooling is Claude Code on the dev side; the in-app bubble is a query-and-act assistant.
Not multi-vendor. Anthropic is the only LLM provider. See ADR-0009: Anthropic over OpenAI.

Interpretation Engine (dormant)

The first place a real Claude call landed in the codebase is not the Support bubble — it's the Process Library's Interpretation Engine (migration 081, v0.6.253). The intent: feed Claude a process's "Store action" + "Software process" narratives and have it judge whether the code already covers the workflow (verdict ∈ programmed / partial / needs-code / uncertain), caching every call in process_interpretations.

The schema and POST /api/processes/:id/interpret shipped, and the handler contains a genuine fetch to the Anthropic Messages API. But in v0.6.254 the Anthropic-facing UI was removed, and the endpoint now returns 503 unless ANTHROPIC_API_KEY is set — and it deliberately is not. So no AI call fires in production today. The live path is manual: 📋 Copy-for-review (clipboard prompt → paste into Claude Code) and 📤 Export-prompt (a deterministic codegen template, no model call).
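
The gate described above amounts to a check like the following at the top of the handler. This is a sketch of the behaviour, not the actual handler code: the function name and the error body are assumptions, and only the 503 status and the ANTHROPIC_API_KEY secret come from the text.

```javascript
// Returns a 503 Response when the secret is absent, or null to proceed.
function interpretGate(env) {
  if (!env.ANTHROPIC_API_KEY) {
    return new Response(
      JSON.stringify({ error: 'Interpretation Engine is disabled.' }),
      { status: 503, headers: { 'content-type': 'application/json' } },
    );
  }
  return null; // secret present: the handler may call the Messages API
}
```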

This is consistent with the rest of this page being "specified, not yet wired": the Interpretation Engine is wiring that exists but is gated off. Re-enabling is a ~3-line edit (re-add the buttons) plus setting the secret. When the AI budget + adapter described above land, this endpoint is the natural first consumer.
