ADR-0009 — Anthropic Claude over OpenAI

Status: Accepted
Date: 2026-04-25
Decision-makers: Tom Anderson

Context

Slice 11 (AI Conversation Memory) needs an LLM behind the in-app Support bubble. The bubble does RAG against:

This bible (long, stable corpus)
The shop's D1 (live operational data)
Operator conversation history (short, dynamic)

We need:

Long-context (the bible is large)
Prompt caching (the bible is stable; we want to cache the system prompt)
Tool-use (for arbitrary D1 queries that ground answers)
Streaming (operators want fast first-token latency)
Reasonable safety alignment (we're processing customer PII in some queries)
Predictable pricing

Alternatives:

Anthropic Claude (Opus, Sonnet, Haiku) — strong long-context, native prompt caching, mature tool-use, good safety alignment, predictable per-token pricing
OpenAI (GPT-4o, GPT-4.1) — strong but smaller context, prompt caching is newer and more limited, tool-use is mature, safety is fine but Anthropic's framing matches our positioning
Gemini Pro / Flash — long context but tooling story is less polished for RAG-heavy use cases
Self-hosted (Llama, Mistral) — viable but requires infra Kvick can't maintain solo

We've also used Claude Code extensively to build Helm itself; the team familiarity reduces ramp-up.

Decision

Use Anthropic Claude as the LLM for the AI Support bubble. Specifically:

Claude Sonnet 4.x as the default model
Claude Haiku for lightweight queries (lookup, classification)
Claude Opus for complex RAG-heavy queries when the user explicitly opts in
Prompt caching on the system prompt (the bible chunks)
Tool-use for D1 queries

API access via the standard https://api.anthropic.com/v1/messages endpoint. One Kvick-owned API key; per-shop budget caps tracked in shop_config.ai_budget_cents.

Consequences

Positive:

Long context fits the entire relevant subset of the bible per request
Prompt caching produces 70-90% cost reduction on stable-system-prompt queries
Tool-use unblocks "answer questions that require arbitrary structured queries"
Anthropic's stance on AI training (enterprise API does not train on our data) aligns with our data ownership commitment
Familiarity from Claude Code reduces engineering ramp

Negative:

Vendor concentration risk: if Anthropic significantly changes pricing or terms, we're exposed
The Claude adapter is Anthropic-specific; swapping to OpenAI later is some work (different API shape, different tool-use protocol)

Mitigations:

The Claude adapter (src/lib/claude.js) is the one isolated swap point
Costs are bounded by per-shop budget cap; a price increase from Anthropic translates to a smaller budget per shop, not unbounded spend
We keep the option of multi-vendor LLM if a credible reason emerges

Notes

Most LLM vendor decisions are reversible within weeks of focused work. This isn't load-bearing infrastructure in the sense that D1 is. We pick Claude on current fit and revisit if the fit changes.

Context​

Decision​

Consequences​

Notes​

See also​

Context

Decision

Consequences

Notes

See also