ADR-0009 — Anthropic Claude over OpenAI
- Status: Accepted
- Date: 2026-04-25
- Decision-makers: Tom Anderson
Context
Slice 11 (AI Conversation Memory) needs an LLM behind the in-app Support bubble. The bubble does RAG against:
- This bible (long, stable corpus)
- The shop's D1 (live operational data)
- Operator conversation history (short, dynamic)
We need:
- Long-context (the bible is large)
- Prompt caching (the bible is stable; we want to cache the system prompt)
- Tool-use (for arbitrary D1 queries that ground answers)
- Streaming (operators want fast first-token latency)
- Reasonable safety alignment (we're processing customer PII in some queries)
- Predictable pricing
Alternatives:
- Anthropic Claude (Opus, Sonnet, Haiku) — strong long-context, native prompt caching, mature tool-use, good safety alignment, predictable per-token pricing
- OpenAI (GPT-4o, GPT-4.1) — strong but smaller context, prompt caching is newer and more limited, tool-use is mature, safety is fine but Anthropic's framing matches our positioning
- Gemini Pro / Flash — long context but tooling story is less polished for RAG-heavy use cases
- Self-hosted (Llama, Mistral) — viable but requires infra Kvick can't maintain solo
We've also used Claude Code extensively to build Helm itself; the team familiarity reduces ramp-up.
Decision
Use Anthropic Claude as the LLM for the AI Support bubble. Specifically:
- Claude Sonnet 4.x as the default model
- Claude Haiku for lightweight queries (lookup, classification)
- Claude Opus for complex RAG-heavy queries when the user explicitly opts in
- Prompt caching on the system prompt (the bible chunks)
- Tool-use for D1 queries
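The tier split above could be routed roughly as follows. This is a minimal sketch, not the adapter's actual logic: `pickTier`, the `kind` labels, and the `userOptedIntoOpus` flag are hypothetical names, and exact model IDs are left out because this ADR only pins "Sonnet 4.x". It also assumes an explicit opt-in wins over the lightweight check.

```javascript
// Hypothetical tier labels; map these to real model IDs in configuration.
const TIERS = Object.freeze({ HAIKU: "haiku", SONNET: "sonnet", OPUS: "opus" });

// Hypothetical query kinds that count as "lightweight".
const LIGHTWEIGHT_KINDS = new Set(["lookup", "classification"]);

/**
 * Pick a model tier for one Support-bubble request.
 * @param {{ kind: string, userOptedIntoOpus?: boolean }} query
 * @returns {string} one of the TIERS values
 */
function pickTier(query) {
  // Opus is used only when the user explicitly opts in (assumed to take priority).
  if (query.userOptedIntoOpus === true) return TIERS.OPUS;
  // Lookups and classification go to the cheapest tier.
  if (LIGHTWEIGHT_KINDS.has(query.kind)) return TIERS.HAIKU;
  // Everything else falls through to the default.
  return TIERS.SONNET;
}
```

Keeping the routing in one pure function means the opt-in rule can be unit-tested without touching the API.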
API access via the standard https://api.anthropic.com/v1/messages endpoint. One Kvick-owned API key; per-shop budget caps tracked in shop_config.ai_budget_cents.
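A request to that endpoint, with the bible chunks marked cacheable, a D1 tool declared, and streaming on, might be assembled like this. It is a sketch under assumptions: `buildMessagesRequest`, the `query_d1` tool, and the `max_tokens` value are illustrative and not taken from the adapter. The header names, the `cache_control` marker, and the `input_schema` field follow the public Messages API as we understand it.

```javascript
const ANTHROPIC_URL = "https://api.anthropic.com/v1/messages";

// Hypothetical tool definition for grounding answers in D1.
const queryD1Tool = {
  name: "query_d1",
  description: "Run a read-only SQL query against the shop's D1 database.",
  input_schema: {
    type: "object",
    properties: {
      sql: { type: "string", description: "A single SELECT statement." },
    },
    required: ["sql"],
  },
};

/**
 * Build the URL and fetch() options for one streamed Messages API call.
 * `model` is passed in by the caller; no model ID is hard-coded here.
 */
function buildMessagesRequest({ apiKey, model, bibleText, history, tools }) {
  const body = {
    model,
    max_tokens: 1024, // illustrative value
    stream: true, // server-sent events, for fast first-token latency
    system: [
      // The stable bible chunks; cache_control asks the API to cache this prefix.
      { type: "text", text: bibleText, cache_control: { type: "ephemeral" } },
    ],
    tools,
    messages: history, // short, dynamic operator conversation
  };
  return {
    url: ANTHROPIC_URL,
    init: {
      method: "POST",
      headers: {
        "content-type": "application/json",
        "x-api-key": apiKey,
        "anthropic-version": "2023-06-01",
      },
      body: JSON.stringify(body),
    },
  };
}
```

A caller would then do `const { url, init } = buildMessagesRequest({...}); const res = await fetch(url, init);` and read the event stream from `res.body`.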
Consequences
Positive:
- Long context fits the entire relevant subset of the bible per request
- Prompt caching is expected to cut input-token cost by 70-90% on queries that reuse the stable system prompt
- Tool-use lets the bubble answer questions that require arbitrary structured queries against D1
- Anthropic's stance on AI training (enterprise API does not train on our data) aligns with our data ownership commitment
- Familiarity from Claude Code reduces engineering ramp
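The prompt-caching saving listed above can be sanity-checked with simple arithmetic. The sketch below assumes a cache write costs 1.25x the base input price and a cache read costs 0.1x, which is our understanding of the published multipliers at the time of writing, not a guarantee. `cachedPrefixSavings` is a hypothetical helper. It covers only the cached prefix; output tokens and uncached input are unaffected.

```javascript
// Assumed price multipliers relative to the base input-token price.
const CACHE_WRITE_MULTIPLIER = 1.25; // first request writes the cache
const CACHE_READ_MULTIPLIER = 0.1; // later requests read from it

/**
 * Fraction of cached-prefix cost saved when n requests share one warm cache
 * entry (one write followed by n - 1 reads), compared with n uncached reads.
 * @param {number} n requests served while the cache entry stays warm
 */
function cachedPrefixSavings(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  const uncachedCost = n; // n full-price passes over the prefix
  const cachedCost = CACHE_WRITE_MULTIPLIER + CACHE_READ_MULTIPLIER * (n - 1);
  return 1 - cachedCost / uncachedCost;
}
```

Under these assumptions a single request is 25% more expensive than not caching, 6 requests save about 71%, 10 save about 78.5%, and the saving approaches but never reaches 90% as reuse grows. The low end of the 70-90% range therefore needs roughly six or more requests per warm cache entry.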
Negative:
- Vendor concentration risk: if Anthropic significantly changes pricing or terms, we're exposed
- The Claude adapter is Anthropic-specific; swapping to OpenAI later is some work (different API shape, different tool-use protocol)
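To give a feel for the "different tool-use protocol" point, here is a minimal sketch of translating one tool definition from the Anthropic shape to the OpenAI Chat Completions function shape. `toOpenAiTool` is a hypothetical helper, not something in the adapter today, and the field names reflect our reading of both public APIs.

```javascript
/**
 * Convert an Anthropic-style tool definition
 *   { name, description, input_schema }
 * into the OpenAI Chat Completions shape
 *   { type: "function", function: { name, description, parameters } }.
 * Both sides carry a JSON Schema object; only the wrapper differs.
 */
function toOpenAiTool(anthropicTool) {
  return {
    type: "function",
    function: {
      name: anthropicTool.name,
      description: anthropicTool.description,
      parameters: anthropicTool.input_schema,
    },
  };
}
```

The definitions are the easy part. The tool-call and tool-result messages also differ in shape between the two vendors, so a real swap would likely need translation in both directions.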
Mitigations:
- The Claude adapter (src/lib/claude.js) is the one isolated swap point
- Costs are bounded by the per-shop budget cap; a price increase from Anthropic translates to a smaller effective budget per shop, not unbounded spend
- We keep the option of multi-vendor LLM if a credible reason emerges
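The budget-cap mitigation could be enforced with a pre-flight check along these lines. This is a sketch under assumptions: only `shop_config.ai_budget_cents` comes from this ADR. The `checkBudget` helper, the `spentCents` counter, and the `estimatedCostCents` input are hypothetical, and how spend is tracked and reset is not specified here.

```javascript
/**
 * Decide whether a request may proceed under the shop's AI budget cap.
 * @param {{ ai_budget_cents: number }} shopConfig row from shop_config
 * @param {number} spentCents spend so far in the current period (hypothetical counter)
 * @param {number} estimatedCostCents worst-case cost estimate for this request
 */
function checkBudget(shopConfig, spentCents, estimatedCostCents) {
  const remainingCents = Math.max(0, shopConfig.ai_budget_cents - spentCents);
  return {
    // Refuse the call rather than overshoot the cap.
    allowed: estimatedCostCents <= remainingCents,
    remainingCents,
  };
}
```

Because the cap is expressed in cents rather than tokens, a vendor price increase raises `estimatedCostCents` and the same cap simply admits fewer requests.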
Notes
Most LLM vendor decisions are reversible within weeks of focused work. This isn't load-bearing infrastructure in the sense that D1 is. We pick Claude on current fit and revisit if the fit changes.