Skip to main content

ADR-0009 — Anthropic Claude over OpenAI

  • Status: Accepted
  • Date: 2026-04-25
  • Decision-makers: Tom Anderson

Context

Slice 11 (AI Conversation Memory) needs an LLM behind the in-app Support bubble. The bubble does RAG against:

  • This bible (long, stable corpus)
  • The shop's D1 (live operational data)
  • Operator conversation history (short, dynamic)

We need:

  • Long-context (the bible is large)
  • Prompt caching (the bible is stable; we want to cache the system prompt)
  • Tool-use (for arbitrary D1 queries that ground answers)
  • Streaming (operators want fast first-token latency)
  • Reasonable safety alignment (we're processing customer PII in some queries)
  • Predictable pricing

Alternatives:

  • Anthropic Claude (Opus, Sonnet, Haiku) — strong long-context, native prompt caching, mature tool-use, good safety alignment, predictable per-token pricing
  • OpenAI (GPT-4o, GPT-4.1) — strong but smaller context, prompt caching is newer and more limited, tool-use is mature, safety is fine but Anthropic's framing matches our positioning
  • Gemini Pro / Flash — long context but tooling story is less polished for RAG-heavy use cases
  • Self-hosted (Llama, Mistral) — viable but requires infra Kvick can't maintain solo

We've also used Claude Code extensively to build Helm itself; the team familiarity reduces ramp-up.

Decision

Use Anthropic Claude as the LLM for the AI Support bubble. Specifically:

  • Claude Sonnet 4.x as the default model
  • Claude Haiku for lightweight queries (lookup, classification)
  • Claude Opus for complex RAG-heavy queries when the user explicitly opts in
  • Prompt caching on the system prompt (the bible chunks)
  • Tool-use for D1 queries

API access via the standard https://api.anthropic.com/v1/messages endpoint. One Kvick-owned API key; per-shop budget caps tracked in shop_config.ai_budget_cents.

Consequences

Positive:

  • Long context fits the entire relevant subset of the bible per request
  • Prompt caching produces 70-90% cost reduction on stable-system-prompt queries
  • Tool-use unblocks "answer questions that require arbitrary structured queries"
  • Anthropic's stance on AI training (enterprise API does not train on our data) aligns with our data ownership commitment
  • Familiarity from Claude Code reduces engineering ramp

Negative:

  • Vendor concentration risk: if Anthropic significantly changes pricing or terms, we're exposed
  • The Claude adapter is Anthropic-specific; swapping to OpenAI later is some work (different API shape, different tool-use protocol)

Mitigations:

  • The Claude adapter (src/lib/claude.js) is the one isolated swap point
  • Costs are bounded by per-shop budget cap; a price increase from Anthropic translates to a smaller budget per shop, not unbounded spend
  • We keep the option of multi-vendor LLM if a credible reason emerges

Notes

Most LLM vendor decisions are reversible within weeks of focused work. This isn't load-bearing infrastructure in the sense that D1 is. We pick Claude on current fit and revisit if the fit changes.

See also