AI-assisted development

Claude is a load-bearing collaborator on Helm. Claude Code drives most code-writing sessions, the Claude API powers the in-app Support bubble, and this bible itself was drafted with AI assistance. This page covers how we use AI on the build side without letting it erode the practice of engineering.

Drafted from planning · v0.1

The shortest version

Use AI for what AI is good at — recall, transformation, pattern-matching, draft generation, scaffolding. Keep humans in the loop for what humans are good at — judgment about user needs, taste in code structure, end-user empathy, ground truth about reality. Audit the AI's output as if it were an enthusiastic but inexperienced contractor: usually right, sometimes confidently wrong, never trusted blindly.

What Claude does well in this codebase

Greenfield endpoint scaffolding: "given this existing pattern in src/index.js, write the endpoint for POST /api/foo." Output is usually 90% correct on first pass; review fixes the rest.
Schema migrations: "I want a new table service_categories; write the migration file in our convention." Output matches our style consistently.
SQL transformations: "convert this AIM SQL Server query to a SQLite-compatible D1 query." Catches the dialect differences.
Documentation drafting: long-form prose like this bible. The model is faster than I am at the first draft; I edit for accuracy, voice, and tone.
Test fixture generation: realistic-looking but fake customer/transaction data for unit tests.
Refactor identification: "what duplicated patterns are in this file that we should extract?"
Diagnostic review: "this stack trace + this code, what's the bug?" Faster than human pattern-matching for known classes of error.
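For test fixtures, the property that matters most is determinism: the same seed should produce the same fake data on every run, so a failing test fails the same way each time. The sketch below is minimal and rests on a few assumptions. makeFixtures(seed, count) is a hypothetical helper, and the field names are invented for illustration (they are not Helm's schema). It uses a small seeded PRNG (mulberry32) instead of Math.random and keeps amounts in integer cents.

```javascript
// Hypothetical fixture generator: field names are illustrative, not Helm's schema.

// mulberry32: a tiny seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const FIRST_NAMES = ["Ana", "Ben", "Chen", "Dana", "Eli"];
const LAST_NAMES = ["Garcia", "Nguyen", "Okafor", "Smith", "Weber"];

function makeFixtures(seed, count) {
  const rand = mulberry32(seed);
  const pick = (list) => list[Math.floor(rand() * list.length)];
  const customers = [];
  const transactions = [];
  for (let i = 1; i <= count; i++) {
    const name = `${pick(FIRST_NAMES)} ${pick(LAST_NAMES)}`;
    customers.push({ id: i, name, email: `customer${i}@example.test` });
    // Amounts are integer cents from 500 ($5.00) to 50000 ($500.00), never floats.
    const amountCents = 500 + Math.floor(rand() * 49501);
    transactions.push({ id: i, customerId: i, amountCents });
  }
  return { customers, transactions };
}
```

Calling makeFixtures(42, 10) twice yields identical data, so an assertion that fails once fails every time. When Claude drafts fixtures, asking for a seed parameter up front is cheaper than retrofitting one.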

What Claude does badly (and we work around)

Inventing APIs that don't exist: Claude will sometimes generate code that calls a function that doesn't exist in the codebase. Verify before committing.
Subtle race conditions: AI doesn't reason well about concurrent execution paths. Checking that the patterns we rely on (idempotency keys, optimistic locking) are applied correctly is the human reviewer's job.
End-user product decisions: "should we deprecate this feature" or "what's the right onboarding flow" requires real-world context Claude doesn't have. Use Claude to draft options; pick from them as a human.
Cross-file refactors at scale: even with the codebase in context, Claude will miss usages or update them inconsistently. Use semantic codemod tools (or careful grep + manual edit) instead.
Long-context retention of subtle invariants: in a 4000-line file, Claude might forget that "money values are always integer cents" and produce float arithmetic. Tests catch some; review catches more.
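The integer-cents invariant is worth spelling out, because the float version looks plausible in review. The sketch below is minimal. parseDollarsToCents, sumCents and formatCents are hypothetical helper names, not Helm's actual helpers.

```javascript
// Why money stays in integer cents: binary floats can't represent 0.1 exactly.
const floatTotal = 0.1 + 0.2; // 0.30000000000000004, not 0.3

// Hypothetical helper: convert a dollar string like "19.99" to cents
// by parsing digits, without going through float multiplication.
function parseDollarsToCents(text) {
  const match = /^(\d+)(?:\.(\d{1,2}))?$/.exec(text.trim());
  if (!match) throw new Error(`Not a dollar amount: ${text}`);
  const dollars = Number(match[1]);
  const cents = Number((match[2] ?? "0").padEnd(2, "0"));
  return dollars * 100 + cents;
}

// Hypothetical helper: sum integer cents, rejecting non-integers so float
// arithmetic can't sneak in.
function sumCents(amounts) {
  return amounts.reduce((total, cents) => {
    if (!Number.isSafeInteger(cents)) throw new Error(`Not integer cents: ${cents}`);
    return total + cents;
  }, 0);
}

// Convert to a dollar string only at the display edge.
function formatCents(cents) {
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  return `${sign}$${Math.floor(abs / 100)}.${String(abs % 100).padStart(2, "0")}`;
}
```

In review, watch for a `* 100` or `/ 100` applied to a value that came from a float, or a `toFixed` call outside display code.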
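Optimistic locking is one of the patterns named above that review has to check. Each row carries a version number, and a write succeeds only if that version hasn't changed since the row was read. The sketch below is minimal and in-memory, and its store and function names are hypothetical. JavaScript runs this sequentially, so the two "requests" are simulated by interleaving their reads and writes. In SQL, the same check would be an UPDATE guarded by `WHERE id = ? AND version = ?` that must affect exactly one row.

```javascript
// Hypothetical in-memory table standing in for a database: id -> row.
const rows = new Map([[1, { id: 1, balanceCents: 10000, version: 1 }]]);

function readRow(id) {
  const row = rows.get(id);
  return row ? { ...row } : null; // return a copy, like a query result
}

// Compare-and-set: apply the change only if the caller's version is current.
// Returns true on success, false if another writer got there first.
function updateIfVersion(id, expectedVersion, changes) {
  const current = rows.get(id);
  if (!current || current.version !== expectedVersion) return false;
  rows.set(id, { ...current, ...changes, version: current.version + 1 });
  return true;
}

// Two requests read the same row, then both try to write.
const first = readRow(1);
const second = readRow(1);
const firstOk = updateIfVersion(1, first.version, {
  balanceCents: first.balanceCents - 2500,
});
const secondOk = updateIfVersion(1, second.version, {
  balanceCents: second.balanceCents - 4000,
});
// firstOk is true. secondOk is false, so the second request must re-read
// and retry instead of silently overwriting the first debit.
```

Without the version check, the second write would succeed and set the balance to 6000, losing the first debit entirely. That lost update is the kind of bug AI-drafted code produces without complaint.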

How sessions are structured

A typical Claude Code session looks like:

Open the IDE, write a one-paragraph statement of what we're trying to accomplish
Hand Claude the relevant files (auto-included by the IDE or manually attached)
Discuss the approach before writing — "before code, what changes need to happen and in what order?"
Implement step-by-step, with Claude writing and the human reviewing each diff before it lands
Run tests, iterate on failures
Commit when green

The pre-implementation discussion catches more bugs than the post-implementation review. Forcing the model to think out loud about the approach surfaces wrong assumptions before they become wrong code.

Prompting style for this codebase

A few conventions that make Claude productive in Helm:

Tell it the principle: "we follow progressive enhancement — make sure the base form works without JS." This anchors the output to the right invariants.
Tell it the existing pattern: "look at how apiCustomersList handles this; mirror that style for the new endpoint."
Demand the SQL inline: "show me the SQL you'll generate before writing the JS." Catches schema misunderstandings early.
Constrain output format: "respond with a diff, not a full file rewrite." Smaller diffs are easier to review.
Insist on file paths and line numbers: when Claude proposes a fix, demand the exact location. Forces specificity.
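Here is what the progressive-enhancement principle looks like in a handler. One POST endpoint serves two callers: a plain HTML form, which needs a redirect, and a JavaScript fetch, which wants JSON. The sketch below is minimal and assumes the Fetch API Request/Response classes available in Workers and Node 18+. respondAfterSave and the /customers/:id path are hypothetical, not the shape of Helm's handlers.

```javascript
// Hypothetical helper: choose the response style after a submission is saved.
// Plain <form> posts get a 303 redirect (post/redirect/get), so the page works
// with JavaScript disabled. fetch() callers that ask for JSON get JSON instead.
function respondAfterSave(request, saved) {
  const accept = request.headers.get("accept") ?? "";
  if (accept.includes("application/json")) {
    return new Response(JSON.stringify(saved), {
      status: 200,
      headers: { "content-type": "application/json" },
    });
  }
  // Response.redirect needs an absolute URL, so resolve against the request URL.
  const location = new URL(`/customers/${saved.id}`, request.url);
  return Response.redirect(location.toString(), 303);
}
```

The base form works without JavaScript because the browser follows the 303 with a GET. The JSON branch is purely additive, which is the invariant the prompt asks Claude to preserve.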

What we don't delegate

Some tasks stay human, on principle:

Choosing which slice to build next. Product judgment, not pattern matching.
Naming things. Names live forever; humans are responsible for the words.
Schema design. Models tend toward over-normalization; the schema lives close to the business and needs human judgment about what to denormalize.
Security boundaries. Authentication, authorization, and audit decisions get explicit human design and review. Models are good at writing the helpers, not at deciding the policy.
Customer conversations. When a shop owner asks "can Helm do X?", the answer is a human's responsibility. Claude can draft the response; the human signs the response.

How AI is used inside Helm itself

This is a separate question from "how is AI used to build Helm?", but an adjacent one. The in-app AI (slice 11, the Support bubble) follows the same philosophy:

Operator stays in control — every action Claude suggests is operator-confirmed before execution
Grounding against actual data, not hallucination
Per-customer opt-out
Cost tracking and budget caps
Audit-logged conversations
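The easiest way to enforce "operator-confirmed before execution" is structurally: the model can only create pending suggestions, and only an operator action can run one. The sketch below is minimal and built on that assumption. SuggestionQueue, the action shape, and the audit entries are all hypothetical, not the Support bubble's actual implementation.

```javascript
// Hypothetical gate between model suggestions and real side effects.
class SuggestionQueue {
  constructor(execute) {
    this.execute = execute; // the only code path that performs an action
    this.pending = new Map();
    this.auditLog = [];
    this.nextId = 1;
  }

  // Records what the model proposed; nothing executes here.
  suggest(action) {
    const id = this.nextId++;
    this.pending.set(id, action);
    this.auditLog.push({ event: "suggested", id, action });
    return id;
  }

  // Intended to be called only from an explicit operator action.
  confirm(id, operatorId) {
    const action = this.pending.get(id);
    if (!action) throw new Error(`No pending suggestion ${id}`);
    this.pending.delete(id);
    this.auditLog.push({ event: "confirmed", id, operatorId });
    return this.execute(action);
  }

  reject(id, operatorId) {
    if (this.pending.delete(id)) {
      this.auditLog.push({ event: "rejected", id, operatorId });
    }
  }
}
```

Because confirm also removes the suggestion, a double-click or retried request can't execute the same action twice. Confirming a rejected suggestion throws instead of running it.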

See AI integration for the in-product side.

The "is this really better than not using AI" question

Yes, with caveats. A solo developer has finite hours, and Claude Code lets one ship at roughly the rate of a small team. The quality bar is set by the human's review discipline, not by the model's first-pass output.

Risks to manage:

Skill atrophy: if I never write a Worker handler from scratch, I lose the ability to debug one when AI is unavailable. Mitigation: occasional "no-AI" sessions for fundamentals.
Codebase drift: AI tends toward whatever pattern is most popular online. If Helm has a custom convention, AI will revert to the popular one unless prompted otherwise. Mitigation: the conventions are documented (this bible) and referenced in prompts.
Over-engineering: AI is bad at "no, that's enough." It tends to add hypothetical edge case handling and over-modular structure. Mitigation: the boring tech principle and explicit "keep it simple" prompts.
False confidence: AI output looks polished even when it's wrong. The pre-implementation discussion, small-diff reviews, and tests catch most of this, but the residual risk is real.
