AI-assisted development
Claude is a load-bearing collaborator on Helm. Claude Code drives most code-writing sessions. The Claude API powers the in-app Support bubble. The bible itself was drafted with AI assistance. This page is about how we use AI on the build side without it eating the practice of engineering.
The shortest version
Use AI for what AI is good at — recall, transformation, pattern-matching, draft generation, scaffolding. Keep humans in the loop for what humans are good at — judgment about user needs, taste in code structure, end-user empathy, ground truth about reality. Audit the AI's output as if it were an enthusiastic but inexperienced contractor: usually right, sometimes confidently wrong, never trusted blindly.
What Claude does well in this codebase
- Greenfield endpoint scaffolding: "given this existing pattern in src/index.js, write the endpoint for POST /api/foo." Output is usually 90% correct on first pass; review fixes the rest.
- Schema migrations: "I want a new table service_categories; write the migration file in our convention." Output matches our style consistently.
- SQL transformations: "convert this AIM SQL Server query to a SQLite-compatible D1 query." Catches the dialect differences.
- Documentation drafting: long-form prose like this bible. The model is faster than I am at the first draft; I edit for accuracy, voice, and tone.
- Test fixture generation: realistic-looking but fake customer/transaction data for unit tests.
- Refactor identification: "what duplicated patterns are in this file that we should extract?"
- Diagnostic review: "this stack trace + this code, what's the bug?" Faster than human pattern-matching for known classes of error.
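As an illustration of the test fixture bullet above, here is a minimal sketch of a deterministic fake-customer generator. Every name in it (mulberry32, makeFakeCustomers, the field names, the name lists) is an assumption made for the example and not taken from the Helm codebase. The seeded generator means a failing test sees the same data on every run, and the emails use the reserved .test top-level domain so they cannot reach a real inbox.

```javascript
// Hypothetical fixture helper: every name here is illustrative, not from Helm.

// mulberry32: a small seeded PRNG so the same seed always yields the same data.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

const FIRST_NAMES = ["Avery", "Blake", "Casey", "Devon"];
const LAST_NAMES = ["Alder", "Birch", "Cedar", "Dogwood"];

// Pick one element of a non-empty array using the supplied random function.
function pick(rand, items) {
  return items[Math.floor(rand() * items.length)];
}

// Build `count` fake customers. Balances are integer cents (assumed field name).
function makeFakeCustomers(count, seed = 1) {
  const rand = mulberry32(seed);
  const customers = [];
  for (let i = 1; i <= count; i++) {
    const first = pick(rand, FIRST_NAMES);
    const last = pick(rand, LAST_NAMES);
    customers.push({
      id: i,
      name: `${first} ${last}`,
      email: `${first}.${last}.${i}@example.test`.toLowerCase(),
      balanceCents: Math.floor(rand() * 100000),
    });
  }
  return customers;
}
```

Calling makeFakeCustomers(5, 42) twice returns identical arrays, which keeps fixture-driven tests reproducible.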
What Claude does badly (and we work around)
- Inventing APIs that don't exist: Claude will sometimes generate code that calls a function that doesn't exist in the codebase. Verify before committing.
- Subtle race conditions: AI doesn't reason well about concurrent execution paths. Checking that the patterns we rely on (idempotency keys, optimistic locking) are applied correctly is the human reviewer's job.
- End-user product decisions: "should we deprecate this feature" or "what's the right onboarding flow" requires real-world context Claude doesn't have. Use Claude to draft options; pick from them as a human.
- Cross-file refactors at scale: even with the codebase in context, Claude will miss usages or update them inconsistently. Use semantic codemod tools (or careful grep + manual edit) instead.
- Long-context retention of subtle invariants: in a 4000-line file, Claude might forget that "money values are always integer cents" and produce float arithmetic. Tests catch some; review catches more.
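The "money values are always integer cents" invariant from the last bullet is easy to state and easy to lose. The sketch below shows why float arithmetic is a problem and one way to make a violation fail loudly. The helper names sumCents and formatCents are hypothetical, chosen for this example only.

```javascript
// Floats drift: in IEEE 754 doubles, 0.1 + 0.2 is not exactly 0.3.
const floatTotal = 0.1 + 0.2;
const floatIsExact = floatTotal === 0.3; // false

// Hypothetical helper: sum amounts held as integer cents. A non-integer
// throws, so a stray float fails loudly instead of corrupting a total.
function sumCents(amounts) {
  let total = 0;
  for (const cents of amounts) {
    if (!Number.isInteger(cents)) {
      throw new TypeError(`expected integer cents, got ${cents}`);
    }
    total += cents;
  }
  return total;
}

// Hypothetical helper: convert to a decimal string only at the display edge.
function formatCents(cents) {
  if (!Number.isInteger(cents)) {
    throw new TypeError(`expected integer cents, got ${cents}`);
  }
  const sign = cents < 0 ? "-" : "";
  const abs = Math.abs(cents);
  const whole = Math.floor(abs / 100);
  const fraction = String(abs % 100).padStart(2, "0");
  return `${sign}${whole}.${fraction}`;
}
```

With these helpers, formatCents(sumCents([10, 20])) returns "0.30", while sumCents([0.1, 0.2]) throws a TypeError. A guard like this turns the forgotten invariant into a test failure a reviewer can see.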
How sessions are structured
A typical Claude Code session looks like:
- Open the IDE, write a one-paragraph statement of what we're trying to accomplish
- Hand Claude the relevant files (autoincluded by the IDE or manually attached)
- Discuss the approach before writing — "before code, what changes need to happen and in what order?"
- Implement step-by-step, with Claude writing and the human reviewing each diff before it lands
- Run tests, iterate on failures
- Commit when green
The pre-implementation discussion catches more bugs than the post-implementation review. Forcing the model to think out loud about the approach surfaces wrong assumptions before they become wrong code.
Prompting style for this codebase
A few conventions that make Claude productive in Helm:
- Tell it the principle: "we follow progressive enhancement — make sure the base form works without JS." This anchors the output to the right invariants.
- Tell it the existing pattern: "look at how apiCustomersList handles this; mirror that style for the new endpoint."
- Demand the SQL inline: "show me the SQL you'll generate before writing the JS." Catches schema misunderstandings early.
- Constrain output format: "respond with a diff, not a full file rewrite." Smaller diffs are easier to review.
- Insist on file paths and line numbers: when Claude proposes a fix, demand the exact location. Forces specificity.
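One way to keep these conventions from being forgotten is to assemble them into a reusable preamble. The following is only a sketch: buildPrompt and its field names are hypothetical, and nothing in this page says Helm has such a helper.

```javascript
// Hypothetical helper: buildPrompt and its field names are illustrative only.
function buildPrompt({ task, principles = [], patternRef, outputRules = [] }) {
  if (!task) throw new Error("task is required");
  const lines = [`Task: ${task}`];
  if (principles.length > 0) {
    lines.push("Principles:");
    for (const p of principles) lines.push(`- ${p}`);
  }
  // Point the model at an existing function to mirror, when there is one.
  if (patternRef) lines.push(`Existing pattern to mirror: ${patternRef}`);
  if (outputRules.length > 0) {
    lines.push("Output rules:");
    for (const r of outputRules) lines.push(`- ${r}`);
  }
  return lines.join("\n");
}

// Example usage with the conventions listed above.
const examplePrompt = buildPrompt({
  task: "Add the endpoint for POST /api/foo",
  principles: ["Progressive enhancement: the base form works without JS"],
  patternRef: "apiCustomersList",
  outputRules: [
    "Show the SQL before writing the JS",
    "Respond with a diff, not a full file rewrite",
    "Give file paths and line numbers for every change",
  ],
});
```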
What we don't delegate
Some tasks stay human, on principle:
- Choosing which slice to build next. Product judgment, not pattern matching.
- Naming things. Names live forever; humans are responsible for the words.
- Schema design. Models tend toward over-normalization; the schema lives close to the business and needs human judgment about what to denormalize.
- Security boundaries. Auth, authz, audit decisions get explicit human design and review. Models are good at writing the helpers, not deciding the policy.
- Customer conversations. When a shop owner asks "can Helm do X?", the answer is a human's responsibility. Claude can draft the response; the human signs it.
How AI is used inside Helm itself
A separate question from "how is AI used to build Helm" — but they're adjacent. The in-app AI (slice 11, Support bubble) follows the same philosophy:
- Operator stays in control — every action Claude suggests is operator-confirmed before execution
- Grounding against actual data, not hallucination
- Per-customer opt-out
- Cost tracking and budget caps
- Audit-logged conversations
See AI integration for the in-product side.
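To make three of the properties above concrete (operator confirmation, budget caps, audit logging), here is a minimal in-memory sketch. It is not the Support bubble implementation: createSupportSession, its method names, and the shape of the log entries are all assumptions made for the example. Grounding and per-customer opt-out are out of scope here.

```javascript
// Hypothetical sketch: createSupportSession and its methods are illustrative
// names, not the real Support bubble code.
function createSupportSession({ budgetCents }) {
  let spentCents = 0;
  let nextId = 1;
  const pending = new Map(); // proposal id -> action awaiting a decision
  const auditLog = [];       // append-only record of what happened

  return {
    // Record the cost of a model call; refuse once the cap would be exceeded.
    recordCost(cents) {
      if (!Number.isInteger(cents) || cents < 0) {
        throw new TypeError("cost must be a non-negative integer of cents");
      }
      if (spentCents + cents > budgetCents) {
        auditLog.push({ type: "budget_exceeded", attemptedCents: cents });
        throw new Error("budget cap reached");
      }
      spentCents += cents;
      auditLog.push({ type: "cost", cents });
    },
    // The model may only propose; nothing executes at this point.
    propose(action) {
      const id = nextId++;
      pending.set(id, action);
      auditLog.push({ type: "proposed", id, action });
      return id;
    },
    // Only an explicit operator confirmation runs the action, and only once.
    confirm(id, execute) {
      if (!pending.has(id)) throw new Error(`no pending proposal ${id}`);
      const action = pending.get(id);
      pending.delete(id);
      auditLog.push({ type: "confirmed", id });
      return execute(action);
    },
    // The operator can decline; the action is dropped without running.
    reject(id) {
      if (!pending.delete(id)) throw new Error(`no pending proposal ${id}`);
      auditLog.push({ type: "rejected", id });
    },
    spentCents: () => spentCents,
    auditLog: () => auditLog.slice(),
  };
}
```

The design choice worth noting is that propose never receives the execute callback. The code path that runs an action exists only inside confirm, so a suggestion cannot execute without an operator decision.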
The "is this really better than not using AI" question
Yes, but with caveats. A solo developer has finite hours. Claude Code lets a solo developer ship at roughly the rate of a small team. The output quality bar is set by the human's review discipline, not the model's first-pass quality.
Risks to manage:
- Skill atrophy: if I never write a Worker handler from scratch, I lose the ability to debug one when AI is unavailable. Mitigation: occasional "no-AI" sessions for fundamentals.
- Codebase drift: AI tends toward whatever pattern is most popular online. If Helm has a custom convention, AI will revert to the popular one unless prompted otherwise. Mitigation: the conventions are documented (this bible) and referenced in prompts.
- Over-engineering: AI is bad at "no, that's enough." It tends to add hypothetical edge case handling and over-modular structure. Mitigation: the boring tech principle and explicit "keep it simple" prompts.
- False confidence: AI's output looks polished even when wrong. The pre-implementation discussion + small-diff reviews + tests catch most of this; the residual risk is real.
See also
- Slice development pattern — the human-in-the-loop workflow
- Code style — the conventions that constrain the output
- Documentation maintenance — how the bible itself is maintained
- AI integration — the in-product AI, separate concern