Search engine
A real text-search engine for Helm. SQLite FTS5 (built into D1) + BM25 ranking + per-domain field weights + a shared synonym layer. One Worker module (src/search/engine.js), four registered domains (inventory, customers, tickets, estimates). Portable verbatim to any future Kvick app.
Phase 1 (inventory FTS), Phase 2 (customers + tickets + estimates), and Phase 3 (synonyms, owner-editable) shipped as migrations 020-024 with click telemetry (025). Phases 4 (trigram typo tolerance), 5 (recency / popularity / in-stock ranker), and 6 (highlight + did-you-mean + facets) are next. See Helm/NEXT_SESSION_PROMPT.md for the locked design decisions.
Why this exists
The original apiSearch (see global search palette) used plain LIKE queries against the source tables. Fine at low scale; brittle at speed. Three problems forced the upgrade:
- No ranking. "Norco bike" returned 200 rows in arbitrary order; the right Norco bike was buried at row 47.
- No tolerance. A typo (
norkoforNorco) returned nothing. A synonym ("bicycle" vs "bike") returned nothing. - No portability. The smart-query routing was inline in
apiSearch; a future Kvick app would have to rewrite it.
The new engine fixes all three with a single module — register a domain config and the engine does the rest.
Module layout
src/search/
├── engine.js — orchestrator (~17 KB)
├── synonyms.js — synonym-group loader + buildFtsMatch expander
└── domains/
├── inventory.js — products + variants + manufacturers + barcodes
├── customers.js
├── tickets.js
└── estimates.js
engine.js knows nothing about bikes. Each domain config declares:
name/labelftsTable— the FTS5 virtual table name; itsrowidMUST be the source-of-truth PK for hydrationftsColumns— the FTS5 columns, in declaration order (BM25 weight order)weights— per-column BM25 multipliershydrateQuery— the SQL to fetch display rows for a given set ofrowidsrenderForPalette— turns a hydrated row into the universal palette format
FTS5 tables (migrations 020-022)
Four virtual tables, one per domain:
| Migration | Table | Sources | Columns |
|---|---|---|---|
| 020 | search_inventory_fts | product_variants + products + manufacturers + product_barcodes | sku, barcode, mpn, display_name, model_number, manufacturer, display_label |
| 021 | (extension) | (same) | + category, subcategory |
| 022 | search_customers_fts | customers | account_number, display_name, primary_phone, primary_email |
| 022 | search_tickets_fts | service_tickets + customer/bike denormalized | ticket_number, customer_name, bike_serial, bike_model, issue |
| 022 | search_estimates_fts | transactions (kind='estimate') | estimate_number, customer_name, line_summary |
All use porter unicode61 tokenizer (Porter stemming + Unicode case-fold).
Sync strategy: triggers on the source tables. Inventory data spans 4 tables — each source mutation recomputes the right FTS row. The bulk initial index of 9,337 variants runs in ~3s on Swicked's data.
Synonyms (migration 023)
Owner-editable equivalent-term groups, scoped per domain or global.
search_synonym_groups (
id, domain TEXT, -- 'inventory' | 'customers' | … | NULL (global)
description TEXT,
is_active INTEGER NOT NULL DEFAULT 1,
created_at, updated_at
)
search_synonym_terms (
id, group_id FK,
term TEXT NOT NULL, -- lowercased
created_at,
UNIQUE (group_id, term)
)
Migration 024 seeds color + abbreviation groups: red/crimson/ruby, mountain/mtb, bicycle/bike, etc.
Query expansion
engine.js → buildFtsMatch() lowercases each user token, looks it up in the synonym map, and if it's a member of a group renders the FTS5 match as (alt1* OR alt2* OR …). So bicycle red becomes roughly:
(bicycle* OR bike*) AND (red* OR crimson* OR ruby*)
Tokens not in any group pass through unchanged with * for prefix matching.
Admin endpoints
| Endpoint | Purpose |
|---|---|
GET /api/search/synonyms?domain=… | List groups (with their terms) |
POST /api/search/synonyms | Create a new group + its terms |
PUT /api/search/synonyms/:id | Replace description / is_active / terms |
DELETE /api/search/synonyms/:id | Hard delete (CASCADE removes terms) |
Surfaced in Settings → Search → Synonyms.
Click telemetry (migration 025)
Every time a user clicks a result in the palette or a per-screen search, the client POSTs to /api/search/clicks:
search_clicks (
id, query_text, domain, target_table, target_id,
position_in_results INTEGER, -- 0-indexed
device_session_id, staff_id,
created_at
)
Feeds Phase 5 ranker tuning: queries → clicks tell us which results consistently float to the top, which informs popularity boosts.
Endpoint flow
GET /api/search?q=…&types=…&limit=…&hint=…
→ device auth check
→ for each requested type (default: customers, inventory, tickets):
→ engine.queryFTS(domain, query)
→ parseQuery(query)
→ loadSynonyms(domain) [cached per Worker instance]
→ buildFtsMatch(tokens, synonyms)
→ SELECT rowid FROM <ftsTable> WHERE <ftsTable> MATCH ? ORDER BY bm25(…) LIMIT ?
→ hydrateQuery(rowids)
→ renderForPalette(rows)
→ Promise.all merge
→ return { groups: [...] }
Sub-100ms target end-to-end; BM25 on FTS5 is fast, hydration is one indexed lookup per group.
What the four domains find
| Domain | Examples |
|---|---|
| inventory | "Specialized Stumpjumper", "chain lube", a SKU, a barcode, "Norco" (all variants), norko (Phase 4: typo) |
| customers | "Jane Doe", partial phone "250-555", an account number #1234 |
| tickets | T-1234, customer name, bike serial, issue keyword ("brake squeal") |
| estimates | E-####, customer name, line summary |
Customers domain rerank (v0.6.52)
A length-normalization bug surfaced in production: searching "Tom" returned 12 first-name matches but buried Tom's E-bike Rentals at position 13/16. Cause: FTS5 BM25 penalizes multi-token display_names — a long business name with the query token plus several others has lower BM25 than a 1- or 2-word name where the query token is a larger fraction of the document. Two-part fix in v0.6.52:
Engine change (src/search/engine.js). When a domain registers a rerank, widen the merge pool to limit * 3 before invoking it, then trim to limit after. Previously the engine took the top limit items by BM25 first and then reranked only those — so items BM25 had already buried couldn't be promoted. The pool widening lets the domain rerank reach back into items BM25 hid. The engine also now passes the parsed query ({terms: [...]}) into the rerank function so domains can apply token-aware heuristics.
Customers domain rerank (src/search/domains/customers.js). Three-tier match-quality scoring against display_name and the parsed first query term:
| Tier | Match | Score added |
|---|---|---|
| 1 | First-word prefix (display_name starts with the query term) | +1000 |
| 2 | Any-word prefix (any space-separated token in display_name starts with the query term) | +500 |
| 3 | Substring (display_name.includes(queryTerm) anywhere) | +100 |
Score is REPLACED with the tier value rather than added on top of BM25 — so all first-word matches tie at 1000 and fall through to an alphabetical-by-display_name secondary sort. Result: a predictable order that no longer penalizes long business names.
Client cap raised. The Customers-tab search dropdown limit went from 12 → 25 in the same commit so common first names with 10+ matching customers still leave room for the business-name records. Cmd-K palette and other consumers are unaffected.
After: Tom returns 13 Tom-firstname customers (alphabetical) → then Tom's E-bike Rentals (position 13) → then Tommy Nyman → then substring matches (Dan Tomlinson, Kevin Thomson).
This is a one-off rerank specific to the customers domain. Inventory keeps pure BM25 (Phase 5 ranker will tune that separately when it lands). The pattern — rerank for match-quality tiering, alphabetical inside a tier, replace-not-add — generalizes to any domain whose primary field is a name string.
Phases 4-6 (next)
- Phase 4 — trigram fallback. When FTS returns 0 rows, fall back to a 3-gram similarity match.
norko→ findsNorcobecause they sharenor,orc/ork,rco/rkotrigrams. - Phase 5 — custom ranker. BM25 + recency boost + popularity boost (from
search_clicks) + in-stock bonus + ticket-customer bonus. Layered, weighted, tunable from a settings UI. - Phase 6 — UX polish. Match highlighting (
{text, matches: [[start, end]…]}→ client wraps in<mark>), "did you mean", per-domain facets.
Why this is portable
The engine intentionally knows nothing about bikes. Future Kvick apps (the tradespeople tier, a future vertical) register their own domain configs against the same engine and get the same behavior. Synonyms, click telemetry, and the ranker all generalize. The migration files that create the FTS tables don't generalize — those are per-schema — but every domain follows the same shape (FTS5 + triggers + source-table joins).
See also
- Global search palette — the Cmd-K UI that calls
/api/search Helm/NEXT_SESSION_PROMPT.md— locked design decisions, phase deliverables- Slice 3 — Inventory & Tax — the inventory domain's source tables
- Slice 4 — Service Tickets — the tickets domain's source tables
- Slice 5 — Transactions & Payments — the estimates domain (estimates are
transactionsrows)