Search engine

A real text-search engine for Helm. SQLite FTS5 (built into D1) + BM25 ranking + per-domain field weights + a shared synonym layer. One Worker module (src/search/engine.js), four registered domains (inventory, customers, tickets, estimates). Portable verbatim to any future Kvick app.

Phases 1-3 live 2026-05-15; phases 4-6 planned

Phase 1 (inventory FTS), Phase 2 (customers + tickets + estimates), and Phase 3 (synonyms, owner-editable) shipped as migrations 020-024 with click telemetry (025). Phases 4 (trigram typo tolerance), 5 (recency / popularity / in-stock ranker), and 6 (highlight + did-you-mean + facets) are next. See Helm/NEXT_SESSION_PROMPT.md for the locked design decisions.

Why this exists

The original apiSearch (see global search palette) used plain LIKE queries against the source tables. Fine at low scale; brittle at speed. Three problems forced the upgrade:

No ranking. "Norco bike" returned 200 rows in arbitrary order; the right Norco bike was buried at row 47.
No tolerance. A typo (norko for Norco) returned nothing. A synonym ("bicycle" vs "bike") returned nothing.
No portability. The smart-query routing was inline in apiSearch; a future Kvick app would have to rewrite it.

The new engine fixes all three with a single module — register a domain config and the engine does the rest.

Module layout

src/search/
├── engine.js                  — orchestrator (~17 KB)
├── synonyms.js                — synonym-group loader + buildFtsMatch expander
└── domains/
    ├── inventory.js           — products + variants + manufacturers + barcodes
    ├── customers.js
    ├── tickets.js
    └── estimates.js

engine.js knows nothing about bikes. Each domain config declares:

name / label
ftsTable — the FTS5 virtual table name; its rowid MUST be the source-of-truth PK for hydration
ftsColumns — the FTS5 columns, in declaration order (BM25 weight order)
weights — per-column BM25 multipliers
hydrateQuery — the SQL to fetch display rows for a given set of rowids
renderForPalette — turns a hydrated row into the universal palette format

FTS5 tables (migrations 020-022)

Four virtual tables, one per domain:

Migration	Table	Sources	Columns
020	`search_inventory_fts`	`product_variants` + `products` + `manufacturers` + `product_barcodes`	sku, barcode, mpn, display_name, model_number, manufacturer, display_label
021	(extension)	(same)	+ category, subcategory
022	`search_customers_fts`	`customers`	account_number, display_name, primary_phone, primary_email
022	`search_tickets_fts`	`service_tickets` + customer/bike denormalized	ticket_number, customer_name, bike_serial, bike_model, issue
022	`search_estimates_fts`	`transactions` (kind='estimate')	estimate_number, customer_name, line_summary

All use porter unicode61 tokenizer (Porter stemming + Unicode case-fold).

Sync strategy: triggers on the source tables. Inventory data spans 4 tables — each source mutation recomputes the right FTS row. The bulk initial index of 9,337 variants runs in ~3s on Swicked's data.

Synonyms (migration 023)

Owner-editable equivalent-term groups, scoped per domain or global.

search_synonym_groups (
  id, domain TEXT,         -- 'inventory' | 'customers' | … | NULL (global)
  description TEXT,
  is_active INTEGER NOT NULL DEFAULT 1,
  created_at, updated_at
)

search_synonym_terms (
  id, group_id FK,
  term TEXT NOT NULL,      -- lowercased
  created_at,
  UNIQUE (group_id, term)
)

Migration 024 seeds color + abbreviation groups: red/crimson/ruby, mountain/mtb, bicycle/bike, etc.

Query expansion

engine.js → buildFtsMatch() lowercases each user token, looks it up in the synonym map, and if it's a member of a group renders the FTS5 match as (alt1* OR alt2* OR …). So bicycle red becomes roughly:

(bicycle* OR bike*) AND (red* OR crimson* OR ruby*)

Tokens not in any group pass through unchanged with * for prefix matching.

Admin endpoints

Endpoint	Purpose
`GET /api/search/synonyms?domain=…`	List groups (with their terms)
`POST /api/search/synonyms`	Create a new group + its terms
`PUT /api/search/synonyms/:id`	Replace description / is_active / terms
`DELETE /api/search/synonyms/:id`	Hard delete (CASCADE removes terms)

Surfaced in Settings → Search → Synonyms.

Click telemetry (migration 025)

Every time a user clicks a result in the palette or a per-screen search, the client POSTs to /api/search/clicks:

search_clicks (
  id, query_text, domain, target_table, target_id,
  position_in_results INTEGER,        -- 0-indexed
  device_session_id, staff_id,
  created_at
)

Feeds Phase 5 ranker tuning: queries → clicks tell us which results consistently float to the top, which informs popularity boosts.

Endpoint flow

GET /api/search?q=…&types=…&limit=…&hint=…
  → device auth check
  → for each requested type (default: customers, inventory, tickets):
      → engine.queryFTS(domain, query)
          → parseQuery(query)
          → loadSynonyms(domain) [cached per Worker instance]
          → buildFtsMatch(tokens, synonyms)
          → SELECT rowid FROM <ftsTable> WHERE <ftsTable> MATCH ? ORDER BY bm25(…) LIMIT ?
          → hydrateQuery(rowids)
          → renderForPalette(rows)
  → Promise.all merge
  → return { groups: [...] }

Sub-100ms target end-to-end; BM25 on FTS5 is fast, hydration is one indexed lookup per group.

What the four domains find

Domain	Examples
inventory	`"Specialized Stumpjumper"`, `"chain lube"`, a SKU, a barcode, `"Norco"` (all variants), `norko` (Phase 4: typo)
customers	`"Jane Doe"`, partial phone `"250-555"`, an account number `#1234`
tickets	`T-1234`, customer name, bike serial, issue keyword (`"brake squeal"`)
estimates	`E-####`, customer name, line summary

Customers domain rerank (v0.6.52)

A length-normalization bug surfaced in production: searching "Tom" returned 12 first-name matches but buried Tom's E-bike Rentals at position 13/16. Cause: FTS5 BM25 penalizes multi-token display_names — a long business name with the query token plus several others has lower BM25 than a 1- or 2-word name where the query token is a larger fraction of the document. Two-part fix in v0.6.52:

Engine change (src/search/engine.js). When a domain registers a rerank, widen the merge pool to limit * 3 before invoking it, then trim to limit after. Previously the engine took the top limit items by BM25 first and then reranked only those — so items BM25 had already buried couldn't be promoted. The pool widening lets the domain rerank reach back into items BM25 hid. The engine also now passes the parsed query ({terms: [...]}) into the rerank function so domains can apply token-aware heuristics.

Customers domain rerank (src/search/domains/customers.js). Three-tier match-quality scoring against display_name and the parsed first query term:

Tier	Match	Score added
1	First-word prefix (`display_name` starts with the query term)	+1000
2	Any-word prefix (any space-separated token in `display_name` starts with the query term)	+500
3	Substring (`display_name.includes(queryTerm)` anywhere)	+100

Score is REPLACED with the tier value rather than added on top of BM25 — so all first-word matches tie at 1000 and fall through to an alphabetical-by-display_name secondary sort. Result: a predictable order that no longer penalizes long business names.

Client cap raised. The Customers-tab search dropdown limit went from 12 → 25 in the same commit so common first names with 10+ matching customers still leave room for the business-name records. Cmd-K palette and other consumers are unaffected.

After: Tom returns 13 Tom-firstname customers (alphabetical) → then Tom's E-bike Rentals (position 13) → then Tommy Nyman → then substring matches (Dan Tomlinson, Kevin Thomson).

This is a one-off rerank specific to the customers domain. Inventory keeps pure BM25 (Phase 5 ranker will tune that separately when it lands). The pattern — rerank for match-quality tiering, alphabetical inside a tier, replace-not-add — generalizes to any domain whose primary field is a name string.

Phases 4-6 (next)

Phase 4 — trigram fallback. When FTS returns 0 rows, fall back to a 3-gram similarity match. norko → finds Norco because they share nor, orc/ork, rco/rko trigrams.
Phase 5 — custom ranker. BM25 + recency boost + popularity boost (from search_clicks) + in-stock bonus + ticket-customer bonus. Layered, weighted, tunable from a settings UI.
Phase 6 — UX polish. Match highlighting ({text, matches: [[start, end]…]} → client wraps in <mark>), "did you mean", per-domain facets.

Why this is portable

The engine intentionally knows nothing about bikes. Future Kvick apps (the tradespeople tier, a future vertical) register their own domain configs against the same engine and get the same behavior. Synonyms, click telemetry, and the ranker all generalize. The migration files that create the FTS tables don't generalize — those are per-schema — but every domain follows the same shape (FTS5 + triggers + source-table joins).

Why this exists​

Module layout​

FTS5 tables (migrations 020-022)​

Synonyms (migration 023)​

Query expansion​

Admin endpoints​

Click telemetry (migration 025)​

Endpoint flow​

What the four domains find​

Customers domain rerank (v0.6.52)​

Phases 4-6 (next)​

Why this is portable​

See also​