ADR-0016 — Python ETL for AIM migration

Status: Accepted
Date: 2026-04-20
Decision-makers: Tom Anderson

Context

The Swicked migration needs to pull 113K+ rows from an AIM v12 SQL Server (legacy, on-prem) and land them in Helm D1 with the right shapes. This is a one-off-per-shop operation that runs:

Many times during dry-run (each schema fix re-runs the migration)
Once during cutover
Optionally for future shops that migrate from AIM

The Worker is JavaScript. The website is JavaScript. The natural question: should the ETL also be JavaScript?

Options:

JavaScript (Node.js) — language consistency, but tedious (SQL Server client) is slower than Python's pyodbc for bulk reads
Python — strong data-engineering ecosystem (pandas, pyodbc, sqlite3), faster for bulk ETL, dev is more familiar with Python for data work
A managed ETL tool (Airbyte, Stitch) — too heavyweight; the AIM schema is bespoke enough to not fit their connectors

The ETL is genuinely a different category of work from the Worker. It runs on the dev's laptop or a one-off VM, not at the edge. It handles bulk data, not requests. It's I/O-bound to a slow legacy DB.

Decision

Use Python for the AIM migration scripts:

migrate_aim.py — the main ETL, with --slice N flags to migrate per-slice
extract_migrations.py, verify_schema.py — helpers for working with the schema spec doc
Standard library + pyodbc for SQL Server reads
sqlite3 for D1 local target (D1 dev DBs are SQLite files)
The output is .sqlite files; wrangler d1 import loads them into D1

The ETL lives in migrations/ (alongside the SQL migration files) and is run with python migrate_aim.py --all. Not part of the Worker bundle; never executed in production runtime.

Consequences

Positive:

The right tool for the job; bulk ETL is Python's strength
pyodbc works reliably against SQL Server; the JS alternatives have rougher edges
The ETL is straightforward and readable — one function per slice
A future shop migration (different shop, different source system) can swap the extract step but keep the load step

Negative:

Two languages in the repo (Python + JavaScript)
The Helm core developer must be comfortable enough with Python
CI doesn't run the ETL automatically (it requires a SQL Server endpoint)

Mitigations:

Python is widely-known; this isn't a barrier
The ETL has its own README and runbook (migration from AIM)
The ETL output (the .sqlite file) is the artifact; the script is the reproducer

Notes

If a future shop migrates from a different system (e.g., Lightspeed), the extract step changes; the load step (writing to D1) is reusable. The Python codebase is small and scoped enough that this isn't burdensome.

Context​

Decision​

Consequences​

Notes​

See also​

Context

Decision

Consequences

Notes

See also