ADR-0016 — Python ETL for AIM migration
- Status: Accepted
- Date: 2026-04-20
- Decision-makers: Tom Anderson
Context
The Swicked migration needs to pull 113K+ rows from an AIM v12 SQL Server (legacy, on-prem) and land them in Helm D1 with the right shapes. This is a one-off-per-shop operation that runs:
- Many times during dry-run (each schema fix re-runs the migration)
- Once during cutover
- Optionally for future shops that migrate from AIM
The Worker is JavaScript. The website is JavaScript. The natural question: should the ETL also be JavaScript?
Options:
- JavaScript (Node.js) — language consistency, but
tedious(SQL Server client) is slower than Python'spyodbcfor bulk reads - Python — strong data-engineering ecosystem (
pandas,pyodbc,sqlite3), faster for bulk ETL, dev is more familiar with Python for data work - A managed ETL tool (Airbyte, Stitch) — too heavyweight; the AIM schema is bespoke enough to not fit their connectors
The ETL is genuinely a different category of work from the Worker. It runs on the dev's laptop or a one-off VM, not at the edge. It handles bulk data, not requests. It's I/O-bound to a slow legacy DB.
Decision
Use Python for the AIM migration scripts:
migrate_aim.py— the main ETL, with--slice Nflags to migrate per-sliceextract_migrations.py,verify_schema.py— helpers for working with the schema spec doc- Standard library +
pyodbcfor SQL Server reads sqlite3for D1 local target (D1 dev DBs are SQLite files)- The output is
.sqlitefiles;wrangler d1 importloads them into D1
The ETL lives in migrations/ (alongside the SQL migration files) and is run with python migrate_aim.py --all. Not part of the Worker bundle; never executed in production runtime.
Consequences
Positive:
- The right tool for the job; bulk ETL is Python's strength
pyodbcworks reliably against SQL Server; the JS alternatives have rougher edges- The ETL is straightforward and readable — one function per slice
- A future shop migration (different shop, different source system) can swap the extract step but keep the load step
Negative:
- Two languages in the repo (Python + JavaScript)
- The Helm core developer must be comfortable enough with Python
- CI doesn't run the ETL automatically (it requires a SQL Server endpoint)
Mitigations:
- Python is widely-known; this isn't a barrier
- The ETL has its own README and runbook (migration from AIM)
- The ETL output (the
.sqlitefile) is the artifact; the script is the reproducer
Notes
If a future shop migrates from a different system (e.g., Lightspeed), the extract step changes; the load step (writing to D1) is reusable. The Python codebase is small and scoped enough that this isn't burdensome.
See also
- Migration from AIM
- Tech stack summary
- Current state — current migration status