Skip to main content

ADR-0016 — Python ETL for AIM migration

  • Status: Accepted
  • Date: 2026-04-20
  • Decision-makers: Tom Anderson

Context

The Swicked migration needs to pull 113K+ rows from an AIM v12 SQL Server (legacy, on-prem) and land them in Helm D1 with the right shapes. This is a one-off-per-shop operation that runs:

  • Many times during dry-run (each schema fix re-runs the migration)
  • Once during cutover
  • Optionally for future shops that migrate from AIM

The Worker is JavaScript. The website is JavaScript. The natural question: should the ETL also be JavaScript?

Options:

  • JavaScript (Node.js) — language consistency, but tedious (SQL Server client) is slower than Python's pyodbc for bulk reads
  • Python — strong data-engineering ecosystem (pandas, pyodbc, sqlite3), faster for bulk ETL, dev is more familiar with Python for data work
  • A managed ETL tool (Airbyte, Stitch) — too heavyweight; the AIM schema is bespoke enough to not fit their connectors

The ETL is genuinely a different category of work from the Worker. It runs on the dev's laptop or a one-off VM, not at the edge. It handles bulk data, not requests. It's I/O-bound to a slow legacy DB.

Decision

Use Python for the AIM migration scripts:

  • migrate_aim.py — the main ETL, with --slice N flags to migrate per-slice
  • extract_migrations.py, verify_schema.py — helpers for working with the schema spec doc
  • Standard library + pyodbc for SQL Server reads
  • sqlite3 for D1 local target (D1 dev DBs are SQLite files)
  • The output is .sqlite files; wrangler d1 import loads them into D1

The ETL lives in migrations/ (alongside the SQL migration files) and is run with python migrate_aim.py --all. Not part of the Worker bundle; never executed in production runtime.

Consequences

Positive:

  • The right tool for the job; bulk ETL is Python's strength
  • pyodbc works reliably against SQL Server; the JS alternatives have rougher edges
  • The ETL is straightforward and readable — one function per slice
  • A future shop migration (different shop, different source system) can swap the extract step but keep the load step

Negative:

  • Two languages in the repo (Python + JavaScript)
  • The Helm core developer must be comfortable enough with Python
  • CI doesn't run the ETL automatically (it requires a SQL Server endpoint)

Mitigations:

  • Python is widely-known; this isn't a barrier
  • The ETL has its own README and runbook (migration from AIM)
  • The ETL output (the .sqlite file) is the artifact; the script is the reproducer

Notes

If a future shop migrates from a different system (e.g., Lightspeed), the extract step changes; the load step (writing to D1) is reusable. The Python codebase is small and scoped enough that this isn't burdensome.

See also