# Database Migration: Diff's MySQL → New Managed MySQL 8.0
How we move OMS's data from the existing Diff-managed database onto the new managed MySQL 8.0 (Azure Database for MySQL or AWS RDS).
## Pre-work — things we need to know before scoping this
We can't pick a strategy without these data points. Get them first:
| Data point | How to get it | Notes |
|---|---|---|
| Current MySQL version | SSH to a prod server, `mysql --version` or `SELECT VERSION();` | Likely 5.7 (matches dev) — confirm |
| DB size on disk | On the DB server: `du -sh /var/lib/mysql/<dbname>/`, or the AWS RDS console "Storage" panel | The single biggest input to picking a strategy |
| Largest tables | `SELECT TABLE_NAME, ROUND((DATA_LENGTH+INDEX_LENGTH)/1024/1024/1024,2) AS gb FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='<db>' ORDER BY gb DESC LIMIT 10;` | Drives migration time |
| Daily write rate (rough) | Datadog: average MySQL writes/sec | Affects "can we accept downtime?" |
| Any stored procedures / triggers / views | `SHOW PROCEDURE STATUS WHERE Db='<db>'`, `SHOW TRIGGERS`, `SHOW FULL TABLES WHERE Table_type='VIEW'` | Likely none in a Rails app — confirm |
| Where the current DB actually lives | RDS instance? Self-hosted on one of the EC2s? | Check `config/deploy/production.rb` — if there's no `db:` role server, it's external |
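To make gathering these less tedious, here is a minimal sketch that pulls most of the SQL-answerable data points in one pass. The host, user, and schema name are placeholders, not confirmed values; the on-disk size still needs `du -sh` on the DB host itself.

```bash
# Sketch only: run from a box that can reach the current prod DB.
# DB_HOST, DB_USER, and DB_NAME are placeholders; confirm the real values.
DB_HOST=old-prod-db.internal
DB_USER=readonly
DB_NAME=oms_production

mysql -h "$DB_HOST" -u "$DB_USER" -p --table <<SQL
SELECT VERSION();                                         -- current MySQL version
SELECT ROUND(SUM(DATA_LENGTH + INDEX_LENGTH)/1024/1024/1024, 2) AS total_gb
  FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = '$DB_NAME';                         -- logical DB size
SELECT TABLE_NAME,
       ROUND((DATA_LENGTH + INDEX_LENGTH)/1024/1024/1024, 2) AS gb
  FROM INFORMATION_SCHEMA.TABLES
 WHERE TABLE_SCHEMA = '$DB_NAME'
 ORDER BY gb DESC LIMIT 10;                               -- largest tables
SHOW PROCEDURE STATUS WHERE Db = '$DB_NAME';              -- stored procedures
SHOW TRIGGERS FROM \`$DB_NAME\`;                          -- triggers
SHOW FULL TABLES FROM \`$DB_NAME\` WHERE Table_type = 'VIEW';  -- views
SQL
```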
## The version jump: 5.7 → 8.0
MySQL 5.7 reached EOL in October 2023. The new managed DB will be 8.0. This is not a free version bump — there are real compat concerns:
- Reserved word changes. 8.0 added `RANK`, `ROW_NUMBER`, `WINDOW`, etc. as reserved words. If any column/table name uses these, queries break. Easy to grep for (see the check below).
- `utf8` → `utf8mb4` defaults. Our `config/database.yml` already uses `utf8mb4` so we're fine, but verify on the actual prod DB.
- Default authentication plugin changed (`mysql_native_password` → `caching_sha2_password`). Make sure the mysql2 gem version supports the new default. Our declared `mysql2 >= 0.4.4` should be fine; we run mysql2 0.5.x per Gemfile.lock.
- `sql_mode` — 8.0 has stricter defaults and may reject queries that 5.7 silently coerced. Found by running the tests.
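For the reserved-word point, a quick information_schema check; the schema name and the word list below are illustrative, so extend the list from the 8.0 keyword documentation:

```bash
# Sketch: flag column names that collide with words MySQL 8.0 newly reserves.
# Schema name and word list are assumptions; the full list is in the 8.0 docs.
mysql -h "$DB_HOST" -u "$DB_USER" -p --table <<'SQL'
SELECT TABLE_NAME, COLUMN_NAME
  FROM INFORMATION_SCHEMA.COLUMNS
 WHERE TABLE_SCHEMA = 'oms_production'
   AND UPPER(COLUMN_NAME) IN
       ('RANK', 'ROW_NUMBER', 'WINDOW', 'GROUPS', 'LATERAL', 'OVER', 'SYSTEM');
SQL
```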
Action: before any data move, run our full RSpec suite against a fresh MySQL 8.0 instance with the current schema loaded. Fix any failures. Until this passes, we're not migrating anything.
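One way to get that fresh 8.0 instance locally or in CI is a throwaway Docker container. The container name, credentials, port, and database name below are arbitrary examples, and whether the test env honours `DATABASE_URL` depends on our `database.yml`:

```bash
# Sketch: disposable MySQL 8.0 to run the suite against.
docker run -d --name oms-mysql8 \
  -e MYSQL_ROOT_PASSWORD=password \
  -e MYSQL_DATABASE=oms_test \
  -p 3307:3306 \
  mysql:8.0

# Wait until it accepts connections, load the current schema, run the suite.
until mysqladmin ping -h 127.0.0.1 -P 3307 -uroot -ppassword --silent; do sleep 2; done
export DATABASE_URL="mysql2://root:password@127.0.0.1:3307/oms_test"
RAILS_ENV=test bin/rails db:schema:load
bundle exec rspec
```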
## Strategy A — mysqldump + restore (simple, with downtime)
Use when: DB is small (< ~50 GB) and we can afford a maintenance window.
┌──────────────────────────────────────────────────────────────┐
│ Step 1 (T-1 day): dry run in non-prod │
│ mysqldump from old → mysql import to new staging DB │
│ Run smoke tests, verify row counts, app boots │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Step 2 (cutover, e.g. Sat 2am): │
│ a. Put OMS in maintenance mode (503 page via ingress) │
│ b. Wait for in-flight Sidekiq jobs to drain │
│ c. mysqldump --single-transaction from old prod DB │
│ d. Compress, upload to S3 │
│ e. Restore on new managed MySQL 8.0 │
│ f. Run integrity checks │
│ g. Update DATABASE_URL Secret in k8s │
│ h. Roll the web + sidekiq Deployments │
│ i. Smoke test │
│ j. Drop maintenance mode │
└──────────────────────────────────────────────────────────────┘
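A sketch of steps c-e and g-h, with placeholder hosts, credentials, S3 bucket, namespace, Secret name, and Deployment names (none of these are confirmed values):

```bash
set -euo pipefail

# c. Consistent dump from the old prod DB (no table locks for InnoDB)
mysqldump --single-transaction --quick --routines --triggers \
  -h "$OLD_DB_HOST" -u "$DB_USER" -p"$DB_PASS" oms_production > oms_production.sql

# d. Compress and park a copy in S3 (bucket name is a placeholder)
gzip oms_production.sql
aws s3 cp oms_production.sql.gz s3://oms-db-migration/

# e. Restore onto the new managed MySQL 8.0
gunzip -c oms_production.sql.gz | \
  mysql -h "$NEW_DB_HOST" -u "$DB_USER" -p"$DB_PASS" oms_production

# g. Update the DATABASE_URL Secret (namespace and Secret name are placeholders)
kubectl -n oms create secret generic oms-database-url \
  --from-literal=DATABASE_URL="mysql2://$DB_USER:$DB_PASS@$NEW_DB_HOST/oms_production" \
  --dry-run=client -o yaml | kubectl apply -f -

# h. Roll the web + sidekiq Deployments so they pick up the new Secret
kubectl -n oms rollout restart deployment/oms-web deployment/oms-sidekiq
```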
Estimated downtime by size:

- 10 GB → ~30 min
- 50 GB → ~2 hours
- 100 GB → ~4 hours
Pros: dead simple, easy to reason about, rollback is just "point back at the old DB." Cons: hard downtime — bad for an OMS that processes orders 24/7.
## Strategy B — Replication + cutover (low downtime)
Use when: DB is mid-size (50 GB – 1 TB) and downtime needs to be minutes, not hours.
┌─────────────────┐ ┌─────────────────────┐
│ Old prod MySQL │ ───── binlog ──────► │ New managed MySQL │
│ (5.7, Diff) │ replication │ 8.0 (replica mode) │
└─────────────────┘ └─────────────────────┘
│
│ catches up over hours/days
▼
"lag = 0" (caught up)
│
cutover window ▼
1. Briefly stop writes to old DB (web → 503, drain Sidekiq)
2. Wait for replica to catch up final transactions (seconds)
3. Promote new DB to primary (stop replication)
4. Update DATABASE_URL Secret, roll Deployments
5. Resume traffic
│
▼
Total downtime: 5–15 min
Tools to do this:

- AWS Database Migration Service (DMS) — supports MySQL → MySQL with replication, including the version upgrade. Recommended if landing on RDS.
- Azure Database Migration Service — the equivalent for Azure.
- Manual MySQL replication — set up the new DB as a replica of the old using the binlog. More fiddly, no managed UI; only do this if DMS isn't available (see the sketch after the pros/cons).
Pros: minutes of downtime, well-established pattern. Cons: more setup; needs network connectivity from new DB to old DB (VPN/peering or public endpoint with TLS); needs replication user on old DB.
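If we do end up on the manual-replication path instead of DMS, the old DB needs a replication user and we need to watch lag until it holds at zero. A rough sketch; the user name, host pattern, and password are placeholders, and managed targets (RDS, Azure) wrap the replica-side `CHANGE REPLICATION SOURCE` step in their own stored procedures, so check the provider docs for that half:

```bash
# Sketch: replication user on the OLD DB (manual replication path only).
mysql -h "$OLD_DB_HOST" -u root -p <<'SQL'
CREATE USER 'repl'@'%' IDENTIFIED BY 'change-me';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
SQL

# On the NEW 8.0 instance, once replication is configured, watch lag until it
# sits at 0 before scheduling the cutover window.
mysql -h "$NEW_DB_HOST" -u admin -p -e "SHOW REPLICA STATUS\G" \
  | grep -E 'Seconds_Behind_Source|Replica_(IO|SQL)_Running'
```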
## Strategy C — Dual-write (zero downtime, complex)
Use when: DB is huge (> 1 TB) or downtime is unacceptable.
App writes to both old and new DBs simultaneously for a period; reads are gradually shifted from old to new; once confidence is high, old DB is decommissioned.
This is heavyweight engineering — usually not worth it unless the size forces our hand. Recommendation: skip this; aim for Strategy B.
## Recommendation
Until we have the DB size, hedge: plan for Strategy B (DMS-driven replication). If we discover the DB is small (<50 GB), downgrade to Strategy A.
## Sequence of work
Order matters. Don't skip steps.
1. Get the data points at the top of this doc.
2. Provision the new managed MySQL 8.0 (Senith). Apply our schema; load a small sample of prod data.
3. Run the RSpec suite against MySQL 8.0 in CI. Fix failures. Don't proceed until green.
4. Full dry-run migration in non-prod: mysqldump from staging → restore on a staging-tier MySQL 8.0 → boot the staging app pointed at it → run smoke tests and verify row counts (see the sketch below). Repeat until frictionless.
5. Set up replication from prod to the new MySQL 8.0 (DMS). Let it catch up over a few days. Monitor replication lag.
6. Schedule a maintenance window. Communicate to internal users + any external callers (Shopify webhooks tolerate retries; SFCC partners may want notice).
7. Cut over during the window. Have rollback ready (point `DATABASE_URL` back at the old DB if anything goes sideways).
8. Decommission the old DB after a cooling-off period (a week minimum, kept read-only for forensics).
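For the dry run (step 4) and the cutover integrity checks, a simple cross-check is comparing per-table row counts on both sides. Hosts, user, and schema name are placeholders; `COUNT(*)` is exact but can be slow on huge tables, so run it while writes are frozen:

```bash
# Sketch: compare per-table row counts between the old and new DBs.
DB_NAME=oms_production   # placeholder; confirm the real schema name
TABLES_SQL="SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
            WHERE TABLE_SCHEMA='$DB_NAME' AND TABLE_TYPE='BASE TABLE'"

for table in $(mysql -N -h "$OLD_DB_HOST" -u "$DB_USER" -p"$DB_PASS" -e "$TABLES_SQL"); do
  old=$(mysql -N -h "$OLD_DB_HOST" -u "$DB_USER" -p"$DB_PASS" -e "SELECT COUNT(*) FROM $DB_NAME.$table")
  new=$(mysql -N -h "$NEW_DB_HOST" -u "$DB_USER" -p"$DB_PASS" -e "SELECT COUNT(*) FROM $DB_NAME.$table")
  [ "$old" = "$new" ] && status=OK || status=MISMATCH
  printf '%-40s old=%-12s new=%-12s %s\n' "$table" "$old" "$new" "$status"
done
```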
## Risks to call out to leadership
- Schema rollbacks during cutover are painful. If we deploy a migration that touches a hot table during the cutover, we can't easily back it out. Freeze schema migrations from a week before cutover until a week after.
- Sidekiq jobs hold long DB transactions. Confirm with Rails team which jobs do this; pause them during the cutover window or accept short retries.
- AUTO_INCREMENT counters. When we cut over, ensure the new DB's auto-increment values are at or above the old DB's. Both mysqldump and replication preserve them; just verify post-migration, before unfreezing writes (see the check below).
- Datadog DB connection metrics. Have a dashboard ready showing connection count, error rate, slow query rate before/after. Watch it like a hawk for the first hour after cutover.
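For the AUTO_INCREMENT risk, a quick post-migration comparison, using the same placeholder conventions as the other sketches:

```bash
# Sketch: confirm AUTO_INCREMENT counters on the new DB are >= the old DB's.
# Run before unfreezing writes. On 8.0 these stats are cached; run
# SET SESSION information_schema_stats_expiry=0 first if values look stale.
for host in "$OLD_DB_HOST" "$NEW_DB_HOST"; do
  echo "== $host"
  mysql -N -h "$host" -u "$DB_USER" -p"$DB_PASS" -e "
    SELECT TABLE_NAME, AUTO_INCREMENT
      FROM INFORMATION_SCHEMA.TABLES
     WHERE TABLE_SCHEMA='oms_production' AND AUTO_INCREMENT IS NOT NULL
     ORDER BY TABLE_NAME"
done
```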
## What changes in the app — nothing
Rails reads `DATABASE_URL` from env. Switching DBs = updating one Secret + rolling the Deployments. No code change. This is the win of getting the env var refactor done before the migration.