
Database Migration: Diff's MySQL → New Managed MySQL 8.0

How we move OMS's data from the existing Diff-managed database onto the new managed MySQL 8.0 (Azure Database for MySQL or AWS RDS).

Pre-work — things we need to know before scoping this

We can't pick a strategy without these data points. Get them first:

| Data point | How to get it | Notes |
| --- | --- | --- |
| Current MySQL version | SSH into a prod server, run mysql --version or SELECT VERSION(); | Likely 5.7 (matches dev); confirm |
| DB size on disk | On the DB server: du -sh /var/lib/mysql/<dbname>/, or the AWS RDS console "Storage" panel | The single biggest input to strategy choice |
| Largest tables | SELECT TABLE_NAME, ROUND((DATA_LENGTH+INDEX_LENGTH)/1024/1024/1024,2) AS gb FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA='<db>' ORDER BY gb DESC LIMIT 10; | Drives migration time |
| Daily write rate (rough) | Datadog: MySQL writes/sec average | Affects "can we accept downtime?" |
| Stored procedures / triggers / views | SHOW PROCEDURE STATUS WHERE Db='<db>'; SHOW TRIGGERS; SHOW FULL TABLES WHERE Table_type='VIEW'; | Likely none in a Rails app; confirm |
| Where the current DB actually lives | Check config/deploy/production.rb: if there is no db: role server, the DB is external | RDS instance? Self-hosted on one of the EC2s? |
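
If we can't get shell access to the DB host (likely if it already lives on RDS), the size can also be pulled with a query. A minimal sketch, where OLD_DB_HOST, DB_USER and DB_NAME are placeholder variables for the old DB's endpoint and the OMS schema; the INFORMATION_SCHEMA numbers are estimates for InnoDB but close enough for scoping:

```bash
# Approximate total data + index size of the OMS schema, in GB
mysql -h "$OLD_DB_HOST" -u "$DB_USER" -p -e "
  SELECT TABLE_SCHEMA,
         ROUND(SUM(DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024 / 1024, 2) AS total_gb
  FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_SCHEMA = '$DB_NAME'
  GROUP BY TABLE_SCHEMA;"
```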

The version jump: 5.7 → 8.0

MySQL 5.7 reached EOL in October 2023. The new managed DB will be 8.0. This is not a free version bump — there are real compat concerns:

  1. Reserved word changes. 8.0 added RANK, ROW_NUMBER, WINDOW, etc. as reserved words. If any column/table name uses these, queries break. Easy to check with a query against INFORMATION_SCHEMA (see the sketch after this list).
  2. utf8 → utf8mb4 defaults. Our config/database.yml already uses utf8mb4 so we're fine, but verify on the actual prod DB.
  3. Default authentication plugin changed (mysql_native_password → caching_sha2_password). Make sure the mysql2 gem version supports the new default. Our mysql2 >= 0.4.4 should be fine; we run mysql2 0.5.x per Gemfile.lock.
  4. sql_mode. 8.0 has stricter defaults and may reject queries that 5.7 silently coerced. Found by running the test suite.
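
For point 1, the schema can be checked against the new reserved words directly. A minimal sketch, same placeholder variables as above; the IN list is only a sample of the words added in 8.0 (the full list is in the MySQL 8.0 reference manual):

```bash
# Flag columns whose names collide with words that became reserved in 8.0;
# run the same check against INFORMATION_SCHEMA.TABLES.TABLE_NAME for table names.
mysql -h "$OLD_DB_HOST" -u "$DB_USER" -p -e "
  SELECT TABLE_NAME, COLUMN_NAME
  FROM INFORMATION_SCHEMA.COLUMNS
  WHERE TABLE_SCHEMA = '$DB_NAME'
    AND UPPER(COLUMN_NAME) IN ('RANK','ROW_NUMBER','WINDOW','GROUPS','OVER','LATERAL','FUNCTION');"
```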

Action: before any data move, run our full RSpec suite against a fresh MySQL 8.0 instance with the current schema loaded. Fix any failures. Until this passes, we're not migrating anything.
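
A minimal sketch of doing that locally with Docker before wiring it into CI; the container name, port, credentials, and the oms_test schema name are assumptions, and it relies on DATABASE_URL taking precedence over config/database.yml (which is the whole point of the env var refactor):

```bash
# Throwaway MySQL 8.0 with its default settings (auth plugin, sql_mode, charset)
docker run -d --name mysql80-compat \
  -e MYSQL_ROOT_PASSWORD=devpass \
  -e MYSQL_DATABASE=oms_test \
  -p 3307:3306 \
  mysql:8.0

# Wait for the server to accept connections, then point the test env at it
export DATABASE_URL="mysql2://root:devpass@127.0.0.1:3307/oms_test"
RAILS_ENV=test bundle exec rails db:schema:load
RAILS_ENV=test bundle exec rspec
```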

Strategy A — mysqldump + restore (simple, with downtime)

Use when: DB is small (< ~50 GB) and we can afford a maintenance window.

   ┌──────────────────────────────────────────────────────────────┐
   │  Step 1 (T-1 day): dry run in non-prod                       │
   │     mysqldump from old → mysql import to new staging DB      │
   │     Run smoke tests, verify row counts, app boots            │
   └──────────────────────────────────────────────────────────────┘
   ┌──────────────────────────────────────────────────────────────┐
   │  Step 2 (cutover, e.g. Sat 2am):                             │
   │     a. Put OMS in maintenance mode (503 page via ingress)    │
   │     b. Wait for in-flight Sidekiq jobs to drain              │
   │     c. mysqldump --single-transaction from old prod DB       │
   │     d. Compress, upload to S3                                │
   │     e. Restore on new managed MySQL 8.0                      │
   │     f. Run integrity checks                                  │
   │     g. Update DATABASE_URL Secret in k8s                     │
   │     h. Roll the web + sidekiq Deployments                    │
   │     i. Smoke test                                            │
   │     j. Drop maintenance mode                                 │
   └──────────────────────────────────────────────────────────────┘
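
Steps c through e reduce to something like the following. A sketch only: OLD_DB_HOST, NEW_DB_HOST, DB_USER, DB_NAME and the S3 bucket are placeholders, and the flags should be re-checked against the actual MySQL versions during the dry run:

```bash
# c. Consistent dump from the old prod DB (--single-transaction avoids locking InnoDB tables)
mysqldump -h "$OLD_DB_HOST" -u "$DB_USER" -p \
  --single-transaction --routines --triggers --set-gtid-purged=OFF \
  "$DB_NAME" | gzip > oms_prod.sql.gz

# d. Stash the dump in S3 (doubles as the rollback artifact)
aws s3 cp oms_prod.sql.gz "s3://<backup-bucket>/migrations/oms_prod.sql.gz"

# e. Restore into the new managed MySQL 8.0 instance
gunzip -c oms_prod.sql.gz | mysql -h "$NEW_DB_HOST" -u "$DB_USER" -p "$DB_NAME"
```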

Estimated downtime by size:
- 10 GB → ~30 min
- 50 GB → ~2 hours
- 100 GB → ~4 hours

Pros: dead simple, easy to reason about, rollback is just "point back at the old DB." Cons: hard downtime — bad for an OMS that processes orders 24/7.
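
For the integrity checks in step f, the cheapest pass is comparing per-table row counts between old and new. A sketch with the same placeholder variables as above; TABLE_ROWS is an estimate for InnoDB, so also run exact SELECT COUNT(*) checks on the handful of tables that matter most:

```bash
# Dump row-count estimates per table on each side, then diff the two files
for HOST in "$OLD_DB_HOST" "$NEW_DB_HOST"; do
  mysql -h "$HOST" -u "$DB_USER" -p -N -e "
    SELECT TABLE_NAME, TABLE_ROWS
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = '$DB_NAME'
    ORDER BY TABLE_NAME;" > "counts_${HOST}.txt"
done
diff counts_*.txt
```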

Strategy B — Replication + cutover (low downtime)

Use when: DB is mid-size (50 GB – 1 TB) and downtime needs to be minutes, not hours.

   ┌─────────────────┐                       ┌──────────────────────┐
   │ Old prod MySQL  │  ───── binlog ──────► │ New managed MySQL    │
   │   (5.7, Diff)   │      replication      │   8.0 (replica mode) │
   └─────────────────┘                       └──────────────────────┘
                                                        │  catches up over hours/days
                                                        │  until "lag = 0" (caught up)
                                                        ▼
        cutover window:
        1. Briefly stop writes to old DB (web → 503, drain Sidekiq)
        2. Wait for replica to catch up final transactions (seconds)
        3. Promote new DB to primary (stop replication)
        4. Update DATABASE_URL Secret, roll Deployments
        5. Resume traffic
                                              Total downtime: 5–15 min

Tools to do this:
- AWS Database Migration Service (DMS): supports MySQL → MySQL replication, including the version jump. Recommended if we land on RDS.
- Azure Database Migration Service: the equivalent if we land on Azure.
- Manual MySQL replication: set up the new DB as a replica of the old one using the binlog. More fiddly, no managed UI; only do this if DMS isn't available (see the sketch after the pros/cons below).

Pros: minutes of downtime, well-established pattern. Cons: more setup; needs network connectivity from new DB to old DB (VPN/peering or public endpoint with TLS); needs replication user on old DB.
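
If DMS isn't an option and we fall back to manual replication, the moving parts look roughly like this. It's a sketch against vanilla MySQL: both managed platforms wrap the replica side in their own stored procedures, so check the RDS/Azure docs before running these statements verbatim. Hosts, users, and the binlog coordinates are placeholders; the source needs the binlog enabled and a unique server_id:

```bash
# On the old 5.7 primary: a dedicated replication user
mysql -h "$OLD_DB_HOST" -u root -p -e "
  CREATE USER 'repl'@'%' IDENTIFIED BY '<strong-password>';
  GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';"

# Seed the new DB from a dump taken with --master-data=2 so the dump header
# records the binlog file and position to resume from.

# On the new 8.0 instance, after the seed restore: point it at the old primary
mysql -h "$NEW_DB_HOST" -u admin -p -e "
  CHANGE MASTER TO
    MASTER_HOST='$OLD_DB_HOST',
    MASTER_USER='repl',
    MASTER_PASSWORD='<strong-password>',
    MASTER_LOG_FILE='<binlog-file-from-dump>',
    MASTER_LOG_POS=<binlog-pos-from-dump>;
  START SLAVE;"

# Watch lag until it stays at 0 before scheduling the cutover window
mysql -h "$NEW_DB_HOST" -u admin -p -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master
```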

Strategy C — Dual-write (zero downtime, complex)

Use when: DB is huge (> 1 TB) or downtime is unacceptable.

App writes to both old and new DBs simultaneously for a period; reads are gradually shifted from old to new; once confidence is high, old DB is decommissioned.

This is heavyweight engineering — usually not worth it unless the size forces our hand. Recommendation: skip this; aim for Strategy B.

Recommendation

Until we have the DB size, hedge: plan for Strategy B (DMS-driven replication). If we discover the DB is small (<50 GB), downgrade to Strategy A.

Sequence of work

Order matters. Don't skip steps.

  1. Get the data points at the top of this doc.
  2. Provision new managed MySQL 8.0 (Senith). Apply our schema; load a small sample of prod data.
  3. Run RSpec suite against MySQL 8.0 in CI. Fix failures. Don't proceed until green.
  4. Full dry-run migration in non-prod: mysqldump from staging → restore on a staging-tier MySQL 8.0 → boot the staging app pointed at it → run smoke tests. Repeat until frictionless.
  5. Set up replication from prod to new MySQL 8.0 (DMS). Let it catch up over a few days. Monitor replication lag.
  6. Schedule a maintenance window. Communicate to internal users + any external callers (Shopify webhooks tolerate retries; SFCC partners may want notice).
  7. Cutover during the window. Have rollback ready (point DATABASE_URL back at the old DB if anything goes sideways).
  8. Decommission old DB after a cooling-off period (a week minimum, kept read-only for forensics).

Risks to call out to leadership

  1. Schema rollbacks during cutover are painful. If we deploy a migration that touches a hot table during the cutover, we can't easily back it out. Freeze schema migrations from a week before cutover until a week after.
  2. Sidekiq jobs hold long DB transactions. Confirm with Rails team which jobs do this; pause them during the cutover window or accept short retries.
  3. AUTO_INCREMENT counters. When we cut over, ensure the new DB's auto-increment values are at-or-above the old DB's. mysqldump preserves this; replication preserves this. Just verify post-migration before unfreezing writes (see the check after this list).
  4. Datadog DB connection metrics. Have a dashboard ready showing connection count, error rate, slow query rate before/after. Watch it like a hawk for the first hour after cutover.
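
For risk 3, the counters can be compared directly from INFORMATION_SCHEMA; a sketch with the same placeholders as the earlier snippets. Every AUTO_INCREMENT value on the new DB must be at or above the old DB's before writes are unfrozen. Note that 8.0 caches these statistics; if the new side looks stale, set information_schema_stats_expiry to 0 for that session:

```bash
# Per-table AUTO_INCREMENT values on each side; diff the two files
for HOST in "$OLD_DB_HOST" "$NEW_DB_HOST"; do
  mysql -h "$HOST" -u "$DB_USER" -p -N -e "
    SELECT TABLE_NAME, AUTO_INCREMENT
    FROM INFORMATION_SCHEMA.TABLES
    WHERE TABLE_SCHEMA = '$DB_NAME' AND AUTO_INCREMENT IS NOT NULL
    ORDER BY TABLE_NAME;" > "auto_inc_${HOST}.txt"
done
diff auto_inc_*.txt
```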

What changes in the app — nothing

Rails reads DATABASE_URL from env. Switching DBs = updating one Secret + rolling the Deployments. No code change. This is the win of getting the env var refactor done before the migration.
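
Concretely, the app-side cutover is roughly the following; the namespace, Secret name, key, and Deployment names are assumptions to be checked against our actual manifests:

```bash
# Upsert the Secret with the new connection string (placeholder names throughout)
kubectl -n oms create secret generic oms-database \
  --from-literal=DATABASE_URL="mysql2://oms:<password>@<new-db-host>:3306/oms_production" \
  --dry-run=client -o yaml | kubectl apply -f -

# Restart so pods pick up the new env value, then watch the rollout
kubectl -n oms rollout restart deployment/web deployment/sidekiq
kubectl -n oms rollout status deployment/web
```

Rollback is the same two commands with the old DATABASE_URL.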