
OMS Kubernetes Migration: Detail Docs

Companion to the plan page, OMS Migration Plan: AWS EC2 → Azure AKS. The plan page is the executive view; these pages hold the implementation-level detail the Rails team produced during prep work on the docker-setup-chnages branch.

Reading order

Doc                    What it covers
Workloads              The five Kubernetes Deployments (web + three Sidekiq capsules + scheduler) with replica counts, commands, resource targets, and termination grace periods.
Environment variables  Every env var the app reads. The input to the ConfigMap and Secret manifests.
External services      Inbound and outbound integrations: Shopify, SFCC, Amazon SP-API, Cirro 3PL, Azure Service Bus, Postmark, Klaviyo. Egress firewall inputs.
Scheduled jobs         Why sidekiq-scheduler must run as a singleton. Why we are not migrating to Kubernetes CronJobs.
Known issues           Ten things in the codebase that need attention before cutover, including four specific filesystem-write call sites.
Decisions needed       Open questions for leadership in suggested decision order.
Database migration     MySQL 5.7 → 8.0: three strategy options compared, recommendation, sequence of work, risks.
Rails team role        Clean ownership split: Rails team, DevOps (Senith), joint, leadership.
Shopify integration    Shopify Partners app ownership, in-tree shopify-app-admin engine, OAuth tokens, webhook URLs, cutover risks.

Status

Rails-side prep is committed on the docker-setup-chnages branch. Every change is backwards-compatible because it is gated on environment variables, so the existing Capistrano deploy is unaffected.
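
As an illustration of the env-var gating described above, here is a minimal Ruby sketch using the STDOUT logging item as the example. RAILS_LOG_TO_STDOUT is the variable named on this page; the helper names (env_present?, log_destination) and the default file path are hypothetical, and the code on the branch may look different. The point is only that an unset variable leaves the existing behaviour untouched.

```ruby
require "logger"

# Hypothetical helper: a variable counts as "set" when it holds any
# non-blank value. Works with ENV or a plain Hash.
def env_present?(name, env = ENV)
  !env[name].to_s.strip.empty?
end

# Hypothetical helper: choose where logs go. With RAILS_LOG_TO_STDOUT unset,
# the file destination is kept, so a host that never sets the variable
# sees no change. The file path is an assumption for illustration.
def log_destination(env = ENV, file_path: "log/production.log")
  env_present?("RAILS_LOG_TO_STDOUT", env) ? $stdout : file_path
end

# Typical use inside an environment config (not executed here):
#   config.logger = Logger.new(log_destination)
```

The same shape applies to the other gated items: read the variable, fall back to the pre-container behaviour when it is absent.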

Delivered:

  • Production Dockerfile (multi-stage, non-root, jemalloc, wkhtmltopdf)
  • .dockerignore for build context hygiene
  • Per-capsule Sidekiq configs (sidekiq.default.yml, sidekiq.limited.yml, sidekiq.single.yml, sidekiq.scheduler.yml)
  • /up health endpoint for liveness and readiness probes
  • STDOUT logging gated on RAILS_LOG_TO_STDOUT
  • Force-SSL, ASSUME_SSL, and hosts allowlist via env vars
  • Env-driven AWS S3 config (AWS_* env vars)
  • Env-driven Datadog StatsD destination (DD_AGENT_HOST, DD_DOGSTATSD_PORT)
  • Sidekiq graceful-shutdown timeout (:timeout: 25)
  • Build-resilient credential lookups (rescue wrappers in database.yml, secrets.yml, storage.yml)
  • A local-dev/ validation harness: a kind-based local cluster that mirrors the target shape
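
The build-resilient credential lookups in the list above can be pictured with the following sketch. It is an assumption about the general pattern, not a copy of the wrappers on the branch: credential_or, database_template, and render_database_config are hypothetical names, and the mysql2 adapter and placeholder value are illustrative. The idea is that an ERB expression in a YAML config falls back to a harmless value when the credential lookup raises, for example during an image build where no key is available.

```ruby
require "erb"
require "yaml"

# Hypothetical helper: return the block's value, or the fallback when the
# lookup raises (for example, credentials cannot be decrypted at build time).
def credential_or(fallback)
  yield
rescue StandardError
  fallback
end

# Illustrative stand-in for a database.yml fragment. Key names are assumptions.
def database_template
  <<~YAML
    production:
      adapter: mysql2
      password: <%= credential_or("build-placeholder") { credentials.dig(:database, :password) } %>
  YAML
end

# Renders the template against any object that responds to #dig, such as
# a Hash here or a credentials object in a real app.
def render_database_config(credentials)
  YAML.safe_load(ERB.new(database_template).result(binding))
end
```

With a working credentials object the real value is rendered; with one that raises, the placeholder is rendered and the build proceeds. The placeholder must never be relied on at runtime, so the runtime environment still has to supply real credentials.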

Still owed:

  • Refactor of four filesystem-write call sites to stream to S3 (see Known issues)
  • MySQL 8.0 compatibility pass (RSpec against 8.0)
  • GitHub Actions workflow to build the image (replaces Capistrano)
  • Cutover smoke test script
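
The cutover smoke test script is not written yet. As a starting point, it could be sketched along these lines, assuming the /up endpoint listed under Delivered answers 200 when the app is healthy. The function names, the timeout, and the example host are hypothetical; any checks beyond /up would need to be agreed separately.

```ruby
require "net/http"
require "uri"

# Fetches the HTTP status code for a URL, or nil when the request fails
# (connection refused, timeout, DNS error, and so on).
def http_status(url, timeout: 5)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https",
                  open_timeout: timeout, read_timeout: timeout) do |http|
    http.get(uri.request_uri).code.to_i
  end
rescue StandardError
  nil
end

# Returns the paths that did not answer 200. Only /up comes from this page;
# callers supply further paths. The fetch argument is injectable so the
# logic can be exercised without a network.
def failed_checks(base_url, paths: ["/up"], fetch: method(:http_status))
  paths.reject { |path| fetch.call(URI.join(base_url, path).to_s) == 200 }
end

# Typical use (hypothetical host, not executed here):
#   failures = failed_checks("https://oms.example.test")
#   abort("Smoke test failed: #{failures.join(', ')}") unless failures.empty?
```

Returning the list of failing paths rather than a single boolean keeps the output useful when more checks are added later.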

See Rails team role for the full breakdown.