
Rails Team's Role in This Migration

A clean split of responsibilities so Senith (DevOps) and the Rails team don't trip over each other or miss things.

Rails team owns

The application layer. Anything inside this Git repo, plus the artifact that gets deployed.

| Deliverable | Status | Notes |
| --- | --- | --- |
| Production Dockerfile | ✅ Done | docker/production/Dockerfile |
| .dockerignore | ✅ Done | .dockerignore |
| /up health endpoint | ✅ Done | config/routes.rb |
| Logs to stdout (env-gated) | ✅ Done | config/initializers/semantic_logger.rb |
| Force SSL + hosts allowlist (env-gated) | ✅ Done | config/environments/production.rb |
| ActiveStorage env-driven | ✅ Done | config/storage.yml, config/initializers/aws_s3.rb |
| Datadog StatsD env-driven | ✅ Done | config/initializers/datadog.rb |
| Sidekiq graceful shutdown timeout | ✅ Done | config/sidekiq.yml |
| Per-capsule Sidekiq YAML slices | ✅ Done | config/sidekiq.default.yml, config/sidekiq.limited.yml, config/sidekiq.single.yml, config/sidekiq.scheduler.yml |
| Build-resilient credential lookups | ✅ Done | rescue wrappers in config/database.yml, config/secrets.yml, config/storage.yml, and several initializers |
| Inventory docs for Senith | ✅ Done | this folder, plus the local-dev/ validation harness in the OMS repo |
| Filesystem-write refactors | ⏳ Owed | 4 call sites in known-issues.md — stream to S3 instead |
| MySQL 8.0 compatibility pass | ⏳ Owed | Run RSpec against MySQL 8.0; fix breakages |
| GitHub Actions: build + push image | ⏳ Owed | Replaces Capistrano. Includes assets:precompile step with production master key as a BuildKit secret |
| Smoke test script for cutover | ⏳ Owed | Curl /up + a few critical endpoints + enqueue and process a test Sidekiq job |
| Decommission Capistrano files | ⏳ After cutover | Capfile, config/deploy*.rb, lib/capistrano/ |
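The "build-resilient credential lookups" row refers to a simple pattern: credential reads are wrapped in a rescue so that steps which run without the master key (e.g. assets:precompile inside the Docker build) fall back instead of crashing. A minimal sketch of the idea; the helper name, fallback keyword, and rescue list are illustrative, not the repo's actual code:

```ruby
# Illustrative pattern only; the real rescue wrappers live inline in
# config/database.yml, config/secrets.yml, config/storage.yml, etc.
# Returns the credential, or the fallback when decryption fails
# (e.g. no RAILS_MASTER_KEY present during the image build).
def credential(*path, fallback: nil)
  Rails.application.credentials.dig(*path) || fallback
rescue StandardError
  fallback
end

# Typical call site in an ERB-templated config file:
#   password: <%= credential(:mysql, :password, fallback: ENV["MYSQL_PASSWORD"]) %>
```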

Senith (DevOps) owns

Everything outside the Git repo. The platform.

| Item | Notes |
| --- | --- |
| Kubernetes cluster (AKS or EKS) | Region, node pool sizing, autoscaling config |
| Container registry (ACR or ECR) | Plus image pull secret |
| Managed MySQL 8.0 | Instance class, storage, backups, multi-AZ |
| Managed Redis | Sized for our Sidekiq throughput |
| Ingress controller + TLS | nginx-ingress / app gateway, cert-manager |
| Secret store + sync | Azure Key Vault (or AWS Secrets Manager) + External Secrets Operator |
| Datadog Agent | DaemonSet on each node |
| Network: NAT gateway with static egress IP | For partner allowlists |
| DNS records | Map oms.popsockets.com to the new ingress |
| Argo CD or other rollout mechanism | How kubectl apply actually happens in CI |
| Helm chart / kustomize manifests | Deployments, Services, Ingress, ConfigMap, Secret refs, HPA |
| Pre-deploy db:migrate Job | Helm pre-upgrade hook or Argo sync hook |
| Backups + DR plan | DB snapshots, S3 versioning |
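The pre-deploy db:migrate Job is typically wired up as a Helm pre-upgrade hook that runs the app image with the migrate command before new pods roll out. A hedged sketch only: the names, chart values, and secret ref below are placeholders for Senith to fill in, not a real manifest from this project:

```yaml
# Illustrative manifest; names, values, and the secret ref are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: oms-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["bundle", "exec", "rails", "db:migrate"]
          envFrom:
            - secretRef:
                name: oms-app-secrets
```

The hook-delete-policy keeps stale Jobs from blocking the next release; the Argo CD equivalent is a PreSync hook.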

Joint — both sides need to be in the room

| Item | Why both |
| --- | --- |
| Database migration cutover | Rails team owns app readiness + smoke tests; Senith owns the data move |
| Resource sizing per workload | Rails team pulls Datadog numbers; Senith translates to k8s requests/limits |
| Env var → Secret/ConfigMap mapping | Rails team owns the list (in env-vars.md); Senith owns how they're delivered |
| Ingress hostnames | Rails team needs them in RAILS_ALLOWED_HOSTS; Senith configures them on the ingress |
| Rollback plan | Rails team builds the smoke test gate; Senith builds the rollback mechanism |
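The ingress-hostname handoff is mechanical on the Rails side: the env-gated hosts allowlist parses a comma-separated variable and feeds it to Rails host authorization. A sketch, assuming RAILS_ALLOWED_HOSTS is the comma-separated list named above; the parsing helper is illustrative, not the repo's exact initializer:

```ruby
# Illustrative; the real env-gated version lives in config/environments/production.rb.
# Splits "a.example.com, b.example.com" into ["a.example.com", "b.example.com"],
# tolerating nil, stray whitespace, and trailing commas.
def parse_allowed_hosts(value)
  value.to_s.split(",").map(&:strip).reject(&:empty?)
end

# In production.rb (sketch):
#   hosts = parse_allowed_hosts(ENV["RAILS_ALLOWED_HOSTS"])
#   config.hosts.concat(hosts) if hosts.any?
```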

Decisions that block both sides

These need leadership input. Listed in priority order; see decisions-needed.md for full detail.

  1. Cloud target — Azure or AWS?
  2. Object storage — stay on S3 or move to Azure Blob?
  3. MySQL 8.0 cutover timing
  4. Static egress IP — confirm the partner list

Workflow once we start

  1. Senith provisions the cluster + managed services in a sandbox/staging environment.
  2. Rails team builds and pushes the image to the registry (manually first, GitHub Actions later).
  3. Senith deploys the image as a Helm release with our manifests.
  4. Rails team runs smoke tests against the staging deployment. Iterate until clean.
  5. Repeat for sandbox, sandbox2, staging.
  6. Rehearse production migration in staging — full DB cutover, full smoke pass.
  7. Production cutover during scheduled maintenance window.
  8. Old infra (Diff) remains read-only standby for 1–2 weeks before decommission.

What to escalate

  • Anything in the encrypted credentials file that needs to leave it — flag to Senith so it ends up in the Secret store, not the image.
  • Any Sidekiq job that runs > 25 seconds — flag to Senith so the terminationGracePeriodSeconds is sized correctly.
  • Any external integration with a static-IP allowlist on their side — needs the new NAT egress IP communicated.
  • Any feature behind a Flipper flag that should be re-evaluated during the migration window.
  • Any data we hold that has data-residency rules (PII, payment, etc.).
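The 25-second Sidekiq threshold above exists because two shutdown timeouts have to nest: Sidekiq's own shutdown timeout must fit inside the pod's terminationGracePeriodSeconds, or jobs still running at SIGKILL get cut off mid-run. A sketch of the relationship (the numbers are illustrative; check config/sidekiq.yml for the real value):

```yaml
# config/sidekiq.yml (illustrative value): how long Sidekiq waits after
# SIGTERM for in-flight jobs to finish before pushing them back to Redis.
:timeout: 25

# Kubernetes Deployment (Senith's side) needs headroom beyond that, e.g.:
#
#   spec:
#     template:
#       spec:
#         terminationGracePeriodSeconds: 30   # > Sidekiq timeout, plus buffer
```

Any job that regularly runs longer than the Sidekiq timeout either gets both numbers raised or gets re-enqueued on every deploy, which is why those jobs must be flagged.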

How to talk about this with leadership

When the manager asks "what's left on your side?":

"Code changes: shipped. Production Dockerfile: shipped. Per-capsule Sidekiq config files: shipped. Documentation for Senith: shipped. Still owed: refactor of 4 filesystem-write call sites to stream to S3, run RSpec against MySQL 8.0, GitHub Actions workflow to build the image (including the asset-precompile step), and a smoke test script for cutover. None of those block Senith from starting — he can begin provisioning the cluster today."

That's a clean, accurate summary. Don't oversell what's done, don't understate what's left.