Rails Team's Role in This Migration¶
A clean split of responsibilities so Senith (DevOps) and the Rails team don't trip over each other or miss things.
Rails team owns¶
The application layer. Anything inside this Git repo, plus the artifact that gets deployed.
| Deliverable | Status | Notes |
|---|---|---|
| Production Dockerfile | ✅ Done | docker/production/Dockerfile |
| .dockerignore | ✅ Done | .dockerignore |
| /up health endpoint | ✅ Done | config/routes.rb |
| Logs to stdout (env-gated) | ✅ Done | config/initializers/semantic_logger.rb |
| Force SSL + hosts allowlist (env-gated) | ✅ Done | config/environments/production.rb |
| ActiveStorage env-driven | ✅ Done | config/storage.yml, config/initializers/aws_s3.rb |
| Datadog StatsD env-driven | ✅ Done | config/initializers/datadog.rb |
| Sidekiq graceful shutdown timeout | ✅ Done | config/sidekiq.yml |
| Per-capsule Sidekiq YAML slices | ✅ Done | config/sidekiq.default.yml, config/sidekiq.limited.yml, config/sidekiq.single.yml, config/sidekiq.scheduler.yml |
| Build-resilient credential lookups | ✅ Done | rescue wrappers in config/database.yml, config/secrets.yml, config/storage.yml, and several initializers. |
| Inventory docs for Senith | ✅ Done | this folder, plus the local-dev/ validation harness in the OMS repo |
| Filesystem-write refactors | ⏳ Owed | 4 call sites in known-issues.md — stream to S3 instead |
| MySQL 8.0 compatibility pass | ⏳ Owed | Run RSpec against MySQL 8.0; fix breakages |
| GitHub Actions: build + push image | ⏳ Owed | Replaces Capistrano. Includes assets:precompile step with production master key as a BuildKit secret |
| Smoke test script for cutover | ⏳ Owed | Curl /up + a few critical endpoints + enqueue+process a test Sidekiq job |
| Decommission Capistrano files | ⏳ After cutover | Capfile, config/deploy*.rb, lib/capistrano/ |
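The "build-resilient credential lookups" row follows one pattern across the files listed there. A minimal sketch of that pattern, with a hypothetical helper name (the real wrappers are inline in the ERB config files): at image-build time the production master key isn't present, so any encrypted-credentials lookup raises; rescuing and substituting a placeholder lets build steps like assets:precompile succeed, with the real value injected at runtime.

```ruby
# Sketch of the rescue-wrapper pattern (helper name hypothetical).
# During the Docker image build, Rails.application.credentials raises
# because the master key is not baked into the image; we rescue and fall
# back to a placeholder so the build can proceed.
def credential_with_fallback(fallback = "build-placeholder")
  yield
rescue StandardError
  fallback
end

# Build-time behavior: the lookup raises, so the fallback is returned.
db_password = credential_with_fallback do
  raise "missing master key" # stands in for a Rails.application.credentials lookup
end
puts db_password # "build-placeholder"
```

At runtime the block succeeds and the real credential flows through untouched; the fallback only ever appears inside the build container.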
Senith (DevOps) owns¶
Everything outside the Git repo. The platform.
| Item | Notes |
|---|---|
| Kubernetes cluster (AKS or EKS) | Region, node pool sizing, autoscaling config |
| Container registry (ACR or ECR) | Plus image pull secret |
| Managed MySQL 8.0 | Instance class, storage, backups, multi-AZ |
| Managed Redis | Sized for our Sidekiq throughput |
| Ingress controller + TLS | nginx-ingress / app gateway, cert-manager |
| Secret store + sync | Azure Key Vault (or AWS Secrets Manager) + External Secrets Operator |
| Datadog Agent | DaemonSet on each node |
| Network: NAT gateway with static egress IP | For partner allowlists |
| DNS records | Map oms.popsockets.com to the new ingress |
| Argo CD or other rollout mechanism | How kubectl apply actually happens in CI |
| Helm chart / kustomize manifests | Deployments, Services, Ingress, ConfigMap, Secret refs, HPA |
| Pre-deploy db:migrate Job | Helm pre-upgrade hook or Argo Sync hook |
| Backups + DR plan | DB snapshots, S3 versioning |
Joint — both sides need to be in the room¶
| Item | Why both |
|---|---|
| Database migration cutover | Rails team owns app readiness + smoke tests; Senith owns the data move |
| Resource sizing per workload | Rails team pulls Datadog numbers; Senith translates to k8s requests/limits |
| Env var → Secret/ConfigMap mapping | Rails team owns the list (in env-vars.md); Senith owns how they're delivered |
| Ingress hostnames | Rails team needs them in RAILS_ALLOWED_HOSTS; Senith configures them on the ingress |
| Rollback plan | Rails team builds the smoke test gate; Senith builds the rollback mechanism |
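On the Rails side of the hostname handoff, the ingress hostnames arrive as a comma-separated RAILS_ALLOWED_HOSTS value. A sketch of how an env-gated hosts allowlist is typically parsed in config/environments/production.rb (the helper here is illustrative, not necessarily the repo's exact code):

```ruby
# Hypothetical parser for RAILS_ALLOWED_HOSTS, a comma-separated list of
# ingress hostnames supplied by the platform (e.g. "oms.popsockets.com").
def parse_allowed_hosts(raw)
  raw.to_s.split(",").map(&:strip).reject(&:empty?)
end

# In production.rb this would feed the host authorization middleware:
#   config.hosts.concat(parse_allowed_hosts(ENV["RAILS_ALLOWED_HOSTS"]))
```

Tolerating stray whitespace and empty segments means a hostname Senith adds to the ingress can be pasted into the Secret/ConfigMap without breaking host authorization.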
Decisions that block both sides¶
These need leadership input. Listed in priority order; see decisions-needed.md for full detail.
- Cloud target — Azure or AWS?
- Object storage — stay on S3 or move to Azure Blob?
- MySQL 8.0 cutover timing
- Static egress IP — confirm the partner list
Workflow once we start¶
1. Senith provisions the cluster + managed services in a sandbox/staging environment.
2. Rails team builds and pushes the image to the registry (manually first, GitHub Actions later).
3. Senith deploys the image as a Helm release with our manifests.
4. Rails team runs smoke tests against the staging deployment. Iterate until clean.
5. Repeat for sandbox, sandbox2, staging.
6. Rehearse the production migration in staging — full DB cutover, full smoke pass.
7. Production cutover during a scheduled maintenance window.
8. Old infra (Diff) remains a read-only standby for 1–2 weeks before decommission.
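The smoke-test steps above lean on the cutover script that's still owed. A minimal Ruby sketch of its HTTP half, under assumptions (the endpoint list and usage below are placeholders; the real script would also enqueue and verify a test Sidekiq job):

```ruby
require "net/http"
require "uri"

# Returns true if GET base+path answers with the expected status code.
def check(base, path, expect: 200)
  res = Net::HTTP.get_response(URI.join(base, path))
  res.code.to_i == expect
end

# Cutover usage (endpoint list illustrative):
#   base = "https://oms.popsockets.com"
#   abort "smoke FAIL" unless ["/up"].all? { |p| check(base, p) }
```

Keeping the check boolean-valued makes it easy to gate a rollback decision on the full list of critical endpoints, not just /up.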
What to escalate¶
- Anything in the encrypted credentials file that needs to leave it — flag to Senith so it ends up in the Secret store, not the image.
- Any Sidekiq job that runs > 25 seconds — flag to Senith so terminationGracePeriodSeconds is sized correctly.
- Any external integration with a static-IP allowlist on their side — needs the new NAT egress IP communicated.
- Any feature behind a Flipper flag that should be re-evaluated during the migration window.
- Any data we hold that has data-residency rules (PII, payment, etc.).
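The 25-second threshold above traces back to Sidekiq's graceful shutdown timeout in config/sidekiq.yml. A hedged sketch of the sizing rule (the helper and the 5-second buffer are assumptions, not the repo's code): the pod's terminationGracePeriodSeconds must cover the Sidekiq timeout plus slack for SIGTERM delivery and any preStop hook.

```ruby
require "yaml"

# Hypothetical helper: derive a terminationGracePeriodSeconds value from
# the :timeout key in a sidekiq.yml, plus a buffer for signal delivery.
# Defaults to Sidekiq's standard 25-second shutdown timeout if unset.
def recommended_grace_period(sidekiq_yml, buffer: 5)
  cfg = YAML.safe_load(sidekiq_yml, permitted_classes: [Symbol]) || {}
  timeout = cfg[:timeout] || cfg["timeout"] || 25
  timeout + buffer
end
```

Any job that legitimately outruns the timeout means raising both numbers together — which is exactly why long-running jobs get escalated to Senith rather than patched on one side only.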
How to talk about this with leadership¶
When the manager asks "what's left on your side?":
"Code changes: shipped. Production Dockerfile: shipped. Per-capsule Sidekiq config files: shipped. Documentation for Senith: shipped. Still owed: refactor of 4 filesystem-write call sites to stream to S3, run RSpec against MySQL 8.0, GitHub Actions workflow to build the image (including the asset-precompile step), and a smoke test script for cutover. None of those block Senith from starting — he can begin provisioning the cluster today."
That's a clean, accurate summary. Don't oversell what's done, don't understate what's left.