
Rails Team's Role in This Migration

A clean split of responsibilities so Senith (DevOps) and the Rails team don't trip over each other or miss things.

Rails team owns

The application layer. Anything inside this Git repo, plus the artifact that gets deployed.

| Deliverable | Status | Notes |
| --- | --- | --- |
| Production Dockerfile | ✅ Done | docker/production/Dockerfile |
| .dockerignore | ✅ Done | .dockerignore |
| /up health endpoint | ✅ Done | config/routes.rb |
| Logs to stdout (env-gated) | ✅ Done | config/initializers/semantic_logger.rb |
| Force SSL + hosts allowlist (env-gated) | ✅ Done | config/environments/production.rb |
| ActiveStorage env-driven | ✅ Done | config/storage.yml, config/initializers/aws_s3.rb |
| Datadog StatsD env-driven | ✅ Done | config/initializers/datadog.rb |
| Sidekiq graceful shutdown timeout | ✅ Done | config/sidekiq.yml |
| Per-capsule Sidekiq YAML slices | ✅ Done | config/sidekiq.default.yml, config/sidekiq.limited.yml, config/sidekiq.single.yml, config/sidekiq.scheduler.yml |
| Build-resilient credential lookups | ✅ Done | rescue wrappers in config/database.yml, config/secrets.yml, config/storage.yml, and several initializers |
| Inventory docs for Senith | ✅ Done | this folder, plus the local-dev/ validation harness in the OMS repo |
| Filesystem-write refactors | ⏳ Owed | 4 call sites in known-issues.md — stream to S3 instead |
| MySQL 8.0 compatibility pass | ⏳ Owed | Run RSpec against MySQL 8.0; fix breakages |
| GitHub Actions: build + push image | ⏳ Owed | Replaces Capistrano. Includes assets:precompile step with production master key as a BuildKit secret |
| Smoke test script for cutover | ⏳ Owed | Curl /up + a few critical endpoints + enqueue and process a test Sidekiq job |
| Decommission Capistrano files | ⏳ After cutover | Capfile, config/deploy*.rb, lib/capistrano/ |
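The "build-resilient credential lookups" row refers to a simple pattern: credential reads are wrapped in a rescue so that steps which run without the master key (e.g. assets:precompile inside the Docker build) fall back instead of crashing. A minimal sketch of the idea; the helper name, fallback keyword, and rescue list are illustrative, not the repo's actual code:

```ruby
# Illustrative pattern only; the real rescue wrappers live inline in
# config/database.yml, config/secrets.yml, config/storage.yml, etc.
# Returns the credential, or the fallback when decryption fails
# (e.g. no RAILS_MASTER_KEY present during the image build).
def credential(*path, fallback: nil)
  Rails.application.credentials.dig(*path) || fallback
rescue StandardError
  fallback
end

# Typical call site in an ERB-templated config file:
#   password: <%= credential(:mysql, :password, fallback: ENV["MYSQL_PASSWORD"]) %>
```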

Senith (DevOps) owns

Everything outside the Git repo. The platform.

| Item | Notes |
| --- | --- |
| Kubernetes cluster (AKS or EKS) | Region, node pool sizing, autoscaling config |
| Container registry (ACR or ECR) | Plus image pull secret |
| Managed MySQL 8.0 | Instance class, storage, backups, multi-AZ |
| Managed Redis | Sized for our Sidekiq throughput |
| Ingress controller + TLS | nginx-ingress / app gateway, cert-manager |
| Secret store + sync | Azure Key Vault (or AWS Secrets Manager) + External Secrets Operator |
| Datadog Agent | DaemonSet on each node |
| Network: NAT gateway with static egress IP | For partner allowlists |
| DNS records | Map oms.popsockets.com to the new ingress |
| Argo CD or other rollout mechanism | How kubectl apply actually happens in CI |
| Helm chart / kustomize manifests | Deployments, Services, Ingress, ConfigMap, Secret refs, HPA |
| Pre-deploy db:migrate Job | Helm pre-upgrade hook or Argo sync hook |
| Backups + DR plan | DB snapshots, S3 versioning |
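The pre-deploy db:migrate Job is typically wired up as a Helm pre-upgrade hook that runs the app image with the migrate command before new pods roll out. A hedged sketch only: the names, chart values, and secret ref below are placeholders for Senith to fill in, not a real manifest from this project:

```yaml
# Illustrative manifest; names, values, and the secret ref are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: oms-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          command: ["bundle", "exec", "rails", "db:migrate"]
          envFrom:
            - secretRef:
                name: oms-app-secrets
```

The hook-delete-policy keeps stale Jobs from blocking the next release; the Argo CD equivalent is a PreSync hook.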

Joint — both sides need to be in the room

| Item | Why both |
| --- | --- |
| Database migration cutover | Rails team owns app readiness + smoke tests; Senith owns the data move |
| Resource sizing per workload | Rails team pulls Datadog numbers; Senith translates to k8s requests/limits |
| Env var → Secret/ConfigMap mapping | Rails team owns the list (in env-vars.md); Senith owns how they're delivered |
| Ingress hostnames | Rails team needs them in RAILS_ALLOWED_HOSTS; Senith configures them on the ingress |
| Rollback plan | Rails team builds the smoke test gate; Senith builds the rollback mechanism |
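The ingress-hostname handoff is mechanical on the Rails side: the env-gated hosts allowlist parses a comma-separated variable and feeds it to Rails host authorization. A sketch, assuming RAILS_ALLOWED_HOSTS is the comma-separated list named above; the parsing helper is illustrative, not the repo's exact initializer:

```ruby
# Illustrative; the real env-gated version lives in config/environments/production.rb.
# Splits "a.example.com, b.example.com" into ["a.example.com", "b.example.com"],
# tolerating nil, stray whitespace, and trailing commas.
def parse_allowed_hosts(value)
  value.to_s.split(",").map(&:strip).reject(&:empty?)
end

# In production.rb (sketch):
#   hosts = parse_allowed_hosts(ENV["RAILS_ALLOWED_HOSTS"])
#   config.hosts.concat(hosts) if hosts.any?
```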

Decisions that block both sides

These need leadership input. Listed in priority order; see decisions-needed.md for full detail.

  1. Cloud target — Azure or AWS?
  2. Object storage — stay on S3 or move to Azure Blob?
  3. MySQL 8.0 cutover timing
  4. Static egress IP — confirm the partner list

Workflow once we start

  1. Senith provisions the cluster + managed services in a sandbox/staging environment.
  2. Rails team builds and pushes the image to the registry (manually first, GitHub Actions later).
  3. Senith deploys the image as a Helm release with our manifests.
  4. Rails team runs smoke tests against the staging deployment. Iterate until clean.
  5. Repeat for sandbox, sandbox2, staging.
  6. Rehearse production migration in staging — full DB cutover, full smoke pass.
  7. Production cutover during scheduled maintenance window.
  8. Old infra (Diff) remains read-only standby for 1–2 weeks before decommission.

What to escalate

  • Anything in the encrypted credentials file that needs to leave it — flag to Senith so it ends up in the Secret store, not the image.
  • Any Sidekiq job that runs > 25 seconds — flag to Senith so the terminationGracePeriodSeconds is sized correctly.
  • Any external integration with a static-IP allowlist on their side — needs the new NAT egress IP communicated.
  • Any feature behind a Flipper flag that should be re-evaluated during the migration window.
  • Any data we hold that has data-residency rules (PII, payment, etc.).
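The 25-second Sidekiq threshold above exists because two shutdown timeouts have to nest: Sidekiq's own shutdown timeout must fit inside the pod's terminationGracePeriodSeconds, or jobs still running at SIGKILL get cut off mid-run. A sketch of the relationship (the numbers are illustrative; check config/sidekiq.yml for the real value):

```yaml
# config/sidekiq.yml (illustrative value): how long Sidekiq waits after
# SIGTERM for in-flight jobs to finish before pushing them back to Redis.
:timeout: 25

# Kubernetes Deployment (Senith's side) needs headroom beyond that, e.g.:
#
#   spec:
#     template:
#       spec:
#         terminationGracePeriodSeconds: 30   # > Sidekiq timeout, plus buffer
```

Any job that regularly runs longer than the Sidekiq timeout either gets both numbers raised or gets re-enqueued on every deploy, which is why those jobs must be flagged.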

How to talk about this with leadership

When the manager asks "what's left on your side?":

"Code changes: shipped. Production Dockerfile: shipped. Per-capsule Sidekiq config files: shipped. Documentation for Senith: shipped. Still owed: refactor of 4 filesystem-write call sites to stream to S3, run RSpec against MySQL 8.0, GitHub Actions workflow to build the image (including the asset-precompile step), and a smoke test script for cutover. None of those block Senith from starting — he can begin provisioning the cluster today."

That's a clean, accurate summary. Don't oversell what's done, don't understate what's left.