
Workloads

The OMS app is not a single process. The container image is the same for every workload below — only the command differs. Each row should map to a distinct k8s Deployment (or Job / CronJob) so they can be scaled, restarted, and resourced independently.

Deployments

| Name | Command | Replicas | Notes |
| --- | --- | --- | --- |
| web | `bundle exec puma -C config/puma.rb` | 2+ (HPA on CPU) | HTTP on 3000. Probes hit `/up`. Tune via `WEB_CONCURRENCY` (workers) and `RAILS_MAX_THREADS` (threads). Pool size in `config/database.yml` is currently 30; make sure DB max-connections covers replicas × workers × threads. |
| sidekiq-default | `bundle exec sidekiq -C config/sidekiq.default.yml` | 2+ | Default capsule, concurrency 5, queues `urgent`, `high`, `default`, `low`. `SIDEKIQ_SCHEDULER_ENABLED=false`. |
| sidekiq-limited | `bundle exec sidekiq -C config/sidekiq.limited.yml` | 1–2 | Concurrency 4, queues `nav`, `urgent`, `high`, `default`, `low`. The `nav` queue is exclusive to this Deployment. |
| sidekiq-single | `bundle exec sidekiq -C config/sidekiq.single.yml` | 1 (do not autoscale) | Concurrency 1, queues `camel`, `urgent`, `high`, `default`, `engine`. The `camel` queue handles Cirro 3PL exports and PrintStation order creation; these jobs are not safe to run concurrently. `replicas: 1` enforces cluster-wide single-thread processing. |
| sidekiq-scheduler | `bundle exec sidekiq -C config/sidekiq.scheduler.yml` | 1 (singleton, `strategy: Recreate`) | `SIDEKIQ_SCHEDULER_ENABLED=true`. Hosts the sidekiq-scheduler thread. If two run, every scheduled job fires twice. Schedules are stored in the DB-backed `RecurringJob` model; see scheduled-jobs.md. |
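The singleton rows translate into manifests along these lines. This is a sketch only: the image reference, labels, and resource names are assumptions for illustration, not the real chart values.

```yaml
# Sketch: sidekiq-scheduler as a 1-replica singleton Deployment.
# Image name and label scheme are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidekiq-scheduler
spec:
  replicas: 1
  strategy:
    type: Recreate            # never two schedulers alive at once, even mid-rollout
  selector:
    matchLabels: { app: oms, workload: sidekiq-scheduler }
  template:
    metadata:
      labels: { app: oms, workload: sidekiq-scheduler }
    spec:
      containers:
        - name: sidekiq
          image: oms:latest   # assumption: the shared app image from the table above
          command: ["bundle", "exec", "sidekiq", "-C", "config/sidekiq.scheduler.yml"]
          env:
            - name: SIDEKIQ_SCHEDULER_ENABLED
              value: "true"
```

`strategy: Recreate` matters here: the default `RollingUpdate` briefly runs old and new pods side by side, which is exactly the double-firing scenario the table warns about.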

About per-capsule config files: the original config/sidekiq.yml defines all three capsules together (used by the current Capistrano deploy). For k8s, the Rails team has split it into four per-Deployment files (sidekiq.default.yml, sidekiq.limited.yml, sidekiq.single.yml, sidekiq.scheduler.yml) so each Deployment runs exactly one capsule and can be scaled and resourced independently. The original sidekiq.yml is unchanged so the Capistrano deploy is unaffected.
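For orientation, a per-capsule file such as `sidekiq.single.yml` would look roughly like the following. The queue list and the `:timeout: 25` value come from this document; any other options are assumptions, and the actual file in the repo is authoritative.

```yaml
# config/sidekiq.single.yml (sketch; real repo file is authoritative)
:concurrency: 1   # single-threaded worker; combined with replicas: 1 above
:timeout: 25      # must stay below the pod's terminationGracePeriodSeconds
:queues:
  - camel         # Cirro 3PL exports, PrintStation order creation
  - urgent
  - high
  - default
  - engine
```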

Jobs

| Name | When | Command | Notes |
| --- | --- | --- | --- |
| db-migrate | Pre-deploy hook | `bundle exec rails db:migrate` | Runs against the target DB before web/worker pods roll. Use a Helm pre-upgrade hook or Argo Sync hook. Must complete before new pods start serving. |
| assets-precompile | CI build-time | `bundle exec rails assets:precompile` | Runs in the CI image-build pipeline (GitHub Actions), not in this repo's Dockerfile. The CI step passes the production `RAILS_MASTER_KEY` via BuildKit secret (`--mount=type=secret`) so the key never lands in an image layer. See the comment in docker/production/Dockerfile for the exact pattern. |
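If we go the Helm route, the db-migrate hook could be sketched like this (hook annotations are standard Helm; the image reference is an assumption):

```yaml
# Sketch: run migrations as a Helm pre-upgrade hook Job.
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation  # clean up the old Job before each run
spec:
  backoffLimit: 0             # fail the release rather than retry a half-applied migration
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: oms:latest   # assumption: shared app image
          command: ["bundle", "exec", "rails", "db:migrate"]
```

With `backoffLimit: 0`, a failed migration fails the Helm upgrade outright, so the old pods keep serving against the old schema.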

CronJobs (optional alternative to sidekiq-scheduler)

The current design uses sidekiq-scheduler with schedules in the DB. We can keep that (singleton scheduler pod) or convert recurring jobs to k8s CronJobs. Tradeoffs:

  • Keep sidekiq-scheduler: schedules live in the DB, business users edit via UI, no k8s changes per schedule change. Single point of failure (the scheduler pod), but the queue still drains if it briefly restarts.
  • Move to k8s CronJobs: visibility in kubectl get cronjob, native retries/concurrency policy, but every schedule change is a k8s manifest change. Loses dynamic-via-DB editing.

Recommendation: keep sidekiq-scheduler for now (no business workflow change), revisit later.
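Should we revisit, a converted schedule would look roughly like this. The job name, schedule, and runner invocation are made up for illustration; only the `concurrencyPolicy` mechanism is the point.

```yaml
# Sketch: one recurring job as a native k8s CronJob (hypothetical job name/schedule).
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report            # hypothetical
spec:
  schedule: "0 2 * * *"           # hypothetical: 02:00 daily
  concurrencyPolicy: Forbid       # native guard against overlapping runs
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: job
              image: oms:latest   # assumption: shared app image
              command: ["bundle", "exec", "rails", "runner", "NightlyReportJob.perform_now"]  # hypothetical class
```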

Resource profile (TBD — needs production data)

We need to pull real numbers from Datadog before sizing. Placeholder targets based on typical Rails/Sidekiq:

| Workload | Request CPU | Limit CPU | Request RAM | Limit RAM |
| --- | --- | --- | --- | --- |
| web (per pod) | 500m | 1500m | 768Mi | 1.5Gi |
| sidekiq-default | 300m | 1000m | 1Gi | 2Gi |
| sidekiq-limited | 250m | 800m | 768Mi | 1.5Gi |
| sidekiq-single | 200m | 600m | 512Mi | 1Gi |
| sidekiq-scheduler | 100m | 300m | 384Mi | 768Mi |
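Once real Datadog numbers replace the placeholders, each row maps onto its Deployment's container spec like so (web shown):

```yaml
# Sketch: web row from the table above as a container resources block.
resources:
  requests:
    cpu: 500m
    memory: 768Mi
  limits:
    cpu: 1500m
    memory: 1536Mi   # 1.5Gi
```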

Action item for Rails team: pull P95 CPU/RAM per process from the last 30 days of Datadog and replace the table above before final rollout.

Termination grace periods

  • web: terminationGracePeriodSeconds: 30 — Puma drains in-flight requests on SIGTERM.
  • sidekiq-*: terminationGracePeriodSeconds: 60 — must be ≥ Sidekiq's :timeout: 25 (set in every per-capsule config file) plus margin. If we discover jobs that take >25s, raise :timeout AND the grace period together.
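In the pod template spec, the Sidekiq pairing reads as a one-liner whose comment encodes the invariant above:

```yaml
# Sketch: sidekiq-* pod template spec fragment.
spec:
  terminationGracePeriodSeconds: 60  # keep well above :timeout: 25 in the capsule config
```

On SIGTERM, Sidekiq stops fetching and gives in-flight jobs `:timeout:` seconds to finish; if the kubelet's grace period is shorter, pods get SIGKILLed mid-job.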

Pod disruption budgets

  • web: minAvailable: 1
  • sidekiq-default / sidekiq-limited: minAvailable: 1
  • sidekiq-single / sidekiq-scheduler: do not set a PDB requiring minAvailable > 0 — they are 1-replica singletons, and such a PDB blocks node drains.
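Concretely, the web PDB could be sketched as follows (the label selector is an assumption; it must match whatever labels the web Deployment actually uses):

```yaml
# Sketch: PDB keeping at least one web pod up through voluntary disruptions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
spec:
  minAvailable: 1
  selector:
    matchLabels: { app: oms, workload: web }  # assumption: label scheme
```

The same shape applies to sidekiq-default and sidekiq-limited; the singletons deliberately get no PDB, since `minAvailable: 1` on a 1-replica workload would make every node drain hang.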