# Workloads
The OMS app is not a single process. The container image is the same for
every workload below — only the command differs. Each row should map to a
distinct k8s Deployment (or Job / CronJob) so they can be scaled,
restarted, and resourced independently.
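To illustrate the shared-image pattern, here is a minimal sketch of a pod-template container (the image reference and secret name are placeholders, not the real manifests): every workload points at the same image and only overrides `command`.

```yaml
# Sketch: pod-template container shared by every OMS workload.
# Image reference and secret name are placeholders; only `command` changes per Deployment.
containers:
  - name: app
    image: registry.example.com/oms:<git-sha>   # same image for web, sidekiq-*, and jobs
    command: ["bundle", "exec", "puma", "-C", "config/puma.rb"]   # per-workload override
    envFrom:
      - secretRef:
          name: oms-env                          # placeholder for the shared app config/secrets
```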
## Deployments
| Name | Command | Replicas | Notes |
|---|---|---|---|
| web | `bundle exec puma -C config/puma.rb` | 2+ (HPA on CPU) | HTTP on 3000. Probes hit `/up`. Tune via `WEB_CONCURRENCY` (workers) and `RAILS_MAX_THREADS` (threads). Pool size in `config/database.yml` is currently 30 — make sure DB max-connections covers replicas × workers × threads. |
| sidekiq-default | `bundle exec sidekiq -C config/sidekiq.default.yml` | 2+ | Default capsule, concurrency 5, queues `urgent`, `high`, `default`, `low`. `SIDEKIQ_SCHEDULER_ENABLED=false`. |
| sidekiq-limited | `bundle exec sidekiq -C config/sidekiq.limited.yml` | 1–2 | Concurrency 4, queues `nav`, `urgent`, `high`, `default`, `low`. The `nav` queue is exclusive to this Deployment. |
| sidekiq-single | `bundle exec sidekiq -C config/sidekiq.single.yml` | 1 (do not autoscale) | Concurrency 1, queues `camel`, `urgent`, `high`, `default`, `engine`. The `camel` queue handles Cirro 3PL exports and PrintStation order creation — these jobs are not safe to run concurrently. `replicas: 1` enforces cluster-wide single-thread processing. |
| sidekiq-scheduler | `bundle exec sidekiq -C config/sidekiq.scheduler.yml` | 1 (singleton, `strategy: Recreate`) | `SIDEKIQ_SCHEDULER_ENABLED=true`. Hosts the sidekiq-scheduler thread. If two run, every scheduled job fires twice. Schedules are stored in the DB-backed `RecurringJob` model — see scheduled-jobs.md. |
About per-capsule config files: the original `config/sidekiq.yml` defines all three capsules together (used by the current Capistrano deploy). For k8s, the Rails team has split it into four per-Deployment files (`sidekiq.default.yml`, `sidekiq.limited.yml`, `sidekiq.single.yml`, `sidekiq.scheduler.yml`) so each Deployment runs exactly one capsule and can be scaled and resourced independently. The original `sidekiq.yml` is unchanged, so the Capistrano deploy is unaffected.
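As a concrete example of the singleton rows, here is a sketch of what the sidekiq-scheduler Deployment could look like (names, labels, and the image reference are placeholders). `replicas: 1` plus `strategy: Recreate` ensures the old scheduler pod is gone before the new one starts, so scheduled jobs don't fire twice during a rollout.

```yaml
# Sketch only; metadata, labels, and the image reference are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: oms-sidekiq-scheduler
spec:
  replicas: 1                      # singleton: two schedulers means every scheduled job fires twice
  strategy:
    type: Recreate                 # stop the old pod before the new one starts
  selector:
    matchLabels: { app: oms, workload: sidekiq-scheduler }
  template:
    metadata:
      labels: { app: oms, workload: sidekiq-scheduler }
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: sidekiq
          image: registry.example.com/oms:<git-sha>
          command: ["bundle", "exec", "sidekiq", "-C", "config/sidekiq.scheduler.yml"]
          env:
            - name: SIDEKIQ_SCHEDULER_ENABLED
              value: "true"
```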
## Jobs
| Name | When | Command | Notes |
|---|---|---|---|
| db-migrate | Pre-deploy hook | `bundle exec rails db:migrate` | Runs against the target DB before web/worker pods roll. Use a Helm pre-upgrade hook or Argo Sync hook. Must complete before new pods start serving. |
| assets-precompile | CI build-time | `bundle exec rails assets:precompile` | Runs in the CI image-build pipeline (GitHub Actions), not in this repo's Dockerfile. The CI step passes the production `RAILS_MASTER_KEY` via BuildKit secret (`--mount=type=secret`) so the key never lands in an image layer. See the comment in docker/production/Dockerfile for the exact pattern. |
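A sketch of the db-migrate Job wired up as a Helm pre-install/pre-upgrade hook (the resource name, secret name, and image reference are placeholders; an Argo CD Sync hook would use its `argocd.argoproj.io/hook` annotation instead):

```yaml
# Sketch only; names, secret, and image reference are placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: oms-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade        # run before new web/worker pods roll
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 0                                  # a failed migration should fail the release
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: db-migrate
          image: registry.example.com/oms:<git-sha>
          command: ["bundle", "exec", "rails", "db:migrate"]
          envFrom:
            - secretRef:
                name: oms-env                      # placeholder: DB credentials etc.
```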
## CronJobs (optional alternative to sidekiq-scheduler)
The current design uses sidekiq-scheduler with schedules in the DB. We can
keep that (singleton scheduler pod) or convert recurring jobs to k8s
CronJobs. Tradeoffs:
- Keep sidekiq-scheduler: schedules live in the DB, business users edit via UI, no k8s changes per schedule change. Single point of failure (the scheduler pod), but the queue still drains if it briefly restarts.
- Move to k8s CronJobs: visibility in `kubectl get cronjob`, native retries/concurrency policy, but every schedule change is a k8s manifest change. Loses dynamic-via-DB editing.
Recommendation: keep sidekiq-scheduler for now (no business workflow change), revisit later.
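For reference only (the recommendation above is to keep sidekiq-scheduler), a sketch of what one recurring job would look like as a k8s CronJob; the job name, schedule, and rake task here are hypothetical, not taken from the real schedule list:

```yaml
# Sketch only; name, schedule, and rake task are hypothetical.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: oms-nightly-report
spec:
  schedule: "0 2 * * *"            # 02:00 every day
  concurrencyPolicy: Forbid        # native "don't overlap" policy
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      backoffLimit: 1              # native retry policy
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: job
              image: registry.example.com/oms:<git-sha>
              command: ["bundle", "exec", "rails", "reports:nightly"]   # hypothetical task
```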
## Resource profile (TBD — needs production data)
We need to pull real numbers from Datadog before sizing. Placeholder targets based on typical Rails/Sidekiq:
| Workload | Request CPU | Limit CPU | Request RAM | Limit RAM |
|---|---|---|---|---|
| web (per pod) | 500m | 1500m | 768Mi | 1.5Gi |
| sidekiq-default | 300m | 1000m | 1Gi | 2Gi |
| sidekiq-limited | 250m | 800m | 768Mi | 1.5Gi |
| sidekiq-single | 200m | 600m | 512Mi | 1Gi |
| sidekiq-scheduler | 100m | 300m | 384Mi | 768Mi |
Action item for Rails team: pull P95 CPU/RAM per process from the last 30 days of Datadog and replace the table above before final rollout.
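Whatever the final numbers turn out to be, they map one-to-one onto each container's `resources` block; for example, the placeholder web row above translates to:

```yaml
# Web container, using the placeholder numbers above; replace after the Datadog review.
resources:
  requests:
    cpu: 500m
    memory: 768Mi
  limits:
    cpu: 1500m
    memory: 1536Mi   # 1.5Gi
```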
## Termination grace periods
- web: `terminationGracePeriodSeconds: 30` — Puma drains in-flight requests on SIGTERM.
- sidekiq-*: `terminationGracePeriodSeconds: 60` — must be ≥ Sidekiq's `:timeout: 25` (set in every per-capsule config file) plus margin. If we discover jobs that take >25s, raise `:timeout` AND the grace period together.
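Stated as config, the invariant looks like this (the Sidekiq side lives in each per-capsule file, the k8s side in each sidekiq-* pod spec):

```yaml
# config/sidekiq.default.yml (excerpt); the same :timeout: is set in every per-capsule file.
:timeout: 25          # seconds Sidekiq waits for running jobs to finish after TERM
---
# sidekiq-* Deployment pod spec (excerpt)
spec:
  terminationGracePeriodSeconds: 60   # keep >= :timeout: plus margin; raise both together
```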
## Pod disruption budgets
- web: `minAvailable: 1`
- sidekiq-default / sidekiq-limited: `minAvailable: 1`
- sidekiq-single / sidekiq-scheduler: do not set a PDB requiring `minAvailable` > 0 — they are 1-replica singletons, and such a PDB blocks node drains.
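A sketch of the web PDB, assuming the placeholder labels from the Deployment sketches above; sidekiq-default and sidekiq-limited would get the same shape with their own selectors:

```yaml
# Sketch only; labels are placeholders matching the Deployment sketches above.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: oms-web
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: oms
      workload: web
```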