Known Issues — Things That Break Under Kubernetes¶
Things in the codebase that won't behave correctly in a containerized, multi-pod environment. Each item lists the call site, the symptom, and the fix path.
1. Local filesystem writes (will silently lose data)¶
Several jobs write to the pod's local filesystem. Pods are ephemeral (these files vanish on restart) and don't share disk, so in a multi-replica worker tier one pod can't read a file another pod wrote.
| Call site | What it writes | Fix |
|---|---|---|
| app/utilities/drive_commerce_utility.rb:190 | Recipe assets fetched from Drive Commerce, written via File.open(saved_file_path, 'wb') | Stream straight to S3 (the file is later uploaded anyway) |
| app/jobs/batching/print_station_upload_job.rb:199 | Per-batch directories under a working dir, via FileUtils.mkdir_p(sub_batch_directory) | Use a Tempfile / Dir.mktmpdir (lives under /tmp, fine within a single job execution) and upload pieces to S3 before the job ends |
| app/helpers/orders_helper.rb:223 | FileUtils.mkdir_p(sub_directory) for order-related artifacts | Same approach |
| app/services/batching/generate_csv.rb:101 | FileUtils.mkdir_p(@local_root_csv_directory) for batched CSVs | Same approach: write to a Tempfile, upload to S3 |
Interim mitigation if we can't refactor before cutover: mount an
emptyDir volume at the working path. This keeps the writes per-pod
(still ephemeral) but at least the path exists and is writable. Only safe
if no two pods need to read each other's files — confirm with Rails team
per call site before relying on this.
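The Tempfile-route described in the table can be sketched roughly as below. The method name, CSV layout, and `uploader` interface are illustrative, not existing code; in production the uploader would wrap Aws::S3::Client#put_object from aws-sdk-s3, injected here so the sketch stays self-contained.

```ruby
require "tmpdir"

# Sketch: build batch CSVs inside a per-job temp dir and hand each finished
# file to an uploader before the job ends. `uploader` is any object
# responding to #upload(key, io); returns the uploaded S3 keys.
def generate_and_upload_csvs(rows_by_batch, uploader)
  Dir.mktmpdir("batch-csvs") do |dir|        # removed automatically on block exit
    rows_by_batch.map do |batch_id, rows|
      path = File.join(dir, "#{batch_id}.csv")
      File.write(path, rows.map { |r| r.join(",") }.join("\n"))
      key = "batches/#{batch_id}.csv"
      File.open(path, "rb") { |io| uploader.upload(key, io) }
      key
    end
  end                                         # nothing persists on pod disk
end
```

Because everything lives under /tmp for the duration of a single job execution and the durable copy goes to S3, no other pod ever needs to read these files, which is exactly the property the emptyDir mitigation can't guarantee.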
2. Datadog StatsD destination was hardcoded¶
config/initializers/datadog.rb:43
previously hardcoded Datadog::Statsd.new('localhost', 8125, ...).
Fixed in this branch: now reads DD_AGENT_HOST / DD_DOGSTATSD_PORT
with localhost / 8125 as defaults so the existing Capistrano deploy is
unaffected.
For k8s, the standard pattern with the Datadog Agent as a DaemonSet is to inject the host node IP via the downward API.
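That downward-API injection looks like this in the pod template (container name is illustrative):

```yaml
# Point DogStatsD at the node-local Datadog Agent DaemonSet by injecting
# the node IP via the downward API.
containers:
  - name: web
    env:
      - name: DD_AGENT_HOST
        valueFrom:
          fieldRef:
            fieldPath: status.hostIP
      - name: DD_DOGSTATSD_PORT
        value: "8125"
```
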
If running the agent as a sidecar instead, leave DD_AGENT_HOST=localhost.
3. Logging to a file by default¶
Was: config/initializers/semantic_logger.rb
wrote JSON to log/<env>.log for any non-development environment. In k8s
that file is on an ephemeral pod disk and no log shipper picks it up.
Fixed in this branch: when RAILS_LOG_TO_STDOUT=1 is set, semantic_logger
writes JSON to stdout. Behavior unchanged when the var is unset (current
Capistrano deploy keeps using files).
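The gating presumably reduces to something like this in the initializer (a sketch of the logic described above, not the exact code):

```ruby
# config/initializers/semantic_logger.rb (sketch)
if ENV["RAILS_LOG_TO_STDOUT"] == "1"
  # Containers: emit JSON on stdout so the cluster log shipper picks it up
  SemanticLogger.add_appender(io: $stdout, formatter: :json)
else
  # Capistrano hosts: keep the existing per-environment log file
  SemanticLogger.add_appender(file_name: "log/#{Rails.env}.log", formatter: :json)
end
```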
4. Wkhtmltopdf binary path¶
The pdfkit and imgkit initializers respect WKHTMLTOPDF_PATH /
WKHTMLTOIMAGE_PATH env vars. The production Dockerfile already sets them
to /usr/bin/wkhtmltopdf and /usr/bin/wkhtmltoimage (from the Debian
package). No further action required.
5. ActiveStorage region/bucket hardcoding¶
Was: config/storage.yml and
config/initializers/aws_s3.rb
hardcoded us-east-1 and read keys from Rails credentials only.
Fixed in this branch: env vars (AWS_REGION, AWS_S3_BUCKET,
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) take precedence; if no access
key is set, aws-sdk falls back to its default credential chain (IRSA / pod
identity). This is the recommended config for k8s.
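The precedence rule amounts to this: only pass static credentials when the env vars are present, and otherwise leave the :credentials options out entirely so the SDK walks its default chain. A hypothetical sketch (the helper name is illustrative; the result would feed Aws::S3::Client.new):

```ruby
# Sketch: env vars win; omitting static keys lets aws-sdk fall back to its
# default credential chain (IRSA / pod identity / instance profile).
def s3_client_options(env = ENV)
  opts = { region: env.fetch("AWS_REGION", "us-east-1") }
  if env["AWS_ACCESS_KEY_ID"] && env["AWS_SECRET_ACCESS_KEY"]
    opts[:access_key_id]     = env["AWS_ACCESS_KEY_ID"]
    opts[:secret_access_key] = env["AWS_SECRET_ACCESS_KEY"]
  end
  opts
end
```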
6. Force-SSL was off¶
Was: config.force_ssl = false in
config/environments/production.rb.
Fixed in this branch: gated on RAILS_FORCE_SSL and RAILS_ASSUME_SSL.
For k8s, set both to 1 so the app:
- Redirects HTTP → HTTPS at the app layer (defense in depth behind the
ingress)
- Sets Strict-Transport-Security, Secure cookie flag
- Trusts the X-Forwarded-Proto: https header from the ingress so it
doesn't see incoming requests as plain HTTP and redirect-loop
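The gating presumably looks like this in production.rb (a sketch; note that config.assume_ssl only exists in Rails 7.1+, so confirm the app's Rails version before relying on it):

```ruby
# config/environments/production.rb (sketch)
config.force_ssl  = ENV["RAILS_FORCE_SSL"] == "1"   # redirect + HSTS + Secure cookies
config.assume_ssl = ENV["RAILS_ASSUME_SSL"] == "1"  # trust X-Forwarded-Proto from the ingress
```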
7. Hosts allowlist¶
Rails (with config.load_defaults 7.0) requires explicit config.hosts.
Without it, the app rejects all requests with "Blocked hosts" errors when
hit on a hostname Rails doesn't recognize.
Fixed in this branch: config.hosts is read from the RAILS_ALLOWED_HOSTS env
var; set it to a CSV of accepted hostnames (e.g.
oms.popsockets.com,oms-internal.popsockets.com).
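Parsing that CSV is trivial but worth pinning down (helper name is illustrative; in the app the result would feed Rails.application.config.hosts):

```ruby
# Sketch: turn the CSV env var into a hosts array, tolerating stray
# whitespace and a missing/empty variable.
def allowed_hosts(value)
  value.to_s.split(",").map(&:strip).reject(&:empty?)
end
```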
8. Capistrano deploy assumptions¶
config/deploy.rb and
config/deploy/ reference RVM paths
(/usr/local/rvm/gems/ruby-3.4.8/wrappers), Passenger (sudo
passenger-config restart-app), and shared symlinks for the master key.
None of this applies in k8s.
Plan: keep the Capistrano files in place during dual-running of the old infra and the new k8s deploy. Delete after cutover.
9. Sidekiq Web admin UI exposure¶
config/routes.rb mounts /sidekiq, /flipper,
/event_store behind a CanAccessInternalConfigUI constraint. Verify with
Rails team that this constraint still gates the right people in the new
auth setup before the ingress is exposed.
10. MySQL 5.7 in dev/compose¶
docker-compose.yml uses MySQL 5.7, which has been
EOL since October 2023. Production should target MySQL 8.0 / Aurora
MySQL 8 on the new infra. Run the test suite against 8.0 before cutover —
the schema may have minor incompatibilities (mostly around utf8mb4
collations and reserved words).
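One low-friction way to run the suite against 8.0 before cutover is a compose override file; the service name, image tag, and collation flags below are suggestions to be checked against the actual docker-compose.yml, not settled choices:

```yaml
# docker-compose.override.yml (sketch): swap in MySQL 8.0 for local test runs
services:
  db:
    image: mysql:8.0
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_0900_ai_ci
```
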