
Known Issues — Things That Break Under Kubernetes

Things in the codebase that won't behave correctly in a containerized, multi-pod environment. Each item lists the call site, the symptom, and the fix path.

1. Local filesystem writes (will silently lose data)

Several jobs write to the pod's local filesystem. Pods are ephemeral — these files vanish on restart — and pods don't share disk, so in a multi-replica worker tier one pod cannot read a file another pod wrote.

| Call site | What it writes | Fix |
| --- | --- | --- |
| `app/utilities/drive_commerce_utility.rb:190` | Recipe assets fetched from Drive Commerce, written via `File.open(saved_file_path, 'wb')` | Stream straight to S3 (the file is later uploaded anyway). |
| `app/jobs/batching/print_station_upload_job.rb:199` | Per-batch directories under a working dir, via `FileUtils.mkdir_p(sub_batch_directory)` | Use a `Tempfile` / `Dir.mktmpdir` (lives under `/tmp`, fine within a single job execution) and upload pieces to S3 before the job ends. |
| `app/helpers/orders_helper.rb:223` | `FileUtils.mkdir_p(sub_directory)` for order-related artifacts | Same approach. |
| `app/services/batching/generate_csv.rb:101` | `FileUtils.mkdir_p(@local_root_csv_directory)` for batched CSVs | Same approach — write to a `Tempfile`, upload to S3. |
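The `Dir.mktmpdir`-and-upload pattern from the table can be sketched as follows. The uploader is injected so the temp-dir lifecycle is the only concern here; in the real jobs it would be an S3 `put_object` call, and all names are illustrative, not the app's actual code.

```ruby
require "tmpdir"

# Sketch: write per-batch files under a throwaway temp dir and hand each
# one to an uploader (e.g. an S3 client) before the job returns. Nothing
# outlives the block, so pod restarts can't strand files on local disk.
def with_batch_workspace(batch_id, pieces, uploader)
  written = []
  Dir.mktmpdir("batch-#{batch_id}-") do |dir|
    pieces.each do |name, contents|
      path = File.join(dir, name)
      File.binwrite(path, contents)
      uploader.call("batches/#{batch_id}/#{name}", path)
      written << path
    end
  end # dir and its contents are deleted here
  written # every path in this list is already gone from local disk
end
```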

Interim mitigation if we can't refactor before cutover: mount an emptyDir volume at the working path. This keeps the writes per-pod (still ephemeral) but at least the path exists and is writable. Only safe if no two pods need to read each other's files — confirm with Rails team per call site before relying on this.
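The mitigation above as a pod-spec fragment. The mount path and volume name are illustrative — match each call site's actual working dir:

```yaml
# Interim only: per-pod scratch space so the working path exists and is
# writable. Does NOT make files visible across pods.
spec:
  volumes:
    - name: batch-scratch
      emptyDir: {}
  containers:
    - name: app
      volumeMounts:
        - name: batch-scratch
          mountPath: /app/tmp/working
```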

2. Datadog StatsD destination was hardcoded

config/initializers/datadog.rb:43 previously hardcoded Datadog::Statsd.new('localhost', 8125, ...). Fixed in this branch: now reads DD_AGENT_HOST / DD_DOGSTATSD_PORT with localhost / 8125 as defaults so the existing Capistrano deploy is unaffected.
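A minimal sketch of that env-driven resolution (the hash-based `env` parameter is an illustration to keep it testable outside the app; the initializer itself just reads `ENV`):

```ruby
# Resolve the StatsD destination from env, keeping the legacy
# localhost:8125 defaults so Capistrano deploys are unaffected.
def statsd_destination(env = ENV)
  host = env.fetch("DD_AGENT_HOST", "localhost")
  port = Integer(env.fetch("DD_DOGSTATSD_PORT", "8125"))
  [host, port]
end

# In the initializer (dogstatsd-ruby):
#   host, port = statsd_destination
#   STATSD = Datadog::Statsd.new(host, port)
```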

For k8s, the standard pattern with the Datadog Agent as a DaemonSet is to inject the host node IP via the downward API:

env:
  - name: DD_AGENT_HOST
    valueFrom:
      fieldRef:
        fieldPath: status.hostIP

If running the agent as a sidecar instead, leave DD_AGENT_HOST=localhost.

3. Logging to a file by default

Was: config/initializers/semantic_logger.rb wrote JSON to log/<env>.log for any non-development environment. In k8s that file is on an ephemeral pod disk and no log shipper picks it up.

Fixed in this branch: when RAILS_LOG_TO_STDOUT=1 is set, semantic_logger writes JSON to stdout. Behavior unchanged when the var is unset (current Capistrano deploy keeps using files).
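A sketch of the gate, with the appender call guarded by `defined?` purely so the snippet loads outside a Rails process (the appender options follow semantic_logger's API):

```ruby
# Predicate extracted for testability: stdout JSON logging is opt-in.
def stdout_logging_enabled?(env = ENV)
  env["RAILS_LOG_TO_STDOUT"] == "1"
end

if stdout_logging_enabled? && defined?(SemanticLogger)
  # JSON lines on stdout so the cluster's log collector picks them up
  SemanticLogger.add_appender(io: $stdout, formatter: :json)
end
```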

4. Wkhtmltopdf binary path

The pdfkit and imgkit initializers respect WKHTMLTOPDF_PATH / WKHTMLTOIMAGE_PATH env vars. The production Dockerfile already sets them to /usr/bin/wkhtmltopdf and /usr/bin/wkhtmltoimage (from the Debian package). No further action required.

5. ActiveStorage region/bucket hardcoding

Was: config/storage.yml and config/initializers/aws_s3.rb hardcoded us-east-1 and read keys from Rails credentials only.

Fixed in this branch: env vars (AWS_REGION, AWS_S3_BUCKET, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) take precedence; if no access key is set, aws-sdk falls back to its default credential chain (IRSA / pod identity). This is the recommended config for k8s.
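The precedence logic can be sketched like this (option names match `Aws::S3::Client`'s keyword options; the helper and its hash-based `env` parameter are illustrative):

```ruby
# Env vars win; with no static key pair, omit the credential options so
# aws-sdk falls through to its default chain (IRSA, instance profile).
def s3_client_options(env = ENV)
  opts = { region: env.fetch("AWS_REGION", "us-east-1") }
  if env["AWS_ACCESS_KEY_ID"] && env["AWS_SECRET_ACCESS_KEY"]
    opts[:access_key_id]     = env["AWS_ACCESS_KEY_ID"]
    opts[:secret_access_key] = env["AWS_SECRET_ACCESS_KEY"]
  end
  opts
end

# Aws::S3::Client.new(**s3_client_options)
```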

6. Force-SSL was off

Was: config.force_ssl = false in config/environments/production.rb.

Fixed in this branch: gated on RAILS_FORCE_SSL and RAILS_ASSUME_SSL. For k8s, set both to 1 so the app:

- Redirects HTTP → HTTPS at the app layer (defense in depth behind the ingress)
- Sets Strict-Transport-Security and the Secure cookie flag
- Trusts the X-Forwarded-Proto: https header from the ingress so it doesn't see incoming requests as plain HTTP and redirect-loop
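A sketch of the gating in production.rb. The flag parser is extracted so it can run outside Rails; note config.assume_ssl requires Rails >= 7.1:

```ruby
# "1" (and only "1") enables a flag, matching the convention above.
def env_flag?(name, env = ENV)
  env[name] == "1"
end

# In config/environments/production.rb:
#   config.force_ssl  = env_flag?("RAILS_FORCE_SSL")
#   config.assume_ssl = env_flag?("RAILS_ASSUME_SSL")
```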

7. Hosts allowlist

Rails host authorization (with config.load_defaults 7.0) checks incoming Host headers against config.hosts. Without entries for the new hostnames, the app rejects requests with "Blocked hosts" errors when hit on a hostname Rails doesn't recognize.

Fixed in this branch: set RAILS_ALLOWED_HOSTS to a CSV of accepted hostnames (e.g. oms.popsockets.com,oms-internal.popsockets.com).
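The CSV parsing can be sketched like this (helper name illustrative; tolerates stray whitespace and trailing commas):

```ruby
# Turn the RAILS_ALLOWED_HOSTS CSV into a clean list for config.hosts.
def allowed_hosts(env = ENV)
  env.fetch("RAILS_ALLOWED_HOSTS", "")
     .split(",")
     .map(&:strip)
     .reject(&:empty?)
end

# In production.rb:
#   config.hosts.concat(allowed_hosts)
```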

8. Capistrano deploy assumptions

config/deploy.rb and config/deploy/ reference RVM paths (/usr/local/rvm/gems/ruby-3.4.8/wrappers), Passenger (sudo passenger-config restart-app), and shared symlinks for the master key. None of this applies in k8s.

Plan: keep the Capistrano files in place during dual-running of the old infra and the new k8s deploy. Delete after cutover.

9. Sidekiq Web admin UI exposure

config/routes.rb mounts /sidekiq, /flipper, /event_store behind a CanAccessInternalConfigUI constraint. Verify with Rails team that this constraint still gates the right people in the new auth setup before the ingress is exposed.

10. MySQL 5.7 in dev/compose

docker-compose.yml uses MySQL 5.7, which has been EOL since October 2023. Production should target MySQL 8.0 / Aurora MySQL 8 on the new infra. Run the test suite against 8.0 before cutover — the schema may have minor incompatibilities (mostly around utf8mb4 collations and reserved words).
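The dev-side bump is small (service name and collation flags are illustrative — utf8mb4_0900_ai_ci is 8.0's default, pinning it just makes the collation explicit for comparison against the 5.7 schema):

```yaml
# docker-compose.yml (sketch) — match the 8.0 target before cutover
services:
  db:
    image: mysql:8.0
    command: --character-set-server=utf8mb4 --collation-server=utf8mb4_0900_ai_ci
```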