Docker Container Runtime Troubleshooting Deep Dive

A practical guide to diagnosing and resolving common Docker container runtime issues, with a focus on OOMKilled (exit code 137). Includes step-by-step diagnosis, commands, risk controls, rollback, and verification.

DevOps, 6 min read, 2026-06-13
Tags: Docker, container runtime, troubleshooting, OOMKilled, exit code 137

Scenario

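A production container running a Java application repeatedly exits with code 137 and is flagged OOMKilled. The application logs show no errors because the kernel ends the process with SIGKILL, which gives it no chance to log anything. The container crashes every few minutes.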

Symptoms

  • docker ps -a shows container status as Exited (137)
  • docker inspect <container> shows State.ExitCode: 137 and State.OOMKilled: true
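  • Host dmesg contains Out of memory: Kill process messages (newer kernels print Killed process; a container that hits its own limit is reported as Memory cgroup out of memory)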
  • Container memory usage approaches or exceeds the limit
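
The exit code itself encodes the signal: a process killed by signal N is reported as 128 + N, so 137 corresponds to signal 9 (SIGKILL). A minimal sketch of that arithmetic follows; the function name signal_from_exit_code is illustrative and not part of Docker.

```shell
# Illustrative helper (not a Docker command): recover the signal number
# from an exit code that follows the 128 + N convention.
signal_from_exit_code() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo $((code - 128))
  else
    # Codes of 128 or below are ordinary exit statuses, not signal deaths.
    echo "none"
  fi
}

signal_from_exit_code 137   # prints 9, i.e. SIGKILL
```

SIGKILL can come from sources other than the OOM killer, which is why the OOMKilled flag and the kernel log are checked as well.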

Diagnosis

  1. Check exit code: docker inspect <container> --format='{{.State.ExitCode}}'
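  2. Verify OOM: docker inspect <container> --format='{{.State.OOMKilled}}'. If this prints false, the SIGKILL came from somewhere else, for example docker kill, or docker stop after its grace period expired.
  3. Monitor memory usage: docker stats <container>, or read the cgroup file. On cgroup v1: cat /sys/fs/cgroup/memory/docker/<container-id>/memory.usage_in_bytes. On cgroup v2 the equivalent file is memory.current in the container's cgroup directory.
  4. Check host kernel logs: dmesg -T | grep -iE 'oom|out of memory'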
  5. Review application memory settings: JVM -Xmx and -Xms arguments
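
The cgroup file read in step 3 has a different name on cgroup v1 and cgroup v2 hosts. The sketch below tries both names in a given cgroup directory. The function name cgroup_mem_usage is hypothetical, and the directory must be supplied by the caller, since its location depends on the cgroup version and driver in use.

```shell
# Hypothetical helper: print a container's current memory usage in bytes,
# given the path to its cgroup directory.
cgroup_mem_usage() {
  dir=$1
  if [ -r "$dir/memory.current" ]; then
    cat "$dir/memory.current"            # cgroup v2 file name
  elif [ -r "$dir/memory.usage_in_bytes" ]; then
    cat "$dir/memory.usage_in_bytes"     # cgroup v1 file name
  else
    echo "no memory usage file under $dir" >&2
    return 1
  fi
}

# Example call on a cgroup v1 host, using the path from step 3:
# cgroup_mem_usage /sys/fs/cgroup/memory/docker/<container-id>
```

On cgroup v2 hosts that use the systemd cgroup driver, the directory is often /sys/fs/cgroup/system.slice/docker-<container-id>.scope, but treat that path as an assumption and confirm it on the host.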

Commands

  • View container details: docker inspect <container>
  • View logs: docker logs <container>
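  • Stats snapshot: docker stats --no-stream (omit --no-stream for a live, continuously updating view)
  • Adjust memory limits and restart: docker update --memory 512m --memory-swap 512m <container>, then docker restart <container>. The 512m value is an example; size it from observed usage.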
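
A new limit only helps if the JVM heap fits under it with room left for non-heap memory such as metaspace, thread stacks and direct buffers. The sketch below derives a candidate -Xmx from a container limit. The 75% ratio is an assumed starting point rather than a value from this guide, and the function name heap_mb_for_limit is illustrative.

```shell
# Illustrative helper: candidate JVM heap size in MiB for a container limit.
# $1 = container memory limit in MiB
# $2 = percentage of the limit to give to the heap (assumption: tune per app)
heap_mb_for_limit() {
  echo $(( $1 * $2 / 100 ))
}

# For a 512 MiB limit and an assumed 75% ratio:
echo "-Xmx$(heap_mb_for_limit 512 75)m"   # prints -Xmx384m
```

Container-aware JVMs (JDK 10 and later, and 8u191 and later) can apply the same ratio with -XX:MaxRAMPercentage=75.0 instead of a fixed -Xmx.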

Risk Controls

  • Set proper limits: Use --memory and --memory-swap (set equal to disable swap)
  • Memory reservation: Use --memory-reservation for soft limits
  • Monitoring: Integrate with Prometheus and alert when memory usage exceeds 80% of the container limit
  • Production best practices: On Kubernetes, use ResourceQuota to cap namespace totals and LimitRange to set per-container defaults and bounds
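
The 80% alert is a ratio of usage to limit. In a Prometheus setup it would live in an alerting rule. The shell sketch below shows only the comparison itself, using integer arithmetic; the function name over_threshold is illustrative.

```shell
# Illustrative helper: succeed (exit 0) when usage exceeds the given
# percentage of the limit. All arguments are integers; usage and limit
# must be in the same unit (for example bytes or MiB).
over_threshold() {
  used=$1
  limit=$2
  pct=$3
  [ $((used * 100)) -gt $((limit * pct)) ]
}

# 450 MiB used of a 512 MiB limit is about 88%, so this prints the message.
if over_threshold 450 512 80; then
  echo "memory above 80% of limit"
fi
```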

Rollback

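  • Revert to previous image: redeploy the previous image tag. Docker has no rollback command for standalone containers; on Swarm, docker service rollback <service> reverts a service to its previous definition.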
  • Restore original resource limits: docker update --memory <original> --memory-swap <original> <container>
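
Restoring the original limits requires that they were recorded before the change. A minimal sketch, assuming the limits are read from docker inspect (HostConfig.Memory and HostConfig.MemorySwap, both in bytes); the helper names save_limits and restore_cmd are hypothetical.

```shell
# Hypothetical helper: print the current memory and memory+swap limits in
# bytes, separated by a space. Run this BEFORE changing limits.
save_limits() {
  docker inspect --format '{{.HostConfig.Memory}} {{.HostConfig.MemorySwap}}' "$1"
}

# Hypothetical helper: print the docker update command that restores
# previously saved limits. $1 = container, $2 = memory, $3 = memory-swap.
restore_cmd() {
  if [ "$2" -eq 0 ]; then
    # A recorded value of 0 means no memory limit was set; this sketch
    # does not try to restore that case automatically.
    echo "no memory limit was recorded for $1; restore manually" >&2
    return 1
  fi
  printf 'docker update --memory %s --memory-swap %s %s\n' "$2" "$3" "$1"
}

# Example with made-up saved values (256 MiB memory, 512 MiB memory+swap):
restore_cmd web 268435456 536870912
```

The sketch prints the command rather than running it, so it can be reviewed during the change window before it is applied.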

Verification

  • Container stays Up well beyond the previous crash interval of a few minutes
  • docker stats shows memory usage below set limits
  • No new OOM entries in dmesg
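
The first check can be scripted against the same docker inspect fields used in the diagnosis. A minimal sketch follows; the function name state_ok is illustrative.

```shell
# Illustrative helper: interpret the two fields printed by
#   docker inspect --format '{{.State.Running}} {{.State.OOMKilled}}' <container>
# Succeeds only when the container is running and was not OOM-killed.
state_ok() {
  [ "$1" = "true" ] && [ "$2" = "false" ]
}

# Usage against a real container (left unquoted on purpose so the output
# splits into two arguments):
# state_ok $(docker inspect --format '{{.State.Running}} {{.State.OOMKilled}}' <container>)
```

Running the check repeatedly over an observation window gives a stronger signal than a single pass.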

When to Submit an OpsGlobal Ticket

  • Persistent OOMKilled despite correct resource limits
  • Host-level memory exhaustion affecting multiple containers
  • Need assistance tuning application memory (e.g., JVM heap)
  • Kernel-level container runtime errors (e.g., shim failures, segfaults)

Use cases

Useful for teams that handle DevOps incidents and need a clear troubleshooting and delivery workflow.

Troubleshooting steps

Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.

Command examples

Replace sample resource names with real values and store passwords, tokens and keys in environment variables.

Risks

Before production changes, confirm backups, access boundaries, change windows and rollback paths.

Rollback plan

Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.

Deliverables

Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.

Need help with a similar technical issue?

If your servers, Kubernetes, Docker, CI/CD, databases or monitoring systems have similar issues, submit logs and config files for remote diagnosis.
