Scenario
An e-commerce platform uses Nginx as an API gateway for routing and rate limiting. During a flash sale, traffic spikes cause high latency and request timeouts.
Symptoms
- API response time jumps from ~50ms to >500ms;
- 504 errors exceed 5% of total requests;
- Nginx error log shows repeated "upstream timed out" messages.
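The symptoms above can be quantified from the logs before any tuning starts. The following is a minimal sketch, assuming the access log uses the default "combined" format (status code in the ninth whitespace-separated field); the function names rate_504 and count_upstream_timeouts are hypothetical helpers, not part of Nginx.

```shell
# rate_504 LOGFILE
# Prints the percentage of requests in LOGFILE that returned HTTP 504.
# Assumption: default "combined" log format, where $9 is the status code.
# Malformed request lines can shift the field positions.
rate_504() {
    awk '
        $9 == 504 { bad++ }
        END {
            if (NR == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * bad / NR
        }
    ' "$1"
}

# count_upstream_timeouts ERRORLOG
# Prints the number of error-log lines that contain "upstream timed out".
count_upstream_timeouts() {
    grep -c 'upstream timed out' "$1"
}
```

Typical use would be rate_504 /var/log/nginx/access.log and count_upstream_timeouts /var/log/nginx/error.log, with the paths adjusted to the actual log locations.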
Diagnosis
- Check worker processes: ps aux | grep nginx (the number of worker processes should equal the number of CPU cores).
- Connection stats: enable stub_status, then query curl http://localhost/nginx_status for active connections and waiting (idle keep-alive) connections.
- Upstream response time: tail -f /var/log/nginx/access.log | awk '{print $NF}' (this works only if the log_format ends with $upstream_response_time; the default combined format does not log it).
- System resources: top -bn1 | head -20 for CPU and memory; vmstat 1 10 for context switches.
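Streaming raw upstream times is hard to read during a spike, so a percentile summary is often more useful. The sketch below assumes, as the awk command above does, that the upstream response time is the last field of each access-log line, which requires a custom log_format ending in $upstream_response_time. The function name upstream_p95 is hypothetical.

```shell
# upstream_p95 LOGFILE
# Prints the 95th-percentile (nearest-rank) value of the last field in LOGFILE.
# Assumption: the last field is $upstream_response_time in seconds.
# Lines whose last field is not a plain number (for example "-") are skipped.
upstream_p95() {
    awk '$NF ~ /^[0-9]+(\.[0-9]+)?$/ { print $NF }' "$1" |
        sort -n |
        awk '
            { v[NR] = $1 }
            END {
                if (NR == 0) exit 1
                # Nearest-rank index: ceil(0.95 * NR), using integer arithmetic.
                idx = int((NR * 95 + 99) / 100)
                print v[idx]
            }
        '
}
```

Running it before and after each change, for example upstream_p95 /var/log/nginx/access.log, gives a single number to compare.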
Tuning Commands
Modify /etc/nginx/nginx.conf:
events {
    worker_connections 4096;   # Increase based on ulimit -n
    use epoll;                 # High-performance event model on Linux
    multi_accept on;           # Accept all new connections at once
}
http {
    keepalive_timeout 65;
    keepalive_requests 1000;   # Max requests per keepalive connection
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    upstream backend {
        keepalive 32;          # Connection pool to upstream
        server 10.0.0.1:8080;
    }
    server {
        location /api/ {
            proxy_pass http://backend;
            proxy_buffer_size 8k;   # Increase buffers to reduce disk I/O
            proxy_buffers 8 8k;
            proxy_busy_buffers_size 16k;
        }
    }
}
Test: nginx -t; reload: nginx -s reload.
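The worker_connections comment ties the value to ulimit -n, because each connection (client side and upstream side) consumes a file descriptor. A minimal sketch of that check follows; check_worker_connections is a hypothetical helper, and the limit seen by an interactive shell may differ from the one applied to the Nginx workers (for example when the service manager or worker_rlimit_nofile sets it), so treat the result as a first approximation.

```shell
# check_worker_connections CONNECTIONS [FD_LIMIT]
# Succeeds if CONNECTIONS does not exceed the open-file limit.
# FD_LIMIT defaults to the current shell's "ulimit -n" (an assumption:
# the running workers may have a different limit).
check_worker_connections() {
    conns=$1
    limit=${2:-$(ulimit -n)}

    if [ "$limit" = "unlimited" ]; then
        echo "ok: no open-file limit in effect"
        return 0
    fi

    if [ "$conns" -le "$limit" ]; then
        echo "ok: worker_connections=$conns fits within fd limit $limit"
    else
        echo "too high: worker_connections=$conns exceeds fd limit $limit" >&2
        return 1
    fi
}
```

For the configuration above, check_worker_connections 4096 would be run on the gateway host before the reload.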
Risk Controls
- Perform changes during low traffic;
- Back up the config: cp /etc/nginx/nginx.conf /etc/nginx/nginx.conf.bak;
- Adjust parameters incrementally and monitor dashboards;
- Prepare rollback script.
Rollback
If issues occur after reload:
cp /etc/nginx/nginx.conf.bak /etc/nginx/nginx.conf
nginx -t && nginx -s reload
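The two rollback commands can be wrapped in a small script so that it is ready before the change window. This is a sketch under stated assumptions: rollback_nginx is a hypothetical function name, the backup is expected at the config path plus ".bak" as in the backup step, and the ".failed" copy is an added convenience for later root-cause analysis rather than something Nginx requires.

```shell
# rollback_nginx [CONF] [BACKUP]
# Restores BACKUP over CONF, validates it, and reloads Nginx.
# Defaults: CONF=/etc/nginx/nginx.conf, BACKUP=CONF.bak
rollback_nginx() {
    conf=${1:-/etc/nginx/nginx.conf}
    backup=${2:-$conf.bak}

    if [ ! -f "$backup" ]; then
        echo "rollback aborted: no backup at $backup" >&2
        return 1
    fi

    # Keep the failing config for analysis (hypothetical ".failed" suffix).
    cp "$conf" "$conf.failed" || return 1
    cp "$backup" "$conf" || return 1

    # Reload only if the restored file passes the syntax check.
    nginx -t -c "$conf" && nginx -s reload
}
```

Called without arguments it operates on /etc/nginx/nginx.conf and needs the same privileges as the manual commands.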
Verification
- Load test: ab -n 10000 -c 200 https://your-api.example.com/api/ to measure latency and error rate.
- Real-time monitoring: dump the active config with nginx -T and tail the access log to observe upstream times.
- Compare before/after: latency, throughput, error percentage.
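To make the before/after comparison repeatable, the ab report can be saved to a file and reduced to one line per run. The sketch below assumes the usual ab report labels ("Complete requests", "Failed requests", "Non-2xx responses", "Requests per second", "Time per request"); ab_summary is a hypothetical helper name. Failed and non-2xx counts are reported separately because ab may count some error responses in both.

```shell
# ab_summary FILE
# Prints a one-line summary of a saved ab report.
# Assumption: FILE contains the standard text output of ab.
ab_summary() {
    awk -F: '
        /^Complete requests/   { total  = $2 + 0 }
        /^Failed requests/     { failed = $2 + 0 }
        /^Non-2xx responses/   { non2xx = $2 + 0 }
        /^Requests per second/ { split($2, a, " "); rps = a[1] }
        # The first "Time per request" line is the per-request mean.
        /^Time per request/ && !seen { split($2, a, " "); ms = a[1]; seen = 1 }
        END {
            printf "requests=%d failed=%d non2xx=%d rps=%s mean_ms=%s\n",
                   total, failed, non2xx, rps, ms
        }
    ' "$1"
}
```

A run would look like ab -n 10000 -c 200 https://your-api.example.com/api/ > before.txt followed by ab_summary before.txt, repeated after the reload with a second file.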
When to Submit an OpsGlobal Ticket
- Performance targets still not met after tuning;
- Need kernel parameter adjustments (e.g., net.core.somaxconn) or OS-level optimization;
- Suspect Nginx bug or require custom modules;
- Lack production change window or need dedicated SLA.
Use cases
Useful for teams that handle performance issues and need a clear troubleshooting and delivery workflow.
Problem background
A deep practical guide on tuning Nginx as an API gateway for performance, covering scenario identification, symptom analysis, diagnosis, configuration commands, risk controls, rollback, verification, and when to escalate to OpsGlobal.
Troubleshooting steps
Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.
Command examples
Replace sample resource names with real values and store passwords, tokens and keys in environment variables.
Risks
Before production changes, confirm backups, access boundaries, change windows and rollback paths.
Rollback plan
Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.
Deliverables
Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.