Scenario
A database administrator reports that backup and recovery operations take too long, affecting production availability. For example, a 200GB MySQL database takes 4 hours with default mysqldump, and a similarly sized PostgreSQL database takes 3 hours with pg_dump. Recovery operations also frequently time out.
Symptoms
- Backup command execution time far exceeds expectations (e.g., backup speed below 50MB/s)
- High disk I/O wait during recovery (iostat shows %util > 90%)
- Increased replication lag (especially during backups)
- Backup file size too large (close to raw data size when uncompressed)
Diagnosis
MySQL
- Examine current backup command:
mysqldump --single-transaction --routines --triggers --events --all-databases > backup.sql - Monitor performance during backup:
pt-query-digest --processlist --interval=1orSHOW FULL PROCESSLIST; - Analyze disk performance:
iostat -x 1to check disk utilization - Check InnoDB buffer pool size:
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
PostgreSQL
- Examine backup command:
pg_dump -Fc -j 4 -f backup.dump dbname(-j for parallel jobs) - Use pg_stat_statements to analyze queries:
SELECT query, calls, total_time FROM pg_stat_statements ORDER BY total_time DESC LIMIT 10; - Check WAL generation rate:
pg_waldump -p /path/to/pg_wal/orSELECT * FROM pg_stat_wal; - Monitor I/O:
iostat -x 1
Commands
Performance-Optimized Backup Commands
MySQL
- Compressed dump:
mysqldump --single-transaction --compress --all-databases > backup.sql.gz - Parallel export using mydumper:
mydumper -h localhost -u root -p password -o /backup --threads=4 - Physical backup with XtraBackup:
xtrabackup --backup --parallel=4 --target-dir=/backup
PostgreSQL
- Parallel pg_dump:
pg_dump -Fd -j 8 -f /backup/dumpdir dbname(directory format) - Physical backup with pg_basebackup:
pg_basebackup -D /backup -X fetch -P -v - Adjust parallelism:
pg_dump -Fc -j $(nproc) -f backup.dump dbname
Recovery Optimization
MySQL
- Point-in-time recovery using mysqlbinlog:
mysqlbinlog --start-datetime="2023-01-01 00:00:00" binlog.000001 | mysql -u root - Disable foreign key checks during import:
SET FOREIGN_KEY_CHECKS=0;then re-enable after.
PostgreSQL
- Parallel restore with pg_restore:
pg_restore -d dbname -j 4 -Fc backup.dump - Use streaming replication for fast recovery: configure standby and enable hot standby.
Risk Controls
- Before backup: Ensure sufficient disk space (recommended 1.5x data size). Use
--single-transactionto avoid table locks (MySQL InnoDB). Setlock_wait_timeoutto prevent long lock waits. - During backup: Monitor system load, avoid full backups during peak hours. Use
niceto lower backup process priority:nice -n 19 mysqldump ... - Before recovery: Backup the target database (if it exists). Test the recovery process in a non-production environment.
- During recovery: Disable auto-commit and index maintenance if possible to speed up recovery.
Rollback
If backup or recovery fails, terminate operations immediately:
- MySQL: Kill the backup process, check file integrity, restore from last successful backup if needed.
- PostgreSQL: Terminate pg_dump or pg_restore process, clean up partial files. Use pg_ctl to restart service after confirming data consistency.
Ensure a recent full backup is available. If corrupted, roll back using a snapshot or physical backup.
Verification
Backup Verification
- MySQL: After
mysqldump --all-databases | gzip > backup.sql.gz, decompress and check tables:gunzip -c backup.sql.gz | grep -i "CREATE TABLE". Usemysqlcheckto check table integrity. - PostgreSQL: Use
pg_restore -l backup.dumpto list contents, orzcat backup.sql.gz | head -n 50. Restore to a test database and runANALYZE.
Recovery Verification
- Compare row counts:
SELECT COUNT(*) FROM table;between original and restored database. - Compute checksums:
mysqldump --all-databases --order-by-primary | md5sumcompare backup and restored output. - Run application tests: ensure connections work and queries return correct results.
When to Submit an OpsGlobal Ticket
- After optimization, backups or recovery still fail to meet SLA (e.g., longer than 4 hours)
- Frequent backup corruption or recovery failures
- Need expert assistance configuring physical backups (e.g., XtraBackup or pg_basebackup) or high-availability architecture
- Suspected underlying storage or network issues requiring infrastructure-level analysis
By following this guide, you can significantly improve database backup and recovery performance, ensuring business continuity.
Use cases
Useful for teams handling Database issues and needing a clear troubleshooting and delivery workflow.
Problem background
Learn how to diagnose and improve backup and recovery performance for MySQL and PostgreSQL. This guide covers common symptoms, diagnostic commands, risk controls, rollback procedures, and verification steps for production environments.
Troubleshooting steps
Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.
Command examples
Replace sample resource names with real values and store passwords, tokens and keys in environment variables.
Risks
Before production changes, confirm backups, access boundaries, change windows and rollback paths.
Rollback plan
Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.
Deliverables
Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.
Need help with a similar technical issue?
If your servers, Kubernetes, Docker, CI/CD, databases or monitoring systems have similar issues, submit logs and config files for remote diagnosis.