
Optimizing Backup and Recovery Performance for MySQL and PostgreSQL in Kubernetes

A deep dive into diagnosing and improving backup/restore speed for MySQL and PostgreSQL databases running on Kubernetes, with actionable commands and risk controls.

Database | 6 min read | 2026-06-18
Tags: Kubernetes, SRE, MySQL, PostgreSQL, Backup Recovery

Scenario

Production databases often face performance degradation during backup and recovery operations, especially when running on Kubernetes where storage, network, and resource limits add complexity. Backup windows may stretch, and restore times can exceed SLAs.

Symptoms

  • Backup duration consistently exceeds the allowed window (e.g., >4 hours for 500GB).
  • Restore operations are extremely slow; a 500GB backup may take over 6 hours.
  • High I/O wait, low CPU utilization, and high disk latency observed in monitoring tools.
  • Backup tools (pg_dump, XtraBackup) time out or fail with cryptic errors.
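
To put these thresholds in perspective, it can help to convert a backup window into the sustained throughput it implies. The sketch below is a hypothetical helper (the name `required_mb_per_s` is illustrative and not part of any tool). It assumes 1 GB = 1024 MB and uses integer arithmetic, so results are rounded down.

```shell
#!/bin/sh
# Hypothetical helper: sustained throughput (MB/s) needed to move
# SIZE_GB within WINDOW_HOURS. Integer math, rounded down.
required_mb_per_s() {
    size_gb=$1
    window_hours=$2
    echo $(( size_gb * 1024 / (window_hours * 3600) ))
}

# Example: the 500GB-in-4-hours window from the symptoms above.
required_mb_per_s 500 4
```

If the storage behind the backup volume can sustain well above this figure and the backup still overruns its window, the bottleneck is more likely in tool settings, resource limits, or the database itself than in raw disk bandwidth.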

Diagnosis

  1. Check resource constraints. Compare actual CPU and memory usage with the limits shown by kubectl describe pod:
        kubectl top pods -n database
  2. Analyze I/O performance on the node. Look for high %util and await values (reported as r_await and w_await in recent sysstat releases):
        iostat -x 1
  3. Inspect database internals:
      - MySQL: check adaptive hash index contention and log waits. In the mysql client, \G already terminates the statement, so do not add a trailing semicolon:
            SHOW ENGINE INNODB STATUS\G
      - PostgreSQL: identify long-running queries that may be holding locks:
            SELECT * FROM pg_stat_activity WHERE state = 'active';
  4. Review backup tool logs:
      - XtraBackup: it logs to stderr, so redirect the output to a file and confirm that the last line reads "completed OK!".
      - pg_dump: run with --verbose to see progress.
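
The PostgreSQL check in step 3 can be narrowed to sessions that have been running for a while. The following is a minimal sketch, not a definitive diagnostic: `long_running_sql` is a hypothetical helper name, the 5-minute default is an arbitrary assumption, and the function only prints the query so it can be reviewed before being piped into psql.

```shell
#!/bin/sh
# Hypothetical helper: print a query that lists active sessions running
# longer than MIN_MINUTES (default 5, an arbitrary choice), longest first.
long_running_sql() {
    min_minutes=${1:-5}
    cat <<EOF
SELECT pid, now() - query_start AS runtime, wait_event_type, wait_event,
       left(query, 80) AS query
FROM pg_stat_activity
WHERE state = 'active'
  AND now() - query_start > interval '${min_minutes} minutes'
ORDER BY runtime DESC;
EOF
}

# Print the query. To execute it, pipe the output into psql, for example:
#   long_running_sql 10 | psql -d mydb
long_running_sql 10
```

Printing rather than executing keeps the helper safe to run anywhere; the wait_event columns give a first hint of whether a session is waiting on a lock or on I/O.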

Commands

MySQL Backup Optimization

  • Parallel backup:
        xtrabackup --backup --parallel=4 --target-dir=/backup
  • Compressed backup:
        xtrabackup --backup --compress --compress-threads=4 --target-dir=/backup
  • Throttling to limit production impact (--throttle caps the number of 10 MB chunks copied per second, so pick a value that matches the I/O headroom you can spare):
        xtrabackup --backup --throttle=100 --target-dir=/backup
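
These options can be combined in a single run. The sketch below assembles such a command from the flags listed above; `xtrabackup_cmd` is a hypothetical helper, and reusing one thread count for both --parallel and --compress-threads is a simplifying assumption rather than a recommendation. It prints the command instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper: assemble one xtrabackup invocation from the flags above.
# It only prints the command so it can be reviewed before running.
xtrabackup_cmd() {
    threads=$1      # used for both --parallel and --compress-threads
    throttle=$2     # --throttle value; 0 means "do not throttle"
    target_dir=$3
    cmd="xtrabackup --backup --parallel=$threads --compress --compress-threads=$threads"
    if [ "$throttle" -gt 0 ]; then
        cmd="$cmd --throttle=$throttle"
    fi
    echo "$cmd --target-dir=$target_dir"
}

xtrabackup_cmd 4 100 /backup
```

Parallelism and throttling pull in opposite directions, so it is worth measuring each setting separately in staging before combining them.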

PostgreSQL Backup Optimization

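  • Directory format for parallelism:
        pg_dump -Fd -j 4 -f /backup mydb
  • Compressed custom format. The custom format is an archive for pg_restore, not a gzip file, so avoid a .gz extension. -Z 9 gives the smallest file but the slowest dump; a lower level is usually faster:
        pg_dump -Fc -Z 9 -f /backup/mydb.dump mydb
  • Selective backup: Skip large or regenerable tables with --exclude-table or --exclude-table-data. Indexes are dumped as definitions only, so excluding them does not shrink the dump; they mainly add time at restore.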
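
A parallel directory-format dump and selective exclusion can be combined. The sketch below builds such a pg_dump command; `pg_dump_cmd` is a hypothetical helper, and `audit_log` and the output path `/backup/mydb.dir` are made-up names used only for illustration. The --exclude-table-data option keeps the table definition but skips its rows.

```shell
#!/bin/sh
# Hypothetical helper: build a parallel directory-format pg_dump command,
# skipping row data for any tables named after the first three arguments.
pg_dump_cmd() {
    jobs=$1
    out_dir=$2
    db=$3
    shift 3
    cmd="pg_dump -Fd -j $jobs -f $out_dir"
    for table in "$@"; do
        cmd="$cmd --exclude-table-data=$table"
    done
    echo "$cmd $db"
}

# audit_log is an illustrative table name, not taken from a real schema.
pg_dump_cmd 4 /backup/mydb.dir mydb audit_log
```

Tables excluded this way restore as empty tables, so this only suits data that can be regenerated or is backed up by other means.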

Restore Optimization

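  • MySQL: Run the prepare step with increased memory (--prepare is the xtrabackup equivalent of the older innobackupex --apply-log):
        xtrabackup --prepare --use-memory=4G --target-dir=/backup
  • PostgreSQL: Increase maintenance_work_mem for the restore to speed up index builds. A plain SET maintenance_work_mem = '1GB'; affects only the session that issues it, so pass the setting to pg_restore through PGOPTIONS. Use parallel restore with pg_restore -j, which requires a custom- or directory-format archive:
        PGOPTIONS='-c maintenance_work_mem=1GB' pg_restore -j 4 -d mydb /backup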
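
Choosing the number of parallel restore jobs is a judgment call. One possible approach, sketched below, caps the job count at the number of online CPUs. Both `pick_jobs` and `pg_restore_cmd` are hypothetical helpers, and the archive path `/backup/mydb.dir` is illustrative. Note the assumption that getconf reflects usable CPUs: inside a container it typically reports the node's CPUs rather than the pod's CPU limit, so the cap argument should be set from the pod spec.

```shell
#!/bin/sh
# Hypothetical helper: return min(online CPUs, CAP) as the job count.
# Caveat: in a container, getconf usually reports node CPUs, not the pod limit.
pick_jobs() {
    cap=$1
    cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$cpus" -lt "$cap" ]; then
        echo "$cpus"
    else
        echo "$cap"
    fi
}

# Hypothetical helper: print a pg_restore command that passes
# maintenance_work_mem to every restore connection via PGOPTIONS.
pg_restore_cmd() {
    jobs=$1
    mem=$2
    db=$3
    archive=$4
    echo "PGOPTIONS='-c maintenance_work_mem=$mem' pg_restore -j $jobs -d $db $archive"
}

pg_restore_cmd "$(pick_jobs 4)" 1GB mydb /backup/mydb.dir
```

Each parallel job can use up to maintenance_work_mem during index builds, so the product of the two values should fit comfortably within the pod's memory limit.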

Risk Controls

  • Schedule backups during low traffic.
  • Use I/O throttling to prevent impacting online services.
  • Store backups on dedicated persistent volumes to avoid contention.
  • Always test restore performance in a staging environment first.
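
Testing restore performance in staging is easier to act on when the run is timed against the SLA. The sketch below wraps any command, reports the elapsed time, and fails if the command fails or overruns; `timed_run` is a hypothetical helper, and the 14400-second limit simply restates the 4-hour example used in this article.

```shell
#!/bin/sh
# Hypothetical helper: run a command, print its duration, and return
# non-zero if it failed or took longer than LIMIT_S seconds.
timed_run() {
    limit_s=$1
    shift
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "elapsed=${elapsed}s limit=${limit_s}s exit=${status}"
    [ "$status" -eq 0 ] && [ "$elapsed" -le "$limit_s" ]
}

# Stand-in command; replace "sleep 1" with the actual restore command.
timed_run 14400 sleep 1
```

Recording these timings for each staging run gives a baseline against which later tuning changes can be compared.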

Rollback

  • If backup fails due to parameter changes, revert to defaults (e.g., disable parallelism).
  • If a restore yields corrupt data, immediately stop and restore from a known-good snapshot or full backup.

Verification

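  • After restore, run consistency checks:
      - MySQL: CHECKSUM TABLE, or pt-table-checksum to compare a source with its replicas.
      - PostgreSQL: pg_checksums --check (requires data checksums to be enabled and the server to be shut down cleanly) or pg_amcheck (PostgreSQL 14 and later). ANALYZE is not a consistency check, but run it after a restore to refresh planner statistics.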
  • Validate that business queries return correct results.
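
One way to make the checksum comparison repeatable is to save the check output from the source and from the restored instance, then diff the two files. The sketch below assumes each file holds one "table, checksum" line per table; `compare_checksums`, the file names, and the `mydb.orders` sample line are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved checksum listings.
# Prints MATCH or MISMATCH and returns non-zero on any difference.
compare_checksums() {
    src_file=$1
    restored_file=$2
    if diff "$src_file" "$restored_file" >/dev/null; then
        echo "MATCH"
    else
        echo "MISMATCH"
        return 1
    fi
}

# Demo with made-up data. Real listings could come from, for example:
#   mysql -N -e "CHECKSUM TABLE mydb.orders" > source.txt
tmp=$(mktemp -d)
printf 'mydb.orders\t123\n' > "$tmp/source.txt"
printf 'mydb.orders\t123\n' > "$tmp/restored.txt"
compare_checksums "$tmp/source.txt" "$tmp/restored.txt"
rm -rf "$tmp"
```

CHECKSUM TABLE values can differ between MySQL versions or row formats, so a mismatch across different server versions is a prompt to investigate rather than proof of corruption.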

When to Submit an OpsGlobal Ticket

  • Backup/restore times persistently exceed SLA (e.g., >4 hours).
  • Data corruption or failed restoration occurs.
  • You need architectural recommendations (e.g., storage class, database configuration tuning).
  • You are uncertain about risk control configurations.

Following these steps, most performance issues can be resolved. If problems persist, OpsGlobal's SRE team can provide deep analysis and optimization.

Use cases

Useful for teams that handle database issues and need a clear troubleshooting and delivery workflow.

Troubleshooting steps

Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.

Command examples

Replace sample resource names with real values and store passwords, tokens and keys in environment variables.

Risks

Before production changes, confirm backups, access boundaries, change windows and rollback paths.

Rollback plan

Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.

Deliverables

Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.
