Mastering Cloud Capacity Autoscaling: Balancing Performance and Cost in Kubernetes

A practical guide to implementing intelligent autoscaling strategies that optimize both performance and cost, with real-world commands and risk controls.

Cloud Migration | 6 min read | 2026-06-20
Tags: Kubernetes, SRE, Cost Optimization

Scenario

A company migrated its e-commerce platform to a cloud-native Kubernetes environment. Initially, node and Pod counts were manually configured, leading to performance issues during peak times and resource waste during off-peak hours. Monthly cloud bills increased by over 30%.

Symptoms

  • Cloud bills continuously rise, yet average CPU/memory utilization is below 30%.
  • User response times exceed 5 seconds during peak hours, with occasional timeouts.
  • The node count is fixed, and requests to Pods frequently hit rate limits.

Diagnosis

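  1. Use kubectl top pods and kubectl top nodes to view resource usage (both require metrics-server to be installed).
  2. Analyze Cluster Autoscaler logs: kubectl logs -n kube-system deployment/cluster-autoscaler (assuming it runs as a Deployment named cluster-autoscaler).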
  3. Check HorizontalPodAutoscaler configuration: kubectl describe hpa <hpa-name>.

Commands

Configure HorizontalPodAutoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Apply: kubectl apply -f hpa.yaml.
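
A Utilization target is measured as a percentage of each container's CPU request, so the HPA can only act if the Pods of web-app declare requests. A minimal sketch of the relevant part of the Deployment, assuming a single container; the container name, image and request values are hypothetical placeholders, and only the app: web-app label is taken from this guide:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  # replicas is omitted on purpose so that re-applying this manifest
  # does not override the replica count chosen by the HPA
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web                          # hypothetical container name
        image: example.com/web-app:1.0     # hypothetical image
        resources:
          requests:
            cpu: 250m                      # a 70% target means about 175m average usage per Pod
            memory: 256Mi
          limits:
            memory: 512Mi
```

If requests are missing, kubectl describe hpa web-app-hpa typically reports that the CPU metric cannot be computed.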

Configure Cluster Autoscaler (assuming AWS)

Ensure node groups have min/max sizes set. On AWS these limits live on the Auto Scaling group (ASG), not on a Kubernetes object, so kubectl annotate cannot set them. Update the ASG with the AWS CLI instead:

aws autoscaling update-auto-scaling-group --auto-scaling-group-name <asg-name> --min-size 2 --max-size 20
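
On AWS, Cluster Autoscaler commonly finds the groups it may scale through tag-based auto-discovery and reads the min/max sizes from the ASG itself. The fragment below is a sketch of the container arguments involved, assuming the autoscaler runs as a Deployment in kube-system; <cluster-name> and <version> are placeholders, and the expander and timing values are illustrative choices rather than settings from the original platform:

```yaml
# Fragment of the cluster-autoscaler Deployment (container spec only)
containers:
- name: cluster-autoscaler
  image: registry.k8s.io/autoscaling/cluster-autoscaler:<version>   # match the cluster's Kubernetes minor version
  command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # Manage every ASG that carries both of these tags
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
  - --balance-similar-node-groups
  - --expander=least-waste               # prefer the group that leaves the least idle capacity
  - --scale-down-unneeded-time=10m       # how long a node must stay unneeded before removal
```

With this setup, the ASGs need the two tags shown in the auto-discovery argument, and the autoscaler's IAM role needs permission to describe and resize them.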

Use VerticalPodAutoscaler to optimize resource requests

Install the VPA components (CRDs, recommender, updater and admission controller) with the install script from the kubernetes/autoscaler repository. The recommender alone only produces recommendations and does not apply them:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

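Then create a VPA. Avoid letting VPA act on the same metric (CPU or memory) that an HPA uses to scale the same workload, because the two controllers will work against each other: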

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
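  updatePolicy:
    # "Auto" lets VPA evict Pods and recreate them with updated requests; "Off" only records recommendations.
    updateMode: "Auto"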
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi
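
Because the HPA defined earlier already scales web-app on CPU, one cautious way to run both is to start VPA in recommendation-only mode and restrict it to memory. This is a sketch under that assumption, reusing the names and memory bounds from the manifest above; it is not the only valid combination:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"                   # recommendations only; no Pods are evicted
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      controlledResources: ["memory"]   # leave CPU to the HPA
      minAllowed:
        memory: 128Mi
      maxAllowed:
        memory: 512Mi
```

The recommendations appear in the status section of kubectl describe vpa web-app-vpa and can be copied into the Deployment's requests once they look stable.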

Risk Controls

  • Use PodDisruptionBudget to ensure availability:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app
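  • Set an HPA stabilization window to avoid thrashing: add a behavior field (for example, behavior.scaleDown.stabilizationWindowSeconds) to the HPA spec.
  • Use Spot instances to reduce compute cost, but handle interruptions properly (e.g., drain nodes when an interruption notice arrives and use Pod priority scheduling to protect critical workloads).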
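
The cool-down bullet above can be sketched as follows. This is the same web-app-hpa with a behavior section added; the window lengths and rate limits are illustrative values to tune against real traffic, not recommendations from the original platform:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0      # react to load spikes immediately
      policies:
      - type: Percent
        value: 100                       # at most double the replica count per minute
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300    # use the highest recommendation of the last 5 minutes
      policies:
      - type: Pods
        value: 1                         # remove at most one Pod per minute
        periodSeconds: 60
```

Scaling up quickly and down slowly favors performance over cost during bursts; shortening the scaleDown window shifts that balance toward cost.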

Rollback

  1. Delete or modify the HPA/VPA config: kubectl delete hpa web-app-hpa and, if a VPA was created, kubectl delete vpa web-app-vpa.
  2. Revert the node group to its original sizes.
  3. Monitor metrics until stable.

Verification

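  • Check that the Pod count changes with load: kubectl get pods -w (or kubectl get hpa web-app-hpa -w to watch the replica count and current CPU utilization).
  • Compare bills using cloud cost tools (e.g., AWS Cost Explorer).
  • Confirm the performance targets: response time under 1 s and error rate under 0.1%.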

When to Submit an OpsGlobal Ticket

  • Autoscaling fails to meet SLAs (e.g., requests still time out during peaks).
  • A node group fails to scale down (for example, because of lingering Spot interruption issues).
  • You need complex cost allocation or budgeting strategies.

Use cases

Useful for teams that handle cloud migration issues and need a clear troubleshooting and delivery workflow.

Troubleshooting steps

Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.

Command examples

Replace sample resource names with real values and store passwords, tokens and keys in environment variables.

Risks

Before production changes, confirm backups, access boundaries, change windows and rollback paths.

Rollback plan

Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.

Deliverables

Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.
