Mastering Cloud Capacity Autoscaling: Balancing Performance and Cost in Kubernetes

Cloud Migration | 6 min read | 2026-06-20

Tags: Kubernetes, SRE, Cost Optimization

Scenario

A company migrated its e-commerce platform to a cloud-native Kubernetes environment. Initially, node and Pod counts were manually configured, leading to performance issues during peak times and resource waste during off-peak hours. Monthly cloud bills increased by over 30%.

Symptoms

Cloud bills continuously rise, yet average CPU/memory utilization is below 30%.
User response times exceed 5 seconds during peak hours, with occasional timeouts.
The node count is fixed, and requests to Pods are frequently rate-limited.

Diagnosis

Use kubectl top pods and kubectl top nodes to view resource usage.
Analyze Cluster Autoscaler logs: kubectl logs -n kube-system deployment/cluster-autoscaler.
Check HorizontalPodAutoscaler configuration: kubectl describe hpa <hpa-name>.

Commands

Configure HorizontalPodAutoscaler (HPA)

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

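Save the manifest as hpa.yaml and apply it: kubectl apply -f hpa.yaml. CPU utilization targets only work when metrics-server is running and the target containers declare CPU requests.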
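To see how the 70% target translates into replica counts, the HPA controller's documented scaling rule can be worked through by hand. The 140% load figure below is an assumed example, not a measurement from this scenario:

```latex
\[
\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentUtilization}}{\text{targetUtilization}} \right\rceil
\]
\[
\left\lceil 3 \times \frac{140}{70} \right\rceil = 6
\]
```

With 3 replicas averaging 140% of their CPU request, the HPA scales to 6. The result is always clamped between minReplicas and maxReplicas (3 and 10 here), and utilization is measured relative to each container's CPU request.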

Configure Cluster Autoscaler (assuming AWS)

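Cluster Autoscaler reads its scaling bounds from the node group (Auto Scaling group) itself, not from Kubernetes annotations, so set the min/max sizes on the node group. For an EKS managed node group:

aws eks update-nodegroup-config --cluster-name <cluster-name> --nodegroup-name <nodegroup-name> --scaling-config minSize=2,maxSize=20

Self-managed Auto Scaling groups additionally need the tags k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<cluster-name> so that auto-discovery can find them.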
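On AWS, Cluster Autoscaler typically finds node groups through Auto Scaling group tags and takes the min/max bounds from each group. A minimal sketch of the relevant container arguments, assuming tag-based auto-discovery; <cluster-name> and <version> are placeholders, and the expander choice is an illustrative assumption rather than a recommendation from this guide:

```yaml
# Fragment of the cluster-autoscaler Deployment in kube-system (not a complete manifest).
# <version> and <cluster-name> are placeholders; pick the release matching your Kubernetes minor version.
spec:
  template:
    spec:
      containers:
      - name: cluster-autoscaler
        image: registry.k8s.io/autoscaling/cluster-autoscaler:<version>
        command:
        - ./cluster-autoscaler
        - --cloud-provider=aws
        # Discover Auto Scaling groups that carry both tags; min/max sizes come from each group.
        - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<cluster-name>
        # Keep node groups with identical instance shapes at similar sizes.
        - --balance-similar-node-groups
        # When several groups could fit the pending Pods, prefer the one that wastes the least CPU/memory.
        - --expander=least-waste
```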

Use VerticalPodAutoscaler to optimize resource requests

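# Installs the VPA CRDs plus the recommender, updater and admission controller.
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-up.sh

Then create a VPA. updateMode "Auto" needs the updater and admission controller, not just the recommender. Avoid running a VPA in Auto mode alongside an HPA that scales the same workload on CPU or memory, because the two controllers will work against each other.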

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: '*'
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1
        memory: 512Mi
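Because the HPA shown earlier already scales web-app on CPU utilization, one cautious option is to run the VPA in recommendation-only mode and apply its suggestions to the Deployment manually. A minimal sketch, assuming the same web-app Deployment; the name web-app-vpa-recommend is hypothetical:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa-recommend   # hypothetical name, not from the original guide
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Off"   # compute recommendations only; never evict or resize Pods
```

The recommendations then appear in the status section of kubectl describe vpa web-app-vpa-recommend and can be copied into the Deployment's resource requests during a normal release.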

Risk Controls

Use PodDisruptionBudget to ensure availability:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: web-app

Set HPA stabilization windows and scaling policies to avoid thrashing: add a behavior field to the HPA spec.
Use Spot instances to reduce compute cost, but implement proper interruption handling (e.g., priority scheduling).
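The HPA behavior field mentioned above could look like the following. This is a sketch to merge into the spec of web-app-hpa; all window lengths and limits are illustrative assumptions to tune against real traffic:

```yaml
# Fragment to merge under spec: in web-app-hpa (autoscaling/v2); values are illustrative.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # act on the highest recommendation seen in the last 5 minutes
    policies:
    - type: Percent
      value: 50            # remove at most 50% of current replicas
      periodSeconds: 60    # within any 60-second period
  scaleUp:
    stabilizationWindowSeconds: 0     # react to load increases immediately
    policies:
    - type: Pods
      value: 4             # add at most 4 Pods
      periodSeconds: 60    # within any 60-second period
```

Scaling up quickly and down slowly favors performance over cost; lengthening the scale-up window or shortening the scale-down window shifts the balance the other way.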

Rollback

Delete or modify the HPA/VPA config: kubectl delete hpa web-app-hpa and kubectl delete vpa web-app-vpa.
Revert node group to original sizes.
Monitor metrics until stable.

Verification

Check Pod count changes with load: kubectl get pods -w.
Compare bills using cloud cost tools (e.g., AWS Cost Explorer).
Performance metrics: response time <1s, error rate <0.1%.

When to Submit an OpsGlobal Ticket

Autoscaling fails to meet SLAs (e.g., requests still time out during peaks).
Node group fails to scale down (residual Spot interruption issues).
Need complex cost allocation or budgeting strategies.

Use cases

Useful for teams that handle cloud migration issues and need a clear troubleshooting and delivery workflow.

Problem background

A practical guide to implementing intelligent autoscaling strategies that optimize both performance and cost, with real-world commands and risk controls.

Troubleshooting steps

Confirm impact and recent changes, collect logs, configuration and metrics, then apply fixes from low to high risk.

Command examples

Replace sample resource names with real values and store passwords, tokens and keys in environment variables.

Risks

Before production changes, confirm backups, access boundaries, change windows and rollback paths.

Rollback plan

Keep original configuration and release versions; roll back config, images or database changes if metrics degrade.

Deliverables

Root-cause notes, key commands, remediation steps, verification results and follow-up recommendations.
