In this episode, we'll discuss Kubernetes Horizontal Pod Autoscaler (HPA) for automatic scaling. We'll learn how HPA works, how to configure autoscaling based on CPU, memory, and custom metrics, and best practices for production autoscaling.

Note
If you want to read the previous episode, you can click the Episode 29 thumbnail below
In the previous episode, we learned about computational resources - CPU and memory requests and limits. In episode 30, we'll discuss Horizontal Pod Autoscaler (HPA), which automatically scales the number of Pods based on observed metrics like CPU utilization, memory usage, or custom metrics.
Note: Here I'll be using a Kubernetes Cluster installed through K3s.
Manual scaling works for predictable workloads, but real-world applications face variable traffic patterns. HPA automatically adjusts replica counts to match demand, ensuring optimal resource utilization and application performance without manual intervention.
Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.
Think of HPA like an automatic thermostat - it monitors temperature (metrics) and adjusts heating/cooling (Pod count) to maintain your desired comfort level (target utilization). You set the target, HPA handles the adjustments.
Key characteristics of HPA:

- Scales horizontally by adding or removing Pod replicas (it does not resize individual Pods)
- Works with Deployments, ReplicaSets, and StatefulSets (not DaemonSets)
- Operates within a configured minReplicas/maxReplicas range
- Bases decisions on observed metrics such as CPU, memory, or custom metrics

HPA solves critical scaling challenges: absorbing traffic spikes without manual intervention, reducing cost during quiet periods, and keeping utilization near a healthy target. Without HPA, you either over-provision (wasting money) or under-provision (risking outages).
HPA runs a control loop every 15 seconds (default):

1. Query the Metrics API for the target Pods
2. Compute the average metric value across those Pods
3. Compare it against the configured target
4. Calculate the desired replica count and update the workload's scale if needed
HPA uses this formula to calculate desired replicas:
```
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
```

Example: with 3 replicas, current CPU utilization of 80%, and a target of 50%:

```
desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5
```

HPA scales from 3 to 5 replicas.
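The formula can be sketched in a few lines of Python. This is an illustration, not the controller's actual code; the tolerance check mirrors the controller's default `--horizontal-pod-autoscaler-tolerance=0.1`, which suppresses scaling when the metric is within 10% of the target:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Simplified version of the HPA scaling formula."""
    ratio = current_metric / target_metric
    # Within the tolerance band, HPA leaves the replica count unchanged
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

print(desired_replicas(3, 80, 50))  # → 5 (matches the worked example above)
print(desired_replicas(4, 68, 70))  # → 4 (within tolerance, no change)
```

In practice the result is also clamped to the minReplicas/maxReplicas range, which is omitted here for brevity.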
HPA requires Metrics Server to collect resource metrics:
```bash
# Check if Metrics Server is installed
kubectl get deployment metrics-server -n kube-system
```

If not installed:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

Important
HPA cannot function without Metrics Server. Install it before creating HPA resources.
Before creating HPA, ensure:

- Metrics Server is installed and running
- The target workload (Deployment, ReplicaSet, or StatefulSet) exists
- Every container in the target Pods defines resource requests for the metric you scale on
Create a Deployment first:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"      # Required for HPA
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
```

Create HPA:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Apply:
```bash
kubectl apply -f web-deployment.yml
kubectl apply -f web-hpa.yml
```

Behavior:

- Keeps between 2 and 10 replicas at all times
- Adds replicas when average CPU utilization across Pods exceeds 70% of the requested CPU
- Removes replicas when utilization stays below the target
Create HPA using kubectl:
```bash
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10
```

This creates the same HPA as the YAML above.
```bash
kubectl get hpa
```

Output:

```
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%   2         10        2          5m
```

Columns:

- NAME: the HPA resource name
- REFERENCE: the target workload
- TARGETS: current metric value / target (here 45% current vs 70% target)
- MINPODS / MAXPODS: the configured scaling range
- REPLICAS: the current replica count

Detailed view:

```bash
kubectl describe hpa web-app-hpa
```

Scale based on memory utilization:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

Behavior:

- Scales up when average memory utilization exceeds 80% of the requested memory
- Use with care: many runtimes do not release memory after load drops, so memory-based HPA may never scale back down
Scale based on CPU OR memory (whichever triggers first):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
```

Behavior:

- HPA calculates the desired replica count for each metric independently and uses the highest result
- Either metric crossing its target can trigger a scale up
Scale based on application-specific metrics (requires custom metrics adapter):
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
```

Use cases:

- Scaling on HTTP requests per second per Pod
- Scaling workers on message queue depth
- Scaling on active connections or sessions
Note
Custom metrics require additional setup like Prometheus Adapter or custom metrics API server.
Control scaling behavior with v2 API:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: behavior-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # Wait 5 min before scale down
      policies:
      - type: Percent
        value: 50                       # Max 50% of current replicas
        periodSeconds: 60               # Per minute
      - type: Pods
        value: 2                        # Max 2 Pods
        periodSeconds: 60               # Per minute
      selectPolicy: Min                 # Use most conservative policy
    scaleUp:
      stabilizationWindowSeconds: 0     # Scale up immediately
      policies:
      - type: Percent
        value: 100                      # Max 100% of current replicas
        periodSeconds: 15               # Per 15 seconds
      - type: Pods
        value: 4                        # Max 4 Pods
        periodSeconds: 15               # Per 15 seconds
      selectPolicy: Max                 # Use most aggressive policy
```

Scale Down Behavior:

- Waits 5 minutes of consistently low metrics before removing Pods
- Removes at most 50% of current replicas or 2 Pods per minute, whichever is smaller (selectPolicy: Min)

Scale Up Behavior:

- Reacts immediately (no stabilization window)
- Adds at most 100% of current replicas or 4 Pods per 15 seconds, whichever is larger (selectPolicy: Max)
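How selectPolicy resolves competing policies can be sketched in Python (an illustration under simplified assumptions, not the controller's actual code): with 10 current replicas, a Percent 50%/min policy allows removing 5 Pods, a Pods 2/min policy allows removing 2, and selectPolicy: Min applies the smaller change.

```python
import math

def allowed_change(current_replicas, policies, select_policy):
    """Resolve scaling policies the way HPA's selectPolicy does (simplified).

    policies: list of (type, value) tuples, e.g. [("Percent", 50), ("Pods", 2)]
    select_policy: "Min" (most conservative) or "Max" (most aggressive)
    """
    limits = []
    for ptype, value in policies:
        if ptype == "Percent":
            # Percent policies are relative to the current replica count
            limits.append(math.ceil(current_replicas * value / 100))
        else:  # "Pods": an absolute number of Pods
            limits.append(value)
    return min(limits) if select_policy == "Min" else max(limits)

# Scale down with 10 replicas: Percent 50% allows 5, Pods allows 2
print(allowed_change(10, [("Percent", 50), ("Pods", 2)], "Min"))   # → 2
# Scale up with 10 replicas: Percent 100% allows 10, Pods allows 4
print(allowed_change(10, [("Percent", 100), ("Pods", 4)], "Max"))  # → 10
```

Each limit applies per its own periodSeconds window; that bookkeeping is omitted here.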
Example: a web application exposed through a Service, scaled on CPU:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```

Example: an API service scaled on both CPU and memory:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapi:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
```

Example: background workers that scale up aggressively and scale down conservatively:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myworker:latest
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "400m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # 10 minutes
      policies:
      - type: Pods
        value: 5
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 10
        periodSeconds: 30
```

Create a load generator to test HPA:
```bash
# Run load generator
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh

# Inside the container, generate load
while true; do wget -q -O- http://web-service; done
```

In another terminal, watch HPA:
```bash
kubectl get hpa web-app-hpa --watch
```

Output shows scaling:

```
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%   2         10        2          1m
web-app-hpa   Deployment/web-app   85%/70%   2         10        2          2m
web-app-hpa   Deployment/web-app   85%/70%   2         10        3          2m
web-app-hpa   Deployment/web-app   78%/70%   2         10        3          3m
web-app-hpa   Deployment/web-app   72%/70%   2         10        4          3m
web-app-hpa   Deployment/web-app   68%/70%   2         10        4          4m
```

Watch Pods scaling:
```bash
kubectl get pods -l app=web --watch
```

Stop the load generator (Ctrl+C), then watch HPA scale down:
```
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   68%/70%   2         10        4          5m
web-app-hpa   Deployment/web-app   35%/70%   2         10        4          6m
web-app-hpa   Deployment/web-app   35%/70%   2         10        4          11m
web-app-hpa   Deployment/web-app   35%/70%   2         10        3          11m
web-app-hpa   Deployment/web-app   35%/70%   2         10        2          12m
```

Note the 5-minute stabilization window before scale down.
Without explicit behavior configuration, HPA applies these defaults:

Scale Up:

- No stabilization window (reacts immediately)
- Adds up to 100% of current replicas or 4 Pods per 15 seconds, whichever is higher

Scale Down:

- 300-second (5-minute) stabilization window
- Removes up to 100% of current replicas per 15 seconds
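The documented autoscaling/v2 defaults are roughly equivalent to writing this behavior block yourself (shown for illustration; check your cluster version's documentation for the exact values):

```yaml
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    - type: Pods
      value: 4
      periodSeconds: 15
    selectPolicy: Max
  scaleDown:
    stabilizationWindowSeconds: 300
    policies:
    - type: Percent
      value: 100
      periodSeconds: 15
    selectPolicy: Max
```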
Slow, steady scaling:
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600
    policies:
    - type: Pods
      value: 1
      periodSeconds: 120
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    - type: Pods
      value: 2
      periodSeconds: 60
```

Fast response to load:
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60
    policies:
    - type: Percent
      value: 50
      periodSeconds: 30
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 200
      periodSeconds: 15
```

Only scale up, never down:
```yaml
behavior:
  scaleDown:
    selectPolicy: Disabled
```

Problem: HPA cannot calculate utilization without requests.
```yaml
# Bad: No requests
containers:
- name: app
  image: myapp:latest
  # No resources specified
```

Solution: Always define requests:
```yaml
# Good: Requests defined
containers:
- name: app
  image: myapp:latest
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
```

Problem: HPA shows "unknown" for metrics.
```bash
kubectl get hpa
# TARGETS: <unknown>/70%
```

Solution: Install Metrics Server:

```bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```

Problem: Constant scaling up and down (flapping).
```yaml
# Bad: 30% is too low
target:
  type: Utilization
  averageUtilization: 30
```

Solution: Use reasonable targets (60-80%):
```yaml
# Good: 70% allows headroom
target:
  type: Utilization
  averageUtilization: 70
```

Problem: HPA cannot scale.
```yaml
# Bad: No scaling range
minReplicas: 5
maxReplicas: 5
```

Solution: Provide a scaling range:
```yaml
# Good: Can scale 2-10
minReplicas: 2
maxReplicas: 10
```

Problem: Manually scaling the Deployment conflicts with HPA.
```bash
# Bad: Manual scale conflicts with HPA
kubectl scale deployment web-app --replicas=20
```

Solution: Let HPA manage replicas, or delete the HPA:
```bash
# Either delete HPA
kubectl delete hpa web-app-hpa

# Or let HPA manage scaling
# (Don't manually scale)
```

CPU targets:

- 70% is a sensible default for most workloads
- Use lower targets (50-60%) for spiky traffic that needs headroom
- Use higher targets (around 80%) for steady, predictable load

Memory targets:

- 75-80% is typical
- Be cautious: many runtimes do not release memory after load drops, so memory-based HPA may never scale back down
```yaml
# Consider:
# - Minimum for availability (min >= 2)
# - Maximum for cost control
# - Expected traffic patterns
minReplicas: 3    # High availability
maxReplicas: 20   # Cost limit
```

Prevent rapid scaling oscillations:
```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300   # 5 minutes
  scaleUp:
    stabilizationWindowSeconds: 0     # Immediate
```

Track HPA behavior:
```bash
# Watch HPA
kubectl get hpa --watch

# View events
kubectl describe hpa web-app-hpa

# Check metrics
kubectl top pods
```

Prevent runaway scaling:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    pods: "100"
```

Always test before production:

- Load test to find realistic utilization targets
- Verify scale-up reacts quickly enough for your traffic patterns
- Verify scale-down settles without flapping after load drops
Add annotations:

```yaml
metadata:
  annotations:
    hpa.note: "Target 70% based on load testing, scales 2-20 replicas"
```

Problem: HPA shows <unknown> in TARGETS.

```bash
kubectl get hpa
# TARGETS: <unknown>/70%
```

Causes:

- Metrics Server not installed or not running
- Containers missing resource requests
- Pods not yet ready
Solutions:
```bash
# Check Metrics Server
kubectl get deployment metrics-server -n kube-system

# Check Pod requests
kubectl get deployment web-app -o yaml | grep -A 5 resources

# Check Pod status
kubectl get pods -l app=web
```

Problem: HPA reports it cannot fetch metrics.

```bash
kubectl describe hpa web-app-hpa
# Events show: "unable to get metrics"
```

Solutions:
```bash
# Verify target exists
kubectl get deployment web-app

# Check HPA configuration
kubectl get hpa web-app-hpa -o yaml

# View HPA events
kubectl describe hpa web-app-hpa
```

Problem: HPA scales up and down repeatedly.
Solutions:

- Increase the scale-down stabilization window:

```yaml
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600   # Increase from 300
```

Problem: HPA is stuck at maxReplicas while the metric stays above target.

```bash
kubectl get hpa
# REPLICAS: 10 (at maxReplicas)
# TARGETS: 95%/70% (still high)
```

Solutions:

- Raise maxReplicas if the cluster has capacity
- Reduce per-Pod resource usage or tune the application
- Check for a bottleneck outside the Pods (database, external API) keeping utilization high
Useful commands:

```bash
# List HPAs
kubectl get hpa
kubectl get hpa -o wide
kubectl get hpa --all-namespaces

# Inspect an HPA
kubectl describe hpa web-app-hpa
kubectl get hpa web-app-hpa -o yaml
kubectl get hpa web-app-hpa --watch

# Check metrics
kubectl top pods -l app=web
kubectl top nodes

# Delete an HPA
kubectl delete hpa web-app-hpa
```

Deleting an HPA leaves the Deployment running with its current replica count.
In episode 30, we've explored Horizontal Pod Autoscaler (HPA) in Kubernetes in depth. We've learned how HPA automatically scales Pods based on metrics, how to configure autoscaling policies, and best practices for production use.
Key takeaways:

- HPA automatically scales Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics
- Metrics Server and container resource requests are prerequisites
- Targets of 60-80% utilization leave headroom without wasting capacity
- The behavior field controls how quickly HPA scales up and down
- Don't manually scale workloads that HPA manages
Horizontal Pod Autoscaler is essential for running dynamic, cost-efficient workloads in Kubernetes. By understanding HPA configuration and behavior, you can ensure your applications automatically scale to meet demand while optimizing resource utilization and costs.
Are you getting a clearer understanding of Horizontal Pod Autoscaler in Kubernetes? Keep your learning momentum going and look forward to the next episode!
Note
If you want to continue to the next episode, you can click the Episode 31 thumbnail below