In this episode we will discuss the Kubernetes Horizontal Pod Autoscaler (HPA) for automatic scaling. We will look at how HPA works, how to configure autoscaling based on CPU, memory, and custom metrics, and best practices for autoscaling in production.

Note
If you want to read the previous episode, click the episode 29 thumbnail below.
In the previous episode we learned about computational resources: CPU and memory requests and limits. In episode 30, we will discuss the Horizontal Pod Autoscaler (HPA), which automatically scales the number of Pods based on observed metrics such as CPU utilization, memory usage, or custom metrics.
Note: Here I am using a Kubernetes cluster installed with K3s.
Manual scaling works for predictable workloads, but real-world applications face variable traffic patterns. HPA automatically adjusts the replica count to match demand, keeping resource utilization efficient and application performance steady without manual intervention.
The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.
Think of HPA as an automatic thermostat: it monitors the temperature (the metric) and adjusts heating or cooling (the Pod count) to maintain your desired comfort level (the target utilization). You set the target; HPA handles the adjustments.
Key characteristics of HPA:
HPA solves critical scaling challenges:
Without HPA, you either over-provision (wasting money) or under-provision (risking outages).
HPA runs a control loop every 15 seconds (by default):
HPA uses this formula to calculate the desired number of replicas:
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
Example: 3 replicas are running at 80% average CPU utilization, with a target of 50%:
desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5
HPA scales from 3 to 5 replicas.
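The formula can be checked with a small Python sketch. It adds two details that HPA applies around the formula: a 10% tolerance band (the default of the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance) and clamping to minReplicas/maxReplicas. The function name and the default bounds (1 and 10) are illustrative assumptions, and the real controller also handles missing metrics and not-yet-ready Pods, which this sketch ignores.

```python
import math

def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Single-metric HPA formula: ceil[current * (current_value / target_value)].

    tolerance=0.1 mirrors the default of --horizontal-pod-autoscaler-tolerance:
    if the ratio is within 10% of 1.0, the replica count is left unchanged.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas          # close enough to the target
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to [minReplicas, maxReplicas]
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 80, 50))                  # 5 (the example above)
print(desired_replicas(4, 72, 70))                  # 4 (ratio ~1.03, inside tolerance)
print(desired_replicas(3, 80, 50, max_replicas=4))  # 4 (capped by maxReplicas)
```

The tolerance band is what keeps HPA from reacting to small fluctuations around the target.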
HPA requires Metrics Server to collect resource metrics:
# Check whether Metrics Server is installed
kubectl get deployment metrics-server -n kube-system
If it is not installed yet:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Important: HPA cannot read CPU or memory metrics without Metrics Server. Install it before creating HPA resources. K3s deploys Metrics Server by default, so on a standard K3s cluster the check above should already find it.
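To see the raw numbers Metrics Server provides, you can query the Metrics API directly, for example with kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/default/pods. The Python sketch below sums container CPU usage per Pod from such a PodMetricsList response. The sample document and Pod names are hypothetical, and the quantity parser only covers the suffixes it handles (n, u, m, and whole cores).

```python
import json

def cpu_millicores(quantity):
    """Convert a CPU quantity from the Metrics API (e.g. '45000000n') to millicores."""
    if quantity.endswith("n"):
        return int(quantity[:-1]) / 1_000_000   # nanocores
    if quantity.endswith("u"):
        return int(quantity[:-1]) / 1_000       # microcores
    if quantity.endswith("m"):
        return float(quantity[:-1])             # millicores
    return float(quantity) * 1000               # whole cores

def pod_cpu_usage(pod_metrics_list):
    """Sum container CPU usage per Pod from a PodMetricsList document."""
    usage = {}
    for item in pod_metrics_list["items"]:
        name = item["metadata"]["name"]
        usage[name] = sum(cpu_millicores(c["usage"]["cpu"]) for c in item["containers"])
    return usage

# Hypothetical, trimmed response; real responses carry more fields (timestamp, window, ...)
sample = json.loads("""
{"kind": "PodMetricsList", "apiVersion": "metrics.k8s.io/v1beta1",
 "items": [
   {"metadata": {"name": "web-app-pod-a"},
    "containers": [{"name": "nginx", "usage": {"cpu": "45000000n", "memory": "3Mi"}}]},
   {"metadata": {"name": "web-app-pod-b"},
    "containers": [{"name": "nginx", "usage": {"cpu": "25m", "memory": "3Mi"}}]}
 ]}
""")
print(pod_cpu_usage(sample))  # {'web-app-pod-a': 45.0, 'web-app-pod-b': 25.0}
```

To try it with real data, save the kubectl output to a file and pass the result of json.load on that file to pod_cpu_usage.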
Before creating an HPA, make sure:
Create the Deployment first:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"      # Required for HPA CPU utilization
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
Create the HPA:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Apply:
kubectl apply -f web-deployment.yml
kubectl apply -f web-hpa.yml
Behavior:
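A type: Utilization target is a percentage of the CPU request, not of the limit. The sketch below computes it roughly the way HPA does (total usage across Pods divided by total requests), which also shows why a Pod without requests leaves HPA with nothing to divide by. The function and the usage numbers are illustrative assumptions.

```python
def average_utilization(usage_millicores, request_millicores):
    """Average CPU utilization (%) as a type: Utilization target sees it:
    total usage of all Pods divided by their total CPU requests."""
    total_request = sum(request_millicores)
    if total_request <= 0:
        # No requests means no denominator; kubectl get hpa shows <unknown>
        raise ValueError("Pods must define resources.requests.cpu")
    return sum(usage_millicores) * 100 // total_request

# Two web-app Pods, each requesting 100m CPU
print(average_utilization([90, 50], [100, 100]))    # 70 -> exactly at the target
print(average_utilization([180, 150], [100, 100]))  # 165 -> usage can exceed the request, up to the 200m limit
```

Because utilization is relative to requests, values above 100% are normal when limits are higher than requests.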
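Create the HPA with kubectl:
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10
This creates an HPA with the same CPU target and replica range as the YAML above. By default kubectl names it after the Deployment (web-app) rather than web-app-hpa, unless you pass --name. Use it instead of the YAML version, not in addition to it.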
kubectl get hpa
Output:
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%   2         10        2          5m
Columns:
Detailed view:
kubectl describe hpa web-app-hpa
Scale based on memory utilization (an alternative to the CPU-based HPA above; keep only one HPA per Deployment):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: memory-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
Behavior:
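Memory utilization works the same way, measured against resources.requests.memory. The sketch below converts binary quantities such as 128Mi to bytes and computes the percentage for two hypothetical web-app Pods. The parser handles only Ki, Mi, Gi, and plain bytes, a small subset of the Kubernetes quantity format.

```python
# Binary suffixes used in memory requests such as "128Mi" (a subset of Kubernetes quantities)
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}

def to_bytes(quantity):
    """Convert a quantity like '128Mi' to bytes; plain numbers are already bytes."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)

def memory_utilization(usages, requests):
    """Total memory usage as a percentage of total memory requests."""
    return sum(map(to_bytes, usages)) * 100 // sum(map(to_bytes, requests))

print(to_bytes("128Mi"))  # 134217728
# Two web-app Pods (request 128Mi each) using 120Mi and 110Mi
print(memory_utilization(["120Mi", "110Mi"], ["128Mi", "128Mi"]))  # 89 -> above the 80% target
```

One practical caveat: many applications keep memory allocated after load drops (caches, garbage-collected heaps), so a memory-based HPA may scale up readily but scale back down slowly or not at all.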
Scale based on CPU or memory: HPA calculates a replica count for each metric and uses the highest, so whichever metric needs more Pods wins:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: multi-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
Behavior:
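With several metrics, HPA computes a replica proposal for each one and keeps the largest. A minimal sketch, with tolerance and min/max clamping left out for brevity and hypothetical utilization values:

```python
import math

def proposal(current_replicas, current_value, target_value):
    """Replica count proposed by one metric (tolerance and min/max omitted)."""
    return math.ceil(current_replicas * current_value / target_value)

def desired_multi_metric(current_replicas, metrics):
    """metrics: (current_value, target_value) per entry in spec.metrics.
    HPA evaluates every metric and keeps the largest proposal."""
    return max(proposal(current_replicas, cur, tgt) for cur, tgt in metrics)

# 4 replicas: CPU at 35% (target 70%), memory at 90% (target 80%)
print(proposal(4, 35, 70))                            # 2 -> CPU alone would scale down
print(proposal(4, 90, 80))                            # 5 -> memory wants more Pods
print(desired_multi_metric(4, [(35, 70), (90, 80)]))  # 5
```

This is why a multi-metric HPA scales up when either CPU or memory is over its target, but only scales down when every metric allows it.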
Scale based on an application-specific metric (requires a custom metrics adapter):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-metric-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "1000"
Use case:
Note
Custom metrics require additional setup, such as Prometheus Adapter or another implementation of the custom metrics API.
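For a Pods metric with type: AverageValue, HPA compares the average value per Pod with the target instead of a percentage of requests. A minimal sketch, assuming the adapter already exposes http_requests_per_second for each Pod (the values here are made up) and ignoring tolerance:

```python
import math

def desired_for_average_value(per_pod_values, target_average):
    """Pods metric with target type AverageValue: compare the per-Pod average
    to the target and scale the current Pod count by that ratio."""
    current_average = sum(per_pod_values) / len(per_pod_values)
    return math.ceil(len(per_pod_values) * current_average / target_average)

# Two Pods serving 1500 and 1700 requests/second against averageValue: "1000"
print(desired_for_average_value([1500, 1700], 1000))  # 4
```

In other words, HPA aims for enough Pods that each one handles about 1000 requests per second.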
Control scaling behavior with the autoscaling/v2 API:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: behavior-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Consider the last 5 min of recommendations before scaling down
      policies:
      - type: Percent
        value: 50                      # At most 50% of current replicas
        periodSeconds: 60              # Per minute
      - type: Pods
        value: 2                       # At most 2 Pods
        periodSeconds: 60              # Per minute
      selectPolicy: Min                # Use the policy that allows the smallest change
    scaleUp:
      stabilizationWindowSeconds: 0    # Scale up immediately
      policies:
      - type: Percent
        value: 100                     # At most 100% of current replicas (doubling)
        periodSeconds: 15              # Per 15 seconds
      - type: Pods
        value: 4                       # At most 4 Pods
        periodSeconds: 15              # Per 15 seconds
      selectPolicy: Max                # Use the policy that allows the largest change
Scale Down Behavior:
Scale Up Behavior:
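The policies in behavior-hpa cap how far one period can move the replica count. The Python sketch below approximates those caps, assuming no other scaling happened earlier in the same period. The real controller also subtracts changes already made during periodSeconds, and its exact rounding of Percent policies may differ from the math.ceil used here. selectPolicy: Disabled is not modeled.

```python
import math

def scale_up_limit(current, policies, select_policy="Max"):
    """Highest replica count the scaleUp policies allow within one period.
    Simplification: no other scaling happened earlier in the period."""
    proposals = []
    for kind, value in policies:
        if kind == "Percent":
            proposals.append(math.ceil(current * (1 + value / 100)))
        else:  # "Pods"
            proposals.append(current + value)
    # Max -> the policy allowing the biggest increase wins
    return max(proposals) if select_policy == "Max" else min(proposals)

def scale_down_limit(current, policies, select_policy="Max"):
    """Lowest replica count the scaleDown policies allow within one period."""
    proposals = []
    for kind, value in policies:
        if kind == "Percent":
            proposals.append(math.ceil(current * (1 - value / 100)))
        else:  # "Pods"
            proposals.append(current - value)
    # Max -> biggest decrease (lowest count); Min -> smallest decrease (highest count)
    return min(proposals) if select_policy == "Max" else max(proposals)

# Policies from behavior-hpa above
UP = [("Percent", 100), ("Pods", 4)]
DOWN = [("Percent", 50), ("Pods", 2)]

print(scale_up_limit(2, UP, "Max"))       # 6  (2 + 4 Pods beats doubling to 4)
print(scale_up_limit(10, UP, "Max"))      # 20 (doubling beats 10 + 4)
print(scale_down_limit(10, DOWN, "Min"))  # 8  (removing 2 Pods is the smaller change)
```

HPA then moves toward its recommendation, but no further than these limits, and always within minReplicas and maxReplicas.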
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
  name: web-service
spec:
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapi:latest
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: "250m"
            memory: "256Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  minReplicas: 2
  maxReplicas: 15
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75

apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myworker:latest
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
          limits:
            cpu: "400m"
            memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 5
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600  # 10 minutes
      policies:
      - type: Pods
        value: 5
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Pods
        value: 10
        periodSeconds: 30
Create a load generator to test the HPA:
# Run a load generator
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh
# Inside the container, generate load
while true; do wget -q -O- http://web-service; done
In another terminal, watch the HPA:
kubectl get hpa web-app-hpa --watch
The output shows the scaling:
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%   2         10        2          1m
web-app-hpa   Deployment/web-app   85%/70%   2         10        2          2m
web-app-hpa   Deployment/web-app   85%/70%   2         10        3          2m
web-app-hpa   Deployment/web-app   78%/70%   2         10        3          3m
web-app-hpa   Deployment/web-app   72%/70%   2         10        4          3m
web-app-hpa   Deployment/web-app   68%/70%   2         10        4          4m
Watch the Pods scale:
kubectl get pods -l app=web --watch
Stop the load generator (Ctrl+C), then watch the HPA scale down:
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   68%/70%   2         10        4          5m
web-app-hpa   Deployment/web-app   35%/70%   2         10        4          6m
web-app-hpa   Deployment/web-app   35%/70%   2         10        4          11m
web-app-hpa   Deployment/web-app   35%/70%   2         10        3          11m
web-app-hpa   Deployment/web-app   35%/70%   2         10        2          12m
Notice the 5-minute stabilization window before scaling down: utilization drops to 35% at 6m, but the replica count only starts decreasing at 11m.
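The delay in the output above comes from the scale-down stabilization window. A minimal sketch of the idea, assuming one recommendation every 15 seconds and the default 300-second window: HPA acts on the highest recommendation still inside the window, so a recommendation of 4 recorded when the load stopped keeps the replica count up until it ages out. The history values are hypothetical.

```python
def stabilized_scale_down(history, now, window_seconds=300):
    """history: (timestamp_seconds, recommended_replicas) pairs, one per sync.

    For scale-down, HPA acts on the highest recommendation recorded inside the
    stabilization window, so the Deployment only shrinks to the largest count
    recommended during the last window_seconds.
    """
    recent = [r for ts, r in history if 0 <= now - ts < window_seconds]
    return max(recent)

# Load stops at t=0: the sync at t=0 still recommends 4, later syncs (every 15 s) recommend 2
history = [(0, 4)] + [(t, 2) for t in range(15, 601, 15)]

print(stabilized_scale_down(history, now=240))  # 4 -> the t=0 value is still in the window
print(stabilized_scale_down(history, now=315))  # 2 -> it has aged out, scale down can happen
```

Taking the maximum over the window is what prevents a brief dip in load from removing Pods that are needed again moments later.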
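Without an explicit behavior configuration, HPA uses these defaults:
Scale Up: no stabilization window; every 15 seconds it may add up to 4 Pods or up to 100% of the current replicas, whichever is larger.
Scale Down: a 300-second stabilization window; after that, up to 100% of the current replicas may be removed every 15 seconds (never below minReplicas).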
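To get a feel for how quickly the default scale-up rules react (no stabilization window; up to 4 additional Pods or 100% more replicas per 15 seconds, whichever is larger), the sketch below counts the 15-second periods needed to reach a target. It assumes the recommendation stays constant and ignores Pod startup time, which adds real-world delay.

```python
def periods_to_reach(start, target):
    """Number of 15-second periods the default scaleUp rules need to go from
    `start` to `target` replicas, assuming the recommendation stays at `target`."""
    replicas, periods = start, 0
    while replicas < target:
        # Default policies: +100% or +4 Pods per 15 s; selectPolicy Max picks the larger
        limit = max(replicas * 2, replicas + 4)
        replicas = min(limit, target)
        periods += 1
    return periods

print(periods_to_reach(2, 20))  # 3 -> 2, 6, 12, 20 (about 45 seconds)
print(periods_to_reach(1, 10))  # 2 -> 1, 5, 10
```

Small Deployments grow by the fixed 4-Pod step, larger ones by doubling, which is why the default combines both policies.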
Slow, steady scaling:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600
    policies:
    - type: Pods
      value: 1
      periodSeconds: 120
  scaleUp:
    stabilizationWindowSeconds: 60
    policies:
    - type: Pods
      value: 2
      periodSeconds: 60
Fast response to load:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 60
    policies:
    - type: Percent
      value: 50
      periodSeconds: 30
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
    - type: Percent
      value: 200
      periodSeconds: 15
Scale up only, never down:
behavior:
  scaleDown:
    selectPolicy: Disabled
Problem: HPA cannot calculate utilization without resource requests.
# Bad: no requests
containers:
- name: app
  image: myapp:latest
  # No resources specified
Solution: Always define requests:
# Good: requests defined
containers:
- name: app
  image: myapp:latest
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
Problem: HPA shows "<unknown>" for the metric.
kubectl get hpa
# TARGETS: <unknown>/70%
Solution: Install Metrics Server:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Problem: Constant scaling up and down (flapping).
# Bad: 30% is too low
target:
  type: Utilization
  averageUtilization: 30
Solution: Use a reasonable target (60-80%):
# Good: 70% leaves headroom
target:
  type: Utilization
  averageUtilization: 70
Problem: HPA cannot scale.
# Bad: no scaling range
minReplicas: 5
maxReplicas: 5
Solution: Provide a scaling range:
# Good: can scale between 2 and 10
minReplicas: 2
maxReplicas: 10
Problem: Manually scaling the Deployment conflicts with HPA, which overrides the manual replica count on its next sync.
# Bad: manual scaling conflicts with HPA
kubectl scale deployment web-app --replicas=20
Solution: Let HPA manage replicas, or delete the HPA:
# Either delete the HPA
kubectl delete hpa web-app-hpa
# Or let HPA manage scaling
# (do not scale manually)
CPU target:
Memory target:
# Consider:
# - Minimum for availability (min >= 2)
# - Maximum for cost control
# - Expected traffic pattern
minReplicas: 3   # High availability
maxReplicas: 20  # Cost limit
Prevent rapid scaling oscillation:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 300  # 5 minutes
  scaleUp:
    stabilizationWindowSeconds: 0    # Immediate
Track HPA behavior:
# Watch the HPA
kubectl get hpa --watch
# View events
kubectl describe hpa web-app-hpa
# Check metrics
kubectl top pods
Prevent runaway scaling:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "50"
    requests.memory: "100Gi"
    pods: "100"
Always test before going to production:
Add an annotation that documents the HPA settings:
metadata:
  annotations:
    hpa.note: "Target 70% based on load testing, scale 2-20 replicas"
kubectl get hpa
# TARGETS: <unknown>/70%
Cause:
Solution:
# Check Metrics Server
kubectl get deployment metrics-server -n kube-system
# Check Pod requests
kubectl get deployment web-app -o yaml | grep -A 5 resources
# Check Pod status
kubectl get pods -l app=web
kubectl describe hpa web-app-hpa
# Events show: "unable to get metrics"
Solution:
# Verify that the target exists
kubectl get deployment web-app
# Check the HPA configuration
kubectl get hpa web-app-hpa -o yaml
# View HPA events
kubectl describe hpa web-app-hpa
HPA scales up and down repeatedly.
Solution:
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600  # Increased from 300
kubectl get hpa
# REPLICAS: 10 (at maxReplicas)
# TARGETS: 95%/70% (still high)
Solution:
Use them together:
kubectl get hpa
kubectl get hpa -o wide
kubectl get hpa --all-namespaces
kubectl describe hpa web-app-hpa
kubectl get hpa web-app-hpa -o yaml
kubectl get hpa web-app-hpa --watch
kubectl top pods -l app=web
kubectl top nodes
kubectl delete hpa web-app-hpa
The Deployment keeps running with its current replica count.
In episode 30, we covered the Horizontal Pod Autoscaler (HPA) in Kubernetes in depth. We learned how HPA automatically scales Pods based on metrics, how to configure autoscaling policies, and best practices for production use.
Key takeaways:
The Horizontal Pod Autoscaler is essential for running dynamic, cost-efficient workloads on Kubernetes. By understanding HPA configuration and behavior, you can make sure your applications scale automatically to meet demand while optimizing resource utilization and cost.
So, is the Horizontal Pod Autoscaler in Kubernetes clearer now? Keep up the enthusiasm for learning, and stay tuned for the next episode!
Note
If you want to continue to the next episode, click the episode 31 thumbnail below.