Learning Kubernetes - Episode 30 - Introduction to the Horizontal Pod Autoscaler

In this episode we look at the Kubernetes Horizontal Pod Autoscaler (HPA) for automatic scaling. We will learn how HPA works, how to configure autoscaling based on CPU, memory, and custom metrics, and best practices for production autoscaling.

Arman Dwi Pangestu, April 5, 2026

Introduction

In the previous episode we learned about computational resources: CPU and memory requests and limits. In episode 30, we look at the Horizontal Pod Autoscaler (HPA), which automatically scales the number of Pods based on observed metrics such as CPU utilization, memory usage, or custom metrics.

Note: Here I am using a Kubernetes cluster installed through K3s.

Manual scaling works for predictable workloads, but real-world applications face variable traffic patterns. HPA automatically adjusts the replica count to match demand, keeping resource utilization and application performance in balance without manual intervention.

What Is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.

Think of HPA as an automatic thermostat: it monitors the temperature (the metric) and adjusts heating or cooling (the Pod count) to maintain your desired comfort level (the target utilization). You set the target, and HPA handles the adjustment.

Key characteristics of HPA:

  • Automatic scaling - Adjusts replicas without manual intervention
  • Metric-based - Scales based on CPU, memory, or custom metrics
  • Configurable target - Set the desired utilization threshold
  • Min/max bounds - Define the scaling limits
  • Cool-down period - Prevents rapid scaling oscillation
  • Multiple metrics - Scales based on multiple conditions
  • Works with Deployments - Compatible with standard workloads

Why Use HPA?

HPA solves critical scaling challenges:

  • Handle traffic spikes - Automatically adds Pods during high load
  • Cost optimization - Scales down during low-traffic periods
  • Maintain performance - Keeps response times consistent
  • Reduce manual work - No need to watch metrics constantly
  • Predictable behavior - Consistent scaling logic
  • Resource efficiency - Right-sizes capacity automatically
  • 24/7 operation - Scales even while you are sleeping
  • Business continuity - Handles unexpected load gracefully

Without HPA, you either over-provision (wasting money) or under-provision (risking outages).

How HPA Works

HPA Control Loop

HPA runs a control loop every 15 seconds (by default):

  1. Query metrics - Fetch current resource usage from Metrics Server
  2. Calculate desired replicas - Compare current vs target utilization
  3. Scale if needed - Adjust the replica count when current utilization deviates from the target
  4. Wait for stabilization - Cool down before the next scaling decision

Scaling Algorithm

HPA uses this formula to calculate the desired replica count:

plaintext
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]

Example:

  • Current replicas: 3
  • Current CPU usage: 80%
  • Target CPU usage: 50%
plaintext
desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5

HPA scales from 3 to 5 replicas.
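To make the formula concrete, here is a minimal Python sketch of the calculation. The function name `desired_replicas` is illustrative only. The `tolerance` parameter is an assumption based on the controller's default of 0.1 (the controller skips scaling when the ratio is close to 1.0); your cluster may be configured differently.

```python
import math


def desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    """Apply the HPA formula; keep the replica count when the ratio is within tolerance."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(3, 80, 50))  # ceil(3 * 1.6) = ceil(4.8) = 5
print(desired_replicas(3, 52, 50))  # ratio 1.04 is within tolerance, stays at 3
```

The second call shows why small deviations from the target do not trigger scaling.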

Metrics Server Requirement

HPA requires Metrics Server to collect resource metrics:

bash
# Check whether Metrics Server is installed
kubectl get deployment metrics-server -n kube-system

If it is not installed yet:

bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Important

HPA cannot scale on CPU or memory without Metrics Server. Install it before creating an HPA resource.

Creating an HPA

Prerequisites

Before creating an HPA, make sure that:

  1. Metrics Server is installed - It provides the resource metrics
  2. Resource requests are defined - HPA needs them as the baseline for percentage calculations
  3. The Deployment exists - HPA targets an existing workload

Basic HPA with CPU

Create the Deployment first:

web-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: web-app
spec:
    replicas: 2
    selector:
        matchLabels:
            app: web
    template:
        metadata:
            labels:
                app: web
        spec:
            containers:
                - name: nginx
                  image: nginx:1.25
                  ports:
                      - containerPort: 80
                  resources:
                      requests:
                          cpu: "100m"      # Required for HPA
                          memory: "128Mi"
                      limits:
                          cpu: "200m"
                          memory: "256Mi"

Create the HPA:

web-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: web-app-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70

Apply:

bash
kubectl apply -f web-deployment.yml
kubectl apply -f web-hpa.yml

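Behavior:

  • Maintains between 2 and 10 replicas
  • Scales up when average CPU utilization rises above 70% of the requested CPU
  • Scales down when average CPU utilization falls below 70%, after the stabilization window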
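The 70% target is measured against the CPU request (100m in the Deployment above), not against the limit or the node's capacity. The sketch below illustrates this with hypothetical per-Pod usage readings. The function name and the sample values are assumptions for illustration.

```python
import math


def average_utilization(usage_millicores, request_millicores):
    """Average CPU utilization across Pods as a percentage of the per-Pod request."""
    total_usage = sum(usage_millicores)
    total_request = request_millicores * len(usage_millicores)
    return 100 * total_usage / total_request


usage = [85, 90]  # hypothetical per-Pod CPU usage in millicores
utilization = average_utilization(usage, request_millicores=100)
print(utilization)  # 87.5
print(math.ceil(len(usage) * utilization / 70))  # 3 replicas wanted at a 70% target
```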

HPA with kubectl

Create the HPA using kubectl:

bash
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10

This creates an HPA with the same settings as the YAML above. Unless you pass --name, it is named after the Deployment (web-app) rather than web-app-hpa.

Checking HPA Status

bash
kubectl get hpa

Output:

bash
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%   2         10        2          5m

Columns:

  • TARGETS: Current/target utilization
  • REPLICAS: Current number of Pods
  • MINPODS/MAXPODS: Scaling boundaries

Detailed view:

bash
kubectl describe hpa web-app-hpa

HPA with Memory

Scale based on memory utilization:

memory-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: memory-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 80

Behavior:

  • Scales up when average memory usage exceeds 80%
  • The percentage is relative to the memory request in the Pod spec
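As a rough illustration of how the percentage relates to the memory request, the sketch below converts Kubernetes-style quantities to bytes and computes the average utilization. The helper names and sample readings are hypothetical, and the parser only handles the binary suffixes Ki, Mi, and Gi.

```python
UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def parse_memory(quantity):
    """Convert a quantity such as '128Mi' to bytes (binary suffixes only)."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain bytes


def memory_utilization(usages, request):
    """Average memory usage as a percentage of the per-Pod request."""
    total = sum(parse_memory(u) for u in usages)
    return 100 * total / (parse_memory(request) * len(usages))


# Two Pods with a 128Mi request, as in the web-app Deployment.
print(memory_utilization(["96Mi", "112Mi"], "128Mi"))  # 81.25, above the 80% target
```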

HPA with Multiple Metrics

Scale based on CPU OR memory (whichever requires more replicas):

multi-metric-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: multi-metric-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 80

Behavior:

  • Scales up if CPU > 70% OR memory > 80%
  • HPA computes a replica count per metric and uses the largest one
  • Scales down only when both metrics are below target
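The "largest one wins" rule can be sketched as follows. The function name and the readings are hypothetical, and the sketch ignores tolerance and min/max bounds for brevity.

```python
import math


def desired_for_metrics(current_replicas, metrics):
    """metrics: list of (current, target) pairs; return the largest per-metric proposal."""
    proposals = [math.ceil(current_replicas * cur / tgt) for cur, tgt in metrics]
    return max(proposals)


# Hypothetical readings: CPU at 90% (target 70), memory at 60% (target 80).
print(desired_for_metrics(4, [(90, 70), (60, 80)]))  # 6: CPU asks for 6, memory for 3
```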

HPA with Custom Metrics

Scale based on application-specific metrics (requires a custom metrics adapter):

custom-metric-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: custom-metric-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Pods
          pods:
              metric:
                  name: http_requests_per_second
              target:
                  type: AverageValue
                  averageValue: "1000"

Use cases:

  • HTTP requests per second
  • Queue length
  • Database connections
  • Custom business metrics

Note

Custom metrics require additional setup, such as Prometheus Adapter or a custom metrics API server.
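With an AverageValue target, the same formula applies to the per-Pod average of the metric. A minimal sketch, assuming hypothetical `http_requests_per_second` readings for three Pods:

```python
import math


def desired_for_average_value(per_pod_values, target_average):
    """Replicas needed so that the per-Pod average drops to the target."""
    return math.ceil(sum(per_pod_values) / target_average)


# Hypothetical http_requests_per_second readings for three Pods.
print(desired_for_average_value([1500, 1400, 1600], 1000))  # 5
```

Dividing the total by the target is equivalent to `ceil(currentReplicas * average / target)`, because the average is the total divided by the current replica count.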

HPA Behavior Configuration

Control the scaling behavior with the v2 API:

behavior-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: behavior-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70
    behavior:
        scaleDown:
            stabilizationWindowSeconds: 300  # Wait 5 minutes before scaling down
            policies:
                - type: Percent
                  value: 50              # Max 50% of current replicas
                  periodSeconds: 60      # Per minute
                - type: Pods
                  value: 2               # Max 2 Pods
                  periodSeconds: 60      # Per minute
            selectPolicy: Min            # Use the most conservative policy
        scaleUp:
            stabilizationWindowSeconds: 0    # Scale up immediately
            policies:
                - type: Percent
                  value: 100             # Max 100% of current replicas
                  periodSeconds: 15      # Per 15 seconds
                - type: Pods
                  value: 4               # Max 4 Pods
                  periodSeconds: 15      # Per 15 seconds
            selectPolicy: Max            # Use the most aggressive policy

Scale Down Behavior:

  • Waits 5 minutes before scaling down (stabilization)
  • Removes at most 50% of Pods OR 2 Pods per minute
  • Uses the most conservative policy (Min)

Scale Up Behavior:

  • Scales up immediately (no stabilization)
  • Adds at most 100% of Pods OR 4 Pods per 15 seconds
  • Uses the most aggressive policy (Max)
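To see how selectPolicy combines the two policies, here is a simplified sketch for a Deployment currently at 10 replicas. The function is illustrative only: it ignores how the controller tracks changes within each period, and the rounding of Percent policies is an assumption.

```python
import math


def allowed_change(current, policies, select):
    """Largest replica change permitted in one period by the given policies."""
    changes = []
    for kind, value in policies:
        if kind == "Percent":
            changes.append(math.ceil(current * value / 100))
        else:  # "Pods"
            changes.append(value)
    return min(changes) if select == "Min" else max(changes)


current = 10
down = allowed_change(current, [("Percent", 50), ("Pods", 2)], "Min")
up = allowed_change(current, [("Percent", 100), ("Pods", 4)], "Max")
print(current - down, current + up)  # 8 20
```

With Min, scale-down removes at most 2 Pods per minute (the smaller of 5 and 2). With Max, scale-up may add up to 10 Pods per 15 seconds (the larger of 10 and 4).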

Practical Examples

Example 1: Web Application

web-app-complete.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: web-app
spec:
    replicas: 3
    selector:
        matchLabels:
            app: web
    template:
        metadata:
            labels:
                app: web
        spec:
            containers:
                - name: nginx
                  image: nginx:1.25
                  ports:
                      - containerPort: 80
                  resources:
                      requests:
                          cpu: "100m"
                          memory: "128Mi"
                      limits:
                          cpu: "200m"
                          memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
    name: web-service
spec:
    selector:
        app: web
    ports:
        - port: 80
          targetPort: 80
    type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: web-app-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 3
    maxReplicas: 20
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70

Example 2: API Service with Memory Scaling

api-service-hpa.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: api-service
spec:
    replicas: 2
    selector:
        matchLabels:
            app: api
    template:
        metadata:
            labels:
                app: api
        spec:
            containers:
                - name: api
                  image: myapi:latest
                  ports:
                      - containerPort: 8080
                  resources:
                      requests:
                          cpu: "250m"
                          memory: "256Mi"
                      limits:
                          cpu: "500m"
                          memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: api-service-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-service
    minReplicas: 2
    maxReplicas: 15
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 60
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 75

Example 3: Background Worker

worker-hpa.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: worker
spec:
    replicas: 5
    selector:
        matchLabels:
            app: worker
    template:
        metadata:
            labels:
                app: worker
        spec:
            containers:
                - name: worker
                  image: myworker:latest
                  resources:
                      requests:
                          cpu: "200m"
                          memory: "256Mi"
                      limits:
                          cpu: "400m"
                          memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: worker-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: worker
    minReplicas: 5
    maxReplicas: 50
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 80
    behavior:
        scaleDown:
            stabilizationWindowSeconds: 600  # 10 minutes
            policies:
                - type: Pods
                  value: 5
                  periodSeconds: 60
        scaleUp:
            stabilizationWindowSeconds: 0
            policies:
                - type: Pods
                  value: 10
                  periodSeconds: 30

Testing HPA

Generate Load

Create a load generator to test the HPA:

bash
# Run the load generator
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh

# Inside the container, generate load against the Service from Example 1
while true; do wget -q -O- http://web-service; done

Watch HPA in Action

In another terminal, watch the HPA:

bash
kubectl get hpa web-app-hpa --watch

The output shows the scaling:

bash
NAME          REFERENCE            TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%    2         10        2          1m
web-app-hpa   Deployment/web-app   85%/70%    2         10        2          2m
web-app-hpa   Deployment/web-app   85%/70%    2         10        3          2m
web-app-hpa   Deployment/web-app   78%/70%    2         10        3          3m
web-app-hpa   Deployment/web-app   72%/70%    2         10        4          3m
web-app-hpa   Deployment/web-app   68%/70%    2         10        4          4m

Watch the Pods scale:

bash
kubectl get pods -l app=web --watch

Stop Load

Stop the load generator (Ctrl+C) and watch the HPA scale down:

bash
NAME          REFERENCE            TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   68%/70%    2         10        4          5m
web-app-hpa   Deployment/web-app   35%/70%    2         10        4          6m
web-app-hpa   Deployment/web-app   35%/70%    2         10        4          11m
web-app-hpa   Deployment/web-app   35%/70%    2         10        3          11m
web-app-hpa   Deployment/web-app   35%/70%    2         10        2          12m

Notice the 5-minute stabilization window before the scale-down.
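The delay comes from how the stabilization window works: for scale-down, the HPA acts on the highest replica recommendation seen during the window, so one low reading is not enough. The sketch below models this with a hypothetical history of (timestamp, recommendation) pairs. It is a simplification, not the controller's actual code.

```python
def stabilized_scale_down(recommendations, now, window_seconds=300):
    """Pick the highest recommendation seen within the window ending at `now`.

    recommendations: list of (timestamp_seconds, desired_replicas) pairs.
    """
    recent = [r for t, r in recommendations if now - t <= window_seconds]
    return max(recent)


history = [(0, 4), (60, 2), (120, 2), (180, 2), (240, 2), (300, 2)]
print(stabilized_scale_down(history, now=300))  # 4: the old peak is still in the window
print(stabilized_scale_down(history + [(360, 2)], now=360))  # 2: the peak has aged out
```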

HPA Scaling Policy

Default Behavior

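Without an explicit behavior configuration:

Scale Up:

  • Immediate (no stabilization)
  • Max 100% increase or 4 Pods per 15 seconds, whichever is greater

Scale Down:

  • 5-minute stabilization window
  • Up to 100% of current replicas may be removed per 15 seconds, so replicas can drop straight to minReplicas once the window has passed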

Conservative Scaling

Slow, steady scaling:

yml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 600
        policies:
            - type: Pods
              value: 1
              periodSeconds: 120
    scaleUp:
        stabilizationWindowSeconds: 60
        policies:
            - type: Pods
              value: 2
              periodSeconds: 60

Aggressive Scaling

Fast response to load:

yml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 60
        policies:
            - type: Percent
              value: 50
              periodSeconds: 30
    scaleUp:
        stabilizationWindowSeconds: 0
        policies:
            - type: Percent
              value: 200
              periodSeconds: 15

Disable Scale Down

Only scale up, never down:

yml
behavior:
    scaleDown:
        selectPolicy: Disabled

Common Mistakes and Pitfalls

Mistake 1: No Resource Requests

Problem: HPA cannot calculate utilization without requests.

yml
# Bad: No request
containers:
    - name: app
      image: myapp:latest
      # No resources specified

Solution: Always define requests:

yml
# Good: Request defined
containers:
    - name: app
      image: myapp:latest
      resources:
          requests:
              cpu: "100m"
              memory: "128Mi"

Mistake 2: Metrics Server Not Installed

Problem: HPA shows "unknown" for the metric.

bash
kubectl get hpa
# TARGETS: <unknown>/70%

Solution: Install Metrics Server:

bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Mistake 3: Target Too Low

Problem: The HPA reacts to even small load changes, which over-provisions and can cause constant scaling up and down (flapping).

yml
# Bad: 30% is too low
target:
    type: Utilization
    averageUtilization: 30

Solution: Use a reasonable target (60-80%):

yml
# Good: 70% leaves headroom
target:
    type: Utilization
    averageUtilization: 70

Mistake 4: Min = Max Replicas

Problem: HPA cannot scale.

yml
# Bad: No scaling range
minReplicas: 5
maxReplicas: 5

Solution: Provide a scaling range:

yml
# Good: Can scale between 2 and 10
minReplicas: 2
maxReplicas: 10
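Whatever the formula computes, the result is bounded by minReplicas and maxReplicas, which is why an equal min and max leaves nothing to adjust. A minimal sketch of that clamping step (the function name is illustrative):

```python
def clamp_replicas(desired, min_replicas, max_replicas):
    """Bound the computed replica count to the HPA's configured range."""
    return max(min_replicas, min(desired, max_replicas))


print(clamp_replicas(14, 5, 5))   # 5: min == max leaves no room to scale
print(clamp_replicas(14, 2, 10))  # 10
print(clamp_replicas(1, 2, 10))   # 2
```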

Mistake 5: Conflicting Manual Scaling

Problem: Manually scaling the Deployment conflicts with the HPA.

bash
# Bad: Manual scaling conflicts with the HPA
kubectl scale deployment web-app --replicas=20

Solution: Let the HPA manage replicas, or delete the HPA:

bash
# Either delete the HPA
kubectl delete hpa web-app-hpa

# Or let the HPA manage scaling
# (do not scale manually)

Best Practice

Set Appropriate Target

CPU target:

  • Web app: 60-70%
  • API: 70-80%
  • Batch job: 80-90%

Memory target:

  • Generally: 75-85%
  • Leave headroom for spikes

Define Min/Max Carefully

yml
# Consider:
# - Minimum for availability (min >= 2)
# - Maximum for cost control
# - Expected traffic pattern
minReplicas: 3    # High availability
maxReplicas: 20   # Cost limit

Use a Stabilization Window

Prevent rapid scaling oscillation:

yml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 300  # 5 minutes
    scaleUp:
        stabilizationWindowSeconds: 0    # Immediate

Monitor HPA Decision

Track HPA behavior:

bash
# Watch the HPA
kubectl get hpa --watch

# View events
kubectl describe hpa web-app-hpa

# Check metrics
kubectl top pods

Combine with Resource Quota

Prevent runaway scaling:

yml
apiVersion: v1
kind: ResourceQuota
metadata:
    name: compute-quota
spec:
    hard:
        requests.cpu: "50"
        requests.memory: "100Gi"
        pods: "100"
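To get a feel for which quota dimension becomes the ceiling, the sketch below computes how many identical Pods fit under the quota. It assumes the namespace runs nothing else, and the function name is hypothetical.

```python
def max_pods_under_quota(quota_cpu_m, quota_mem_mi, quota_pods, pod_cpu_m, pod_mem_mi):
    """How many identical Pods fit before any quota dimension is exhausted."""
    return min(quota_cpu_m // pod_cpu_m, quota_mem_mi // pod_mem_mi, quota_pods)


# Quota above: 50 CPU = 50000m, 100Gi = 102400Mi, 100 Pods.
# Pod requests from the web-app Deployment: 100m CPU, 128Mi memory.
print(max_pods_under_quota(50000, 102400, 100, 100, 128))  # 100: the Pod count is the limit
```

For small Pods like these, the `pods` entry is what stops the HPA. For larger Pods, the CPU or memory request totals would be reached first.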

Test Scaling Behavior

Always test before going to production:

  1. Deploy with the HPA
  2. Generate realistic load
  3. Observe the scaling behavior
  4. Adjust targets and policies
  5. Repeat until satisfied

Document Scaling Decision

Add an annotation:

yml
metadata:
    annotations:
        hpa.note: "Target 70% based on load testing, scale 2-20 replica"

Troubleshooting HPA

HPA Show Unknown Metric

bash
kubectl get hpa
# TARGETS: <unknown>/70%

Causes:

  1. Metrics Server is not installed
  2. Metrics Server is not ready
  3. No resource requests are defined
  4. Pods are not ready

Solution:

bash
# Check Metrics Server
kubectl get deployment metrics-server -n kube-system

# Check the Pod resource requests
kubectl get deployment web-app -o yaml | grep -A 5 resources

# Check Pod status
kubectl get pods -l app=web

HPA Not Scaling

bash
kubectl describe hpa web-app-hpa
# Events show: "unable to get metrics"

Solution:

bash
# Verify that the target exists
kubectl get deployment web-app

# Check the HPA configuration
kubectl get hpa web-app-hpa -o yaml

# View HPA events
kubectl describe hpa web-app-hpa

Rapid Scaling (Flapping)

The HPA scales up and down repeatedly.

Solution:

  1. Increase the stabilization window
  2. Adjust the target utilization
  3. Review the scaling policy
  4. Check application behavior
yml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 600  # Increased from 300

HPA Reaches Max Replicas

bash
kubectl get hpa
# REPLICAS: 10 (at maxReplicas)
# TARGETS: 95%/70% (still high)

Solution:

  1. Increase maxReplicas
  2. Optimize application performance
  3. Lower target utilization
  4. Add more node capacity
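Before raising maxReplicas, it helps to estimate how many replicas the formula would ask for under the observed load. A minimal sketch using the numbers from the output above (the function name is illustrative):

```python
import math


def replica_shortfall(current_replicas, current_value, target_value, max_replicas):
    """How many replicas beyond maxReplicas the HPA formula would ask for."""
    wanted = math.ceil(current_replicas * current_value / target_value)
    return max(0, wanted - max_replicas)


# 10 replicas at 95% CPU against a 70% target, capped at maxReplicas 10.
print(replica_shortfall(10, 95, 70, 10))  # 4, so the formula wants 14 replicas
```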

HPA vs VPA vs Cluster Autoscaler

Horizontal Pod Autoscaler (HPA)

  • Scales the number of Pods
  • Based on metrics
  • Fast response
  • Best suited to stateless workloads

Vertical Pod Autoscaler (VPA)

  • Adjusts Pod resource requests/limits
  • Based on historical usage
  • Requires a Pod restart
  • For right-sizing resources

Cluster Autoscaler

  • Adds/removes nodes
  • Based on pending Pods
  • Slower response
  • For cluster capacity

Use them together:

  • HPA: Scale Pods for traffic
  • VPA: Right-size Pod resources
  • Cluster Autoscaler: Adjust cluster size

Inspecting HPA Details

Get HPA

bash
kubectl get hpa
kubectl get hpa -o wide
kubectl get hpa --all-namespaces

Describe HPA

bash
kubectl describe hpa web-app-hpa

View HPA YAML

bash
kubectl get hpa web-app-hpa -o yaml

Watch HPA

bash
kubectl get hpa web-app-hpa --watch

Check Current Metrics

bash
kubectl top pods -l app=web
kubectl top nodes

Deleting an HPA

bash
kubectl delete hpa web-app-hpa

The Deployment keeps running with its current replica count.

Conclusion

In episode 30 we covered the Horizontal Pod Autoscaler (HPA) in Kubernetes in depth. We learned how HPA automatically scales Pods based on metrics, how to configure autoscaling policies, and best practices for production use.

Key takeaways:

  • HPA automatically scales Pod replicas based on metrics
  • Metrics Server is required for resource metrics
  • Resource requests are required for percentage-based scaling
  • Scale based on CPU, memory, or custom metrics
  • minReplicas/maxReplicas define the scaling boundaries
  • The target utilization determines when to scale
  • The stabilization window prevents rapid oscillation
  • Scale-up is typically immediate, scale-down is delayed
  • Behavior policies control the scaling rate and timing
  • With multiple metrics, the highest replica requirement wins
  • Default scale-down: 5-minute stabilization
  • Default scale-up: immediate response
  • Test thoroughly before production deployment
  • Monitor HPA decisions and adjust as needed
  • Combine with Resource Quota for safety

The Horizontal Pod Autoscaler is essential for running dynamic, cost-efficient workloads on Kubernetes. By understanding HPA configuration and behavior, you can make sure your applications scale automatically to meet demand while optimizing resource utilization and cost.

So, is the Horizontal Pod Autoscaler in Kubernetes clearer now? Keep up the learning spirit and stay tuned for the next episode!
