Learning Kubernetes - Episode 30 - Introduction and Explanation of Horizontal Pod Autoscaler

In this episode, we'll discuss Kubernetes Horizontal Pod Autoscaler (HPA) for automatic scaling. We'll learn how HPA works, how to configure autoscaling based on CPU, memory, and custom metrics, and best practices for production autoscaling.

Arman Dwi Pangestu
April 5, 2026
9 min read

Introduction

Note

If you want to read the previous episode, you can click the Episode 29 thumbnail below

Episode 29

In the previous episode, we learned about computational resources - CPU and memory requests and limits. In episode 30, we'll discuss Horizontal Pod Autoscaler (HPA), which automatically scales the number of Pods based on observed metrics like CPU utilization, memory usage, or custom metrics.

Note: Here I'll be using a Kubernetes Cluster installed through K3s.

Manual scaling works for predictable workloads, but real-world applications face variable traffic patterns. HPA automatically adjusts replica counts to match demand, ensuring optimal resource utilization and application performance without manual intervention.

What Is Horizontal Pod Autoscaler?

Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.

Think of HPA like an automatic thermostat - it monitors temperature (metrics) and adjusts heating/cooling (Pod count) to maintain your desired comfort level (target utilization). You set the target, HPA handles the adjustments.

Key characteristics of HPA:

  • Automatic scaling - Adjusts replicas without manual intervention
  • Metric-based - Scales based on CPU, memory, or custom metrics
  • Configurable targets - Set desired utilization thresholds
  • Min/max bounds - Define scaling limits
  • Cool-down periods - Prevents rapid scaling oscillations
  • Multiple metrics - Scale based on multiple conditions
  • Works with Deployments - Compatible with standard workloads

Why Use HPA?

HPA solves critical scaling challenges:

  • Handle traffic spikes - Automatically add Pods during high load
  • Cost optimization - Scale down during low traffic periods
  • Maintain performance - Keep response times consistent
  • Reduce manual work - No need to watch metrics constantly
  • Predictable behavior - Consistent scaling logic
  • Resource efficiency - Right-size capacity automatically
  • 24/7 operation - Scales even when you're sleeping
  • Business continuity - Handles unexpected load gracefully

Without HPA, you either over-provision (wasting money) or under-provision (risking outages).

How HPA Works

HPA Control Loop

HPA runs a control loop every 15 seconds (default):

  1. Query metrics - Fetch current resource usage from Metrics Server
  2. Calculate desired replicas - Compare current vs target utilization
  3. Scale if needed - Adjust replica count if threshold exceeded
  4. Wait for stabilization - Cool-down before next scaling decision
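The four steps above can be sketched as a loop. This is an illustrative Python sketch, not the actual controller code; `get_metrics`, `compute_desired`, and `update_replicas` are hypothetical stand-ins for the Metrics Server query, the replica calculation, and the scale-subresource update:

```python
import time

def run_hpa_loop(get_metrics, compute_desired, update_replicas,
                 interval_seconds=15, iterations=1):
    """Simplified sketch of the HPA reconciliation loop (default period: 15s)."""
    for i in range(iterations):
        current_replicas, utilization = get_metrics()              # 1. query metrics
        desired = compute_desired(current_replicas, utilization)   # 2. calculate
        if desired != current_replicas:                            # 3. scale if needed
            update_replicas(desired)
        if i < iterations - 1:
            time.sleep(interval_seconds)                           # 4. wait for next cycle

# Example wiring with stub functions (target CPU: 50%):
scaled = []
run_hpa_loop(lambda: (3, 80),                      # 3 replicas at 80% CPU
             lambda cur, util: -(-cur * util // 50),  # ceil(cur * util / 50)
             scaled.append)
print(scaled)  # [5]
```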

Scaling Algorithm

HPA uses this formula to calculate desired replicas:

plaintext
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]

Example:

  • Current replicas: 3
  • Current CPU usage: 80%
  • Target CPU usage: 50%
plaintext
desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5

HPA scales from 3 to 5 replicas.
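The same calculation, including the clamping to minReplicas/maxReplicas and the roughly 10% default tolerance band that suppresses tiny adjustments, can be sketched in Python. This illustrates the documented formula, not the controller's actual code:

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation."""
    ratio = current_metric / target_metric
    # Within the tolerance band (default ~10%), HPA skips scaling.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 80, 50, min_replicas=2, max_replicas=10))  # 5
```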

Metrics Server Requirement

HPA requires Metrics Server to collect resource metrics:

bash
# Check if Metrics Server is installed
kubectl get deployment metrics-server -n kube-system

If not installed:

bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Important

HPA cannot function without Metrics Server. Install it before creating HPA resources.

Creating HPA

Prerequisites

Before creating HPA, ensure:

  1. Metrics Server installed - For resource metrics
  2. Resource requests defined - HPA needs baseline for percentage calculations
  3. Deployment exists - HPA targets existing workloads

Basic HPA with CPU

Create a Deployment first:

web-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: web-app
spec:
    replicas: 2
    selector:
        matchLabels:
            app: web
    template:
        metadata:
            labels:
                app: web
        spec:
            containers:
                - name: nginx
                  image: nginx:1.25
                  ports:
                      - containerPort: 80
                  resources:
                      requests:
                          cpu: "100m"      # Required for HPA
                          memory: "128Mi"
                      limits:
                          cpu: "200m"
                          memory: "256Mi"

Create HPA:

web-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: web-app-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70

Apply:

bash
kubectl apply -f web-deployment.yml
kubectl apply -f web-hpa.yml

Behavior:

  • Maintains 2-10 replicas
  • Scales up when CPU > 70%
  • Scales down when CPU drops back below 70% (after the stabilization window)

HPA with kubectl

Create HPA using kubectl:

bash
kubectl autoscale deployment web-app --cpu-percent=70 --min=2 --max=10

This creates an HPA equivalent to the YAML above.

Checking HPA Status

bash
kubectl get hpa

Output:

plaintext
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%   2         10        2          5m

Columns:

  • TARGETS: Current/Target utilization
  • REPLICAS: Current number of Pods
  • MINPODS/MAXPODS: Scaling boundaries

Detailed view:

bash
kubectl describe hpa web-app-hpa

HPA with Memory

Scale based on memory utilization:

memory-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: memory-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 80

Behavior:

  • Scales when memory usage > 80%
  • Based on memory requests in Pod spec

HPA with Multiple Metrics

Scale based on CPU or memory; the metric that yields the most replicas wins:

multi-metric-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: multi-metric-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 80

Behavior:

  • Scales up if CPU > 70% OR memory > 80%
  • Uses the metric requiring most replicas
  • Scales down when both metrics below target
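With multiple metrics, each metric proposes its own replica count and the highest proposal wins. A sketch, under the same simplifications as the single-metric formula (illustrative, not the controller's code):

```python
import math

def desired_for_metric(current_replicas, current, target):
    """Single-metric proposal: ceil(currentReplicas * current / target)."""
    return math.ceil(current_replicas * current / target)

def desired_replicas_multi(current_replicas, metrics, min_replicas, max_replicas):
    """metrics: list of (current_value, target_value) pairs.
    Each metric proposes a replica count; the highest proposal wins."""
    proposals = [desired_for_metric(current_replicas, c, t) for c, t in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))

# CPU at 90% of a 70% target, memory at 60% of an 80% target, 4 replicas:
print(desired_replicas_multi(4, [(90, 70), (60, 80)], 2, 10))  # 6 (CPU dominates)
```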

HPA with Custom Metrics

Scale based on application-specific metrics (requires custom metrics adapter):

custom-metric-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: custom-metric-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Pods
          pods:
              metric:
                  name: http_requests_per_second
              target:
                  type: AverageValue
                  averageValue: "1000"

Use cases:

  • HTTP requests per second
  • Queue length
  • Database connections
  • Custom business metrics

Note

Custom metrics require additional setup like Prometheus Adapter or custom metrics API server.

HPA Behavior Configuration

Control scaling behavior with v2 API:

behavior-hpa.yml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: behavior-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70
    behavior:
        scaleDown:
            stabilizationWindowSeconds: 300  # Wait 5 min before scale down
            policies:
                - type: Percent
                  value: 50              # Max 50% of current replicas
                  periodSeconds: 60      # Per minute
                - type: Pods
                  value: 2               # Max 2 Pods
                  periodSeconds: 60      # Per minute
            selectPolicy: Min            # Use most conservative policy
        scaleUp:
            stabilizationWindowSeconds: 0    # Scale up immediately
            policies:
                - type: Percent
                  value: 100             # Max 100% of current replicas
                  periodSeconds: 15      # Per 15 seconds
                - type: Pods
                  value: 4               # Max 4 Pods
                  periodSeconds: 15      # Per 15 seconds
            selectPolicy: Max            # Use most aggressive policy

Scale Down Behavior:

  • Wait 5 minutes before scaling down (stabilization)
  • Remove max 50% of Pods OR 2 Pods per minute
  • Use most conservative policy (Min)

Scale Up Behavior:

  • Scale up immediately (no stabilization)
  • Add max 100% of Pods OR 4 Pods per 15 seconds
  • Use most aggressive policy (Max)
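The policy arithmetic can be sketched as follows: each scale-up policy caps how many replicas may be added within its period, and selectPolicy chooses among the caps. A simplified illustration that assumes a single elapsed period (the real HPA also tracks change history across periods):

```python
import math

def max_scale_up(current_replicas, policies, select_policy="Max"):
    """policies: list of ('Percent' | 'Pods', value) pairs.
    Returns the replica ceiling the scale-up policies allow for one period."""
    limits = []
    for kind, value in policies:
        if kind == "Percent":
            limits.append(current_replicas + math.ceil(current_replicas * value / 100))
        else:  # 'Pods'
            limits.append(current_replicas + value)
    # selectPolicy Max permits the most change; Min the least.
    return max(limits) if select_policy == "Max" else min(limits)

# With 6 replicas and the scaleUp policies above (100% or 4 Pods per 15s):
print(max_scale_up(6, [("Percent", 100), ("Pods", 4)]))  # 12
```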

Practical Examples

Example 1: Web Application

web-app-complete.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: web-app
spec:
    replicas: 3
    selector:
        matchLabels:
            app: web
    template:
        metadata:
            labels:
                app: web
        spec:
            containers:
                - name: nginx
                  image: nginx:1.25
                  ports:
                      - containerPort: 80
                  resources:
                      requests:
                          cpu: "100m"
                          memory: "128Mi"
                      limits:
                          cpu: "200m"
                          memory: "256Mi"
---
apiVersion: v1
kind: Service
metadata:
    name: web-service
spec:
    selector:
        app: web
    ports:
        - port: 80
          targetPort: 80
    type: LoadBalancer
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: web-app-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    minReplicas: 3
    maxReplicas: 20
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 70

Example 2: API Service with Memory Scaling

api-service-hpa.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: api-service
spec:
    replicas: 2
    selector:
        matchLabels:
            app: api
    template:
        metadata:
            labels:
                app: api
        spec:
            containers:
                - name: api
                  image: myapi:latest
                  ports:
                      - containerPort: 8080
                  resources:
                      requests:
                          cpu: "250m"
                          memory: "256Mi"
                      limits:
                          cpu: "500m"
                          memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: api-service-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-service
    minReplicas: 2
    maxReplicas: 15
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 60
        - type: Resource
          resource:
              name: memory
              target:
                  type: Utilization
                  averageUtilization: 75

Example 3: Background Worker

worker-hpa.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: worker
spec:
    replicas: 5
    selector:
        matchLabels:
            app: worker
    template:
        metadata:
            labels:
                app: worker
        spec:
            containers:
                - name: worker
                  image: myworker:latest
                  resources:
                      requests:
                          cpu: "200m"
                          memory: "256Mi"
                      limits:
                          cpu: "400m"
                          memory: "512Mi"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: worker-hpa
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: worker
    minReplicas: 5
    maxReplicas: 50
    metrics:
        - type: Resource
          resource:
              name: cpu
              target:
                  type: Utilization
                  averageUtilization: 80
    behavior:
        scaleDown:
            stabilizationWindowSeconds: 600  # 10 minutes
            policies:
                - type: Pods
                  value: 5
                  periodSeconds: 60
        scaleUp:
            stabilizationWindowSeconds: 0
            policies:
                - type: Pods
                  value: 10
                  periodSeconds: 30

Testing HPA

Generate Load

Create a load generator to test HPA:

bash
# Run load generator
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh
 
# Inside the container, generate load
while true; do wget -q -O- http://web-service; done

Watch HPA in Action

In another terminal, watch HPA:

bash
kubectl get hpa web-app-hpa --watch

Output shows scaling:

plaintext
NAME          REFERENCE            TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   45%/70%    2         10        2          1m
web-app-hpa   Deployment/web-app   85%/70%    2         10        2          2m
web-app-hpa   Deployment/web-app   85%/70%    2         10        3          2m
web-app-hpa   Deployment/web-app   78%/70%    2         10        3          3m
web-app-hpa   Deployment/web-app   72%/70%    2         10        4          3m
web-app-hpa   Deployment/web-app   68%/70%    2         10        4          4m

Watch Pods scaling:

bash
kubectl get pods -l app=web --watch

Stop Load

Stop the load generator (Ctrl+C), watch HPA scale down:

plaintext
NAME          REFERENCE            TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
web-app-hpa   Deployment/web-app   68%/70%    2         10        4          5m
web-app-hpa   Deployment/web-app   35%/70%    2         10        4          6m
web-app-hpa   Deployment/web-app   35%/70%    2         10        4          11m
web-app-hpa   Deployment/web-app   35%/70%    2         10        3          11m
web-app-hpa   Deployment/web-app   35%/70%    2         10        2          12m

Note the 5-minute stabilization window before scale down.

HPA Scaling Policies

Default Behavior

Without explicit behavior configuration:

Scale Up:

  • Immediate (no stabilization)
  • Max 100% increase or 4 Pods per 15 seconds

Scale Down:

  • 5-minute stabilization window
  • Max 100% decrease per 15 seconds (in practice, the stabilization window is the limiter)

Conservative Scaling

Slow, steady scaling:

yaml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 600
        policies:
            - type: Pods
              value: 1
              periodSeconds: 120
    scaleUp:
        stabilizationWindowSeconds: 60
        policies:
            - type: Pods
              value: 2
              periodSeconds: 60

Aggressive Scaling

Fast response to load:

yaml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 60
        policies:
            - type: Percent
              value: 50
              periodSeconds: 30
    scaleUp:
        stabilizationWindowSeconds: 0
        policies:
            - type: Percent
              value: 200
              periodSeconds: 15

Disable Scale Down

Only scale up, never down:

yaml
behavior:
    scaleDown:
        selectPolicy: Disabled

Common Mistakes and Pitfalls

Mistake 1: No Resource Requests

Problem: HPA cannot calculate utilization without requests.

yaml
# Bad: No requests
containers:
    - name: app
      image: myapp:latest
      # No resources specified

Solution: Always define requests:

yaml
# Good: Requests defined
containers:
    - name: app
      image: myapp:latest
      resources:
          requests:
              cpu: "100m"
              memory: "128Mi"

Mistake 2: Metrics Server Not Installed

Problem: HPA shows "unknown" for metrics.

bash
kubectl get hpa
# TARGETS: <unknown>/70%

Solution: Install Metrics Server:

bash
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Mistake 3: Target Too Low

Problem: Constant scaling up and down (flapping).

yaml
# Bad: 30% is too low
target:
    type: Utilization
    averageUtilization: 30

Solution: Use reasonable targets (60-80%):

yaml
# Good: 70% allows headroom
target:
    type: Utilization
    averageUtilization: 70

Mistake 4: Min = Max Replicas

Problem: HPA cannot scale.

yaml
# Bad: No scaling range
minReplicas: 5
maxReplicas: 5

Solution: Provide scaling range:

yaml
# Good: Can scale 2-10
minReplicas: 2
maxReplicas: 10

Mistake 5: Conflicting Manual Scaling

Problem: Manually scaling Deployment conflicts with HPA.

bash
# Bad: Manual scale conflicts with HPA
kubectl scale deployment web-app --replicas=20

Solution: Let HPA manage replicas, or delete HPA:

bash
# Either delete HPA
kubectl delete hpa web-app-hpa
 
# Or let HPA manage scaling
# (Don't manually scale)

Best Practices

Set Appropriate Targets

CPU targets:

  • Web apps: 60-70%
  • APIs: 70-80%
  • Batch jobs: 80-90%

Memory targets:

  • Generally: 75-85%
  • Leave headroom for spikes

Define Min/Max Carefully

yaml
# Consider:
# - Minimum for availability (min >= 2)
# - Maximum for cost control
# - Expected traffic patterns
minReplicas: 3    # High availability
maxReplicas: 20   # Cost limit

Use Stabilization Windows

Prevent rapid scaling oscillations:

yaml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 300  # 5 minutes
    scaleUp:
        stabilizationWindowSeconds: 0    # Immediate
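Behind the scenes, the scale-down stabilization window keeps recent replica recommendations and uses the highest one, so a brief dip in load does not immediately remove Pods. A minimal sketch of that rolling maximum:

```python
def stabilized_scale_down(recent_recommendations):
    """Scale-down decision = highest desired replica count recorded
    within the stabilization window (here, the whole list)."""
    return max(recent_recommendations)

# Load dipped, but one sample inside the window still wanted 6 replicas,
# so HPA holds at 6 until that sample ages out of the window:
print(stabilized_scale_down([6, 4, 3, 3]))  # 6
```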

Monitor HPA Decisions

Track HPA behavior:

bash
# Watch HPA
kubectl get hpa --watch
 
# View events
kubectl describe hpa web-app-hpa
 
# Check metrics
kubectl top pods

Combine with Resource Quotas

Prevent runaway scaling:

yaml
apiVersion: v1
kind: ResourceQuota
metadata:
    name: compute-quota
spec:
    hard:
        requests.cpu: "50"
        requests.memory: "100Gi"
        pods: "100"

Test Scaling Behavior

Always test before production:

  1. Deploy with HPA
  2. Generate realistic load
  3. Observe scaling behavior
  4. Adjust targets and policies
  5. Repeat until satisfied

Document Scaling Decisions

Add annotations:

yaml
metadata:
    annotations:
        hpa.note: "Target 70% based on load testing, scales 2-20 replicas"

Troubleshooting HPA

HPA Shows Unknown Metrics

bash
kubectl get hpa
# TARGETS: <unknown>/70%

Causes:

  1. Metrics Server not installed
  2. Metrics Server not ready
  3. No resource requests defined
  4. Pods not ready

Solutions:

bash
# Check Metrics Server
kubectl get deployment metrics-server -n kube-system
 
# Check Pod requests
kubectl get deployment web-app -o yaml | grep -A 5 resources
 
# Check Pod status
kubectl get pods -l app=web

HPA Not Scaling

bash
kubectl describe hpa web-app-hpa
# Events show: "unable to get metrics"

Solutions:

bash
# Verify target exists
kubectl get deployment web-app
 
# Check HPA configuration
kubectl get hpa web-app-hpa -o yaml
 
# View HPA events
kubectl describe hpa web-app-hpa

Rapid Scaling (Flapping)

HPA scales up and down repeatedly.

Solutions:

  1. Increase stabilization window
  2. Adjust target utilization
  3. Review scaling policies
  4. Check application behavior

yaml
behavior:
    scaleDown:
        stabilizationWindowSeconds: 600  # Increase from 300

HPA Reaches Max Replicas

bash
kubectl get hpa
# REPLICAS: 10 (at maxReplicas)
# TARGETS: 95%/70% (still high)

Solutions:

  1. Increase maxReplicas
  2. Optimize application performance
  3. Lower target utilization
  4. Add more node capacity

HPA vs VPA vs Cluster Autoscaler

Horizontal Pod Autoscaler (HPA)

  • Scales number of Pods
  • Based on metrics
  • Fast response
  • For stateless workloads

Vertical Pod Autoscaler (VPA)

  • Adjusts Pod resource requests/limits
  • Based on historical usage
  • Requires Pod restart
  • For right-sizing resources

Cluster Autoscaler

  • Adds/removes nodes
  • Based on pending Pods
  • Slower response
  • For cluster capacity

Use together:

  • HPA: Scale Pods for traffic
  • VPA: Right-size Pod resources
  • Cluster Autoscaler: Adjust cluster size

Viewing HPA Details

Get HPA

bash
kubectl get hpa
kubectl get hpa -o wide
kubectl get hpa --all-namespaces

Describe HPA

bash
kubectl describe hpa web-app-hpa

View HPA YAML

bash
kubectl get hpa web-app-hpa -o yaml

Watch HPA

bash
kubectl get hpa web-app-hpa --watch

Check Current Metrics

bash
kubectl top pods -l app=web
kubectl top nodes

Deleting HPA

bash
kubectl delete hpa web-app-hpa

Deployment continues running with current replica count.

Conclusion

In episode 30, we've explored Horizontal Pod Autoscaler (HPA) in Kubernetes in depth. We've learned how HPA automatically scales Pods based on metrics, how to configure autoscaling policies, and best practices for production use.

Key takeaways:

  • HPA automatically scales Pod replicas based on metrics
  • Requires Metrics Server for resource metrics
  • Resource requests required for percentage-based scaling
  • Scales based on CPU, memory, or custom metrics
  • minReplicas/maxReplicas define scaling boundaries
  • Target utilization determines when to scale
  • Stabilization windows prevent rapid oscillations
  • Scale up typically immediate, scale down delayed
  • Behavior policies control scaling rate and timing
  • Multiple metrics use highest replica requirement
  • Default scale down: 5-minute stabilization
  • Default scale up: immediate response
  • Test thoroughly before production deployment
  • Monitor HPA decisions and adjust as needed
  • Combine with Resource Quotas for safety

Horizontal Pod Autoscaler is essential for running dynamic, cost-efficient workloads in Kubernetes. By understanding HPA configuration and behavior, you can ensure your applications automatically scale to meet demand while optimizing resource utilization and costs.

Are you getting a clearer understanding of Horizontal Pod Autoscaler in Kubernetes? Keep your learning momentum going and look forward to the next episode!

Note

If you want to continue to the next episode, you can click the Episode 31 thumbnail below

Episode 31
