Learning Kubernetes - Introducing and Explaining the Horizontal Pod Autoscaler

#Introduction

In the previous episode we learned about computational resources: CPU and memory requests and limits. In this episode, episode 30, we will cover the Horizontal Pod Autoscaler (HPA), which automatically scales the number of Pods based on observed metrics such as CPU utilization, memory usage, or custom metrics.

Note: I will be using a Kubernetes cluster installed with K3s.

Manual scaling works for predictable workloads, but real-world applications face variable traffic patterns. HPA automatically adjusts the replica count to match demand, keeping resource utilization and application performance in balance without manual intervention.

#What Is the Horizontal Pod Autoscaler?

The Horizontal Pod Autoscaler (HPA) automatically scales the number of Pods in a Deployment, ReplicaSet, or StatefulSet based on observed metrics.

Think of HPA as an automatic thermostat: it monitors the temperature (the metric) and adjusts the heating or cooling (the Pod count) to maintain your desired comfort level (the target utilization). You set the target, and HPA handles the adjustment.

Key characteristics of HPA:

Automatic scaling - Adjusts replicas without manual intervention
Metric-based - Scales on CPU, memory, or custom metrics
Configurable target - Set the desired utilization threshold
Min/max bounds - Define the scaling limits
Cool-down period - Prevents rapid scaling oscillation
Multiple metrics - Scales on several conditions at once
Works with Deployments - Compatible with standard workloads

#Why Use HPA?

HPA solves critical scaling challenges:

Handle traffic spikes - Automatically adds Pods during high load
Cost optimization - Scales down during low-traffic periods
Maintain performance - Keeps response times consistent
Reduce manual work - No need to watch metrics constantly
Predictable behavior - Consistent scaling logic
Resource efficiency - Right-sizes capacity automatically
24/7 operation - Scales even while you are sleeping
Business continuity - Handles unexpected load gracefully

Without HPA, you either over-provision (wasting money) or under-provision (risking outages).

#How HPA Works

#HPA Control Loop

HPA runs a control loop every 15 seconds (the default):

Query metrics - Fetch current resource usage from Metrics Server
Calculate desired replicas - Compare current utilization with the target
Scale if needed - Adjust the replica count when the result differs from the current count
Wait for stabilization - Cool down before the next scaling decision

#Scaling Algorithm

HPA uses this formula to calculate the desired replica count:

```plaintext
desiredReplicas = ceil[currentReplicas * (currentMetricValue / targetMetricValue)]
```

Example:

Current replicas: 3
Current CPU usage: 80%
Target CPU usage: 50%

```plaintext
desiredReplicas = ceil[3 * (80 / 50)] = ceil[4.8] = 5
```

HPA scales from 3 to 5 replicas.

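#Metrics Server Requirement

HPA requires Metrics Server to collect resource metrics.

If it is not installed yet, the upstream manifest can be applied with kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml. K3s, which this series uses, ships Metrics Server by default, so no extra step should be needed there.

Important: HPA cannot scale on CPU or memory without Metrics Server. Install it before creating any HPA resource.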

#Creating an HPA

#Prerequisites

Before creating an HPA, make sure that:

Metrics Server is installed - It provides the resource metrics
Resource requests are defined - HPA needs them as the baseline for percentage calculations
The Deployment exists - HPA targets an existing workload

#Basic HPA with CPU

Create the Deployment first:
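
The manifest itself is not reproduced here. As a minimal sketch, assuming a hypothetical Deployment named web-app that runs the nginx image (the name, labels, image, and resource values are illustrative assumptions, not taken from the original), it might look like this:

```yaml
# Hypothetical Deployment used as the HPA target in this sketch.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
          resources:
            requests:        # HPA computes utilization as a percentage of these requests
              cpu: 100m
              memory: 128Mi
            limits:
              cpu: 500m
              memory: 256Mi
```

The important part for autoscaling is the requests block: without it, a utilization-based HPA has nothing to compute a percentage against.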

Create the HPA:
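
A sketch of an HPA matching the behavior described in this section (2 to 10 replicas, 70% CPU target) could look like the following. The object name web-app-hpa and the target name web-app are hypothetical; substitute the name of your own Deployment.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # percent of the containers' CPU requests
```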

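Apply both manifests with kubectl apply -f, passing each file name.

Behavior:

Maintains between 2 and 10 replicas
Scales up when average CPU utilization rises above 70% of the requested CPU
Scales down when it falls below 70% (deviations within the default 10% tolerance do not trigger scaling)

#HPA with kubectl

The same HPA can be created imperatively with kubectl autoscale deployment <name> --cpu-percent=70 --min=2 --max=10, where <name> is the target Deployment.

This creates the same HPA as the YAML above.

#Checking HPA Status

List HPAs with kubectl get hpa. The columns to watch are:

TARGETS: Current/target utilization
REPLICAS: Current number of Pods
MINPODS/MAXPODS: Scaling boundaries

For a detailed view, including conditions and recent scaling events, use kubectl describe hpa <name>.

#HPA with Memory

Scale based on memory utilization: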
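
As a hedged sketch, a memory-based HPA with the 80% target described in this section might be written as follows. The names web-app-memory-hpa and web-app, and the replica bounds, are assumptions for illustration.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-memory-hpa   # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment to scale
  minReplicas: 2             # assumed bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # percent of the containers' memory requests
```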

Behavior:

Scales up when average memory usage exceeds 80%
The percentage is relative to the memory request in the Pod spec

#HPA with Multiple Metrics

Scale on CPU or memory, whichever calls for more replicas:
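
A sketch of how two resource metrics can be listed in a single HPA is shown below, using the 70% CPU and 80% memory targets from this section. HPA evaluates each entry separately and applies the largest resulting replica count. Object names and replica bounds are hypothetical.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-multi-hpa    # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment to scale
  minReplicas: 2             # assumed bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```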

Behavior:

Scales up if CPU > 70% or memory > 80%
Uses the metric that requires the most replicas
Scales down only when both metrics are below target

#HPA with Custom Metrics

Scale on application-specific metrics (this requires a custom metrics adapter):
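
The sketch below assumes that an adapter already exposes a per-Pod metric called http_requests_per_second through the custom metrics API. That metric name, the target value of 100, and the object names are all hypothetical; the real names depend on how your adapter is configured.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-rps-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment to scale
  minReplicas: 2             # assumed bounds
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical metric served by the adapter
        target:
          type: AverageValue
          averageValue: "100"              # assumed target: 100 requests/s per Pod
```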

Use cases:

HTTP requests per second
Queue length
Database connections
Custom business metrics

Note: Custom metrics require additional setup, such as Prometheus Adapter or another custom metrics API server.

#HPA Behavior Configuration

Control scaling behavior with the autoscaling/v2 API:
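
The following is a sketch of a behavior block consistent with the scale-down and scale-up rules described in this section. The object names, replica bounds, and the CPU metric are assumptions added so the manifest is complete.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment to scale
  minReplicas: 2             # assumed bounds and metric
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before scaling down
      policies:
        - type: Percent
          value: 50                     # remove at most 50% of Pods per minute
          periodSeconds: 60
        - type: Pods
          value: 2                      # or at most 2 Pods per minute
          periodSeconds: 60
      selectPolicy: Min                 # pick whichever policy removes fewer Pods
    scaleUp:
      stabilizationWindowSeconds: 0     # react immediately
      policies:
        - type: Percent
          value: 100                    # add at most 100% of Pods per 15 seconds
          periodSeconds: 15
        - type: Pods
          value: 4                      # or at most 4 Pods per 15 seconds
          periodSeconds: 15
      selectPolicy: Max                 # pick whichever policy adds more Pods
```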

Scale-down behavior:

Waits 5 minutes before scaling down (stabilization window)
Removes at most 50% of Pods or 2 Pods per minute
Uses the most conservative policy (selectPolicy: Min)

Scale-up behavior:

Scales up immediately (no stabilization window)
Adds at most 100% of Pods or 4 Pods per 15 seconds
Uses the most aggressive policy (selectPolicy: Max)

#Practical Examples

#Example 1: Web Application
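
The original example is not reproduced here. As a sketch only, a web-facing Deployment (hypothetically named web-frontend) might use a CPU target in the 60-70% range with a range wide enough for traffic spikes. Every name and number below is an illustrative assumption.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-frontend-hpa     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-frontend       # hypothetical Deployment
  minReplicas: 3             # assumed baseline for availability
  maxReplicas: 20            # assumed ceiling for traffic spikes
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 65
```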

#Example 2: API Service with Memory Scaling
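
The original example is not reproduced here. One possible sketch, assuming a hypothetical Deployment named api-service, targets an absolute average memory value per Pod instead of a percentage. With an AverageValue target the calculation does not depend on the memory request. The 400Mi figure and the replica bounds are assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-service-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service        # hypothetical Deployment
  minReplicas: 2             # assumed bounds
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: AverageValue
          averageValue: 400Mi   # assumed target: average memory per Pod
```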

#Example 3: Background Worker
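
The original example is not reproduced here. A hedged sketch for a hypothetical queue-worker Deployment might run the Pods hotter (a higher CPU target) and scale down slowly so that in-flight jobs are less likely to be interrupted. All values are assumptions.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: queue-worker-hpa     # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: queue-worker       # hypothetical Deployment
  minReplicas: 1             # assumed bounds
  maxReplicas: 15
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 85
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 600   # assumed: wait 10 minutes before removing workers
```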

#Testing HPA

#Generate Load

Create a load generator to test HPA:
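
One way to do this, sketched here as a Pod manifest, is a busybox container that requests the application in a tight loop. It assumes a Service named web-app exists in the same namespace; that name and the Pod name load-generator are hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: load-generator       # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: load
      image: busybox:1.36
      command: ["/bin/sh", "-c"]
      args:
        # Request the (assumed) web-app Service repeatedly to drive CPU usage up.
        - "while sleep 0.01; do wget -q -O- http://web-app > /dev/null; done"
```

When the load generator runs as a Pod like this, deleting the Pod stops the load, which is the equivalent of pressing Ctrl+C in an interactive session.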

#Watch HPA in Action

In another terminal, watch the HPA with kubectl get hpa --watch. As load increases, the TARGETS column rises above the target and REPLICAS grows.

To watch the Pods being added, use kubectl get pods --watch.

#Stop Load

Stop the load generator (Ctrl+C) and keep watching as HPA scales down.

Note the 5-minute stabilization window before scale-down begins.

#HPA Scaling Policy

#Default Behavior

Without an explicit behavior configuration:

Scale up:

Immediate (no stabilization window)
At most a 100% increase or 4 Pods per 15 seconds, whichever is greater

Scale down:

5-minute stabilization window
Up to 100% of current replicas can be removed per 15 seconds, down to minReplicas

#Conservative Scaling

Slow, steady scaling:
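
As an illustration, the fragment below goes under spec in an HPA and adds or removes only one Pod at a time with long stabilization windows. The specific numbers are assumptions chosen to show the idea, not recommended values.

```yaml
# Fragment of an HPA spec (place under "spec:").
behavior:
  scaleUp:
    stabilizationWindowSeconds: 120   # assumed: wait 2 minutes before adding Pods
    policies:
      - type: Pods
        value: 1                      # add at most 1 Pod per minute
        periodSeconds: 60
  scaleDown:
    stabilizationWindowSeconds: 600   # assumed: wait 10 minutes before removing Pods
    policies:
      - type: Pods
        value: 1                      # remove at most 1 Pod every 2 minutes
        periodSeconds: 120
```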

#Aggressive Scaling

Fast response to load:
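
A sketch of a more aggressive configuration, again as a fragment under spec, might allow the replica count to triple within 15 seconds and shorten the scale-down window. The values are illustrative assumptions.

```yaml
# Fragment of an HPA spec (place under "spec:").
behavior:
  scaleUp:
    stabilizationWindowSeconds: 0
    policies:
      - type: Percent
        value: 200                    # assumed: add up to 200% of current Pods per 15 seconds
        periodSeconds: 15
      - type: Pods
        value: 10                     # assumed: or up to 10 Pods per 15 seconds
        periodSeconds: 15
    selectPolicy: Max                 # use whichever policy adds more Pods
  scaleDown:
    stabilizationWindowSeconds: 60    # assumed: wait only 1 minute before scaling down
    policies:
      - type: Percent
        value: 50
        periodSeconds: 30
```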

#Disable Scale Down

Only scale up, never down:
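
In the autoscaling/v2 API this is done by setting selectPolicy to Disabled for the scaleDown direction. The fragment below goes under spec in an HPA.

```yaml
# Fragment of an HPA spec (place under "spec:").
behavior:
  scaleDown:
    selectPolicy: Disabled   # HPA will never reduce the replica count
```

With this setting the replica count only comes back down when someone scales the workload manually or the policy is removed.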

#Common Mistakes and Pitfalls

#Mistake 1: No Resource Requests

Problem: HPA cannot calculate utilization without resource requests.

Solution: Always define requests:
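
A minimal sketch of the relevant part of a Pod template is shown below. The container name, image, and values are assumptions; the point is that every container in the Pod declares requests for the resources HPA scales on.

```yaml
# Fragment of a Deployment's Pod template (spec.template.spec).
containers:
  - name: web                # hypothetical container
    image: nginx:1.27
    resources:
      requests:
        cpu: 100m            # baseline for CPU utilization percentages
        memory: 128Mi        # baseline for memory utilization percentages
```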

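#Mistake 2: Metrics Server Not Installed

Problem: HPA shows "unknown" for its metrics.

Solution: Install Metrics Server and confirm it works, for example by checking that kubectl top pods returns data.

#Mistake 3: Target Too Low

Problem: Constant scaling up and down (flapping).

Solution: Use a reasonable target, typically 60-80%.

#Mistake 4: Min = Max Replicas

Problem: HPA cannot scale.

Solution: Provide a scaling range by setting maxReplicas higher than minReplicas.

#Mistake 5: Conflicting Manual Scaling

Problem: Manually scaling the Deployment conflicts with HPA, which overrides the manual replica count on its next sync.

Solution: Let HPA manage the replicas, or delete the HPA (kubectl delete hpa <name>) if you want manual control.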

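#Best Practices

#Set Appropriate Targets

CPU targets:

Web apps: 60-70%
APIs: 70-80%
Batch jobs: 80-90%

Memory targets:

Generally: 75-85%
Leave headroom for spikes

#Define Min/Max Carefully

#Use Stabilization Windows

Prevent rapid scaling oscillation by setting behavior.scaleDown.stabilizationWindowSeconds (and, if needed, the scaleUp equivalent).

#Monitor HPA Decisions

Track HPA behavior with kubectl describe hpa <name>, which lists conditions and recent scaling events.

#Combine with Resource Quota

Prevent runaway scaling: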
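
As a sketch, a namespace-level ResourceQuota can cap the total number of Pods and the total requested resources, so that even a misconfigured HPA cannot grow past the cap. The namespace, object name, and limits below are hypothetical.

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: autoscaling-quota    # hypothetical name
  namespace: production      # hypothetical namespace
spec:
  hard:
    pods: "50"               # assumed cap on Pods in the namespace
    requests.cpu: "20"       # assumed cap on total CPU requests (cores)
    requests.memory: 40Gi    # assumed cap on total memory requests
```

Keep in mind that once a quota on requests.cpu or requests.memory is active, Pods in that namespace must declare those requests or they are rejected.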

#Test Scaling Behavior

Always test before production:

Deploy with HPA
Generate realistic load
Observe the scaling behavior
Adjust targets and policies
Repeat until satisfied

#Document Scaling Decisions

Add annotations:
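
Annotations are free-form key/value pairs, so the keys below (under the placeholder domain example.com) are purely hypothetical. The sketch shows one way to record why a target was chosen and who owns the decision, directly on the HPA object.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa          # hypothetical name
  annotations:
    # Hypothetical annotation keys; use whatever convention your team prefers.
    example.com/scaling-rationale: "CPU target of 70% chosen after load testing"
    example.com/owner: "platform-team"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app            # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```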

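#Troubleshooting HPA

#HPA Shows Unknown Metrics

Causes:

Metrics Server not installed
Metrics Server not ready
No resource requests defined
Pods not ready

Solution: Check that Metrics Server is running (in a standard install, kubectl get deployment metrics-server -n kube-system), confirm that kubectl top pods returns data, and verify that every container in the target Pods defines resource requests.

#HPA Not Scaling

Solution: Run kubectl describe hpa <name> and read the Conditions and Events sections. They state whether metrics could be fetched and whether scaling is limited by minReplicas or maxReplicas.

#Rapid Scaling (Flapping)

HPA scales up and down repeatedly.

Solutions:

Increase the stabilization window
Adjust the target utilization
Review the scaling policies
Check the application's behavior

#HPA Reaches Max Replicas

Solutions:

Increase maxReplicas
Optimize application performance
Raise the target utilization if Pods still have headroom (a lower target would demand even more replicas)
Add more node capacity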

#HPA vs VPA vs Cluster Autoscaler

#Horizontal Pod Autoscaler (HPA)

Scales the number of Pods
Based on metrics
Fast response
Best suited to stateless workloads

#Vertical Pod Autoscaler (VPA)

Adjusts Pod resource requests/limits
Based on historical usage
Requires Pod restarts
For right-sizing resources

#Cluster Autoscaler

Adds/removes nodes
Based on pending Pods
Slower response
For cluster capacity

Use them together:

HPA: Scales Pods for traffic
VPA: Right-sizes Pod resources
Cluster Autoscaler: Adjusts cluster size

Avoid letting HPA and VPA act on the same CPU or memory metric for the same workload, since the two controllers would work against each other.

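#Viewing HPA Details

#Get HPA

kubectl get hpa lists the HPAs in the current namespace.

#Describe HPA

kubectl describe hpa <name> shows metrics, conditions, and events.

#View HPA YAML

kubectl get hpa <name> -o yaml prints the full object, including its status.

#Watch HPA

kubectl get hpa <name> --watch streams updates as they happen.

#Check Current Metrics

kubectl top pods shows the current CPU and memory usage reported by Metrics Server.

#Deleting HPA

kubectl delete hpa <name> removes the autoscaler. The Deployment continues running with its current replica count.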

#Conclusion

In this episode, episode 30, we covered the Horizontal Pod Autoscaler (HPA) in Kubernetes in depth. We learned how HPA automatically scales Pods based on metrics, how to configure autoscaling policies, and best practices for production use.

Key takeaways:

HPA automatically scales Pod replicas based on metrics
Metrics Server is required for resource metrics
Resource requests are required for percentage-based scaling
Scaling can be based on CPU, memory, or custom metrics
minReplicas/maxReplicas define the scaling boundaries
The target utilization determines when scaling happens
Stabilization windows prevent rapid oscillation
Scale-up is typically immediate, scale-down is delayed
Behavior policies control scaling rate and timing
With multiple metrics, the highest replica requirement wins
Default scale-down: 5-minute stabilization window
Default scale-up: immediate response
Test thoroughly before production deployment
Monitor HPA decisions and adjust as needed
Combine with Resource Quota for safety

The Horizontal Pod Autoscaler is essential for running dynamic, cost-efficient workloads on Kubernetes. By understanding HPA configuration and behavior, you can make sure your applications scale automatically to meet demand while keeping resource utilization and cost under control.

So, is the Horizontal Pod Autoscaler in Kubernetes clearer now? Keep up the learning, and stay tuned for the next episode!