Learning Kubernetes - Episode 31 - Introduction and Explanation of Vertical Pod Autoscaler

In this episode, we'll discuss Kubernetes Vertical Pod Autoscaler (VPA) for automatic resource sizing. We'll learn how VPA works, how to install and configure it, update modes, and best practices for right-sizing Pod resources automatically.

Arman Dwi Pangestu
April 6, 2026

Introduction

In the previous episode, we learned about Horizontal Pod Autoscaler (HPA) which scales the number of Pods. In episode 31, we'll discuss Vertical Pod Autoscaler (VPA), which automatically adjusts CPU and memory requests and limits for containers based on actual usage.

Note: Here I'll be using a Kubernetes Cluster installed through K3s.

While HPA scales horizontally (more Pods), VPA scales vertically (bigger Pods). Setting the right resource requests is challenging - too low causes OOMKills and throttling, too high wastes resources. VPA solves this by continuously analyzing usage and recommending or applying optimal resource values.

What Is Vertical Pod Autoscaler?

Vertical Pod Autoscaler (VPA) automatically adjusts container CPU and memory requests based on historical and current resource usage. When a container also defines limits, VPA by default scales them to preserve the original limit-to-request ratio.

Think of VPA like a tailor - it measures your actual size (resource usage) and adjusts your clothes (resource requests/limits) to fit perfectly. Instead of guessing sizes, VPA uses real data to right-size your Pods.

Key characteristics of VPA:

  • Automatic right-sizing - Adjusts resource requests/limits
  • Historical analysis - Uses past usage patterns
  • Recommendation mode - Suggests values without applying
  • Auto mode - Applies changes automatically
  • Initial mode - Sets resources at Pod creation only
  • Prevents waste - Reduces over-provisioning
  • Prevents failures - Avoids under-provisioning
  • Works with Deployments - Compatible with standard workloads

VPA vs HPA

Understanding the key differences:

| Aspect | VPA | HPA |
| --- | --- | --- |
| Scaling Direction | Vertical (resource size) | Horizontal (replica count) |
| What Changes | CPU/Memory requests/limits | Number of Pods |
| Requires Restart | Yes (in Auto/Recreate mode) | No |
| Use Case | Right-size resources | Handle traffic spikes |
| Metrics | Historical usage | Current metrics |
| Response Time | Slower (requires restart) | Faster (add Pods) |
| Best For | Stateful workloads | Stateless workloads |

VPA and HPA can be used together:

  • VPA: Right-size individual Pods
  • HPA: Scale number of Pods
  • Combine for optimal resource utilization

Warning

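Don't let VPA and HPA act on the same CPU or memory metric at the same time, because they will work against each other. VPA only manages CPU and memory, so the safer combinations are: HPA on custom or external metrics while VPA manages CPU and memory, HPA on CPU while VPA controls only memory, or HPA for scaling with VPA in Off (recommendation-only) mode.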

Why Use VPA?

VPA solves critical resource management challenges:

  • Eliminate guesswork - No need to estimate resource needs
  • Optimize costs - Reduce over-provisioning waste
  • Prevent failures - Avoid OOMKills from under-provisioning
  • Adapt to changes - Adjust as application behavior evolves
  • Save time - No manual resource tuning
  • Improve efficiency - Better cluster utilization
  • Handle variability - Adapt to workload changes
  • Data-driven decisions - Based on actual usage

Without VPA, you either waste resources (over-provision) or risk failures (under-provision), and must manually adjust as applications change.

How VPA Works

VPA Components

VPA consists of three components:

1. Recommender:

  • Monitors resource usage
  • Analyzes historical data
  • Calculates recommended values
  • Updates VPA objects with recommendations

2. Updater:

  • Checks if Pods need updates
  • Evicts Pods that need resource changes
  • Respects Pod Disruption Budgets
  • Triggers Pod recreation

3. Admission Controller:

  • Intercepts Pod creation
  • Applies VPA recommendations
  • Sets resource requests/limits
  • Works as mutating webhook

VPA Control Loop

  1. Monitor - Recommender watches Pod metrics
  2. Analyze - Calculate optimal resource values
  3. Recommend - Update VPA object with recommendations
  4. Decide - Updater checks if changes needed
  5. Evict - Updater evicts Pods (if Auto mode)
  6. Apply - Admission Controller sets new values on recreation

Installing VPA

VPA is not installed by default. Let's install it.

Prerequisites

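  • Kubernetes cluster (1.11+ applies to early VPA releases; current releases require a newer cluster, see the compatibility table in the VPA README)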
  • Metrics Server installed
  • kubectl access

Installation Steps

Clone VPA repository:

git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler

Install VPA:

./hack/vpa-up.sh

This installs:

  • VPA CRDs (CustomResourceDefinitions)
  • VPA Recommender
  • VPA Updater
  • VPA Admission Controller

Verify installation:

kubectl get pods -n kube-system | grep vpa

Output:

vpa-admission-controller-xxx   1/1     Running   0          1m
vpa-recommender-xxx            1/1     Running   0          1m
vpa-updater-xxx                1/1     Running   0          1m

Check VPA CRDs:

kubectl get crd | grep verticalpodautoscaler

Output:

verticalpodautoscalercheckpoints.autoscaling.k8s.io
verticalpodautoscalers.autoscaling.k8s.io

VPA Update Modes

VPA supports different update modes:

Off Mode (Recommendation Only)

VPA calculates recommendations but doesn't apply them:

# vpa-off.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Off"

Use case:

  • Testing VPA recommendations
  • Manual review before applying
  • Learning resource patterns
  • Generating reports

Initial Mode

VPA sets resources only when Pods are created, never updates running Pods:

# vpa-initial.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Initial"

Use case:

  • Set initial resources for new Pods
  • Avoid disrupting running Pods
  • Gradual rollout of VPA

Recreate Mode (Default)

VPA evicts and recreates Pods with new resource values:

# vpa-recreate.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Recreate"

Behavior:

  • Evicts Pods when resources need adjustment
  • Deployment controller recreates Pods
  • Admission Controller applies new values
  • Causes brief downtime per Pod

Use case:

  • Automatic resource optimization
  • Stateless applications
  • When brief disruption acceptable

Auto Mode

VPA automatically updates Pods (currently same as Recreate):

# vpa-auto.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"

Note

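Auto mode currently behaves like Recreate: Pods are evicted and recreated to pick up new values. Resizing a running Pod without a restart depends on the in-place Pod resize feature of newer Kubernetes releases and is not what Auto or Recreate do.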
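
Recent VPA releases add a further update mode, `InPlaceOrRecreate`, which tries to resize a running Pod in place and falls back to eviction when that is not possible. The manifest below is only a sketch: it assumes a VPA release and a cluster version that both support in-place Pod resize (older releases gate this behind a feature flag, so check the release notes for your version).

```yaml
# vpa-inplace.yml (hypothetical file name)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        # Assumption: the installed VPA and the cluster support in-place resize
        updateMode: "InPlaceOrRecreate"
```

If the mode is not supported by the installed VPA, the API server or the VPA admission webhook should reject the object, which makes this easy to try safely on a test cluster.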

Creating VPA

Basic VPA Example

Create a Deployment:

# app-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: my-app
spec:
    replicas: 2
    selector:
        matchLabels:
            app: my-app
    template:
        metadata:
            labels:
                app: my-app
        spec:
            containers:
                - name: app
                  image: nginx:1.25
                  resources:
                      requests:
                          cpu: "100m"
                          memory: "128Mi"
                      limits:
                          cpu: "200m"
                          memory: "256Mi"

Create VPA:

# app-vpa.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"

Apply:

kubectl apply -f app-deployment.yml
kubectl apply -f app-vpa.yml

VPA with Resource Policy

Control which resources VPA can modify:

# vpa-with-policy.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"
    resourcePolicy:
        containerPolicies:
            - containerName: "app"
              mode: "Auto"
              minAllowed:
                  cpu: "50m"
                  memory: "64Mi"
              maxAllowed:
                  cpu: "1000m"
                  memory: "1Gi"
              controlledResources:
                  - cpu
                  - memory

Resource Policy Options:

  • mode: Auto, Off (per container)
  • minAllowed: Minimum resource values
  • maxAllowed: Maximum resource values
  • controlledResources: Which resources to manage (cpu, memory)
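
One more per-container option in the VPA v1 API is `controlledValues`. The sketch below assumes the same `my-app` Deployment and `app` container as above. With `RequestsOnly`, VPA changes requests and leaves the limits from the Pod template untouched; the default, `RequestsAndLimits`, also scales limits. Here `maxAllowed` is set to the Deployment's existing limits, an illustrative choice so that a recommended request can never exceed a limit.

```yaml
# vpa-requests-only.yml (hypothetical file name)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"
    resourcePolicy:
        containerPolicies:
            - containerName: "app"
              controlledResources:
                  - cpu
                  - memory
              # Manage requests only; limits stay as written in the Deployment
              controlledValues: RequestsOnly
              maxAllowed:
                  cpu: "200m"      # equals the container's CPU limit
                  memory: "256Mi"  # equals the container's memory limit
```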

VPA for Specific Containers

Target specific containers in multi-container Pods:

# vpa-multi-container.yml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"
    resourcePolicy:
        containerPolicies:
            # Main application container
            - containerName: "app"
              mode: "Auto"
              minAllowed:
                  cpu: "100m"
                  memory: "128Mi"
              maxAllowed:
                  cpu: "2000m"
                  memory: "2Gi"
            # Sidecar container
            - containerName: "sidecar"
              mode: "Off"  # Don't modify sidecar

Viewing VPA Recommendations

Get VPA Status

kubectl get vpa

Output:

NAME         MODE   CPU    MEM       PROVIDED   AGE
my-app-vpa   Auto   150m   256Mi     True       5m

Describe VPA

kubectl describe vpa my-app-vpa

Output shows recommendations:

Name:         my-app-vpa
Namespace:    default
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
 
Recommendation:
  Container Recommendations:
    Container Name:  app
    Lower Bound:
      Cpu:     100m
      Memory:  128Mi
    Target:
      Cpu:     150m
      Memory:  256Mi
    Uncapped Target:
      Cpu:     150m
      Memory:  256Mi
    Upper Bound:
      Cpu:     300m
      Memory:  512Mi

Recommendation Fields:

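  • Lower Bound: Lowest value VPA considers sufficient; requests below it risk throttling or OOMKills
  • Target: Recommended value, which VPA applies as the request
  • Uncapped Target: Recommendation before minAllowed/maxAllowed are applied
  • Upper Bound: Highest value VPA considers useful; requests above it are likely wasted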

View VPA YAML

kubectl get vpa my-app-vpa -o yaml
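
In that YAML, the recommendations sit under `status.recommendation.containerRecommendations`. The excerpt below is illustrative rather than captured from a cluster: it reuses the values from the describe output above, and a real object may print quantities in other units (for example memory in bytes).

```yaml
# Illustrative excerpt of the status section
status:
    recommendation:
        containerRecommendations:
            - containerName: app
              lowerBound:
                  cpu: 100m
                  memory: 128Mi
              target:
                  cpu: 150m
                  memory: 256Mi
              uncappedTarget:
                  cpu: 150m
                  memory: 256Mi
              upperBound:
                  cpu: 300m
                  memory: 512Mi
```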

Practical Examples

Example 1: Web Application with VPA

# web-app-vpa.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: web-app
spec:
    replicas: 3
    selector:
        matchLabels:
            app: web
    template:
        metadata:
            labels:
                app: web
        spec:
            containers:
                - name: nginx
                  image: nginx:1.25
                  ports:
                      - containerPort: 80
                  resources:
                      requests:
                          cpu: "100m"
                          memory: "128Mi"
                      limits:
                          cpu: "500m"
                          memory: "512Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: web-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
    updatePolicy:
        updateMode: "Auto"
    resourcePolicy:
        containerPolicies:
            - containerName: "nginx"
              minAllowed:
                  cpu: "50m"
                  memory: "64Mi"
              maxAllowed:
                  cpu: "1000m"
                  memory: "1Gi"

Example 2: Database with VPA (Recommendation Only)

# database-vpa.yml
apiVersion: apps/v1
kind: StatefulSet
metadata:
    name: postgres
spec:
    serviceName: postgres
    replicas: 1
    selector:
        matchLabels:
            app: postgres
    template:
        metadata:
            labels:
                app: postgres
        spec:
            containers:
                - name: postgres
                  image: postgres:15
                  resources:
                      requests:
                          cpu: "500m"
                          memory: "512Mi"
                      limits:
                          cpu: "2000m"
                          memory: "2Gi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: postgres-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: StatefulSet
        name: postgres
    updatePolicy:
        updateMode: "Off"  # Recommendation only for database

Example 3: Microservice with Multiple Containers

# microservice-vpa.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: api-service
spec:
    replicas: 3
    selector:
        matchLabels:
            app: api
    template:
        metadata:
            labels:
                app: api
        spec:
            containers:
                - name: api
                  image: myapi:latest
                  resources:
                      requests:
                          cpu: "200m"
                          memory: "256Mi"
                - name: log-agent
                  image: fluent/fluentd:v1.16
                  resources:
                      requests:
                          cpu: "50m"
                          memory: "64Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: api-service-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: api-service
    updatePolicy:
        updateMode: "Auto"
    resourcePolicy:
        containerPolicies:
            - containerName: "api"
              mode: "Auto"
              minAllowed:
                  cpu: "100m"
                  memory: "128Mi"
              maxAllowed:
                  cpu: "2000m"
                  memory: "2Gi"
            - containerName: "log-agent"
              mode: "Auto"
              minAllowed:
                  cpu: "25m"
                  memory: "32Mi"
              maxAllowed:
                  cpu: "200m"
                  memory: "256Mi"

Example 4: VPA with Initial Mode

# worker-vpa-initial.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: worker
spec:
    replicas: 5
    selector:
        matchLabels:
            app: worker
    template:
        metadata:
            labels:
                app: worker
        spec:
            containers:
                - name: worker
                  image: myworker:latest
                  resources:
                      requests:
                          cpu: "100m"
                          memory: "128Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: worker-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: worker
    updatePolicy:
        updateMode: "Initial"  # Only set on Pod creation

Testing VPA

Deploy Application

kubectl apply -f app-deployment.yml
kubectl apply -f app-vpa.yml

Generate Load

Create load to trigger resource usage:

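# Expose the Deployment so the load generator can reach it
kubectl expose deployment my-app --port=80

# Start a temporary Pod with an interactive shell
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh

# Inside that shell: send continuous requests to my-app
while true; do wget -q -O- http://my-app > /dev/null; done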

Watch VPA Recommendations

kubectl get vpa my-app-vpa --watch

Check Pod Resources

Before VPA:

kubectl get pod <pod-name> -o yaml | grep -A 5 resources

After VPA updates (in Auto mode):

# VPA will evict and recreate Pods
kubectl get pods -w
 
# Check new resource values
kubectl get pod <new-pod-name> -o yaml | grep -A 5 resources

VPA Limitations

Current Limitations

1. Requires Pod Restart:

  • In the update modes covered here, VPA does not resize running Pods in place
  • Pods must be evicted and recreated
  • Causes a brief disruption for each evicted Pod

2. Not for Horizontal Scaling:

  • VPA adjusts resource size, not replica count
  • Use HPA for scaling replicas

3. Conflicts with HPA:

  • Don't use both on same CPU/memory metrics
  • Can cause scaling conflicts

4. No Downscaling Protection:

  • VPA can reduce resources aggressively
  • May cause issues if recommendations too low

5. Limited History:

  • Recommendations based on recent history
  • May not capture long-term patterns

6. Add-on Status:

  • VPA is an add-on installed separately from core Kubernetes, even though its API is served as autoscaling.k8s.io/v1
  • Not recommended for critical production workloads without testing
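
For limitation 4, recent VPA releases offer `evictionRequirements` under `updatePolicy`, which restricts when the Updater may evict. The sketch below assumes a VPA release that supports this field. It allows eviction only when the target is higher than the current request for CPU or memory, so Pods are not evicted purely to shrink them.

```yaml
# vpa-upscale-only.yml (hypothetical file name)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"
        evictionRequirements:
            # Evict only when the recommendation is above the current requests
            - resources: ["cpu", "memory"]
              changeRequirement: TargetHigherThanRequests
```

Lower recommendations can still take effect when a Pod is recreated for another reason, such as a rollout, since the Admission Controller sets resources at Pod creation.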

Common Mistakes and Pitfalls

Mistake 1: Using VPA and HPA Together on Same Metrics

Problem: VPA and HPA conflict when both target CPU/memory.

# Bad: Both VPA and HPA on CPU
# VPA adjusts CPU requests
# HPA scales based on CPU utilization
# They fight each other

Solution: Use different metrics or modes:

# Option 1: VPA in Off mode (recommendations only)
updatePolicy:
    updateMode: "Off"
 
# Option 2: HPA on CPU, VPA on memory only
resourcePolicy:
    containerPolicies:
        - containerName: "app"
          controlledResources:
              - memory  # VPA only manages memory
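
A third option is to keep VPA on CPU and memory and drive HPA from a custom or external metric instead. The sketch below assumes a custom metrics adapter (for example Prometheus Adapter) exposes a per-Pod metric to the cluster; the metric name `http_requests_per_second`, the HPA name, and the replica bounds are hypothetical.

```yaml
# Option 3 (sketch): HPA scales on a custom metric, VPA keeps CPU and memory
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
    name: my-app-hpa                  # hypothetical name
spec:
    scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    minReplicas: 2
    maxReplicas: 10
    metrics:
        - type: Pods
          pods:
              metric:
                  name: http_requests_per_second   # hypothetical custom metric
              target:
                  type: AverageValue
                  averageValue: "100"
```

Because this HPA never looks at CPU or memory utilization, changes VPA makes to requests do not feed back into the replica count.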

Mistake 2: No Min/Max Limits

Problem: VPA can set extreme values.

Solution: Always set boundaries:

resourcePolicy:
    containerPolicies:
        - containerName: "app"
          minAllowed:
              cpu: "50m"
              memory: "64Mi"
          maxAllowed:
              cpu: "2000m"
              memory: "4Gi"

Mistake 3: Using Auto Mode on Stateful Workloads

Problem: Pod eviction causes data loss or downtime.

Solution: Use Off or Initial mode for stateful apps:

# For databases, use Off mode
updatePolicy:
    updateMode: "Off"

Mistake 4: No Initial Resources

Problem: VPA needs baseline to start from.

Solution: Always set initial requests:

resources:
    requests:
        cpu: "100m"      # Set reasonable initial values
        memory: "128Mi"

Mistake 5: Ignoring Recommendations

Problem: Running VPA in Off mode but never checking recommendations.

Solution: Regularly review and apply recommendations:

kubectl describe vpa my-app-vpa
# Review Target recommendations
# Update Deployment manually if needed
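
As a sketch of that manual update, assume the Target values from the earlier describe output (150m CPU, 256Mi memory). The `resources` block of the `app` container in app-deployment.yml could then be edited as follows. Keeping the original 2:1 limit-to-request ratio is a choice made here for illustration, not a requirement.

```yaml
# Fragment of the app container spec in app-deployment.yml
resources:
    requests:
        cpu: "150m"      # Target CPU from the VPA recommendation
        memory: "256Mi"  # Target memory from the VPA recommendation
    limits:
        cpu: "300m"      # 2x the request, same ratio as the original manifest
        memory: "512Mi"  # 2x the request, same ratio as the original manifest
```

Re-applying the file with `kubectl apply -f app-deployment.yml` rolls the Pods with the new values.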

Best Practices

Start with Off Mode

Test VPA before enabling Auto mode:

# Phase 1: Observe recommendations
updatePolicy:
    updateMode: "Off"
 
# Phase 2: After validation, enable Auto
updatePolicy:
    updateMode: "Auto"

Set Appropriate Boundaries

Define min/max based on workload:

resourcePolicy:
    containerPolicies:
        - containerName: "app"
          minAllowed:
              cpu: "100m"      # Minimum for functionality
              memory: "128Mi"
          maxAllowed:
              cpu: "2000m"     # Maximum for cost control
              memory: "2Gi"

Use Initial Mode for Gradual Rollout

Avoid disrupting running Pods:

updatePolicy:
    updateMode: "Initial"  # Only affects new Pods

Monitor VPA Decisions

Track VPA behavior:

# Watch VPA recommendations
kubectl get vpa --watch
 
# Check VPA events
kubectl describe vpa my-app-vpa
 
# Monitor Pod evictions
kubectl get events --sort-by='.lastTimestamp'

Combine with Pod Disruption Budget

Protect availability during updates:

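# pdb.yml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
    name: my-app-pdb
spec:
    # my-app runs 2 replicas, so minAvailable must stay below 2;
    # minAvailable: 2 would block every eviction and VPA could never update Pods
    minAvailable: 1
    selector:
        matchLabels:
            app: my-app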
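
An alternative sketch uses `maxUnavailable` instead of `minAvailable`. The budget then allows one voluntary eviction at a time however many replicas the Deployment runs, so it does not need to be revisited when the replica count changes. The PDB name is hypothetical, and only one of the two budgets should select these Pods.

```yaml
# pdb-max-unavailable.yml (hypothetical file name)
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
    name: my-app-pdb-max-unavailable
spec:
    # At most one my-app Pod may be down due to voluntary disruptions
    maxUnavailable: 1
    selector:
        matchLabels:
            app: my-app
```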

Use for Right-Sizing, Not Scaling

VPA is for resource optimization, not traffic handling:

  • VPA: Right-size individual Pods
  • HPA: Scale for traffic
  • Cluster Autoscaler: Add nodes

Document VPA Configuration

Add annotations explaining choices:

metadata:
    annotations:
        vpa.note: "Auto mode with 100m-2000m CPU range based on load testing"

Troubleshooting VPA

VPA Not Providing Recommendations

kubectl describe vpa my-app-vpa
# Status shows: No recommendation available

Causes:

  1. VPA components not running
  2. Insufficient metrics data
  3. Target workload not found

Solutions:

# Check VPA components
kubectl get pods -n kube-system | grep vpa
 
# Check target exists
kubectl get deployment my-app
 
# Wait for metrics collection (5-10 minutes)

VPA Not Updating Pods

# Pods not being evicted despite recommendations

Causes:

  1. UpdateMode is Off or Initial
  2. Pod Disruption Budget blocking evictions
  3. Recommendations within current values

Solutions:

# Check update mode
kubectl get vpa my-app-vpa -o yaml | grep updateMode
 
# Check PDB
kubectl get pdb
 
# Check if recommendations differ from current
kubectl describe vpa my-app-vpa
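
One more cause worth checking, not listed above: by default the VPA Updater only evicts Pods when the workload has at least two live replicas (its `--min-replicas` flag), so a single-replica workload can show recommendations yet never be updated. The sketch below lowers that threshold for one VPA with `minReplicas`, assuming you accept downtime while the only Pod is recreated.

```yaml
# vpa-single-replica.yml (hypothetical file name)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
    name: my-app-vpa
spec:
    targetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
    updatePolicy:
        updateMode: "Auto"
        # Assumption: evicting the only replica is acceptable for this workload
        minReplicas: 1
```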

Pods Constantly Restarting

# VPA keeps evicting Pods

Causes:

  1. Recommendations oscillating
  2. Min/max limits too narrow
  3. Workload highly variable

Solutions:

# Widen min/max range
maxAllowed:
    cpu: "4000m"     # Increase from 2000m
    memory: "4Gi"    # Increase from 2Gi
 
# Or switch to Off mode
updatePolicy:
    updateMode: "Off"

VPA Recommendations Too High/Low

kubectl describe vpa my-app-vpa
# Target: 4000m CPU (seems too high)

Solutions:

# Set maxAllowed to cap recommendations
resourcePolicy:
    containerPolicies:
        - containerName: "app"
          maxAllowed:
              cpu: "2000m"
              memory: "2Gi"

Uninstalling VPA

cd autoscaler/vertical-pod-autoscaler
./hack/vpa-down.sh

This removes:

  • VPA Deployments
  • VPA CRDs
  • VPA configurations

Viewing VPA Details

Get VPA

kubectl get vpa
kubectl get vpa -o wide
kubectl get vpa --all-namespaces

Describe VPA

kubectl describe vpa my-app-vpa

View VPA YAML

kubectl get vpa my-app-vpa -o yaml

Check VPA Components

kubectl get pods -n kube-system | grep vpa
kubectl logs -n kube-system <vpa-recommender-pod>

Deleting VPA

kubectl delete vpa my-app-vpa

Pods continue running with current resource values.

Conclusion

In episode 31, we've explored Vertical Pod Autoscaler (VPA) in Kubernetes in depth. We've learned how VPA automatically right-sizes Pod resources based on actual usage, different update modes, and best practices for production use.

Key takeaways:

  • VPA automatically adjusts CPU and memory requests/limits
  • Analyzes historical usage to recommend optimal values
  • Four update modes covered here: Off, Initial, Recreate, Auto
  • Off mode: Recommendations only (no changes)
  • Initial mode: Sets resources at Pod creation only
  • Recreate/Auto mode: Evicts and recreates Pods with new values
  • Requires VPA installation (not default in Kubernetes)
  • Resource policies define min/max boundaries
  • Don't combine VPA and HPA on same metrics
  • VPA requires Pod restart to apply changes
  • Use Off mode for testing and stateful workloads
  • Set min/max limits to prevent extreme values
  • Pod Disruption Budget protects availability
  • VPA is an add-on outside core Kubernetes - test thoroughly
  • Best for right-sizing, not traffic scaling

Vertical Pod Autoscaler is essential for optimizing resource utilization in Kubernetes. By understanding VPA configuration and limitations, you can automatically right-size your Pods, reduce waste, and prevent resource-related failures without manual tuning.

Are you getting a clearer understanding of Vertical Pod Autoscaler in Kubernetes? Keep your learning momentum going and look forward to the next episode!
