In this episode, we'll discuss Kubernetes Vertical Pod Autoscaler (VPA) for automatic resource sizing. We'll learn how VPA works, how to install and configure it, update modes, and best practices for right-sizing Pod resources automatically.

Note
If you want to read the previous episode, you can click the Episode 30 thumbnail below
In the previous episode, we learned about Horizontal Pod Autoscaler (HPA) which scales the number of Pods. In episode 31, we'll discuss Vertical Pod Autoscaler (VPA), which automatically adjusts CPU and memory requests and limits for containers based on actual usage.
Note: Here I'll be using a Kubernetes Cluster installed through K3s.
While HPA scales horizontally (more Pods), VPA scales vertically (bigger Pods). Setting the right resource requests is challenging - too low causes OOMKills and throttling, too high wastes resources. VPA solves this by continuously analyzing usage and recommending or applying optimal resource values.
Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests and limits for containers based on historical and current resource usage.
Think of VPA like a tailor - it measures your actual size (resource usage) and adjusts your clothes (resource requests/limits) to fit perfectly. Instead of guessing sizes, VPA uses real data to right-size your Pods.
Key characteristics of VPA:
Understanding the key differences:
| Aspect | VPA | HPA |
|---|---|---|
| Scaling Direction | Vertical (resource size) | Horizontal (replica count) |
| What Changes | CPU/Memory requests/limits | Number of Pods |
| Requires Restart | Yes (in Auto/Recreate mode) | No |
| Use Case | Right-size resources | Handle traffic spikes |
| Metrics | Historical usage | Current metrics |
| Response Time | Slower (requires restart) | Faster (add Pods) |
| Best For | Stateful workloads | Stateless workloads |
Can use together:
Warning
Don't run VPA and HPA on the same CPU/memory metrics simultaneously - they can conflict. Either use HPA for CPU/memory and VPA for other resources, or use HPA for scaling with VPA in recommendation mode.
VPA solves critical resource management challenges:
Without VPA, you either waste resources (over-provision) or risk failures (under-provision), and must manually adjust as applications change.
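To make the trade-off concrete, here is a toy Python sketch (not part of VPA - the numbers and the helper are invented for illustration) that estimates how much of a static CPU request goes unused when usage stays far below it:

```python
# Toy illustration of over-provisioning waste. Usage samples are in
# millicores and entirely made up; real values come from metrics-server
# or Prometheus.
def waste_fraction(requested_m: int, usage_samples_m: list[int]) -> float:
    """Fraction of the requested CPU that goes unused at the
    95th-percentile usage level (0.0 if usage exceeds the request)."""
    peak = sorted(usage_samples_m)[int(0.95 * (len(usage_samples_m) - 1))]
    return max(0.0, (requested_m - peak) / requested_m)

# A container requesting 1000m while peaking around 240m leaves most
# of its reservation idle.
samples = [180, 190, 200, 210, 220, 230, 240, 250]
print(f"{waste_fraction(1000, samples):.0%}")
```

Flip the numbers (request 100m, usage peaking at 300m) and the waste is zero - but now you are in OOMKill/throttling territory instead, which is exactly the guessing game VPA removes.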
VPA consists of three components:
1. Recommender:
2. Updater:
3. Admission Controller:
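As a rough mental model of the Updater (a simplification, not the actual implementation - the real component also respects eviction rate limits and PodDisruptionBudgets), a Pod becomes an eviction candidate when its current request drifts outside the recommended bounds:

```python
# Simplified sketch of the Updater's eviction decision: evict when the
# running Pod's request (millicores) falls outside the Recommender's
# [lowerBound, upperBound] range.
def should_evict(current_request_m: int, lower_bound_m: int,
                 upper_bound_m: int) -> bool:
    """True when the Pod is under- or over-provisioned relative to
    the current recommendation."""
    return current_request_m < lower_bound_m or current_request_m > upper_bound_m

print(should_evict(100, 120, 300))  # request below lower bound -> True
print(should_evict(150, 120, 300))  # within bounds -> False
```

The Admission Controller then completes the loop: when the evicted Pod is recreated, it mutates the new Pod spec to carry the recommended requests.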
VPA is not installed by default. Let's install it.
Clone VPA repository:
```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
```

Install VPA:

```bash
./hack/vpa-up.sh
```

This installs:
Verify installation:

```bash
kubectl get pods -n kube-system | grep vpa
```

Output:

```
vpa-admission-controller-xxx   1/1   Running   0   1m
vpa-recommender-xxx            1/1   Running   0   1m
vpa-updater-xxx                1/1   Running   0   1m
```

Check VPA CRDs:

```bash
kubectl get crd | grep verticalpodautoscaler
```

Output:

```
verticalpodautoscalercheckpoints.autoscaling.k8s.io
verticalpodautoscalers.autoscaling.k8s.io
```

VPA supports different update modes:
VPA calculates recommendations but doesn't apply them:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
```

Use case:
VPA sets resources only when Pods are created, never updates running Pods:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"
```

Use case:
VPA evicts and recreates Pods with new resource values:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Recreate"
```

Behavior:
Use case:
VPA automatically updates Pods (currently same as Recreate):
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```

Note
Auto mode currently behaves like Recreate. In-place updates (without Pod restart) are planned for future Kubernetes versions.
Create a Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
```

Create VPA:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```

Apply:
```bash
kubectl apply -f app-deployment.yml
kubectl apply -f app-vpa.yml
```

Control which resources VPA can modify:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      mode: "Auto"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "1000m"
        memory: "1Gi"
      controlledResources:
      - cpu
      - memory
```

Resource Policy Options:
- `containerName`: which container the policy applies to (`"*"` matches all containers)
- `mode`: `Auto` or `Off` per container
- `minAllowed` / `maxAllowed`: floor and ceiling for VPA recommendations
- `controlledResources`: which resources VPA manages (`cpu`, `memory`)
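One detail worth knowing: by default VPA controls both requests and limits (`controlledValues: RequestsAndLimits`), and it rewrites limits proportionally so each container keeps its original limit-to-request ratio. A small sketch of that proportional scaling (an illustration of the idea, not VPA's actual code):

```python
# Sketch of proportional limit scaling: when VPA raises a request, the
# limit is scaled to preserve the original limit:request ratio.
def scale_limit(orig_request_m: int, orig_limit_m: int,
                new_request_m: int) -> int:
    """Return the new limit (millicores) keeping limit/request constant."""
    ratio = orig_limit_m / orig_request_m
    return round(new_request_m * ratio)

# Original 100m request with a 200m limit (ratio 2.0); VPA raises the
# request to 150m, so the limit follows.
print(scale_limit(100, 200, 150))  # -> 300
```

Set `controlledValues: RequestsOnly` in the container policy if you want VPA to leave limits untouched.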
Target specific containers in multi-container Pods:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    # Main application container
    - containerName: "app"
      mode: "Auto"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2000m"
        memory: "2Gi"
    # Sidecar container
    - containerName: "sidecar"
      mode: "Off" # Don't modify sidecar
```

Check VPA status:

```bash
kubectl get vpa
```

Output:

```
NAME         MODE   CPU    MEM     PROVIDED   AGE
my-app-vpa   Auto   150m   256Mi   True       5m
```

```bash
kubectl describe vpa my-app-vpa
```

Output shows recommendations:
```
Name:         my-app-vpa
Namespace:    default
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Recommendation:
  Container Recommendations:
    Container Name:  app
    Lower Bound:
      Cpu:     100m
      Memory:  128Mi
    Target:
      Cpu:     150m
      Memory:  256Mi
    Uncapped Target:
      Cpu:     150m
      Memory:  256Mi
    Upper Bound:
      Cpu:     300m
      Memory:  512Mi
```

Recommendation Fields:
- Lower Bound: the minimum the container is likely to need; requests below this risk under-provisioning
- Target: the recommended value VPA applies in Auto/Recreate mode
- Uncapped Target: the recommendation before minAllowed/maxAllowed caps are applied
- Upper Bound: the maximum the container is likely to need; requests above this waste resources
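To consume these values from a script, you can read the JSON form (`kubectl get vpa my-app-vpa -o json`) and walk the status. A sketch using a sample payload in the shape of the v1 API (the values here are the ones from the output above, embedded for illustration):

```python
import json

# Sample payload shaped like the status section of
# `kubectl get vpa my-app-vpa -o json`, trimmed to the recommendation.
vpa_json = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {"containerName": "app",
         "target": {"cpu": "150m", "memory": "256Mi"}}
      ]
    }
  }
}
"""

# Print the Target recommendation for each container.
recs = json.loads(vpa_json)["status"]["recommendation"]["containerRecommendations"]
for rec in recs:
    print(rec["containerName"], rec["target"]["cpu"], rec["target"]["memory"])
```

The same path (`.status.recommendation.containerRecommendations`) works with `kubectl get vpa -o jsonpath` if you prefer to stay in the shell.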
```bash
kubectl get vpa my-app-vpa -o yaml
```

Web application with VPA:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "1000m"
        memory: "1Gi"
```

Database (StatefulSet) with recommendation-only VPA:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Off" # Recommendation only for database
```

Multi-container API service with per-container policies:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapi:latest
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
      - name: log-agent
        image: fluent/fluentd:v1.16
        resources:
          requests:
            cpu: "50m"
            memory: "64Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "api"
      mode: "Auto"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2000m"
        memory: "2Gi"
    - containerName: "log-agent"
      mode: "Auto"
      minAllowed:
        cpu: "25m"
        memory: "32Mi"
      maxAllowed:
        cpu: "200m"
        memory: "256Mi"
```

Batch worker with Initial mode:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myworker:latest
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  updatePolicy:
    updateMode: "Initial" # Only set on Pod creation
```

Apply the Deployment and VPA:

```bash
kubectl apply -f app-deployment.yml
kubectl apply -f app-vpa.yml
```

Create load to trigger resource usage:

```bash
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh
# Generate CPU load
while true; do :; done
```

Watch the recommendations:

```bash
kubectl get vpa my-app-vpa --watch
```

Before VPA:

```bash
kubectl get pod <pod-name> -o yaml | grep -A 5 resources
```

After VPA updates (in Auto mode):

```bash
# VPA will evict and recreate Pods
kubectl get pods -w
# Check new resource values
kubectl get pod <new-pod-name> -o yaml | grep -A 5 resources
```

1. Requires Pod Restart:
2. Not for Horizontal Scaling:
3. Conflicts with HPA:
4. No Downscaling Protection:
5. Limited History:
6. Experimental Status:
Problem: VPA and HPA conflict when both target CPU/memory.
```bash
# Bad: Both VPA and HPA on CPU
# VPA adjusts CPU requests
# HPA scales based on CPU utilization
# They fight each other
```

Solution: Use different metrics or modes:

```yaml
# Option 1: VPA in Off mode (recommendations only)
updatePolicy:
  updateMode: "Off"
```

```yaml
# Option 2: HPA on CPU, VPA on memory only
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    controlledResources:
    - memory # VPA only manages memory
```

Problem: VPA can set extreme values.
Solution: Always set boundaries:

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    minAllowed:
      cpu: "50m"
      memory: "64Mi"
    maxAllowed:
      cpu: "2000m"
      memory: "4Gi"
```

Problem: Pod eviction causes data loss or downtime.
Solution: Use Off or Initial mode for stateful apps:

```yaml
# For databases, use Off mode
updatePolicy:
  updateMode: "Off"
```

Problem: VPA needs a baseline to start from.
Solution: Always set initial requests:

```yaml
resources:
  requests:
    cpu: "100m" # Set reasonable initial values
    memory: "128Mi"
```

Problem: Running VPA in Off mode but never checking recommendations.
Solution: Regularly review and apply recommendations:

```bash
kubectl describe vpa my-app-vpa
# Review Target recommendations
# Update Deployment manually if needed
```

Test VPA before enabling Auto mode:

```yaml
# Phase 1: Observe recommendations
updatePolicy:
  updateMode: "Off"
```

```yaml
# Phase 2: After validation, enable Auto
updatePolicy:
  updateMode: "Auto"
```

Define min/max based on workload:

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    minAllowed:
      cpu: "100m" # Minimum for functionality
      memory: "128Mi"
    maxAllowed:
      cpu: "2000m" # Maximum for cost control
      memory: "2Gi"
```

Avoid disrupting running Pods:

```yaml
updatePolicy:
  updateMode: "Initial" # Only affects new Pods
```

Track VPA behavior:

```bash
# Watch VPA recommendations
kubectl get vpa --watch
# Check VPA events
kubectl describe vpa my-app-vpa
# Monitor Pod evictions
kubectl get events --sort-by='.lastTimestamp'
```

Protect availability during updates:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

VPA is for resource optimization, not traffic handling:
Add annotations explaining choices:

```yaml
metadata:
  annotations:
    vpa.note: "Auto mode with 100m-2000m CPU range based on load testing"
```

```bash
kubectl describe vpa my-app-vpa
# Status shows: No recommendation available
```

Causes:
Solutions:

```bash
# Check VPA components
kubectl get pods -n kube-system | grep vpa
# Check target exists
kubectl get deployment my-app
# Wait for metrics collection (5-10 minutes)
```

```bash
# Pods not being evicted despite recommendations
```

Causes:
Solutions:

```bash
# Check update mode
kubectl get vpa my-app-vpa -o yaml | grep updateMode
# Check PDB
kubectl get pdb
# Check if recommendations differ from current
kubectl describe vpa my-app-vpa
```

```bash
# VPA keeps evicting Pods
```

Causes:
Solutions:

```yaml
# Widen min/max range
maxAllowed:
  cpu: "4000m" # Increase from 2000m
  memory: "4Gi" # Increase from 2Gi
```

```yaml
# Or switch to Off mode
updatePolicy:
  updateMode: "Off"
```

```bash
kubectl describe vpa my-app-vpa
# Target: 4000m CPU (seems too high)
```

Solutions:

```yaml
# Set maxAllowed to cap recommendations
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    maxAllowed:
      cpu: "2000m"
      memory: "2Gi"
```

To uninstall VPA:

```bash
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-down.sh
```

This removes:

Useful VPA commands:

```bash
kubectl get vpa
kubectl get vpa -o wide
kubectl get vpa --all-namespaces
```

```bash
kubectl describe vpa my-app-vpa
```

```bash
kubectl get vpa my-app-vpa -o yaml
```

```bash
kubectl get pods -n kube-system | grep vpa
kubectl logs -n kube-system <vpa-recommender-pod>
```

```bash
kubectl delete vpa my-app-vpa
```

Pods continue running with current resource values.
In episode 31, we've explored Vertical Pod Autoscaler (VPA) in Kubernetes in depth. We've learned how VPA automatically right-sizes Pod resources based on actual usage, different update modes, and best practices for production use.
Key takeaways:
Vertical Pod Autoscaler is essential for optimizing resource utilization in Kubernetes. By understanding VPA configuration and limitations, you can automatically right-size your Pods, reduce waste, and prevent resource-related failures without manual tuning.
Are you getting a clearer understanding of Vertical Pod Autoscaler in Kubernetes? Keep your learning momentum going and look forward to the next episode!
Note
If you want to continue to the next episode, you can click the Episode 32 thumbnail below