In this episode, we'll discuss Kubernetes Vertical Pod Autoscaler (VPA) for automatic resource sizing. We'll learn how VPA works, how to install and configure it, update modes, and best practices for right-sizing Pod resources automatically.

Note
If you want to read the previous episode, you can click the Episode 30 thumbnail below
In the previous episode, we learned about Horizontal Pod Autoscaler (HPA) which scales the number of Pods. In episode 31, we'll discuss Vertical Pod Autoscaler (VPA), which automatically adjusts CPU and memory requests and limits for containers based on actual usage.
Note: Here I'll be using a Kubernetes Cluster installed through K3s.
While HPA scales horizontally (more Pods), VPA scales vertically (bigger Pods). Setting the right resource requests is challenging - too low causes OOMKills and throttling, too high wastes resources. VPA solves this by continuously analyzing usage and recommending or applying optimal resource values.
Vertical Pod Autoscaler (VPA) automatically adjusts CPU and memory requests and limits for containers based on historical and current resource usage.
Think of VPA like a tailor - it measures your actual size (resource usage) and adjusts your clothes (resource requests/limits) to fit perfectly. Instead of guessing sizes, VPA uses real data to right-size your Pods.
Key characteristics of VPA:
Understanding the key differences:
| Aspect | VPA | HPA |
|---|---|---|
| Scaling Direction | Vertical (resource size) | Horizontal (replica count) |
| What Changes | CPU/Memory requests/limits | Number of Pods |
| Requires Restart | Yes (in Auto/Recreate mode) | No |
| Use Case | Right-size resources | Handle traffic spikes |
| Metrics | Historical usage | Current metrics |
| Response Time | Slower (requires restart) | Faster (add Pods) |
| Best For | Stateful workloads | Stateless workloads |
Can use together:
Warning
Don't run VPA and HPA on the same CPU/memory metrics simultaneously - they can conflict. Either use HPA for CPU/memory and VPA for other resources, or use HPA for scaling with VPA in recommendation mode.
VPA solves critical resource management challenges:
Without VPA, you either waste resources (over-provision) or risk failures (under-provision), and must manually adjust as applications change.
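To make the trade-off concrete, here is a toy Python sketch (not part of VPA - the numbers and the helper are invented for illustration) that estimates how much of a static CPU request goes unused when usage stays far below it:

```python
# Toy illustration of over-provisioning waste. Usage samples are in
# millicores and entirely made up; real values come from metrics-server
# or Prometheus.
def waste_fraction(requested_m: int, usage_samples_m: list[int]) -> float:
    """Fraction of the requested CPU that goes unused at the
    95th-percentile usage level (0.0 if usage exceeds the request)."""
    peak = sorted(usage_samples_m)[int(0.95 * (len(usage_samples_m) - 1))]
    return max(0.0, (requested_m - peak) / requested_m)

# A container requesting 1000m while peaking around 240m leaves most
# of its reservation idle.
samples = [180, 190, 200, 210, 220, 230, 240, 250]
print(f"{waste_fraction(1000, samples):.0%}")
```

Flip the numbers (request 100m, usage peaking at 300m) and the waste is zero - but now you are in OOMKill/throttling territory instead, which is exactly the guessing game VPA removes.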
VPA consists of three components:
1. Recommender:
2. Updater:
3. Admission Controller:
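As a rough mental model of the Updater (a simplification, not the actual implementation - the real component also respects eviction rate limits and PodDisruptionBudgets), a Pod becomes an eviction candidate when its current request drifts outside the recommended bounds:

```python
# Simplified sketch of the Updater's eviction decision: evict when the
# running Pod's request (millicores) falls outside the Recommender's
# [lowerBound, upperBound] range.
def should_evict(current_request_m: int, lower_bound_m: int,
                 upper_bound_m: int) -> bool:
    """True when the Pod is under- or over-provisioned relative to
    the current recommendation."""
    return current_request_m < lower_bound_m or current_request_m > upper_bound_m

print(should_evict(100, 120, 300))  # request below lower bound -> True
print(should_evict(150, 120, 300))  # within bounds -> False
```

The Admission Controller then completes the loop: when the evicted Pod is recreated, it mutates the new Pod spec to carry the recommended requests.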
VPA is not installed by default. Let's install it.
Clone VPA repository:
```bash
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler
```

Install VPA:

```bash
./hack/vpa-up.sh
```

This installs:
Verify installation:

```bash
kubectl get pods -n kube-system | grep vpa
```

Output:

```
vpa-admission-controller-xxx   1/1   Running   0   1m
vpa-recommender-xxx            1/1   Running   0   1m
vpa-updater-xxx                1/1   Running   0   1m
```

Check VPA CRDs:

```bash
kubectl get crd | grep verticalpodautoscaler
```

Output:

```
verticalpodautoscalercheckpoints.autoscaling.k8s.io
verticalpodautoscalers.autoscaling.k8s.io
```

VPA supports different update modes:
VPA calculates recommendations but doesn't apply them:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"
```

Use case:
VPA sets resources only when Pods are created, never updates running Pods:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Initial"
```

Use case:
VPA evicts and recreates Pods with new resource values:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Recreate"
```

Behavior:
Use case:
VPA automatically updates Pods (currently same as Recreate):
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```

Note
Auto mode currently behaves like Recreate. In-place updates (without Pod restart) are planned for future Kubernetes versions.
Create a Deployment:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx:1.25
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "200m"
            memory: "256Mi"
```

Create VPA:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
```

Apply:
```bash
kubectl apply -f app-deployment.yml
kubectl apply -f app-vpa.yml
```

Control which resources VPA can modify:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "app"
      mode: "Auto"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "1000m"
        memory: "1Gi"
      controlledResources:
      - cpu
      - memory
```

Resource Policy Options:
- `containerName`: which container the policy applies to (`"*"` matches all containers)
- `mode`: `Auto` or `Off` per container
- `minAllowed` / `maxAllowed`: floor and ceiling for VPA recommendations
- `controlledResources`: which resources VPA manages (`cpu`, `memory`)
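One detail worth knowing: by default VPA controls both requests and limits (`controlledValues: RequestsAndLimits`), and it rewrites limits proportionally so each container keeps its original limit-to-request ratio. A small sketch of that proportional scaling (an illustration of the idea, not VPA's actual code):

```python
# Sketch of proportional limit scaling: when VPA raises a request, the
# limit is scaled to preserve the original limit:request ratio.
def scale_limit(orig_request_m: int, orig_limit_m: int,
                new_request_m: int) -> int:
    """Return the new limit (millicores) keeping limit/request constant."""
    ratio = orig_limit_m / orig_request_m
    return round(new_request_m * ratio)

# Original 100m request with a 200m limit (ratio 2.0); VPA raises the
# request to 150m, so the limit follows.
print(scale_limit(100, 200, 150))  # -> 300
```

Set `controlledValues: RequestsOnly` in the container policy if you want VPA to leave limits untouched.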
Target specific containers in multi-container Pods:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    # Main application container
    - containerName: "app"
      mode: "Auto"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2000m"
        memory: "2Gi"
    # Sidecar container
    - containerName: "sidecar"
      mode: "Off" # Don't modify sidecar
```

Check VPA status:

```bash
kubectl get vpa
```

Output:

```
NAME         MODE   CPU    MEM     PROVIDED   AGE
my-app-vpa   Auto   150m   256Mi   True       5m
```

```bash
kubectl describe vpa my-app-vpa
```

Output shows recommendations:
```
Name:         my-app-vpa
Namespace:    default
API Version:  autoscaling.k8s.io/v1
Kind:         VerticalPodAutoscaler
Recommendation:
  Container Recommendations:
    Container Name:  app
    Lower Bound:
      Cpu:     100m
      Memory:  128Mi
    Target:
      Cpu:     150m
      Memory:  256Mi
    Uncapped Target:
      Cpu:     150m
      Memory:  256Mi
    Upper Bound:
      Cpu:     300m
      Memory:  512Mi
```

Recommendation Fields:
- Lower Bound: the minimum the container is likely to need; requests below this risk under-provisioning
- Target: the recommended value VPA applies in Auto/Recreate mode
- Uncapped Target: the recommendation before minAllowed/maxAllowed caps are applied
- Upper Bound: the maximum the container is likely to need; requests above this waste resources
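To consume these values from a script, you can read the JSON form (`kubectl get vpa my-app-vpa -o json`) and walk the status. A sketch using a sample payload in the shape of the v1 API (the values here are the ones from the output above, embedded for illustration):

```python
import json

# Sample payload shaped like the status section of
# `kubectl get vpa my-app-vpa -o json`, trimmed to the recommendation.
vpa_json = """
{
  "status": {
    "recommendation": {
      "containerRecommendations": [
        {"containerName": "app",
         "target": {"cpu": "150m", "memory": "256Mi"}}
      ]
    }
  }
}
"""

# Print the Target recommendation for each container.
recs = json.loads(vpa_json)["status"]["recommendation"]["containerRecommendations"]
for rec in recs:
    print(rec["containerName"], rec["target"]["cpu"], rec["target"]["memory"])
```

The same path (`.status.recommendation.containerRecommendations`) works with `kubectl get vpa -o jsonpath` if you prefer to stay in the shell.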
```bash
kubectl get vpa my-app-vpa -o yaml
```

Web application with VPA:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
          limits:
            cpu: "500m"
            memory: "512Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "nginx"
      minAllowed:
        cpu: "50m"
        memory: "64Mi"
      maxAllowed:
        cpu: "1000m"
        memory: "1Gi"
```

Database (StatefulSet) with recommendation-only VPA:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        resources:
          requests:
            cpu: "500m"
            memory: "512Mi"
          limits:
            cpu: "2000m"
            memory: "2Gi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres
  updatePolicy:
    updateMode: "Off" # Recommendation only for database
```

Multi-container API service with per-container policies:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: myapi:latest
        resources:
          requests:
            cpu: "200m"
            memory: "256Mi"
      - name: log-agent
        image: fluent/fluentd:v1.16
        resources:
          requests:
            cpu: "50m"
            memory: "64Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-service-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "api"
      mode: "Auto"
      minAllowed:
        cpu: "100m"
        memory: "128Mi"
      maxAllowed:
        cpu: "2000m"
        memory: "2Gi"
    - containerName: "log-agent"
      mode: "Auto"
      minAllowed:
        cpu: "25m"
        memory: "32Mi"
      maxAllowed:
        cpu: "200m"
        memory: "256Mi"
```

Batch worker with Initial mode:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 5
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
      - name: worker
        image: myworker:latest
        resources:
          requests:
            cpu: "100m"
            memory: "128Mi"
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: worker-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  updatePolicy:
    updateMode: "Initial" # Only set on Pod creation
```

Apply the Deployment and VPA:

```bash
kubectl apply -f app-deployment.yml
kubectl apply -f app-vpa.yml
```

Create load to trigger resource usage:

```bash
kubectl run -it --rm load-generator --image=busybox:1.36 --restart=Never -- /bin/sh
# Generate CPU load
while true; do :; done
```

Watch the recommendations:

```bash
kubectl get vpa my-app-vpa --watch
```

Before VPA:

```bash
kubectl get pod <pod-name> -o yaml | grep -A 5 resources
```

After VPA updates (in Auto mode):

```bash
# VPA will evict and recreate Pods
kubectl get pods -w
# Check new resource values
kubectl get pod <new-pod-name> -o yaml | grep -A 5 resources
```

1. Requires Pod Restart:
2. Not for Horizontal Scaling:
3. Conflicts with HPA:
4. No Downscaling Protection:
5. Limited History:
6. Experimental Status:
Problem: VPA and HPA conflict when both target CPU/memory.
```bash
# Bad: Both VPA and HPA on CPU
# VPA adjusts CPU requests
# HPA scales based on CPU utilization
# They fight each other
```

Solution: Use different metrics or modes:

```yaml
# Option 1: VPA in Off mode (recommendations only)
updatePolicy:
  updateMode: "Off"
```

```yaml
# Option 2: HPA on CPU, VPA on memory only
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    controlledResources:
    - memory # VPA only manages memory
```

Problem: VPA can set extreme values.
Solution: Always set boundaries:

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    minAllowed:
      cpu: "50m"
      memory: "64Mi"
    maxAllowed:
      cpu: "2000m"
      memory: "4Gi"
```

Problem: Pod eviction causes data loss or downtime.
Solution: Use Off or Initial mode for stateful apps:

```yaml
# For databases, use Off mode
updatePolicy:
  updateMode: "Off"
```

Problem: VPA needs a baseline to start from.
Solution: Always set initial requests:

```yaml
resources:
  requests:
    cpu: "100m" # Set reasonable initial values
    memory: "128Mi"
```

Problem: Running VPA in Off mode but never checking recommendations.
Solution: Regularly review and apply recommendations:

```bash
kubectl describe vpa my-app-vpa
# Review Target recommendations
# Update Deployment manually if needed
```

Test VPA before enabling Auto mode:

```yaml
# Phase 1: Observe recommendations
updatePolicy:
  updateMode: "Off"
```

```yaml
# Phase 2: After validation, enable Auto
updatePolicy:
  updateMode: "Auto"
```

Define min/max based on workload:

```yaml
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    minAllowed:
      cpu: "100m" # Minimum for functionality
      memory: "128Mi"
    maxAllowed:
      cpu: "2000m" # Maximum for cost control
      memory: "2Gi"
```

Avoid disrupting running Pods:

```yaml
updatePolicy:
  updateMode: "Initial" # Only affects new Pods
```

Track VPA behavior:

```bash
# Watch VPA recommendations
kubectl get vpa --watch
# Check VPA events
kubectl describe vpa my-app-vpa
# Monitor Pod evictions
kubectl get events --sort-by='.lastTimestamp'
```

Protect availability during updates:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app
```

VPA is for resource optimization, not traffic handling:
Add annotations explaining choices:

```yaml
metadata:
  annotations:
    vpa.note: "Auto mode with 100m-2000m CPU range based on load testing"
```

```bash
kubectl describe vpa my-app-vpa
# Status shows: No recommendation available
```

Causes:
Solutions:

```bash
# Check VPA components
kubectl get pods -n kube-system | grep vpa
# Check target exists
kubectl get deployment my-app
# Wait for metrics collection (5-10 minutes)
```

```bash
# Pods not being evicted despite recommendations
```

Causes:
Solutions:

```bash
# Check update mode
kubectl get vpa my-app-vpa -o yaml | grep updateMode
# Check PDB
kubectl get pdb
# Check if recommendations differ from current
kubectl describe vpa my-app-vpa
```

```bash
# VPA keeps evicting Pods
```

Causes:
Solutions:

```yaml
# Widen min/max range
maxAllowed:
  cpu: "4000m" # Increase from 2000m
  memory: "4Gi" # Increase from 2Gi
```

```yaml
# Or switch to Off mode
updatePolicy:
  updateMode: "Off"
```

```bash
kubectl describe vpa my-app-vpa
# Target: 4000m CPU (seems too high)
```

Solutions:

```yaml
# Set maxAllowed to cap recommendations
resourcePolicy:
  containerPolicies:
  - containerName: "app"
    maxAllowed:
      cpu: "2000m"
      memory: "2Gi"
```

To uninstall VPA:

```bash
cd autoscaler/vertical-pod-autoscaler
./hack/vpa-down.sh
```

This removes:

Useful VPA commands:

```bash
kubectl get vpa
kubectl get vpa -o wide
kubectl get vpa --all-namespaces
```

```bash
kubectl describe vpa my-app-vpa
```

```bash
kubectl get vpa my-app-vpa -o yaml
```

```bash
kubectl get pods -n kube-system | grep vpa
kubectl logs -n kube-system <vpa-recommender-pod>
```

```bash
kubectl delete vpa my-app-vpa
```

Pods continue running with current resource values.
In episode 31, we've explored Vertical Pod Autoscaler (VPA) in Kubernetes in depth. We've learned how VPA automatically right-sizes Pod resources based on actual usage, different update modes, and best practices for production use.
Key takeaways:
Vertical Pod Autoscaler is essential for optimizing resource utilization in Kubernetes. By understanding VPA configuration and limitations, you can automatically right-size your Pods, reduce waste, and prevent resource-related failures without manual tuning.
Are you getting a clearer understanding of Vertical Pod Autoscaler in Kubernetes? Keep your learning momentum going and look forward to the next episode!
Note
If you want to continue to the next episode, you can click the Episode 32 thumbnail below