In this episode, we'll discuss Kubernetes Taints and Tolerations for controlling where Pods are scheduled. We'll learn how taints repel Pods, how tolerations allow Pods to be scheduled on tainted nodes, and best practices for workload placement.

Note
If you want to read the previous episode, you can click the Episode 33 thumbnail below.
In the previous episode, we learned about RBAC and RoleBinding for authorization. In episode 34, we'll discuss Taints and Tolerations, which control which Pods can be scheduled on which nodes.
Note: Here I'll be using a Kubernetes Cluster installed through K3s.
While node affinity attracts Pods to nodes, taints and tolerations work the opposite way: taints repel Pods from nodes unless the Pods have matching tolerations. This enables powerful workload placement strategies like dedicated nodes, GPU nodes, or nodes with special hardware.
Taints are properties applied to nodes that repel Pods unless they have matching tolerations.
Tolerations are properties applied to Pods that allow them to be scheduled on nodes with matching taints.
Think of taints like "no entry" signs on nodes - by default, Pods cannot enter. Tolerations are like special passes that allow specific Pods to enter despite the "no entry" sign.
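To make the matching rule concrete, here is a small Python sketch of how a single toleration is checked against a single taint. This is an illustration of the documented rules, not Kubernetes source code; the dict keys simply mirror the YAML field names:

```python
def toleration_matches(toleration: dict, taint: dict) -> bool:
    """Return True if a toleration tolerates a taint (simplified sketch).

    Mirrors the Kubernetes rules: operator Exists ignores the value,
    operator Equal requires an exact value match, and an empty effect
    on the toleration matches any taint effect.
    """
    # An empty key with operator Exists tolerates every taint.
    if toleration.get("key", "") == "" and toleration.get("operator") == "Exists":
        return True
    if toleration.get("key") != taint["key"]:
        return False
    # Equal is the default operator and requires matching values.
    if toleration.get("operator", "Equal") == "Equal" and toleration.get("value") != taint.get("value"):
        return False
    # An empty effect on the toleration matches all effects.
    if toleration.get("effect", "") not in ("", taint["effect"]):
        return False
    return True

taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
print(toleration_matches({"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}, taint))  # True
print(toleration_matches({"key": "gpu", "operator": "Exists"}, taint))  # True
print(toleration_matches({"key": "storage", "operator": "Equal", "value": "ssd"}, taint))  # False
```

A Pod tolerates a taint when any one of its tolerations matches it by these rules; the examples later in this episode all reduce to this check.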
Key characteristics:

NoSchedule
Behavior: Pods without a matching toleration cannot be scheduled on the node.
kubectl taint nodes node-1 gpu=true:NoSchedule

PreferNoSchedule
Behavior: Kubernetes prefers not to schedule Pods without a matching toleration, but will if necessary.
kubectl taint nodes node-1 gpu=true:PreferNoSchedule

NoExecute
Behavior: Pods without a matching toleration are evicted from the node.
kubectl taint nodes node-1 gpu=true:NoExecute
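As a rough mental model, the way the three effects influence scheduling can be simulated with a toy filter. This is a simplified sketch, not how the real scheduler is implemented; `classify_nodes` and the sample node data are made up for illustration:

```python
def tolerates(pod_tolerations: list, taint: dict) -> bool:
    """True if any of the Pod's tolerations matches the taint (simplified)."""
    for t in pod_tolerations:
        if t.get("key") == taint["key"] and t.get("effect", "") in ("", taint["effect"]):
            if t.get("operator", "Equal") == "Exists" or t.get("value") == taint.get("value"):
                return True
    return False

def classify_nodes(pod_tolerations: list, nodes: dict):
    """Split candidate nodes the way the three effects behave:
    an untolerated NoSchedule or NoExecute taint excludes the node
    (NoExecute additionally evicts already-running Pods), while an
    untolerated PreferNoSchedule taint only deprioritizes it."""
    schedulable, deprioritized = [], []
    for name, taints in nodes.items():
        untolerated = [t for t in taints if not tolerates(pod_tolerations, t)]
        if any(t["effect"] in ("NoSchedule", "NoExecute") for t in untolerated):
            continue  # hard exclusion
        if any(t["effect"] == "PreferNoSchedule" for t in untolerated):
            deprioritized.append(name)  # soft: used only if necessary
        else:
            schedulable.append(name)
    return schedulable, deprioritized

nodes = {
    "node-1": [{"key": "gpu", "value": "true", "effect": "NoSchedule"}],
    "node-2": [{"key": "workload", "value": "batch", "effect": "PreferNoSchedule"}],
    "node-3": [],
}
print(classify_nodes([], nodes))  # (['node-3'], ['node-2'])
```

A Pod with no tolerations can only land on the untainted node-3, falling back to node-2 if node-3 has no capacity; adding a gpu toleration makes node-1 schedulable as well.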
Add taints to a node:

kubectl taint nodes node-1 gpu=true:NoSchedule
kubectl taint nodes node-1 storage=ssd:NoSchedule

Or in one command:

kubectl taint nodes node-1 gpu=true:NoSchedule storage=ssd:NoSchedule

View taints on a node:

kubectl describe node node-1 | grep Taints

Output:

Taints: gpu=true:NoSchedule,storage=ssd:NoSchedule

# Remove specific taint
kubectl taint nodes node-1 gpu=true:NoSchedule-

# Remove all taints
kubectl taint nodes node-1 gpu- storage-

A Pod that tolerates the gpu taint:

apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: app
    image: nvidia/cuda:11.0

Equal - Value must match exactly:
tolerations:
- key: gpu
  operator: Equal
  value: "true"
  effect: NoSchedule

Exists - Key must exist, value ignored:

tolerations:
- key: gpu
  operator: Exists
  effect: NoSchedule

A Pod with multiple tolerations:

apiVersion: v1
kind: Pod
metadata:
  name: special-pod
spec:
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
  - key: storage
    operator: Equal
    value: ssd
    effect: NoSchedule
  containers:
  - name: app
    image: myapp:latest

For the NoExecute effect, specify how long the Pod can stay:
apiVersion: v1
kind: Pod
metadata:
  name: temporary-pod
spec:
  tolerations:
  - key: maintenance
    operator: Equal
    value: "true"
    effect: NoExecute
    tolerationSeconds: 3600 # 1 hour
  containers:
  - name: app
    image: myapp:latest

Dedicate a node for GPU workloads:

# Taint GPU node
kubectl taint nodes gpu-node gpu=true:NoSchedule

Pod requesting GPU:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload
spec:
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
  containers:
  - name: gpu-app
    image: nvidia/cuda:11.0
    resources:
      limits:
        nvidia.com/gpu: 1

Reserve a node with fast storage:

# Taint SSD node
kubectl taint nodes ssd-node storage=ssd:NoSchedule

Pod requiring SSD:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: database
spec:
  replicas: 1
  selector:
    matchLabels:
      app: database
  template:
    metadata:
      labels:
        app: database
    spec:
      tolerations:
      - key: storage
        operator: Equal
        value: ssd
        effect: NoSchedule
      containers:
      - name: postgres
        image: postgres:15

Temporarily evict Pods for maintenance:

# Taint node for maintenance
kubectl taint nodes node-1 maintenance=true:NoExecute

Pod tolerating maintenance:
apiVersion: v1
kind: Pod
metadata:
  name: maintenance-pod
spec:
  tolerations:
  - key: maintenance
    operator: Equal
    value: "true"
    effect: NoExecute
    tolerationSeconds: 300 # 5 minutes
  containers:
  - name: app
    image: myapp:latest

Reserve nodes for a specific team:

# Taint nodes for team-a
kubectl taint nodes node-1 team=a:NoSchedule
kubectl taint nodes node-2 team=a:NoSchedule

Team A workload:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: team-a-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: team-a-app
  template:
    metadata:
      labels:
        app: team-a-app
    spec:
      tolerations:
      - key: team
        operator: Equal
        value: a
        effect: NoSchedule
      containers:
      - name: app
        image: team-a-app:latest

Tolerate any taint with a specific key:
apiVersion: v1
kind: Pod
metadata:
  name: flexible-pod
spec:
  tolerations:
  - key: workload-type
    operator: Exists # Accept any value
    effect: NoSchedule
  containers:
  - name: app
    image: myapp:latest

A Deployment requiring a specific workload-type value:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: special-workload
spec:
  replicas: 3
  selector:
    matchLabels:
      app: special
  template:
    metadata:
      labels:
        app: special
    spec:
      tolerations:
      - key: workload-type
        operator: Equal
        value: special
        effect: NoSchedule
      containers:
      - name: app
        image: special-app:latest
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"

Use taints/tolerations with node affinity for powerful placement:
apiVersion: v1
kind: Pod
metadata:
  name: placed-pod
spec:
  # Tolerate taint
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
  # Prefer GPU nodes
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        preference:
          matchExpressions:
          - key: gpu
            operator: In
            values:
            - "true"
  containers:
  - name: app
    image: gpu-app:latest

Kubernetes automatically taints nodes in certain conditions:
Node is not ready:
Taints: node.kubernetes.io/not-ready:NoExecute

Node is unreachable:
Taints: node.kubernetes.io/unreachable:NoExecute

Node has memory pressure:
Taints: node.kubernetes.io/memory-pressure:NoSchedule

Node has disk pressure:
Taints: node.kubernetes.io/disk-pressure:NoSchedule

Node has PID pressure:
Taints: node.kubernetes.io/pid-pressure:NoSchedule

Node network unavailable:
Taints: node.kubernetes.io/network-unavailable:NoSchedule

View taints on nodes:

kubectl describe node node-1 | grep Taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

View a Pod's tolerations:

kubectl get pod gpu-pod -o yaml | grep -A 10 tolerations

Problem: Pod cannot be scheduled on a tainted node.
# Node tainted
kubectl taint nodes node-1 gpu=true:NoSchedule

# Pod without toleration - not scheduled
kubectl run gpu-pod --image=nvidia/cuda:11.0

Solution: Add a toleration to the Pod:

tolerations:
- key: gpu
  operator: Equal
  value: "true"
  effect: NoSchedule

Problem: Toleration doesn't match the taint.
# Bad: Wrong operator
tolerations:
- key: gpu
  operator: In # Wrong! Should be Equal
  values: ["true"]
  effect: NoSchedule

Solution: Use the correct operator:

# Good: Correct operator
tolerations:
- key: gpu
  operator: Equal
  value: "true"
  effect: NoSchedule

Problem: Toleration effect doesn't match the taint effect.
# Taint with NoExecute
kubectl taint nodes node-1 gpu=true:NoExecute

# Bad: Wrong effect
tolerations:
- key: gpu
  operator: Equal
  value: "true"
  effect: NoSchedule # Wrong! Should be NoExecute

Solution: Match the effect:

# Good: Matching effect
tolerations:
- key: gpu
  operator: Equal
  value: "true"
  effect: NoExecute

Problem: Tainting all nodes without tolerations.
# Bad: Taints all nodes
for node in $(kubectl get nodes -o name); do
  kubectl taint $node special=true:NoSchedule
done

Solution: Taint only specific nodes:

# Good: Taint only GPU nodes
kubectl taint nodes gpu-node gpu=true:NoSchedule

Problem: Temporary taints left on nodes.
Solution: Remove taints when done:
kubectl taint nodes node-1 gpu=true:NoSchedule-

Use descriptive taint names:

# Good: Clear purpose
kubectl taint nodes gpu-node gpu=true:NoSchedule
kubectl taint nodes ssd-node storage=ssd:NoSchedule

# Avoid: Vague names
kubectl taint nodes node-1 special=true:NoSchedule

Document taints with node labels:

# Add labels to document
kubectl label nodes gpu-node node-type=gpu
kubectl label nodes ssd-node node-type=ssd

For non-critical workloads:
kubectl taint nodes node-1 workload=batch:PreferNoSchedule

For precise placement:

spec:
  tolerations:
  - key: gpu
    operator: Equal
    value: "true"
    effect: NoSchedule
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: gpu
            operator: In
            values:
            - "true"

Prevent indefinite Pod eviction:
tolerations:
- key: maintenance
  operator: Equal
  value: "true"
  effect: NoExecute
  tolerationSeconds: 3600 # 1 hour

Review taints regularly:

# List all taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Check for orphaned taints

Problem: Pod stuck in Pending because of an untolerated taint.

kubectl describe pod gpu-pod
# Events show: node(s) had taints that the pod didn't tolerate

Solution: Add a matching toleration:

# Check node taints
kubectl describe node node-1 | grep Taints
# Add toleration to Pod

Problem: Pod evicted from a tainted node.

kubectl describe pod pod-name
# Status: Evicted
# Reason: Tainted node

Solution: Add a NoExecute toleration with a timeout:
tolerations:
- key: maintenance
  operator: Equal
  value: "true"
  effect: NoExecute
  tolerationSeconds: 3600

# Verify taint applied
kubectl describe node node-1 | grep Taints

# Check if Pods have toleration
kubectl get pod -o yaml | grep -A 5 tolerations

Inspect taints and tolerations as JSON:

kubectl get nodes -o json | jq '.items[].spec.taints'
kubectl get pods -o json | jq '.items[].spec.tolerations'

kubectl describe node node-1
# Shows Taints section

Remove taints when finished:

kubectl taint nodes node-1 gpu=true:NoSchedule-
kubectl taint nodes node-1 gpu- storage- workload-

In episode 34, we've explored Taints and Tolerations in Kubernetes in depth. We've learned how to use taints to repel Pods from nodes and tolerations to allow specific Pods on tainted nodes.
Key takeaways:
Taints and tolerations are powerful tools for workload placement in Kubernetes. By understanding how to use them effectively, you can optimize resource utilization, dedicate nodes for specific workloads, and manage maintenance windows gracefully.
Note
If you want to continue to the next episode, you can click the Episode 35 thumbnail below.