Learning Kubernetes - Episode 34 - Introduction to and Explanation of Taints and Tolerations

In this episode we discuss Kubernetes Taints and Tolerations, which control the nodes a Pod may be scheduled on. We will look at how a taint repels Pods, how a toleration allows a Pod to be scheduled on a tainted node, and best practices for workload placement.

Arman Dwi Pangestu
April 9, 2026

Introduction

In the previous episode we learned about RBAC and RoleBinding for authorization. In episode 34 we discuss Taints and Tolerations, which control which Pods can be scheduled on which nodes.

Note: Here I use a Kubernetes cluster installed with K3s.

While node affinity attracts Pods to nodes, taints and tolerations work the opposite way: a taint repels Pods from a node unless they have a matching toleration. This enables placement strategies such as dedicated nodes, GPU nodes, or nodes with special hardware.

What Are Taints and Tolerations?

A taint is a property applied to a node that repels Pods unless they have a matching toleration.

A toleration is a property applied to a Pod that allows it to be scheduled on a node with a matching taint.

Think of a taint as a "no entry" sign on a node: by default, Pods cannot enter. A toleration is a special pass that lets specific Pods enter despite the sign.

Key characteristics:

  • Taints - Applied to nodes, repel Pods
  • Tolerations - Applied to Pods, allow scheduling on tainted nodes
  • Key-value pair - Taints are written as key=value:effect; tolerations match on key, value, and effect
  • Effect - NoSchedule, PreferNoSchedule, NoExecute
  • Workload placement - Control which Pods run on which nodes
  • Dedicated nodes - Reserve nodes for specific workloads
  • Hardware affinity - Keep nodes with specific hardware for the Pods that need it
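
The key=value:effect notation can be illustrated with a small parser. This is a minimal Python sketch, not kubectl's own code; parse_taint and the Taint tuple are hypothetical names introduced here for illustration.

```python
from typing import NamedTuple

# The three effects named in this article.
EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


class Taint(NamedTuple):
    key: str
    value: str   # empty when the taint is written as key:effect
    effect: str


def parse_taint(spec: str) -> Taint:
    """Split a taint written as key=value:effect (or key:effect) into its parts."""
    body, sep, effect = spec.rpartition(":")
    if not sep or effect not in EFFECTS:
        raise ValueError(f"expected key[=value]:effect, got {spec!r}")
    key, _, value = body.partition("=")
    if not key:
        raise ValueError(f"taint key is empty in {spec!r}")
    return Taint(key, value, effect)


print(parse_taint("gpu=true:NoSchedule"))
print(parse_taint("node.kubernetes.io/not-ready:NoExecute"))
```

The second call shows that the value part is optional: a taint may consist of only a key and an effect.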

Taint Effect

NoSchedule

Pods without a matching toleration cannot be scheduled on the node.

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoSchedule

Behavior:

  • New Pods without a toleration: not scheduled
  • Existing Pods: keep running
  • Strict enforcement

PreferNoSchedule

Kubernetes prefers not to schedule Pods without a matching toleration on the node, but will do so if necessary.

Kubernetesbash
kubectl taint nodes node-1 gpu=true:PreferNoSchedule

Behavior:

  • New Pods without a toleration: scheduled only if no other node is available
  • Existing Pods: keep running
  • Soft enforcement

NoExecute

Pods without a matching toleration are evicted from the node, and new ones are not scheduled on it.

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoExecute

Behavior:

  • New Pods without a toleration: not scheduled
  • Existing Pods without a toleration: evicted
  • Strictest enforcement
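
The three effects can be compared side by side with a small model. This is a minimal Python sketch, assuming the caller already knows which taint effects the Pod does not tolerate; placement_outcome is a hypothetical helper, not a Kubernetes API.

```python
def placement_outcome(untolerated_effects, pod_is_running):
    """Describe what happens to a Pod, given the effects of the taints it does NOT tolerate."""
    effects = set(untolerated_effects)
    if "NoExecute" in effects:
        # Strictest: running Pods are evicted, new Pods are rejected.
        return "evicted" if pod_is_running else "not scheduled"
    if "NoSchedule" in effects:
        # Strict for new Pods only; running Pods are left alone.
        return "keeps running" if pod_is_running else "not scheduled"
    if "PreferNoSchedule" in effects:
        # Soft: the scheduler avoids the node but may still use it.
        return "keeps running" if pod_is_running else "avoided if possible"
    return "keeps running" if pod_is_running else "scheduled"


for effect in ("NoSchedule", "PreferNoSchedule", "NoExecute"):
    print(effect,
          "| new Pod:", placement_outcome([effect], False),
          "| running Pod:", placement_outcome([effect], True))
```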

Adding Taints to Nodes

Add Single Taint

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoSchedule

Add Multiple Taints

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoSchedule
kubectl taint nodes node-1 storage=ssd:NoSchedule

Or in a single command:

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoSchedule storage=ssd:NoSchedule

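View Taints

Kubernetesbash
kubectl describe node node-1 | grep -A 1 Taints

Output (kubectl describe prints each taint on its own line, so -A 1 is needed to see the second one):

Kubernetesbash
Taints:             gpu=true:NoSchedule
                    storage=ssd:NoSchedule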

Remove Taint

Kubernetesbash
# Remove a specific taint
kubectl taint nodes node-1 gpu=true:NoSchedule-
 
# Remove all taints with the given keys, whatever their value or effect
kubectl taint nodes node-1 gpu- storage-

Adding Tolerations to Pods

Basic Toleration

Kubernetespod-with-toleration.yml
apiVersion: v1
kind: Pod
metadata:
    name: gpu-pod
spec:
    tolerations:
        - key: gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
    containers:
        - name: app
          image: nvidia/cuda:11.0

Toleration Operator

Equal - The key and the value must both match exactly:

Kubernetesyml
tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule

Exists - The key must exist; the value is ignored and should be left out:

Kubernetesyml
tolerations:
    - key: gpu
      operator: Exists
      effect: NoSchedule
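
How the two operators decide a match can be sketched in a few lines of Python. The sketch follows the matching rules described in the Kubernetes documentation (Equal compares key and value, Exists compares only the key, an empty effect matches every effect, and an empty key with Exists matches every taint), but the tolerates function and the dict layout are assumptions made for illustration.

```python
def tolerates(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration matches a single taint."""
    # An empty effect in the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    # An empty key (only meaningful with Exists) matches every key.
    if toleration.get("key") and toleration["key"] != taint["key"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    if operator == "Exists":
        return True
    if operator == "Equal":
        return toleration.get("value", "") == taint.get("value", "")
    return False  # any other operator is treated as no match in this sketch


taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
print(tolerates({"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}, taint))   # True
print(tolerates({"key": "gpu", "operator": "Equal", "value": "false", "effect": "NoSchedule"}, taint))  # False
print(tolerates({"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}, taint))                   # True
print(tolerates({"operator": "Exists"}, taint))                                                         # True
```

The last call is the "tolerate everything" form: no key and no effect with Exists, which is worth using sparingly because it defeats every taint on every node.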

Multiple Tolerations

Kubernetesmulti-toleration.yml
apiVersion: v1
kind: Pod
metadata:
    name: special-pod
spec:
    tolerations:
        - key: gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
        - key: storage
          operator: Equal
          value: ssd
          effect: NoSchedule
    containers:
        - name: app
          image: myapp:latest

Toleration with a Timeout

For the NoExecute effect, tolerationSeconds specifies how long the Pod may stay on the node after the taint is added:

Kubernetestoleration-timeout.yml
apiVersion: v1
kind: Pod
metadata:
    name: temporary-pod
spec:
    tolerations:
        - key: maintenance
          operator: Equal
          value: "true"
          effect: NoExecute
          tolerationSeconds: 3600  # 1 hour
    containers:
        - name: app
          image: myapp:latest
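
The timing behavior of tolerationSeconds can be sketched as follows. This is a minimal Python model under stated assumptions: eviction_delay is a hypothetical helper, the toleration passed in is assumed to already match the NoExecute taint, and the result is counted in seconds from the moment the taint is added.

```python
from typing import Optional


def eviction_delay(matching_toleration: Optional[dict]) -> Optional[float]:
    """Seconds a running Pod stays after a NoExecute taint is added (None = stays indefinitely)."""
    if matching_toleration is None:
        return 0.0  # no matching toleration: evicted right away
    seconds = matching_toleration.get("tolerationSeconds")
    if seconds is None:
        return None  # tolerated with no time limit
    return max(0.0, float(seconds))  # zero or negative means immediate eviction


print(eviction_delay(None))
print(eviction_delay({"key": "maintenance", "effect": "NoExecute"}))
print(eviction_delay({"key": "maintenance", "effect": "NoExecute", "tolerationSeconds": 3600}))
```

If the taint is removed before the delay has passed, the Pod is not evicted at all.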

Practical Examples

Example 1: GPU Node

Dedicate a node to GPU workloads:

Kubernetesbash
# Taint GPU node
kubectl taint nodes gpu-node gpu=true:NoSchedule

Pod requesting a GPU:

Kubernetesgpu-pod.yml
apiVersion: v1
kind: Pod
metadata:
    name: gpu-workload
spec:
    tolerations:
        - key: gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
    containers:
        - name: gpu-app
          image: nvidia/cuda:11.0
          resources:
              limits:
                  nvidia.com/gpu: 1

Example 2: SSD Storage Node

Reserve a node with fast storage:

Kubernetesbash
# Taint SSD node
kubectl taint nodes ssd-node storage=ssd:NoSchedule

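Deployment that tolerates the SSD taint (the toleration only permits scheduling on ssd-node; on its own it does not force the Pod onto that node):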

Kubernetesssd-pod.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: database
spec:
    replicas: 1
    selector:
        matchLabels:
            app: database
    template:
        metadata:
            labels:
                app: database
        spec:
            tolerations:
                - key: storage
                  operator: Equal
                  value: ssd
                  effect: NoSchedule
            containers:
                - name: postgres
                  image: postgres:15

Example 3: Maintenance Window

Temporarily evict Pods for maintenance:

Kubernetesbash
# Taint the node for maintenance
kubectl taint nodes node-1 maintenance=true:NoExecute

Pod that tolerates the maintenance taint for a limited time:

Kubernetesmaintenance-tolerant.yml
apiVersion: v1
kind: Pod
metadata:
    name: maintenance-pod
spec:
    tolerations:
        - key: maintenance
          operator: Equal
          value: "true"
          effect: NoExecute
          tolerationSeconds: 300  # 5 minutes
    containers:
        - name: app
          image: myapp:latest

Example 4: Dedicated Node

Reserve nodes for a specific team:

Kubernetesbash
# Taint the nodes for team-a
kubectl taint nodes node-1 team=a:NoSchedule
kubectl taint nodes node-2 team=a:NoSchedule

Team A workload:

Kubernetesteam-a-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: team-a-app
spec:
    replicas: 3
    selector:
        matchLabels:
            app: team-a-app
    template:
        metadata:
            labels:
                app: team-a-app
        spec:
            tolerations:
                - key: team
                  operator: Equal
                  value: a
                  effect: NoSchedule
            containers:
                - name: app
                  image: team-a-app:latest

Example 5: Wildcard Toleration

Tolerate a taint with a specific key, whatever its value:

Kuberneteswildcard-toleration.yml
apiVersion: v1
kind: Pod
metadata:
    name: flexible-pod
spec:
    tolerations:
        - key: workload-type
          operator: Exists  # Accept any value
          effect: NoSchedule
    containers:
        - name: app
          image: myapp:latest

Taints and Tolerations with Deployments

Deployment with a Toleration

Kubernetesdeployment-toleration.yml
apiVersion: apps/v1
kind: Deployment
metadata:
    name: special-workload
spec:
    replicas: 3
    selector:
        matchLabels:
            app: special
    template:
        metadata:
            labels:
                app: special
        spec:
            tolerations:
                - key: workload-type
                  operator: Equal
                  value: special
                  effect: NoSchedule
            containers:
                - name: app
                  image: special-app:latest
                  resources:
                      requests:
                          memory: "256Mi"
                          cpu: "250m"
                      limits:
                          memory: "512Mi"
                          cpu: "500m"

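Combining with Node Affinity

Use taints and tolerations together with node affinity for precise placement. Node affinity matches node labels, not taints, so this example assumes the GPU node also carries a gpu=true label: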

Kubernetesaffinity-and-toleration.yml
apiVersion: v1
kind: Pod
metadata:
    name: placed-pod
spec:
    # Tolerate taint
    tolerations:
        - key: gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
    # Prefer GPU node
    affinity:
        nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  preference:
                      matchExpressions:
                          - key: gpu
                            operator: In
                            values:
                                - "true"
    containers:
        - name: app
          image: gpu-app:latest
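
Why both mechanisms are needed can be sketched in Python: a toleration only removes the node's objection, while a node affinity rule steers the Pod toward labeled nodes. This is a simplified model (Equal and Exists only, labels treated as a hard requirement as with requiredDuringSchedulingIgnoredDuringExecution); the node names, the gpu=true label, and the can_schedule helper are assumptions for illustration.

```python
def tolerated(taint, tolerations):
    """True if any toleration matches the taint (Equal/Exists only)."""
    for t in tolerations:
        if t.get("key") != taint["key"] or t.get("effect") not in (None, "", taint["effect"]):
            continue
        if t.get("operator", "Equal") == "Exists" or t.get("value", "") == taint.get("value", ""):
            return True
    return False


def can_schedule(node, tolerations, required_labels=None):
    """A node passes if its hard taints are tolerated and the required labels are present."""
    for taint in node["taints"]:
        if taint["effect"] in ("NoSchedule", "NoExecute") and not tolerated(taint, tolerations):
            return False
    return all(node["labels"].get(k) == v for k, v in (required_labels or {}).items())


gpu_taint = {"key": "gpu", "value": "true", "effect": "NoSchedule"}
nodes = {
    "gpu-node": {"taints": [gpu_taint], "labels": {"gpu": "true"}},
    "plain-node": {"taints": [], "labels": {}},
}
gpu_toleration = [{"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}]

# Toleration only: the Pod may still land on the untainted node.
print([n for n, spec in nodes.items() if can_schedule(spec, gpu_toleration)])
# ['gpu-node', 'plain-node']

# Toleration plus a required label: only the GPU node remains.
print([n for n, spec in nodes.items() if can_schedule(spec, gpu_toleration, {"gpu": "true"})])
# ['gpu-node']
```

The preferred form used in the manifest above only ranks nodes, so the Pod can still fall back to an unlabeled node; the required form modeled here rules those nodes out.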

System Taints

Kubernetes automatically taints nodes under certain conditions:

node.kubernetes.io/not-ready

The node is not ready:

Kubernetesbash
Taints: node.kubernetes.io/not-ready:NoExecute

node.kubernetes.io/unreachable

The node is unreachable:

Kubernetesbash
Taints: node.kubernetes.io/unreachable:NoExecute

node.kubernetes.io/memory-pressure

The node is under memory pressure:

Kubernetesbash
Taints: node.kubernetes.io/memory-pressure:NoSchedule

node.kubernetes.io/disk-pressure

The node is under disk pressure:

Kubernetesbash
Taints: node.kubernetes.io/disk-pressure:NoSchedule

node.kubernetes.io/pid-pressure

The node is under PID pressure:

Kubernetesbash
Taints: node.kubernetes.io/pid-pressure:NoSchedule

node.kubernetes.io/network-unavailable

The node's network is unavailable:

Kubernetesbash
Taints: node.kubernetes.io/network-unavailable:NoSchedule
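
By default Kubernetes also adds tolerations for two of these taints on your behalf: unless a Pod sets them itself, it receives tolerations for node.kubernetes.io/not-ready and node.kubernetes.io/unreachable with tolerationSeconds set to 300, so Pods stay bound to a failing node for five minutes before being evicted. The sketch below models that defaulting in Python; with_default_tolerations is a hypothetical name, and the check for an existing toleration is simplified compared with the real admission logic.

```python
DEFAULTED_KEYS = ("node.kubernetes.io/not-ready", "node.kubernetes.io/unreachable")


def with_default_tolerations(tolerations, seconds=300):
    """Return the Pod's tolerations plus the defaults it does not already set."""
    result = list(tolerations)
    for key in DEFAULTED_KEYS:
        already_set = any(
            t.get("key") == key and t.get("effect") in ("NoExecute", None, "")
            for t in tolerations
        )
        if not already_set:
            result.append({
                "key": key,
                "operator": "Exists",
                "effect": "NoExecute",
                "tolerationSeconds": seconds,
            })
    return result


for toleration in with_default_tolerations([]):
    print(toleration)
```

A Pod that needs to fail over faster can set its own, shorter tolerationSeconds for these two keys, and the defaults are then not added.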

Viewing Taints and Tolerations

Check Node Taints

Kubernetesbash
kubectl describe node node-1 | grep Taints

List All Nodes with Their Taints

Kubernetesbash
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

Check Pod Tolerations

Kubernetesbash
kubectl get pod gpu-pod -o yaml | grep -A 10 tolerations

Common Mistakes and Pitfalls

Mistake 1: Forgetting the Toleration

Problem: The Pod cannot be scheduled on the tainted node.

Kubernetesbash
# Taint the node
kubectl taint nodes node-1 gpu=true:NoSchedule
 
# Pod without a toleration: it cannot be scheduled on node-1
kubectl run gpu-pod --image=nvidia/cuda:11.0

Solution: Add a toleration to the Pod:

Kubernetesyml
tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule

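Mistake 2: Wrong Operator

Problem: The toleration uses an operator that tolerations do not support, so it can never match the taint. In and the values list belong to node affinity; a toleration takes Equal or Exists with a single value.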

Kubernetesyml
# Bad: Wrong operator
tolerations:
    - key: gpu
      operator: In  # Wrong! Should be Equal
      values: ["true"]
      effect: NoSchedule

Solution: Use the correct operator:

Kubernetesyml
# Good: Correct operator
tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
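
A quick pre-flight check can catch this mistake before the manifest reaches the cluster. The Python sketch below is an illustration under stated assumptions, not the API server's validation code: it only knows the Equal and Exists operators and the toleration fields used in this article, and toleration_problems is a hypothetical helper.

```python
ALLOWED_FIELDS = {"key", "operator", "value", "effect", "tolerationSeconds"}
ALLOWED_OPERATORS = {"Equal", "Exists"}


def toleration_problems(toleration: dict) -> list:
    """Return a list of human-readable problems found in one toleration."""
    problems = []
    for field in sorted(set(toleration) - ALLOWED_FIELDS):
        problems.append(f"unknown field: {field}")
    operator = toleration.get("operator", "Equal")
    if operator not in ALLOWED_OPERATORS:
        problems.append(f"unsupported operator: {operator}")
    if operator == "Exists" and toleration.get("value"):
        problems.append("value must be empty when operator is Exists")
    if toleration.get("tolerationSeconds") is not None and toleration.get("effect") != "NoExecute":
        problems.append("tolerationSeconds requires effect NoExecute")
    return problems


bad = {"key": "gpu", "operator": "In", "values": ["true"], "effect": "NoSchedule"}
good = {"key": "gpu", "operator": "Equal", "value": "true", "effect": "NoSchedule"}
print(toleration_problems(bad))   # ['unknown field: values', 'unsupported operator: In']
print(toleration_problems(good))  # []
```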

Mistake 3: Mismatched Effect

Problem: The toleration's effect does not match the taint's effect.

Kubernetesbash
# Taint with NoExecute
kubectl taint nodes node-1 gpu=true:NoExecute

Kubernetesyml
# Bad: Wrong effect
tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoSchedule  # Wrong! Should be NoExecute

Solution: Match the effect:

Kubernetesyml
# Good: Matching effect
tolerations:
    - key: gpu
      operator: Equal
      value: "true"
      effect: NoExecute

Mistake 4: Tainting Every Node

Problem: Every node is tainted, so Pods without a matching toleration have nowhere to run.

Kubernetesbash
# Bad: Taint every node
for node in $(kubectl get nodes -o name); do
    kubectl taint $node special=true:NoSchedule
done

Solution: Taint only specific nodes:

Kubernetesbash
# Good: Taint only the GPU node
kubectl taint nodes gpu-node gpu=true:NoSchedule

Mistake 5: Not Removing a Taint

Problem: A temporary taint is left on the node.

Solution: Remove the taint when you are done:

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoSchedule-

Best Practices

Use Descriptive Taint Keys

Kubernetesbash
# Good: Clear purpose
kubectl taint nodes gpu-node gpu=true:NoSchedule
kubectl taint nodes ssd-node storage=ssd:NoSchedule
 
# Avoid: Vague name
kubectl taint nodes node-1 special=true:NoSchedule

Document the Taint's Purpose

Kubernetesbash
# Add labels to document each node's role
kubectl label nodes gpu-node node-type=gpu
kubectl label nodes ssd-node node-type=ssd

Use PreferNoSchedule for Soft Constraints

For non-critical workloads:

Kubernetesbash
kubectl taint nodes node-1 workload=batch:PreferNoSchedule

Combine with Node Affinity

For precise placement, require the matching node label as well (this assumes the GPU node is labeled gpu=true):

Kubernetesyml
spec:
    tolerations:
        - key: gpu
          operator: Equal
          value: "true"
          effect: NoSchedule
    affinity:
        nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                    - matchExpressions:
                          - key: gpu
                            operator: In
                            values:
                                - "true"

Set a Toleration Timeout for NoExecute

Bound how long a Pod may stay on the node after a NoExecute taint is added, rather than tolerating the taint indefinitely:

Kubernetesyml
tolerations:
    - key: maintenance
      operator: Equal
      value: "true"
      effect: NoExecute
      tolerationSeconds: 3600  # 1 hour

Audit Taints Regularly

Review taints regularly:

Kubernetesbash
# List all taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
 
# Check for orphaned taints
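
One way to look for orphaned taints is to compare each node taint against the tolerations of the Pods in the cluster. The Python sketch below works on data shaped like the output of kubectl get nodes -o json and kubectl get pods -A -o json. The sample documents are made up, "orphaned" is taken here to mean "no Pod tolerates it" (an assumption about what the audit should flag), matching is simplified to key and effect, and orphaned_taints is a hypothetical helper.

```python
import json

# Made-up samples shaped like `kubectl get nodes -o json` and `kubectl get pods -A -o json`.
NODES_JSON = """{"items": [
  {"metadata": {"name": "gpu-node"},
   "spec": {"taints": [{"key": "gpu", "value": "true", "effect": "NoSchedule"}]}},
  {"metadata": {"name": "node-1"},
   "spec": {"taints": [{"key": "maintenance", "value": "true", "effect": "NoExecute"}]}},
  {"metadata": {"name": "node-2"}, "spec": {}}
]}"""
PODS_JSON = """{"items": [
  {"metadata": {"name": "gpu-pod"},
   "spec": {"tolerations": [{"key": "gpu", "operator": "Exists", "effect": "NoSchedule"}]}}
]}"""


def orphaned_taints(nodes: dict, pods: dict) -> list:
    """List (node, key, effect) for taints that no Pod tolerates by key and effect."""
    tolerated = set()
    for pod in pods["items"]:
        for t in pod["spec"].get("tolerations", []):
            tolerated.add((t.get("key"), t.get("effect")))
    orphans = []
    for node in nodes["items"]:
        # Untainted nodes have no "taints" field at all.
        for taint in node["spec"].get("taints", []):
            if (taint["key"], taint["effect"]) not in tolerated:
                orphans.append((node["metadata"]["name"], taint["key"], taint["effect"]))
    return orphans


print(orphaned_taints(json.loads(NODES_JSON), json.loads(PODS_JSON)))
# [('node-1', 'maintenance', 'NoExecute')]
```

A taint flagged this way is not necessarily wrong (a maintenance taint is meant to have no tolerating Pods), so the list is a starting point for review rather than a removal list.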

Troubleshooting

Pod Not Scheduling

Kubernetesbash
kubectl describe pod gpu-pod
# Events show: node(s) had a taint that the pod does not tolerate

Solution: Add a matching toleration:

Kubernetesbash
# Check the node's taints
kubectl describe node node-1 | grep Taints
 
# Add a toleration to the Pod

Pod Evicted from a Node

Kubernetesbash
kubectl describe pod pod-name
# Status: Evicted
# Reason: Tainted node

Solution: If the Pod should stay on the node for a while, add a NoExecute toleration with a timeout:

Kubernetesyml
tolerations:
    - key: maintenance
      operator: Equal
      value: "true"
      effect: NoExecute
      tolerationSeconds: 3600

Taint Not Taking Effect

Kubernetesbash
# Verify taint applied
kubectl describe node node-1 | grep Taints
 
# Check whether the Pod has a toleration
kubectl get pod -o yaml | grep -A 5 tolerations

Viewing Taint and Toleration Details

Get Node Taints

Kubernetesbash
kubectl get nodes -o json | jq '.items[].spec.taints'

Get Pod Tolerations

Kubernetesbash
kubectl get pods -o json | jq '.items[].spec.tolerations'

Describe Node

Kubernetesbash
kubectl describe node node-1
# Show Taint section

Removing Taints

Remove a Specific Taint

Kubernetesbash
kubectl taint nodes node-1 gpu=true:NoSchedule-

Remove All Taints with the Listed Keys

Kubernetesbash
kubectl taint nodes node-1 gpu- storage- workload-

Conclusion

In episode 34 we covered Taints and Tolerations in Kubernetes in depth. We learned how to use taints to repel Pods from a node and tolerations to allow specific Pods onto a tainted node.

Key takeaways:

  • Taints repel Pods from nodes
  • Tolerations allow Pods onto tainted nodes
  • Three effects: NoSchedule, PreferNoSchedule, NoExecute
  • NoSchedule - Prevents scheduling
  • PreferNoSchedule - Soft constraint
  • NoExecute - Evicts existing Pods
  • Operators: Equal (exact match), Exists (key only)
  • Use cases: GPU nodes, SSD nodes, dedicated nodes, maintenance
  • Combine with node affinity for precise placement
  • System taints reflect node conditions
  • Toleration timeouts for the NoExecute effect
  • Document each taint's purpose
  • Audit taints regularly
  • Remove taints when they are no longer needed

Taints and Tolerations are a powerful tool for workload placement in Kubernetes. By understanding how to use them effectively, you can optimize resource utilization, dedicate nodes to specific workloads, and manage maintenance windows gracefully.
