In this episode, we'll discuss Kubernetes Node Selector, a simple mechanism for controlling Pod placement on specific nodes. We'll learn how to use labels and node selectors to schedule Pods on nodes with specific characteristics.

Note
If you want to read the previous episode, you can click the Episode 15 thumbnail below
In the previous episode, we learned about CronJob, which creates Jobs on a time-based schedule. In episode 16, we'll discuss Node Selector, a fundamental concept for controlling where Pods run in your cluster.
Note: Here I'll be using a Kubernetes Cluster installed through K3s.
By default, the Kubernetes scheduler automatically places Pods on available nodes. But sometimes you need control over Pod placement - maybe you want GPU workloads on GPU nodes, or production Pods on high-performance nodes. Node Selector provides a simple way to achieve this.
Node Selector is the simplest way to constrain Pods to run on specific nodes. It uses label matching to select nodes where Pods should be scheduled. You add labels to nodes, then specify those labels in Pod specifications using nodeSelector.
Think of Node Selector like filtering - you label nodes with characteristics (GPU, SSD, high-memory), then tell Pods to only run on nodes with specific labels. The scheduler only considers nodes that match all specified labels.
Key characteristics of Node Selector:
- It is the simplest placement mechanism: a map of key-value pairs in the Pod spec
- Matching is exact equality only, and all specified labels must match (AND logic)
- It is a hard requirement: a Pod stays Pending if no node matches

Node Selector is useful in many scenarios where you need control over Pod placement: pinning GPU workloads to GPU nodes, keeping production Pods on high-performance or SSD-backed nodes, and separating environments, regions, or cost tiers. Without Node Selector, you would have to accept wherever the scheduler places Pods, or pin Pods to individual nodes by name and lose the scheduler's flexibility.
Before using Node Selector, you need to understand node labels. Labels are key-value pairs attached to nodes.
Check existing node labels:
sudo kubectl get nodes --show-labels

View labels for a specific node:
sudo kubectl describe node <node-name>

Kubernetes automatically adds several labels to nodes:
kubernetes.io/hostname - Node's hostname
kubernetes.io/os - Operating system (linux, windows)
kubernetes.io/arch - CPU architecture (amd64, arm64)
node.kubernetes.io/instance-type - Cloud instance type
topology.kubernetes.io/region - Cloud region
topology.kubernetes.io/zone - Cloud availability zone

Add a label to a node:
sudo kubectl label nodes <node-name> <key>=<value>

Example - label a node with SSD storage:
sudo kubectl label nodes node1 disktype=ssd

Example - label a node as production:
sudo kubectl label nodes node2 environment=production

Example - label a node with GPU:
sudo kubectl label nodes node3 gpu=nvidia-tesla-v100

Remove a label from a node:
sudo kubectl label nodes <node-name> <key>-

Example:
sudo kubectl label nodes node1 disktype-

Update an existing label:
sudo kubectl label nodes <node-name> <key>=<new-value> --overwrite

Example:
sudo kubectl label nodes node1 disktype=nvme --overwrite

Once nodes are labeled, you can use nodeSelector in Pod specifications.
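To confirm your labels landed, the -L flag prints chosen label keys as their own columns, which is easier to scan than the full --show-labels output. A minimal sketch - disktype here is the custom label from the examples above, which your nodes may not carry yet:

```shell
# Show selected label keys as columns next to each node
sudo kubectl get nodes -L kubernetes.io/arch,disktype
```

Nodes without the label simply show an empty cell in that column.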
First, label a node:
sudo kubectl label nodes node1 disktype=ssd

Create a Pod with node selector:
apiVersion: v1
kind: Pod
metadata:
  name: nginx-ssd
spec:
  containers:
  - name: nginx
    image: nginx:1.25
  nodeSelector:
    disktype: ssd

Apply the configuration:
sudo kubectl apply -f pod-node-selector.yml

Verify Pod placement:
sudo kubectl get pod nginx-ssd -o wide

The Pod will only schedule on nodes with the disktype=ssd label.
You can specify multiple labels (all must match):
First, label a node with multiple labels:
sudo kubectl label nodes node1 disktype=ssd
sudo kubectl label nodes node1 environment=production

Create a Pod requiring both labels:
apiVersion: v1
kind: Pod
metadata:
  name: app-production
spec:
  containers:
  - name: app
    image: nginx:1.25
  nodeSelector:
    disktype: ssd
    environment: production

This Pod only schedules on nodes with BOTH disktype=ssd AND environment=production.
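You can preview which nodes satisfy the same AND condition before creating the Pod. In a label selector, a comma means all terms must match, mirroring nodeSelector's semantics:

```shell
# Comma-separated selectors are ANDed, just like nodeSelector entries
sudo kubectl get nodes -l disktype=ssd,environment=production
```

If this returns no nodes, the Pod above would stay Pending.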
Use Kubernetes built-in labels:
apiVersion: v1
kind: Pod
metadata:
  name: linux-amd64-pod
spec:
  containers:
  - name: app
    image: nginx:1.25
  nodeSelector:
    kubernetes.io/os: linux
    kubernetes.io/arch: amd64

This Pod only runs on Linux nodes with AMD64 architecture.
Node Selector works with all Pod controllers:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.25
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      nodeSelector:
        environment: production
        disktype: ssd

All 3 replicas will only schedule on nodes with both labels.
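Since the nodeSelector sits in the Pod template, every replica inherits it. A quick way to check where the replicas landed, using the app label from the manifest:

```shell
# List the Deployment's Pods together with the node each was placed on
sudo kubectl get pods -l app=web -o wide
```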
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: monitoring-agent
spec:
  selector:
    matchLabels:
      app: monitoring
  template:
    metadata:
      labels:
        app: monitoring
    spec:
      containers:
      - name: agent
        image: monitoring-agent:latest
      nodeSelector:
        monitoring: enabled

This DaemonSet only runs on nodes labeled with monitoring=enabled.
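For a DaemonSet with a nodeSelector, the DESIRED count should equal the number of matching nodes rather than all nodes. A quick cross-check:

```shell
# Nodes the DaemonSet can target
sudo kubectl get nodes -l monitoring=enabled
# DESIRED/READY counts for the DaemonSet itself
sudo kubectl get daemonset monitoring-agent
```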
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  template:
    spec:
      containers:
      - name: processor
        image: data-processor:latest
        resources:
          requests:
            memory: "2Gi"
            cpu: "2000m"
      nodeSelector:
        workload: batch-processing
        memory: high
      restartPolicy: Never

This Job runs on nodes suitable for batch processing with high memory.
Label GPU nodes:
sudo kubectl label nodes gpu-node-1 gpu=nvidia-tesla-v100
sudo kubectl label nodes gpu-node-2 gpu=nvidia-tesla-v100

Create a GPU workload:
apiVersion: v1
kind: Pod
metadata:
  name: ml-training
spec:
  containers:
  - name: trainer
    image: tensorflow/tensorflow:latest-gpu
    resources:
      limits:
        nvidia.com/gpu: 1
  nodeSelector:
    gpu: nvidia-tesla-v100

Label nodes by environment:
sudo kubectl label nodes node1 node2 environment=production
sudo kubectl label nodes node3 node4 environment=development

Production deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-production
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api
      env: prod
  template:
    metadata:
      labels:
        app: api
        env: prod
    spec:
      containers:
      - name: api
        image: api:v2.0
      nodeSelector:
        environment: production

Development deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-development
spec:
  replicas: 2
  selector:
    matchLabels:
      app: api
      env: dev
  template:
    metadata:
      labels:
        app: api
        env: dev
    spec:
      containers:
      - name: api
        image: api:dev
      nodeSelector:
        environment: development

Label nodes by storage type:
sudo kubectl label nodes node1 node2 disktype=ssd
sudo kubectl label nodes node3 node4 disktype=hdd

Database on SSD:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 3
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          valueFrom:
            secretKeyRef:
              name: db-secret
              key: password
      nodeSelector:
        disktype: ssd

Log storage on HDD:
apiVersion: v1
kind: Pod
metadata:
  name: log-aggregator
spec:
  containers:
  - name: aggregator
    image: fluentd:latest
  nodeSelector:
    disktype: hdd

Label nodes by region:
sudo kubectl label nodes node1 node2 region=us-east
sudo kubectl label nodes node3 node4 region=us-west

Deploy to a specific region:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-us-east
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
      region: us-east
  template:
    metadata:
      labels:
        app: web
        region: us-east
    spec:
      containers:
      - name: web
        image: nginx:1.25
      nodeSelector:
        region: us-east

Label nodes by cost tier:
sudo kubectl label nodes node1 node2 cost=high-performance
sudo kubectl label nodes node3 node4 node5 cost=standard

Critical workload on high-performance nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: payment
  template:
    metadata:
      labels:
        app: payment
    spec:
      containers:
      - name: payment
        image: payment-service:latest
      nodeSelector:
        cost: high-performance

Non-critical workload on standard nodes:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: background-jobs
spec:
  replicas: 5
  selector:
    matchLabels:
      app: jobs
  template:
    metadata:
      labels:
        app: jobs
    spec:
      containers:
      - name: worker
        image: job-worker:latest
      nodeSelector:
        cost: standard

If a Pod is Pending, check whether matching nodes exist:
sudo kubectl describe pod <pod-name>

Look for events like:
Warning  FailedScheduling  ... node(s) didn't match Pod's node affinity/selector

Check available nodes with the required labels:
sudo kubectl get nodes -l disktype=ssd

If no nodes match, either add the required label to a suitable node or relax the Pod's nodeSelector.
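You can also pull scheduling failures straight from the event stream. A sketch assuming the default scheduler's FailedScheduling event reason:

```shell
# List only scheduling failures in the current namespace
sudo kubectl get events --field-selector reason=FailedScheduling
```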
Verify where Pods are running:
sudo kubectl get pods -o wide

Check if Pods are on expected nodes:
sudo kubectl get pods -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName,NODE_SELECTOR:.spec.nodeSelector

See all Pods on a specific node:
sudo kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=<node-name>

Problem: Label name mismatch between node and Pod.
Solution: Double-check label names:
# Check node labels
sudo kubectl get nodes --show-labels
# Verify Pod nodeSelector
sudo kubectl get pod <pod-name> -o yaml | grep -A 5 nodeSelector

Problem: No nodes have the required labels.
Solution: Verify nodes with required labels exist:
sudo kubectl get nodes -l <key>=<value>

Problem: New nodes added without required labels.
Solution: Create a checklist or automation for labeling new nodes:
# Label new node immediately after adding
sudo kubectl label nodes <new-node> environment=production disktype=ssd

Problem: Expecting OR logic, but nodeSelector uses AND.
Solution: Node Selector only supports AND logic. For OR logic, use Node Affinity (covered in next episode).
# This requires BOTH labels (AND logic)
nodeSelector:
  disktype: ssd
  environment: production

Problem: Too many nodeSelector constraints prevent scheduling.
Solution: Use only necessary constraints:
# Too constrained
nodeSelector:
  disktype: ssd
  environment: production
  region: us-east
  zone: us-east-1a
  instance-type: m5.xlarge

# Better - only essential constraints
nodeSelector:
  environment: production
  disktype: ssd

Problem: All matching nodes are full.
Solution: Ensure enough capacity on labeled nodes:
# Check node capacity
sudo kubectl describe nodes -l environment=production

Choose clear, descriptive label names:
# Good
sudo kubectl label nodes node1 disktype=ssd
sudo kubectl label nodes node1 environment=production
# Avoid
sudo kubectl label nodes node1 type=1
sudo kubectl label nodes node1 env=prod

Maintain documentation of your label schema:
# Label Schema Documentation
# disktype: ssd | hdd | nvme
# environment: production | staging | development
# region: us-east | us-west | eu-central
# gpu: nvidia-tesla-v100 | nvidia-a100 | none

Standardize label values across your cluster:
# Consistent
environment=production # Always lowercase
environment=staging
environment=development
# Inconsistent (avoid)
environment=Production
environment=STAGING
environment=dev

Automate node labeling during cluster setup:
# In node provisioning script
kubectl label nodes $NODE_NAME \
  environment=production \
  disktype=ssd \
  region=us-east

Always set resource requests with nodeSelector:
spec:
  containers:
  - name: app
    image: app:latest
    resources:
      requests:
        memory: "512Mi"
        cpu: "500m"
      limits:
        memory: "1Gi"
        cpu: "1000m"
  nodeSelector:
    environment: production

Reserve nodeSelector for workloads with specific requirements:
# Good use case - GPU workload
nodeSelector:
  gpu: nvidia-tesla-v100

# Good use case - High-performance database
nodeSelector:
  disktype: nvme
  memory: high

# Unnecessary - generic web app
# nodeSelector:
#   kubernetes.io/os: linux  # Usually not needed

Track label changes for audit purposes:
# View recent events
sudo kubectl get events --sort-by='.lastTimestamp' | grep -i label

Test nodeSelector in development first:
# Test in dev namespace
kubectl apply -f pod-node-selector.yml -n development
# Verify placement
kubectl get pod -n development -o wide
# Then deploy to production
kubectl apply -f pod-node-selector.yml -n production

Node Selector is simple but has limitations:
Cannot use operators like "not equal" or "in":
# Node Selector - only supports equality
nodeSelector:
  disktype: ssd  # Must equal "ssd"

# Cannot do:
# disktype != hdd
# disktype in (ssd, nvme)
# disktype exists

For advanced matching, use Node Affinity (next episode).
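As a preview, here is a hedged sketch of how Node Affinity expresses what nodeSelector cannot; the In operator matches any of the listed values, giving OR logic over values. The syntax follows the Kubernetes Pod spec, and disktype is the example label from this episode:

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: disktype
            operator: In   # matches ssd OR nvme
            values:
            - ssd
            - nvme
```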
Cannot express OR logic:
# Cannot say: disktype=ssd OR disktype=nvme
# All labels must match (AND logic)
nodeSelector:
  disktype: ssd
  environment: production

Node Selector is a hard requirement: a Pod won't schedule if no matching node exists.
For soft preferences, use Node Affinity (next episode).
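A hedged sketch of what such a soft preference looks like in Node Affinity: the scheduler favors matching nodes according to the weight, but still schedules the Pod elsewhere if none match. Syntax per the Kubernetes Pod spec, with an illustrative weight:

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80          # illustrative preference strength (1-100)
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd
```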
Use Node Selector when:
- Simple, exact-match placement with one or a few labels is enough
- A hard requirement is acceptable - the Pod should stay Pending rather than land on the wrong node

Consider Node Affinity when:
- You need OR logic or operators like In, NotIn, or Exists
- You want soft preferences the scheduler can fall back from
In episode 16, we've explored Node Selector in Kubernetes in depth. We've learned what Node Selector is, how to label nodes, and how to use nodeSelector to control Pod placement.
Key takeaways:
- Node Selector constrains Pods to nodes whose labels exactly match the nodeSelector map
- All specified labels must match (AND logic), and the constraint is hard - unmatched Pods stay Pending
- Label nodes with kubectl label, keep the label schema consistent, and use only the constraints you need
- For OR logic, operators, or soft preferences, use Node Affinity
Node Selector is essential for controlling Pod placement in Kubernetes. By understanding Node Selector, you can ensure workloads run on appropriate nodes, optimize resource usage, and maintain environment separation.
Are you getting a clearer understanding of Node Selector in Kubernetes? In episode 17, we'll discuss working with the all keyword, which provides a convenient way to manage multiple Kubernetes resources at once using kubectl get all and kubectl delete all. Keep your learning momentum going and look forward to the next episode!