In this episode, we'll discuss Kubernetes Job, a controller designed for running tasks to completion. We'll learn how Job manages batch processing, one-time tasks, and parallel execution in Kubernetes.

In the previous episode, we learned about DaemonSet, which ensures a Pod runs on every node in the cluster. In episode 14, we'll discuss a different type of controller: Job.
Note: Here I'll be using a Kubernetes Cluster installed through K3s.
Unlike controllers we've discussed (ReplicaSet, DaemonSet) that keep Pods running continuously, Job is designed for tasks that run to completion. Think of it as running a script or batch process that needs to finish successfully, then stop.
A Job creates one or more Pods and ensures that a specified number of them successfully terminate. Jobs track the successful completions of Pods and when the specified number of successful completions is reached, the Job itself is complete.
Think of Job like running a cron task or batch script - it starts, does its work, and finishes. In Kubernetes, Job manages this process, handling failures and retries automatically.
Key characteristics of Job:

- Runs Pods to completion instead of keeping them alive indefinitely
- Tracks successful completions and retries failed Pods automatically
- Supports both sequential and parallel execution
- Can clean itself up after finishing

Job is designed for workloads that need to run once or periodically and then complete: database migrations, batch data processing, backups, report generation. Without Job, you would need to run such tasks by hand, watch them for failures, and retry them yourself.

Let's understand the key differences:
| Aspect | Job | ReplicaSet | DaemonSet |
|---|---|---|---|
| Purpose | Run to completion | Keep running | Keep running on nodes |
| Pod lifecycle | Terminates on success | Runs continuously | Runs continuously |
| Restart policy | OnFailure or Never | Always | Always |
| Completion tracking | Yes | No | No |
| Use case | Batch tasks | Applications | Node-level services |
| Cleanup | Can auto-delete | Manual | Manual |
As an example scenario, let's create a basic Job that prints a message and exits.
Create a file named job-basic.yml:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-job
spec:
  template:
    spec:
      containers:
      - name: hello
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - echo "Hello from Kubernetes Job!"; sleep 5; echo "Job completed!"
      restartPolicy: Never
```

Important: Job Pods must use `restartPolicy: Never` or `restartPolicy: OnFailure`. The default `Always` is not allowed for Jobs.
Apply the configuration:
```shell
sudo kubectl apply -f job-basic.yml
```

Verify the Job is created:

```shell
sudo kubectl get jobs
```

Output:

```
NAME        COMPLETIONS   DURATION   AGE
hello-job   1/1           8s         10s
```

Check the Pods:

```shell
sudo kubectl get pods
```

Output:

```
NAME              READY   STATUS      RESTARTS   AGE
hello-job-abc12   0/1     Completed   0          15s
```

Notice the Pod status is Completed, not Running.

View the Pod logs:

```shell
sudo kubectl logs hello-job-abc12
```

Output:

```
Hello from Kubernetes Job!
Job completed!
```

Job supports different completion modes:
Runs a single Pod to completion:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: single-job
spec:
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command: ["echo", "Single task completed"]
      restartPolicy: Never
```

This creates one Pod. If it fails, the Job creates a new Pod until one succeeds.
Runs multiple Pods in parallel until a specified number complete successfully:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-job
spec:
  completions: 5
  parallelism: 2
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - echo "Processing task"; sleep 10; echo "Task completed"
      restartPolicy: Never
```

This Job:

- Requires 5 successful completions (`completions: 5`)
- Runs up to 2 Pods at a time (`parallelism: 2`)

Runs multiple Pods in parallel without a fixed completion count:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: work-queue-job
spec:
  parallelism: 3
  template:
    spec:
      containers:
      - name: worker
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - echo "Processing work item"; sleep 5; echo "Done"
      restartPolicy: Never
```

This Job runs 3 worker Pods in parallel. Because no `completions` count is specified, once any Pod exits successfully no new Pods are started, and the Job completes when all running Pods have terminated.
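Besides the default NonIndexed mode, batch/v1 Jobs also support `completionMode: Indexed` (stable since Kubernetes 1.24), where each Pod is assigned a unique completion index exposed through the `JOB_COMPLETION_INDEX` environment variable — handy for statically partitioning work. A minimal sketch, assuming a reasonably recent cluster:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: indexed-job
spec:
  completions: 5
  parallelism: 2
  completionMode: Indexed   # each Pod gets a unique index 0-4
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        # JOB_COMPLETION_INDEX is injected automatically in Indexed mode
        - echo "Processing shard $JOB_COMPLETION_INDEX"
      restartPolicy: Never
```

Each index must succeed once for the Job to complete, so index N that fails is retried with the same index.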
Job Pods support two restart policies:
The Pod is never restarted. If it fails, the Job creates a new Pod:

```yaml
spec:
  template:
    spec:
      restartPolicy: Never
```

Behavior: a failed Pod remains in the Error state (useful for debugging), and the Job creates a replacement Pod, counting the failure against `backoffLimit`.

The Pod's container is restarted in place, on the same node, if it fails:
```yaml
spec:
  template:
    spec:
      restartPolicy: OnFailure
```

Behavior: the failed container is restarted inside the same Pod rather than a new Pod being created — more efficient, but the failed container's state is lost, making debugging harder.
Control how many times Job retries failed Pods:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: retry-job
spec:
  backoffLimit: 3
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - exit 1
      restartPolicy: Never
```

This Job:

- Always fails (`exit 1`) and is retried at most 3 times (`backoffLimit: 3`)
- The default `backoffLimit` is 6

Failed Pods are recreated with an exponential back-off delay (10s, 20s, 40s, ...), capped at six minutes.

Set a time limit for Job execution:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: deadline-job
spec:
  activeDeadlineSeconds: 60
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command:
        - /bin/sh
        - -c
        - sleep 120
      restartPolicy: Never
```

This Job is terminated after 60 seconds (`activeDeadlineSeconds: 60`) even though the task would sleep for 120: its Pods are killed and the Job is marked Failed with reason DeadlineExceeded.
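Beyond `backoffLimit` and `activeDeadlineSeconds`, newer clusters can also control which failures count against the retry budget with `spec.podFailurePolicy` (the feature went GA in Kubernetes 1.31, so treat this as a sketch for recent versions only; it requires `restartPolicy: Never`):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: policy-job
spec:
  backoffLimit: 3
  podFailurePolicy:
    rules:
    # Fail the whole Job immediately on a non-retriable exit code
    - action: FailJob
      onExitCodes:
        containerName: task
        operator: In
        values: [42]
    # Don't count Pod disruptions (e.g. node drain) against backoffLimit
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
        status: "True"
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command: ["/bin/sh", "-c", "exit 42"]
      restartPolicy: Never
```

This distinguishes "my code is broken, stop retrying" from "the node went away, retry for free".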
To see detailed information about a Job:
```shell
sudo kubectl describe job hello-job
```

Output:
```
Name:             hello-job
Namespace:        default
Selector:         controller-uid=abc123
Labels:           <none>
Annotations:      <none>
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Sun, 01 Mar 2026 10:00:00 +0000
Completed At:     Sun, 01 Mar 2026 10:00:08 +0000
Duration:         8s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=abc123
  Containers:
   hello:
    Image:      busybox:1.36
    Command:
      /bin/sh
      -c
      echo "Hello from Kubernetes Job!"; sleep 5; echo "Job completed!"
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  2m    job-controller  Created pod: hello-job-abc12
  Normal  Completed         2m    job-controller  Job completed
```

A common real-world use case is running database schema migrations as a Job:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migration
  labels:
    app: database
    task: migration
spec:
  backoffLimit: 2
  activeDeadlineSeconds: 300
  template:
    metadata:
      labels:
        app: database
        task: migration
    spec:
      containers:
      - name: migrate
        image: migrate/migrate:v4.16.2
        command:
        - migrate
        - -path=/migrations
        - -database=postgres://user:pass@db:5432/mydb?sslmode=disable
        - up
        volumeMounts:
        - name: migrations
          mountPath: /migrations
      volumes:
      - name: migrations
        configMap:
          name: db-migrations
      restartPolicy: Never
```

This Job runs the pending migrations once (`migrate ... up`), retries at most twice (`backoffLimit: 2`), aborts if it takes longer than 5 minutes (`activeDeadlineSeconds: 300`), and reads the migration files from a ConfigMap named db-migrations.
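The Job above mounts its migration files from a ConfigMap named db-migrations. What that ConfigMap contains is up to your project; a minimal, purely illustrative sketch (the file name and SQL here are hypothetical):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: db-migrations
data:
  # migrate/migrate expects files named {version}_{title}.up.sql
  000001_init.up.sql: |
    CREATE TABLE users (id SERIAL PRIMARY KEY, name TEXT);
```

Apply the ConfigMap before the Job so the volume mount resolves.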
Another example: parallel data processing with resource limits.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: data-processor
spec:
  completions: 10
  parallelism: 3
  template:
    spec:
      containers:
      - name: processor
        image: python:3.11-slim
        command:
        - python
        - -c
        - |
          import time
          import random
          print("Processing data batch...")
          time.sleep(random.randint(5, 15))
          print("Batch processing completed!")
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"
      restartPolicy: OnFailure
```

This Job processes 10 batches (`completions: 10`), 3 at a time (`parallelism: 3`), restarts failed containers in place (`restartPolicy: OnFailure`), and caps each Pod's CPU and memory usage.
A backup task that runs pg_dump against a PostgreSQL database:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: database-backup
  labels:
    app: backup
    type: database
spec:
  backoffLimit: 1
  activeDeadlineSeconds: 600
  template:
    metadata:
      labels:
        app: backup
        type: database
    spec:
      containers:
      - name: backup
        image: postgres:15-alpine
        command:
        - /bin/sh
        - -c
        - |
          pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > /backup/backup-$(date +%Y%m%d-%H%M%S).sql
          echo "Backup completed successfully"
        env:
        - name: DB_HOST
          value: "postgres-service"
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
        - name: DB_NAME
          value: "production"
        - name: PGPASSWORD
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: password
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-pvc
      restartPolicy: Never
```

This Job dumps the production database to a timestamped file on a PersistentVolumeClaim, reads its credentials from a Secret, retries at most once (`backoffLimit: 1`), and aborts after 10 minutes (`activeDeadlineSeconds: 600`).
A large fan-out example: processing 100 images, 10 at a time.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: image-processor
spec:
  completions: 100
  parallelism: 10
  template:
    spec:
      containers:
      - name: processor
        image: imagemagick:latest
        command:
        - /bin/sh
        - -c
        - |
          echo "Processing image..."
          convert input.jpg -resize 800x600 output.jpg
          echo "Image processed successfully"
        resources:
          requests:
            memory: "512Mi"
            cpu: "500m"
          limits:
            memory: "1Gi"
            cpu: "1000m"
      restartPolicy: OnFailure
```

This Job requires 100 successful completions (`completions: 100`), runs 10 Pods concurrently (`parallelism: 10`), and bounds each Pod's CPU and memory.
For tasks that might fail but should retry:
```yaml
spec:
  backoffLimit: 5
  template:
    spec:
      restartPolicy: Never
```

For processing a known number of items:
```yaml
spec:
  completions: 100
  parallelism: 10
```

For processing items from a queue:
```yaml
spec:
  parallelism: 5
  # No completions specified
```

Pods coordinate through an external queue (Redis, RabbitMQ, etc.).
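A sketch of what such a work-queue Job might look like. The worker image and queue details here are hypothetical — the image is assumed to loop, popping items off the queue until it is empty, then exit 0:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-consumer
spec:
  parallelism: 5
  template:
    spec:
      containers:
      - name: worker
        # Hypothetical image: pops items from a Redis list until it is empty
        image: my-registry/queue-worker:1.0
        env:
        - name: QUEUE_HOST
          value: "redis-service"
        - name: QUEUE_NAME
          value: "work-items"
      restartPolicy: OnFailure
```

The coordination logic (claiming items, handling duplicates) lives entirely in the worker, not in Kubernetes.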
For tasks that must complete within a time limit:
```yaml
spec:
  activeDeadlineSeconds: 300
  backoffLimit: 3
```

Delete a completed Job:
```shell
sudo kubectl delete job hello-job
```

This deletes the Job and its Pods.
Use TTL (Time To Live) to automatically clean up completed Jobs:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-job
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: task
        image: busybox:1.36
        command: ["echo", "Task completed"]
      restartPolicy: Never
```

This Job (together with its Pods) is automatically deleted 100 seconds after it finishes (`ttlSecondsAfterFinished: 100`).
Control when Jobs are cleaned up:
```yaml
spec:
  ttlSecondsAfterFinished: 0 # Delete immediately after completion
```

Or keep failed Jobs for debugging:

```yaml
spec:
  ttlSecondsAfterFinished: 86400 # Keep for 24 hours
```

List Jobs:

```shell
sudo kubectl get jobs
```

Watch Jobs in real time:

```shell
sudo kubectl get jobs -w
```

List the Pods belonging to a Job:

```shell
sudo kubectl get pods --selector=job-name=hello-job
```

View a Job's logs:

```shell
# Get Pod name from Job
POD_NAME=$(kubectl get pods --selector=job-name=hello-job -o jsonpath='{.items[0].metadata.name}')
# View logs
sudo kubectl logs $POD_NAME
```

Check Job-related events:

```shell
sudo kubectl get events --sort-by='.lastTimestamp' | grep Job
```

Problem: Using `restartPolicy: Always` for Jobs.
Solution: Use Never or OnFailure:
```yaml
spec:
  template:
    spec:
      restartPolicy: Never # or OnFailure
```

Problem: Job retries indefinitely on failure.
Solution: Set appropriate backoffLimit:
```yaml
spec:
  backoffLimit: 3
```

Problem: Job runs forever if a task hangs.
Solution: Set activeDeadlineSeconds:
```yaml
spec:
  activeDeadlineSeconds: 300
```

Problem: Accumulation of completed Jobs and Pods.
Solution: Use TTL for automatic cleanup:
```yaml
spec:
  ttlSecondsAfterFinished: 100
```

Problem: Setting `parallelism` higher than `completions`.
Solution: Keep `parallelism` at or below `completions`. Kubernetes never runs more parallel Pods than the number of remaining completions, so a higher value is not an error, but it is misleading configuration with no effect:

```yaml
spec:
  completions: 10
  parallelism: 5 # Not more than completions
```

Problem: Job Pods consume excessive resources.
Solution: Always set resource limits:
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

Prevent infinite retries:
```yaml
spec:
  backoffLimit: 3
```

Prevent Jobs from running too long:
```yaml
spec:
  activeDeadlineSeconds: 600
```

Use TTL to clean up completed Jobs:
```yaml
spec:
  ttlSecondsAfterFinished: 100
```

Prevent resource exhaustion:
```yaml
resources:
  requests:
    memory: "256Mi"
    cpu: "250m"
  limits:
    memory: "512Mi"
    cpu: "500m"
```

Add meaningful labels:
```yaml
metadata:
  labels:
    app: data-processor
    task: batch-import
    environment: production
```

Choose the restart policy deliberately:

- `Never` for debugging (keeps failed Pods around for inspection)
- `OnFailure` for efficiency (restarts in place)

Balance speed and resource usage:
```yaml
spec:
  completions: 100
  parallelism: 10 # Process 10 at a time
```

Never hardcode credentials:
```yaml
env:
- name: DB_PASSWORD
  valueFrom:
    secretKeyRef:
      name: db-credentials
      key: password
```

Jobs run once, but what if you need to run them on a schedule?
Job - Runs once:

```yaml
apiVersion: batch/v1
kind: Job
```

CronJob - Runs on a schedule:
```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: scheduled-backup
spec:
  schedule: "0 2 * * *" # Every day at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: backup-tool:latest
          restartPolicy: Never
```

CronJob creates Jobs on a schedule. We'll cover CronJob in detail in the next episode.
In episode 14, we've explored Job in Kubernetes in depth. We've learned what Job is, how it differs from other controllers, and how to use it for batch processing and one-time tasks.
Key takeaways:

- Job runs Pods to completion and tracks their successes, unlike ReplicaSet and DaemonSet
- Limit retries with `backoffLimit`
- Run batch work in parallel with `parallelism` and `completions`
- Use `activeDeadlineSeconds` to time-limit Jobs
- Use `ttlSecondsAfterFinished` for automatic cleanup

Job is essential for running batch workloads, data processing, and one-time tasks in Kubernetes. By understanding Job, you can effectively manage tasks that need to run to completion, handle failures gracefully, and clean up resources automatically.
Are you getting a clearer understanding of Job in Kubernetes? In the next episode 15, we'll discuss CronJob, which builds on Job to provide scheduled, recurring task execution. Keep your learning momentum going and look forward to the next episode!