Learning Kubernetes - Episode 14 - Introduction and Explanation of Job

In this episode, we'll discuss Kubernetes Job, a controller designed for running tasks to completion. We'll learn how Job manages batch processing, one-time tasks, and parallel execution in Kubernetes.

Arman Dwi Pangestu
March 17, 2026
8 min read

Introduction

Note

If you want to read the previous episode, you can find it via the Episode 13 link below

Episode 13

In the previous episode, we learned about DaemonSet, which ensures a Pod runs on every node in the cluster. In episode 14, we'll discuss a different type of controller: Job.

Note: Here I'll be using a Kubernetes Cluster installed through K3s.

Unlike the controllers we've discussed so far (ReplicaSet, DaemonSet), which keep Pods running continuously, Job is designed for tasks that run to completion. Think of it as running a script or batch process that needs to finish successfully and then stop.

What Is Job?

A Job creates one or more Pods and ensures that a specified number of them successfully terminate. Jobs track the successful completions of Pods and when the specified number of successful completions is reached, the Job itself is complete.

Think of a Job like running a one-off batch script - it starts, does its work, and finishes. In Kubernetes, the Job controller manages this process, handling failures and retries automatically.

Key characteristics of Job:

  • Run to completion - Pods are expected to finish and exit successfully
  • Automatic retry - Failed Pods are automatically restarted
  • Completion tracking - Tracks how many Pods completed successfully
  • Parallel execution - Can run multiple Pods in parallel
  • Cleanup - Completed Jobs can be automatically cleaned up
  • One-time or batch tasks - Perfect for migrations, backups, data processing

Why Do We Need Job?

Job is designed for workloads that need to run once or periodically and then complete:

  • Database migrations - Run schema updates or data migrations
  • Batch processing - Process large datasets or generate reports
  • Backup tasks - Create backups of databases or files
  • Data import/export - Load data into systems or export for analysis
  • Image processing - Resize images, generate thumbnails
  • ETL operations - Extract, transform, and load data
  • One-time setup tasks - Initialize systems or seed data
  • Cleanup operations - Remove old data or temporary files

Without Job, you would need to:

  • Manually create Pods for one-time tasks
  • Monitor Pod completion status
  • Handle failures and retries manually
  • Clean up completed Pods yourself

Job vs Other Controllers

Let's understand the key differences:

Aspect              | Job                   | ReplicaSet        | DaemonSet
Purpose             | Run to completion     | Keep running      | Keep running on nodes
Pod lifecycle       | Terminates on success | Runs continuously | Runs continuously
Restart policy      | OnFailure or Never    | Always            | Always
Completion tracking | Yes                   | No                | No
Use case            | Batch tasks           | Applications      | Node-level services
Cleanup             | Can auto-delete       | Manual            | Manual

Example scenario:

  • Job: Run a database migration script once
  • ReplicaSet: Run 3 replicas of a web application continuously
  • DaemonSet: Run a log collector on every node continuously

Creating a Job

Let's create a basic Job:

Example 1: Basic Job

Create a file named job-basic.yml:

job-basic.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: hello-job
spec:
    template:
        spec:
            containers:
                - name: hello
                  image: busybox:1.36
                  command:
                      - /bin/sh
                      - -c
                      - echo "Hello from Kubernetes Job!"; sleep 5; echo "Job completed!"
            restartPolicy: Never

Important

Job Pods must use restartPolicy: Never or restartPolicy: OnFailure. The default Always is not allowed for Jobs.

Apply the configuration:

sudo kubectl apply -f job-basic.yml

Verify the Job is created:

sudo kubectl get jobs

Output:

NAME        COMPLETIONS   DURATION   AGE
hello-job   1/1           8s         10s

Check the Pods:

sudo kubectl get pods

Output:

NAME              READY   STATUS      RESTARTS   AGE
hello-job-abc12   0/1     Completed   0          15s

Notice the Pod status is Completed, not Running.

View the Pod logs:

sudo kubectl logs hello-job-abc12

Output:

Hello from Kubernetes Job!
Job completed!

Job Completion Modes

Job supports different completion modes:

Non-Parallel Jobs (Default)

Runs a single Pod to completion:

job-single.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: single-job
spec:
    template:
        spec:
            containers:
                - name: task
                  image: busybox:1.36
                  command: ["echo", "Single task completed"]
            restartPolicy: Never

This creates one Pod. If it fails, Job creates a new Pod until one succeeds.
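The retry behavior described above can be sketched in a few lines of Python. This is a toy model of the controller's logic, not actual Kubernetes code; the `run_job` helper and its names are made up for illustration:

```python
# Toy model of a non-parallel Job's retry loop: keep creating Pods
# until one succeeds, or give up after backoffLimit failures.
def run_job(task, backoff_limit=6):
    failures = 0
    while failures <= backoff_limit:
        if task():              # "Pod" exited successfully
            return "Complete"
        failures += 1           # "Pod" failed; retry with a new one
    return "Failed"             # too many failures: Job marked Failed

# A task that fails twice before succeeding still completes the Job:
attempts = iter([False, False, True])
print(run_job(lambda: next(attempts)))  # Complete
```

The key point the model captures: failures do not fail the Job immediately; the controller keeps creating replacement Pods until one succeeds or the failure budget is spent.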

Parallel Jobs with Fixed Completion Count

Runs multiple Pods in parallel until a specified number complete successfully:

job-parallel-fixed.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: parallel-job
spec:
    completions: 5
    parallelism: 2
    template:
        spec:
            containers:
                - name: task
                  image: busybox:1.36
                  command:
                      - /bin/sh
                      - -c
                      - echo "Processing task"; sleep 10; echo "Task completed"
            restartPolicy: Never

This Job:

  • Needs 5 successful completions (completions: 5)
  • Runs 2 Pods at a time (parallelism: 2)
  • Creates new Pods until 5 complete successfully
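The number of Pods running at any moment follows a simple rule, sketched here in Python (an illustration of the documented behavior, not actual controller code):

```python
def active_pods(parallelism, completions, succeeded):
    # The controller never runs more Pods than `parallelism`,
    # and never more than the completions still outstanding.
    return min(parallelism, completions - succeeded)

# With completions: 5 and parallelism: 2, the tail of the Job
# naturally narrows to a single Pod, then to zero:
for done in range(6):
    print(done, active_pods(2, 5, done))
```

So a Job with 5 completions and parallelism 2 runs 2 Pods at a time until only one completion remains, then finishes with a single Pod.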

Parallel Jobs with Work Queue

Runs multiple Pods in parallel without a fixed completion count:

job-work-queue.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: work-queue-job
spec:
    parallelism: 3
    template:
        spec:
            containers:
                - name: worker
                  image: busybox:1.36
                  command:
                      - /bin/sh
                      - -c
                      - echo "Processing work item"; sleep 5; echo "Done"
            restartPolicy: Never

This Job:

  • Runs 3 Pods in parallel
  • No fixed completion count
  • Pods coordinate through external work queue
  • Job completes when all Pods finish
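Each worker Pod in this pattern typically loops over a shared queue and exits cleanly once the queue is empty. Here is a minimal sketch of that loop in Python, using an in-memory `queue.Queue` as a stand-in for the external queue (Redis, RabbitMQ, etc.):

```python
import queue

def run_worker(work_queue, process):
    """Pop items until the queue is drained, then exit successfully."""
    processed = 0
    while True:
        try:
            item = work_queue.get_nowait()  # claim the next work item
        except queue.Empty:
            break          # nothing left: exit 0 so the Pod completes
        process(item)
        processed += 1
    return processed

q = queue.Queue()
for i in range(5):
    q.put(i)

results = []
print(run_worker(q, results.append))  # 5
print(results)                        # [0, 1, 2, 3, 4]
```

Because every worker exits successfully when the queue is empty, the Job as a whole completes once all Pods finish, with no completions count needed.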

Restart Policy

Job Pods support two restart policies:

Never

Pod is never restarted. If it fails, Job creates a new Pod:

spec:
    template:
        spec:
            restartPolicy: Never

Behavior:

  • Failed Pod stays in Error state
  • New Pod is created for retry
  • Good for debugging (can inspect failed Pods)

OnFailure

The failed container is restarted within the same Pod, on the same node:

spec:
    template:
        spec:
            restartPolicy: OnFailure

Behavior:

  • Failed Pod is restarted in place
  • No new Pod is created
  • Good for resource efficiency

Backoff Limit

Control how many times Job retries failed Pods:

job-backoff.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: retry-job
spec:
    backoffLimit: 3
    template:
        spec:
            containers:
                - name: task
                  image: busybox:1.36
                  command:
                      - /bin/sh
                      - -c
                      - exit 1
            restartPolicy: Never

This Job:

  • Retries up to 3 times (backoffLimit: 3)
  • After 3 failures, Job is marked as failed
  • Default backoffLimit is 6
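Between retries the Job controller waits with an exponential back-off: the delay starts at 10 seconds, doubles after each failure, and is capped at six minutes. A small sketch of that schedule (the `backoff_delays` helper is made up for illustration; the 10s/doubling/6-minute numbers follow the documented behavior):

```python
def backoff_delays(retries, base=10, cap=360):
    """Seconds the controller waits before each successive retry."""
    return [min(base * 2 ** i, cap) for i in range(retries)]

print(backoff_delays(6))  # [10, 20, 40, 80, 160, 320]
print(backoff_delays(8))  # [10, 20, 40, 80, 160, 320, 360, 360]
```

This is worth knowing when a Job looks "stuck": it may simply be waiting out the back-off delay before creating the next Pod.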

Active Deadline Seconds

Set a time limit for Job execution:

job-deadline.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: deadline-job
spec:
    activeDeadlineSeconds: 60
    template:
        spec:
            containers:
                - name: task
                  image: busybox:1.36
                  command:
                      - /bin/sh
                      - -c
                      - sleep 120
            restartPolicy: Never

This Job:

  • Must complete within 60 seconds
  • After 60 seconds, the Job is terminated and all running Pods are killed
  • activeDeadlineSeconds applies to the Job as a whole and takes precedence over backoffLimit

Viewing Job Details

To see detailed information about a Job:

sudo kubectl describe job hello-job

Output:

Name:             hello-job
Namespace:        default
Selector:         controller-uid=abc123
Labels:           <none>
Annotations:      <none>
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Sun, 01 Mar 2026 10:00:00 +0000
Completed At:     Sun, 01 Mar 2026 10:00:08 +0000
Duration:         8s
Pods Statuses:    0 Active / 1 Succeeded / 0 Failed
Pod Template:
  Labels:  controller-uid=abc123
  Containers:
   hello:
    Image:      busybox:1.36
    Command:
      /bin/sh
      -c
      echo "Hello from Kubernetes Job!"; sleep 5; echo "Job completed!"
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  2m    job-controller  Created pod: hello-job-abc12
  Normal  Completed         2m    job-controller  Job completed

Practical Examples

Example 1: Database Migration Job

migration-job.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: db-migration
    labels:
        app: database
        task: migration
spec:
    backoffLimit: 2
    activeDeadlineSeconds: 300
    template:
        metadata:
            labels:
                app: database
                task: migration
        spec:
            containers:
                - name: migrate
                  image: migrate/migrate:v4.16.2
                  command:
                      - migrate
                      - -path=/migrations
                      - -database=postgres://user:pass@db:5432/mydb?sslmode=disable
                      - up
                  volumeMounts:
                      - name: migrations
                        mountPath: /migrations
            volumes:
                - name: migrations
                  configMap:
                      name: db-migrations
            restartPolicy: Never

This Job:

  • Runs database migrations
  • Retries up to 2 times on failure
  • Must complete within 5 minutes
  • Loads migration files from ConfigMap

Example 2: Batch Data Processing Job

batch-processing-job.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: data-processor
spec:
    completions: 10
    parallelism: 3
    template:
        spec:
            containers:
                - name: processor
                  image: python:3.11-slim
                  command:
                      - python
                      - -c
                      - |
                          import time
                          import random
                          print("Processing data batch...")
                          time.sleep(random.randint(5, 15))
                          print("Batch processing completed!")
                  resources:
                      requests:
                          memory: "256Mi"
                          cpu: "250m"
                      limits:
                          memory: "512Mi"
                          cpu: "500m"
            restartPolicy: OnFailure

This Job:

  • Processes 10 batches of data
  • Runs 3 batches in parallel
  • Sets resource limits
  • Restarts failed Pods on the same node

Example 3: Backup Job

backup-job.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: database-backup
    labels:
        app: backup
        type: database
spec:
    backoffLimit: 1
    activeDeadlineSeconds: 600
    template:
        metadata:
            labels:
                app: backup
                type: database
        spec:
            containers:
                - name: backup
                  image: postgres:15-alpine
                  command:
                      - /bin/sh
                      - -c
                      - |
                          pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME > /backup/backup-$(date +%Y%m%d-%H%M%S).sql
                          echo "Backup completed successfully"
                  env:
                      - name: DB_HOST
                        value: "postgres-service"
                      - name: DB_USER
                        valueFrom:
                            secretKeyRef:
                                name: db-credentials
                                key: username
                      - name: DB_NAME
                        value: "production"
                      - name: PGPASSWORD
                        valueFrom:
                            secretKeyRef:
                                name: db-credentials
                                key: password
                  volumeMounts:
                      - name: backup-storage
                        mountPath: /backup
            volumes:
                - name: backup-storage
                  persistentVolumeClaim:
                      claimName: backup-pvc
            restartPolicy: Never

This Job:

  • Creates database backup
  • Stores backup in persistent volume
  • Uses secrets for credentials
  • Must complete within 10 minutes

Example 4: Image Processing Job

image-processing-job.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: image-processor
spec:
    completions: 100
    parallelism: 10
    template:
        spec:
            containers:
                - name: processor
                  image: imagemagick:latest
                  command:
                      - /bin/sh
                      - -c
                      - |
                          echo "Processing image..."
                          convert input.jpg -resize 800x600 output.jpg
                          echo "Image processed successfully"
                  resources:
                      requests:
                          memory: "512Mi"
                          cpu: "500m"
                      limits:
                          memory: "1Gi"
                          cpu: "1000m"
            restartPolicy: OnFailure

This Job:

  • Processes 100 images
  • Runs 10 processing tasks in parallel
  • Sets appropriate resource limits for image processing

Job Patterns

Pattern 1: Single Job with Multiple Attempts

For tasks that might fail but should retry:

spec:
    backoffLimit: 5
    template:
        spec:
            restartPolicy: Never

Pattern 2: Parallel Processing with Fixed Count

For processing a known number of items:

spec:
    completions: 100
    parallelism: 10

Pattern 3: Work Queue Pattern

For processing items from a queue:

spec:
    parallelism: 5
    # No completions specified

Pods coordinate through external queue (Redis, RabbitMQ, etc.)

Pattern 4: Time-Limited Job

For tasks that must complete within a time limit:

spec:
    activeDeadlineSeconds: 300
    backoffLimit: 3

Cleaning Up Jobs

Manual Cleanup

Delete a completed Job:

sudo kubectl delete job hello-job

This deletes the Job and its Pods.

Automatic Cleanup

Use TTL (Time To Live) to automatically clean up completed Jobs:

job-ttl.yml
apiVersion: batch/v1
kind: Job
metadata:
    name: cleanup-job
spec:
    ttlSecondsAfterFinished: 100
    template:
        spec:
            containers:
                - name: task
                  image: busybox:1.36
                  command: ["echo", "Task completed"]
            restartPolicy: Never

This Job:

  • Automatically deleted 100 seconds after completion
  • Applies to both successful and failed Jobs
  • Helps prevent Job accumulation
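The TTL controller's eligibility check is simple: a finished Job becomes deletable once the TTL has elapsed since it finished, regardless of whether it succeeded or failed. Sketched in Python (an illustration; the `ttl_expired` helper and timestamps are made up):

```python
def ttl_expired(finished_at, ttl_seconds, now):
    # Eligible for deletion once `ttl_seconds` have passed since the
    # Job finished (applies to both Complete and Failed Jobs).
    return now >= finished_at + ttl_seconds

print(ttl_expired(finished_at=1000, ttl_seconds=100, now=1099))  # False
print(ttl_expired(finished_at=1000, ttl_seconds=100, now=1100))  # True
```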

Cleanup Policy

Control when Jobs are cleaned up:

spec:
    ttlSecondsAfterFinished: 0  # Delete immediately after completion

Or keep failed Jobs for debugging:

spec:
    ttlSecondsAfterFinished: 86400  # Keep for 24 hours

Monitoring Jobs

Check Job Status

sudo kubectl get jobs

Watch Job Progress

sudo kubectl get jobs -w

View Job Pods

sudo kubectl get pods --selector=job-name=hello-job

Check Job Logs

# Get Pod name from Job
POD_NAME=$(sudo kubectl get pods --selector=job-name=hello-job -o jsonpath='{.items[0].metadata.name}')

# View logs
sudo kubectl logs $POD_NAME

Monitor Job Events

sudo kubectl get events --sort-by='.lastTimestamp' | grep Job

Common Mistakes and Pitfalls

Mistake 1: Using Wrong Restart Policy

Problem: Using restartPolicy: Always for Jobs.

Solution: Use Never or OnFailure:

spec:
    template:
        spec:
            restartPolicy: Never  # or OnFailure

Mistake 2: Not Setting Backoff Limit

Problem: Job retries indefinitely on failure.

Solution: Set appropriate backoffLimit:

spec:
    backoffLimit: 3

Mistake 3: No Time Limit

Problem: Job runs forever if task hangs.

Solution: Set activeDeadlineSeconds:

spec:
    activeDeadlineSeconds: 300

Mistake 4: Not Cleaning Up Completed Jobs

Problem: Accumulation of completed Jobs and Pods.

Solution: Use TTL for automatic cleanup:

spec:
    ttlSecondsAfterFinished: 100

Mistake 5: Incorrect Parallelism Configuration

Problem: Setting parallelism higher than completions. Kubernetes caps the number of running Pods at the remaining completions, so the excess parallelism is simply never used.

Solution: Keep parallelism <= completions:

spec:
    completions: 10
    parallelism: 5  # Not more than completions

Mistake 6: Not Setting Resource Limits

Problem: Job Pods consume excessive resources.

Solution: Always set resource limits:

resources:
    requests:
        memory: "256Mi"
        cpu: "250m"
    limits:
        memory: "512Mi"
        cpu: "500m"

Best Practices

Set Appropriate Backoff Limit

Prevent infinite retries:

spec:
    backoffLimit: 3

Use Active Deadline

Prevent Jobs from running too long:

spec:
    activeDeadlineSeconds: 600

Enable Automatic Cleanup

Use TTL to clean up completed Jobs:

spec:
    ttlSecondsAfterFinished: 100

Set Resource Limits

Prevent resource exhaustion:

resources:
    requests:
        memory: "256Mi"
        cpu: "250m"
    limits:
        memory: "512Mi"
        cpu: "500m"

Use Labels for Organization

Add meaningful labels:

metadata:
    labels:
        app: data-processor
        task: batch-import
        environment: production

Choose Right Restart Policy

  • Use Never for debugging (keeps failed Pods)
  • Use OnFailure for efficiency (restarts in place)

Configure Parallelism Wisely

Balance speed and resource usage:

spec:
    completions: 100
    parallelism: 10  # Process 10 at a time

Use Secrets for Sensitive Data

Never hardcode credentials:

env:
    - name: DB_PASSWORD
      valueFrom:
          secretKeyRef:
              name: db-credentials
              key: password

Job vs CronJob

Jobs run once, but what if you need to run them on a schedule?

Job - Runs once:

apiVersion: batch/v1
kind: Job

CronJob - Runs on a schedule:

apiVersion: batch/v1
kind: CronJob
metadata:
    name: scheduled-backup
spec:
    schedule: "0 2 * * *"  # Every day at 2 AM
    jobTemplate:
        spec:
            template:
                spec:
                    containers:
                        - name: backup
                          image: backup-tool:latest
                    restartPolicy: Never

CronJob creates Jobs on a schedule. We'll cover CronJob in detail in the next episode.

Conclusion

In episode 14, we've explored Job in Kubernetes in depth. We've learned what Job is, how it differs from other controllers, and how to use it for batch processing and one-time tasks.

Key takeaways:

  • Job runs Pods to completion, not continuously
  • Automatically handles retries with backoffLimit
  • Supports parallel execution with parallelism and completions
  • Two restart policies: Never (creates new Pod) or OnFailure (restarts in place)
  • Use activeDeadlineSeconds to time-limit Jobs
  • Use ttlSecondsAfterFinished for automatic cleanup
  • Perfect for batch processing, migrations, backups, and one-time tasks
  • Different from ReplicaSet/DaemonSet which keep Pods running
  • Can run single or multiple Pods in parallel
  • Always set resource limits and backoff limits

Job is essential for running batch workloads, data processing, and one-time tasks in Kubernetes. By understanding Job, you can effectively manage tasks that need to run to completion, handle failures gracefully, and clean up resources automatically.

Are you getting a clearer understanding of Job in Kubernetes? In the next episode 15, we'll discuss CronJob, which builds on Job to provide scheduled, recurring task execution. Keep your learning momentum going and look forward to the next episode!

