Master cgroups and namespaces, the kernel technologies that power Docker, Kubernetes, and modern containerization. Essential for DevOps and cloud engineers.

In Episode 1, we explored Linux fundamentals and history. Now we're diving deeper into the kernel—the core of Linux that makes modern containerization possible.
If you've ever wondered how Docker containers can run isolated processes on the same machine, or how Kubernetes manages thousands of containers without them interfering with each other, the answer lies in two powerful kernel technologies: namespaces and cgroups.
These aren't just academic concepts. They're the foundation of Docker containers, Kubernetes pods, systemd resource management, and virtually every modern container platform.
Understanding namespaces and cgroups is essential for anyone working with containers, cloud platforms, or infrastructure. In this episode, we'll demystify these technologies and show you how they work under the hood.
By the end, you'll understand how the kernel isolates processes, limits resources, and enables the containerized world we live in today.
The Linux kernel is the core software that manages all hardware resources and mediates access between applications and hardware. Its primary responsibilities include:
- Process management and scheduling
- Memory management (virtual memory, paging, swap)
- Device drivers and hardware abstraction
- Filesystems and storage
- Networking
- Security and access control
The kernel runs in a privileged mode called "kernel space" and protects itself from user applications.
Linux divides memory and execution into two distinct spaces:
- Kernel space: where the kernel runs, with unrestricted access to hardware and all memory
- User space: where applications run, with restricted privileges and access only to their own virtual memory

This separation is crucial for system stability and security. Applications can't directly access hardware; they must ask the kernel to do it on their behalf.
When a user application needs to do something privileged (like reading a file or allocating memory), it makes a system call. This is the interface between user space and kernel space.
Common system calls include:
- open(): Open a file
- read(): Read from a file descriptor
- write(): Write to a file descriptor
- fork(): Create a new process
- exec(): Execute a program
- exit(): Terminate a process
- mmap(): Map memory
- socket(): Create a network socket

When a system call is made, the CPU switches from user mode to kernel mode, the kernel performs the operation, and then switches back to user mode. This context switching has a performance cost, which is why minimizing system calls is important for performance-critical code.
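You can observe this boundary from the shell. The file /proc/&lt;pid&gt;/syscall reports which system call a process is currently executing or blocked in; the sketch below (the numeric syscall IDs are architecture-specific) catches a sleeping process inside its sleep syscall:

```shell
# Start a process that will block inside a system call
sleep 30 &
pid=$!

# Give it a moment to enter the kernel, then inspect it.
# The first field is the number of the syscall the process is blocked in
# (clock_nanosleep on x86_64); the rest are its arguments and registers.
sleep 1
cat "/proc/$pid/syscall"

kill "$pid"
```

The same file shows "running" for a process currently executing in user space, which is itself a nice illustration of the user/kernel split.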
A process is a running instance of a program. Each process has:
- A unique process ID (PID)
- Its own virtual address space
- Open file descriptors
- Environment variables and a working directory
- A user and group identity (credentials)
Processes are isolated from each other. One process crashing doesn't affect others (in most cases).
A process goes through several states during its lifetime:
Running → Waiting → Stopped → Zombie → Terminated

The kernel's process scheduler decides which process runs on the CPU at any given time. On a multi-core system, multiple processes can run simultaneously (one per core).
The scheduler uses:
- Time slices: each process gets a short window of CPU time before it can be preempted
- Priorities: higher-priority processes (nice values, real-time classes) are scheduled more often
- Fairness: the kernel's default scheduler tries to give every runnable process a fair share of CPU time
This is why a single-core system can run thousands of processes—they're rapidly switching between them.
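You can watch this rapid switching happen via the per-process counters the kernel exposes in /proc (a quick sketch, no special privileges required):

```shell
# voluntary_ctxt_switches: times the process gave up the CPU (e.g. blocked on I/O)
# nonvoluntary_ctxt_switches: times the scheduler preempted it
grep ctxt_switches /proc/self/status

# Burn a little CPU, then look again: the counters only ever increase
i=0; while [ "$i" -lt 100000 ]; do i=$((i + 1)); done
grep ctxt_switches /proc/self/status
```

A mostly idle shell accumulates voluntary switches; a CPU-bound loop starts collecting nonvoluntary ones as the scheduler preempts it.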
Processes form a tree hierarchy:
```
init (PID 1)
├── systemd-journal
├── systemd-logind
├── sshd
│   └── bash (user session)
│       └── vim
└── nginx
    ├── nginx (worker)
    └── nginx (worker)
```

The first process is init (PID 1), which is the parent, directly or indirectly, of all other processes. Modern Linux systems use systemd as the init system, which manages services, dependencies, and system startup.
When a parent process terminates, its children become orphans and are adopted by init (or a designated subreaper). Init reaps them when they exit, which prevents zombie processes from accumulating.
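You can verify the parent-child relationship directly: every process records its parent's PID, visible in the PPid field of /proc/&lt;pid&gt;/status. A small sketch:

```shell
# Start a child process in the background
sleep 10 &
child=$!

# The child's PPid field should match this shell's PID ($$)
grep '^PPid' "/proc/$child/status"
echo "this shell is PID $$"

kill "$child"
```

If the shell exited while the sleep was still running, the orphaned sleep's PPid would change to 1 (or to the nearest subreaper), showing adoption in action.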
Every process has its own virtual address space. This is a key Linux feature that provides:
- Isolation: one process can't read or corrupt another process's memory
- Abstraction: each process sees a large, contiguous address space of its own
- Flexibility: the kernel can move, share, or swap pages without the process noticing
Virtual addresses are mapped to physical memory by the Memory Management Unit (MMU). A process thinks it has access to a large, contiguous memory space, but the kernel maps it to fragmented physical memory.
When a process needs memory:
1. It requests memory from the kernel (e.g. malloc() uses brk() or mmap() underneath)
2. The kernel reserves virtual address space but doesn't necessarily back it with physical pages yet
3. On first access, a page fault occurs and the kernel maps in a physical page

This lazy allocation, combined with overcommit, allows systems to run processes that collectively reserve more memory than is physically available.
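The gap between virtual and physical memory is easy to see: in /proc/&lt;pid&gt;/status, VmSize is the address space a process has mapped, while VmRSS is what's actually resident in RAM. A quick look at the current process:

```shell
# VmSize: total virtual address space mapped by the process
# VmRSS:  the portion currently backed by physical RAM
# VmSize is typically much larger than VmRSS thanks to lazy allocation
grep -E '^(VmSize|VmRSS)' /proc/self/status
```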
Swap is disk space used as an extension of RAM. When physical memory is full:
1. The kernel picks inactive (least recently used) pages
2. It writes them out to the swap area on disk, freeing physical RAM
3. When a swapped-out page is accessed again, it's read back into RAM (a major page fault)
However, swap is much slower than RAM (disk I/O is ~1000x slower). Excessive swapping causes severe performance degradation. Modern systems try to minimize swapping through better memory management and cgroup limits.
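You can check swap configuration and usage from /proc on any Linux system:

```shell
# Total, free, and cached swap, from the kernel's memory statistics
grep -i '^Swap' /proc/meminfo

# How aggressively the kernel prefers swapping over reclaiming caches
# (0 = avoid swap, higher values = swap more readily)
cat /proc/sys/vm/swappiness
```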
Namespaces are a kernel feature that partitions system resources so that processes can have isolated views of the system. Instead of all processes seeing the same system resources, each namespace provides a separate view.
Think of namespaces like virtual worlds. Multiple processes can exist in different namespaces, each seeing a different version of the system. A process in one namespace can't see or interact with resources in another namespace.
Namespaces are the foundation of container isolation. Docker and Kubernetes use namespaces to create isolated environments for containers.
Linux provides several types of namespaces, each isolating different system resources:
PID namespace: Isolates process IDs. Each PID namespace has its own process tree with its own PID 1 (init process).
Use case: Containers see their own process tree, not the host's processes.
Example: A container's init process has PID 1 inside the container, but might be PID 12345 on the host.
Network namespace: Isolates network resources: network interfaces, IP addresses, routing tables, firewall rules.
Use case: Each container has its own network stack.
Example: A container can have its own IP address, ports, and network configuration separate from the host.
Mount namespace: Isolates the set of filesystem mount points. Each namespace can have a different view of the filesystem hierarchy.
Use case: Containers have their own root filesystem
Example: A container's / points to a container image, not the host's root filesystem
IPC namespace: Isolates Inter-Process Communication resources: message queues, shared memory, semaphores.
Use case: Processes in different IPC namespaces can't communicate via IPC.
Example: Two containers can't share memory segments or message queues.
UTS namespace: Isolates the hostname and domain name (UTS stands for Unix Time-Sharing System, a historical name).
Use case: Each container can have its own hostname.
Example: A container can have hostname "web-server" while the host is "production-01".
User namespace: Isolates user and group IDs. A process can be root (UID 0) inside a user namespace but a regular user on the host.
Use case: Containers can run as root inside but be unprivileged on the host.
Example: Container root (UID 0 in the namespace) maps to UID 1000 on the host.
Cgroup namespace: Isolates the view of the cgroup hierarchy (we'll cover cgroups next).
Use case: Processes see a simplified cgroup view
Example: A container sees its cgroup as / instead of /docker/container-id
Namespaces work by providing separate views of system resources. When a process is created in a new namespace:
- It gets its own instance of that resource: its own PID numbering, network stack, mount table, hostname, and so on
- Changes it makes (new mounts, new interfaces, new child processes) are visible only inside the namespace
- It can't see or interact with resources that exist only in other namespaces
This is how Docker containers can:
- Each run their own PID 1 without conflicting with the host's init
- Each bind the same port (say, 80) without collisions
- Each mount their own root filesystem without touching the host's
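A process's namespace membership is visible in /proc: each entry under /proc/&lt;pid&gt;/ns/ is a symlink whose target contains an inode number, and two processes share a namespace exactly when those inode numbers match. For example:

```shell
# All namespaces the current shell belongs to
ls -l /proc/self/ns/

# The PID namespace identifier, e.g. pid:[4026531836]
readlink /proc/self/ns/pid
```

On a normal host every process shows the same inode numbers; inside a container, most of them differ from the host's.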
Cgroups (control groups) are a kernel feature that limits, prioritizes, and isolates resource usage of process groups. While namespaces provide isolation (you can't see other resources), cgroups provide resource limits (you can't use more than allowed).
Cgroups allow you to:
- Limit how much CPU, memory, and disk I/O a group of processes can use
- Prioritize some groups of processes over others
- Account for per-group resource usage
- Freeze and resume whole groups of processes
Without cgroups, a single process could consume all CPU or memory, starving other processes. Cgroups prevent this.
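Every process already belongs to a cgroup, and you can see which one via /proc. On a cgroups v2 system the output is a single line of the form 0::/&lt;path&gt;:

```shell
# The cgroup(s) the current process belongs to
# (one line on cgroups v2; one line per controller hierarchy on v1)
cat /proc/self/cgroup
```

For a Docker container, this path typically points under a docker-specific cgroup, which is how per-container limits get applied.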
Cgroups v1 (legacy): Each resource controller (cpu, memory, blkio, and so on) has its own independent hierarchy mounted under /sys/fs/cgroup/&lt;resource&gt;/. Flexible, but the separate hierarchies are inconsistent and hard to manage.
Cgroups v2 (modern): A single unified hierarchy at /sys/fs/cgroup/ with all controllers attached, consistent interface files (cpu.max, memory.max, io.max), and better accounting.
Most modern distributions default to cgroups v2, but v1 is still common in production.
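You can check which version your system uses by looking at the filesystem type mounted at /sys/fs/cgroup (assuming the standard mount point):

```shell
# cgroup2fs -> cgroups v2 (unified hierarchy)
# tmpfs     -> cgroups v1 (per-controller hierarchies mounted beneath it)
stat -fc %T /sys/fs/cgroup/
```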
Control how much CPU time a process group can use:

```
cpu.max = "50000 100000"   # 50,000 us of CPU per 100,000 us period (50% of one core)
cpu.weight = 100           # CPU scheduling weight (1-10000)
```

Control how much memory a process group can use:

```
memory.max = "512M"        # Hard limit: 512 MB
memory.high = "256M"       # Soft limit: triggers reclaim at 256 MB
memory.swap.max = "0"      # Disable swap for this cgroup
```

Control disk I/O bandwidth:

```
io.max = "8:0 rbps=10485760 wbps=10485760"  # 10 MB/s read and write on device 8:0
```

Control which devices a process can access (the device controller shown here is cgroups v1; v2 handles device access via attached BPF programs instead):

```
devices.allow = "c 1:3 rw"   # Allow /dev/null (character device 1:3)
devices.deny = "b 8:* rwm"   # Deny all block devices
```

Cgroups form a tree hierarchy. Each cgroup can have child cgroups, and resource limits are inherited and enforced at each level:
```
/
├── system.slice
│   ├── systemd-logind.service
│   └── sshd.service
├── user.slice
│   └── user-1000.slice
│       └── session-1.scope
└── docker
    ├── container-1
    │   └── memory.max = 512M
    └── container-2
        └── memory.max = 1G
```

A process's effective limits are determined by all cgroups on its path from root to leaf; a child can never use more than any of its ancestors allow.
Docker combines namespaces and cgroups to create isolated, resource-limited containers:
When you run docker run, Docker (through its low-level runtime, typically runc):
1. Creates a new set of namespaces for the container (PID, network, mount, UTS, IPC, and optionally user)
2. Creates a cgroup and applies any CPU, memory, and I/O limits you specified
3. Sets up the container's root filesystem from the image layers
4. Starts the container's entrypoint process inside those namespaces and that cgroup
The container process thinks it's running on its own machine, but it's actually sharing the host kernel with other containers.
Let's trace what happens when you run a container:

```shell
docker run --name web --cpus 1 --memory 512m nginx
```

Docker creates fresh namespaces for the nginx process and places it in a cgroup limited to one CPU and 512 MB of memory. Inside the container, nginx sees / as the nginx image root, not the host's root; it runs as PID 1 in its own PID namespace; and it gets its own network stack. The container is completely isolated from other containers and the host, yet they all share the same kernel.
When you specify resource limits in Docker or Kubernetes, you're setting cgroup limits:
```shell
# Docker: Limit to 2 CPUs and 1 GB memory
docker run --cpus 2 --memory 1g myapp
```

```yaml
# Kubernetes: Set resource requests and limits
resources:
  requests:
    cpu: "500m"
    memory: "256Mi"
  limits:
    cpu: "1000m"
    memory: "512Mi"
```

These limits are enforced by cgroups. If a container tries to exceed its memory limit, the kernel kills a process inside it (an OOM kill). If it tries to use more CPU than allocated, it's throttled.
You can inspect namespaces on your system:
```shell
# List all namespaces for a process
ls -la /proc/1/ns/

# Show namespace IDs
readlink /proc/1/ns/*

# Compare namespaces between processes
diff <(readlink /proc/1/ns/*) <(readlink /proc/self/ns/*)

# List all namespaces on the system (from util-linux)
lsns | head -5
```

You can create a process in a new namespace using unshare:
```shell
# Create a new PID namespace (remount /proc so ps reflects it)
sudo unshare --pid --fork --mount-proc /bin/bash

# Inside the new namespace, bash is PID 1
ps aux

# Create a new network namespace
sudo unshare --net /bin/bash

# Inside, only an isolated loopback interface exists
ip link show
```

You can set cgroup limits using systemd-run:
```shell
# Run a process with a CPU limit (50% of one core)
systemd-run --scope -p CPUQuota=50% stress-ng --cpu 1

# Run a process with a memory limit (256 MB); MemoryMax is the
# cgroups v2 property (MemoryLimit is the deprecated v1 name)
systemd-run --scope -p MemoryMax=256M myapp

# Run with both limits
systemd-run --scope -p CPUQuota=50% -p MemoryMax=512M myapp
```

Monitor resource usage of cgroups:
```shell
# View cgroup v2 memory usage (memory.* files exist on non-root cgroups,
# e.g. a systemd slice or a container's cgroup)
cat /sys/fs/cgroup/system.slice/memory.current
cat /sys/fs/cgroup/system.slice/memory.max

# View CPU usage
cat /sys/fs/cgroup/cpu.stat

# Monitor in real-time
watch -n 1 'cat /sys/fs/cgroup/system.slice/memory.current'

# For Docker containers
docker stats
```

Tip: The path to cgroup files differs between cgroups v1 and v2. Most modern systems use cgroups v2 at /sys/fs/cgroup/, while older systems use v1 at /sys/fs/cgroup/&lt;resource&gt;/.
Mistake: Setting memory limits too low, causing OOMKill
```yaml
# BAD: 128 Mi is too low for most applications
resources:
  limits:
    memory: "128Mi"
```

Why it happens: Underestimating application memory needs or trying to pack too many containers onto a node.
How to avoid it:
- Profile your application's memory usage under realistic load
- Use docker stats or Kubernetes metrics to understand usage
- Set limits with headroom above the observed peak

Mistake: Assuming a process in a container can access the host's network
Why it happens: Misunderstanding how network namespaces work
How to avoid it: Remember that containers have isolated network namespaces. To access the host network, use --network host in Docker or hostNetwork: true in Kubernetes.
Mistake: Not accounting for memory pressure and swap usage
Why it happens: Assuming memory limits are hard stops (they're not: pages can be swapped out, so a workload's footprint can extend beyond its RAM limit unless swap is also limited)
How to avoid it:
- Monitor swap usage alongside memory usage
- Limit or disable swap per cgroup (memory.swap.max = 0)

Mistake: Allowing unlimited swap, causing performance degradation
Why it happens: Default system configuration allows swap
How to avoid it:
- Disable swap on nodes running latency-sensitive workloads (Kubernetes has traditionally required it off)
- Set memory.swap.max = 0 for cgroups that must never swap
- Alert on sustained swap activity so you notice it early
Set appropriate resource requests and limits:
```yaml
# Kubernetes example
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
```

Requests are what the scheduler uses for placement. Limits are hard caps enforced by cgroups.
Disable swap for predictable performance:

```shell
# In Docker: setting --memory-swap equal to --memory disables
# swap for the container
docker run --memory 512m --memory-swap 512m myapp
```

In Kubernetes, swap is handled at the node level: by default the kubelet refuses to start with swap enabled, so pods don't need per-container swap settings.

Use resource quotas in Kubernetes:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
spec:
  hard:
    requests.cpu: "10"
    requests.memory: "20Gi"
    limits.cpu: "20"
    limits.memory: "40Gi"
```

Monitor cgroup metrics. Use tools like:
- docker stats: Real-time container metrics
- kubectl top: Kubernetes resource usage (requires metrics-server)

Set up alerts for:
- Containers approaching their memory limits
- Sustained CPU throttling
- Rising swap usage
Use user namespaces: Map container root to an unprivileged user on the host. Note that userns-remap is a Docker daemon setting, not a docker run flag:

```shell
# In /etc/docker/daemon.json: { "userns-remap": "default" }
# or start the daemon with:
dockerd --userns-remap=default
```

Restrict device access: Only allow necessary devices:
```yaml
securityContext:
  allowPrivilegeEscalation: false
  capabilities:
    drop:
      - ALL
```

Use read-only filesystems: Prevent the container from modifying its filesystem:

```yaml
securityContext:
  readOnlyRootFilesystem: true
```

Don't manually configure cgroups. Use Docker or Kubernetes instead:
```shell
# DON'T do this manually (cgroups v1 path shown)
echo "512M" > /sys/fs/cgroup/memory/myapp/memory.limit_in_bytes

# DO this instead
docker run --memory 512m myapp
```

Container orchestration tools handle cgroup configuration for you, with better abstractions and error handling.
In Kubernetes, you specify resource requests and limits, and Kubernetes manages cgroups:
```yaml
# Kubernetes handles the cgroup configuration
resources:
  requests:
    memory: "256Mi"
  limits:
    memory: "512Mi"
```

You don't need to know the cgroup details to use Kubernetes effectively. The orchestrator abstracts away the complexity.
To explore further:
- Run ls /proc/1/ns/ to see the namespaces on your system
- Run docker stats to see container resource usage
- Browse the kernel documentation in /usr/share/doc/linux-doc/ or at kernel.org

Understanding namespaces and cgroups is the key to mastering containerization. These concepts apply whether you're using Docker, Kubernetes, or any other container platform.
Ready for the next episode? Continue with Episode 3: Permissions, Users & Groups to master file permissions and user management, which are crucial for container security.