Week 9 - Namespaces
9.1 Conceptual Core
- A namespace is a kernel mechanism that gives a process a private view of a global resource. Eight types exist:
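    - `mnt` - mount tree.
    - `pid` - PID number space. PID 1 inside is special: orphaned processes in the namespace are reparented to it, other processes in the namespace can only send it signals it has installed a handler for (so `kill 1` from inside does nothing unless init handles SIGTERM), and when it exits the kernel kills every other process in the namespace. Unlike the host's init, it is not exempt from the OOM killer.
    - `net` - network stack: interfaces, routing tables, sockets, netfilter (iptables) rules.
    - `uts` - hostname and NIS domain name.
    - `ipc` - System V IPC objects and POSIX message queues.
    - `user` - UID/GID mappings and capabilities; the security-relevant namespace.
    - `cgroup` - view of the cgroup hierarchy (the process's cgroup at creation time appears as `/`).
    - `time` - offsets for the monotonic and boot-time clocks (added in Linux 5.6; container runtimes rarely use it).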
- Namespaces are the primitive containers are built on. Docker / containerd / runc use them; you can use them directly with `unshare(2)`, `clone(2)` flags, and `setns(2)`.
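A minimal sketch of `unshare(2)` in action, assuming Linux with unprivileged user namespaces enabled: a forked child creates new user and UTS namespaces in one call, renames its host, and the parent then checks that its own hostname is untouched. `uts_isolation_demo` and its return codes are hypothetical names chosen for illustration.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. Returns 1 if the child's new hostname stayed inside its
 * UTS namespace, 0 if the kernel refused to set up the namespaces (for example
 * because unprivileged user namespaces are disabled), -1 on any other failure. */
static int uts_isolation_demo(const char *new_name) {
    char before[256] = {0}, after[256] = {0};
    if (gethostname(before, sizeof before - 1) != 0) return -1;

    pid_t child = fork();
    if (child < 0) return -1;
    if (child == 0) {
        /* Creating the user namespace in the same call gives the child
         * CAP_SYS_ADMIN over the new UTS namespace, which sethostname(2)
         * requires; no root on the host is needed. */
        if (unshare(CLONE_NEWUSER | CLONE_NEWUTS) != 0) _exit(2);
        if (sethostname(new_name, strlen(new_name)) != 0) _exit(2);
        char inside[256] = {0};
        if (gethostname(inside, sizeof inside - 1) != 0) _exit(3);
        _exit(strcmp(inside, new_name) == 0 ? 0 : 3);
    }

    int status;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status)) return -1;
    if (WEXITSTATUS(status) == 2) return 0;  /* namespace setup refused */
    if (WEXITSTATUS(status) != 0) return -1;

    /* The rename should have happened only in the child's namespace. */
    if (gethostname(after, sizeof after - 1) != 0) return -1;
    return strcmp(before, after) == 0 ? 1 : -1;
}
```

Called as `uts_isolation_demo("ns-demo")` from a normal user account, it should return 1 on a distro that allows unprivileged user namespaces, and `hostname` in the calling shell stays the same afterwards.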
9.2 Mechanical Detail
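- `unshare --user --pid --net --mount --uts --ipc --cgroup --fork --map-root-user bash` gets you "inside" most namespaces in one shell. The new mount namespace starts as a copy of the parent's mount tree, so host mounts stay visible, and events from shared mounts keep propagating until you `mount --make-rslave` (or `--make-rprivate`) or remount. With raw `unshare(2)`/`clone(2)` you must do this yourself; `unshare(1)` applies `--propagation private` by default.
- `lsns -t <type>` enumerates active namespaces.
- `/proc/<pid>/ns/{mnt,pid,net,...}` are inode-numbered handles you can `setns(2)` into, for example via `nsenter --target <pid> --all`.
- The `pivot_root(2)` syscall replaces the current root with a new one; this is how containers leave the host's `/` behind.
- User namespaces allow unprivileged users to "be root" inside the namespace, the foundation of rootless containers.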
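The `/proc/<pid>/ns` handles can also be used directly from C. A minimal sketch, assuming Linux with `/proc` mounted; `same_namespace` and `join_namespace` are hypothetical helper names. The first compares two processes' namespace identity using the handle's inode (the number `lsns` reports), the second opens a handle and passes it to `setns(2)`, roughly one step of what `nsenter` does.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if processes a and b share the namespace of the given type
 * ("net", "mnt", ...), 0 if not, -1 if a handle cannot be stat()ed.
 * Identity is the (st_dev, st_ino) pair of the handle; st_ino is the
 * number lsns(8) shows in its NS column. */
static int same_namespace(pid_t a, pid_t b, const char *type) {
    char pa[64], pb[64];
    struct stat sa, sb;
    snprintf(pa, sizeof pa, "/proc/%d/ns/%s", (int)a, type);
    snprintf(pb, sizeof pb, "/proc/%d/ns/%s", (int)b, type);
    if (stat(pa, &sa) != 0 || stat(pb, &sb) != 0) return -1;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Move the calling thread into pid's namespace of the given type. Needs
 * CAP_SYS_ADMIN over the target namespace. For "pid", only children created
 * afterwards land in the target namespace. Returns 0 or -1 with errno set. */
static int join_namespace(pid_t pid, const char *type) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/ns/%s", (int)pid, type);
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0) return -1;
    int rc = setns(fd, 0); /* 0: accept whichever namespace type fd refers to */
    int saved = errno;     /* keep setns's errno across close() */
    close(fd);
    errno = saved;
    return rc;
}
```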
9.3 Lab: "Hand-Built Container"
Write a C program that:
1. clone()s with CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWCGROUP.
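2. Configures UID/GID mappings via /proc/<pid>/uid_map and gid_map (an unprivileged parent must write `deny` to /proc/<pid>/setgroups before gid_map will accept a mapping).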
3. Creates a veth pair to give the namespace network access.
4. pivot_roots into a minimal Alpine rootfs.
5. execves /bin/sh.
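Step 2 is where hand-built containers most often fail. A minimal sketch of the parent side, assuming the child was created with `CLONE_NEWUSER` and the parent is unprivileged, so it maps only its own effective UID/GID to root inside; `write_proc_file`, `format_id_map`, and `map_child_to_root` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write data to /proc/<pid>/<file> in a single write(2): the map files accept
 * exactly one write, so the whole mapping must go in one call. */
static int write_proc_file(pid_t pid, const char *file, const char *data) {
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/%s", (int)pid, file);
    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd < 0) return -1;
    size_t len = strlen(data);
    ssize_t n = write(fd, data, len);
    close(fd);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}

/* Format one mapping line: "<ID inside> <ID outside> <count>\n".
 * Returns -1 if buf is too small to hold the whole line. */
static int format_id_map(char *buf, size_t size, unsigned inside,
                         unsigned outside, unsigned count) {
    int n = snprintf(buf, size, "%u %u %u\n", inside, outside, count);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}

/* Map ID 0 inside the child's user namespace to the caller's own IDs, the
 * only one-line mapping an unprivileged parent may write. "deny" must reach
 * setgroups before gid_map will accept a mapping from such a parent. */
static int map_child_to_root(pid_t child, uid_t uid, gid_t gid) {
    char line[64];
    if (format_id_map(line, sizeof line, 0, (unsigned)uid, 1) != 0 ||
        write_proc_file(child, "uid_map", line) != 0)
        return -1;
    if (write_proc_file(child, "setgroups", "deny") != 0)
        return -1;
    if (format_id_map(line, sizeof line, 0, (unsigned)gid, 1) != 0 ||
        write_proc_file(child, "gid_map", line) != 0)
        return -1;
    return 0;
}
```

A typical call from the parent is `map_child_to_root(child_pid, geteuid(), getegid())`. The child should block (for example on a pipe) until the parent has written the maps; until then its UID inside shows up as the overflow ID 65534.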
You should now have a working terminal "inside" a "container" that you wrote in ~150 lines of C.
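Step 4 has ordering constraints that `pivot_root(2)` enforces. A minimal sketch, assuming it runs in the child after `CLONE_NEWNS` (with `CLONE_NEWUSER` supplying the needed capability) and that `new_root` is an unpacked Alpine rootfs directory; `do_pivot_root` and `enter_rootfs` are hypothetical names. It uses the `pivot_root(".", ".")` idiom described in the pivot_root(2) man page, so no separate put_old directory is needed.

```c
#define _GNU_SOURCE
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc provides no pivot_root() wrapper, so invoke the raw syscall. */
static int do_pivot_root(const char *new_root, const char *put_old) {
    return (int)syscall(SYS_pivot_root, new_root, put_old);
}

/* Make new_root the calling process's "/". Intended to run inside a fresh
 * mount namespace; returns 0 on success, -1 on the first failing step. */
static int enter_rootfs(const char *new_root) {
    struct stat st;
    /* Reject a missing or non-directory path before touching any mount. */
    if (stat(new_root, &st) != 0 || !S_ISDIR(st.st_mode)) return -1;

    /* pivot_root(2) fails with EINVAL if the relevant mounts are shared,
     * so stop propagation for the whole tree first. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) return -1;
    /* new_root must itself be a mount point: bind-mount it onto itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) != 0) return -1;
    if (chdir(new_root) != 0) return -1;

    /* pivot_root(".", ".") stacks the old root on top of the new one at ".",
     * so no separate put_old directory is needed... */
    if (do_pivot_root(".", ".") != 0) return -1;
    /* ...and lazily detaching "." removes the old root from view. */
    if (umount2(".", MNT_DETACH) != 0) return -1;
    return chdir("/");
}
```

After this returns, the child can mount a fresh `proc` on `/proc` so that tools like `ps` see the new PID namespace, then `execve("/bin/sh", ...)` for step 5.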
9.4 Hardening Drill
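- Verify whether `kernel.unprivileged_userns_clone=1` is set (on kernels that have this sysctl it usually is; it comes from a Debian/Ubuntu patch, while upstream kernels allow unprivileged user namespaces unless `user.max_user_namespaces` is `0`). Read the CVE history for `user` namespaces: many privilege-escalation CVEs over the years are namespace-related, so learn the attack surface.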
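A small sketch for the drill that reads both knobs, assuming standard Linux procfs paths; `read_sysctl_long` and `report_userns_sysctls` are hypothetical helpers. A missing file is reported rather than treated as an error, since `kernel.unprivileged_userns_clone` exists only on kernels carrying the Debian/Ubuntu patch.

```c
#define _GNU_SOURCE
#include <stdio.h>

/* Read one non-negative integer from a /proc/sys file (or any text file).
 * Returns -1 if the file is missing or does not start with such a number. */
static long read_sysctl_long(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    long value = -1;
    if (fscanf(f, "%ld", &value) != 1) value = -1;
    fclose(f);
    return value >= 0 ? value : -1;
}

/* Hypothetical drill report: prints the state of both user-namespace knobs. */
static void report_userns_sysctls(void) {
    long knob = read_sysctl_long("/proc/sys/kernel/unprivileged_userns_clone");
    long max_ns = read_sysctl_long("/proc/sys/user/max_user_namespaces");

    if (knob < 0)
        puts("kernel.unprivileged_userns_clone: absent or unreadable");
    else
        printf("kernel.unprivileged_userns_clone = %ld\n", knob);

    if (max_ns < 0)
        puts("user.max_user_namespaces: absent or unreadable");
    else
        printf("user.max_user_namespaces = %ld%s\n", max_ns,
               max_ns == 0 ? " (user namespace creation disabled)" : "");
}
```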
9.5 Performance Tuning Slice
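- `bpftrace -e 'tracepoint:syscalls:sys_enter_setns { @[comm] = count(); }'` - see who's switching namespaces (the syscall tracepoint works across architectures, unlike kprobe names such as `__x64_sys_setns`). Useful for debugging container-runtime activity.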