
Week 9 - Namespaces

9.1 Conceptual Core

  • A namespace is a kernel mechanism that gives a process a private view of a global resource. Eight types exist:
  • mnt - mount tree.
  • pid - PID space; PID 1 inside is special: signals for which it has installed no handler are ignored when sent from inside the namespace, and when it exits the kernel kills every other process in the namespace. Unlike the host's init, it is not exempt from the OOM killer.
  • net - network stack: interfaces, routing tables, sockets, netfilter (iptables/nftables) rules.
  • uts - hostname, domainname.
  • ipc - System V IPC, POSIX message queues.
  • user - UID/GID mappings and the scope of capabilities; the security-relevant namespace.
  • cgroup - view of the cgroup hierarchy.
  • time - offsets for CLOCK_MONOTONIC and CLOCK_BOOTTIME (added in Linux 5.6; container runtimes rarely use it).
  • Namespaces are the primitive that containers are built on. Docker / containerd / runc use them; you can use them directly with unshare(2), clone(2) flags, and setns(2).
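
The "private view" idea can be seen with the smallest namespace, uts. The following is a minimal sketch, not a reference implementation: a forked child moves into new user and UTS namespaces, changes its hostname, and the parent checks that its own hostname is untouched. The function name uts_private_view_demo and the hostname "week9-demo" are made up for illustration. It invokes unshare(2) through syscall(2); glibc's unshare() wrapper is equivalent. It assumes unprivileged user namespaces are permitted and reports 0 when they are not.

```c
#define _GNU_SOURCE            /* syscall() and sethostname() under strict -std= modes */
#include <linux/sched.h>       /* CLONE_NEWUSER, CLONE_NEWUTS */
#include <string.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child saw its private hostname while the parent's stayed
 * unchanged, 0 if the namespaces could not be used here, -1 on a mismatch. */
int uts_private_view_demo(void)
{
    char before[256] = {0}, after[256] = {0};
    if (gethostname(before, sizeof before - 1) != 0)
        return 0;

    pid_t child = fork();
    if (child < 0)
        return 0;
    if (child == 0) {
        const char name[] = "week9-demo";   /* hypothetical hostname */
        char seen[256] = {0};
        /* New user namespace first, so the child holds CAP_SYS_ADMIN over
         * the new UTS namespace without being root on the host. */
        if (syscall(SYS_unshare, CLONE_NEWUSER | CLONE_NEWUTS) != 0)
            _exit(2);                       /* disabled or restricted */
        if (sethostname(name, strlen(name)) != 0)
            _exit(2);
        if (gethostname(seen, sizeof seen - 1) != 0)
            _exit(1);
        _exit(strcmp(seen, name) == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(child, &status, 0) < 0 || !WIFEXITED(status))
        return 0;
    if (WEXITSTATUS(status) == 2)
        return 0;
    if (WEXITSTATUS(status) != 0)
        return -1;
    if (gethostname(after, sizeof after - 1) != 0)
        return 0;
    return strcmp(before, after) == 0 ? 1 : -1;
}
```

Call it from main() and print the result; a return value of 0 usually means the host restricts unprivileged user namespaces.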

9.2 Mechanical Detail

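  • unshare --user --pid --net --mount --uts --ipc --cgroup --fork --map-root-user bash puts a shell "inside" seven new namespaces at once. The new mount namespace starts as a copy of the parent's mount tree, so every host mount is still visible; unshare(1) sets propagation to private by default (--propagation private), while a program calling unshare(2) directly has to do the equivalent of mount --make-rprivate / itself. /proc also keeps showing the parent's PID view until it is remounted (add --mount-proc).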
  • lsns -t <type> enumerates active namespaces.
  • /proc/<pid>/ns/{mnt,pid,net,...} are handles identified by inode number; open one and pass the descriptor to setns(2), or let nsenter --target <pid> --all do that for every type.
  • The pivot_root(2) syscall replaces the current root mount with a new one; this is how a container leaves the host's / behind.
  • User namespaces allow unprivileged users to "be root" inside the namespace; this is the foundation of rootless containers.
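
The /proc/<pid>/ns handles can also be read from C. The sketch below, with the hypothetical helper names ns_identity and same_namespace, reads the link target (which looks like uts:[4026531838]) and compares two processes; equal targets mean the same namespace. It assumes /proc is mounted and that you have permission to inspect the target process.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Copy the target of /proc/<pid>/ns/<type> into buf, NUL-terminated.
 * Returns the length written, or -1 if the link cannot be read. */
long ns_identity(pid_t pid, const char *type, char *buf, size_t len)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/ns/%s", (long)pid, type);
    if (n < 0 || (size_t)n >= sizeof path || len == 0)
        return -1;
    ssize_t got = readlink(path, buf, len - 1);
    if (got < 0)
        return -1;
    buf[got] = '\0';
    return (long)got;
}

/* 1 if both processes share the namespace of this type, 0 if they do not,
 * -1 if either link is unreadable (no /proc, no permission, process gone). */
int same_namespace(pid_t a, pid_t b, const char *type)
{
    char ia[64], ib[64];
    if (ns_identity(a, type, ia, sizeof ia) < 0 ||
        ns_identity(b, type, ib, sizeof ib) < 0)
        return -1;
    return strcmp(ia, ib) == 0;
}
```

For example, same_namespace(getpid(), 1, "net") tells you whether the current process shares a network namespace with PID 1, provided PID 1's links are readable.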

9.3 Lab: "Hand-Built Container"

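Write a C program that:

  1. Calls clone() with CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWCGROUP.
  2. Configures UID/GID mappings via /proc/<pid>/uid_map and /proc/<pid>/gid_map (an unprivileged parent must write "deny" to /proc/<pid>/setgroups before it writes gid_map).
  3. Creates a veth pair to give the namespace network access (the host end needs CAP_NET_ADMIN in the host's network namespace).
  4. Calls pivot_root() to switch into a minimal Alpine rootfs.
  5. Calls execve() on /bin/sh.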
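
The UID/GID mapping step is the one that most often fails silently, so here is a hedged sketch of it. The helper names format_id_map, write_once, and map_root are invented for this example, and it maps only a single ID (root inside to your own UID/GID outside), which is what an unprivileged parent is allowed to do. It assumes the parent calls map_root() after clone() and that the child waits (for instance on a pipe) until the maps are written.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Format one mapping line: "<id inside> <id outside> <count>\n".
 * Returns the line length, or -1 if buf is too small. */
int format_id_map(char *buf, size_t len,
                  unsigned inside, unsigned outside, unsigned count)
{
    int n = snprintf(buf, len, "%u %u %u\n", inside, outside, count);
    return (n > 0 && (size_t)n < len) ? n : -1;
}

/* Write text with a single write(2); each map file accepts one write only. */
static int write_once(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t want = (ssize_t)strlen(text);
    ssize_t got = write(fd, text, (size_t)want);
    close(fd);
    return got == want ? 0 : -1;
}

/* Map UID 0 / GID 0 inside the child's user namespace to uid / gid outside. */
int map_root(pid_t child, uid_t uid, gid_t gid)
{
    char path[64], line[64];

    /* Required before an unprivileged process may write gid_map. */
    snprintf(path, sizeof path, "/proc/%ld/setgroups", (long)child);
    if (write_once(path, "deny") != 0)
        return -1;

    snprintf(path, sizeof path, "/proc/%ld/uid_map", (long)child);
    if (format_id_map(line, sizeof line, 0, (unsigned)uid, 1) < 0 ||
        write_once(path, line) != 0)
        return -1;

    snprintf(path, sizeof path, "/proc/%ld/gid_map", (long)child);
    if (format_id_map(line, sizeof line, 0, (unsigned)gid, 1) < 0 ||
        write_once(path, line) != 0)
        return -1;
    return 0;
}
```

In the parent this would typically be called as map_root(child_pid, getuid(), getgid()). Mapping ranges of IDs needs the setuid helpers newuidmap/newgidmap or real root, which this sketch does not cover.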

You should now have a working terminal "inside" a "container" that you wrote in ~150 lines of C.
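
Step 4 of the lab is the other place where details matter. The sketch below follows the pivot_root(".", ".") sequence described in the pivot_root(2) manual page; the function name enter_rootfs is hypothetical. It assumes the caller is already inside a new mount namespace (CLONE_NEWNS) with CAP_SYS_ADMIN there, and that new_root points at an unpacked rootfs directory. glibc provides no pivot_root() wrapper, hence the raw syscall.

```c
#define _GNU_SOURCE            /* syscall() under strict -std= modes */
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Switch the calling process's root to new_root and detach the old root.
 * Must run inside a private mount namespace. Returns 0 on success, -1 on error. */
int enter_rootfs(const char *new_root)
{
    struct stat st;

    /* Refuse early, before touching any mount, if new_root is not a directory. */
    if (stat(new_root, &st) != 0 || !S_ISDIR(st.st_mode))
        return -1;

    /* Keep mount events from propagating back to the parent namespace. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0)
        return -1;

    /* pivot_root requires new_root to be a mount point: bind it onto itself. */
    if (mount(new_root, new_root, NULL, MS_BIND | MS_REC, NULL) != 0)
        return -1;

    if (chdir(new_root) != 0)
        return -1;

    /* Stacks the old root on top of the new root at "/". */
    if (syscall(SYS_pivot_root, ".", ".") != 0)
        return -1;

    /* "." now resolves to the old root stacked on top; lazily detach it. */
    if (umount2(".", MNT_DETACH) != 0)
        return -1;

    return chdir("/");
}
```

After this returns 0, mount a fresh /proc inside the new root before calling execve(), otherwise tools such as ps will not work in the container.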

9.4 Hardening Drill

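  • Check the value of kernel.unprivileged_userns_clone. This sysctl comes from a distribution patch (Debian, Ubuntu, and some others) and does not exist on upstream kernels, where user.max_user_namespaces is the available control. A value of 1 means unprivileged users may create user namespaces, which rootless containers need; set it to 0 if nothing on the host requires it. Read the CVE history for user namespaces: many privilege-escalation CVEs over the years are namespace-related; learn the attack surface.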
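
The check can be scripted from C as well. This is a rough sketch with hypothetical function names (read_sysctl_long, unprivileged_userns_allowed); it looks only at the two sysctls named above and ignores LSM policy (AppArmor, SELinux) and seccomp filters, which can block user namespaces independently.

```c
#include <stdio.h>

/* Read one integer from a /proc/sys file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = fscanf(f, "%ld", out) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* 1 = the sysctls allow unprivileged user namespaces, 0 = they block them,
 * -1 = neither sysctl could be read. */
int unprivileged_userns_allowed(void)
{
    long max_ns, distro_knob;
    int have_max = read_sysctl_long("/proc/sys/user/max_user_namespaces",
                                    &max_ns) == 0;
    int have_knob = read_sysctl_long("/proc/sys/kernel/unprivileged_userns_clone",
                                     &distro_knob) == 0;

    if (have_max && max_ns == 0)
        return 0;                 /* no user namespaces for anyone */
    if (have_knob)
        return distro_knob != 0;  /* distribution-specific switch */
    return have_max ? 1 : -1;
}
```

A result of 1 is necessary but not sufficient: the only definitive test is to attempt unshare(CLONE_NEWUSER) as the unprivileged user in question.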

9.5 Performance Tuning Slice

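  • bpftrace -e 'tracepoint:syscalls:sys_enter_setns { @[comm] = count(); }' - see which processes are switching namespaces. Useful for debugging container-runtime activity. The syscall tracepoint is used because the kprobe target name is architecture-specific (__x64_sys_setns on x86-64).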
