
Week 22 - Namespaces and Process Isolation

22.1 Conceptual Core

  • The runtime needs to clone (or fork+unshare) into the configured namespaces, set up UID/GID maps for user namespaces, configure UTS hostname, and pivot_root into the rootfs.
  • The classic two-process pattern: the parent clones a child with CLONE_NEWPID | CLONE_NEWNS | ...; the child blocks on a pipe while the parent writes the child's UID/GID maps; once the parent signals that setup is finished, the child performs its final setup (mount /proc, pivot_root) and execs the container process.
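The two-process pattern can be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions, not how any particular runtime does it: the sentinel string, the helper names (newInitCmd, runParent, runInit), the choice of namespaces, and the use of fd 3 for the sync pipe are all illustrative. It assumes Linux and enough privilege to create PID, mount, and UTS namespaces (typically root, unless a user namespace is added).

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"os"
	"os/exec"
	"syscall"
)

// initSentinel is a hypothetical argv[1] value that marks the re-exec'd child.
const initSentinel = "__container_init__"

// initCloneflags lists the namespaces the child is created in (illustrative).
const initCloneflags = syscall.CLONE_NEWPID | syscall.CLONE_NEWNS | syscall.CLONE_NEWUTS

// newInitCmd builds the child: this same binary, re-executed in new
// namespaces, with the read end of the sync pipe passed as fd 3.
func newInitCmd(syncR *os.File, argv []string) *exec.Cmd {
	cmd := exec.Command("/proc/self/exe", append([]string{initSentinel}, argv...)...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.ExtraFiles = []*os.File{syncR} // ExtraFiles[0] becomes fd 3 in the child
	cmd.SysProcAttr = &syscall.SysProcAttr{Cloneflags: initCloneflags}
	return cmd
}

// runParent starts the child, performs parent-side setup, then releases it.
func runParent(argv []string) error {
	syncR, syncW, err := os.Pipe()
	if err != nil {
		return err
	}
	cmd := newInitCmd(syncR, argv)
	err = cmd.Start()
	syncR.Close() // the child holds its own copy of the read end
	if err != nil {
		syncW.Close()
		return err
	}
	// Parent-side setup (for example writing UID/GID maps for
	// cmd.Process.Pid) would go here, while the child is still blocked.
	syncW.Close() // EOF on the pipe tells the child that setup is finished
	return cmd.Wait()
}

// runInit is the child side: wait for the parent, finish setup, then exec.
func runInit(argv []string) error {
	if len(argv) == 0 {
		return errors.New("init: no command given")
	}
	syncR := os.NewFile(3, "sync-pipe")
	if _, err := syncR.Read(make([]byte, 1)); err != nil && !errors.Is(err, io.EOF) {
		return err
	}
	syncR.Close()
	// Final setup (mount /proc, pivot_root) would go here.
	path, err := exec.LookPath(argv[0])
	if err != nil {
		return err
	}
	return syscall.Exec(path, argv, os.Environ())
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: prog <command> [args...]")
		return
	}
	var err error
	if os.Args[1] == initSentinel {
		err = runInit(os.Args[2:])
	} else {
		err = runParent(os.Args[1:])
	}
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

Closing the write end of the pipe is used as the "go ahead" signal here because it needs no message format; a runtime that has to report errors back from the child would likely want a richer protocol.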

22.2 Mechanical Detail

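  • Go has no direct clone wrapper: golang.org/x/sys/unix does not export a Clone function. Set syscall.SysProcAttr{Cloneflags: ...} on an exec.Cmd instead. Issuing the raw syscall with syscall.Syscall(syscall.SYS_CLONE, ...) is possible but hazardous, because the Go runtime is multithreaded and a raw-cloned child cannot safely keep running Go code.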
  • The runc "init" pattern: the binary re-execs itself with a sentinel argument that signals "I'm the container init." The first invocation does the parent-side setup; the re-exec'd process, already inside the new namespaces, performs pivot_root and the final execve. Read runc/libcontainer/standard_init_linux.go.
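  • UID/GID mapping: write the mappings to /proc/<child-pid>/uid_map and /proc/<child-pid>/gid_map. An unprivileged (non-root) parent must first write "deny" to /proc/<child-pid>/setgroups; otherwise the kernel rejects the write to gid_map.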
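  • pivot_root fails if the current root mount or the parent mount of the new root has shared propagation, so remount / with MS_REC|MS_PRIVATE inside the new mount namespace before the call. This also keeps the container's mounts from leaking back to the host. The new root must itself be a mount point; bind-mounting the rootfs onto itself satisfies that.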
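The pivot_root step can be sketched as below. This is a minimal sketch, assuming it runs inside a freshly created mount and PID namespace, that rootfs is an absolute path to a writable directory, and that the caller has the privileges mount(2) needs there. The names pivotInto and putOldDir and the ".pivot_old" directory are hypothetical. It uses the classic put_old variant and also mounts a fresh /proc for the new PID namespace.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"syscall"
)

// putOldDir returns the directory under the new root where pivot_root parks
// the old root. The ".pivot_old" name is an arbitrary choice.
func putOldDir(rootfs string) string {
	return filepath.Join(rootfs, ".pivot_old")
}

// pivotInto switches the root filesystem to rootfs.
// WARNING: call this only inside a new mount namespace; on the host it
// would change the host's mount propagation and root.
func pivotInto(rootfs string) error {
	// Stop mount events from propagating between this namespace and the host.
	if err := syscall.Mount("", "/", "", syscall.MS_REC|syscall.MS_PRIVATE, ""); err != nil {
		return fmt.Errorf("make / private: %w", err)
	}
	// pivot_root needs new_root to be a mount point: bind it onto itself.
	if err := syscall.Mount(rootfs, rootfs, "", syscall.MS_BIND|syscall.MS_REC, ""); err != nil {
		return fmt.Errorf("bind-mount rootfs: %w", err)
	}
	putOld := putOldDir(rootfs)
	if err := os.MkdirAll(putOld, 0o700); err != nil {
		return err
	}
	if err := syscall.PivotRoot(rootfs, putOld); err != nil {
		return fmt.Errorf("pivot_root: %w", err)
	}
	if err := os.Chdir("/"); err != nil {
		return err
	}
	// Mount a fresh procfs so /proc reflects the new PID namespace.
	if err := os.MkdirAll("/proc", 0o555); err != nil {
		return err
	}
	if err := syscall.Mount("proc", "/proc", "proc", 0, ""); err != nil {
		return fmt.Errorf("mount /proc: %w", err)
	}
	// Detach the old root so the host filesystem is no longer reachable.
	old := "/" + filepath.Base(putOld)
	if err := syscall.Unmount(old, syscall.MNT_DETACH); err != nil {
		return fmt.Errorf("unmount old root: %w", err)
	}
	return os.Remove(old)
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: pivot <absolute-rootfs>  (only inside a new mount namespace)")
		return
	}
	if err := pivotInto(os.Args[1]); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
}
```

In a real runtime this function would be called from the container init process just before the final exec, rather than from a standalone main.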

22.3 Lab: "Namespaces Working"

  1. Implement the parent/child fork-with-clone-flags. Verify lsns -p <pid> shows new namespaces.
  2. Implement pivot_root into the rootfs. Verify / inside the container is the bundle's rootfs/.
  3. Implement /proc mount inside the new PID namespace. Verify ps shows only the container's processes.
  4. Implement UID/GID mapping for user-namespaced runs.
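For step 4, the parent-side map writing might look like the sketch below. The IDMap type and the helpers formatIDMap, writeOnce, and writeIDMaps are hypothetical names; the sketch assumes the child was created with CLONE_NEWUSER and is still blocked waiting for the parent. Each map is written with a single write call, since the kernel accepts only one write per map file.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// IDMap is one line of uid_map or gid_map (an illustrative type).
type IDMap struct {
	ContainerID int // first ID inside the user namespace
	HostID      int // first ID outside the user namespace
	Size        int // number of consecutive IDs mapped
}

// formatIDMap renders mappings in the kernel's "inside outside count" format.
func formatIDMap(maps []IDMap) string {
	var b strings.Builder
	for _, m := range maps {
		fmt.Fprintf(&b, "%d %d %d\n", m.ContainerID, m.HostID, m.Size)
	}
	return b.String()
}

// writeOnce writes data to an existing procfs file in a single write call.
func writeOnce(path, data string) error {
	f, err := os.OpenFile(path, os.O_WRONLY, 0)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write([]byte(data))
	return err
}

// writeIDMaps is run by the parent while the child waits on the sync pipe.
// Set privileged to false when the parent is not root on the host.
func writeIDMaps(pid int, uids, gids []IDMap, privileged bool) error {
	base := fmt.Sprintf("/proc/%d", pid)
	if err := writeOnce(base+"/uid_map", formatIDMap(uids)); err != nil {
		return fmt.Errorf("uid_map: %w", err)
	}
	if !privileged {
		// An unprivileged writer must disable setgroups(2) before gid_map.
		if err := writeOnce(base+"/setgroups", "deny"); err != nil {
			return fmt.Errorf("setgroups: %w", err)
		}
	}
	if err := writeOnce(base+"/gid_map", formatIDMap(gids)); err != nil {
		return fmt.Errorf("gid_map: %w", err)
	}
	return nil
}

func main() {
	// Demo only: print the map that would make the current user root inside.
	fmt.Print(formatIDMap([]IDMap{{ContainerID: 0, HostID: os.Getuid(), Size: 1}}))
}
```

As an alternative to writing the files by hand, Go's syscall.SysProcAttr also has UidMappings, GidMappings, and GidMappingsEnableSetgroups fields that apply the maps when the child is started with CLONE_NEWUSER; writing them manually is mainly useful for seeing what the kernel expects.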

22.4 Hardening Drill

  • Mask /proc/kcore, /proc/keys, /proc/timer_list, /proc/sched_debug, /proc/scsi, /sys/firmware. Make /proc/asound, /proc/bus, /proc/fs, /proc/irq, /proc/sys, /proc/sysrq-trigger read-only. (These lists correspond to the maskedPaths and readonlyPaths fields in the runtime-spec config.json.)
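One way to implement the drill is sketched below, under these assumptions: masking a file means bind-mounting /dev/null over it, masking a directory means mounting an empty read-only tmpfs over it, and read-only means a bind mount followed by a read-only remount. The function names are hypothetical and the path lists are copied from the bullet above. The sketch ignores paths that do not exist and does not handle the extra mount flags that a remount inside a user namespace may have to preserve.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"syscall"
)

var maskedPaths = []string{
	"/proc/kcore", "/proc/keys", "/proc/timer_list",
	"/proc/sched_debug", "/proc/scsi", "/sys/firmware",
}

var readonlyPaths = []string{
	"/proc/asound", "/proc/bus", "/proc/fs",
	"/proc/irq", "/proc/sys", "/proc/sysrq-trigger",
}

// maskPath hides path: directories get an empty read-only tmpfs, files get
// /dev/null bind-mounted over them. Missing paths are skipped.
func maskPath(path string) error {
	fi, err := os.Stat(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil
	}
	if err != nil {
		return err
	}
	if fi.IsDir() {
		return syscall.Mount("tmpfs", path, "tmpfs", syscall.MS_RDONLY, "")
	}
	return syscall.Mount("/dev/null", path, "", syscall.MS_BIND, "")
}

// readonlyPath bind-mounts path onto itself, then remounts it read-only.
func readonlyPath(path string) error {
	if err := syscall.Mount(path, path, "", syscall.MS_BIND|syscall.MS_REC, ""); err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return nil
		}
		return err
	}
	flags := uintptr(syscall.MS_BIND | syscall.MS_REMOUNT | syscall.MS_RDONLY | syscall.MS_REC)
	return syscall.Mount(path, path, "", flags, "")
}

// harden applies both lists. Call it only inside the container's mount
// namespace, after /proc and /sys are mounted and before the final exec.
func harden() error {
	for _, p := range maskedPaths {
		if err := maskPath(p); err != nil {
			return fmt.Errorf("mask %s: %w", p, err)
		}
	}
	for _, p := range readonlyPaths {
		if err := readonlyPath(p); err != nil {
			return fmt.Errorf("read-only %s: %w", p, err)
		}
	}
	return nil
}

func main() {
	// Dry run by default so the sketch cannot alter the host by accident.
	if len(os.Args) == 2 && os.Args[1] == "--apply" {
		if err := harden(); err != nil {
			fmt.Fprintln(os.Stderr, "error:", err)
			os.Exit(1)
		}
		return
	}
	fmt.Println("would mask:", maskedPaths)
	fmt.Println("would make read-only:", readonlyPaths)
}
```

A quick check from inside the container: reading a masked file should return nothing, and writing under a read-only path such as /proc/sys should fail with a read-only filesystem error.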

22.5 Production Readiness Slice

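  • Run runc's integration tests against your runtime if feasible. They are written against runc's own command line, so expect to adapt or skip some of them. At minimum, run a representative subset of the OCI runtime validation tests (the opencontainers/runtime-tools suite), which check spec compliance directly.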
