Week 18 - Capabilities, Seccomp, no_new_privs

18.1 Conceptual Core

  • Linux capabilities subdivide the historical "root" privilege into ~40 discrete capabilities (CAP_NET_ADMIN, CAP_SYS_PTRACE, CAP_DAC_OVERRIDE, etc.). Each thread holds five capability sets: bounding, effective, permitted, inheritable, and ambient.
  • The principle: a service should hold only the capabilities it needs. A web server binding to a port below 1024 needs CAP_NET_BIND_SERVICE, not full root.
  • seccomp-bpf is a syscall-level allowlist/denylist enforced by a classic BPF (cBPF, not eBPF) program installed with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, ...) or seccomp(2). Once installed, a filter cannot be removed, which makes it a killer feature for sandboxing.
  • no_new_privs (PR_SET_NO_NEW_PRIVS): once set, the flag cannot be cleared and is inherited across fork, clone, and execve; execve can then no longer grant privileges via setuid binaries, file capabilities, or LSM transitions. An unprivileged process (one without CAP_SYS_ADMIN) must set it before it can install a seccomp filter, and it is a good default in general.
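
To make the five capability sets and the no_new_privs and seccomp state concrete, the following is a minimal sketch that decodes the relevant fields of a /proc/<pid>/status file. The function names are illustrative, not from any library, and the capability table assumes the bit numbering of a recent kernel's <linux/capability.h> (41 entries); bits a newer kernel adds are reported as unknown.

```python
# Capability names indexed by bit number (assumed to match <linux/capability.h>).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID", "CAP_SETPCAP",
    "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE", "CAP_NET_BROADCAST",
    "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK", "CAP_IPC_OWNER",
    "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT", "CAP_SYS_PTRACE",
    "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT", "CAP_SYS_NICE",
    "CAP_SYS_RESOURCE", "CAP_SYS_TIME", "CAP_SYS_TTY_CONFIG", "CAP_MKNOD",
    "CAP_LEASE", "CAP_AUDIT_WRITE", "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
    "CAP_MAC_OVERRIDE", "CAP_MAC_ADMIN", "CAP_SYSLOG", "CAP_WAKE_ALARM",
    "CAP_BLOCK_SUSPEND", "CAP_AUDIT_READ", "CAP_PERFMON", "CAP_BPF",
    "CAP_CHECKPOINT_RESTORE",
]

# /proc/<pid>/status field name -> the capability set it describes.
CAP_FIELDS = {
    "CapInh": "inheritable",
    "CapPrm": "permitted",
    "CapEff": "effective",
    "CapBnd": "bounding",
    "CapAmb": "ambient",
}

SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def decode_caps(mask):
    """Turn a capability bitmask into a list of capability names."""
    names = [name for bit, name in enumerate(CAP_NAMES) if (mask >> bit) & 1]
    unknown = (mask >> len(CAP_NAMES)) << len(CAP_NAMES)
    if unknown:
        names.append(f"unknown:0x{unknown:x}")
    return names


def parse_status(text):
    """Extract capability sets, NoNewPrivs and Seccomp from status text."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        value = value.strip()
        if key in CAP_FIELDS:
            info[CAP_FIELDS[key]] = decode_caps(int(value, 16))
        elif key == "NoNewPrivs":
            info["no_new_privs"] = value == "1"
        elif key == "Seccomp":
            info["seccomp_mode"] = SECCOMP_MODES.get(value, value)
    return info
```

On a Linux host, parse_status(open("/proc/self/status").read()) reports the state of the calling process; the NoNewPrivs and Seccomp fields are only present on kernels that export them.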

18.2 Mechanical Detail

  • getcap, setcap to manage file capabilities. getpcaps <pid> for process caps.
  • systemd directives: CapabilityBoundingSet=, AmbientCapabilities=, NoNewPrivileges=yes, SystemCallFilter=, SystemCallArchitectures=native.
  • SystemCallFilter=@system-service is a curated allowlist that covers most service workloads. Combine it with an explicit denylist for risky calls: a further SystemCallFilter= line whose list starts with ~ removes the named calls or groups from the allowed set.
  • For container runtimes, the Docker default seccomp profile (/etc/docker/seccomp.json or equivalent) is a reasonable baseline; understand why each blocked syscall is blocked.
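
As one way to see the systemd directives above working together, here is a small sketch that renders a [Service] drop-in for a unit. The helper name render_dropin, the user www-data, and the choice of @privileged and @resources as the denied groups are assumptions made for illustration; a real service may need a different set.

```python
def render_dropin(user, capabilities, deny_groups=("@privileged", "@resources")):
    """Render a systemd [Service] drop-in that narrows privileges.

    user         -- non-root account the service should run as (hypothetical)
    capabilities -- capability names the service is allowed to keep
    deny_groups  -- syscall names or groups removed from the allowlist
    """
    caps = " ".join(capabilities)
    lines = [
        "[Service]",
        f"User={user}",
        # Upper limit on what the service can ever hold.
        f"CapabilityBoundingSet={caps}",
        # Capabilities actually granted to the non-root process at start.
        f"AmbientCapabilities={caps}",
        "NoNewPrivileges=yes",
        # Reject syscalls made through non-native ABIs (e.g. 32-bit on x86-64).
        "SystemCallArchitectures=native",
        # Start from the curated allowlist ...
        "SystemCallFilter=@system-service",
    ]
    if deny_groups:
        # ... then subtract risky groups with a ~-prefixed list.
        lines.append("SystemCallFilter=~" + " ".join(deny_groups))
    return "\n".join(lines) + "\n"
```

Calling render_dropin("www-data", ["CAP_NET_BIND_SERVICE"]) yields text that could be saved as a drop-in file for the unit and checked with systemd-analyze security before it is relied on.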

18.3 Lab - "Capabilities and Seccomp"

  1. Convert a service that runs as root to one that runs as a non-root user with only the minimum capabilities.
  2. Author a seccomp policy using libseccomp that allows only the syscalls the service uses. Verify by attempting denied syscalls.
  3. Apply via systemd SystemCallFilter= and confirm.

18.4 Hardening Drill

  • Review every long-running service on a host. For each: what capabilities does it actually need? Document. Tighten where possible.
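
A quick inventory can seed this review. The following sketch walks /proc and lists every process whose effective capability set is non-empty; it is only a starting point under the assumption that /proc is mounted and readable, and the function names are illustrative. Kernel threads typically show a full mask and can be ignored.

```python
import glob


def cap_eff_from_status(text):
    """Return (process name, CapEff bitmask) from /proc/<pid>/status text."""
    name, mask = "?", 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "CapEff":
            mask = int(value.strip(), 16)
    return name, mask


def privileged_processes(proc_root="/proc"):
    """List (pid, name, CapEff mask) for processes holding any capability."""
    found = []
    for path in glob.glob(f"{proc_root}/[0-9]*/status"):
        try:
            with open(path) as handle:
                name, mask = cap_eff_from_status(handle.read())
        except OSError:
            continue  # the process exited or its status file is unreadable
        if mask:
            pid = int(path.split("/")[-2])
            found.append((pid, name, mask))
    return sorted(found)
```

Each mask can then be decoded with capsh --decode=<hex> or compared against getpcaps <pid>, and the result recorded next to the capabilities the service is known to need.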

18.5 Performance Tuning Slice

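  • seccomp adds a small per-syscall cost because the filter program runs on every system call. Measure with perf stat -e 'syscalls:sys_enter_*' (quoted so the shell does not expand the glob) before and after, comparing elapsed time against the number of syscalls counted.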
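
As a rough complement to perf, the per-call cost can also be estimated with a micro-benchmark run once without a filter and once under one (for example inside a unit that sets SystemCallFilter=). This is a sketch under the assumption that os.getppid() issues one cheap syscall per call; the figure includes Python interpreter overhead, so only the difference between the two runs is meaningful.

```python
import os
import time


def ns_per_syscall(iterations=200_000, repeats=5):
    """Return the best observed average time, in nanoseconds, of one getppid()."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter_ns()
        for _ in range(iterations):
            os.getppid()  # cheap syscall with no side effects
        elapsed = time.perf_counter_ns() - start
        # Keep the fastest repeat to reduce scheduling noise.
        best = min(best, elapsed / iterations)
    return best
```

Printing ns_per_syscall() in both environments and subtracting gives an estimate of the filter's added cost per call; longer filters and more syscall-heavy workloads will show a larger effect.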