Week 15 - Seccomp Profiles for Containers
15.1 Conceptual Core
A seccomp profile is a JSON document describing per-syscall actions: allow, log, errno (return an error), or kill (terminate the process). The container runtime compiles the JSON to a BPF filter and installs it with the seccomp(2) syscall or, on older kernels, with prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &filter). Once installed, the filter cannot be loosened - only tightened: a process can stack further filters on top, but the most restrictive result always wins.
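The "only tightened" property can be illustrated with a small model. The sketch below is not kernel code; it only assumes the action precedence order documented in seccomp(2), and the function name combined_action is made up for this example.

```python
# Precedence of seccomp filter results, highest first, as documented in seccomp(2).
PRECEDENCE = [
    "KILL_PROCESS", "KILL_THREAD", "TRAP", "ERRNO",
    "USER_NOTIF", "TRACE", "LOG", "ALLOW",
]


def combined_action(filter_results):
    """Return the action that takes effect when every stacked filter has returned one."""
    if not filter_results:
        return "ALLOW"  # no filter installed, so nothing restricts the call
    # The result with the highest precedence (lowest index) wins.
    return min(filter_results, key=PRECEDENCE.index)


# The runtime's filter allows the call; a filter stacked later denies it.
print(combined_action(["ALLOW", "ERRNO"]))
# A later, more permissive filter cannot undo an earlier denial.
print(combined_action(["ERRNO", "ALLOW"]))
```

Both calls print ERRNO: whichever filter is stricter decides, regardless of installation order.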
The default Docker profile allows ~310 syscalls and blocks ~50 (the ones rarely needed by app containers but useful to attackers - keyctl, kexec_load, umount, etc.). Multiple recent kernel CVEs have been blocked entirely by the default profile, even on unpatched hosts. Tighter custom profiles per service reduce attack surface further.
15.2 Mechanical Detail
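Profile structure: a defaultAction plus per-syscall rules, with optional argument-value matching. For SCMP_CMP_MASKED_EQ, value is the mask and valueTwo is the result the masked argument must equal, so the openat rule below matches only read-only opens: argument 2 (flags) masked with O_ACCMODE (3) must equal O_RDONLY (0).
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    { "names": ["read", "write", "exit_group", "futex", "mmap"],
      "action": "SCMP_ACT_ALLOW" },
    { "names": ["openat"],
      "action": "SCMP_ACT_ALLOW",
      "args": [{"index": 2, "value": 3, "op": "SCMP_CMP_MASKED_EQ", "valueTwo": 0}] }
  ]
}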
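To make the rule semantics concrete, here is a minimal Python model of how such a profile resolves a syscall to an action. It is a simplification under stated assumptions: the first matching rule wins, all argument conditions of a rule must hold, and only three comparison operators are modelled. The names arg_matches, decide, and PROFILE are hypothetical, and the flag constants are the Linux values.

```python
# Linux open(2) flag values used by the sample rule.
O_RDONLY, O_WRONLY, O_ACCMODE, O_CLOEXEC = 0o0, 0o1, 0o3, 0o2000000

PROFILE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {"names": ["read", "write"], "action": "SCMP_ACT_ALLOW"},
        {"names": ["openat"], "action": "SCMP_ACT_ALLOW",
         "args": [{"index": 2, "value": O_ACCMODE,
                   "op": "SCMP_CMP_MASKED_EQ", "valueTwo": O_RDONLY}]},
    ],
}


def arg_matches(cond, args):
    """Evaluate one argument condition against the syscall's argument tuple."""
    arg = args[cond["index"]]
    op = cond["op"]
    if op == "SCMP_CMP_EQ":
        return arg == cond["value"]
    if op == "SCMP_CMP_NE":
        return arg != cond["value"]
    if op == "SCMP_CMP_MASKED_EQ":
        # value is the mask; valueTwo is what the masked argument must equal.
        return (arg & cond["value"]) == cond["valueTwo"]
    raise ValueError(f"operator not modelled in this sketch: {op}")


def decide(profile, name, args=()):
    """Return the action of the first matching rule, else the default action."""
    for rule in profile["syscalls"]:
        if name in rule["names"] and all(arg_matches(c, args) for c in rule.get("args", [])):
            return rule["action"]
    return profile["defaultAction"]


# openat(dirfd, pathname, flags, mode): read-only is allowed, write-only is not.
print(decide(PROFILE, "openat", (-100, 0, O_RDONLY | O_CLOEXEC, 0)))
print(decide(PROFILE, "openat", (-100, 0, O_WRONLY, 0)))
```

One consequence worth noticing: a mask of 0 paired with a non-zero valueTwo can never match, because the masked argument is always 0.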
Generating profiles for a specific service:
- oci-seccomp-bpf-hook (Red Hat) - attach to a container, record every syscall it makes during a representative workload, emit a JSON profile. The right tool for "what does this app actually need?"
- falcoctl / Falco artifacts - newer; supports community-shared profiles.
- Manual: strace -f -c -o trace.out <cmd> - enumerate syscalls under load, then deny everything else. Slower but no extra dependency. (Use -f rather than -ff here: the strace man page documents -ff as incompatible with -c.)
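For the manual route, the step from strace output to a profile can be sketched in a few lines of Python. This assumes the summary table that strace -c writes (a "% time" percentage in the first column, the syscall name in the last, and a closing "total" row); the function names and the sample table are illustrative only.

```python
import json


def syscalls_from_summary(text):
    """Collect syscall names from an `strace -c` summary table."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank lines and short rows carry no syscall name
        try:
            float(fields[0])  # data rows start with the "% time" percentage
        except ValueError:
            continue  # header and separator rows
        if fields[-1] != "total":
            names.add(fields[-1])
    return names


def build_profile(names):
    """Default-deny profile that allows only the recorded syscalls."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(names), "action": "SCMP_ACT_ALLOW"}],
    }


SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.00    0.000090          10         9           read
 30.00    0.000045           9         5         1 openat
 10.00    0.000015           3         5           futex
------ ----------- ----------- --------- --------- ----------------
100.00    0.000150           7        19         1 total
"""

print(json.dumps(build_profile(syscalls_from_summary(SAMPLE)), indent=2))
```

A profile built this way contains no argument conditions, so any argument-level tightening still has to be added by hand.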
Apply:
- Docker / podman: --security-opt seccomp=profile.json.
- Kubernetes pod spec: securityContext.seccompProfile.type: Localhost + localhostProfile: profiles/myapp.json (the kubelet looks under /var/lib/kubelet/seccomp/).
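As a sketch of where those Kubernetes fields sit, the snippet below builds a minimal Pod manifest as a Python dict and prints it as JSON (which kubectl accepts alongside YAML). The pod name, image reference, and the helper pod_manifest are placeholders, not values from any real deployment.

```python
import json


def pod_manifest(name, image, localhost_profile):
    """Minimal Pod manifest that applies a Localhost seccomp profile to the whole pod."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "securityContext": {
                "seccompProfile": {
                    "type": "Localhost",
                    # Path relative to the kubelet's seccomp directory on the node.
                    "localhostProfile": localhost_profile,
                }
            },
            "containers": [{"name": name, "image": image}],
        },
    }


manifest = pod_manifest("myapp", "registry.example/myapp:1.0", "profiles/myapp.json")
print(json.dumps(manifest, indent=2))
```

With the path above, the kubelet would look for /var/lib/kubelet/seccomp/profiles/myapp.json on whichever node the pod is scheduled to, so the file has to exist on every candidate node.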
The trap
Recording a seccomp profile from a happy-path workload only. Edge cases (error handling, log rotation, graceful shutdown) need different syscalls; the profile blocks them in production and the service crashes mysteriously hours later. Always run the recorder through your full integration-test suite, not just the smoke test.
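One way to guard against this is to record several runs separately and merge them, while noting what the smoke test alone would have missed. The sketch below assumes each recording is a profile of plain allow rules; argument conditions are ignored, and the run labels and syscall sets are invented for illustration.

```python
def allowed_names(profile):
    """Names of all syscalls a profile explicitly allows."""
    return {
        name
        for rule in profile.get("syscalls", [])
        if rule.get("action") == "SCMP_ACT_ALLOW"
        for name in rule.get("names", [])
    }


def merge_recordings(recordings):
    """Union the allow lists of several recorded profiles into one default-deny profile."""
    merged = set()
    for profile in recordings.values():
        merged |= allowed_names(profile)
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": sorted(merged), "action": "SCMP_ACT_ALLOW"}],
    }


def missed_by(recordings, baseline):
    """Syscalls seen in some run but absent from the baseline run."""
    everything = allowed_names(merge_recordings(recordings))
    return sorted(everything - allowed_names(recordings[baseline]))


def recorded(*names):
    return {"defaultAction": "SCMP_ACT_ERRNO",
            "syscalls": [{"names": list(names), "action": "SCMP_ACT_ALLOW"}]}


RUNS = {
    "smoke": recorded("read", "write", "exit_group"),
    "integration": recorded("read", "write", "openat", "rename"),
    "shutdown": recorded("write", "exit_group", "rt_sigaction"),
}

print(missed_by(RUNS, "smoke"))
```

A non-empty result is a direct measure of how far the happy-path recording falls short of the full suite.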
15.3 Lab - "Custom Seccomp"
- Run a service under oci-seccomp-bpf-hook (or strace -ff) and exercise it with your integration tests.
- Generate a tight profile (default-deny + only the recorded syscalls).
- Run with the profile; verify the service works under load.
- Inject a "test" syscall (e.g., setns, unshare, or mount) the service doesn't legitimately use; verify it's blocked at runtime.
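For the last step, a probe along these lines can be run inside the container. It is a sketch under several assumptions: a Linux libc that exports unshare, a profile that denies unshare with SCMP_ACT_ERRNO (a kill action would terminate the probe instead), and no argument conditions on the unshare rule. It calls unshare(0), which is expected to change nothing when the call is permitted. The helper names are hypothetical.

```python
import ctypes
import errno


def probe_unshare():
    """Call unshare(0) through libc and return 0 on success or the errno on failure."""
    libc = ctypes.CDLL(None, use_errno=True)
    ctypes.set_errno(0)
    rc = libc.unshare(0)  # flags=0 requests no new namespaces
    return 0 if rc == 0 else ctypes.get_errno()


def verdict(err):
    """Interpret the probe result for the lab report."""
    if err == 0:
        return "allowed"
    if err in (errno.EPERM, errno.ENOSYS):
        return "blocked"
    return f"unexpected errno {err} ({errno.errorcode.get(err, 'unknown')})"
```

Inside the container, print(verdict(probe_unshare())) should report "blocked" when the profile is active and "allowed" when the container runs unconfined.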
15.4 Hardening Drill
For long-running services, ship the custom seccomp profile alongside the image (e.g., baked in as /seccomp/profile.json, or kept in a ConfigMap that a DaemonSet copies into /var/lib/kubelet/seccomp/ on each node; a ConfigMap mounted only into the application pod never reaches the kubelet's directory on the host). Reference it in deployment configs. Version it with the code - a profile that goes stale relative to its app is worse than no profile.
15.5 Production Readiness Slice
Document a process: every new service must ship with a seccomp profile generated from a representative load test, reviewed by a peer, and committed to the repo. Pre-prod CI: run with an audit-only variant of the profile (defaultAction set to SCMP_ACT_LOG, so syscalls outside the allow list are logged instead of denied), collect any unexpected syscalls from the kernel audit log, and fail the build if there are deltas from the committed profile.
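The CI gate could look roughly like the sketch below. It assumes audit records of type SECCOMP that carry a numeric syscall= field, and a caller-supplied number-to-name table; the table here is a small x86_64 excerpt and the log lines are invented samples. The function names are hypothetical.

```python
import re

SYSCALL_FIELD = re.compile(r"\bsyscall=(\d+)\b")

# Partial x86_64 syscall table, enough for the sample records below.
X86_64_NAMES = {165: "mount", 257: "openat", 272: "unshare", 308: "setns"}


def logged_syscalls(audit_lines, number_to_name):
    """Names of syscalls that appear in SECCOMP audit records."""
    names = set()
    for line in audit_lines:
        if "type=SECCOMP" not in line:
            continue  # ignore other audit record types
        match = SYSCALL_FIELD.search(line)
        if match:
            nr = int(match.group(1))
            names.add(number_to_name.get(nr, f"syscall_{nr}"))
    return names


def deltas(committed_profile, observed):
    """Observed syscalls that the committed profile does not allow."""
    allowed = {
        name
        for rule in committed_profile.get("syscalls", [])
        if rule.get("action") == "SCMP_ACT_ALLOW"
        for name in rule.get("names", [])
    }
    return sorted(set(observed) - allowed)


COMMITTED = {"defaultAction": "SCMP_ACT_ERRNO",
             "syscalls": [{"names": ["read", "write", "openat"], "action": "SCMP_ACT_ALLOW"}]}

SAMPLE_LOG = [
    'type=SECCOMP msg=audit(1700000000.123:42): pid=4242 comm="myapp" sig=0 '
    'arch=c000003e syscall=272 compat=0 ip=0x7f0000000000 code=0x7ffc0000',
    'type=SYSCALL msg=audit(1700000000.456:43): arch=c000003e syscall=165 success=yes',
]

unexpected = deltas(COMMITTED, logged_syscalls(SAMPLE_LOG, X86_64_NAMES))
print(unexpected)  # a CI job would exit non-zero when this list is not empty
```

In a real pipeline the number-to-name mapping would come from a complete table for the target architecture rather than this excerpt.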