Week 2 - Syscalls and the Kernel/Userspace Boundary

2.1 Conceptual Core

  • A system call is a transfer of control from userspace to the kernel via a defined ABI: userspace triggers an interrupt or executes a syscall instruction, the kernel reads the register-passed arguments, and it dispatches through a table indexed by syscall number.
  • On x86_64 Linux: arguments in rdi, rsi, rdx, r10, r8, r9; syscall number in rax; return value in rax. Errors come back as a negative rax (-errno, in the range -4095 to -1).
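  • libc wraps most syscalls in a function (open(2) is a thin wrapper; some wrappers do more, e.g. glibc's fork(2) is implemented on top of clone(2)). The wrapper also translates the kernel's negative return into -1 plus errno.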

2.2 Mechanical Detail

  • Read arch/x86/entry/syscalls/syscall_64.tbl, the x86_64 syscall table.
  • The path: userspace syscall instruction → entry_SYSCALL_64 (arch/x86/entry/entry_64.S) → do_syscall_64 → sys_<name> in C.
  • strace -f -e trace=%file ./prog traces file-related syscalls only.
  • ltrace for library-level tracing (less useful since most actions hit the kernel anyway).
  • perf trace is the modern equivalent of strace with much lower overhead.
  • audit (auditd) for production-grade syscall logging: gated by rules, with records delivered from the kernel to the daemon over netlink.

2.3 Lab - "Syscall Forensics"

  1. Run strace -c ls /etc to produce a count summary of syscalls. Predict the top 5, then verify.
  2. Implement cat in pure C using only open, read, write, and close. Use no libc helpers: invoke each call directly, e.g. syscall(SYS_open, ...).
  3. Run it under strace -f to verify that it makes no unexpected calls.
  4. Build a minimal seccomp allowlist for your cat, allowing only the syscalls actually used. Verify it kills attempts to invoke other syscalls.

2.4 Hardening Drill

  • Add the rule auditctl -a always,exit -F arch=b64 -S execve -k exec to log every execve. Read the resulting aureport -x output. Document the operational cost (log volume).

2.5 Performance Tuning Slice

  • Run a workload under perf stat -e 'syscalls:sys_enter_*' (quoted so the shell does not expand the glob). Identify the highest-frequency syscall. Hypothesize a reduction (batching, larger buffers, splice).