Capstone Projects: Three Tracks, One Choice¶
Pick one. The work you do here is what you will describe in interviews.
Track 1: Kernel Module (An Out-of-Tree LKM)¶
Outcome: a non-trivial out-of-tree Linux kernel module, KUnit-tested, sparse-clean, KASAN-clean, with a clear README and a path toward upstream submission (even if you don't take it all the way).
Suggested scopes¶
- A character-device key/value store (the week 21 lab, hardened). Adds: `ioctl` for batch ops, an `mmap` interface for zero-copy reads, and an RCU-protected reader path (sketched below).
- A netfilter hook. A small accelerator that, e.g., counts packets matching a configurable BPF filter at the netfilter ingress hook, with stats exposed via a `procfs` entry.
- A custom tracepoint suite. Add tracepoints to a subsystem of your choice (e.g., your `pkv` module from the lab) and write a `bpftrace` consumer.
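If you take the key/value-store scope, the RCU-protected reader path might look roughly like this. This is a minimal sketch, not the lab's actual interface: `pkv_entry`, `pkv_list`, `pkv_lookup`, and `pkv_delete` are assumed names.

```c
#include <linux/errno.h>
#include <linux/list.h>
#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Hypothetical entry type for the kv-store example. */
struct pkv_entry {
	struct list_head node;
	struct rcu_head rcu;
	u32 key;
	u32 value;
};

static LIST_HEAD(pkv_list);           /* writers serialize on pkv_lock */
static DEFINE_SPINLOCK(pkv_lock);

/* Reader path: lockless, protected only by rcu_read_lock(). */
static int pkv_lookup(u32 key, u32 *value)
{
	struct pkv_entry *e;
	int ret = -ENOENT;

	rcu_read_lock();
	list_for_each_entry_rcu(e, &pkv_list, node) {
		if (e->key == key) {
			*value = e->value;  /* copy out inside the read section */
			ret = 0;
			break;
		}
	}
	rcu_read_unlock();
	return ret;
}

/* Writer path: take the spinlock, unlink, free after a grace period. */
static int pkv_delete(u32 key)
{
	struct pkv_entry *e;

	spin_lock(&pkv_lock);
	list_for_each_entry(e, &pkv_list, node) {
		if (e->key == key) {
			list_del_rcu(&e->node);
			spin_unlock(&pkv_lock);
			kfree_rcu(e, rcu);  /* freed once all readers are done */
			return 0;
		}
	}
	spin_unlock(&pkv_lock);
	return -ENOENT;
}
```

The point of the pattern: readers never take the spinlock, and a deleted entry is only freed after every pre-existing reader has left its RCU read-side critical section.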
Acceptance¶
- Loads cleanly on at least two LTS kernels (e.g., 6.6 and 6.12).
- KUnit tests in tree, passing on both kernels (a minimal suite is sketched after this list).
- Zero KASAN, lockdep, and KCSAN warnings under stress-test load.
- Module signed for Secure Boot.
- A `README.md` with build, install, and usage instructions; a `DESIGN.md` documenting locking and memory ownership.
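A sketch of what the KUnit suite could look like, assuming the hypothetical `pkv_insert`/`pkv_lookup` API from the sketch above; adapt the cases to whatever your module actually exports.

```c
#include <kunit/test.h>
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/types.h>

/* Assumed module API under test; adjust to your actual interface. */
int pkv_insert(u32 key, u32 value);
int pkv_lookup(u32 key, u32 *value);

static void pkv_insert_then_lookup_test(struct kunit *test)
{
	u32 value = 0;

	KUNIT_ASSERT_EQ(test, 0, pkv_insert(42, 1234));
	KUNIT_EXPECT_EQ(test, 0, pkv_lookup(42, &value));
	KUNIT_EXPECT_EQ(test, 1234, value);
}

static void pkv_lookup_missing_key_test(struct kunit *test)
{
	u32 value = 0;

	KUNIT_EXPECT_EQ(test, -ENOENT, pkv_lookup(9999, &value));
}

static struct kunit_case pkv_test_cases[] = {
	KUNIT_CASE(pkv_insert_then_lookup_test),
	KUNIT_CASE(pkv_lookup_missing_key_test),
	{}
};

static struct kunit_suite pkv_test_suite = {
	.name = "pkv",
	.test_cases = pkv_test_cases,
};
kunit_test_suite(pkv_test_suite);

MODULE_LICENSE("GPL");
```

Keep the cases small and deterministic so they pass identically on both LTS kernels.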
Skills exercised¶
- Months 1 (kernel boundary), 2 (memory + scheduling internals), 6 (LKM development).
Track 2: eBPF Observability Tool¶
Outcome: a production-grade tracing tool comparable in quality to one of Brendan Gregg's BCC tools, with a proper userspace consumer, a Prometheus exporter, and CO-RE portability.
Suggested scopes¶
- `syscallat`: system-call latency histograms, per syscall and per process, with low overhead. Equivalent of `bpftrace`'s `syscount` but production-quality (the BPF side is sketched below).
- `tcptop`: top-N connections by bytes/sec, sortable by direction. Cilium's Hubble has equivalents; do this from scratch.
- A profiler-like tool that, given a PID, samples on-CPU stacks at 99 Hz, aggregates them into a frequency table, and exposes flamegraph data.
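For the latency scope, the BPF side could be shaped roughly like this. Everything here is an assumption about one possible design: map names, the key layout, and sum/count aggregation in place of the true log2 histogram buckets a finished tool would use.

```c
// syscallat.bpf.c - illustrative CO-RE sketch, not a finished tool.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

struct lat_key {
	u32 tgid;
	u32 syscall_nr;
};

struct lat_val {
	u64 total_ns;
	u64 count;
};

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 10240);
	__type(key, u64);            /* pid_tgid */
	__type(value, u64);          /* entry timestamp */
} start SEC(".maps");

struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 65536);
	__type(key, struct lat_key);
	__type(value, struct lat_val);
} lat SEC(".maps");

SEC("tracepoint/raw_syscalls/sys_enter")
int handle_enter(struct trace_event_raw_sys_enter *ctx)
{
	u64 id = bpf_get_current_pid_tgid();
	u64 ts = bpf_ktime_get_ns();

	bpf_map_update_elem(&start, &id, &ts, BPF_ANY);
	return 0;
}

SEC("tracepoint/raw_syscalls/sys_exit")
int handle_exit(struct trace_event_raw_sys_exit *ctx)
{
	u64 id = bpf_get_current_pid_tgid();
	u64 *tsp = bpf_map_lookup_elem(&start, &id);

	if (!tsp)
		return 0;            /* missed the enter event */

	u64 delta = bpf_ktime_get_ns() - *tsp;
	bpf_map_delete_elem(&start, &id);

	struct lat_key key = {
		.tgid = id >> 32,
		.syscall_nr = (u32)ctx->id,
	};
	struct lat_val *val = bpf_map_lookup_elem(&lat, &key);

	if (val) {
		__sync_fetch_and_add(&val->total_ns, delta);
		__sync_fetch_and_add(&val->count, 1);
	} else {
		struct lat_val init = { .total_ns = delta, .count = 1 };
		bpf_map_update_elem(&lat, &key, &init, BPF_ANY);
	}
	return 0;
}

char LICENSE[] SEC("license") = "GPL";
```

Keeping the per-event work to two hash-map operations is what keeps overhead low; the expensive work (bucketing, sorting, formatting) happens in userspace on a polling interval.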
Acceptance¶
- Implemented with `libbpf` + CO-RE (a minimal loader is sketched after this list).
- Userspace consumer in C or Go (using `cilium/ebpf`).
- Runs on kernels 5.10+ without recompilation.
- Verifier-clean across architectures (x86_64 + aarch64 minimum).
- Prometheus exporter with low-cardinality labels.
- A `bpftrace` equivalent for comparison; document why the production version exists.
- CPU overhead under representative load: < 1%.
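A minimal userspace loader against plain `libbpf`, assuming the object file and program names from the BPF sketch above; a real tool would add a map-polling loop and the Prometheus endpoint.

```c
// Illustrative loader; "syscallat.bpf.o" and its program names are
// assumptions carried over from the BPF sketch, not a fixed layout.
#include <stdio.h>
#include <unistd.h>
#include <bpf/libbpf.h>

int main(void)
{
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_link *link;
	int err;

	obj = bpf_object__open_file("syscallat.bpf.o", NULL);
	if (!obj) {
		fprintf(stderr, "failed to open BPF object\n");
		return 1;
	}

	err = bpf_object__load(obj);
	if (err) {
		fprintf(stderr, "failed to load BPF object: %d\n", err);
		return 1;
	}

	/* Attach every program in the object (both tracepoints). */
	bpf_object__for_each_program(prog, obj) {
		link = bpf_program__attach(prog);
		if (!link) {
			fprintf(stderr, "failed to attach %s\n",
				bpf_program__name(prog));
			return 1;
		}
	}

	/* A real tool would now poll the "lat" map on an interval and
	 * export aggregates with low-cardinality labels (syscall name,
	 * not PID) on a /metrics endpoint. */
	printf("tracing... hit Ctrl-C to exit\n");
	pause();

	bpf_object__close(obj);
	return 0;
}
```

Because the BPF object is compiled with CO-RE relocations against `vmlinux.h`, the same binary can load on any 5.10+ kernel that ships BTF, which is what the "no recompilation" acceptance item is checking.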
Skills exercised¶
- Months 3 (eBPF), 4 (networking, if you pick `tcptop`), 6 (perf tuning).
Track 3: Self-Healing Distributed Service¶
Outcome: a small distributed service (e.g., a multi-instance HTTP API, a job runner, or a metrics collector) deployed on Linux hosts with a comprehensive self-healing posture.
Suggested scopes¶
- A 3-node deployment of a small HTTP service:
- Each node is a hardened Ubuntu/Debian/Rocky host provisioned by Ansible.
- The service is systemd-managed with a watchdog, full hardening directives, and cgroup v2 resource limits (a minimal watchdog heartbeat is sketched after this list).
- On any node failure, the survivors continue serving (use a TCP load balancer + healthcheck, e.g., HAProxy or IPVS).
- Memory pressure (PSI > X%) triggers a soft restart of the worst offender via a cgroup-event watcher.
- Disk pressure triggers log rotation and old-data cleanup.
- A `chaos.sh` script kills random nodes; the cluster recovers without human intervention.
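The service side of the systemd watchdog is small. A hedged sketch, assuming `Type=notify` and `WatchdogSec=` in the unit file and linking against `libsystemd`:

```c
// Illustrative watchdog heartbeat (build with -lsystemd).
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <systemd/sd-daemon.h>

int main(void)
{
	uint64_t usec = 0;

	/* Returns > 0 if systemd expects pings, and fills in the
	 * timeout it configured via WatchdogSec=. */
	if (sd_watchdog_enabled(0, &usec) <= 0) {
		fprintf(stderr, "no watchdog configured; running without it\n");
		usec = 0;
	}

	sd_notify(0, "READY=1");          /* tell systemd the service is up */

	for (;;) {
		/* ... serve requests ... */

		if (usec)
			sd_notify(0, "WATCHDOG=1");   /* heartbeat */

		/* Ping at half the configured interval, as recommended. */
		usleep(usec ? (useconds_t)(usec / 2) : 1000000);
	}
	return 0;
}
```

If the heartbeat stops (deadlock, stuck event loop), systemd kills and restarts the unit per `Restart=`; that is the first rung of the self-healing ladder, before the load balancer or the PSI watcher has to act.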
Acceptance¶
- Reproducible from Ansible: `ansible-playbook site.yml` brings up 3 hosts from blank Ubuntu cloud images.
- Full observability: `node_exporter`, journald, eBPF tools, Prometheus + Grafana.
- A documented threat model and a CIS-aligned baseline (Lynis score).
- A 60-minute "chaos demo": run `chaos.sh`, observe full self-healing, and produce a one-page incident report from logs.
- Encryption at rest (LUKS on data volumes); TLS between nodes (`step-ca` or self-signed); auditd shipping logs off-host.
Skills exercised¶
- All months. This is the integrative track, and the right choice if you want operations-engineer breadth.
Cross-Track Requirements¶
- `host-baseline/` template integrated.
- ADRs (≥3).
- `THREAT_MODEL.md`, `RUNBOOK.md`, `RECOVERY.md`.
- Defense readiness: a 45-minute walkthrough with a peer.