Labs¶
Every weekly module ends with a hands-on Lab. The labs are the curriculum -- internalizing a topic happens here, not in the prose above. This index aggregates every lab across every path so you can browse by domain instead of by curriculum.
187 labs across 8 paths.
Jump to a path¶
- AI Systems
- Container Internals
- Go Mastery
- Java Mastery
- Kubernetes
- Linux Kernel
- Python Mastery
- Rust Mastery
AI Systems¶
Week 1 - The Compute Hierarchy and the Cost Model - Roofline Sketch¶
Write a small program (Python+NumPy, or any) that:
1. Performs C = A @ B for square matrices N=64, 256, 1024, 4096.
2. Times each. Computes achieved FLOPS (= 2·N³ / time).
3. Computes the bytes moved (= 3·N²·sizeof(dtype)).
4. Plots achieved FLOPS vs arithmetic intensity on log-log axes.
5. Overlays the theoretical roofline of your laptop CPU (look up its peak FLOPS and DRAM bandwidth).
You should see the small-N points sit under the bandwidth ramp and the large-N points approach the compute roof. Keep the plot; every subsequent lab will produce another. (A minimal timing sketch follows below.)
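A minimal sketch of steps 1-3, assuming NumPy; the plot and the roofline overlay (steps 4-5) are left to you:

```python
# Hypothetical sketch: time square matmuls, compute achieved FLOP/s and
# arithmetic intensity. Plotting (log-log) is left to matplotlib.
import time
import numpy as np

for n in (64, 256, 1024, 4096):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    t0 = time.perf_counter()
    c = a @ b
    dt = time.perf_counter() - t0
    flops = 2 * n**3 / dt                   # achieved FLOP/s
    bytes_moved = 3 * n * n * a.itemsize    # A and B read, C written
    intensity = 2 * n**3 / bytes_moved      # FLOPs per byte
    print(f"N={n:5d}  {flops/1e9:8.2f} GFLOP/s  intensity={intensity:8.1f}")
```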
Week 2 - Linear Algebra Refresh, BLAS, NumPy - Three Matmuls¶
Implement 1024×1024 matmul three ways:
1. Naive triple-loop in Python (will take ~minutes; that's the point).
2. Naive in NumPy with explicit loops; expect only a marginal speedup.
3. `numpy.dot`; measure the speedup over (1).
You should see ~10,000× speedup between (1) and (3). Internalize why. Read Goto and van de Geijn's "Anatomy of a High-Performance Matrix Multiplication" if you want the deep version (recommended).
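A compressed sketch of (1) vs (3), assuming NumPy; shrink N while iterating, then run the full 1024:

```python
import time
import numpy as np

N = 256  # use 1024 for the real lab; the triple loop will then take minutes
A = np.random.rand(N, N)
B = np.random.rand(N, N)

def naive(A, B):
    # pure-Python triple loop: no blocking, no vectorization
    C = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            s = 0.0
            for k in range(N):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

t0 = time.perf_counter(); C1 = naive(A, B); t_naive = time.perf_counter() - t0
t0 = time.perf_counter(); C2 = A @ B;       t_blas  = time.perf_counter() - t0
assert np.allclose(C1, C2)
print(f"naive {t_naive:.2f}s  BLAS {t_blas*1e3:.2f}ms  speedup {t_naive/t_blas:,.0f}x")
```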
Week 3 - Tensors, Autograd, the Gradient Tape - Autograd From Scratch¶
Implement reverse-mode AD in ~100 lines of pure Python (no PyTorch). Support:
- A Tensor class wrapping a NumPy array with a grad field.
- __add__, __mul__, __matmul__, relu, sum. Each records its inputs and a backward function.
- A backward() method that topologically sorts and traverses the graph.
- Test on a tiny MLP: define f = x @ W1 + b1; g = relu(f); h = g @ W2 + b2; loss = h.sum(). Verify the gradients match a torch.autograd reference within float-precision.
This is Andrej Karpathy's micrograd exercise. Do it before reading his code; then read his code and compare.
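A compressed sketch of the Tensor-and-tape idea; `__add__` and `__mul__` follow the same pattern as `__matmul__` and are omitted for brevity:

```python
import numpy as np

class Tensor:
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents
        self._backward = lambda: None   # leaves have nothing to do

    def __matmul__(self, other):
        out = Tensor(self.data @ other.data, (self, other))
        def _backward():
            self.grad += out.grad @ other.data.T
            other.grad += self.data.T @ out.grad
        out._backward = _backward
        return out

    def relu(self):
        out = Tensor(np.maximum(self.data, 0), (self,))
        def _backward():
            self.grad += (out.data > 0) * out.grad
        out._backward = _backward
        return out

    def sum(self):
        out = Tensor(self.data.sum(), (self,))
        def _backward():
            self.grad += np.ones_like(self.data) * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # topological sort, then replay the recorded backward functions in reverse
        topo, seen = [], set()
        def visit(t):
            if id(t) not in seen:
                seen.add(id(t))
                for p in t._parents:
                    visit(p)
                topo.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(topo):
            t._backward()
```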
Week 4 - The Honest Training Loop - Train Something Small, Right¶
Train a 1-layer transformer (or 2-layer MLP if transformer is too far) on TinyShakespeare or MNIST. Required:
- Dataset class + DataLoader with num_workers=4, pin_memory=True.
- AMP autocast (BF16 on Ampere+, FP16 with GradScaler on older).
- LR schedule (warmup + cosine).
- Checkpoint every N steps; able to resume from any checkpoint and produce identical loss thereafter (within 1e-5).
- Per-step metrics: loss, tokens/sec, GPU memory, GPU util%.
- Final report: train + val loss curves, throughput, peak memory, total cost in $ (compute hours × $/hr).
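A hedged skeleton of the AMP and warmup+cosine pieces above (PyTorch; `model`, `optimizer`, and `loader` are assumed to exist, and the model is assumed to return the loss directly):

```python
import math
import torch

def lr_at(step, warmup, total, peak):
    # linear warmup, then cosine decay to zero
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * 0.5 * (1 + math.cos(math.pi * progress))

for step, (x, y) in enumerate(loader):
    for g in optimizer.param_groups:
        g["lr"] = lr_at(step, warmup=500, total=10_000, peak=3e-4)
    with torch.autocast("cuda", dtype=torch.bfloat16):   # BF16 path; FP16 needs a GradScaler
        loss = model(x.cuda(), y.cuda())                  # assumption: model returns the loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad(set_to_none=True)
    if step % 1000 == 0:
        torch.save({"step": step, "model": model.state_dict(),
                    "optim": optimizer.state_dict()}, f"ckpt_{step}.pt")
```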
Week 5 - GPU Hardware Architecture - Inspect Your Hardware¶
- Run `nvidia-smi` and `nvidia-smi -q`. Read every line.
- Compile and run NVIDIA's `deviceQuery` sample. It prints all the numbers above for your specific GPU.
- Compile and run `bandwidthTest` (CUDA samples). Compare measured PCIe and HBM bandwidth to spec.
- Compute: at the measured HBM BW and compute peak of your GPU, what is the arithmetic-intensity break-even? Sketch the roofline (a worked calculation follows below).
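The break-even calculation, with placeholder numbers you should replace with your own measured specs:

```python
# Break-even arithmetic intensity: the FLOPs/byte at which a kernel stops being
# bandwidth-bound. Numbers are illustrative placeholders, not your GPU's specs.
peak_flops = 312e12   # e.g. an A100's BF16 tensor-core peak, FLOP/s
hbm_bw     = 1.55e12  # measured HBM bandwidth, bytes/s
breakeven = peak_flops / hbm_bw
print(f"break-even intensity ~ {breakeven:.0f} FLOPs/byte")
```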
Week 6 - Your First CUDA Kernels - Kernel Speedrun¶
Write three kernels in CUDA C++:
1. Vector add: SAXPY (y = a*x + y). Time vs cuBLAS axpy.
2. Reduction: sum a million floats. Compare your naive version (one global atomic) with a hierarchical version (block-level reduction in shared memory, then global). Expect ~100× difference.
3. Naive matmul: 1024×1024 BF16. Compare to cuBLAS; expect to be 50-100× slower. Don't get discouraged; you'll close most of the gap in week 7.
For each: measure runtime with cudaEvent_t timing; compute achieved throughput; mark on the roofline.
Week 7 - Memory Optimization: Coalescing, Shared Memory, Tensor Cores - Climb the Roofline¶
Take your week 6 naive matmul and progressively optimize:
1. Coalesce loads (transpose access pattern). Re-time.
2. Tile in shared memory with 32×32 blocks. Re-time.
3. Double-buffer with cp.async. Re-time.
4. Use tensor cores with BF16. Re-time.
You should reach 30–60% of cuBLAS perf. Document each step's improvement and the residual gap. Read NVIDIA's cutlass examples for the production-grade version.
Week 8 - Triton: GPU Kernels From Python - Three Triton Kernels¶
- Elementwise add (the Hello World); a minimal version is sketched after this list.
- Softmax with online maximum subtraction (numerical stability). Compare to `torch.softmax` perf.
- Naive matmul in Triton with autotuning. Compare to cuBLAS; you should reach 70-90% of peak for square BF16 matmul on common shapes.
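A minimal sketch of the elementwise-add kernel; the block size and grid are illustrative:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements            # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK=1024)
assert torch.allclose(out, x + y)
```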
Week 10 - torch.compile, TorchDynamo, Inductor - Compile and Compare¶
Take your honest-training-loop from Month 1. Add model = torch.compile(model). Measure:
1. First-step time (compilation cost).
2. Steady-state step time vs uncompiled.
3. With TORCH_LOGS="recompiles": how many recompilations occurred? Why?
4. With mode="max-autotune": extra speed vs default? Worth the compile time?
Triage any graph breaks; report in COMPILE_LOG.md.
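One hedged way to time the compile cost vs the steady state (assumes `model` and `batch` from your Month 1 loop):

```python
import time
import torch

compiled = torch.compile(model)           # add mode="max-autotune" for step 4

def timed_step(m):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    loss = m(batch).sum()                  # assumption: output reduces to a scalar
    loss.backward()
    torch.cuda.synchronize()
    return time.perf_counter() - t0

first = timed_step(compiled)               # includes TorchDynamo/Inductor compilation
steady = min(timed_step(compiled) for _ in range(20))
print(f"first step {first:.2f}s, steady state {steady*1e3:.1f}ms")
```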
Week 11 - JAX, XLA, HLO - JAX Equivalent¶
Re-implement your Month 1 training loop in JAX:
- Pure-functional model (no nn.Module mutation).
- optax for the optimizer.
- jax.jit the train step.
- Add jax.vmap somewhere meaningful (e.g., per-example metric computation).
- Compare end-to-end throughput with the PyTorch baseline.
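A sketch of the jitted train step with optax; `apply_fn` (a pure function of params and inputs) is an assumption standing in for your model:

```python
import jax
import jax.numpy as jnp
import optax

optimizer = optax.adamw(3e-4)

@jax.jit
def train_step(params, opt_state, batch):
    def loss_fn(p):
        logits = apply_fn(p, batch["x"])   # hypothetical pure model function
        return jnp.mean(optax.softmax_cross_entropy_with_integer_labels(
            logits, batch["y"]))
    loss, grads = jax.value_and_grad(loss_fn)(params)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss
```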
Week 12 - Custom Operators: From CUDA Kernel to torch.ops - RMSNorm From Scratch¶
RMSNorm is used in modern LLMs (Llama, Qwen). Implement it three ways:
1. PyTorch: pure tensor ops.
2. Triton custom op: a fused kernel that reads the input, computes the RMS, normalizes, and scales, all in one pass over HBM.
3. CUDA C++ extension: same kernel in CUDA C++ with a pybind11 binding.
For each: forward + backward, autograd-correct (numerical-grad test), benchmarked vs the others on (B, S, H) = (8, 4096, 4096) BF16. Your fused Triton version should beat PyTorch by 3-5×.
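A reference for way (1), pure PyTorch tensor ops, which the fused kernels can be checked against:

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6):
    # normalize by the root-mean-square over the hidden dimension, then scale
    rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + eps)
    return x * rms * weight

x = torch.randn(8, 4096, 4096, dtype=torch.bfloat16, device="cuda")
w = torch.ones(4096, dtype=torch.bfloat16, device="cuda")
y = rms_norm(x, w)
```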
Week 9 - PyTorch Internals: Tensor, Dispatcher, ATen - Trace an Op¶
- From Python, run `a + b` for two CUDA tensors. Use `TORCH_SHOW_DISPATCH_TRACE=1` (or `torch._C._dispatch_print_registrations()`) to see the dispatcher's path.
- Read `aten/src/ATen/native/cuda/BinaryOps.cu`; find the actual CUDA kernel for add.
- Trace `torch.matmul(a, b)` similarly. Note that for BF16 it routes to cuBLAS.
- Document the call chain in `TRACE.md`.
Week 13 - Communication Primitives: NCCL, Allreduce, Topology - Allreduce Bench¶
On at least 2 GPUs (single node fine), run an allreduce benchmark:
1. torch.distributed.all_reduce on tensors from 1 KB to 1 GB.
2. Compute achieved bandwidth (= 2(N-1)/N · message_size / time).
3. Plot bandwidth vs message size; identify the message size at which BW saturates (the "knee").
4. If you have access: run on 8 GPUs on a single node (NVLink) and compare to 8 GPUs across 2 nodes (InfiniBand). Document the gap. (A measurement sketch follows below.)
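A hedged sketch of the measurement loop (launch under `torchrun`; assumes one GPU per rank on a single node):

```python
import time
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank = dist.get_rank()
world = dist.get_world_size()
torch.cuda.set_device(rank)   # assumption: local rank == global rank (single node)

for size in [2**p for p in range(10, 31, 2)]:      # 1 KB .. 1 GB
    x = torch.ones(size // 4, device="cuda")        # fp32 elements -> `size` bytes
    dist.all_reduce(x)                               # warm-up
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    dist.all_reduce(x)
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    bw = 2 * (world - 1) / world * size / dt         # ring-allreduce bus bandwidth
    if rank == 0:
        print(f"{size:>12d} B  {bw/1e9:6.1f} GB/s")
```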
Week 14 - Data Parallelism: DDP, ZeRO, FSDP - FSDP a Small Model¶
On 4-8 GPUs (single node fine):
1. Train a 1B-parameter transformer in FSDP. Use transformer_auto_wrap_policy.
2. Compare memory and throughput: a DDP baseline (small model, noting where it OOMs) vs FSDP on the same small model vs FSDP on a larger model.
3. Add activation checkpointing; re-measure.
4. Add CPU offload; observe the speed cost.
5. Compute scaling efficiency (throughput_8gpu / (8 × throughput_1gpu)).
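A hedged sketch of the FSDP wrap; `Block` stands in for your transformer layer class, and exact kwargs vary by PyTorch version:

```python
import functools
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

dist.init_process_group("nccl")
policy = functools.partial(transformer_auto_wrap_policy,
                           transformer_layer_cls={Block})   # hypothetical layer class
model = FSDP(model.cuda(), auto_wrap_policy=policy, use_orig_params=True)
```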
Week 15 - Tensor Parallelism and Pipeline Parallelism - Implement Tensor-Parallel Attention¶
By hand, in pure PyTorch + torch.distributed:
1. Implement the Megatron-style tensor-parallel multi-head attention: column-parallel QKV projection, sharded heads, row-parallel output projection.
2. Verify numerically against a single-GPU reference for correctness (allclose to atol=1e-3).
3. Benchmark on 4 GPUs vs 1-GPU baseline. Compute scaling efficiency.
Week 16 - Mixed Precision, FP8, Numerical Stability at Scale - FP8 Train a Small Model¶
On at least one H100/H200/B200 (you may need to rent for a day):
1. Take your week 14 FSDP setup. Replace all linear layers with te.Linear. Wrap blocks with te.fp8_autocast.
2. Train the same model in BF16 vs FP8. Compare:
- Throughput.
- Memory.
- Loss curve (the test of stability; FP8 should match BF16 within noise).
3. Document any NaN events and recovery actions.
If H100+ is unavailable, do this lab in BF16 + torch.cuda.amp, comparing against FP32. The instability dynamics are similar at lower stakes.
Week 17 - LLM Inference, the KV-Cache, Attention Math - Decode From Scratch¶
- Implement greedy decoding for a small Hugging Face model (Llama-3-8B works on a single A100; smaller for L4):
- Prefill once, capture KV-cache.
- Decode loop: forward(token, kv_cache) → next_token.
- Append next_token to KV-cache.
- Measure tokens/sec. Compute the achieved HBM BW (model weights × tokens / time).
- Replace standard attention with `flash_attn_with_kvcache`. Re-measure.
- Document the decode-vs-prefill latency split for a 1K-prefill, 512-decode request. (A skeleton of the decode loop follows.)
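A skeleton of the prefill-then-decode loop above, assuming a Hugging Face causal LM (`model_id` is whichever model you chose):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained(model_id)        # model_id: your choice of model
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16).cuda()

ids = tok("Once upon a time", return_tensors="pt").input_ids.cuda()
out = model(ids, use_cache=True)                      # prefill: one pass, KV-cache captured
past, next_tok = out.past_key_values, out.logits[:, -1:].argmax(-1)

generated = [next_tok]
for _ in range(512):                                  # decode loop: one token per forward
    out = model(next_tok, past_key_values=past, use_cache=True)
    past, next_tok = out.past_key_values, out.logits[:, -1:].argmax(-1)
    generated.append(next_tok)
```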
Week 18 - Paged Attention, Continuous Batching, vLLM - vLLM Internals¶
- Install vLLM. Serve a 7B model. Run a load test (`benchmark_serving.py`) at various concurrency levels.
- Read `vllm/core/scheduler.py` and `vllm/attention/backends/flash_attn.py` end-to-end. Annotate the scheduler's iteration loop.
- Build a mini-scheduler in Python (not for prod; for understanding): manage a fixed pool of KV blocks, schedule decode steps, evict on memory pressure. Use a real model forward via vLLM's lower-level APIs or Hugging Face.
- Compare the throughput of your mini-scheduler vs vLLM proper. The gap is likely 5-20×; that gap is your education.
Week 19 - Quantization: INT8, INT4, FP8, AWQ, GPTQ, SmoothQuant - Quantize and Compare¶
On a 7B-13B model:
1. Run baseline BF16 inference. Capture TTFT, TPOT, model size, throughput.
2. Quantize with AWQ (W4A16). Re-measure. Eval on a small held-out set (e.g., an MMLU 200-question subset, or perplexity on Wikitext) for accuracy.
3. Quantize with FP8 (if on Hopper+). Re-measure.
4. Optionally: GPTQ comparison, AWQ INT8 comparison.
5. Build a tradeoff matrix: throughput, memory, perplexity / accuracy.
Week 20 - Speculative Decoding, Disaggregation, Inference Frontiers - Speculative Decoding¶
- Pair a small model (1B) drafting a larger model (7-13B).
- Implement vanilla speculative decoding: draft-then-verify.
- Measure acceptance rate and tokens/sec gain vs baseline single-model decoding (a greedy draft-then-verify sketch follows after this list).
- Tune K (draft length); sweep; identify the sweet spot for your workload.
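A hedged sketch of greedy draft-then-verify (batch size 1; both models are assumed to be HF-style callables sharing a tokenizer; the stochastic acceptance rule from the paper is omitted):

```python
import torch

@torch.no_grad()
def speculative_step(target, draft, ids, K=4):
    # 1. the draft model proposes K tokens autoregressively (cheap)
    draft_ids = ids
    for _ in range(K):
        nxt = draft(draft_ids).logits[:, -1:].argmax(-1)
        draft_ids = torch.cat([draft_ids, nxt], dim=-1)
    proposed = draft_ids[:, ids.shape[1]:]

    # 2. the target verifies all K proposals in a single forward pass
    logits = target(draft_ids).logits
    verify = logits[:, ids.shape[1] - 1:-1].argmax(-1)   # target's choice at each draft position

    # 3. accept the longest matching prefix, then take one bonus token from the target
    match = (verify == proposed).long().cumprod(dim=-1)
    n_accept = int(match.sum())
    accepted = proposed[:, :n_accept]
    bonus = logits[:, ids.shape[1] - 1 + n_accept].argmax(-1, keepdim=True)
    return torch.cat([ids, accepted, bonus], dim=-1), n_accept
```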
Week 21 - ML on Kubernetes: KServe, KubeRay, Volcano, GPU Operators - Train and Serve on K8s¶
- Bring up a small GPU-enabled cluster (kind+nvidia, or a 2-node cloud cluster with 1-2 GPUs each).
- Install the GPU Operator. Verify `kubectl describe node` shows `nvidia.com/gpu: N`.
- Install Volcano. Submit a 4-GPU gang-scheduled training job (a small FSDP run from week 14).
- Install KServe + vLLM runtime. Deploy a 7B model. Hit it with a load test. Demonstrate autoscaling.
- Document the YAML for each in a deployable repo.
Week 22 - Observability, Cost, Eval Pipelines, MLOps - Eval and Drift Pipeline¶
- Build a CI pipeline: on every model push, run lm-evaluation-harness on a fixed subset (MMLU 500-question, HumanEval pass@1).
- Compare against a baseline; fail the pipeline on >2% regression.
- Wire production traffic samples into a drift dashboard: input length distribution, output length distribution, refusal rate, fraction of failed JSON-mode outputs.
- Synthetic drift: shift the input distribution (longer prompts) and verify the dashboard catches it.
Week 23 - Safety, Red-Teaming, Alignment Infrastructure - A Safety Layer¶
Take your week 21 vLLM deployment. Add:
1. Input classifier (Llama Guard or a small custom classifier) to block obvious prompt injections.
2. Output classifier to block policy-violating outputs.
3. Constrained-decoding mode for any structured-output endpoint.
4. Audit logging to a separate, append-only store.
5. A nightly red-teaming job that fires 1000 adversarial prompts, measures the failure rate, and alerts on regression.
Week 24 - Capstone Integration & Defense - Defend the Design¶
Schedule a 60-minute mock review (peer or recorded). Walk through:
1. The architecture diagram.
2. The roofline analysis: where does your system sit on the roofline? What's bound by what?
3. One slide per non-obvious decision (e.g., "why FSDP-2 over DeepSpeed Stage-3", "why AWQ over GPTQ", "why your batching policy").
4. A live demo of the end-to-end artifact.
5. A live demo of one production-quality concern: cost, observability, safety, or fault tolerance.
The deliverable is the defense, not the slides. If you cannot answer:
- "What is your worst-case tail latency under 10× concurrent load?"
- "What happens when a GPU fails mid-training?"
- "What is your cost per million output tokens?"
- "How would you scale this to 10× the model size?"
...you have not yet finished the curriculum.
Container Internals¶
Week 1 - The OCI Image Spec - An Image Without Docker¶
- `skopeo copy docker://alpine:3.19 oci:./alpine-layout:3.19`. Inspect the layout. Read `index.json`, the manifest blob, the config blob.
- Find a layer blob, decompress it, list its contents (`tar tzf <blob>`).
- Compute one of the layer digests yourself (`sha256sum`) and verify.
- Modify the config (e.g., change the entrypoint) by writing a new config blob, generating a new manifest, and updating `index.json`. Verify with `skopeo inspect oci:./alpine-layout:3.19`.
Week 2 - The OCI Runtime Spec, runc, and crun - Run a Container Without Docker¶
- Generate a default config: `runc spec` produces `config.json`.
- Build a rootfs: `mkdir rootfs && skopeo copy docker://alpine:3.19 oci:./alpine && umoci unpack --image ./alpine:3.19 ./bundle` (umoci gives you both rootfs + config in one step). Or do it manually.
- Run: `sudo runc run mycontainer`. You're inside the container.
- Modify the config to: drop all capabilities except `CAP_NET_BIND_SERVICE`, set a memory limit of 64M, mask `/proc/sys`. Re-run; verify with `cat /proc/self/status | grep Cap` and pressure tests.
- Repeat with `crun`. Time the startup difference (`time runc run` vs `time crun run`); `crun` is typically 2–5× faster.
Week 3 - skopeo Deep Dive: Multi-Arch, Signing, Sync - A Daemonless Image Pipeline¶
- Pull a multi-arch image as an OCI index. Inspect each per-platform manifest.
- Write a script that, given an image reference, prints a table of platforms, layer counts, total compressed/uncompressed sizes, and labels.
- Use `skopeo sync` to mirror three images into your local registry. Verify by pulling the mirrored versions.
- Compare `skopeo copy` of a 1-GB image with and without `--multi-arch index-only` on the destination side.
Week 4 - Image Internals: Manifest Lists, Index, Annotations, Sparse Pulls - Build a Multi-Arch Image By Hand¶
- Build an image for `linux/amd64` and `linux/arm64` separately (use `buildah --arch=` or `docker buildx`).
- Use `skopeo` to assemble a manifest list pointing to both.
- Push to your local registry.
- Pull from each architecture; verify the right manifest is selected.
- Add OCI annotations (`source`, `revision`, `created`); verify they survive the pipeline.
Week 5 - OverlayFS and Storage Drivers - OverlayFS By Hand¶
- Create three lower dirs with different files. Mount as overlay. Verify merged view.
- Modify a file from the lower; observe copy-up in the upper.
- Delete a lower file from the merged view; observe the whiteout in the upper.
- Reproduce a "container layer": treat your container's tarball-extracted contents as a lower; create a fresh upper; mount; modify; tar up the upper to produce a new layer.
Week 6 - buildah: Building Images Without Dockerfiles - Image as a Shell Script¶
- Write a shell script that uses `buildah from`, `run`, `copy`, `config`, `commit` to produce a small Go-binary-on-alpine image. No Dockerfile.
- Add reproducibility flags: `--source-date-epoch`, `--timestamp`, the `SOURCE_DATE_EPOCH` env var. Build twice; verify the hashes match.
- Build the same image with `buildah bud -f Dockerfile`. Compare hashes; they should be identical when both builds are reproducible.
Week 7 - Multi-Stage Builds, Distroless, Minimal Images - Three Image Diet¶
Take a Go (or Rust, or Python) service and produce three images:
1. Naive: FROM ubuntu, build inline. Measure size.
2. Distroless: multi-stage with gcr.io/distroless/static. Measure size.
3. Scratch: static build, FROM scratch. Measure size.
Document the size delta and any operational tradeoffs (e.g., scratch has no ca-certificates, leading to `tls.Config` failures unless you `COPY --from=alpine /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/`).
Week 8 - Layer Caching, Build Context, Reproducibility - Cache and Reproducibility¶
- Take a non-trivial image; measure clean-build time and incremental-build time (single source change). Reorder Dockerfile to maximize cache hits; re-measure.
- Enable BuildKit cache mounts; measure again.
- Build the same image on two machines with `SOURCE_DATE_EPOCH` set; verify the digests match.
Week 10 - CRI-O and the Kubernetes CRI - CRI Direct¶
- Install CRI-O on a clean machine (or use containerd with its CRI plugin).
- Use `crictl runp` (run pod), `crictl create`, `crictl start` to manually launch a pod-equivalent without Kubernetes. Inspect with `crictl inspect`.
- Add an OCI hook (e.g., a pre-start hook that logs every container) by configuring CRI-O's `hooks_dir`.
Week 11 - podman and the Rootless Model - Rootless Production¶
- As a non-root user, install `podman`. Configure `/etc/subuid`, `/etc/subgid`.
- Run a multi-container app with `podman play kube` (Kubernetes-YAML-as-podman-input).
- Generate systemd units; install with `--user`. The service starts at user login and persists across reboots (with `loginctl enable-linger`).
- Compare `slirp4netns` vs `pasta` networking throughput with `iperf3`.
Week 12 - Sandboxed Runtimes: gVisor and Kata Containers - Two Sandboxes¶
- Install gVisor. Register it as a containerd runtime. Run `nerdctl --runtime runsc` against a test workload.
- Install Kata. Register it as a containerd runtime. Run the same workload.
- Benchmark both vs `runc` for: startup time, a syscall-heavy workload (e.g., `find /usr -type f`), and a CPU-bound workload (e.g., `sysbench cpu`).
- Document the tradeoffs in a markdown matrix.
Week 9 - containerd Architecture - containerd Without Kubernetes¶
- Install `containerd` and `nerdctl`. Configure `/etc/containerd/config.toml`.
- Pull, run, exec, and kill containers entirely via `nerdctl`. Confirm `dockerd` is not running.
- Enable the `stargz-snapshotter`. Pull a large image with eStargz layers. Measure first-run startup time vs a cold pull.
- Use `ctr` to inspect tasks, snapshots, and content blobs at the daemon level.
Week 13 - The Default Threat Model - Audit a Real Image¶
- Pick a popular base image (`nginx`, `redis`). Run `docker scout cves` and `trivy image` against it. Record findings.
- Run it with defaults; identify how many capabilities it has via `capsh --print`.
- Re-run with `--cap-drop=ALL`, `--security-opt=no-new-privileges`, and a read-only rootfs. Identify what breaks. Fix what's needed.
- Document the minimum config to run it safely.
Week 14 - Capabilities for Containers - Capability Diet¶
- For three services (e.g., a Go HTTP server, an Nginx reverse proxy, a Node.js app), run each with `--cap-drop=ALL`. Identify what fails (the error usually mentions the syscall; map it back to the capability via `capabilities(7)`).
- Add back capabilities one at a time. Document the minimum set per service.
- Configure your container runtime (podman, containerd) or pod-security policy to apply this minimum by default.
Week 15 - Seccomp Profiles for Containers - Custom Seccomp¶
- Run a service under `oci-seccomp-bpf-hook` (or `strace -ff`) and exercise it with your integration tests.
- Generate a tight profile (default-deny + only the recorded syscalls).
- Run with the profile; verify the service works under load.
- Inject a "test" syscall (e.g., `setns`, `unshare`, or `mount`) the service doesn't legitimately use; verify it's blocked at runtime.
Week 16 - LSM for Containers: SELinux and AppArmor - MAC Per Workload¶
- On RHEL/Fedora: write a custom SELinux policy module for one service. Test enforcement.
- On Ubuntu/Debian: write an AppArmor profile for the same service. Test enforcement.
- Document the comparative effort and expressivity.
Week 17 - Software Bill of Materials (SBOM) - SBOM Pipeline¶
- Generate an SBOM for one of your images with Syft (SPDX) and Trivy (CycloneDX). Diff the two; note where they disagree.
- Attach the SBOM to the image with `cosign attach sbom`.
- From a downstream consumer, retrieve and parse the SBOM with `cosign download sbom`.
- Add a CI step that fails the build if the SBOM contains a known-bad license (e.g., AGPL in a closed-source project).
Week 18 - Vulnerability Scanning: Grype, Trivy, Clair - Triage in CI¶
- Run Trivy on an image; produce a SARIF report. Upload to GitHub Code Scanning (or your scanner of choice).
- Pick three findings; for each, write a one-paragraph triage decision: fix, accept, or VEX-suppress.
- Author the VEX statement using `vexctl` (OpenVEX). Attach it to the image.
- Re-scan; verify the suppressed findings are now flagged as "not exploitable" rather than disappearing entirely.
Week 19 - Signing and Verification: Cosign, Sigstore - Signing Pipeline¶
- Sign an image with cosign keyless (GitHub OIDC). Verify.
- Attach SBOM and VEX as attestations.
- Configure `policy-controller` (Sigstore's Kubernetes admission controller) to require a valid signature from your CI's OIDC subject before allowing deploys.
- Try to deploy an unsigned image; observe the rejection.
Week 20 - SLSA, Provenance, and Reproducibility - SLSA L3 in CI¶
- Set up a GitHub Actions workflow that builds, scans, signs, and produces SLSA L3 provenance for an image on every release tag.
- Verify end-to-end: pull the image, retrieve its attestations, validate the provenance points back to the correct commit and CI run.
- Reproducibility: rebuild the same tag from a fresh runner; verify image digest stability.
Week 21 - Scaffolding: Project Setup, OCI Bundle Reading - Parse and Run¶
- Implement `config.json` parsing (the `runtime-spec` repo has a Go reference type definition).
- Implement a no-isolation mode: just `chdir(rootfs)`, `chroot(rootfs)`, `execve`. Verify it runs.
- Add command-line plumbing for the lifecycle subcommands.
Week 22 - Namespaces and Process Isolation - Namespaces Working¶
- Implement the parent/child fork with clone flags. Verify `lsns -p <pid>` shows the new namespaces.
- Implement `pivot_root` into the rootfs. Verify `/` inside the container is the bundle's `rootfs/`.
- Implement the `/proc` mount inside the new PID namespace. Verify `ps` shows only the container's processes.
- Implement UID/GID mapping for user-namespaced runs.
Week 23 - Cgroups v2, Capabilities, Seccomp, OverlayFS - All The Layers¶
- Implement cgroup v2 setup. Verify `memory.max=64M` actually limits the container.
- Implement capability dropping. Verify with `getcap`/`capsh --print` inside the container.
- Implement seccomp filter loading. Verify a denied syscall fails.
- (Optional) Implement OverlayFS rootfs construction from a multi-layer image.
Week 24 - Polish, Defense, Distribution - Defend the Project¶
Schedule a 45-minute mock review:
- Live demo: build, run a container, exec into it, observe isolation.
- Walk through the lifecycle code with the OCI spec open beside it.
- Demo a hardened run (cgroups + caps + seccomp + LSM) and verify isolation.
- Compare with runc/crun: what's missing? What's different? Why is your design simpler?
Go Mastery¶
Week 1 - The Toolchain and the Build Pipeline - Hello World, Audited¶
- Create `hello-audited`. Set `go 1.22` and a `toolchain go1.22.x` directive.
- Build with `go build -trimpath -ldflags="-s -w -X main.version=v0.1.0"`. Run `go version -m ./hello-audited`.
- Strip with `strip` and compare. Cross-compile to `linux/arm64`, `darwin/arm64`, `windows/amd64` with `GOOS=... GOARCH=... go build`.
- Document the size delta from each flag in `NOTES.md`. `-s -w` typically saves ~30%; `-trimpath` is a reproducibility flag (no local paths in the binary), not a size flag.
- Inspect the binary with `go tool nm` and `go tool objdump`. Identify the runtime symbols (`runtime.main`, `runtime.gcStart`, `runtime.schedule`).
Week 2 - The GMP Scheduler Model - Schedule Forensics¶
Build a tiny program that:
1. Spawns 1,000 goroutines, each computing a busy CPU loop for 10ms.
2. Records the time-to-completion distribution.
3. Re-runs with GOMAXPROCS=1, =2, =N (your core count).
4. Re-runs with runtime.Gosched() inserted in the loop.
5. Re-runs with the loop replaced by time.Sleep(10*time.Millisecond) (the netpoller path).
Tabulate the latency distributions in NOTES.md. Explain why GOMAXPROCS=1 without Gosched() produces high tail latency. Then capture an execution trace with runtime/trace and open it with `go tool trace`. Identify the per-P timeline, GC pauses, and proc transitions.
Week 3 - Stack Management - Stack Growth in the Wild¶
- Write a recursive function `func depth(n int) int { if n == 0 { return 0 }; var buf [256]byte; _ = buf; return 1 + depth(n-1) }`.
- Run with progressively larger `n`. Use `GODEBUG=gctrace=1,scheddetail=1` and observe stack-growth events.
- Re-run under `runtime.ReadMemStats` snapshots, recording `StackInuse` and `StackSys`.
- Now write the same function in a goroutine-per-call style and observe how stack churn changes.
Week 4 - Escape Analysis and the Inliner - Escape Forensics¶
For each of the following snippets, predict whether the value escapes, then verify with `-gcflags=-m`:
1. `func A() *int { x := 7; return &x }`
2. `func B() int { x := 7; p := &x; return *p }`
3. `func C() { x := 7; go func() { fmt.Println(x) }() }`
4. `func D() { x := bytes.Buffer{}; x.WriteString("hi"); fmt.Println(x.String()) }`
5. `func E(s []int) int { return len(s) }` called as `E(make([]int, 8))`.
6. `func F() any { return 7 }` (boxing into `interface{}`).
7. A method call on an interface value vs the concrete type (covered in Week 7).
For each that escapes, propose a refactor that keeps it on the stack. Then write a Criterion-style benchmark (testing.B) and prove the win.
Week 5 - Memory Layout, Padding, Alignment - Layout Forensics¶
- Define five "interestingly bad" structs (e.g., `struct{ a bool; b int64; c bool; d int64; e bool }`). Compute their `unsafe.Sizeof` by hand, then verify.
- Reorder for minimal padding. Re-measure. Document each delta.
- Build a benchmark with a `[]Struct` of 1M elements; compare allocation/scan time with the badly-padded vs the optimally-packed version. Use `runtime.ReadMemStats` to capture `HeapAlloc` and GC pause durations.
- Construct a false-sharing example: two atomic counters incremented by different goroutines, with and without a `CacheLinePad` between them. Benchmark contention. Expect a 5–20× difference.
Week 6 - The Garbage Collector - GC Forensics¶
- Write a service that allocates 100 MB/s of short-lived objects. Run with `GODEBUG=gctrace=1`. Read each GC line and identify: total heap, live heap, pause time, pacer target.
- Set `GOMEMLIMIT=512MiB` and `GOGC=off`. Re-run; observe how the GC is now driven entirely by the memory ceiling.
- Set `GOGC=50` (no `GOMEMLIMIT`). Re-run; observe more frequent, smaller GCs.
- Capture a `go tool pprof -alloc_objects` profile. Identify the top five allocation sites. Refactor at least two using `sync.Pool` or pre-allocated buffers. Re-benchmark.
- Capture a `go tool trace` and locate the GC mark phases visually.
Week 7 - Interface Values, itabs, and Dispatch Cost - Interface Bench¶
- Build a tight loop calling a method via three paths: concrete type, interface, generic type parameter. Benchmark with `-benchmem`.
- Inspect the disassembly with `go tool objdump -s 'main\.benchInterface'`. Identify the indirect call.
- Refactor a real-world pattern (a `Logger` interface used 10× in a hot path) into a concrete type or a type-parameterized version. Measure the win or non-win.
- Build a worst-case allocation example: passing a stack int into `fmt.Println(...)`. Show with `-gcflags=-m` that the int escapes (boxing into `any`). Replace with `fmt.Println(strconv.Itoa(x))` and re-measure.
Week 8 - Allocation Profiling, sync.Pool, GC Tuning - Pool the Hot Path¶
- Take the JSON-handling hot path of any service. Run `pprof -alloc_objects` under load. Identify the top three allocation sites.
- Introduce a `sync.Pool` for the most appropriate one (typically a `bytes.Buffer` or a decoder).
- Re-benchmark. The win should be visible in allocs/op and in p99 latency under load.
- Now intentionally misuse it: `Pool.Put` without resetting state. Detect the bug under `-race` or via a deliberately-inserted assertion.
Week 10 - sync Primitives and sync/atomic - Lock-Free SPSC Ring¶
Build a single-producer, single-consumer ring buffer using only `atomic.Uint64` indices. Pad the indices to separate cache lines. Validate with `go test -race -count=1000` running 1 producer and 1 consumer. Benchmark against `chan T` and against a `sync.Mutex`-protected slice. Document the cache-line padding's effect with a `withoutPad` variant; expect a 3–10× difference on modern x86.
Week 11 - context.Context, Cancellation, errgroup, singleflight - Context Discipline¶
- Take a small HTTP service. Audit every blocking operation (DB query, downstream RPC, Redis call). Each should accept and propagate `ctx`. Fail any goroutine that captures a request `ctx` and outlives the request.
- Implement a parallel fan-out using `errgroup` with N=8 workers, all cancellable on first error.
- Implement a cache-stampede test: 1000 concurrent requests for the same uncached key. Without `singleflight`, observe N upstream calls. With `singleflight`, observe 1.
- Demonstrate `context.AfterFunc` cleanup: register a release-resource callback on cancellation; verify it fires under both timeout and explicit cancel.
Week 12 - Worker Pools, Leak Detection, Deadlock Prevention - Worker Pool Survival Test¶
Build a worker pool that handles:
1. Backpressure: bounded input channel, drop-with-metric on overflow.
2. Graceful shutdown: on ctx.Done(), drain in-flight tasks within a deadline, then abandon the rest.
3. Per-task timeouts: WithTimeout(ctx, 100ms) per task.
4. Panic isolation: a panic in one task does not kill the worker; recover and report.
5. Leak-clean: goleak passes after cancel(); pool.Wait().
Stress-test with 1M tasks across 1000 workers under `-race`.
Week 9 - Channels, Deeply - Channel Internals¶
- Write a benchmark comparing: unbuffered chan, buffered chan(1), buffered chan(1024), a `sync.Mutex` + slice queue, and a `sync/atomic`-only SPSC ring buffer. Use 1 producer, 1 consumer, 10M messages.
- Plot the throughput. The atomic SPSC should be 5–10× the channel; the mutex queue may beat the buffered channel for small messages.
- Reproduce a nil-channel select pattern: a goroutine that toggles between two upstream channels by setting one to `nil` to disable a case.
- Write an "unbounded channel" using a goroutine that bridges an in-channel to an out-channel via an internal slice buffer. Discuss why this exists and why it is dangerous (memory growth on a slow consumer).
Week 13 - Reflection: reflect, Performance, and Discipline - A Reflective Validator¶
Build a struct validator that processes validate:"..." tags:
- Must support: required, min=N, max=N, email, regexp=<re>.
- Must cache per-type field metadata (one reflect.Type walk per type ever).
- Must produce structured errors (path, rule, value).
- Must beat a naive non-cached implementation by 10× in benchmarks.
Compare against go-playground/validator for both ergonomics and performance.
Week 14 - go/ast, go/parser, go/types: Static Analysis - Build a Custom Analyzer¶
Write an analyzer that flags:
1. context.Background() calls outside main and *_test.go files.
2. time.After inside a select body (the classic timer-leak pattern).
3. Goroutines launched with closures capturing a context.Context parameter named ctx of an enclosing HTTP handler (heuristic; document the false-positive risk).
Wire as a unitchecker binary. Run on a real codebase and triage findings. Document each false positive in ANALYZER_NOTES.md.
Week 15 - go generate and AST-Based Code Generation - Three Generators¶
Build three small generators:
1. Enum stringer: a from-scratch reimplementation of stringer for one annotation pattern.
2. Mock generator: for one interface, generate a struct with method recorders and call assertions.
3. JSON marshaler: generate a type-specific MarshalJSON that allocates zero maps. Compare allocations against encoding/json for the same type.
For each: `go vet`-clean output, `gofmt`-formatted, with a `go generate` directive in the consumer file.
Week 16 - Plugins: plugin, go-plugin, gRPC-Based Extensions - A Pluggable Storage Backend¶
Build a service whose storage backend is a plugin. The host defines an interface Storage { Get(key) (val, err); Put(key, val) error; Delete(key) error }. Ship two plugins: an in-memory backend, and a file-system backend. Both communicate via gRPC over go-plugin. Demonstrate hot-swap by killing one plugin process and starting the other.
Week 17 - DDD in Go: Hexagonal Architecture, Bounded Contexts - A Hexagonal URL Shortener¶
Build a workspace implementing a URL shortener:
- `internal/domain`: the `ShortURL` aggregate, `URLRepo` and `Hasher` ports.
- `internal/application`: Shorten and Resolve use cases.
- `internal/adapter/postgres`: implements `URLRepo` against a real Postgres (use `pgx`, not `database/sql`).
- `internal/adapter/http`: REST handlers using the application layer.
- `internal/adapter/memory`: in-memory `URLRepo` for tests.
- `cmd/api`: wires everything together.
The architectural test (a Go test) walks the import graph and fails if internal/domain imports any adapter package or stdlib networking package.
Week 18 - Observability: slog, pprof, trace, OpenTelemetry - Wire the URL Shortener¶
Take week 17's URL shortener and add:
- slog JSON output with request-scoped logger via context.
- /metrics Prometheus endpoint exposing request count, latency histogram, and Go runtime metrics.
- OTLP traces exported to a local Jaeger via docker-compose.
- /debug/pprof/* on a separate admin port, gated by IP allowlist.
- A 30-second runtime/trace capture under load, committed as trace.out with a markdown analysis.
Week 19 - gRPC: Streaming, Interceptors, Deadlines, Retries, Outlier Ejection - A Hardened gRPC Service¶
Build a minimal Echo service with:
- Unary + server-streaming + bidi methods.
- Server interceptors for: panic recovery, request logging, OTel tracing, auth, rate limiting.
- Client config with retries (UNAVAILABLE only), 2 s default deadline, round-robin load balancing.
- A grpc.health.v1 health server.
- A tools/grpc_load_test/ directory with `ghz`-based load tests; capture latency p50/p95/p99 under 10K QPS.
Week 20 - Testing Strategy: Five Surfaces, Race-Clean - Test-Pyramid the URL Shortener¶
- Unit: 100% line coverage on `internal/domain` and `internal/application` using mocks for ports.
- Integration: `testcontainers-go` Postgres for the postgres adapter.
- Fuzz: fuzz the alias-generation function, persisting any crashing inputs.
- Property: a `gopter` test that "shorten then resolve returns the original URL."
- E2E: a `make e2e` target that spins up the full stack via `docker-compose`, hits the HTTP API, and asserts behavior.
- All five surfaces run in CI under `-race -count=1`.
Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos) - Read Raft in Anger¶
- Read `etcd-io/raft`'s `node.go` and `raft.go` end-to-end. Annotate the state-machine transitions.
- Build a minimal in-memory KV store on top: a single goroutine consumes from `node.Ready()`, applies entries to a `map[string]string`, persists log entries to a WAL, sends messages to peers, and acknowledges.
- Run a 3-node cluster locally. Kill the leader; observe an election. Restart it; observe log catch-up.
- Add a snapshot mechanism every 10K entries.
Week 22 - Distributed Storage Patterns - Harden the KV Store¶
Take the week 21 Raft KV and add:
1. Pebble as the storage engine for both the WAL and the state machine.
2. Snapshots every N entries, with InstallSnapshot to recovering followers.
3. Linearizable reads via read-index.
4. Membership changes: add and remove nodes online.
5. Metrics: per-node Raft state, log lag, snapshot duration, apply latency.
Week 23 - Performance Tuning: Profile, Tune, Re-Profile - Profile-Tune-Profile¶
Take your capstone (whatever track) and:
1. Capture a CPU profile under representative load. Identify the top 5 functions.
2. Pick one and propose a fix. Estimate the win in advance.
3. Implement, re-profile, compare with benchstat. Document each change in PERF_LOG.md.
4. Capture a runtime/trace and identify any GC or scheduler stalls. Fix one.
5. Apply PGO. Confirm the win.
Week 24 - Capstone Integration, Defense, Final Hardening - Defend the Design¶
Schedule a 45-minute mock review with a senior peer (or record yourself). Present:
- The architecture diagram.
- One slide per non-obvious decision (e.g., "why etcd-io/raft over hashicorp/raft", "why Pebble over BoltDB", "why server-streaming over polling").
- A live demo of the test suite (`-race`, fuzzing, integration).
- A live demo of the observability stack (Jaeger, Prometheus, pprof).
- A live demo of fault tolerance (kill the leader, watch recovery).
The deliverable is the defense, not the slides. If you cannot answer "what is the worst-case write latency under leader change?" or "what is your goroutine count under 10× load?", you have not yet finished the curriculum.
Java Mastery¶
Week 1 - Modern Syntax and the Type System¶
Re-implement a small JSON-shaped expression evaluator: sealed interface Expr permits Lit, Add, Mul, Neg. Use record patterns + exhaustive switch. No instanceof chains, no visitors, no enum faking ADTs.
Week 2 - Build Tools, Dependencies, and JPMS¶
Take any small library. Modularize it: write module-info.java, run jdeps to find required modules, jlink --add-modules <list> --output runtime/ to build a custom runtime. Measure size before (du -sh $JAVA_HOME) vs after (du -sh runtime/). Then mvn install to a local repo and consume from a separate Gradle project via mavenLocal().
Week 3 - Collections, Streams, and java.time¶
Take a CSV file of timestamped events. Compute per-hour aggregates two ways:
1. Stream + Collectors.groupingBy(e -> e.timestamp().truncatedTo(HOURS), Collectors.counting()).
2. Explicit for loop + HashMap<Instant, Long>.
JMH them in Week 8. Also measure peak memory (-Xlog:gc*=info + young-gen allocation rate).
Week 4 - Testing, Logging, and the Definition-of-Done¶
Take Week 3's CSV aggregator (or any small evaluator). Write:
- 10 unit tests (JUnit 5 + AssertJ).
- 3 parameterized tests covering edge cases (empty input, single row, malformed timestamp).
- 1 property-based test (@ForAll List<Event> events -> aggregator.process(events).size() <= events.size()).
- Structured JSON logging at boundaries (input received, output produced).
Week 5 - Class Loading and Bytecode¶
Take a 10-line Java method. Compile it. Read the javap -v -p output line by line. Then build a class with the same bytecode using the Class-File API, load it via a custom ClassLoader, invoke via reflection, confirm output matches.
Week 6 - The JIT: C1, C2, Graal, Tiered Compilation¶
Write a polymorphic dispatch site (Shape.area() over 1 / 2 / 3 implementations). Run with -XX:+PrintInlining for each cardinality. Observe monomorphic → bimorphic → megamorphic transitions. Reproduce a deopt by introducing a 4th type after warmup.
Week 7 - Method Handles, VarHandles, and Reflection¶
Build a tiny DI container (~150 lines): scan a package for @Inject constructors, topologically sort by dependencies, instantiate with MethodHandle (lookup.unreflectConstructor(ctor).invokeExact(deps)). JMH vs Constructor.newInstance(deps). Expect MethodHandle ~5-10× faster after warmup. Stretch: use LambdaMetafactory to get within 1.5× of a direct call.
Week 8 - JMH and Microbenchmarking¶
JMH the Week 3 Stream vs for-loop comparison. Skeleton:
@State(Scope.Benchmark)
public static class Inputs {
@Param({"100", "10000", "1000000"}) int n;
List<Event> events;
@Setup public void setup() { events = generate(n); }
}
@Benchmark public Map<Hour, Long> streamWay(Inputs in) { /* ... */ }
@Benchmark public Map<Hour, Long> forWay(Inputs in) { /* ... */ }
Run with `-prof gc`. There IS a crossover; find and explain it.
Week 10 - The Generational Hypothesis and the Modern GCs¶
Take a sample allocation-heavy app (a small JSON parser benchmark works). Run it under all four major GCs with identical heap. Collect logs with -Xlog:gc*=info:file=gc.log:time,uptime. Plot pause times.
Week 11 - Heap Sizing, GOMEMLIMIT-Equivalents, and Container Awareness¶
Take your week-10 app. Run it in a 1GB container with default flags, observe the headroom. Then explicitly size every memory pool and run again. Measure RSS over time.
Week 12 - JFR, Heap Dumps, and Allocation Profiling¶
Write a deliberate memory leak (a static Map that accumulates request contexts). Run it, take a heap dump after some traffic, identify the leak in MAT. Then fix it, re-run, re-dump, confirm.
Week 9 - Object Layout, Headers, and Cache Effects¶
Use JOL to measure the size of: an empty object, a String, a HashMap with 0/1/10/100 entries, an ArrayList vs LinkedList of 1000 Integers. Predict each before measuring.
Week 13 - The Java Memory Model and java.util.concurrent Foundations¶
Implement a single-producer single-consumer ring buffer two ways: with synchronized+wait/notify, and with VarHandle + acquire/release. JMH them. Run with -XX:+PrintAssembly (HotSpot debug build, or use the hsdis plugin on a normal build) and find the membar instructions.
Week 14 - Executors, CompletableFuture, and the Pre-Loom World¶
Implement the same web-scraper-with-fan-out three ways: blocking ExecutorService, CompletableFuture chain, Reactor Flux.flatMap. Compare lines of code, readability, and error handling. Keep them; you will re-do the same task with virtual threads next week.
Week 15 - Virtual Threads, Structured Concurrency, and Scoped Values¶
Redo week 14's web scraper with Executors.newVirtualThreadPerTaskExecutor() + StructuredTaskScope. Compare lines of code and readability to the three previous versions. Stress to 100k concurrent requests. Watch with JFR's virtual-thread events.
Week 16 - Lock-Free Patterns, VarHandle Memory Modes, and jcstress¶
Implement a Treiber stack with VarHandle.compareAndSet (single VarHandle on the head pointer). Write three jcstress tests:
1. Linearizability - concurrent push + pop produces an ordering consistent with some serial schedule.
2. No lost pops - every pushed element is popped exactly once.
3. ABA exposure - under contention, a pop-then-push cycle can corrupt a CAS; document the scenario even if you don't fix it (the standard fix is hazard pointers or versioned pointers).
Run under all available -m modes (default, sequential consistency, relaxed).
Week 17 - Spring Boot 3 and Quarkus¶
Build the same small REST service (book CRUD, JSON over HTTP, Postgres backend) in both Spring Boot 3 and Quarkus. Measure:
- Build time (time ./mvnw package)
- Image size (du -sh target/quarkus-app vs the Spring jar)
- Cold start (time-to-first-response after launch)
- Warm p99 latency under load (k6 or wrk, 100 RPS for 60s)
- RSS at steady state (ps -o rss -p $(pgrep -f yourapp))
Document the trade in a one-pager.
Week 18 - Observability: Logs, Metrics, Traces¶
Wire all three pillars into your Week 17 service:
- OTel agent: download opentelemetry-javaagent.jar, set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317.
- Micrometer Prometheus: add micrometer-registry-prometheus, expose /actuator/prometheus.
- Logback JSON: replace the default text layout with LogstashEncoder + %X{trace_id} in the pattern.
Bring up a local stack via docker-compose: Prometheus + Tempo (traces) + Loki (logs) + Grafana. Generate load with k6 run script.js or wrk -t4 -c100 -d60s. In Grafana, find a slow request: open its trace, click through to the matching log lines via trace ID.
Week 19 - Persistence, RPC, and Resilience¶
Add to your week-17 service: a Postgres backend via Spring Data JDBC + Flyway, a downstream gRPC dependency (a fake "pricing service") with a Resilience4j circuit breaker, and 95th-percentile latency-bound retries. Chaos-test by killing the downstream and watching the breaker behavior in metrics.
Week 20 - Containers, Native Images, and Deployment¶
Produce four images of your service: plain JRE Dockerfile; jlink-trimmed; Buildpacks; native-image. Tabulate size, cold start, warm p99, RSS. Pick a winner for an explicit deployment profile (e.g., "always-on internal API" vs "scale-to-zero per-request webhook").
Kubernetes¶
Week 1 - etcd and the Raft Consensus Foundation - etcd, Up Close¶
- Bring up a 3-node etcd cluster locally (`etcd` binaries, no Kubernetes yet). Configure peer/client URLs.
- Use `etcdctl` to put/get keys; observe consistent reads.
- Kill the leader. Use `etcdctl endpoint status --cluster` to identify the new leader within seconds.
- Use `etcdctl watch /foo` from one terminal; put values from another. Internalize the watch model.
- Use `etcdctl --command-timeout=60s defrag` to compact + defragment. Observe the disk-usage drop.
Week 2 - The kube-apiserver - Read the Pipeline¶
- Use `kubectl --v=8` to dump the wire-level request/response of a `kubectl apply`. Read it carefully.
- Use `kubectl get --raw` to hit `/apis/`, `/api/v1`, `/apis/apps/v1` and see the discovery surface.
- Configure the apiserver to log all requests with `--audit-policy-file=audit.yaml`. Apply a few changes; read the audit log.
- Write a tiny mutating webhook (in Go, using `controller-runtime`'s webhook facilities) that adds a label to every Pod. Deploy and verify.
Week 3 - The Scheduler - Scheduler in Action¶
- Use `kubectl describe` on a pending Pod to see filter/score reasons.
- Set a Node taint (`kubectl taint nodes node1 key=value:NoSchedule`); observe that new Pods avoid it.
- Define `PriorityClass`es (`high`, `default`, `batch`); deploy mixed-priority Pods; trigger preemption by oversaturating.
- Write a custom scheduler plugin (a tiny score plugin) using the scheduler framework. Configure your scheduler binary; run it. Verify the selection difference vs the default scheduler.
Week 4 - Built-in Controllers and client-go Foundations - Read the Deployment Controller¶
- Read `pkg/controller/deployment/deployment_controller.go` end-to-end (~1500 lines).
- Trace a `kubectl rollout` through the source: which conditions are checked, which fields are updated, what triggers the next loop iteration.
- Reproduce a stuck-rollout scenario (deploy a bad image); observe `Progressing=False` after the deadline; inspect the status conditions.
- Manually scale a Deployment to 0 with `kubectl scale`; trace what the controller does in response.
Week 5 - Kubelet Internals - Kubelet Forensics¶
- SSH to a node. Run `journalctl -u kubelet -f` and trigger a Pod creation. Watch the log.
- `crictl ps`, `crictl pods`, `crictl inspect`: operate at the CRI layer directly.
- Place a static pod manifest; observe the kubelet picking it up.
- Trigger a memory eviction by setting a low `evictionHard` and oversubscribing. Read the eviction event and the kubelet's decision.
Week 6 - CRI: kubelet ↔ Runtime - CRI Direct¶
- From a node: `crictl pull alpine; crictl runp pod-config.json; crictl create <pod-id> ctr-config.json img-config.json; crictl start <ctr-id>`. You've launched a pod-equivalent without the apiserver.
- Compare with kubectl deploying the same: trace each CRI call in the kubelet log.
- Configure containerd with multiple runtimes (runc + runsc); register both as `RuntimeClass`es; deploy Pods against each.
Week 7 - kube-proxy, Services, and the Networking Dataplane - Service Path¶
- Create a Service + Deployment. From a Pod, `curl <service>.<ns>.svc.cluster.local`. Trace the DNS lookup (CoreDNS) and the iptables/IPVS rules that DNAT.
- Switch kube-proxy to IPVS mode (`mode: ipvs` in the kube-proxy config). Verify with `ipvsadm -L -n`.
- Install Cilium with `kubeProxyReplacement=true`. Observe kube-proxy not running. Verify Service connectivity still works.
- Compare per-packet latency under each mode with a small benchmark.
Week 8 - CSI, Storage, and Device Plugins - Storage Hands-On¶
- Install a local-path CSI driver (`rancher/local-path-provisioner` works for kind). Create a PVC; observe binding.
- Take a snapshot; restore it to a new PVC.
- Author a mock device plugin that exposes 4 instances of a fake resource. Deploy a Pod requesting it; verify scheduling and resource accounting.
- Read the CSI proto. Diagram the provision + attach + mount flow on paper.
Week 10 - controller-runtime and Kubebuilder - Rebuild Week 9 in controller-runtime¶
Take week 9's mirror controller; rebuild with kubebuilder + controller-runtime. Compare LOC and verbosity. The framework should save substantial code.
Week 11 - CRDs: Schema, Versioning, Validation - A Well-Versioned CRD¶
- Define a CRD with v1alpha1.
- Add validation, defaults, status conditions, printer columns.
- Add a v1beta1 with renamed fields and a conversion webhook between them.
- Verify the round-trip: `kubectl get` as `v1alpha1`, then as `v1beta1`, returns identical content.
Week 12 - Operator Patterns: Finalizers, External Resources, Multi-Cluster - An Operator That Manages an External Resource¶
Build an operator with a GitHubRepo CRD: spec includes a repo name and visibility; the controller calls the GitHub API to create/update/delete the repo to match. Includes:
- Authentication via a Secret referenced by the CR.
- Finalizers for cleanup.
- Status conditions: Ready, Synced, Error with reasons.
- Rate-limited reconciles with exponential backoff.
- E2E test using a fake GitHub API server.
Week 9 - client-go Internals and a Bare Controller - Controller From Scratch¶
Build a controller that watches ConfigMaps with the label mirror=true and copies them into every namespace whose name matches a configurable prefix.
- Use client-go informers + workqueue directly.
- Add leader election.
- Idempotent: same input twice produces same result.
- Handle deletions: when the source is deleted, delete all mirrors.
- Run as a Deployment in the cluster.
Week 13 - The CNI Spec and Pod Networking - Read a CNI's Source¶
- Pick a simple CNI (`flannel` or the reference `bridge` plugin from `containernetworking/plugins`). Read its `cmdAdd` end to end.
- Deploy a small kind cluster; trace a Pod creation in the kubelet log; correlate it with the CNI binary invocation.
- Use `nsenter -t <pause-pid> -n ip a` to inspect the container's network namespace from the host.
Week 14 - Cilium and eBPF Networking - Install and Drive Cilium¶
- Install via Helm with: `kubeProxyReplacement=true`, `hubble.enabled=true`, `hubble.relay.enabled=true`, `hubble.ui.enabled=true`, `encryption.enabled=true`, `encryption.type=wireguard`.
- Use the Hubble UI (`cilium hubble ui`) to visualize pod-to-pod traffic in real time.
- Author an L4 `NetworkPolicy` (standard k8s API); test enforcement with a denied and an allowed flow.
- Author an L7 `CiliumNetworkPolicy` (e.g., allow only `HTTP GET /api/*` from frontend → backend); test enforcement.
- Enable Cilium Service Mesh; observe sidecar-free mTLS between two test services.
Week 15 - Service Meshes: Istio, Linkerd, Cilium Service Mesh - Three Meshes¶
- Install Istio in ambient mode on a test cluster. Apply a `VirtualService` that does 90/10 canary routing. Verify with Hubble or Kiali.
- Repeat with Linkerd. Compare install footprint, configuration ergonomics, and observability quality.
- (If running Cilium) enable Cilium Service Mesh. Compare again.
- Document tradeoffs: install effort, per-Pod overhead, feature gaps.
Week 16 - CSI at Scale: Snapshots, Backup, Cloning - Backup and Restore¶
- Install Velero against a MinIO bucket.
- Schedule a daily backup of one namespace.
- Delete the namespace; restore from backup; verify Pods come back, PVs reattach, data intact.
- Create a stateful workload (Postgres via an operator); test snapshot + clone flow for fast dev/test environment provisioning.
Week 17 - GitOps: ArgoCD and Flux - Two GitOps Stacks¶
- Install ArgoCD. Set up an `Application` for a small app from a git repo. Verify auto-sync and auto-prune.
- Install Flux. Set up the equivalent. Compare ergonomics.
- Use `ApplicationSet` (Argo) to deploy the same app to three environment overlays (`dev`, `staging`, `prod`). Verify per-environment configuration via Kustomize overlays.
Week 18 - IaC From Within K8s: Crossplane and Terraform - Self-Service Database¶
- Install Crossplane. Install `provider-aws` (or `provider-gcp`).
- Configure provider credentials.
- Define an XRD `XDatabase` with parameters: `size`, `engine`, `version`, `region`.
- Define a Composition that materializes an RDS instance + a Secret with credentials.
- As an "app team" persona, create a `Database` claim. Watch it become a real RDS instance. Delete it; watch it be torn down.
Week 19 - HPA, VPA, KEDA: Autoscaling - Autoscale on Custom Metrics¶
- Deploy a load-test target with a Prometheus-exposed `requests_per_second` metric.
- Install `prometheus-adapter`, mapping that metric to `custom.metrics.k8s.io`.
- Author an HPA targeting `AverageValue=200` of that metric. Drive load; watch it scale.
- Add KEDA in front for scale-to-zero behavior. Verify cold-start latency.
Week 20 - Admission Control: Webhooks, OPA Gatekeeper, Kyverno - Three Policy Layers¶
- Apply Pod Security Admission per-namespace: `restricted` everywhere except a `priv` namespace.
- Author 5 Gatekeeper Constraints: require resource limits, forbid `latest` tags, enforce non-root, require labels, require namespaces to carry a team label.
- Author the equivalents in Kyverno. Compare expressiveness.
- Run in audit mode for a week against a pre-existing cluster; triage findings before enforcing.
Week 21 - Bootstrap: VMs, Certificates, etcd - Bring Up etcd¶
- Provision 3 VMs labeled `etcd-{1,2,3}`.
- Generate a CA + per-node certs.
- Install the etcd binaries; configure systemd units with mTLS.
- Bring the cluster up; verify `etcdctl member list` shows a healthy quorum.
- Take a snapshot. Restore it on a separate test machine.
Week 22 - Control Plane and Worker Nodes - Cluster Live¶
- Bring up 3 control-plane nodes; HAProxy in front.
- Bring up 3 workers; join via bootstrap tokens.
- Install Cilium; verify Pod-to-Pod connectivity.
- Install CoreDNS; verify Service DNS works.
- Smoke test: deploy a sample app + Service + Ingress; verify end-to-end.
Week 23 - RBAC, Multi-Tenancy, mTLS Everywhere - Onboard a Tenant¶
- Author a tenant Composition (Crossplane) or Helm chart that, given `{tenant: "acme"}`, materializes everything in §23.2.
- Onboard `acme`. Have a "tenant developer" persona deploy an app via GitOps.
- Verify isolation: from `acme`'s namespace, can you read another tenant's secrets? Pods? Logs? Each should fail.
Week 24 - Defense, Documentation, and the Capstone Demo - Defend the Cluster¶
Schedule a 60-minute mock review. Demo:
1. The architecture diagram.
2. Provisioning (Ansible/Terraform/Crossplane).
3. Tenant onboarding from request to running app.
4. Failure injection: kill a control-plane node; show cluster recovery.
5. Observability: trace a request from ingress through the service mesh to the backend, with metrics, logs, and trace-ID correlation.
6. Backup + restore.
Linux Kernel¶
Week 1 - Boot, Init, Systemd - A Hardened Echo Service¶
- Write a tiny C program that listens on a Unix socket and echoes input. Static-link with `-static`.
- Write an `echo.socket` and `echo.service` pair using socket activation.
- Apply every hardening directive that is plausible for an echo server. Run `systemd-analyze security echo.service` and aim for a score under 1.0.
- Verify isolation: from inside the service (debug via `systemd-run --shell --unit=echo.service`), confirm `ProtectSystem` makes `/usr` read-only.
Week 2 - Syscalls and the Kernel/Userspace Boundary - Syscall Forensics¶
- `strace -c ls /etc`: produce a count summary of syscalls. Predict the top 5; verify.
- Implement `cat` in pure C using only `open`, `read`, `write`, `close`. No libc helpers (`syscall(SYS_open, ...)`).
- Run it under `strace -f` to verify zero unexpected calls.
- Build a minimal `seccomp` allowlist for your `cat`, allowing only the syscalls actually used. Verify it kills attempts to invoke other syscalls.
Week 3 - The Virtual File System (VFS) - VFS Forensics¶
- Catalogue every entry in `/proc/<pid>/` for one of your processes. Document what each gives you.
- Read `/proc/<pid>/maps` and explain every region (text, heap, stack, vdso, vvar, shared libs).
- Use eBPF's `vfs_open` kprobe (via `bpftrace`) to log every open system-wide for 5 seconds. Triage the noise.
- Mount a `tmpfs` at a custom path, fill it, and observe the allocator behavior in `/proc/meminfo` (`Shmem`).
Week 4 - Processes, Threads, and Signals - Process Forensics¶
- Write a C program that forks 4 children, each computing for 5 s. Use `ptrace` or `strace -f` to observe all four.
- Add a signal handler that catches `SIGTERM` and logs cleanly to all children before exit.
- Reproduce a classic bug: a parent that ignores `SIGCHLD` and a child that exits, producing zombies. Verify with `ps -ef | grep defunct`.
- Convert to `signalfd` + `epoll`, the modern signal-handling pattern that integrates with event loops.
Week 5 - Virtual Memory, Paging, and the Page Cache - Memory Forensics¶
- Run `vmstat 1` and `free -h` while loading a 4-GB file with `cat file > /dev/null`. Watch `Cached` grow.
- `echo 3 > /proc/sys/vm/drop_caches` and observe the eviction.
- `mmap` a large file `MAP_PRIVATE`, write to it, observe `AnonHugePages` and the COW behavior in `/proc/<pid>/smaps`.
- Configure `vm.nr_hugepages=512` (1 GiB of 2 MiB pages). Allocate via `MAP_HUGETLB`. Measure the latency-distribution change vs default pages.
Week 6 - Swapping, OOM, Memory Pressure (PSI) - Pressure and the OOM Killer¶
- Write a memory-eater program. Run it inside a `memory.high=512M` cgroup. Observe `memory.pressure` rise.
- Push past `memory.max`; watch the OOM killer. Check `dmesg` and `journalctl -k | grep -i 'killed process'`.
- Set `oom_score_adj=-500` on a critical process; verify it survives an OOM event triggered by another, lower-priority hog.
- Measure PSI under realistic load: capture `memory.pressure` every second for 5 minutes during a workload spike. Plot.
Week 7 - The CPU Scheduler (CFS, EEVDF) - Scheduler Forensics¶
- Run two CPU hogs at `nice 0`. Observe the CPU split. Lower one to `nice 19`; verify a ~95/5 split.
- Use `bpftrace -e 'tracepoint:sched:sched_switch { @[comm] = count() }'` to see context-switch rates.
- Pin a workload to specific CPUs with `taskset -c 0,1`. Compare the cache-miss rate vs unpinned with `perf stat`.
- Place two services in cgroups with `cpu.weight=100` and `cpu.weight=1000`. Verify the 10:1 split under contention.
Week 8 - Disk I/O Scheduling, Filesystems Beyond ext4 - I/O Forensics¶
- Run `fio` with a representative workload. Measure a baseline.
- Toggle the I/O scheduler. Re-run. Compare.
- Use `bpftrace -e 'tracepoint:block:block_rq_issue { @[args->comm] = count() }'` to see who is hitting the disk.
- Mount with vs. without `noatime` and measure the difference in metadata-write traffic.
Week 10 - Control Groups v2 - Multi-Tenant Cgroups¶
- Create three sibling cgroups: `tenant-a`, `tenant-b`, `tenant-c` under `/sys/fs/cgroup/test/`.
- Set `cpu.weight` to 100/200/400 - under contention (run `stress-ng --cpu N` in each), verify the 1:2:4 split with `top`.
- Set `memory.high=1G` and `memory.max=2G` on each, run a memory hog (`stress-ng --vm 1 --vm-bytes 3G`), and observe throttling first (`memory.events.high` ticks, latency increases), then OOM (`memory.events.oom_kill` ticks).
- Set `io.max` to limit disk bandwidth on a specific device for one cgroup; run `fio` inside, verify with `iostat -x 1`.
Week 11 - eBPF: Foundations - First eBPF Tools¶
- Install `bpftrace`. Run `bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)) }'` and watch the system-wide open trace. Triage.
- Write a `bpftrace` script that histograms `read()` syscall sizes by process.
- Convert one of the recipes to `libbpf` C + a userspace consumer, using `libbpf-bootstrap` as the template.
- Read 10 of Brendan Gregg's `bpftrace` recipes (`runqlat.bt`, `tcpaccept.bt`, `vfsstat.bt`, etc.) and run them. Document each.
Week 12 - eBPF in Production: Observability Tools - Build a Production-Grade eBPF Tool¶
Write connsnoop:
- Hooks tcp_v4_connect and tcp_v6_connect (kprobe), inet_csk_accept (kretprobe), tcp_close.
- Records per-connection: 5-tuple, PID, process name, duration, bytes-tx/rx.
- Aggregates in-kernel via per-CPU hash maps, ships completion events through a ring buffer.
- Userspace consumer in C (with libbpf) or Go (with cilium/ebpf). Outputs JSON.
- Verifier-clean, CO-RE-portable across kernels 5.10+.
Week 9 - Namespaces - Hand-Built Container¶
Write a C program that:
1. clone()s with CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWCGROUP.
2. Configures UID/GID mappings via /proc/<pid>/uid_map and gid_map.
3. Creates a veth pair to give the namespace network access.
4. pivot_roots into a minimal Alpine rootfs.
5. execves /bin/sh.
You should now have a working terminal "inside" a "container" that you wrote in ~150 lines of C.
Week 13 - The Network Stack: Sockets, NAPI, conntrack - Packet Forensics¶
- `tcpdump -i any -nn -X 'tcp port 443' -c 10` - capture and dissect TLS handshake bytes.
- Trace a TCP connection's lifecycle with `bpftrace`'s `tcplife.bt`.
- Set up a gratuitous DROP rule with `iptables -I INPUT -p icmp -j DROP` and verify with `ping`. Remove it. Repeat with `nft`.
- Inspect conntrack: `cat /proc/net/nf_conntrack` while a long-lived connection is open.
Week 14 - Netfilter / nftables / iptables, IPVS - Build a Stateful Firewall and a Load Balancer¶
- Convert an existing iptables ruleset to nftables. Verify equivalence with packet probes.
- Set up IPVS-DR: a VIP with two real servers; load test with `wrk`. Compare with HAProxy on the same setup.
- Saturate the conntrack table on purpose; observe `nf_conntrack: table full, dropping packet` in dmesg. Tune `nf_conntrack_max`.
Week 15 - XDP and AF_XDP - An XDP DDoS Scrubber¶
Write an XDP program that:
- Drops UDP packets with source port < 1024 (a coarse DDoS-vector heuristic).
- Counts dropped packets per source IP in an LRU-hash map (1M entries).
- Userspace tool reads the map every second and emits Prometheus metrics.
- Test with pktgen or trafgen. Measure throughput and CPU overhead.
Week 16 - Bridges, VLANs, OVS - Three Network Topologies¶
- Two namespaces connected via a Linux bridge: classic container networking.
- Two namespaces on tagged VLANs sharing one bridge.
- The same topology in OVS, with explicit OpenFlow rules.
For each, verify connectivity with ping, capture with tcpdump, document the difference.
Week 17 - Discretionary and Mandatory Access Control - MAC for an Echo Service¶
Take week 1's echo service. Author:
1. An SELinux type-enforcement module that allows it to bind its socket and read its config but nothing else.
2. An equivalent AppArmor profile.
Verify with deliberate violations (try to read `/etc/shadow`) - both should deny and audit.
Week 18 - Capabilities, Seccomp, no_new_privs - Capabilities and Seccomp¶
- Convert a service that runs as root to one that runs as a non-root user with only the minimum capabilities.
- Author a seccomp policy using libseccomp that allows only the syscalls the service uses. Verify by attempting denied syscalls.
- Apply via systemd `SystemCallFilter=` and confirm.
Week 19 - Encryption at Rest: LUKS, dm-crypt, dm-verity - Encrypt a Disk End to End¶
- Create a LUKS2 volume on a spare disk or loopback file.
- Format with ext4. Mount.
- Add a TPM2-bound key slot. Enroll a recovery passphrase.
- Configure auto-unlock at boot via `crypttab`.
- Simulate disk theft: dump the device contents; verify they are opaque without the key.
Week 20 - Audit, Integrity Measurement, and Compliance - An Audited Host¶
- Configure auditd with a baseline ruleset.
- Trigger expected events (a failed `su`, an edit of `/etc/passwd`); verify the logs.
- Run `lynis audit system` - record the score and address the top 5 findings.
- (Optional) Boot with IMA enabled; measure the impact on boot time and observe `/sys/kernel/security/ima/ascii_runtime_measurements`.
Week 21 - Loadable Kernel Modules (LKM) - A Character Device LKM¶
Write pkv, a simple in-kernel key/value character device:
- /dev/pkv accepts writes of the form key=value\n and reads return the value for the last-written key.
- 100 KV slots, in-kernel hash table.
- Concurrency-correct under multiple writers/readers (use a mutex; pursue an rwlock variant as a stretch).
- ioctl operations for LIST and DELETE.
- KUnit tests in tree.
- Loads/unloads cleanly with no lockdep or KASAN warnings (turn both on in your test kernel).
Week 22 - Tracing and Performance Mastery: ftrace, perf, BPF - End-to-End Profiling¶
- Take a service running on a host. Capture `perf record -F 99 -ag -- sleep 30`.
- Generate a flamegraph.
- Identify the top three CPU consumers; for each, propose a hypothesis and a fix.
- Compare with the same workload profiled by `parca` or `pyroscope`, if available.
Week 23 - Performance Tuning at Scale - Triage Drill¶
A scripted "broken host" is provided (or build one): a VM with one of {disk-bound, memory-bound, network-bound, lock-contended, scheduler-thrashing} pathologies. Diagnose using only the tools above. Document the inference chain. Then introduce a fix and verify.
Week 24 - Capstone Integration & Defense - Defend the Host¶
Schedule a 45-minute mock review with a peer. Walk through: the host's threat model, the capstone artifact, the observability story, and a live demo of triaging a fault. Defend every choice - cgroup policy, LSM type, sysctl values, auditd rules.
Python Mastery¶
Week 1 - Syntax, Values, Names, and the Data Model - The REPL Audit¶
- In an interactive session, evaluate `a = [1,2,3]; b = a; b.append(4); print(a)`. Explain in writing.
- `x = 256; y = 256; x is y` → True. `x = 257; y = 257; x is y` → may be False. Explain.
- Write a class `Money` with `__init__`, `__repr__`, `__eq__`, `__hash__`, `__lt__`. Verify it sorts and deduplicates in a `set`. Add `functools.total_ordering`; observe what disappears.
- Write a class `Vector2` with `__add__`, `__sub__`, `__mul__` (scalar), `__rmul__`, `__abs__`, `__iter__`, `__len__`. Verify `2 * v` works and `list(v)` works. (A sketch follows this list.)
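A minimal sketch of the `Vector2` half of this lab (one possible shape, not a reference solution):

```python
import math

class Vector2:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Vector2({self.x}, {self.y})"

    def __add__(self, other):
        return Vector2(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        return Vector2(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar: float):
        return Vector2(self.x * scalar, self.y * scalar)

    __rmul__ = __mul__  # makes 2 * v work, not just v * 2

    def __abs__(self):
        return math.hypot(self.x, self.y)

    def __iter__(self):
        yield self.x
        yield self.y

    def __len__(self):
        return 2

v = Vector2(3, 4)
assert abs(v) == 5.0
assert list(2 * v) == [6, 8]
```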
Week 2 - Control Flow, Functions, Errors, and the Call Model - The Calculator and the Cancel¶
- Build a tiny expression evaluator over `+ - * /` using `ast.parse` + a custom `NodeVisitor`. Reject anything else. (Do not use `eval`.) A sketch follows this list.
- Add a `--repl` mode. Make `Ctrl-C` interrupt the current expression but not exit. Make `Ctrl-D` exit cleanly.
- Wrap division; raise a custom `EvalError` chained from `ZeroDivisionError` via `from`.
- Add a `--time-budget` flag using `signal.SIGALRM` (POSIX) or a watchdog thread (cross-platform). Document the trade-off.
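A minimal sketch of the evaluator bullet, assuming an `EvalError` exception and the four-operator allowlist described above:

```python
import ast
import operator

class EvalError(Exception):
    pass

class Evaluator(ast.NodeVisitor):
    OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}

    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)):
            return node.value
        raise EvalError(f"non-numeric constant: {node.value!r}")

    def visit_BinOp(self, node):
        op = self.OPS.get(type(node.op))
        if op is None:
            raise EvalError(f"operator not allowed: {type(node.op).__name__}")
        try:
            return op(self.visit(node.left), self.visit(node.right))
        except ZeroDivisionError as exc:
            raise EvalError("division by zero") from exc  # chained via `from`

    def generic_visit(self, node):
        # Anything not explicitly allowed above is rejected.
        raise EvalError(f"syntax not allowed: {type(node).__name__}")

def evaluate(expr: str) -> float:
    return Evaluator().visit(ast.parse(expr, mode="eval"))

assert evaluate("3 + 4 * 2") == 11
```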
Week 3 - Collections, Comprehensions, Iterators, and Generators - Streaming Word Count¶
- Implement `wc -w` over arbitrarily large files using a generator pipeline: file → lines → words → counts. Constant memory regardless of file size. (See the sketch below.)
- Add a `--top K` flag using `heapq.nlargest`. Note that you must materialize the counter - discuss why.
- Replace your hand-rolled tokenizer with `re.finditer` and benchmark. Then benchmark a `str.split()` version. Explain the difference.
- Add a `--parallel N` flag using `concurrent.futures.ProcessPoolExecutor` and `itertools.batched`. (We will revisit in Month 4.)
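A minimal sketch of the generator pipeline and the `--top K` materialization point; the whitespace tokenizer is the naive starting point the later bullets replace:

```python
import heapq
from collections import Counter
from typing import Iterable, Iterator

def lines(path: str) -> Iterator[str]:
    with open(path, encoding="utf-8", errors="replace") as fh:
        yield from fh                      # one line at a time, constant memory

def words(rows: Iterable[str]) -> Iterator[str]:
    for line in rows:
        yield from line.split()

def word_count(path: str) -> int:
    return sum(1 for _ in words(lines(path)))

def top_k(path: str, k: int) -> list[tuple[str, int]]:
    counts = Counter(words(lines(path)))   # materialized: top-K needs every count
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])
```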
Week 4 - Modules, Packaging, Virtual Environments, and the Import System - Ship a CLI¶
- Build a CLI tool - e.g., a Markdown table-of-contents generator. Project layout: `src/toctool/{__init__.py,__main__.py,cli.py,core.py}`, `tests/`, `pyproject.toml`.
- Configure `[project.scripts] toctool = "toctool.cli:main"`. Verify `pipx install .` makes `toctool` available system-wide.
- Add a `[project.optional-dependencies] dev = [...]` group. `uv sync --extra dev` installs the dev tools.
- Tag `v0.1.0`. Build wheel + sdist with `uv build`. Inspect the wheel with `unzip -l`. Confirm no test files leaked in.
- (Optional, sets up later weeks) Publish to TestPyPI.
Week 5 - Object Model Deep Dive: Classes, Descriptors, Metaclasses - Build a Tiny ORM¶
- Implement a `Field` descriptor with type validation and a `default`: `class User: name = Field(str); age = Field(int, default=0)`. (See the sketch below.)
- Use `__init_subclass__` to collect declared fields into `cls._fields`. Auto-generate `__init__` and `__repr__`.
- Compare your hand-rolled version to `@dataclass(slots=True)`. Note where dataclass is better (field ordering, `__eq__`, `__hash__`).
- Implement a `RegistryMeta` metaclass that records every subclass in a class-level dict. Then re-implement using `__init_subclass__`. Defend the simpler version in writing.
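A minimal sketch of the `Field` descriptor plus `__init_subclass__` collection; the `Model` base class is an illustrative name, not part of the lab text:

```python
class Field:
    def __init__(self, typ, default=None):
        self.typ, self.default = typ, default

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        if not isinstance(value, self.typ):
            raise TypeError(f"{self.name} expects {self.typ.__name__}, got {type(value).__name__}")
        obj.__dict__[self.name] = value

class Model:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Collect every Field declared in the subclass body.
        cls._fields = {k: v for k, v in vars(cls).items() if isinstance(v, Field)}

    def __init__(self, **kwargs):
        for name, field in self._fields.items():
            setattr(self, name, kwargs.get(name, field.default))

    def __repr__(self):
        body = ", ".join(f"{n}={getattr(self, n)!r}" for n in self._fields)
        return f"{type(self).__name__}({body})"

class User(Model):
    name = Field(str)
    age = Field(int, default=0)

assert repr(User(name="ada")) == "User(name='ada', age=0)"
```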
Week 6 - Decorators, functools, and contextlib - The Retry Decorator That Doesn't Lie About Its Type¶
- Write `@retry(times=3, on=(IOError,), backoff=0.1)`. Make it work on both sync and async functions (detect with `asyncio.iscoroutinefunction`). A sketch follows this list.
- Use `ParamSpec` so that `pyright --strict` preserves the wrapped signature.
- Add structured logging on each retry. Add a `tenacity`-style backoff strategy (constant, exponential, jittered).
- Compare to the `tenacity` library; document where yours is simpler / worse / better.
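A minimal sketch of the sync/async retry core; jitter, structured logging, and pluggable backoff strategies are left as the lab asks:

```python
import asyncio
import functools
import time
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def retry(times: int = 3, on: tuple[type[BaseException], ...] = (IOError,), backoff: float = 0.1):
    """Retry decorator that works on both sync and async callables."""
    def decorate(fn: Callable[P, R]) -> Callable[P, R]:
        if asyncio.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args: P.args, **kwargs: P.kwargs):
                for attempt in range(times):
                    try:
                        return await fn(*args, **kwargs)
                    except on:
                        if attempt == times - 1:
                            raise
                        await asyncio.sleep(backoff * 2 ** attempt)  # exponential backoff
            return async_wrapper  # type: ignore[return-value]

        @functools.wraps(fn)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except on:
                    if attempt == times - 1:
                        raise
                    time.sleep(backoff * 2 ** attempt)
            raise AssertionError("unreachable")
        return wrapper
    return decorate
```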
Week 7 - Dataclasses, attrs, Pydantic, and the Validation Boundary - The Three-Layer Cake¶
- Build an HTTP service (FastAPI, but kept small):
  - Boundary layer: Pydantic `RequestModel` / `ResponseModel`.
  - Domain layer: `@dataclass(slots=True, frozen=True)` value objects.
  - Persistence layer: `TypedDict` rows from `sqlite3`.
- Write explicit converters between each layer. Resist the urge to make them the same type.
- Benchmark a 10k-request loop with Pydantic v1 (if installed) vs. v2. Document the 10x.
Week 8 - The Type System: Generics, Protocols, Variance, and typing.* - Make Pyright Strict¶
- Take a 500-LOC module of your existing code. Run `pyright --strict`. Resolve every error.
- Add a `Protocol` for a "thing-with-an-id" and refactor a function that previously took `Any`.
- Use `TypeIs` to narrow `dict | list` returned from `json.loads` into safe shapes for downstream use. (See the sketch below.)
- Where you find yourself reaching for `cast`, document why and consider whether the boundary belongs at a Pydantic model.
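A minimal sketch of the `Protocol` and `TypeIs` bullets. Note that `TypeIs` lives in `typing` only on Python 3.13+; on earlier versions import it from `typing_extensions`:

```python
from typing import Any, Protocol, TypeIs  # TypeIs: 3.13+, else typing_extensions

class HasId(Protocol):
    id: int

def describe(obj: HasId) -> str:
    # Structural typing: anything carrying an `id: int` attribute qualifies.
    return f"object #{obj.id}"

def is_str_list(value: Any) -> TypeIs[list[str]]:
    return isinstance(value, list) and all(isinstance(x, str) for x in value)

def handle(payload: dict | list) -> None:
    if is_str_list(payload):
        print(", ".join(payload))   # narrowed to list[str] on this branch
    else:
        print("unexpected shape")

handle(["a", "b"])
handle({"a": 1})
```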
Week 10 - Memory: Refcounts, Cyclic GC, the pymalloc Allocator - Find the Leak¶
- Write a service that has a deliberate leak: an unbounded `dict` cache, a leaking closure, and a circular reference with a `__del__`. Run it under `memray` and `tracemalloc`. Identify each leak from the output. (See the sketch below.)
- Bound the cache with `functools.lru_cache(maxsize=...)`. Confirm with `memray` that growth flatlines.
- Profile a NumPy-heavy workload. Observe that pymalloc and Python refcounts are largely unused - most memory is in NumPy buffers. Internalize: "NumPy is a different memory world."
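A minimal sketch of the unbounded-cache leak and the `tracemalloc` half of the hunt (`memray` is run from the command line); the cache name is illustrative:

```python
import tracemalloc

_CACHE: dict[int, bytes] = {}          # deliberately unbounded

def handle_request(key: int) -> bytes:
    if key not in _CACHE:
        _CACHE[key] = bytes(10_000)    # every new key leaks ~10 kB
    return _CACHE[key]

tracemalloc.start()
for i in range(50_000):
    handle_request(i)

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:3]:
    print(stat)                        # the _CACHE line should dominate
```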
Week 11 - The GIL, Free-Threaded Python, and the Concurrency Model - GIL Awareness¶
- Compute primes up to 1M three ways: (a) single thread, (b) `threading` with 8 threads, (c) `multiprocessing` with 8 procs. Bench all three on stock CPython. (A harness sketch follows this list.)
- Run (b) on `python3.13t` (free-threaded). Compare.
- Replace the prime-test inner loop with a NumPy expression. Re-run (b) on stock CPython. Note the GIL-release effect.
- Capture `py-spy record` flame graphs for each. Identify GIL contention visually.
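A minimal harness sketch for the three-way benchmark; the naive trial-division prime test is deliberate so the GIL effect stays visible:

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def count_primes(chunk: tuple[int, int]) -> int:
    lo, hi = chunk
    return sum(is_prime(n) for n in range(lo, hi))

CHUNKS = [(i * 125_000, (i + 1) * 125_000) for i in range(8)]

def bench(label: str, fn) -> None:
    t0 = time.perf_counter()
    total = fn()
    print(f"{label:<12} {time.perf_counter() - t0:6.2f}s  primes={total}")

if __name__ == "__main__":
    bench("1 thread", lambda: count_primes((0, 1_000_000)))
    with ThreadPoolExecutor(8) as pool:       # serialized by the GIL on stock CPython
        bench("8 threads", lambda: sum(pool.map(count_primes, CHUNKS)))
    with ProcessPoolExecutor(8) as pool:      # real parallelism, at pickling cost
        bench("8 processes", lambda: sum(pool.map(count_primes, CHUNKS)))
```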
Week 12 - The Optimization Ladder: Algorithm → Vectorize → Native → JIT - Climb the Ladder¶
Take a deliberately slow workload - e.g., compute pairwise cosine similarity between 10k 768-dim vectors with a pure-Python triple loop. Time it. Then climb:
1. Algorithmic: skip pairs already computed.
2. Vectorize: NumPy batched matmul with norm.
3. Cython rewrite of the inner kernel.
4. Numba @njit on the same.
5. (Stretch) Rust + PyO3 implementation.
6. Compare to faiss / hnswlib.
Tabulate speedups in NOTES.md. The lesson is that step 2 usually wins by 100x and step 3+ by ~2x more - but step 6 (use the right library) wins by 1000x. Algorithm > implementation > tuning.
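A minimal sketch of step 2, the vectorized rung: row-normalize once, then a single matmul yields every pairwise cosine similarity (the (n, n) float32 result is roughly 400 MB at n = 10k):

```python
import numpy as np

def pairwise_cosine(x: np.ndarray) -> np.ndarray:
    # x: (n, d). Normalize each row once, then one GEMM gives every pair.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)
    return unit @ unit.T                      # (n, n) similarity matrix

x = np.random.default_rng(0).standard_normal((10_000, 768)).astype(np.float32)
sims = pairwise_cosine(x)                     # orders of magnitude faster than the pure-Python loop
```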
Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop - Bytecode Forensics¶
- Write three implementations of "sum of squares": a `for` loop, `sum()` + a genexp, and `numpy.dot(a, a)`. `dis.dis` each. Benchmark with `timeit`. Explain the gap. (See the sketch below.)
- Take a function with a global lookup in its hot loop. Refactor to a default-argument cache. Re-bench. Quantify the win.
- Use `sys.settrace` with `f_trace_opcodes = True` (or `dis.dis(fn, adaptive=True)` on 3.11+) to inspect opcode-level behavior on a small program. Compare before and after warm-up to observe specialization.
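A minimal sketch of the three variants and their disassembly; the timings and the per-element bytecode are what you compare:

```python
import dis
import timeit
import numpy as np

def loop(a):
    total = 0
    for x in a:
        total += x * x
    return total

def genexp(a):
    return sum(x * x for x in a)

def dot(a):
    return np.dot(a, a)

data = list(range(10_000))
arr = np.array(data, dtype=np.int64)

for fn, arg in ((loop, data), (genexp, data), (dot, arr)):
    secs = timeit.timeit(lambda: fn(arg), number=100)
    print(f"{fn.__name__:8} {secs:.4f}s")
    dis.dis(fn)   # compare how much bytecode work each variant does per element
```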
Week 13 - asyncio Foundations: Event Loop, Tasks, Coroutines - The Crawler That Doesn't Lie¶
- Build an async HTTP crawler with `httpx.AsyncClient` and a `TaskGroup`. Limit concurrency with a `Semaphore(N)`. (See the sketch below.)
- Add a 5-second per-request timeout using `asyncio.timeout`. Verify cancellation propagates cleanly to the `httpx` request.
- Inject a deliberately blocking `time.sleep(2)` somewhere. Detect it by enabling loop debug mode, setting `asyncio.get_event_loop().slow_callback_duration = 0.1`, and watching for the resulting log warnings.
- Replace the blocker with `asyncio.sleep`. Confirm via `py-spy dump` that the loop never stalls.
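A minimal sketch of the crawler core (`TaskGroup` + `Semaphore` + `asyncio.timeout`, Python 3.11+); the URL list is a placeholder and error handling is trimmed:

```python
import asyncio
import httpx

URLS = [f"https://example.com/?page={i}" for i in range(20)]
CONCURRENCY = 5

async def fetch(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> int:
    async with sem:                          # at most CONCURRENCY requests in flight
        async with asyncio.timeout(5):       # cancels the awaiting request after 5 s
            resp = await client.get(url)
            return resp.status_code

async def crawl() -> list[int]:
    sem = asyncio.Semaphore(CONCURRENCY)
    async with httpx.AsyncClient() as client:
        async with asyncio.TaskGroup() as tg:
            tasks = [tg.create_task(fetch(client, sem, u)) for u in URLS]
    return [t.result() for t in tasks]

if __name__ == "__main__":
    print(asyncio.run(crawl()))
```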
Week 14 - Structured Concurrency, Cancellation, ExceptionGroups, anyio - The Fan-Out That Cleans Up After Itself¶
- Refactor your week-13 crawler to use `TaskGroup` (or an `anyio` task group).
- Add a "first-error wins" mode: as soon as any task raises, all siblings are cancelled and the group raises an `ExceptionGroup`.
- Add a "best-effort" mode: collect all results and exceptions; return both.
- Verify via a test that cancelling the parent cancels every in-flight HTTP request within 100 ms.
Week 15 - Threads, Processes, Subinterpreters, concurrent.futures - Pick Your Parallelism¶
For each workload, pick a model and justify:
1. Compress 10k JPEGs in parallel.
2. Run 10k HTTP requests against an external API (rate-limited).
3. Compute SHA-256 of 10k 1MB blobs.
4. Train 10 small models concurrently sharing a GPU.
Implement at least two of them three ways: threads, processes, asyncio. Bench. Write up the right answer.
Week 16 - Native Extensions, Releasing the GIL, FFI - Write the Hot Kernel in Rust¶
- Take the cosine-similarity workload from week 12. Implement it in Rust with PyO3.
- Use `py.allow_threads(|| ...)` around the SIMD loop. Verify with a Python `ThreadPoolExecutor(8)` that you get ~8x speedup.
- Compare to NumPy and to your Cython version. Write up the cost in code complexity.
- Bonus: expose a `Vector` `#[pyclass]` and benchmark crossing the FFI per-call vs. per-batch. Internalize the per-call FFI cost.
Week 17 - Pythonic Design Patterns - Refactor a Junk Drawer¶
- Take a 1k-LOC script of mixed responsibilities. Extract: `domain/`, `adapters/`, `service/`, `entrypoints/`. Write `Protocol`s for the seams.
- Add a fake repository for tests; the real one talks to SQLite. Run the same test suite against both.
- Document, in `docs/architecture.md`, why each module exists and what it depends on.
Week 18 - Data Structures Beyond list/dict - Right Tool for Right Workload¶
- A leaderboard with frequent inserts + top-K queries: implement with `list` (naive), `heapq` (better), `SortedList` (best). Bench at 10k/100k/1M elements. (See the sketch below.)
- A rolling-window deduplicator: `set` (memory-unbounded), Bloom filter (memory-bounded, false positives), `cachetools.TTLCache`. Pick one with justification.
- A nearest-neighbor lookup over 1M 768-dim vectors: brute-force NumPy, `hnswlib`, `faiss`. Note the recall/latency trade-offs.
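A minimal sketch of the `heapq` variant of the leaderboard: O(log K) per insert, keeping only the current top K; class and method names are illustrative:

```python
import heapq
import random

class Leaderboard:
    def __init__(self, k: int):
        self.k = k
        self._heap: list[tuple[int, str]] = []   # min-heap of the best K (score, player)

    def add(self, player: str, score: int) -> None:
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, (score, player))
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, (score, player))  # evict the current K-th best

    def top(self) -> list[tuple[str, int]]:
        return [(p, s) for s, p in sorted(self._heap, reverse=True)]

board = Leaderboard(k=3)
for i in range(100_000):
    board.add(f"player{i}", random.randint(0, 10_000_000))
print(board.top())
```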
Week 19 - Testing, Property-Based Testing, Mutation Testing, Fakes vs. Mocks - The Tests Find Bugs You Didn't Know You Had¶
- Add `hypothesis` property tests to your week-3 word counter. Watch them find a UTF-8 boundary bug or an empty-input issue. (See the sketch below.)
- Add a stateful `hypothesis` test against your tiny ORM from week 5.
- Run `mutmut`. Identify untested branches.
- Replace any `Mock` you used with a fake implementing a `Protocol`.
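A minimal sketch of two `hypothesis` properties for the word counter; `count_words` here is a stand-in for your week-3 implementation:

```python
from hypothesis import given, strategies as st

def count_words(text: str) -> int:
    return len(text.split())  # stand-in for your week-3 implementation

# Words drawn from a whitespace-free alphabet, joined by single spaces:
# the count must equal the number of words generated.
@given(st.lists(st.text(alphabet="abcdefghijklmnopqrstuvwxyzäöü", min_size=1)))
def test_count_matches_generated_words(words):
    assert count_words(" ".join(words)) == len(words)

# Arbitrary unicode (including the nasty cases) must never crash or go negative.
@given(st.text())
def test_defined_for_any_input(text):
    assert count_words(text) >= 0
```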
Week 20 - Observability, FastAPI, Production Service Shape - Production-Shaped Service¶
Build a FastAPI service that:
1. Accepts a POST /jobs, persists to SQLite, returns a job ID.
2. Processes jobs in an asyncio.TaskGroup background worker with bounded concurrency.
3. Emits structured JSON logs with trace correlation.
4. Exposes /metrics (Prometheus) plus /healthz and /readyz.
5. Handles SIGTERM by draining in-flight jobs.
6. Runs under uvicorn with --workers 4 (multi-process). Document why you still want workers > 1 for CPU-light, I/O-bound services on stock CPython.
7. Has a docker-compose stack including Prometheus, Grafana, and Jaeger.
8. Has a k6 or locust load test in loadtest/ reproducing the latency SLO.
Week 21 - LLM-App Foundations: Prompts, Tokens, Streaming, Cost - A Disciplined LLM Client¶
- Build an `LLMClient` abstraction over the `anthropic` and `openai` async SDKs. Methods: `generate`, `stream`, `with_tools`.
- Add token accounting: pre-call estimate, post-call actual, running cost meter.
- Add caching headers (Anthropic prompt caching). Measure the latency delta.
- Add a structured-output mode using `instructor` + a Pydantic schema. Test on a deliberately ambiguous prompt; observe schema enforcement.
- Add timeout, retry-with-backoff, and a circuit breaker (`pybreaker` or hand-rolled).
Week 22 - Retrieval-Augmented Generation: Doing It Properly - End-to-End RAG with Honest Evals¶
- Pick a corpus (your own docs, a Wikipedia subset, or a publicly available QA dataset). Ingest with at least two chunking strategies.
- Stand up `pgvector` or `qdrant`. Index with two embedding models.
- Implement hybrid retrieval (dense + BM25 + RRF) and add a reranker. (An RRF sketch follows this list.)
- Build a 50-question gold eval set with reference answers. Score with `ragas`. Iterate on retrieval until faithfulness > 0.85.
- Plot the impact of each pipeline change in a results table. Resist the urge to tune blindly.
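A minimal sketch of reciprocal rank fusion (RRF), the fusion step in the hybrid-retrieval bullet; `k = 60` is the conventional constant:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    # Each ranking is a list of document ids, best first.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

dense = ["d3", "d1", "d7"]   # ranking from the embedding index
bm25 = ["d1", "d9", "d3"]    # ranking from lexical search
print(rrf([dense, bm25]))    # d1 and d3 float to the top
```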
Week 23 - Agents, Tools, Durable Execution, Cost & Safety - An Agent That Doesn't Burn Money¶
- Build a research agent: takes a question, plans, calls `web_search` and `fetch_url` tools, synthesizes an answer with citations.
- Add caps: max-turns=10, max-tokens=200k, max-wall-time=120s, max-cost=$0.50. Verify each cap fires correctly.
- Persist agent state (turn-by-turn) to Postgres. Recover after a kill -9.
- Write replay tests: feed a saved trace to a test, mock the LLM, assert tool calls happen in the right order.
- Add an evaluator-optimizer loop: a critic LLM grades the answer; if score < threshold, revise once.
Rust Mastery¶
Week 1 - The Toolchain and the Compiler Pipeline - Hello World, Audited¶
- Create `hello-audited`. Pin a specific stable toolchain via `rust-toolchain.toml`.
- Build with `--release`. Run `objdump -h target/release/hello-audited` and identify the `.text`, `.rodata`, `.data`, `.bss`, and `.eh_frame` sections.
- Strip with `strip -s` and compare binary sizes. Now rebuild with `RUSTFLAGS="-C strip=symbols -C panic=abort"` and compare again.
- Document the size delta from each flag in `NOTES.md`. You should observe `.eh_frame` shrinking dramatically when panic=abort is set - explain why.
Week 2 - Memory Layout: Stack, Heap, Data, BSS, TLS - Layout Forensics¶
Build a binary that allocates one value of each "kind":
- a stack [u8; 64],
- a Box<[u8; 64]>,
- a static FOO: [u8; 64] = [0xAB; 64];,
- a static mut BAR: [u8; 64] = [0; 64];,
- a thread_local! RefCell<[u8; 64]>.
Print the address of each (&value as *const _ as usize). Run under cat /proc/self/maps (spawn cat from inside the program) and prove which segment each address falls in. Write up the mapping in NOTES.md.
Week 3 - Ownership, Borrowing, and Region Inference - Defeat the Borrow Checker, Then Submit¶
You will be given (as exercise files) ten programs that the borrow checker rejects. For each:
1. Predict which rule is violated before reading the diagnostic.
2. Fix it three different ways (e.g., scope shrinking, split borrow, Cell/RefCell).
3. Pick the idiomatic fix and justify it in a one-line comment - but only if the comment captures non-obvious reasoning. (See the feedback rule on comments.)
Week 4 - The Error Model - A Library With Two Faces¶
Build parse-units: a small crate that parses strings like "3.5 GiB" into a structured Quantity. Requirements:
- Public API returns Result<Quantity, ParseError> where ParseError is a thiserror enum with at least four variants.
- Internally, use ? to compose. No unwrap allowed except in unit tests.
- Provide a binary parse-units-cli that uses anyhow and prints rich context with .with_context(|| ...).
- Ship 100% line coverage measured by cargo-llvm-cov.
Week 5 - Advanced Lifetimes, Variance, and HRTBs - A Lending Iterator¶
Implement a WindowsMut lending iterator that yields overlapping &mut [T] windows over a slice. This requires GATs. Property-test it against a naive O(n²) reference implementation.
Week 6 - Traits, Coherence, and Monomorphization - Bloat Forensics¶
- Write a generic function `fn process<T: Display>(items: &[T]) -> String` that formats and concatenates. Instantiate it with five distinct types in a binary.
- Run `cargo bloat --release --filter process` and confirm there are five symbols.
- Refactor to a `dyn Display` version (`&[&dyn Display]`). Re-run `cargo bloat`. Document the binary-size delta and the codegen tradeoff.
- Now read the disassembly of the dyn version with `cargo asm` and identify the indirect call.
Week 7 - Smart Pointers and Interior Mutability - Build a Tracing Rc¶
Implement TracingRc<T> from scratch using UnsafeCell and NonNull. It must:
- Refcount strong and weak references correctly (study std::rc for the algorithm).
- Log every clone/drop to a thread-local trace buffer.
- Pass Miri (`cargo +nightly miri test`) - meaning your unsafe code is provably free of undefined behavior under the stacked-borrows model.
Week 8 - Drop, the Drop Checker, and Destructor Discipline - Resource Acquisition Is Initialization¶
Build a FileLock type wrapping flock(2):
- On construction, acquire an advisory lock.
- On Drop, release it. Even on panic.
- Provide a try_lock constructor returning Result<FileLock, std::io::Error>.
- Add a test that asserts the lock is released after a panic by spawning a child process that panics while holding the lock and observing in the parent that the lock can be re-acquired.
Week 10 - Channels, Lock-Free Patterns, and loom - An SPSC Ring Buffer¶
Implement a fixed-capacity SPSC ring buffer:
- Two AtomicUsize indices (head, tail), each on its own cache line.
- push and pop use Acquire/Release ordering pairs.
- Validate under loom with at least 4 elements and 3 pushes/pops.
- Benchmark against rtrb with criterion. You should be within 2× on x86_64.
Week 11 - Async Foundations: Future, Pin, Unpin, the State Machine - An Async Channel From Scratch¶
Implement a single-shot async oneshot channel:
- Sender<T> has send(self, T).
- Receiver<T> is Future<Output = Result<T, Cancelled>>.
- Use a single Mutex<State> and a Waker slot.
- Test with both tokio::test and smol::block_on. The result must be runtime-agnostic.
Week 12 - Runtimes: Tokio Internals, Smol, Embassy - Roll-Your-Own Mini Executor¶
Build a single-threaded executor in ~150 lines:
- A VecDeque<Arc<Task>> ready queue.
- Task holds a Mutex<Pin<Box<dyn Future>>> and implements ArcWake (or Wake on stable).
- block_on polls the root future; auxiliary spawn adds tasks.
- Run a small TCP echo server on top using polling (the same crate Smol uses) for I/O.
Week 9 - Threading, Send, Sync, and the Memory Model - A Correct Spinlock¶
Implement Spinlock<T> from scratch using AtomicBool:
- lock() spins with Relaxed load, then Acquire CAS.
- unlock() Release stores false.
- Returns a SpinlockGuard<'_, T> whose Drop unlocks.
- Verify with loom (run all interleavings-see week 10) that no two threads enter the critical section.
Week 13 - Unsafe Rust: Raw Pointers, NonNull, MaybeUninit, UB - A Sound Vec¶
Re-implement Vec<T> from scratch (the Nomicon's chapter 9 walk-through is the reference). Requirements:
- RawVec allocator wrapper handling growth.
- ZST (zero-sized type) handling-Vec<()> must work without ever allocating.
- Drop correct under panic in T::drop.
- Iteration via IntoIter with proper drop on partial consumption.
- Pass Miri on every public method.
Week 14 - FFI: Calling C, Being Called By C - Bind a Real C Library and Expose a Rust One¶
Two parts:
1. Consume: write Rust bindings to libsodium's crypto_secretbox family. Use bindgen for the raw layer, then wrap in safe Rust (own the keys with Zeroizing<[u8; 32]>, use typed nonces, return Results).
2. Expose: take your parse-units crate from Month 1 and ship a C-callable parse_units_c library with a `cbindgen`-generated header. Provide a `Makefile` that links a tiny C program against it.
Week 15 - Declarative Macros (macro_rules!) - A hashmap! Macro With Diagnostics¶
Implement a hashmap! macro:
- hashmap! { "a" => 1, "b" => 2 } produces a HashMap.
- Trailing comma allowed.
- Type-checks: a typo like hashmap! { "a" => 1, "b" -> 2 } should produce a useful error pointing at the bad token (use compile_error! strategically).
- Pre-allocates with HashMap::with_capacity.
Week 16 - Procedural Macros - All Three Flavors¶
Build the dtolnay/proc-macro-workshop exercises end-to-end:
1. `derive_builder` - derive a builder pattern with field-level attributes for renaming and each-element setters.
2. `seq` - a function-like macro `seq!(N in 0..8 { ... })` that emits N expansions.
3. `sorted` - an attribute macro that enforces enum-variant or match-arm sortedness, with proper spans on errors.
This workshop is the gold standard for proc-macro pedagogy. Do all of it.
Week 17 - Hexagonal Architecture and Domain Modeling in Rust - A Hexagonal URL Shortener¶
Build a workspace implementing a URL shortener:
- domain crate: ShortUrl, UrlAlias newtypes, a UrlRepository trait.
- application crate: Shorten, Resolve use cases.
- adapters/postgres crate: implements UrlRepository with sqlx.
- adapters/http crate: axum handlers using the application layer.
- bin/api crate: composition root.
- An adapters/in-memory crate used by tests, so application logic is testable without a database.
Week 18 - Zero-Copy I/O and the Poll-Based Model - A Zero-Copy Line Protocol¶
Build a server speaking a minimal newline-delimited protocol:
- Read into a BytesMut with try_read_buf.
- Parse line-by-line with winnow, yielding &[u8] slices.
- Push each parsed message into a downstream channel as a Bytes (cloned cheaply, shared with the parser's allocation).
- Benchmark with wrk or tcpkali. Inspect with perf and confirm __memcpy is not a hot frame.
Week 19 - Observability: tracing, metrics, OpenTelemetry - Add Observability to the Hexagonal URL Shortener¶
Take week 17's URL shortener and add:
- tracing::instrument on every use case, with explicit fields (no PII).
- Prometheus /metrics endpoint with request counts and per-endpoint latency histograms.
- OTLP export to a local Jaeger via docker-compose.
- A flamegraph.svg from a 30-second load test, committed.
Week 20 - Testing Strategy: Unit, Property, Fuzz, Miri, Integration - Test-Pyramid the URL Shortener¶
- Property-test the alias-generation function (idempotent, collision-resistant under birthday-bound assumptions).
- Fuzz the public HTTP handlers via the `axum::Router` directly (no socket).
- Integration-test the Postgres adapter against a real Postgres in `testcontainers`.
- Snapshot-test the OpenAPI spec with `insta`.
- Achieve 90%+ coverage per `cargo-llvm-cov`.
Week 21 - Implementing Complex Data Structures From Scratch - Pick One and Ship It¶
Implement one of the three to publishable quality:
- Property-tested against std equivalent.
- Miri-clean.
- Loom-verified (for the lock-free).
- Criterion-benchmarked against std/dashmap/intrusive-collections.
- README explains the algorithmic choice and tradeoffs.
Week 22 - no_std, Custom Allocators, Embedded Targets - Two Targets¶
- Bare metal: blink an LED on a real or QEMU-emulated Cortex-M target using `embassy`. Optional but recommended.
- Custom allocator: write a simple bump allocator. Use it as `#[global_allocator]` for a small `no_std + alloc` benchmark and observe the behavior.
Week 23 - Compiler Internals: MIR, Borrow Check, Codegen - Read, Build, Land¶
- Build rustc from source. Modify a single diagnostic message in `compiler/rustc_borrowck/src/...` to add a new help line. Rebuild stage-1 and confirm the new message with `--explain`.
- Find an issue tagged `E-easy`. Read the linked discussion. Cross-reference with the `rustc-dev-guide`. Do not yet open a PR; instead, write a one-page plan describing the proposed change. Discuss with a maintainer in the issue comments.
Week 24 - Capstone Integration, Profiling, Hardening, Defense - Defend the Design¶
Schedule a 45-minute mock review with a senior peer (or record yourself if none is available). Present:
- The architecture diagram.
- One slide per non-obvious decision (e.g., "why sharded RwLock instead of dashmap", "why tokio over glommio").
- A live demo of the test suite.
- A live demo of one production-hardening tool (PGO, BOLT, or fuzz corpus).
The deliverable is the defense, not the slides. If you cannot answer "what fails first under load?" or "what is your worst-case allocation pattern?", you have not yet finished the curriculum.