
Labs

Every weekly module ends with a hands-on Lab. The labs are the curriculum -- internalizing a topic happens here, not in the prose above. This index aggregates every lab across every path so you can browse by domain instead of by curriculum.

187 labs across 8 paths.


AI Systems

Week 1 - The Compute Hierarchy and the Cost Model - Roofline Sketch

Write a small program (Python+NumPy, or any) that: 1. Performs C = A @ B for square matrices N=64, 256, 1024, 4096. 2. Times each. Computes achieved FLOPS (= 2·N³ / time). 3. Computes the bytes moved (= 3·N²·sizeof(dtype)). 4. Plots achieved FLOPS vs arithmetic intensity on log-log axes. 5. Overlays the theoretical roofline of your laptop CPU (look up its peak FLOPS and DRAM bandwidth).

You should see the small N points sit under the bandwidth ramp and the large N points approach the compute roof. Keep the plot; every subsequent lab will produce another.
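
A sketch of the measurement harness; the peak numbers are placeholders to replace with your CPU's actual spec:

import time
import numpy as np
import matplotlib.pyplot as plt

PEAK_FLOPS = 200e9   # placeholder: your CPU's peak FLOP/s from its spec sheet
PEAK_BW = 40e9       # placeholder: your DRAM bandwidth in bytes/s

sizes, flops, intensity = [64, 256, 1024, 4096], [], []
for n in sizes:
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    t0 = time.perf_counter()
    a @ b
    dt = time.perf_counter() - t0
    work = 2 * n**3                      # FLOPs in an n×n matmul
    bytes_moved = 3 * n**2 * a.itemsize  # read A, read B, write C
    flops.append(work / dt)
    intensity.append(work / bytes_moved)

plt.loglog(intensity, flops, "o")
ai = np.logspace(-1, 3, 100)
plt.loglog(ai, np.minimum(PEAK_FLOPS, PEAK_BW * ai), "--")  # the roofline itself
plt.xlabel("arithmetic intensity (FLOP/byte)")
plt.ylabel("achieved FLOP/s")
plt.savefig("roofline.png")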

Week 2 - Linear Algebra Refresh, BLAS, NumPy - Three Matmuls

Implement 1024×1024 matmul three ways: 1. Naive triple-loop in Python (will take ~minutes; that's the point). 2. Naive in NumPy with explicit loops - only marginal speedup. 3. numpy.dot - measure the speedup over (1).

You should see ~10,000× speedup between (1) and (3). Internalize why. Read Goto and van de Geijn's "Anatomy of a High-Performance Matrix Multiplication" if you want the deep version (recommended).
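
A sketch of the (1)-vs-(3) harness; shrink N while debugging, because the pure-Python loop at N=1024 genuinely takes minutes:

import time
import numpy as np

N = 1024
A = np.random.rand(N, N)
B = np.random.rand(N, N)

def naive(A, B):
    # pure-Python triple loop: interpreter overhead on every multiply-add
    C = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            s = 0.0
            for k in range(N):
                s += A[i, k] * B[k, j]
            C[i, j] = s
    return C

for name, fn in [("naive python", naive), ("numpy.dot", lambda A, B: A @ B)]:
    t0 = time.perf_counter()
    fn(A, B)
    print(f"{name}: {time.perf_counter() - t0:.3f} s")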

Week 3 - Tensors, Autograd, the Gradient Tape - Autograd From Scratch

Implement reverse-mode AD in ~100 lines of pure Python (no PyTorch). Support: - A Tensor class wrapping a NumPy array with a grad field. - __add__, __mul__, __matmul__, relu, sum. Each records its inputs and a backward function. - A backward() method that topologically sorts and traverses the graph. - Test on a tiny MLP: define f = x @ W1 + b1; g = relu(f); h = g @ W2 + b2; loss = h.sum(). Verify the gradients match a torch.autograd reference within float-precision.

This is Andrej Karpathy's micrograd exercise. Do it before reading his code; then read his code and compare.
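
A skeleton of the Tensor class in the shape the lab asks for - illustrative, not micrograd's actual code (__mul__ is analogous to __add__, and broadcasting a (H,) bias is left as an exercise: sum out.grad over the batch axis before accumulating):

import numpy as np

class Tensor:
    def __init__(self, data, parents=()):
        self.data = np.asarray(data, dtype=np.float64)
        self.grad = np.zeros_like(self.data)
        self._parents = parents           # tensors that produced this one
        self._backward = lambda: None     # accumulates grads into the parents

    def __add__(self, other):
        out = Tensor(self.data + other.data, (self, other))
        def backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = backward
        return out

    def __matmul__(self, other):
        out = Tensor(self.data @ other.data, (self, other))
        def backward():
            self.grad += out.grad @ other.data.T
            other.grad += self.data.T @ out.grad
        out._backward = backward
        return out

    def relu(self):
        out = Tensor(np.maximum(self.data, 0), (self,))
        def backward():
            self.grad += (self.data > 0) * out.grad
        out._backward = backward
        return out

    def sum(self):
        out = Tensor(self.data.sum(), (self,))
        def backward():
            self.grad += np.ones_like(self.data) * out.grad
        out._backward = backward
        return out

    def backward(self):
        # topologically sort the graph, then run each node's backward in reverse
        topo, seen = [], set()
        def visit(t):
            if t not in seen:
                seen.add(t)
                for p in t._parents:
                    visit(p)
                topo.append(t)
        visit(self)
        self.grad = np.ones_like(self.data)
        for t in reversed(topo):
            t._backward()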

Week 4 - The Honest Training Loop - Train Something Small, Right

Train a 1-layer transformer (or 2-layer MLP if transformer is too far) on TinyShakespeare or MNIST. Required: - Dataset class + DataLoader with num_workers=4, pin_memory=True. - AMP autocast (BF16 on Ampere+, FP16 with GradScaler on older). - LR schedule (warmup + cosine). - Checkpoint every N steps; able to resume from any checkpoint and produce identical loss thereafter (within 1e-5). - Per-step metrics: loss, tokens/sec, GPU memory, GPU util%. - Final report: train + val loss curves, throughput, peak memory, total cost in $ (compute hours × $/hr).
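
The warmup-plus-cosine schedule is the piece that is easiest to get subtly wrong; a minimal sketch (the step counts and base LR are placeholders):

import math

def lr_at(step, base_lr=3e-4, warmup_steps=200, total_steps=10_000, min_lr=0.0):
    # linear warmup from 0 to base_lr, then cosine decay down to min_lr
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# inside the loop, before optimizer.step():
# for group in optimizer.param_groups:
#     group["lr"] = lr_at(step)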

Week 5 - GPU Hardware Architecture - Inspect Your Hardware

  1. Run nvidia-smi and nvidia-smi -q. Read every line.
  2. Compile and run NVIDIA's deviceQuery sample. It prints all the numbers above for your specific GPU.
  3. Compile and run bandwidthTest (CUDA samples). Compare measured PCIe and HBM bandwidth to spec.
  4. Compute: at the measured HBM BW and compute peak of your GPU, what is the arithmetic intensity break-even? Sketch the roofline.
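
Step 4's break-even point is just the ratio of the two peaks; a tiny sketch with placeholder numbers:

peak_flops = 60e12   # placeholder: your GPU's peak FLOP/s for the dtype you care about
hbm_bw = 2.0e12      # placeholder: measured HBM bandwidth in bytes/s

# below this arithmetic intensity (FLOP per byte moved) a kernel is bandwidth-bound,
# above it compute-bound
ridge = peak_flops / hbm_bw
print(f"break-even arithmetic intensity: {ridge:.1f} FLOP/byte")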

Week 6 - Your First CUDA Kernels - Kernel Speedrun

Write three kernels in CUDA C++: 1. Vector add: SAXPY (y = a*x + y). Time vs cuBLAS axpy. 2. Reduction: sum a million floats. Compare your naive version (one global atomic) with a hierarchical version (block-level reduction in shared memory, then global). Expect ~100× difference. 3. Naive matmul: 1024×1024 BF16. Compare to cuBLAS - expect to be 50-100× slower. Don't get discouraged; you'll close most of the gap in week 7.

For each: measure runtime with cudaEvent_t timing; compute achieved throughput; mark on the roofline.

Week 7 - Memory Optimization: Coalescing, Shared Memory, Tensor Cores - Climb the Roofline

Take your week 6 naive matmul and progressively optimize: 1. Coalesce loads (transpose access pattern). Re-time. 2. Tile in shared memory with 32×32 blocks. Re-time. 3. Double-buffer with cp.async. Re-time. 4. Use tensor cores with BF16. Re-time.

You should reach 30–60% of cuBLAS perf. Document each step's improvement and the residual gap. Read NVIDIA's cutlass examples for the production-grade version.

Week 8 - Triton: GPU Kernels From Python - Three Triton Kernels

  1. Elementwise add (the Hello World).
  2. Softmax with online maximum subtraction (numerical stability). Compare to torch.softmax perf.
  3. Naive matmul in Triton with autotuning. Compare to cuBLAS - you should reach 70-90% of peak for square BF16 matmul on common shapes.
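
A sketch of kernel 1, the elementwise add, in the standard Triton style (the block size of 1024 is just a starting point for the autotuner):

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements                    # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out

x = torch.randn(1 << 20, device="cuda")
y = torch.randn(1 << 20, device="cuda")
torch.testing.assert_close(add(x, y), x + y)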

Week 10 - torch.compile, TorchDynamo, Inductor - Compile and Compare

Take your honest-training-loop from Month 1. Add model = torch.compile(model). Measure: 1. First-step time (compilation cost). 2. Steady-state step time vs uncompiled. 3. With TORCH_LOGS="recompiles": how many recompilations occurred? Why? 4. With mode="max-autotune": extra speed vs default? Worth the compile time?

Triage any graph breaks; report in COMPILE_LOG.md.
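
A minimal way to separate compile cost from steady-state cost, using a stand-in MLP rather than your full loop:

import time
import torch

model = torch.nn.Sequential(torch.nn.Linear(1024, 4096), torch.nn.GELU(),
                            torch.nn.Linear(4096, 1024)).cuda()
batch = torch.randn(64, 1024, device="cuda")
compiled = torch.compile(model)

def step_time(fn):
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    fn(batch).sum().backward()
    torch.cuda.synchronize()
    return time.perf_counter() - t0

first = step_time(compiled)               # includes Dynamo tracing + Inductor codegen
steady = min(step_time(compiled) for _ in range(20))
eager = min(step_time(model) for _ in range(20))
print(f"first {first:.2f}s  steady {steady*1e3:.2f}ms  eager {eager*1e3:.2f}ms")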

Week 11 - JAX, XLA, HLO - JAX Equivalent

Re-implement your Month 1 training loop in JAX: - Pure-functional model (no nn.Module mutation). - optax for the optimizer. - jax.jit the train step. - Add jax.vmap somewhere meaningfully (e.g., per-example metric computation). - Compare end-to-end throughput with the PyTorch baseline.
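
The core of the port is a pure, jitted train step; a toy sketch with optax and a linear-regression loss (your model and data go in its place):

import jax
import jax.numpy as jnp
import optax

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

optimizer = optax.adam(1e-3)

@jax.jit
def train_step(params, opt_state, x, y):
    loss, grads = jax.value_and_grad(loss_fn)(params, x, y)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    return optax.apply_updates(params, updates), opt_state, loss

key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (16, 1)), "b": jnp.zeros((1,))}
opt_state = optimizer.init(params)
x = jax.random.normal(key, (128, 16))
y = x @ jnp.ones((16, 1))
for _ in range(100):
    params, opt_state, loss = train_step(params, opt_state, x, y)

# a meaningful vmap use: per-example loss for metrics
per_example = jax.vmap(lambda xi, yi: loss_fn(params, xi[None, :], yi[None, :]))(x, y)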

Week 12 - Custom Operators: From CUDA Kernel to torch.ops - RMSNorm From Scratch

RMSNorm is used in modern LLMs (Llama, Qwen). Implement it three ways: 1. PyTorch: pure tensor ops. 2. Triton custom op: a fused kernel that reads input, computes RMS, normalizes, scales - all in one pass over HBM. 3. CUDA C++ extension: same kernel in CUDA C++ with a pybind11 binding.

For each: forward + backward, autograd-correct (numerical-grad test), benchmarked vs the others on (B, S, H) = (8, 4096, 4096) BF16. Your fused Triton version should beat PyTorch by 3-5×.
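
Way 1, the pure-PyTorch reference, is what the fused Triton and CUDA versions get checked against; a sketch:

import torch

class RMSNorm(torch.nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x):
        # normalize by the root-mean-square over the hidden dimension, then scale
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)

x = torch.randn(2, 128, 4096, dtype=torch.bfloat16)
print(RMSNorm(4096).to(torch.bfloat16)(x).shape)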

Week 9 - PyTorch Internals: Tensor, Dispatcher, ATen - Trace an Op

  1. From Python, run a + b for two CUDA tensors. Use TORCH_SHOW_DISPATCH_TRACE=1 (or torch._C._dispatch_print_registrations()) to see the dispatcher's path.
  2. Read aten/src/ATen/native/cuda/BinaryOps.cu - find the actual CUDA kernel for add.
  3. Trace torch.matmul(a, b) similarly. Note that for BF16 it routes to cuBLAS.
  4. Document the call chain in TRACE.md.

Week 13 - Communication Primitives: NCCL, Allreduce, Topology - Allreduce Bench

On at least 2 GPUs (single node fine), run an allreduce benchmark: 1. torch.distributed.all_reduce on tensors from 1 KB to 1 GB. 2. Compute achieved bandwidth (= 2(N-1)/N · message_size / time). 3. Plot bandwidth vs message size; identify the message size at which BW saturates (the "knee"). 4. If you have access: run on 8 GPUs via single node (NVLink) and compare to 8 GPUs across 2 nodes (InfiniBand). Document the gap.
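
A sketch of the bench, meant to be launched with torchrun (e.g. torchrun --nproc_per_node=2 bench.py); the bandwidth formula is the one from step 2:

import time
import torch
import torch.distributed as dist

dist.init_process_group("nccl")
rank, world = dist.get_rank(), dist.get_world_size()
torch.cuda.set_device(rank % torch.cuda.device_count())

for size_bytes in [2**i for i in range(10, 31)]:       # 1 KB .. 1 GB
    x = torch.ones(size_bytes // 4, device="cuda")     # fp32 elements
    for _ in range(5):                                 # warmup
        dist.all_reduce(x)
    torch.cuda.synchronize()
    t0, iters = time.perf_counter(), 10
    for _ in range(iters):
        dist.all_reduce(x)
    torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / iters
    bus_bw = 2 * (world - 1) / world * size_bytes / dt  # bytes/s
    if rank == 0:
        print(f"{size_bytes:>12d} B  {bus_bw / 1e9:8.1f} GB/s")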

Week 14 - Data Parallelism: DDP, ZeRO, FSDP - FSDP a Small Model

On 4-8 GPUs (single node fine): 1. Train a 1B-parameter transformer in FSDP. Use transformer_auto_wrap_policy. 2. Compare memory and throughput: a DDP baseline (a model small enough not to OOM), FSDP on that same small model, and FSDP on a larger model. 3. Add activation checkpointing; re-measure. 4. Add CPU offload; observe the speed cost. 5. Compute scaling efficiency (throughput_8gpu / (8 × throughput_1gpu)).

Week 15 - Tensor Parallelism and Pipeline Parallelism - Implement Tensor-Parallel Attention

By hand, in pure PyTorch + torch.distributed: 1. Implement the Megatron-style tensor-parallel multi-head attention: column-parallel QKV projection, sharded heads, row-parallel output projection. 2. Verify numerically against a single-GPU reference for correctness (allclose to atol=1e-3). 3. Benchmark on 4 GPUs vs 1-GPU baseline. Compute scaling efficiency.
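
The two building blocks are the column-parallel and row-parallel linear layers; a sketch of just those (the QKV/head sharding and the backward-pass collectives sit on top of this, and the process group is assumed to be initialized already):

import torch
import torch.distributed as dist

class ColumnParallelLinear(torch.nn.Module):
    # weight split along the output dimension; each rank holds one shard
    def __init__(self, in_features, out_features, world_size):
        super().__init__()
        self.weight = torch.nn.Parameter(
            torch.randn(out_features // world_size, in_features) * 0.02)

    def forward(self, x):
        return x @ self.weight.t()        # output stays sharded across ranks

class RowParallelLinear(torch.nn.Module):
    # weight split along the input dimension; partial results summed across ranks
    def __init__(self, in_features, out_features, world_size):
        super().__init__()
        self.weight = torch.nn.Parameter(
            torch.randn(out_features, in_features // world_size) * 0.02)

    def forward(self, x_shard):
        partial = x_shard @ self.weight.t()
        dist.all_reduce(partial)          # sum the per-rank partial outputs
        return partial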

Week 16 - Mixed Precision, FP8, Numerical Stability at Scale - FP8 Train a Small Model

On at least one H100/H200/B200 (you may need to rent for a day): 1. Take your week 14 FSDP setup. Replace all linear layers with te.Linear. Wrap blocks with te.fp8_autocast. 2. Train the same model in BF16 vs FP8. Compare: - Throughput. - Memory. - Loss curve (the test of stability - FP8 should match BF16 within noise). 3. Document any NaN events and recovery actions.

If H100+ is unavailable, do this lab in BF16 + torch.cuda.amp, comparing against FP32. The instability dynamics are similar at lower stakes.

Week 17 - LLM Inference, the KV-Cache, Attention Math - Decode From Scratch

  1. Implement greedy decoding for a small Hugging Face model (Llama-3-8B works on a single A100; smaller for L4):
  2. Prefill once, capture KV-cache.
  3. Decode loop: forward(token, kv_cache) → next_token.
  4. Append next_token to KV-cache.
  5. Measure tokens/sec. Compute the achieved HBM BW (model weights × tokens / time).
  6. Replace standard attention with flash_attn_with_kvcache. Re-measure.
  7. Document the decode-vs-prefill latency split for a 1K-prefill, 512-decode request.
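
A sketch of steps 2-5 with Hugging Face transformers, using a small stand-in model so it runs anywhere; swap in your Llama checkpoint on the A100:

import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"                                    # stand-in; use your real checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name).eval()

prompt = tok("The roofline model says", return_tensors="pt")
with torch.no_grad():
    out = model(**prompt, use_cache=True)        # prefill: one pass over the prompt
    past = out.past_key_values
    next_tok = out.logits[:, -1].argmax(-1, keepdim=True)

    generated, n_decode = [next_tok], 64
    t0 = time.perf_counter()
    for _ in range(n_decode):                    # decode loop: one token per forward
        out = model(input_ids=next_tok, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_tok = out.logits[:, -1].argmax(-1, keepdim=True)
        generated.append(next_tok)
    dt = time.perf_counter() - t0

print(tok.decode(torch.cat(generated, dim=1)[0]))
print(f"{n_decode / dt:.1f} tokens/sec")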

Week 18 - Paged Attention, Continuous Batching, vLLM - vLLM Internals

  1. Install vLLM. Serve a 7B model. Run a load test (benchmark_serving.py) at various concurrency levels.
  2. Read vllm/core/scheduler.py and vllm/attention/backends/flash_attn.py end-to-end. Annotate the scheduler's iteration loop.
  3. Build a mini-scheduler in Python (not for prod; for understanding): manages a fixed pool of KV blocks, schedules decode steps, evicts on memory pressure. Use real model forward via vLLM's lower-level APIs or HuggingFace.
  4. Compare throughput of your mini-scheduler vs vLLM proper. The gap is likely 5-20×; that gap is your education.

Week 19 - Quantization: INT8, INT4, FP8, AWQ, GPTQ, SmoothQuant - Quantize and Compare

On a 7B-13B model: 1. Run baseline BF16 inference. Capture TTFT, TPOT, model size, throughput. 2. Quantize with AWQ (W4A16). Re-measure. Eval on a small held-out set (e.g., MMLU 200-question subset, or perplexity on Wikitext) for accuracy. 3. Quantize with FP8 (if on Hopper+). Re-measure. 4. Optionally: GPTQ comparison, AWQ INT8 comparison. 5. Build a tradeoff matrix: throughput, memory, perplexity / accuracy.

Week 20 - Speculative Decoding, Disaggregation, Inference Frontiers - Speculative Decoding

  1. Pair a small draft model (1B) with a larger target model (7-13B).
  2. Implement vanilla speculative decoding: draft-then-verify.
  3. Measure: acceptance rate, tokens/sec gain, vs baseline single-model decoding.
  4. Tune K (draft length); sweep; identify the sweet spot for your workload.
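
A greedy sketch of step 2's draft-then-verify loop, with small stand-in models (use a real 1B drafter and a 7-13B target for meaningful numbers):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
draft = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()   # stand-in drafter
target = AutoModelForCausalLM.from_pretrained("gpt2").eval()        # stand-in target

@torch.no_grad()
def speculative_step(ids, K=4):
    # 1) drafter proposes K tokens greedily
    drafted = ids
    for _ in range(K):
        nxt = draft(drafted).logits[:, -1].argmax(-1, keepdim=True)
        drafted = torch.cat([drafted, nxt], dim=1)
    # 2) target verifies all K drafts in a single forward pass
    tgt = target(drafted).logits.argmax(-1)      # target's greedy pick at every position
    n_prompt, accepted = ids.shape[1], 0
    for i in range(K):
        if tgt[0, n_prompt - 1 + i] == drafted[0, n_prompt + i]:
            accepted += 1
        else:
            break
    # 3) keep the accepted drafts plus one corrective/bonus token from the target
    kept = drafted[:, : n_prompt + accepted]
    bonus = tgt[:, n_prompt - 1 + accepted].unsqueeze(-1)
    return torch.cat([kept, bonus], dim=1), accepted

ids = tok("Speculative decoding works by", return_tensors="pt").input_ids
accepts = []
for _ in range(16):
    ids, acc = speculative_step(ids)
    accepts.append(acc)
print(tok.decode(ids[0]))
print("mean accepted drafts per step:", sum(accepts) / len(accepts))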

Week 21 - ML on Kubernetes: KServe, KubeRay, Volcano, GPU Operators - Train and Serve on K8s

  1. Bring up a small GPU-enabled cluster (kind+nvidia, or a 2-node cloud cluster with 1-2 GPUs each).
  2. Install GPU Operator. Verify kubectl describe node shows nvidia.com/gpu: N.
  3. Install Volcano. Submit a 4-GPU gang-scheduled training job (a small FSDP run from week 14).
  4. Install KServe + vLLM runtime. Deploy a 7B model. Hit it with a load test. Demonstrate autoscaling.
  5. Document the YAML for each in a deployable repo.

Week 22 - Observability, Cost, Eval Pipelines, MLOps - Eval and Drift Pipeline

  1. Build a CI pipeline: on every model push, run lm-evaluation-harness on a fixed subset (MMLU 500-question, HumanEval pass@1).
  2. Compare against a baseline; fail the pipeline on >2% regression.
  3. Wire production traffic samples into a drift dashboard: input length distribution, output length distribution, refusal rate, fraction of failed JSON-mode outputs.
  4. Synthetic drift: shift the input distribution (longer prompts) and verify the dashboard catches it.

Week 23 - Safety, Red-Teaming, Alignment Infrastructure - A Safety Layer

Take your week 21 vLLM deployment. Add: 1. Input classifier (Llama Guard or a small custom classifier) - block obvious prompt injections. 2. Output classifier - block policy-violating outputs. 3. Constrained-decoding mode for any structured-output endpoint. 4. Audit logging to a separate, append-only store. 5. A nightly red-teaming job that fires 1000 adversarial prompts; measures failure rate; alerts on regression.

Week 24 - Capstone Integration & Defense - Defend the Design

Schedule a 60-minute mock review (peer or recorded). Walk through: 1. The architecture diagram. 2. The roofline analysis: where does your system sit on the roofline? What's bound by what? 3. One slide per non-obvious decision (e.g., "why FSDP-2 over DeepSpeed Stage-3", "why AWQ over GPTQ", "why your batching policy"). 4. A live demo of the end-to-end artifact. 5. A live demo of one production-quality concern: cost, observability, safety, or fault tolerance.

The deliverable is the defense, not the slides. If you cannot answer: - "What is your worst-case tail latency under 10× concurrent load?" - "What happens when a GPU fails mid-training?" - "What is your cost per million output tokens?" - "How would you scale this to 10× the model size?" ...you have not yet finished the curriculum.

Container Internals

Week 1 - The OCI Image Spec - An Image Without Docker

  1. skopeo copy docker://alpine:3.19 oci:./alpine-layout:3.19. Inspect the layout. Read index.json, the manifest blob, the config blob.
  2. Find a layer blob, decompress, list its contents (tar tzf <blob>).
  3. Compute one of the layer digests yourself (sha256sum) and verify.
  4. Modify the config (e.g., change the entrypoint) by writing a new config blob, generating a new manifest, updating index.json. Verify with skopeo inspect oci:./alpine-layout:3.19.
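
Step 3 (verifying a layer digest yourself) is a few lines of Python; a sketch that assumes the alpine-layout directory from step 1:

import hashlib
import json
from pathlib import Path

layout = Path("alpine-layout")
index = json.loads((layout / "index.json").read_text())
manifest_digest = index["manifests"][0]["digest"]             # e.g. "sha256:..."
manifest_path = layout / "blobs" / manifest_digest.replace(":", "/")
manifest = json.loads(manifest_path.read_text())

for layer in manifest["layers"]:
    algo, want = layer["digest"].split(":")
    blob = (layout / "blobs" / algo / want).read_bytes()
    got = hashlib.sha256(blob).hexdigest()
    print(layer["digest"][:22] + "...", "OK" if got == want else "MISMATCH")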

Week 2 - The OCI Runtime Spec, runc, and crun - Run a Container Without Docker

  1. Generate a default config: runc spec produces config.json.
  2. Build a rootfs: mkdir rootfs && skopeo copy docker://alpine:3.19 oci:./alpine && umoci unpack --image ./alpine:3.19 ./bundle (umoci gives you both rootfs + config in one step). Or do it manually.
  3. Run: sudo runc run mycontainer. You're inside the container.
  4. Modify the config to: drop all capabilities except CAP_NET_BIND_SERVICE, set a memory limit of 64M, mask /proc/sys. Re-run; verify with cat /proc/self/status | grep Cap and pressure tests.
  5. Repeat with crun. Time the startup difference (time runc run vs time crun run) - crun is typically 2–5× faster.

Week 3 - skopeo Deep Dive: Multi-Arch, Signing, Sync - A Daemonless Image Pipeline

  1. Pull a multi-arch image as an OCI index. Inspect each per-platform manifest.
  2. Write a script that, given an image reference, prints a table of platforms, layer counts, total compressed/uncompressed sizes, and labels.
  3. Use skopeo sync to mirror three images into your local registry. Verify by pulling the mirrored versions.
  4. Compare skopeo copy of a 1-GB image with and without --multi-arch index-only on the destination side.

Week 4 - Image Internals: Manifest Lists, Index, Annotations, Sparse Pulls - Build a Multi-Arch Image By Hand

  1. Build an image for linux/amd64 and linux/arm64 separately (use buildah --arch= or docker buildx).
  2. Use skopeo to assemble a manifest list pointing to both.
  3. Push to your local registry.
  4. Pull from each architecture; verify the right manifest is selected.
  5. Add OCI annotations (source, revision, created); verify they survive the pipeline.

Week 5 - OverlayFS and Storage Drivers - OverlayFS By Hand

  1. Create three lower dirs with different files. Mount as overlay. Verify merged view.
  2. Modify a file from the lower; observe copy-up in the upper.
  3. Delete a lower file from the merged view; observe the whiteout in the upper.
  4. Reproduce a "container layer": treat your container's tarball-extracted contents as a lower; create a fresh upper; mount; modify; tar up the upper to produce a new layer.

Week 6 - buildah: Building Images Without Dockerfiles - Image as a Shell Script

  1. Write a shell script that uses buildah from, run, copy, config, commit to produce a small Go-binary-on-alpine image. No Dockerfile.
  2. Add reproducibility flags: --source-date-epoch, --timestamp, the SOURCE_DATE_EPOCH env var. Build twice; verify hashes match.
  3. Build the same image with buildah bud -f Dockerfile. Compare hashes-they should be identical when both are reproducible.

Week 7 - Multi-Stage Builds, Distroless, Minimal Images - Three Image Diet

Take a Go (or Rust, or Python) service and produce three images: 1. Naive: FROM ubuntu, build inline. Measure size. 2. Distroless: multi-stage with gcr.io/distroless/static. Measure size. 3. Scratch: static build, FROM scratch. Measure size.

Document the size delta and any operational tradeoffs (e.g., scratch has no ca-certificates - tls.Config failures unless you COPY --from=alpine /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/).

Week 8 - Layer Caching, Build Context, Reproducibility - Cache and Reproducibility

  1. Take a non-trivial image; measure clean-build time and incremental-build time (single source change). Reorder Dockerfile to maximize cache hits; re-measure.
  2. Enable BuildKit cache mounts; measure again.
  3. Build the same image on two machines with SOURCE_DATE_EPOCH set; verify the digests match.

Week 10 - CRI-O and the Kubernetes CRI - CRI Direct

  1. Install CRI-O on a clean machine (or use containerd with its CRI plugin).
  2. Use crictl runp (run pod), crictl create, crictl start to manually launch a pod-equivalent without Kubernetes. Inspect with crictl inspect.
  3. Add an OCI hook (e.g., a pre-start hook that logs every container) by configuring CRI-O's hooks_dir.

Week 11 - podman and the Rootless Model - Rootless Production

  1. As a non-root user, install podman. Configure /etc/subuid, /etc/subgid.
  2. Run a multi-container app with podman play kube (Kubernetes-YAML-as-podman-input).
  3. Generate systemd units; install with --user. The service starts at user login and persists across reboots (with loginctl enable-linger).
  4. Compare slirp4netns vs pasta networking throughput with iperf3.

Week 12 - Sandboxed Runtimes: gVisor and Kata Containers - Two Sandboxes

  1. Install gVisor. Register as a containerd runtime. Run nerdctl --runtime runsc against a test workload.
  2. Install Kata. Register as a containerd runtime. Run the same workload.
  3. Benchmark both vs runc for: startup time, syscall-heavy workload (e.g., find /usr -type f), and CPU-bound workload (e.g., sysbench cpu).
  4. Document the tradeoffs in a markdown matrix.

Week 9 - containerd Architecture - containerd Without Kubernetes

  1. Install containerd and nerdctl. Configure /etc/containerd/config.toml.
  2. Pull, run, exec, kill containers entirely via nerdctl. Confirm dockerd is not running.
  3. Enable the stargz-snapshotter. Pull a large image with eStargz layers. Measure first-run startup time vs cold pull.
  4. Use ctr to inspect tasks, snapshots, and content blobs at the daemon level.

Week 13 - The Default Threat Model - Audit a Real Image

  1. Pick a popular base image (nginx, redis). Run with docker scout cves and trivy image. Record findings.
  2. Run as default; identify how many capabilities it has via capsh --print.
  3. Re-run with --cap-drop=ALL, --security-opt=no-new-privileges, read-only rootfs. Identify what breaks. Fix what's needed.
  4. Document the minimum config to run safely.

Week 14 - Capabilities for Containers - Capability Diet

  1. For three services (e.g., a Go HTTP server, an Nginx reverse proxy, a Node.js app), run each with --cap-drop=ALL. Identify what fails (the error usually mentions the syscall - map back to the capability via capabilities(7)).
  2. Add back capabilities one at a time. Document the minimum set per service.
  3. Configure your container runtime (podman, containerd) or pod-security policy to apply this minimum by default.

Week 15 - Seccomp Profiles for Containers - Custom Seccomp

  1. Run a service under oci-seccomp-bpf-hook (or strace -ff) and exercise it with your integration tests.
  2. Generate a tight profile (default-deny + only the recorded syscalls).
  3. Run with the profile; verify the service works under load.
  4. Inject a "test" syscall (e.g., setns, unshare, or mount) the service doesn't legitimately use; verify it's blocked at runtime.

Week 16 - LSM for Containers: SELinux and AppArmor - MAC Per Workload

  1. On RHEL/Fedora: write a custom SELinux policy module for one service. Test enforcement.
  2. On Ubuntu/Debian: write an AppArmor profile for the same service. Test enforcement.
  3. Document the comparative effort and expressivity.

Week 17 - Software Bill of Materials (SBOM) - SBOM Pipeline

  1. Generate an SBOM for one of your images with Syft (SPDX) and Trivy (CycloneDX). Diff the two; note where they disagree.
  2. Attach the SBOM to the image with cosign attach sbom.
  3. From a downstream consumer, retrieve and parse the SBOM with cosign download sbom.
  4. Add a CI step that fails the build if the SBOM contains a known-bad license (e.g., AGPL in a closed-source project).

Week 18 - Vulnerability Scanning: Grype, Trivy, Clair - Triage in CI

  1. Run Trivy on an image; produce a SARIF report. Upload to GitHub Code Scanning (or your scanner of choice).
  2. Pick three findings; for each, write a one-paragraph triage decision: fix, accept, or VEX-suppress.
  3. Author the VEX statement using vexctl (OpenVEX). Attach to the image.
  4. Re-scan; verify the suppressed findings are now flagged as "not exploitable" rather than disappearing entirely.

Week 19 - Signing and Verification: Cosign, Sigstore - Signing Pipeline

  1. Sign an image with cosign keyless (GitHub OIDC). Verify.
  2. Attach SBOM and VEX as attestations.
  3. Configure policy-controller (Sigstore's Kubernetes admission controller) to require a valid signature from your CI's OIDC subject before allowing deploys.
  4. Try to deploy an unsigned image; observe the rejection.

Week 20 - SLSA, Provenance, and Reproducibility - SLSA L3 in CI

  1. Set up a GitHub Actions workflow that builds, scans, signs, and produces SLSA L3 provenance for an image on every release tag.
  2. Verify end-to-end: pull the image, retrieve its attestations, validate the provenance points back to the correct commit and CI run.
  3. Reproducibility: rebuild the same tag from a fresh runner; verify image digest stability.

Week 21 - Scaffolding: Project Setup, OCI Bundle Reading - Parse and Run

  1. Implement config.json parsing (the runtime-spec repo has a Go reference type definition).
  2. Implement a no-isolation mode: just chdir(rootfs), chroot(rootfs), execve. Verify it runs.
  3. Add command-line plumbing for the lifecycle subcommands.

Week 22 - Namespaces and Process Isolation - Namespaces Working

  1. Implement the parent/child fork-with-clone-flags. Verify lsns -p <pid> shows new namespaces.
  2. Implement pivot_root into the rootfs. Verify / inside the container is the bundle's rootfs/.
  3. Implement /proc mount inside the new PID namespace. Verify ps shows only the container's processes.
  4. Implement UID/GID mapping for user-namespaced runs.

Week 23 - Cgroups v2, Capabilities, Seccomp, OverlayFS - All The Layers

  1. Implement cgroup v2 setup. Verify memory.max=64M actually limits the container.
  2. Implement capability dropping. Verify getcap/capsh --print inside the container.
  3. Implement seccomp filter loading. Verify a denied syscall fails.
  4. (Optional) Implement OverlayFS rootfs construction from a multi-layer image.

Week 24 - Polish, Defense, Distribution - Defend the Project

Schedule a 45-minute mock review: - Live demo: build, run a container, exec into it, observe isolation. - Walk through the lifecycle code with the OCI spec open beside it. - Demo a hardened run (cgroups + caps + seccomp + LSM) and verify isolation. - Compare with runc/crun: what's missing? What's different? Why is your design simpler?

Go Mastery

Week 1 - The Toolchain and the Build Pipeline - Hello World, Audited

  1. Create hello-audited. Set go 1.22 and a toolchain go1.22.x directive.
  2. Build with go build -trimpath -ldflags="-s -w -X main.version=v0.1.0". Run go version -m ./hello-audited.
  3. Strip with strip and compare. Cross-compile to linux/arm64, darwin/arm64, windows/amd64 with GOOS=... GOARCH=... go build.
  4. Document the size delta from each flag in NOTES.md. -s -w typically saves ~30%; -trimpath is a reproducibility flag (no local paths in the binary), not a size flag.
  5. Inspect the binary with go tool nm and go tool objdump. Identify the runtime symbols (runtime.main, runtime.gcStart, runtime.schedule).

Week 2 - The GMP Scheduler Model - Schedule Forensics

Build a tiny program that: 1. Spawns 1,000 goroutines, each computing a busy CPU loop for 10ms. 2. Records the time-to-completion distribution. 3. Re-runs with GOMAXPROCS=1, =2, =N (your core count). 4. Re-runs with runtime.Gosched() inserted in the loop. 5. Re-runs with the loop replaced by time.Sleep(10*time.Millisecond) (the netpoller path).

Tabulate the latency distributions in NOTES.md. Explain why GOMAXPROCS=1 without Gosched() produces high tail latency. Then, capture an execution trace with runtime/trace:

f, _ := os.Create("trace.out") // create the trace output file (error handling elided)
trace.Start(f)
defer trace.Stop()
View with go tool trace. Identify the per-P timeline, GC pauses, and proc transitions.

Week 3 - Stack Management - Stack Growth in the Wild

  1. Write a recursive function func depth(n int) int { if n == 0 { return 0 }; var buf [256]byte; _ = buf; return 1 + depth(n-1) }.
  2. Run with progressively larger n. Use GODEBUG=gctrace=1,scheddetail=1 and observe stack growth events.
  3. Re-run under runtime.ReadMemStats snapshots, recording StackInuse and StackSys.
  4. Now write the same function in a goroutine-per-call style and observe how stack churn changes.

Week 4 - Escape Analysis and the Inliner - Escape Forensics

For each of the following snippets, predict whether the value escapes, then verify with -gcflags=-m: 1. func A() *int { x := 7; return &x } 2. func B() *int { x := 7; p := &x; return p } 3. func C() { x := 7; go func() { fmt.Println(x) }() } 4. func D() { x := bytes.Buffer{}; x.WriteString("hi"); fmt.Println(x.String()) } 5. func E(s []int) int { return len(s) } called as E(make([]int, 8)). 6. func F() any { return 7 } (boxing int into interface{}). 7. A method call on an interface value vs the concrete type (covered in Week 7).

For each that escapes, propose a refactor that keeps it on the stack. Then write a Criterion-style benchmark (testing.B) and prove the win.

Week 5 - Memory Layout, Padding, Alignment - Layout Forensics

  1. Define five "interestingly bad" structs (e.g., struct{ a bool; b int64; c bool; d int64; e bool }). Compute their unsafe.Sizeof by hand, then verify.
  2. Reorder for minimal padding. Re-measure. Document each delta.
  3. Build a benchmark with []Struct of 1M elements; compare allocation/scan time with the badly-padded vs the optimally-packed version. Use runtime.ReadMemStats to capture HeapAlloc and GC pause durations.
  4. Construct a false-sharing example: two atomic counters incremented by different goroutines, with and without CacheLinePad between them. Benchmark contention. Expect 5–20× difference.

Week 6 - The Garbage Collector - GC Forensics

  1. Write a service that allocates 100 MB/s of short-lived objects. Run with GODEBUG=gctrace=1. Read each GC line and identify: total heap, live heap, pause time, pacer target.
  2. Set GOMEMLIMIT=512MiB and GOGC=off. Re-run; observe how the GC is now driven entirely by the memory ceiling.
  3. Set GOGC=50 (no GOMEMLIMIT). Re-run; observe more frequent, smaller GCs.
  4. Capture a go tool pprof -alloc_objects profile. Identify the top five allocation sites. Refactor at least two using sync.Pool or pre-allocated buffers. Re-benchmark.
  5. Capture a go tool trace and locate the GC mark phases visually.

Week 7 - Interface Values, itabs, and Dispatch Cost - Interface Bench

  1. Build a tight loop calling a method via three paths: concrete type, interface, generic type parameter. Benchmark with -benchmem.
  2. Inspect the disassembly with go tool objdump -s 'main\.benchInterface'. Identify the indirect call.
  3. Refactor a real-world pattern (a Logger interface used 10× in a hot path) into a concrete type or a type-parameterized version. Measure the win or non-win.
  4. Build a worst-case allocation example: passing a stack int into fmt.Println(...). Show with -gcflags=-m that the int escapes (boxing into any). Replace with fmt.Println(strconv.Itoa(x)) and re-measure.

Week 8 - Allocation Profiling, sync.Pool, GC Tuning - Pool the Hot Path

  1. Take the JSON-handling hot path of any service. Run pprof -alloc_objects under load. Identify the top three allocation sites.
  2. Introduce a sync.Pool for the most appropriate one (typically bytes.Buffer or a decoder).
  3. Re-benchmark. The win should be visible in allocs/op and in p99 latency under load.
  4. Now intentionally misuse: Pool.Put without resetting state. Detect the bug under -race or via a deliberately-inserted assertion.

Week 10 - sync Primitives and sync/atomic - Lock-Free SPSC Ring

Build a single-producer, single-consumer ring buffer using only atomic.Uint64 indices. Pad the indices to separate cache lines. Validate with go test -race -count=1000 running 1 producer and 1 consumer. Benchmark against chan T and against a sync.Mutex-protected slice. Document the cache-line padding's effect with a withoutPad variant - expect a 3–10× difference on modern x86.

Week 11 - context.Context, Cancellation, errgroup, singleflight - Context Discipline

  1. Take a small HTTP service. Audit every blocking operation (DB query, downstream RPC, Redis call). Each should accept and propagate ctx. Fail any goroutine that captures a request ctx and outlives the request.
  2. Implement a parallel fan-out using errgroup with N=8 workers, all cancellable on first error.
  3. Implement a cache stampede test: 1000 concurrent requests for the same uncached key. Without singleflight, observe N upstream calls. With singleflight, observe 1.
  4. Demonstrate context.AfterFunc cleanup: register a release-resource callback on cancellation; verify it fires under both timeout and explicit cancel.

Week 12 - Worker Pools, Leak Detection, Deadlock Prevention - Worker Pool Survival Test

Build a worker pool that handles: 1. Backpressure - bounded input channel, drop-with-metric on overflow. 2. Graceful shutdown - on ctx.Done(), drain in-flight tasks within a deadline, then abandon the rest. 3. Per-task timeouts - WithTimeout(ctx, 100ms) per task. 4. Panic isolation - a panic in one task does not kill the worker; recover and report. 5. Leak-clean - goleak passes after cancel(); pool.Wait().

Stress-test with 1M tasks across 1000 workers under -race.

Week 9 - Channels, Deeply - Channel Internals

  1. Write a benchmark comparing: unbuffered chan, buffered chan(1), buffered chan(1024), sync.Mutex + slice queue, and a sync/atomic-only SPSC ring buffer. Use 1 producer, 1 consumer, 10M messages.
  2. Plot the throughput. The atomic SPSC should be 5–10× the channel; the mutex queue may beat the buffered channel for small messages.
  3. Reproduce a nil-channel select pattern: a goroutine that toggles between two upstream channels by setting one to nil to disable a case.
  4. Write an "unbounded channel" using a goroutine that bridges an in-channel to an out-channel via an internal slice buffer. Discuss why this exists and why it is dangerous (memory growth on slow consumer).

Week 13 - Reflection: reflect, Performance, and Discipline - A Reflective Validator

Build a struct validator that processes validate:"..." tags: - Must support: required, min=N, max=N, email, regexp=<re>. - Must cache per-type field metadata (one reflect.Type walk per type ever). - Must produce structured errors (path, rule, value). - Must beat a naive non-cached implementation by 10× in benchmarks.

Compare against go-playground/validator for both ergonomics and performance.

Week 14 - go/ast, go/parser, go/types: Static Analysis - Build a Custom Analyzer

Write an analyzer that flags: 1. context.Background() calls outside main and *_test.go files. 2. time.After inside a select body (the classic timer-leak pattern). 3. Goroutines launched with closures capturing a context.Context parameter named ctx of an enclosing HTTP handler (heuristic; document the false-positive risk).

Wire as a unitchecker binary. Run on a real codebase and triage findings. Document each false positive in ANALYZER_NOTES.md.

Week 15 - go generate and AST-Based Code Generation - Three Generators

Build three small generators: 1. Enum stringer - a from-scratch reimplementation of stringer for one annotation pattern. 2. Mock generator - for one interface, generate a struct with method recorders and call assertions. 3. JSON marshaler - generate a type-specific MarshalJSON that allocates zero maps. Compare allocations against encoding/json for the same type.

For each: go vet-clean, gofmt-formatted output, with a go generate directive in the consumer file.

Week 16 - Plugins: plugin, go-plugin, gRPC-Based Extensions - A Pluggable Storage Backend

Build a service whose storage backend is a plugin. The host defines an interface Storage { Get(key) (val, err); Put(key, val) error; Delete(key) error }. Ship two plugins: an in-memory backend, and a file-system backend. Both communicate via gRPC over go-plugin. Demonstrate hot-swap by killing one plugin process and starting the other.

Week 17 - DDD in Go: Hexagonal Architecture, Bounded Contexts - A Hexagonal URL Shortener

Build a workspace implementing a URL shortener: - internal/domain - the ShortURL aggregate, URLRepo and Hasher ports. - internal/application - Shorten and Resolve use cases. - internal/adapter/postgres - implements URLRepo against a real Postgres (use pgx, not database/sql). - internal/adapter/http - REST handlers using application. - internal/adapter/memory - in-memory URLRepo for tests. - cmd/api - wires everything.

The architectural test (a Go test) walks the import graph and fails if internal/domain imports any adapter package or stdlib networking package.

Week 18 - Observability: slog, pprof, trace, OpenTelemetry - Wire the URL Shortener

Take week 17's URL shortener and add: - slog JSON output with request-scoped logger via context. - /metrics Prometheus endpoint exposing request count, latency histogram, and Go runtime metrics. - OTLP traces exported to a local Jaeger via docker-compose. - /debug/pprof/* on a separate admin port, gated by IP allowlist. - A 30-second runtime/trace capture under load, committed as trace.out with a markdown analysis.

Week 19 - gRPC: Streaming, Interceptors, Deadlines, Retries, Outlier Ejection - A Hardened gRPC Service

Build a minimal Echo service with: - Unary + server-streaming + bidi methods. - Server interceptors for: panic recovery, request logging, OTel tracing, auth, rate limiting. - Client config with retries (UNAVAILABLE only), 2 s default deadline, round-robin load balancing. - A grpc.health.v1 health server. - A tools/grpc_load_test/ directory with ghz-based load tests; capture latency p50/p95/p99 under 10K QPS.

Week 20 - Testing Strategy: Five Surfaces, Race-Clean - Test-Pyramid the URL Shortener

  • Unit: 100% line coverage on internal/domain and internal/application using mocks for ports.
  • Integration: testcontainers-go Postgres for the postgres adapter.
  • Fuzz: fuzz the alias-generation function, persisting any crashing inputs.
  • Property: gopter test that "shorten then resolve returns original URL."
  • E2E: a make e2e target that spins the full stack via docker-compose, hits the HTTP API, asserts behavior.
  • All five surfaces run in CI under -race -count=1.

Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos) - Read Raft in Anger

  1. Read etcd-io/raft/node.go and raft.go end-to-end. Annotate the state machine transitions.
  2. Build a minimal in-memory KV store on top: a single goroutine consumes from node.Ready(), applies entries to a map[string]string, persists log entries to a WAL, sends messages to peers, and acknowledges.
  3. Run a 3-node cluster locally. Kill the leader; observe an election. Restart; observe log catchup.
  4. Add a snapshot mechanism every 10K entries.

Week 22 - Distributed Storage Patterns - Harden the KV Store

Take the week 21 Raft KV and add: 1. Pebble as the storage engine for both the WAL and the state machine. 2. Snapshots every N entries, with InstallSnapshot to recovering followers. 3. Linearizable reads via read-index. 4. Membership changes: add and remove nodes online. 5. Metrics: per-node Raft state, log lag, snapshot duration, apply latency.

Week 23 - Performance Tuning: Profile, Tune, Re-Profile - Profile-Tune-Profile

Take your capstone (whatever track) and: 1. Capture a CPU profile under representative load. Identify the top 5 functions. 2. Pick one and propose a fix. Estimate the win in advance. 3. Implement, re-profile, compare with benchstat. Document each change in PERF_LOG.md. 4. Capture a runtime/trace and identify any GC or scheduler stalls. Fix one. 5. Apply PGO. Confirm the win.

Week 24 - Capstone Integration, Defense, Final Hardening - Defend the Design

Schedule a 45-minute mock review with a senior peer (or record yourself). Present: - The architecture diagram. - One slide per non-obvious decision (e.g., "why etcd-io/raft over hashicorp/raft", "why Pebble over BoltDB", "why server-streaming over polling"). - A live demo of the test suite (-race, fuzzing, integration). - A live demo of the observability stack (Jaeger, Prometheus, pprof). - A live demo of fault tolerance (kill the leader, watch recovery).

The deliverable is the defense, not the slides. If you cannot answer "what is the worst-case write latency under leader change?" or "what is your goroutine count under 10× load?", you have not yet finished the curriculum.

Java Mastery

Week 1 - Modern Syntax and the Type System

Re-implement a small JSON-shaped expression evaluator: sealed interface Expr permits Lit, Add, Mul, Neg. Use record patterns + exhaustive switch. No instanceof chains, no visitors, no enum faking ADTs.

Week 2 - Build Tools, Dependencies, and JPMS

Take any small library. Modularize it: write module-info.java, run jdeps to find required modules, jlink --add-modules <list> --output runtime/ to build a custom runtime. Measure size before (du -sh $JAVA_HOME) vs after (du -sh runtime/). Then mvn install to a local repo and consume from a separate Gradle project via mavenLocal().

Week 3 - Collections, Streams, and java.time

Take a CSV file of timestamped events. Compute per-hour aggregates two ways: 1. Stream + Collectors.groupingBy(e -> e.timestamp().truncatedTo(HOURS), Collectors.counting()). 2. Explicit for loop + HashMap<Instant, Long>.

JMH them in Week 8. Also measure peak memory (-Xlog:gc*=info + young-gen allocation rate).

Week 4 - Testing, Logging, and the Definition-of-Done

Take Week 3's CSV aggregator (or any small evaluator). Write: - 10 unit tests (JUnit 5 + AssertJ). - 3 parameterized tests covering edge cases (empty input, single row, malformed timestamp). - 1 property-based test (@ForAll List<Event> events -> aggregator.process(events).size() <= events.size()). - Structured JSON logging at boundaries (input received, output produced).

Week 5 - Class Loading and Bytecode

Take a 10-line Java method. Compile it. Read the javap -v -p output line by line. Then build a class with the same bytecode using the Class-File API, load it via a custom ClassLoader, invoke via reflection, confirm output matches.

Week 6 - The JIT: C1, C2, Graal, Tiered Compilation

Write a polymorphic dispatch site (Shape.area() over 1 / 2 / 3 implementations). Run with -XX:+PrintInlining for each cardinality. Observe monomorphic → bimorphic → megamorphic transitions. Reproduce a deopt by introducing a 4th type after warmup.

Week 7 - Method Handles, VarHandles, and Reflection

Build a tiny DI container (~150 lines): scan a package for @Inject constructors, topologically sort by dependencies, instantiate with MethodHandle (lookup.unreflectConstructor(ctor).invokeExact(deps)). JMH vs Constructor.newInstance(deps). Expect MethodHandle ~5-10× faster after warmup. Stretch: use LambdaMetafactory to get within 1.5× of a direct call.

Week 8 - JMH and Microbenchmarking

JMH the Week 3 Stream vs for-loop comparison. Skeleton:

@State(Scope.Benchmark)
public static class Inputs {
    @Param({"100", "10000", "1000000"}) int n;
    List<Event> events;
    @Setup public void setup() { events = generate(n); }
}
@Benchmark public Map<Hour, Long> streamWay(Inputs in) { /* ... */ }
@Benchmark public Map<Hour, Long> forWay(Inputs in)    { /* ... */ }
Fork 3, warmup 5×1s, measurement 10×1s, -prof gc. There IS a crossover - find and explain it.

Week 10 - The Generational Hypothesis and the Modern GCs

Take a sample allocation-heavy app (a small JSON parser benchmark works). Run it under all four major GCs with identical heap. Collect logs with -Xlog:gc*=info:file=gc.log:time,uptime. Plot pause times.

Week 11 - Heap Sizing, GOMEMLIMIT-Equivalents, and Container Awareness

Take your week-10 app. Run it in a 1GB container with default flags, observe the headroom. Then explicitly size every memory pool and run again. Measure RSS over time.

Week 12 - JFR, Heap Dumps, and Allocation Profiling

Write a deliberate memory leak (a static Map that accumulates request contexts). Run it, take a heap dump after some traffic, identify the leak in MAT. Then fix it, re-run, re-dump, confirm.

Week 9 - Object Layout, Headers, and Cache Effects

Use JOL to measure the size of: an empty object, a String, a HashMap with 0/1/10/100 entries, an ArrayList vs LinkedList of 1000 Integers. Predict each before measuring.

Week 13 - The Java Memory Model and java.util.concurrent Foundations

Implement a single-producer single-consumer ring buffer two ways: with synchronized+wait/notify, and with VarHandle + acquire/release. JMH them. Run with -XX:+PrintAssembly (HotSpot debug build, or use the hsdis plugin on a normal build) and find the membar instructions.

Week 14 - Executors, CompletableFuture, and the Pre-Loom World

Implement the same web-scraper-with-fan-out three ways: blocking ExecutorService, CompletableFuture chain, Reactor Flux.flatMap. Compare lines of code, readability, and error handling. Keep them; you will re-do the same task with virtual threads next week.

Week 15 - Virtual Threads, Structured Concurrency, and Scoped Values

Redo week 14's web scraper with Executors.newVirtualThreadPerTaskExecutor() + StructuredTaskScope. Compare lines of code and readability to the three previous versions. Stress to 100k concurrent requests. Watch with JFR's virtual-thread events.

Week 16 - Lock-Free Patterns, VarHandle Memory Modes, and jcstress

Implement a Treiber stack with VarHandle.compareAndSet (single VarHandle on the head pointer). Write three jcstress tests: 1. Linearizability - concurrent push + pop produces an ordering consistent with some serial schedule. 2. No lost pops - every pushed element is popped exactly once. 3. ABA exposure - under contention, a pop-then-push cycle can corrupt a CAS; document the scenario even if you don't fix it (the standard fix is hazard pointers or versioned pointers).

Run under all available -m modes (default, sequential consistency, relaxed).

Week 17 - Spring Boot 3 and Quarkus

Build the same small REST service (book CRUD, JSON over HTTP, Postgres backend) in both Spring Boot 3 and Quarkus. Measure: - Build time (time ./mvnw package) - Image size (du -sh target/quarkus-app vs the Spring jar) - Cold start (time-to-first-response after launch) - Warm p99 latency under load (k6 or wrk, 100 RPS for 60s) - RSS at steady state (ps -o rss -p $(pgrep -f yourapp))

Document the trade in a one-pager.

Week 18 - Observability: Logs, Metrics, Traces

Wire all three pillars into your Week 17 service: - OTel agent: download opentelemetry-javaagent.jar, set OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317. - Micrometer Prometheus: add micrometer-registry-prometheus, expose /actuator/prometheus. - Logback JSON: replace the default text layout with LogstashEncoder + %X{trace_id} in the pattern.

Bring up a local stack via docker-compose: Prometheus + Tempo (traces) + Loki (logs) + Grafana. Generate load with k6 run script.js or wrk -t4 -c100 -d60s. In Grafana, find a slow request: open its trace, click through to the matching log lines via trace ID.

Week 19 - Persistence, RPC, and Resilience

Add to your week-17 service: a Postgres backend via Spring Data JDBC + Flyway, a downstream gRPC dependency (a fake "pricing service") with a Resilience4j circuit breaker, and 95th-percentile latency-bound retries. Chaos-test by killing the downstream and watching the breaker behavior in metrics.

Week 20 - Containers, Native Images, and Deployment

Produce four images of your service: plain JRE Dockerfile; jlink-trimmed; Buildpacks; native-image. Tabulate size, cold start, warm p99, RSS. Pick a winner for an explicit deployment profile (e.g., "always-on internal API" vs "scale-to-zero per-request webhook").

Kubernetes

Week 1 - etcd and the Raft Consensus Foundation - etcd, Up Close

  1. Bring up a 3-node etcd cluster locally (etcd binaries, no Kubernetes yet). Configure peer/client URLs.
  2. Use etcdctl to put/get keys; observe consistent reads.
  3. Kill the leader. Use etcdctl endpoint status --cluster to identify the new leader within seconds.
  4. Use etcdctl watch /foo from one terminal; put values from another. Internalize the watch model.
  5. Use etcdctl --command-timeout=60s defrag to compact + defragment. Observe disk-usage drop.

Week 2 - The kube-apiserver - Read the Pipeline

  1. Use kubectl --v=8 to dump the wire-level request/response of a kubectl apply. Read it carefully.
  2. Use kubectl get --raw to hit /apis/, /api/v1, /apis/apps/v1 and see the discovery surface.
  3. Configure the apiserver to log all requests with --audit-policy-file=audit.yaml. Apply a few changes; read the audit log.
  4. Write a tiny mutating webhook (in Go, using controller-runtime's webhook facilities) that adds a label to every Pod. Deploy and verify.

Week 3 - The Scheduler - Scheduler in Action

  1. Use kubectl describe on a pending Pod to see filter/score reasons.
  2. Set a Node taint (kubectl taint nodes node1 key=value:NoSchedule); observe new Pods avoid it.
  3. Define PriorityClasses (high, default, batch); deploy mixed-priority Pods; trigger preemption by oversaturating.
  4. Write a custom scheduler plugin (a tiny score plugin) using the scheduler framework. Configure your scheduler binary; run it. Verify selection difference vs default.

Week 4 - Built-in Controllers and client-go Foundations - Read the Deployment Controller

  1. Read pkg/controller/deployment/deployment_controller.go end-to-end (~1500 lines).
  2. Trace a kubectl rollout through the source: which conditions are checked, which fields updated, what triggers the next loop iteration.
  3. Reproduce a stuck-rollout scenario (deploy a bad image); observe Progressing=False after the deadline; inspect status conditions.
  4. Manually scale a Deployment to 0 with kubectl scale; trace what the controller does in response.

Week 5 - Kubelet Internals - Kubelet Forensics

  1. SSH to a node. journalctl -u kubelet -f and trigger a Pod creation. Watch the log.
  2. crictl ps, crictl pods, crictl inspect - operate at the CRI layer directly.
  3. Place a static pod manifest; observe kubelet picking it up.
  4. Trigger a memory eviction by setting low evictionHard and oversubscribing. Read the eviction event and the kubelet's decision.

Week 6 - CRI: kubelet ↔ Runtime - CRI Direct

  1. From a node, crictl pull alpine; crictl runp pod-config.json; crictl create <pod-id> ctr-config.json img-config.json; crictl start <ctr-id>. You've launched a pod-equivalent without the apiserver.
  2. Compare with kubectl deploying the same: trace each CRI call in the kubelet log.
  3. Configure containerd with multiple runtimes (runc + runsc); register both as RuntimeClasses; deploy Pods against each.

Week 7 - kube-proxy, Services, and the Networking Dataplane - Service Path

  1. Create a Service + Deployment. From a Pod, curl <service>.<ns>.svc.cluster.local. Trace the DNS lookup (CoreDNS) and the iptables/IPVS rules that DNAT.
  2. Switch kube-proxy to IPVS mode (mode: ipvs in kube-proxy config). Verify with ipvsadm -L -n.
  3. Install Cilium with kubeProxyReplacement=true. Observe kube-proxy not running. Verify Service connectivity still works.
  4. Compare per-packet latency under each mode with a small benchmark.

Week 8 - CSI, Storage, and Device Plugins - Storage Hands-On

  1. Install a local-path CSI driver (rancher/local-path-provisioner works for kind). Create a PVC; observe binding.
  2. Take a snapshot; restore to a new PVC.
  3. Author a mock device plugin that exposes 4 instances of a fake resource. Deploy a Pod requesting it; verify scheduling and resource accounting.
  4. Read the CSI proto. Diagram the provision + attach + mount flow on paper.

Week 10 - controller-runtime and Kubebuilder - Rebuild Week 9 in controller-runtime

Take week 9's mirror controller; rebuild with kubebuilder + controller-runtime. Compare LOC and verbosity. The framework should save substantial code.

Week 11 - CRDs: Schema, Versioning, Validation - A Well-Versioned CRD

  1. Define a CRD with v1alpha1.
  2. Add validation, defaults, status conditions, printer columns.
  3. Add a v1beta1 with renamed fields and a conversion webhook between them.
  4. Verify round-trip: kubectl get -o v1alpha1 then -o v1beta1 returns identical content.

Week 12 - Operator Patterns: Finalizers, External Resources, Multi-Cluster - An Operator That Manages an External Resource

Build an operator with a GitHubRepo CRD: spec includes a repo name and visibility; the controller calls the GitHub API to create/update/delete the repo to match. Includes: - Authentication via a Secret referenced by the CR. - Finalizers for cleanup. - Status conditions: Ready, Synced, Error with reasons. - Rate-limited reconciles with exponential backoff. - E2E test using a fake GitHub API server.

Week 9 - client-go Internals and a Bare Controller - Controller From Scratch

Build a controller that watches ConfigMaps with the label mirror=true and copies them into every namespace whose name matches a configurable prefix. - Use client-go informers + workqueue directly. - Add leader election. - Idempotent: same input twice produces same result. - Handle deletions: when the source is deleted, delete all mirrors. - Run as a Deployment in the cluster.

Week 13 - The CNI Spec and Pod Networking - Read a CNI's Source

  1. Pick a simple CNI (flannel or the reference bridge plugin from containernetworking/plugins). Read its cmdAdd end to end.
  2. Deploy a small kind cluster; trace a Pod creation in the kubelet log; correlate with the CNI binary invocation.
  3. Use nsenter -t <pause-pid> -n ip a to inspect the container's network namespace from the host.

Week 14 - Cilium and eBPF Networking - Install and Drive Cilium

  1. Install via Helm with: kubeProxyReplacement=true, hubble.enabled=true, hubble.relay.enabled=true, hubble.ui.enabled=true, encryption.enabled=true, encryption.type=wireguard.
  2. Use the Hubble UI (cilium hubble ui) to visualize pod-to-pod traffic in real time.
  3. Author L4 NetworkPolicy (standard k8s API); test enforcement with a denied + allowed flow.
  4. Author an L7 CiliumNetworkPolicy (e.g., allow only HTTP GET /api/* from frontend → backend); test enforcement.
  5. Enable Cilium Service Mesh; observe sidecar-free mTLS between two test services.

Week 15 - Service Meshes: Istio, Linkerd, Cilium Service Mesh - Three Meshes

  1. Install Istio in ambient mode on a test cluster. Apply a VirtualService that does 90/10 canary routing. Verify with Hubble or Kiali.
  2. Repeat with Linkerd. Compare install footprint, configuration ergonomics, and observability quality.
  3. (If running Cilium) enable Cilium Service Mesh. Compare again.
  4. Document tradeoffs: install effort, per-Pod overhead, feature gaps.

Week 16 - CSI at Scale: Snapshots, Backup, Cloning - Backup and Restore

  1. Install Velero against a MinIO bucket.
  2. Schedule a daily backup of one namespace.
  3. Delete the namespace; restore from backup; verify Pods come back, PVs reattach, data intact.
  4. Create a stateful workload (Postgres via an operator); test snapshot + clone flow for fast dev/test environment provisioning.

Week 17 - GitOps: ArgoCD and Flux - Two GitOps Stacks

  1. Install ArgoCD. Set up an Application for a small app from a git repo. Verify auto-sync and auto-prune.
  2. Install Flux. Set up the equivalent. Compare ergonomics.
  3. Use ApplicationSet (Argo) to deploy the same app to three environment overlays (dev, staging, prod). Verify per-environment configuration via Kustomize overlays.

Week 18 - IaC From Within K8s: Crossplane and Terraform - Self-Service Database

  1. Install Crossplane. Install provider-aws (or provider-gcp).
  2. Configure provider credentials.
  3. Define an XRD XDatabase with parameters: size, engine, version, region.
  4. Define a Composition that materializes an RDS instance + a Secret with credentials.
  5. As an "app team" persona, create a Database claim. Watch it become a real RDS instance. Delete; watch it be torn down.

Week 19 - HPA, VPA, KEDA: Autoscaling - Autoscale on Custom Metrics

  1. Deploy a load-test target with a Prometheus-exposed requests_per_second metric.
  2. Install prometheus-adapter mapping that metric to custom.metrics.k8s.io.
  3. Author HPA targeting AverageValue=200 of that metric. Drive load; watch scaling.
  4. Add KEDA in front for scale-to-zero behavior. Verify cold-start latency.

Week 20 - Admission Control: Webhooks, OPA Gatekeeper, Kyverno - Three Policy Layers

  1. Apply Pod Security Admission per-namespace: restricted everywhere except a priv namespace.
  2. Author 5 Gatekeeper Constraints: require resource limits, forbid latest tags, enforce non-root, label-required, namespace-must-have-team-label.
  3. Author equivalents in Kyverno. Compare expressiveness.
  4. Run in audit-mode for a week against a pre-existing cluster; triage findings before enforcing.

Week 21 - Bootstrap: VMs, Certificates, etcd - Bring Up etcd

  1. Provision 3 VMs labeled etcd-{1,2,3}.
  2. Generate CA + per-node certs.
  3. Install etcd binaries; configure systemd units with mTLS.
  4. Bring up; verify etcdctl member list shows healthy quorum.
  5. Take a snapshot. Restore on a separate test machine.

Week 22 - Control Plane and Worker Nodes - Cluster Live

  1. Bring up 3 control-plane nodes; HAProxy in front.
  2. Bring up 3 workers; join via bootstrap tokens.
  3. Install Cilium; verify Pod-to-Pod connectivity.
  4. Install CoreDNS; verify Service DNS works.
  5. Smoke test: deploy a sample app + Service + Ingress; verify end-to-end.

Week 23 - RBAC, Multi-Tenancy, mTLS Everywhere - Onboard a Tenant

  1. Author a tenant Composition (Crossplane) or Helm chart that, given {tenant: "acme"}, materializes everything in §23.2.
  2. Onboard acme. Have a "tenant developer" persona deploy an app via GitOps.
  3. Verify isolation: from acme's namespace, can you read another tenant's secrets? Pods? Logs? Each should fail.

Week 24 - Defense, Documentation, and the Capstone Demo - Defend the Cluster

Schedule a 60-minute mock review. Demo: 1. The architecture diagram. 2. Provisioning (Ansible/Terraform/Crossplane). 3. Tenant onboarding from request to running app. 4. Failure injection: kill a control-plane node; show cluster recovery. 5. Observability: trace a request from ingress through service mesh to backend, with metrics, logs, and trace ID correlation. 6. Backup + restore.

Linux Kernel

Week 1 - Boot, Init, Systemd - A Hardened Echo Service

  1. Write a tiny C program that listens on a Unix socket and echoes input. Static-link with -static.
  2. Write a echo.socket and echo.service pair using socket activation.
  3. Apply every hardening directive that is plausible for an echo server. Run systemd-analyze security echo.service and aim for a score under 1.0.
  4. Verify isolation: from inside the service (debug via systemd-run --shell --unit=echo.service), confirm ProtectSystem makes /usr read-only.

Week 2 - Syscalls and the Kernel/Userspace Boundary - Syscall Forensics

  1. strace -c ls /etc - produce a count summary of syscalls. Predict the top 5; verify.
  2. Implement cat in pure C using only open, read, write, close. No libc helpers (syscall(SYS_open, ...)).
  3. Run under strace -f to verify zero unexpected calls.
  4. Build a minimal seccomp allowlist for your cat, allowing only the syscalls actually used. Verify it kills attempts to invoke other syscalls.

Week 3 - The Virtual File System (VFS) - VFS Forensics

  1. Catalogue every entry in /proc/<pid>/ for one of your processes. Document what each gives.
  2. Read /proc/<pid>/maps and explain every region (text, heap, stack, vdso, vvar, shared libs).
  3. Use eBPF's vfs_open kprobe (via bpftrace) to log every open system-wide for 5 seconds. Triage the noise.
  4. Mount tmpfs at a custom path, fill it, and observe the allocator behavior in /proc/meminfo (Shmem).

Week 4 - Processes, Threads, and Signals - Process Forensics

  1. Write a C program that forks 4 children, each computing for 5 s. Use ptrace or strace -f to observe all four.
  2. Add a signal handler that catches SIGTERM and logs cleanly to all children before exit.
  3. Reproduce a classic bug: a parent that ignores SIGCHLD and a child that exits, producing zombies. Verify with ps -ef | grep defunct.
  4. Convert to signalfd + epoll, the modern signal-handling pattern that integrates with event loops.

Week 5 - Virtual Memory, Paging, and the Page Cache - Memory Forensics

  1. Run vmstat 1 and free -h while loading a 4-GB file with cat file > /dev/null. Watch Cached grow.
  2. echo 3 > /proc/sys/vm/drop_caches and observe the eviction.
  3. mmap a large file MAP_PRIVATE, write to it, observe AnonHugePages and the COW behavior in /proc/<pid>/smaps.
  4. Configure vm.nr_hugepages=512 (1 GiB of 2 MiB pages). Allocate via MAP_HUGETLB. Measure the latency-distribution change vs default pages.

Week 6 - Swapping, OOM, Memory Pressure (PSI) - Pressure and the OOM Killer

  1. Write a memory-eater program. Run inside a memory.high=512M cgroup. Observe pressure.memory rise.
  2. Push past memory.max; watch the OOM killer. Check dmesg and journalctl -k | grep -i 'killed process'.
  3. Set oom_score_adj=-500 on a critical process; verify it survives an OOM event triggered by another, lower-priority hog.
  4. Measure PSI under realistic load: capture pressure.memory every second for 5 minutes during a workload spike. Plot.

Week 7 - The CPU Scheduler (CFS, EEVDF) - Scheduler Forensics

  1. Run two CPU hogs at nice 0. Observe split CPU. Lower one to nice 19, verify ~95/5 split.
  2. Use bpftrace -e 'tracepoint:sched:sched_switch { @[comm] = count() }' to see context-switch rates.
  3. Pin a workload to specific CPUs with taskset -c 0,1. Compare cache-miss rate vs unpinned with perf stat.
  4. Place two services in cgroups with cpu.weight=100 and cpu.weight=1000. Verify the 10:1 split under contention.

Week 8 - Disk I/O Scheduling, Filesystems Beyond ext4 - I/O Forensics

  1. Run fio with a representative workload. Measure baseline.
  2. Toggle the I/O scheduler. Re-run. Compare.
  3. Use bpftrace -e 'tracepoint:block:block_rq_issue { @[args->comm] = count() }' to see who's hitting the disk.
  4. Mount with vs without noatime and measure metadata-write traffic difference.

Week 10 - Control Groups v2 - Multi-Tenant Cgroups

  1. Create three sibling cgroups: tenant-a, tenant-b, tenant-c under /sys/fs/cgroup/test/.
  2. Set cpu.weight 100/200/400 - under contention (run stress-ng --cpu N in each), verify the 1:2:4 split with top.
  3. Set memory.high=1G memory.max=2G on each, run a memory hog (stress-ng --vm 1 --vm-bytes 3G), observe throttling first (memory.events.high ticks, latency increases) then OOM (memory.events.oom_kill ticks).
  4. Set io.max to limit disk bandwidth on a specific device for one cgroup; run fio inside, verify with iostat -x 1.

Week 11 - eBPF: Foundations - First eBPF Tools

  1. Install bpftrace. Run bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)) }' and watch the system-wide open trace. Triage.
  2. Write a bpftrace script that histograms read() syscall sizes by process.
  3. Convert one of the recipes to libbpf C + a userspace consumer using libbpf-bootstrap as the template.
  4. Read 10 of Brendan Gregg's bpftrace recipes (runqlat.bt, tcpaccept.bt, vfsstat.bt, etc.) and run them. Document each.

Week 12 - eBPF in Production: Observability Tools - Build a Production-Grade eBPF Tool

Write connsnoop: - Hooks tcp_v4_connect and tcp_v6_connect (kprobe), inet_csk_accept (kretprobe), tcp_close. - Records per-connection: 5-tuple, PID, process name, duration, bytes-tx/rx. - Aggregates in-kernel via per-CPU hash maps, ships completion events through a ring buffer. - Userspace consumer in C (with libbpf) or Go (with cilium/ebpf). Outputs JSON. - Verifier-clean, CO-RE-portable across kernels 5.10+.

Week 9 - Namespaces - Hand-Built Container

Write a C program that: 1. clone()s with CLONE_NEWUSER | CLONE_NEWPID | CLONE_NEWNS | CLONE_NEWUTS | CLONE_NEWIPC | CLONE_NEWNET | CLONE_NEWCGROUP. 2. Configures UID/GID mappings via /proc/<pid>/uid_map and gid_map. 3. Creates a veth pair to give the namespace network access. 4. pivot_roots into a minimal Alpine rootfs. 5. execves /bin/sh.

You should now have a working terminal "inside" a "container" that you wrote in ~150 lines of C.

Week 13 - The Network Stack: Sockets, NAPI, conntrack - Packet Forensics

  1. Run `tcpdump -i any -nn -X 'tcp port 443' -c 10` to capture and dissect TLS handshake bytes.
  2. Trace a TCP connection's lifecycle with bpftrace's tcplife.bt.
  3. Set up a deliberate DROP rule with iptables -I INPUT -p icmp -j DROP and verify with ping. Remove. Repeat with nft.
  4. Inspect conntrack: cat /proc/net/nf_conntrack while a long-lived connection is open.

Week 14 - Netfilter / nftables / iptables, IPVS - Build a Stateful Firewall and a Load Balancer

  1. Convert an existing iptables ruleset to nftables. Verify equivalence with packet probes.
  2. Set up IPVS-DR: VIP with two real servers; load test with wrk. Compare with HAProxy on the same setup.
  3. Saturate the conntrack table on purpose; observe nf_conntrack: table full, dropping packet in dmesg. Tune nf_conntrack_max.

Week 15 - XDP and AF_XDP - An XDP DDoS Scrubber

Write an XDP program that: - Drops UDP packets with source port < 1024 (a coarse DDoS-vector heuristic). - Counts dropped packets per source IP in an LRU-hash map (1M entries). - Userspace tool reads the map every second and emits Prometheus metrics. - Test with pktgen or trafgen. Measure throughput and CPU overhead.

Week 16 - Bridges, VLANs, OVS - Three Network Topologies

  1. Two namespaces connected via a Linux bridge: classic container networking.
  2. Two namespaces on tagged VLANs sharing one bridge.
  3. The same topology in OVS, with explicit OpenFlow rules.

For each, verify connectivity with ping, capture with tcpdump, document the difference.

Week 17 - Discretionary and Mandatory Access Control - MAC for an Echo Service

Take week 1's echo service. Author: 1. An SELinux type-enforcement module that allows it to bind its socket and read its config but nothing else. 2. An equivalent AppArmor profile. Verify with deliberate violations (try to read /etc/shadow); both should deny and audit.

Week 18 - Capabilities, Seccomp, no_new_privs - Capabilities and Seccomp

  1. Convert a service that runs as root to one that runs as a non-root user with only the minimum capabilities.
  2. Author a seccomp policy using libseccomp that allows only the syscalls the service uses. Verify by attempting denied syscalls.
  3. Apply via systemd SystemCallFilter= and confirm.

Week 19 - Encryption at Rest: LUKS, dm-crypt, dm-verity - Encrypt a Disk End to End

  1. Create a LUKS2 volume on a spare disk or loopback file.
  2. Format with ext4. Mount.
  3. Add a TPM2-bound key slot. Enroll a recovery passphrase.
  4. Configure auto-unlock at boot via crypttab.
  5. Simulate disk theft: dump the device contents; verify they are opaque without the key.

Week 20 - Audit, Integrity Measurement, and Compliance - An Audited Host

  1. Configure auditd with a baseline ruleset.
  2. Trigger expected events (failed su, edit of /etc/passwd); verify logs.
  3. Run `lynis audit system`; record the score and address the top 5 findings.
  4. (Optional) Boot with IMA enabled; measure the impact on boot time and observe /sys/kernel/security/ima/ascii_runtime_measurements.

Week 21 - Loadable Kernel Modules (LKM) - A Character Device LKM

Write pkv, a simple in-kernel key/value character device: - /dev/pkv accepts writes of the form key=value\n and reads return the value for the last-written key. - 100 KV slots, in-kernel hash table. - Concurrency-correct under multiple writers/readers (use a mutex; pursue an rwlock variant as a stretch). - ioctl operations for LIST and DELETE. - KUnit tests in tree. - Loads/unloads cleanly with no lockdep or KASAN warnings (turn both on in your test kernel).

Week 22 - Tracing and Performance Mastery: ftrace, perf, BPF - End-to-End Profiling

  1. Take a service running on a host. Capture: perf record -F 99 -ag -- sleep 30.
  2. Generate a flamegraph.
  3. Identify the top three CPU consumers; for each, propose a hypothesis and a fix.
  4. Compare with the same workload profiled by parca or pyroscope if available.

Week 23 - Performance Tuning at Scale - Triage Drill

A scripted "broken host" is provided (or build one): a VM with one of {disk-bound, memory-bound, network-bound, lock-contended, scheduler-thrashing} pathologies. Diagnose using only the tools above. Document the inference chain. Then introduce a fix and verify.

Week 24 - Capstone Integration & Defense - Defend the Host

Schedule a 45-minute mock review with a peer. Walk through: the host's threat model, the capstone artifact, the observability story, and a live demo of triaging a fault. Defend every choice: cgroup policy, LSM type, sysctl values, auditd rules.

Python Mastery

Week 1 - Syntax, Values, Names, and the Data Model - The REPL Audit

  1. In an interactive session, evaluate: a = [1,2,3]; b = a; b.append(4); print(a). Explain in writing.
  2. x = 256; y = 256; x is y → True. x = 257; y = 257; x is y → may be False. Explain.
  3. Write a class Money with __init__, __repr__, __eq__, __hash__, __lt__. Verify it sorts and deduplicates in a set. Add functools.total_ordering; observe what disappears.
  4. Write a class Vector2 with __add__, __sub__, __mul__ (scalar), __rmul__, __abs__, __iter__, __len__. Verify 2 * v works and list(v) works.
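
A minimal sketch of item 4's Vector2 (the Money class from item 3 follows the same shape with __eq__/__hash__/__lt__); the method bodies are one reasonable choice, not the only one:

```python
from math import hypot

class Vector2:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y

    def __repr__(self):
        return f"Vector2({self.x!r}, {self.y!r})"

    def __add__(self, other):
        return Vector2(self.x + other.x, self.y + other.y)

    def __sub__(self, other):
        return Vector2(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar: float):
        return Vector2(self.x * scalar, self.y * scalar)

    __rmul__ = __mul__              # 2 * v falls back to v.__rmul__(2)

    def __abs__(self):
        return hypot(self.x, self.y)

    def __iter__(self):             # makes list(v) and tuple unpacking work
        yield self.x
        yield self.y

    def __len__(self):
        return 2

assert list(2 * Vector2(1, 2)) == [2, 4]
assert abs(Vector2(3, 4)) == 5.0
```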

Week 2 - Control Flow, Functions, Errors, and the Call Model - The Calculator and the Cancel

  1. Build a tiny expression evaluator over + - * / using ast.parse + a custom NodeVisitor. Reject anything else. (Do not use eval.)
  2. Add a --repl mode. Make Ctrl-C interrupt the current expression but not exit. Make Ctrl-D exit cleanly.
  3. Wrap division; raise a custom EvalError chained from ZeroDivisionError via from.
  4. Add a --time-budget flag using signal.SIGALRM (POSIX) or a watchdog thread (cross-platform). Document the trade-off.
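
A sketch of item 1 under the stated constraints (ast.parse in eval mode, a NodeVisitor subclass, everything unexpected rejected); the REPL, signal handling, and time budget are left to the lab:

```python
import ast
import operator

class EvalError(Exception):
    pass

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

class Evaluator(ast.NodeVisitor):
    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_BinOp(self, node):
        if type(node.op) not in _OPS:
            raise EvalError(f"operator not allowed: {type(node.op).__name__}")
        left, right = self.visit(node.left), self.visit(node.right)
        try:
            return _OPS[type(node.op)](left, right)
        except ZeroDivisionError as exc:
            raise EvalError("division by zero") from exc   # item 3: chained via `from`

    def visit_UnaryOp(self, node):
        if isinstance(node.op, ast.USub):
            return -self.visit(node.operand)
        raise EvalError("unary operator not allowed")

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)):
            return node.value
        raise EvalError("only numeric literals allowed")

    def generic_visit(self, node):                          # reject names, calls, everything else
        raise EvalError(f"node not allowed: {type(node).__name__}")

def evaluate(expr: str) -> float:
    return Evaluator().visit(ast.parse(expr, mode="eval"))

assert evaluate("1 + 2 * 3") == 7
```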

Week 3 - Collections, Comprehensions, Iterators, and Generators - Streaming Word Count

  1. Implement wc -w over arbitrarily large files using a generator pipeline: file → lines → words → counts. Constant memory regardless of file size.
  2. Add a --top K flag using heapq.nlargest. Note that you must materialize the counter - discuss why.
  3. Replace your hand-rolled tokenizer with re.finditer and benchmark. Then benchmark a str.split() version. Explain the difference.
  4. Add a --parallel N flag using concurrent.futures.ProcessPoolExecutor and itertools.batched. (We will revisit in Month 4.)
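
One possible shape for items 1-3 (shown with the re.finditer tokenizer); the --parallel variant is left to the lab:

```python
import re
from collections import Counter
from heapq import nlargest
from typing import Iterable, Iterator

WORD = re.compile(r"\w+")

def words(lines: Iterable[str]) -> Iterator[str]:
    for line in lines:                           # pulls one line at a time: constant memory
        for match in WORD.finditer(line):
            yield match.group(0)

def word_count(path: str) -> int:
    with open(path, encoding="utf-8", errors="replace") as f:
        return sum(1 for _ in words(f))

def top_k(path: str, k: int) -> list[tuple[str, int]]:
    with open(path, encoding="utf-8", errors="replace") as f:
        counts = Counter(words(f))               # --top K has to materialize the counts
    return nlargest(k, counts.items(), key=lambda kv: kv[1])
```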

Week 4 - Modules, Packaging, Virtual Environments, and the Import System - Ship a CLI

  1. Build a CLI tool - e.g., a Markdown table of contents generator. Project layout: src/toctool/{__init__.py,__main__.py,cli.py,core.py}, tests/, pyproject.toml.
  2. Configure [project.scripts] toctool = "toctool.cli:main". Verify pipx install . makes toctool available system-wide.
  3. Add a [project.optional-dependencies] dev = [...] group. uv sync --extra dev installs the dev tools.
  4. Tag v0.1.0. Build wheel + sdist with uv build. Inspect the wheel with unzip -l. Confirm no test files leaked in.
  5. (Optional, sets up later weeks) Publish to TestPyPI.

Week 5 - Object Model Deep Dive: Classes, Descriptors, Metaclasses - Build a Tiny ORM

  1. Implement a Field descriptor with type validation and a default. class User: name = Field(str); age = Field(int, default=0).
  2. Use __init_subclass__ to collect declared fields into cls._fields. Auto-generate __init__ and __repr__.
  3. Compare your hand-rolled version to @dataclass(slots=True). Note where dataclass is better (generated ordering via order=True, __eq__, __hash__).
  4. Implement a RegistryMeta metaclass that records every subclass in a class-level dict. Then re-implement using __init_subclass__. Defend the simpler version in writing.
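
A sketch of items 1-2; the Model base class is illustrative, and inheritance of fields across subclasses is deliberately ignored:

```python
class Field:
    def __init__(self, typ, default=None):
        self.typ, self.default = typ, default

    def __set_name__(self, owner, name):          # descriptor learns its attribute name
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        if not isinstance(value, self.typ):
            raise TypeError(f"{self.name} expects {self.typ.__name__}, "
                            f"got {type(value).__name__}")
        obj.__dict__[self.name] = value

class Model:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # own fields only; merging in inherited fields is left as an exercise
        cls._fields = {k: v for k, v in vars(cls).items() if isinstance(v, Field)}

    def __init__(self, **kwargs):
        for name, field in self._fields.items():
            setattr(self, name, kwargs.get(name, field.default))

    def __repr__(self):
        args = ", ".join(f"{n}={getattr(self, n)!r}" for n in self._fields)
        return f"{type(self).__name__}({args})"

class User(Model):
    name = Field(str)
    age = Field(int, default=0)

assert repr(User(name="ada")) == "User(name='ada', age=0)"
```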

Week 6 - Decorators, functools, and contextlib - The Retry Decorator That Doesn't Lie About Its Type

  1. Write @retry(times=3, on=(IOError,), backoff=0.1). Make it work on both sync and async functions (detect with asyncio.iscoroutinefunction).
  2. Use ParamSpec so that pyright --strict preserves the wrapped signature.
  3. Add structured logging on each retry. Add a tenacity-style backoff strategy (constant, exponential, jittered).
  4. Compare to tenacity library; document where yours is simpler / worse / better.
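
One plausible shape for the decorator, assuming Python 3.10+ for ParamSpec; structured logging and the pluggable backoff strategies from item 3 are left out:

```python
import asyncio
import functools
import time
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")

def retry(times: int = 3, on: tuple[type[BaseException], ...] = (IOError,), backoff: float = 0.1):
    """One decorator for sync and async callables; ParamSpec keeps the signature for pyright."""
    if times < 1:
        raise ValueError("times must be >= 1")

    def decorate(fn: Callable[P, R]) -> Callable[P, R]:
        if asyncio.iscoroutinefunction(fn):
            @functools.wraps(fn)
            async def async_wrapper(*args: P.args, **kwargs: P.kwargs):
                for attempt in range(times):
                    try:
                        return await fn(*args, **kwargs)
                    except on:
                        if attempt == times - 1:
                            raise
                        await asyncio.sleep(backoff * 2 ** attempt)   # exponential backoff
            return async_wrapper  # type: ignore[return-value]

        @functools.wraps(fn)
        def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
            for attempt in range(times):
                try:
                    return fn(*args, **kwargs)
                except on:
                    if attempt == times - 1:
                        raise
                    time.sleep(backoff * 2 ** attempt)
            raise AssertionError("unreachable")
        return wrapper
    return decorate
```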

Week 7 - Dataclasses, attrs, Pydantic, and the Validation Boundary - The Three-Layer Cake

  1. Build an HTTP service (FastAPI, but kept small):
  2. Boundary layer: Pydantic RequestModel / ResponseModel.
  3. Domain layer: @dataclass(slots=True, frozen=True) value objects.
  4. Persistence layer: TypedDict rows from sqlite3.
  5. Write explicit converters between each layer. Resist the urge to make them the same type.
  6. Benchmark a 10k-request loop with Pydantic v1 (if installed) vs. v2. Document the order-of-magnitude difference.
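
A sketch of the three layers and explicit converters between them, assuming pydantic v2 is installed; the model names are made up for illustration:

```python
from dataclasses import dataclass
from typing import TypedDict
from pydantic import BaseModel

class UserRequest(BaseModel):              # boundary layer: validation happens here
    name: str
    age: int

@dataclass(slots=True, frozen=True)
class User:                                # domain layer: no framework imports
    name: str
    age: int

class UserRow(TypedDict):                  # persistence layer: shape of a sqlite3 row
    name: str
    age: int

def request_to_domain(req: UserRequest) -> User:
    return User(name=req.name, age=req.age)

def domain_to_row(user: User) -> UserRow:
    return {"name": user.name, "age": user.age}

def row_to_domain(row: UserRow) -> User:
    return User(name=row["name"], age=row["age"])
```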

Week 8 - The Type System: Generics, Protocols, Variance, and typing.* - Make Pyright Strict

  1. Take a 500-LOC module of your existing code. Run pyright --strict. Resolve every error.
  2. Add a Protocol for a "thing-with-an-id" and refactor a function that previously took Any.
  3. Use TypeIs to narrow dict | list returned from json.loads into safe shapes for downstream use.
  4. Where you find yourself reaching for cast, document why and consider whether the boundary belongs at a Pydantic model.
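
A sketch of items 2-3, assuming Python 3.13 (or typing_extensions) for TypeIs:

```python
from typing import Any, Protocol, TypeIs   # TypeIs: typing in 3.13+, else typing_extensions

class HasId(Protocol):
    id: str

def describe(obj: HasId) -> str:           # previously took Any
    return f"object {obj.id}"

def is_str_list(value: object) -> TypeIs[list[str]]:
    return isinstance(value, list) and all(isinstance(v, str) for v in value)

def tags_from_payload(payload: dict[str, Any]) -> list[str]:
    tags = payload.get("tags", [])
    if is_str_list(tags):
        return tags                        # pyright narrows this to list[str]
    raise TypeError("payload['tags'] must be a list of strings")
```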

Week 10 - Memory: Refcounts, Cyclic GC, the pymalloc Allocator - Find the Leak

  1. Write a service that has a deliberate leak: an unbounded dict cache, a leaking closure, and a circular reference with a __del__. Run under memray and tracemalloc. Identify each leak from the output.
  2. Bound the cache with functools.lru_cache(maxsize=...). Confirm with memray that growth flatlines.
  3. Profile a NumPy-heavy workload. Observe that pymalloc accounts for only a small share of the footprint: most memory lives in NumPy's own buffers, outside the Python object heap. Internalize: "NumPy is a different memory world."

Week 11 - The GIL, Free-Threaded Python, and the Concurrency Model - GIL Awareness

  1. Compute primes up to 1M three ways: (a) single thread, (b) threading with 8 threads, (c) multiprocessing with 8 procs. Bench all three on stock CPython.
  2. Run (b) on python3.13t (free-threaded). Compare.
  3. Replace the prime-test inner loop with a NumPy expression. Re-run (b) on stock CPython. Note the GIL-release effect.
  4. Capture py-spy record flame graphs for each. Identify GIL contention visually.
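
A harness sketch for item 1 and the (b)/(c) comparison; chunk boundaries and worker counts are arbitrary choices:

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from time import perf_counter

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    return all(n % d for d in range(2, math.isqrt(n) + 1))

def count_primes(lo: int, hi: int) -> int:
    return sum(is_prime(n) for n in range(lo, hi))

CHUNKS = [(i, i + 125_000) for i in range(0, 1_000_000, 125_000)]   # 8 chunks

def bench(executor_cls) -> tuple[int, float]:
    start = perf_counter()
    with executor_cls(max_workers=8) as pool:
        total = sum(pool.map(count_primes, *zip(*CHUNKS)))
    return total, perf_counter() - start

if __name__ == "__main__":                   # required for ProcessPoolExecutor on spawn platforms
    start = perf_counter()
    print("single :", count_primes(0, 1_000_000), perf_counter() - start)
    print("threads:", bench(ThreadPoolExecutor))    # GIL-bound on stock CPython
    print("procs  :", bench(ProcessPoolExecutor))   # should approach 8x for CPU-bound work
```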

Week 12 - The Optimization Ladder: Algorithm → Vectorize → Native → JIT - Climb the Ladder

Take a deliberately slow workload - e.g., compute pairwise cosine similarity between 10k 768-dim vectors with a pure-Python triple loop. Time it. Then climb: 1. Algorithmic: skip pairs already computed. 2. Vectorize: NumPy batched matmul with norm. 3. Cython rewrite of the inner kernel. 4. Numba @njit on the same. 5. (Stretch) Rust + PyO3 implementation. 6. Compare to faiss / hnswlib.

Tabulate speedups in NOTES.md. The lesson is that step 2 usually wins by 100x and step 3+ by ~2x more - but step 6 (use the right library) wins by 1000x. Algorithm > implementation > tuning.
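
A sketch of rung 2, the NumPy batched matmul; the array size is scaled down here, and the later rungs can reuse the same signature:

```python
import numpy as np

def cosine_pairwise(X: np.ndarray) -> np.ndarray:
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.clip(norms, 1e-12, None)       # guard against zero-length vectors
    return Xn @ Xn.T                            # (n, n) cosine-similarity matrix in one matmul

rng = np.random.default_rng(0)
X = rng.standard_normal((2_000, 768), dtype=np.float32)   # scale up to 10k for the lab
S = cosine_pairwise(X)
assert np.allclose(np.diag(S), 1.0, atol=1e-3)
```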

Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop - Bytecode Forensics

  1. Write three implementations of "sum of squares": a for loop, a sum() + genexp, and numpy.dot(a, a). dis.dis each. Benchmark with timeit. Explain the gap.
  2. Take a function with a global lookup in its hot loop. Refactor to a default-argument cache. Re-bench. Quantify the win.
  3. Trace opcode-level events on a small program (sys.settrace with frame.f_trace_opcodes = True, or dis.dis(func, adaptive=True)). Compare the output before and after warm-up to observe the specializing adaptive interpreter at work.
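
A sketch of item 1's harness; absolute numbers will vary by machine and Python version:

```python
import dis
from timeit import timeit
import numpy as np

def squares_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total

def squares_genexp(values):
    return sum(v * v for v in values)

data = list(range(10_000))
arr = np.array(data, dtype=np.int64)

dis.dis(squares_loop)                    # compare the opcode streams of the two pure-Python versions
dis.dis(squares_genexp)

print("loop   ", timeit(lambda: squares_loop(data), number=500))
print("genexp ", timeit(lambda: squares_genexp(data), number=500))
print("numpy  ", timeit(lambda: np.dot(arr, arr), number=500))
```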

Week 13 - asyncio Foundations: Event Loop, Tasks, Coroutines - The Crawler That Doesn't Lie

  1. Build an async HTTP crawler with httpx.AsyncClient and a TaskGroup. Limit concurrency with a Semaphore(N).
  2. Add a 5-second per-request timeout using asyncio.timeout. Verify cancellation propagates cleanly to the httpx request.
  3. Inject a deliberately blocking time.sleep(2) somewhere. Detect it by enabling asyncio debug mode (loop.set_debug(True) or PYTHONASYNCIODEBUG=1), lowering loop.slow_callback_duration to 0.1, and watching the resulting slow-callback log warnings.
  4. Replace the blocker with asyncio.sleep. Confirm via py-spy dump that the loop never stalls.
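
A sketch of items 1-2, assuming Python 3.11+ (TaskGroup, asyncio.timeout) and httpx; blocking-call detection is left to the lab:

```python
import asyncio
import httpx

CONCURRENCY = 20

async def fetch(client: httpx.AsyncClient, sem: asyncio.Semaphore, url: str) -> int:
    async with sem:                              # bounded concurrency
        async with asyncio.timeout(5):           # per-request budget; cancellation reaches httpx
            resp = await client.get(url)
            return resp.status_code

async def crawl(urls: list[str]) -> dict[str, int]:
    sem = asyncio.Semaphore(CONCURRENCY)
    async with httpx.AsyncClient(follow_redirects=True) as client:
        async with asyncio.TaskGroup() as tg:    # any failure cancels the siblings
            tasks = {url: tg.create_task(fetch(client, sem, url)) for url in urls}
    return {url: t.result() for url, t in tasks.items()}

# asyncio.run(crawl(["https://example.com", "https://example.org"]))
```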

Week 14 - Structured Concurrency, Cancellation, ExceptionGroups, anyio - The Fan-Out That Cleans Up After Itself

  1. Refactor your week-13 crawler to use TaskGroup (or anyio task group).
  2. Add a "first-error wins" mode: as soon as any task raises, all siblings are cancelled and the group raises an ExceptionGroup.
  3. Add a "best-effort" mode: collect all results and exceptions, return both.
  4. Verify via test that cancelling the parent cancels every in-flight HTTP request within 100ms.
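
A sketch of the two modes from items 2-3; plug the week-13 fetch coroutines in where coros is built:

```python
import asyncio

async def gather_first_error(coros):
    """First-error-wins: the TaskGroup cancels every sibling as soon as one task raises."""
    async with asyncio.TaskGroup() as tg:            # raises an ExceptionGroup on failure
        tasks = [tg.create_task(c) for c in coros]
    return [t.result() for t in tasks]

async def gather_best_effort(coros):
    """Best-effort: run everything, return successes and failures separately."""
    results = await asyncio.gather(*coros, return_exceptions=True)
    oks = [r for r in results if not isinstance(r, BaseException)]
    errors = [r for r in results if isinstance(r, BaseException)]
    return oks, errors
```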

Week 15 - Threads, Processes, Subinterpreters, concurrent.futures - Pick Your Parallelism

For each workload, pick a model and justify: 1. Compress 10k JPEGs in parallel. 2. Run 10k HTTP requests against an external API (rate-limited). 3. Compute SHA-256 of 10k 1MB blobs. 4. Train 10 small models concurrently sharing a GPU.

Implement at least two of them three ways: threads, processes, asyncio. Bench. Write up the right answer.

Week 16 - Native Extensions, Releasing the GIL, FFI - Write the Hot Kernel in Rust

  1. Take the cosine-similarity workload from week 12. Implement it in Rust with PyO3.
  2. Use py.allow_threads(|| ...) around the SIMD loop. Verify with a Python ThreadPoolExecutor(8) that you get ~8x speedup.
  3. Compare to NumPy and to your Cython version. Write up the cost in code complexity.
  4. Bonus: expose a Vector #[pyclass] and benchmark crossing the FFI per-call vs. per-batch. Internalize the per-call FFI cost.

Week 17 - Pythonic Design Patterns - Refactor a Junk Drawer

  1. Take a 1k-LOC script of mixed responsibilities. Extract: domain/, adapters/, service/, entrypoints/. Write Protocols for the seams.
  2. Add a fake repository for tests; the real one talks to SQLite. Run the same test suite against both.
  3. Document, in a docs/architecture.md, why each module exists and what it depends on.

Week 18 - Data Structures Beyond list/dict - Right Tool for Right Workload

  1. A leaderboard with frequent insert + top-K query: implement with list (naive), heapq (better), SortedList (best). Bench at 10k/100k/1M elements.
  2. A rolling-window deduplicator: set (memory-unbounded), Bloom filter (memory-bounded, false positives), cachetools.TTLCache. Pick one with justification.
  3. A nearest-neighbor lookup over 1M 768-dim vectors: brute-force NumPy, hnswlib, faiss. Note recall/latency trade-offs.
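
A sketch of item 1's heapq and SortedList variants (sortedcontainers is a third-party dependency); the naive list version and the benchmarks are left to the lab:

```python
import heapq
from sortedcontainers import SortedList       # third-party: pip install sortedcontainers

def top_k_once(scores: list[int], k: int) -> list[int]:
    return heapq.nlargest(k, scores)           # O(n log k); fine for one-shot queries

class Leaderboard:
    """SortedList variant: O(log n) insert, cheap repeated top-K reads."""
    def __init__(self) -> None:
        self._scores = SortedList()

    def add(self, score: int) -> None:
        self._scores.add(score)

    def top(self, k: int) -> list[int]:
        return list(self._scores[-k:])[::-1]   # highest first

board = Leaderboard()
for s in (10, 50, 20, 40):
    board.add(s)
assert board.top(2) == [50, 40]
```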

Week 19 - Testing, Property-Based Testing, Mutation Testing, Fakes vs. Mocks - The Tests Find Bugs You Didn't Know You Had

  1. Add hypothesis property tests to your week-3 word counter. Watch them find a UTF-8 boundary bug or an empty-input issue.
  2. Add a stateful hypothesis test against your tiny ORM from week 5.
  3. Run mutmut. Identify untested branches.
  4. Replace any Mock you used with a fake implementing a Protocol.
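
A sketch of item 1, assuming the week-3 tokenizer is importable as wordcount.words (a hypothetical module name):

```python
from collections import Counter
from hypothesis import given, strategies as st

from wordcount import words        # hypothetical module name for the week-3 tokenizer

@given(st.text())
def test_tokens_are_never_empty(text: str) -> None:
    tokens = list(words(text.splitlines(keepends=True)))
    assert all(tokens)                                   # a \w+ tokenizer can never emit ""
    assert sum(Counter(tokens).values()) == len(tokens)

@given(st.lists(st.text(alphabet="abcdefgh", min_size=1), min_size=1))
def test_known_words_round_trip(ws: list[str]) -> None:
    line = " ".join(ws) + "\n"
    assert list(words([line])) == ws
```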

Week 20 - Observability, FastAPI, Production Service Shape - Production-Shaped Service

Build a FastAPI service that: 1. Accepts a POST /jobs, persists to SQLite, returns a job ID. 2. Processes jobs in an asyncio.TaskGroup background worker with bounded concurrency. 3. Emits structured JSON logs with trace correlation. 4. Exposes /metrics (Prometheus), /healthz, and /readyz. 5. Handles SIGTERM by draining in-flight jobs. 6. Runs under uvicorn with --workers 4 (multi-process). Document why workers > 1 still matters for CPU-light, I/O-bound services on stock CPython. 7. Has a docker-compose stack including Prometheus, Grafana, and Jaeger. 8. Has a k6 or locust load test in loadtest/ reproducing the latency SLO.

Week 21 - LLM-App Foundations: Prompts, Tokens, Streaming, Cost - A Disciplined LLM Client

  1. Build an LLMClient abstraction over anthropic and openai async SDKs. Methods: generate, stream, with_tools.
  2. Add token accounting: pre-call estimate, post-call actual, running cost meter.
  3. Add caching headers (Anthropic prompt caching). Measure latency delta.
  4. Add structured-output mode using instructor + a Pydantic schema. Test on a deliberately ambiguous prompt; observe schema enforcement.
  5. Add timeout, retry-with-backoff, and circuit breaker (pybreaker or hand-rolled).

Week 22 - Retrieval-Augmented Generation: Doing It Properly - End-to-End RAG with Honest Evals

  1. Pick a corpus (your own docs, a Wikipedia subset, or a publicly available QA dataset). Ingest with at least two chunking strategies.
  2. Stand up pgvector or qdrant. Index with two embedding models.
  3. Implement hybrid retrieval (dense + BM25 + RRF) and add a reranker.
  4. Build a 50-question gold eval set with reference answers. Score with ragas. Iterate retrieval until faithfulness > 0.85.
  5. Plot the impact of each pipeline change in a results table. Resist the urge to tune blindly.

Week 23 - Agents, Tools, Durable Execution, Cost & Safety - An Agent That Doesn't Burn Money

  1. Build a research agent: takes a question, plans, calls web_search and fetch_url tools, synthesizes an answer with citations.
  2. Add: max-turns=10, max-tokens=200k, max-wall-time=120s, max-cost=$0.50. Verify each cap fires correctly.
  3. Persist agent state (turn-by-turn) to Postgres. Recover after a kill -9.
  4. Write replay tests: feed a saved trace to a test, mock the LLM, assert tool calls happen in the right order.
  5. Add an evaluator-optimizer loop: a critic LLM grades the answer; if score < threshold, revise once.

Rust Mastery

Week 1 - The Toolchain and the Compiler Pipeline - Hello World, Audited

  1. Create hello-audited. Pin a specific stable toolchain via rust-toolchain.toml.
  2. Build with `--release`. Run `objdump -h target/release/hello-audited` and identify the `.text`, `.rodata`, `.data`, `.bss`, and `.eh_frame` sections.
  3. Strip with strip -s and compare binary sizes. Now rebuild with RUSTFLAGS="-C strip=symbols -C panic=abort" and compare again.
  4. Document the size delta from each flag in NOTES.md. You should observe .eh_frame shrinking dramatically when panic=abort is set; explain why.

Week 2 - Memory Layout: Stack, Heap, Data, BSS, TLS - Layout Forensics

Build a binary that allocates one value of each "kind": - a stack [u8; 64], - a Box<[u8; 64]>, - a static FOO: [u8; 64] = [0xAB; 64];, - a static mut BAR: [u8; 64] = [0; 64];, - a thread_local! RefCell<[u8; 64]>.

Print the address of each (&value as *const _ as usize). Dump the process's memory map (read /proc/self/maps from inside the program, or run cat /proc/<pid>/maps from another shell while the program sleeps) and prove which segment each address falls in. Write up the mapping in NOTES.md.

Week 3 - Ownership, Borrowing, and Region Inference - Defeat the Borrow Checker, Then Submit

You will be given (as exercise files) ten programs that the borrow checker rejects. For each: 1. Predict which rule is violated before reading the diagnostic. 2. Fix it three different ways (e.g., scope shrinking, split borrow, Cell/RefCell). 3. Pick the idiomatic fix and justify it in a one-line comment, but only if the comment captures non-obvious reasoning. (See the feedback rule on comments.)

Week 4 - The Error Model - A Library With Two Faces

Build parse-units: a small crate that parses strings like "3.5 GiB" into a structured Quantity. Requirements: - Public API returns Result<Quantity, ParseError> where ParseError is a thiserror enum with at least four variants. - Internally, use ? to compose. No unwrap allowed except in unit tests. - Provide a binary parse-units-cli that uses anyhow and prints rich context with .with_context(|| ...). - Ship 100% line coverage measured by cargo-llvm-cov.

Week 5 - Advanced Lifetimes, Variance, and HRTBs - A Lending Iterator

Implement a WindowsMut lending iterator that yields overlapping &mut [T] windows over a slice. This requires GATs. Property-test it against a naive O(n²) reference implementation.

Week 6 - Traits, Coherence, and Monomorphization - Bloat Forensics

  1. Write a generic function fn process<T: Display>(items: &[T]) -> String that formats and concatenates. Instantiate it with five distinct types in a binary.
  2. Run cargo bloat --release --filter process and confirm there are five symbols.
  3. Refactor to a dyn Display version (&[&dyn Display]). Re-run cargo bloat. Document the binary-size delta and the codegen tradeoff.
  4. Now read the disassembly of the dyn version with cargo asm and identify the indirect call.

Week 7 - Smart Pointers and Interior Mutability - Build a Tracing Rc

Implement TracingRc<T> from scratch using UnsafeCell and NonNull. It must: - Refcount strong and weak references correctly (study std::rc for the algorithm). - Log every clone/drop to a thread-local trace buffer. - Pass Miri (cargo +nightly miri test), meaning Miri finds no undefined behavior in your unsafe code under the stacked-borrows model.

Week 8 - Drop, the Drop Checker, and Destructor Discipline - Resource Acquisition Is Initialization

Build a FileLock type wrapping flock(2): - On construction, acquire an advisory lock. - On Drop, release it. Even on panic. - Provide a try_lock constructor returning Result<FileLock, std::io::Error>. - Add a test that asserts the lock is released after a panic by spawning a child process that panics while holding the lock and observing in the parent that the lock can be re-acquired.

Week 10 - Channels, Lock-Free Patterns, and loom - An SPSC Ring Buffer

Implement a fixed-capacity SPSC ring buffer: - Two AtomicUsize indices (head, tail), each on its own cache line. - push and pop use Acquire/Release ordering pairs. - Validate under loom with at least 4 elements and 3 pushes/pops. - Benchmark against rtrb with criterion. You should be within 2× on x86_64.

Week 11 - Async Foundations: Future, Pin, Unpin, the State Machine - An Async Channel From Scratch

Implement a single-shot async oneshot channel: - Sender<T> has send(self, T). - Receiver<T> is Future<Output = Result<T, Cancelled>>. - Use a single Mutex<State> and a Waker slot. - Test with both tokio::test and smol::block_on. The result must be runtime-agnostic.

Week 12 - Runtimes: Tokio Internals, Smol, Embassy - Roll-Your-Own Mini Executor

Build a single-threaded executor in ~150 lines: - A VecDeque<Arc<Task>> ready queue. - Task holds a Mutex<Pin<Box<dyn Future>>> and implements ArcWake (or Wake on stable). - block_on polls the root future; auxiliary spawn adds tasks. - Run a small TCP echo server on top using polling (the same crate Smol uses) for I/O.

Week 9 - Threading, Send, Sync, and the Memory Model - A Correct Spinlock

Implement Spinlock<T> from scratch using AtomicBool: - lock() spins with a Relaxed load, then an Acquire CAS. - unlock() stores false with Release ordering. - Returns a SpinlockGuard<'_, T> whose Drop unlocks. - Verify with loom (which runs all interleavings; see week 10) that no two threads enter the critical section.

Week 13 - Unsafe Rust: Raw Pointers, NonNull, MaybeUninit, UB - A Sound Vec

Re-implement Vec<T> from scratch (the Nomicon's "Implementing Vec" walkthrough is the reference). Requirements: - RawVec allocator wrapper handling growth. - ZST (zero-sized type) handling-Vec<()> must work without ever allocating. - Drop correct under panic in T::drop. - Iteration via IntoIter with proper drop on partial consumption. - Pass Miri on every public method.

Week 14 - FFI: Calling C, Being Called By C - Bind a Real C Library and Expose a Rust One

Two parts: 1. Consume: write Rust bindings to libsodium's crypto_secretbox family. Use bindgen for the raw layer, then wrap in safe Rust (own the keys with Zeroizing<[u8; 32]>, use typed nonces, return Results). 2. Expose: take your parse-units crate from Month 1 and ship a C-callable parse_units_c library with a cbindgen-generated header. Provide a Makefile that links a tiny C program against it.

Week 15 - Declarative Macros (macro_rules!) - A hashmap! Macro With Diagnostics

Implement a hashmap! macro: - hashmap! { "a" => 1, "b" => 2 } produces a HashMap. - Trailing comma allowed. - Rejects malformed input: a typo like hashmap! { "a" => 1, "b" -> 2 } should produce a useful error pointing at the bad token (use compile_error! strategically). - Pre-allocates with HashMap::with_capacity.

Week 16 - Procedural Macros - All Three Flavors

Build the dtolnay/proc-macro-workshop exercises end-to-end: 1. derive_builder: a derive macro producing the builder pattern, with field-level attributes for renaming and each-element setters. 2. seq: a function-like macro seq!(N in 0..8 { ... }) that emits N expansions. 3. sorted: an attribute macro that enforces enum-variant or match-arm sortedness, with proper spans on errors.

This workshop is the gold standard for proc-macro pedagogy. Do all of it.

Week 17 - Hexagonal Architecture and Domain Modeling in Rust - A Hexagonal URL Shortener

Build a workspace implementing a URL shortener: - domain crate: ShortUrl, UrlAlias newtypes, a UrlRepository trait. - application crate: Shorten, Resolve use cases. - adapters/postgres crate: implements UrlRepository with sqlx. - adapters/http crate: axum handlers using the application layer. - bin/api crate: composition root. - An adapters/in-memory crate used by tests, so application logic is testable without a database.

Week 18 - Zero-Copy I/O and the Poll-Based Model - A Zero-Copy Line Protocol

Build a server speaking a minimal newline-delimited protocol: - Read into a BytesMut with try_read_buf. - Parse line-by-line with winnow, yielding &[u8] slices. - Push each parsed message into a downstream channel as a Bytes (cloned cheaply, shared with the parser's allocation). - Benchmark with wrk or tcpkali. Inspect with perf and confirm __memcpy is not a hot frame.

Week 19 - Observability: tracing, metrics, OpenTelemetry - Add Observability to the Hexagonal URL Shortener

Take week 17's URL shortener and add: - tracing::instrument on every use case, with explicit fields (no PII). - Prometheus /metrics endpoint with request counts and per-endpoint latency histograms. - OTLP export to a local Jaeger via docker-compose. - A flamegraph.svg from a 30-second load test, committed.

Week 20 - Testing Strategy: Unit, Property, Fuzz, Miri, Integration - Test-Pyramid the URL Shortener

  • Property-test the alias-generation function (idempotent, collision-resistant under birthday-bound assumptions).
  • Fuzz the public HTTP handlers via the axum::Router directly (no socket).
  • Integration-test the Postgres adapter against a real Postgres in testcontainers.
  • Snapshot-test the OpenAPI spec with insta.
  • Achieve 90%+ coverage per cargo-llvm-cov.

Week 21 - Implementing Complex Data Structures From Scratch - Pick One and Ship It

Implement one of the three to publishable quality: - Property-tested against std equivalent. - Miri-clean. - Loom-verified (for the lock-free). - Criterion-benchmarked against std/dashmap/intrusive-collections. - README explains the algorithmic choice and tradeoffs.

Week 22 - no_std, Custom Allocators, Embedded Targets - Two Targets

  • Bare metal: blink an LED on a real or QEMU-emulated Cortex-M target using embassy. Optional but recommended.
  • Custom allocator: write a simple bump allocator. Use it as #[global_allocator] for a small no_std + alloc benchmark and observe behavior.

Week 23 - Compiler Internals: MIR, Borrow Check, Codegen - Read, Build, Land

  1. Build rustc from source. Modify a single diagnostic message in compiler/rustc_borrowck/src/... to add a new help line. Rebuild stage 1 and confirm the new message appears via `--explain` (or by compiling code that triggers the diagnostic).
  2. Find an issue labeled E-easy. Read the linked discussion. Cross-reference with the rustc-dev-guide. Do not yet open a PR; instead, write a one-page plan describing the proposed change. Discuss with a maintainer in the issue comments.

Week 24 - Capstone Integration, Profiling, Hardening, Defense - Defend the Design

Schedule a 45-minute mock review with a senior peer (or record yourself if none is available). Present: - The architecture diagram. - One slide per non-obvious decision (e.g., "why sharded RwLock instead of dashmap", "why tokio over glommio"). - A live demo of the test suite. - A live demo of one production-hardening tool (PGO, BOLT, or fuzz corpus).

The deliverable is the defense, not the slides. If you cannot answer "what fails first under load?" or "what is your worst-case allocation pattern?", you have not yet finished the curriculum.