
Go Mastery

Runtime, GMP scheduler, GC, channels, distributed systems.



Go Mastery Blueprint-A 24-Week Master-Level Syllabus

Authoring lens: Senior Staff Software Engineer / Distributed Systems Architect. Target outcome: A graduate of this curriculum should be capable of (a) submitting non-trivial PRs against golang/go (runtime, compiler, or stdlib), (b) owning a high-throughput distributed control plane (Kubernetes-class), or (c) operating a hyperscale fleet of Go services with a coherent observability story and zero-downtime deploys.

This is not "Tour of Go in 24 weeks." It assumes the reader can write working Go and has shipped production code in some language. The premise: most Go bugs at scale are not language bugs-they are runtime, scheduler, allocator, and GC bugs in disguise. This curriculum surfaces all four.


Repository Layout

| File | Purpose |
| --- | --- |
| 00_PRELUDE_AND_PHILOSOPHY.md | The "Go-ness" of Go; the design ethics; the cost model; reading list. |
| 01_MONTH_RUNTIME_FOUNDATIONS.md | Weeks 1–4. Toolchain, GMP scheduler, stack management, escape analysis. |
| 02_MONTH_MEMORY_AND_GC.md | Weeks 5–8. Memory layout, tricolor GC, interface itabs, GOMEMLIMIT. |
| 03_MONTH_CONCURRENCY_MASTERY.md | Weeks 9–12. Channel internals, atomics, context, leak/deadlock prevention. |
| 04_MONTH_REFLECTION_CODEGEN_PLUGINS.md | Weeks 13–16. reflect, go/ast, go generate, plugin, go-plugin. |
| 05_MONTH_PRODUCTION_DISTRIBUTED.md | Weeks 17–20. DDD, observability, gRPC, hardened testing. |
| 06_MONTH_CAPSTONE.md | Weeks 21–24. Consensus, distributed storage, perf tuning, capstone defense. |
| APPENDIX_A_PRODUCTION_HARDENING.md | pprof, trace, golangci-lint, race detector, ldflags, build tags. |
| APPENDIX_B_DATA_STRUCTURES_AND_PATTERNS.md | Build-from-scratch reference: lock-free queue, ring buffer, Bloom filter, LRU. |
| APPENDIX_C_CONTRIBUTING_TO_GO.md | The Go pipeline; Gerrit; first CL playbook; runtime PR strategy. |
| CAPSTONE_PROJECTS.md | Three terminal projects: Raft KV store, gRPC mesh, streaming pipeline. |

How Each Week Is Structured

Every weekly module follows the same five-section format so the reader can budget time:

  1. Conceptual Core-the why, with a mental model.
  2. Mechanical Detail-the how, down to runtime source where relevant (src/runtime/proc.go, src/runtime/mgc.go, etc.).
  3. Lab-a hands-on exercise that cannot be completed without internalizing the concept.
  4. Idiomatic & golangci-lint Drill-read 2–3 lints, refactor a sample to silence them, understand why each lint exists.
  5. Production Hardening Slice-a pprof / trace / -race / go vet micro-task that compounds into a publishable hardening template.

Each week is sized for ~12–16 focused hours. Skip the labs at your peril; the labs are the curriculum.


Progression Strategy

The phases form a dependency DAG, not a linear track:

Runtime Foundations ──► Memory & GC ──► Concurrency ──► Reflection / Codegen / Plugins
        │                    │                │                       │
        └────────────────────┴────────┬───────┴───────────────────────┘
                       Production & Distributed Systems
                            Capstone & Defense

The Production Hardening slice is intentionally orthogonal-it accumulates a hardening/ template that, by week 24, is a publishable Go module starter.


Non-Goals

  • This curriculum does not cover web frameworks (Gin/Echo/Fiber) as primary subjects. They appear only as integration surfaces in Month 5; net/http is sufficient for everything taught here.
  • Front-end / GopherJS / TinyGo are out of scope (pointers given in 00_PRELUDE).
  • "Why Go is better than X" advocacy is explicitly avoided. The reader should finish the program able to argue against using Go when it is the wrong tool.

Capstone Tracks (pick one in Month 6)

  1. Distributed Storage Track-a Raft-replicated key-value store with linearizable reads, snapshot/restore, multi-region demo.
  2. Service Mesh Track-a gRPC-based microservices mesh with a custom service registry, health checking, deadline propagation, and outlier ejection.
  3. Streaming Pipeline Track-a Kafka-protocol-compatible (or NATS-style) ingestion + stream-processing pipeline with at-least-once delivery and replay.

Details in CAPSTONE_PROJECTS.md.


Versioning Note

This curriculum targets Go 1.22+ as the baseline (range-over-int and the loopvar semantics fix landed in 1.22; range-over-func iterators became stable in 1.23; slog and PGO have been stable since 1.21; GOMEMLIMIT since 1.19; async preemption since 1.14). Do not start this curriculum on a Go older than 1.22-too many of the modern idioms will be unavailable.

Prelude-The Philosophy Behind the Syllabus

Sit with this document for an evening before week 1. The rest of the curriculum is mechanically dense; this is the only chapter where we step back and define the shape of the discipline.


1. Go Is a Runtime, Not a Language

The most damaging misconception a Go engineer can hold is that "Go is just C with garbage collection and goroutines." A working master-level practitioner thinks the inverse:

Go is a runtime-a sophisticated user-space scheduler, allocator, and concurrent garbage collector-that ships with a small, deliberately under-featured language attached.

Almost every interesting performance bug in production Go has its root in the runtime, not in the language semantics. Almost every elegant high-throughput Go architecture is a thin layer over runtime primitives (runtime.Gosched, runtime.LockOSThread, runtime/pprof, runtime/trace, runtime/debug.SetGCPercent, debug.SetMemoryLimit).

Internalize this and the rest of the curriculum makes sense.


2. The Five-Axis Cost Model

A working Go engineer reasons about every line of code along five axes simultaneously:

| Axis | Question to ask |
| --- | --- |
| Allocation | Does this escape to the heap? Could it stay on the stack? |
| Scheduling | Will this goroutine block? On what-channel, syscall, lock, network? Will it preempt the P? |
| GC pressure | How much live data does this add? How long-lived? Pointer-rich? |
| Concurrency safety | Is this aliasable across goroutines? Is the access pattern visible to the race detector? |
| Failure | What happens on panic? On context.Canceled? On a deadlocked send? |

Beginner courses teach axis 4 only (and incompletely). This curriculum forces all five into your hands by week 12.


3. The "Go Way"-Aesthetic as Engineering Constraint

Go's design ethic is, famously, "simplicity." That word is doing more work than newcomers think. Specifically:

  • Composition over inheritance. Embedding, not subclassing. Interfaces are implicitly implemented; consumers define interfaces, not producers.
  • Errors are values. No exceptions. if err != nil is not boilerplate-it is a deliberate choice to make every failure path local and visible.
  • Concurrency by communication. "Don't communicate by sharing memory; share memory by communicating." Channels first, mutexes when channels would obscure intent.
  • The stdlib is the framework. net/http, encoding/json, database/sql, context, log/slog-together they cover ~80% of any service. Reach for third-party only when stdlib runs out.
  • Tooling is part of the language. gofmt, go vet, go test -race, go test -fuzz, pprof, trace. A Go engineer who does not know these is half-trained.

If you fight these defaults, you will fight the language. If you internalize them, you will write code that any other Go engineer can pick up in under a day. That is the actual deliverable Go optimizes for.


4. The Reading List

These are referenced throughout the curriculum. You are not expected to read them cover-to-cover before starting; they are pinned tabs.

Primary
  • The Go Programming Language (Donovan & Kernighan). The canonical text.
  • 100 Go Mistakes and How to Avoid Them (Teiva Harsanyi). The single best second book.
  • Concurrency in Go (Katherine Cox-Buday). Read once at week 9, again at week 12.
  • The Go Memory Model-go.dev/ref/mem. Normative spec for memory ordering.
  • Effective Go + the Go Code Review Comments wiki-both ~30 minutes of reading, both load-bearing for code-review fluency.

Runtime & internals
  • The runtime source itself: src/runtime/proc.go (scheduler), src/runtime/mgc.go (GC), src/runtime/malloc.go (allocator), src/runtime/stack.go, src/runtime/chan.go, src/runtime/iface.go. Treat these as primary literature, not reference.
  • Go internals by jhh.io and Russ Cox's research.swtch.com archive. Particularly Go's Work-Stealing Scheduler and The Tricolor Garbage Collector.
  • Dmitry Vyukov's writings on the scheduler (Vyukov co-designed the work-stealing scheduler).
  • Madhav Jivrajani's GoTeam talks ("Go scheduler: a deep dive").

Distributed systems canon (not Go-specific, but mandatory)
  • Lamport, Time, Clocks, and the Ordering of Events.
  • Diego Ongaro, In Search of an Understandable Consensus Algorithm (the Raft paper). Read in week 21.
  • Brewer, CAP Twelve Years Later. The original CAP paper is famously misread; this is the cleaner statement.
  • Kleppmann, Designing Data-Intensive Applications. Read chapters 5–9 in the back half of the curriculum.

Adjacent canon
  • Drepper, What Every Programmer Should Know About Memory. Re-read in week 5.
  • Herlihy & Shavit, The Art of Multiprocessor Programming, chapters 7, 9, 13.


5. Curriculum Philosophy: "Read the Source, Ship the Lab"

Three rules govern every module:

  1. Source first, blog second. When the curriculum says "study the channel send path," it means open src/runtime/chan.go and read chansend1. Blogs go stale; commits are dated.
  2. One lab per concept, one CL per phase. By the end of each month, the reader has produced one open-source-quality artifact (module, gist, or upstream contribution)-not a notebook of toy snippets.
  3. The race detector and pprof are the teachers. When you do not understand why a program misbehaves, the first response is go test -race, the second is go tool pprof, and only the third is to ask another human.

6. What Go Is Not For

A graduate of this curriculum should be able to argue these points in a design review without sounding ideological:

  • CPU-bound numerical code. No SIMD intrinsics in the language; the compiler's autovectorizer is conservative; the GC tax is non-zero. Use Rust, C++, or call out to BLAS/LAPACK via cgo.
  • Hard-real-time systems. GC pauses are short but non-zero. Audio DSP, motor control, kernel drivers-wrong tool.
  • Heavy generic-numeric libraries. Generics landed in 1.18 but have constraints (no method-on-type-parameter dispatch, no specialization). This is fine for collections; it is awkward for numpy-equivalent libraries.
  • Code where the team will demand inheritance hierarchies. Go has no inheritance. A team that resists composition will fight Go forever.

The signal that Go is the right tool: you have a concurrency, deployability, and team-onboarding-speed constraint that ranks above raw CPU efficiency or expressiveness.


7. A Note on AI-Assisted Workflows

Modern Go authors use LLM tooling. Three rules:

  1. Never accept generated concurrent code without -race. The most common failure mode of generated Go is plausible-looking but racy patterns (closing a channel from multiple goroutines, reading a map from one goroutine while writing from another).
  2. Verify generated interface satisfaction. Models hallucinate methods. Always compile.
  3. Treat suggested context-handling skeptically. The most common context bug-capturing the request ctx in a goroutine that outlives the request-is endemic in generated code.

You are now ready for Week 1. Open 01_MONTH_RUNTIME_FOUNDATIONS.md.

Month 1-Runtime Foundations: Toolchain, GMP, Stacks, Escape Analysis

Goal: by the end of week 4 you can (a) describe the GMP scheduler model and trace a goroutine from go func() through runqput and findrunnable to execution, (b) predict whether a value will escape to the heap by reading the source, (c) explain why a goroutine that never yields can stall a P, and (d) ship a small CLI as a statically linked binary with reproducible builds.


Weeks

Week 1 - The Toolchain and the Build Pipeline

1.1 Conceptual Core

  • The Go toolchain is a single binary that bundles the compiler, linker, formatter, dependency manager, test runner, race detector, profiler, and tracer. Every other ecosystem you've used distributes these as separate tools; Go's choice to integrate them is itself a design statement.
  • go build is not just a compiler invocation. It is a dependency graph walker that:
  • Resolves the module graph (go.mod + go.sum).
  • Computes the build action graph (run with go build -x or go build -n to inspect).
  • Compiles each package to an archive (.a) cached in $GOCACHE (default $HOME/.cache/go-build).
  • Links into a final binary or .so/.a.
  • The build cache is content-addressed. Identical inputs → identical outputs → cache hit. This is what makes go build feel instantaneous on second invocations.

1.2 Mechanical Detail

  • Module mode is the only mode. GOPATH mode is dead-do not start a project under $GOPATH/src in 2026.
  • go.mod directives: module, go, toolchain, require, replace, exclude, retract. Memorize all of them; an example exercising each follows this list.
  • Minimum Version Selection (MVS): Go's resolver picks the minimum version of each dependency that satisfies all require directives. This is the opposite of npm/pip "latest compatible." Read Russ Cox's MVS paper.
  • go.sum is a content-addressed integrity ledger, not a lock file. It records hashes of every module version ever depended on, including transitively-dropped versions. Never edit by hand.
  • The vendor/ directory: dead in OSS, alive in air-gapped enterprise. Use go mod vendor only when offline builds are mandatory.
  • Useful introspection commands:
  • go env - every environment variable the toolchain consults.
  • go list -m all - every module in the build.
  • go list -deps -json ./... - the package graph as JSON.
  • go version -m - the modules embedded in a built binary (BuildInfo).
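
A hedged example go.mod that exercises the directives above; the module path and versions are placeholders, not recommendations:

    module example.com/hello-audited

    go 1.22

    toolchain go1.22.4

    require golang.org/x/sync v0.7.0

    // Local override while hacking on a dependency in a sibling directory.
    replace golang.org/x/sync => ../sync

    // Never select this version when resolving the module graph.
    exclude golang.org/x/sync v0.6.0

    // Retract a version of *this* module that was published by mistake.
    retract v0.1.0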

1.3 Lab-"Hello World, Audited"

  1. Create hello-audited. Set go 1.22 and a toolchain go1.22.x directive.
  2. Build with go build -trimpath -ldflags="-s -w -X main.version=v0.1.0". Run go version -m ./hello-audited.
  3. Strip with strip and compare. Cross-compile to linux/arm64, darwin/arm64, windows/amd64 with GOOS=... GOARCH=... go build.
  4. Document the size delta from each flag in NOTES.md. -s -w typically saves ~30%; -trimpath is a reproducibility flag (no local paths in the binary), not a size flag.
  5. Inspect the binary with go tool nm and go tool objdump. Identify the runtime symbols (runtime.main, runtime.gcStart, runtime.schedule).

1.4 Idiomatic & golangci-lint Drill

  • Install golangci-lint. Enable a strict config: errcheck, govet, staticcheck, gosimple, ineffassign, revive, gocritic, gosec, bodyclose, nilerr, prealloc, unconvert. Run on a small repo. Read each finding's URL and understand the rationale.

1.5 Production Hardening Slice

  • Add a Makefile (or Taskfile.yml) target that runs gofmt -l -d, go vet ./..., golangci-lint run, go test -race -count=1 ./..., go build -trimpath. This is the baseline CI invocation; every subsequent week's hardening slice extends it. A minimal sketch follows this list.
  • Adopt go-licenses to scan dependency licenses. Commit the report.
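
A minimal Makefile sketch of the baseline target described above (recipe lines must be tab-indented; adjust tool versions and paths to your repository):

    .PHONY: check
    check:
    	gofmt -l -d .
    	go vet ./...
    	golangci-lint run
    	go test -race -count=1 ./...
    	go build -trimpath ./...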

Week 2 - The GMP Scheduler Model

2.1 Conceptual Core

  • G = Goroutine. A user-space concurrent execution context with a stack, program counter, and runtime metadata. Cheap (~2 KB initial stack), millions per process.
  • M = Machine. An OS thread (pthread on POSIX). Has a kernel-managed stack. Goroutines run on M's.
  • P = Processor. A logical execution context that holds a local run queue of runnable G's plus a few caches (per-P mcache for the allocator). GOMAXPROCS sets the number of P's; default = number of CPU cores.
  • The invariant: an M needs a P to run Go code (specifically, to call into the scheduler). When an M makes a blocking syscall, it hands off its P to another M so other goroutines can run. This is the magic that makes blocking syscalls cheap in Go.

2.2 Mechanical Detail

Read these source files in this order:

  1. src/runtime/runtime2.go - the data structures: g, m, p, schedt. Keep this open as a reference.
  2. src/runtime/proc.go::schedule - the heart of the scheduler: pick a runnable G and switch to it.
  3. src/runtime/proc.go::findrunnable - the search order: local runq → global runq → netpoll → work-stealing from peer P's.
  4. src/runtime/proc.go::newproc - what happens at go func(). Particularly note runqput and the work-stealing-friendly slot ordering.

Key concepts:

  • Local run queue: each P has a 256-slot ring buffer of runnable G's. Push to tail, pop from head; work-stealers take from peer P's heads.
  • Global run queue: a doubly-linked list under sched.lock. Used as overflow for local queues and for goroutines woken from netpoll.
  • Work stealing: when a P's local queue is empty, it picks a victim P at random and steals half its queue. This is what amortizes load across cores.
  • runtime.LockOSThread(): pin the calling goroutine to its current M. Necessary for cgo calls to OS APIs that require thread affinity (most GUI toolkits, OpenGL, some signal handlers).
  • runtime.Gosched(): cooperative yield. The goroutine is moved back to the global queue.
  • Asynchronous preemption (since Go 1.14): tight CPU loops without function-call boundaries used to monopolize a P; now the runtime sends SIGURG to the M to force a safe-point. Read runtime/preempt.go.
  • netpoller: integrates with epoll/kqueue/IOCP. When a goroutine blocks on a network read, it parks and the M can run other goroutines. The goroutine is unparked when the FD is ready.

2.3 Lab-"Schedule Forensics"

Build a tiny program that:

  1. Spawns 1,000 goroutines, each computing a busy CPU loop for 10ms.
  2. Records the time-to-completion distribution.
  3. Re-runs with GOMAXPROCS=1, =2, =N (your core count).
  4. Re-runs with runtime.Gosched() inserted in the loop.
  5. Re-runs with the loop replaced by time.Sleep(10*time.Millisecond) (the netpoller path).

Tabulate the latency distributions in NOTES.md. Explain why GOMAXPROCS=1 without Gosched() produces high tail latency. Then, capture an execution trace with runtime/trace:

f, err := os.Create("trace.out")
if err != nil { log.Fatal(err) }
trace.Start(f) // runtime/trace; Start also returns an error worth checking in real code
defer trace.Stop()
View with go tool trace. Identify the per-P timeline, GC pauses, and proc transitions.

2.4 Idiomatic & golangci-lint Drill

  • staticcheck SA1019 (deprecated APIs), staticcheck SA5008 (forgotten defer vs loop variables), revive: confusing-naming. Less about scheduler correctness here, more about hygiene.

2.5 Production Hardening Slice

  • Wire runtime/trace to a /debug/trace HTTP handler (gated by build tag debug). Add pprof handlers (net/http/pprof import for side effect). Document how to capture a 10-second trace from a running process.
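
A minimal sketch of the side-effect import pattern described above, serving the pprof handlers on a loopback-only listener; the debug build-tag gating is omitted for brevity:

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers on http.DefaultServeMux
    )

    func main() {
        go func() {
            // Loopback only: never expose pprof on the public listener.
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()
        select {} // stand-in for the real service's work
    }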

Week 3 - Stack Management

3.1 Conceptual Core

  • Every goroutine has its own stack, separate from the OS thread stack. Initial size: 2 KB. Stacks grow (and shrink) dynamically. There is no fixed maximum per goroutine until you hit the limit set by runtime/debug.SetMaxStack (default 1 GB on 64-bit).
  • Contiguous stacks (since Go 1.3): when a goroutine needs more stack, the runtime allocates a new, larger contiguous region, copies the old stack into it, and rewrites all internal pointers. This is what the compiler-emitted "stack guard" preamble enables.
  • The relationship to escape analysis is direct: stack-allocated values are free; heap-allocated values cost an allocation, GC tracking, and a future scan. Master Go performance work is, in large part, the art of keeping values on the stack.

3.2 Mechanical Detail

  • Stack growth flow (src/runtime/stack.go):
  • Function prologue checks g.stackguard0 against SP.
  • If SP < stackguard0, jump to runtime.morestack.
  • morestack calls newstack, which allocates a new stack 2× the old size, copies, and rewrites pointers (including pointers to local variables and function parameters).
  • Resume execution.
  • Stack shrinking is performed by the GC when it observes the goroutine is using less than 1/4 of its stack.
  • Pointer adjustment during copy: this is the reason Go does not let you take stable pointers to stack-allocated locals across goroutine boundaries-moving the stack invalidates them. The escape analysis catches this; values that escape are heap-promoted.
  • Unsafe consequences: storing a uintptr (rather than unsafe.Pointer) does not protect against stack moves. The GC will not update the address. The Go memory model documents this; the unsafe package docs make it explicit.

3.3 Lab-"Stack Growth in the Wild"

  1. Write a recursive function func depth(n int) int { if n == 0 { return 0 }; var buf [256]byte; _ = buf; return 1 + depth(n-1) }.
  2. Run with progressively larger n. Use GODEBUG=gctrace=1,scheddetail=1 and observe stack growth events.
  3. Re-run under runtime.ReadMemStats snapshots, recording StackInuse and StackSys.
  4. Now write the same function with a goroutine-per-call style and observe how stack churn changes.

3.4 Idiomatic & golangci-lint Drill

  • gocritic: deepEqualByteSlice, prealloc. The latter flags ranged loops appending to a slice that could be make'd with capacity-relevant to allocator pressure but not stack-specific.

3.5 Production Hardening Slice

  • Add runtime/debug.SetMaxStack(64 * 1024 * 1024) (64 MiB) in your service binaries. Default 1 GiB is rarely what you want; bounding stack per-goroutine catches runaway recursion early.

Week 4 - Escape Analysis and the Inliner

4.1 Conceptual Core

  • Escape analysis is the compiler pass that decides whether a variable can live on the stack (cheap, freed on return) or must live on the heap (allocated by mallocgc, GC-tracked). It is not a runtime decision; it is purely static.
  • The two questions the compiler asks for each &x / new(T) / make(...):
  • Does the address outlive the current function? (Returned, stored in a heap object, captured by a goroutine, captured by an interface escape.)
  • Is the size statically known and bounded? (Variable-length stack allocations are limited.)
  • If yes to escape: heap. If no: stack. The compiler dumps its reasoning under -gcflags=-m.

4.2 Mechanical Detail

  • Common escape triggers:
  • Returning &x for a local x.
  • Storing &x in a heap-allocated struct, slice, or map.
  • Capturing x by a goroutine closure (the goroutine outlives the frame).
  • Boxing a value into an interface{} of nontrivial size-the value is copied to the heap so the interface header can hold a pointer.
  • Calls to functions whose parameters are passed via interface{} (e.g., fmt.Printf("%d", x) boxes x).
  • Slices grown beyond the inlined-make size threshold.
  • The inliner: small functions are inlined. Inlining matters for escape analysis because escape decisions are made across inlined call sites-a function that "would escape if not inlined" may stay on the stack when inlined into its caller.
  • //go:noinline and //go:nosplit: directives to suppress inlining or stack-split checks. Reserved for runtime-internal code; rarely justified in application code.
  • Allocation profile: go test -bench=. -memprofile=mem.out, then go tool pprof -alloc_objects mem.out. The -alloc_objects view counts allocations (escapes); -inuse_space counts retained bytes. A short demonstration of reading the -m output follows this list.
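
A small demonstration of reading the compiler's reasoning; User is a hypothetical type, and the exact diagnostic wording varies between compiler versions:

    package escape

    // Inspect with: go build -gcflags=-m ./...
    // Expect a diagnostic along the lines of "moved to heap: u" for NewUser,
    // because the returned pointer outlives the frame.

    type User struct{ Name string } // hypothetical type for the demonstration

    func NewUser(name string) *User {
        u := User{Name: name}
        return &u
    }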

4.3 Lab-"Escape Forensics"

For each of the following snippets, predict whether the value escapes, then verify with -gcflags=-m:

  1. func A() *int { x := 7; return &x }
  2. func B() *int { x := 7; p := &x; return p }
  3. func C() { x := 7; go func() { fmt.Println(x) }() }
  4. func D() { x := bytes.Buffer{}; x.WriteString("hi"); fmt.Println(x.String()) }
  5. func E(s []int) int { return len(s) }, called as E(make([]int, 8)).
  6. func F() any { return 7 } (boxing an int into interface{}).
  7. A method call on an interface value vs the concrete type (covered in Week 7).

For each that escapes, propose a refactor that keeps it on the stack. Then write a Go benchmark (testing.B) and prove the win.
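
A hedged sketch of the before/after benchmark style this lab asks for; the package-level sinks defeat dead-code elimination, and exact allocs/op figures depend on inlining and Go version, so read the results as relative:

    package escape_test

    import "testing"

    // Package-level sinks keep the compiler from optimizing away the loop bodies.
    var (
        sinkAny any
        sinkInt int
    )

    func BenchmarkBoxedInt(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            sinkAny = i // boxing into any copies the int to the heap for most values
        }
    }

    func BenchmarkPlainInt(b *testing.B) {
        b.ReportAllocs()
        for i := 0; i < b.N; i++ {
            sinkInt = i // stays a plain machine word: no allocation
        }
    }

Run it with go test -bench=. -benchmem and compare the allocs/op columns before and after your refactor.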

4.4 Idiomatic & golangci-lint Drill

  • staticcheck SA6002 (sync.Pool accepting non-pointer types-silent allocation), gocritic: hugeParam, prealloc, makezero. Each maps to an allocation pathology.

4.5 Production Hardening Slice

  • Configure golangci-lint to fail on new escape-related issues introduced by a PR. Add a CI step that runs go test -bench=. -benchmem on critical packages and diffs allocations against a baseline (benchstat).

Month 1 Capstone Deliverable

A workspace runtime-foundations/ with three modules:

  1. schedule-forensics (week 2 lab)-produces a labeled trace.out and a markdown latency-distribution report.
  2. stack-growth (week 3 lab)-produces a graph of StackInuse over time.
  3. escape-clinic (week 4 lab)-six benchmarks with before/after allocation counts.

CI must run: gofmt -l, go vet, golangci-lint, go test -race, go test -bench=. -benchmem | tee bench.txt, benchstat baseline.txt bench.txt. The baseline is captured at week 4's end and tracked from then on.

Month 2-Memory, the Garbage Collector, and Interface Internals

Goal: by the end of week 8 you can (a) read a pprof heap profile and explain why an object is retained, (b) describe the tricolor mark-sweep algorithm including write barriers and the mark assist, (c) predict from a type signature whether method calls will allocate, and (d) tune GOGC and GOMEMLIMIT for a real workload.


Weeks

Week 5 - Memory Layout, Padding, Alignment

5.1 Conceptual Core

  • A Go value lives in exactly one of: a goroutine stack, the heap (managed by mallocgc), the data segment (mutable globals), or the read-only data segment (string literals, constants).
  • Struct field ordering is significant for memory footprint: Go does not reorder fields. The compiler inserts padding to satisfy alignment requirements. Misordering can double the size of a hot struct.
  • False sharing is the silent killer of concurrent Go: two unrelated atomic counters in the same cache line cause cores to evict each other's caches on every update. The fix is padding to 64 bytes (or 128 on Apple Silicon).

5.2 Mechanical Detail

  • unsafe.Sizeof, unsafe.Alignof, unsafe.Offsetof. Memorize the size of every primitive: bool 1, int8 1, int16 2, int32/float32/rune 4, int64/float64/int/uintptr 8 (on 64-bit), pointer 8, slice header 24 (ptr+len+cap), string header 16 (ptr+len), interface header 16 (itab/type+data), map pointer 8, channel pointer 8.
  • Field reordering for size: sort fields by alignment, descending. Tools: fieldalignment (a vet analyzer in golang.org/x/tools/go/analysis/passes/fieldalignment). Wire it into CI.
  • runtime/internal/sys.CacheLineSize is 64 on most platforms. Use [7]uint64 padding (or the helper in golang.org/x/sys/cpu / a custom CacheLinePad) to isolate hot atomics; a minimal padding sketch follows this list.
  • Slice internals: a slice is a 24-byte header {Data *T, Len int, Cap int}. s = append(s, x) may reallocate; the old backing array is GC'd if no other slice references it. The growth strategy is roughly 2× for small slices, tapering toward ~1.25× for large ones (the small/large threshold moved from 1024 to 256 elements in Go 1.18). Read runtime/slice.go::growslice.
  • Map internals: runtime/map.go. Hash-bucketed open addressing with overflow buckets. 8 entries per bucket. Iteration order is deliberately randomized. Maps are never safe for concurrent write+anything; use sync.Map (specialized read-mostly) or a sharded map for general concurrency.
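
A minimal sketch of the padding idea; 64 bytes matches most current x86-64 and arm64 parts, and golang.org/x/sys/cpu.CacheLinePad is a ready-made alternative to the hand-rolled array:

    package hotpath

    import "sync/atomic"

    // paddedCounters keeps two hot counters on separate cache lines so goroutines
    // incrementing a and b do not invalidate each other's line on every update.
    type paddedCounters struct {
        a atomic.Uint64
        _ [56]byte // 8-byte counter + 56 bytes of padding fills one 64-byte line
        b atomic.Uint64
    }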

5.3 Lab-"Layout Forensics"

  1. Define five "interestingly bad" structs (e.g., struct{ a bool; b int64; c bool; d int64; e bool }). Compute their unsafe.Sizeof by hand, then verify.
  2. Reorder for minimal padding. Re-measure. Document each delta.
  3. Build a benchmark with []Struct of 1M elements; compare allocation/scan time with the badly-padded vs the optimally-packed version. Use runtime.ReadMemStats to capture HeapAlloc and GC pause durations.
  4. Construct a false-sharing example: two atomic counters incremented by different goroutines, with and without CacheLinePad between them. Benchmark contention. Expect 5–20× difference.

5.4 Idiomatic & golangci-lint Drill

  • fieldalignment (vet analyzer), unconvert, gocritic: builtinShadowDecl. Wire fieldalignment into CI as a hard fail.

5.5 Production Hardening Slice

  • Add runtime.ReadMemStats instrumentation to your service template. Export HeapAlloc, HeapInuse, StackInuse, NumGC, PauseTotalNs as Prometheus metrics (or expvar). This becomes the Month 5 observability baseline.

Week 6 - The Garbage Collector

6.1 Conceptual Core

  • Go's GC is a concurrent, tricolor, mark-sweep, non-generational, non-compacting collector. Each adjective is doing work:
  • Concurrent: marks happen while the application runs ("mutator").
  • Tricolor: every object is white (unreached), gray (reached but children unscanned), or black (reached and scanned). The invariant: a black object never points to a white object.
  • Mark-sweep: phase 1 marks reachable; phase 2 reclaims unmarked.
  • Non-generational: no separate young/old heap. (The Pacer compensates.)
  • Non-compacting: objects do not move. This is what allows direct pointer interior addressing and unsafe.Pointer to remain valid.
  • Why these choices: Go optimizes for predictable low-pause behavior at the cost of throughput. A compacting collector would have lower steady-state heap, but compaction stops the world.

6.2 Mechanical Detail

  • Phases (runtime/mgc.go):
  • Sweep termination (STW, microseconds): finish previous cycle's sweep.
  • Mark setup (STW, microseconds): enable write barrier, scan stacks (briefly STW each goroutine).
  • Concurrent mark: workers and mutator-assist mark the heap. Write barrier intercepts pointer writes.
  • Mark termination (STW, ~100 µs to ms on huge heaps): finalize.
  • Concurrent sweep: lazily reclaim white objects as the next allocation requests space.
  • Write barrier: since Go 1.8 a hybrid barrier (Dijkstra-style insertion plus Yuasa-style deletion) records pointer writes during mark so the mutator cannot "hide" a white object behind a black one. Implemented as a runtime call inserted by the compiler around pointer stores. This is why pointer-heavy code is GC-expensive: every pointer write during mark costs a barrier.
  • Mark assist: when a goroutine allocates, it is forced to do proportional GC work. This couples allocation rate to GC progress and is the mechanism that prevents heap blowup.
  • The Pacer: targets next_gc = live_after_last_gc * (1 + GOGC/100). Default GOGC=100: GC when heap doubles. Tunable via GOGC=off, GOGC=50 (more frequent, lower memory), GOGC=200 (less frequent, higher memory).
  • GOMEMLIMIT (since Go 1.19): a soft total-memory ceiling. The GC adjusts pacing to stay under the limit even if GOGC would not have triggered. Use it as your primary memory control in containers; leave GOGC at default.
  • Stack scanning: each goroutine's stack is rooted in marking. Goroutines are paused (briefly) for stack scan; this is part of the STW-mark-setup phase but is per-goroutine and parallelized.

6.3 Lab-"GC Forensics"

  1. Write a service that allocates 100 MB/s of short-lived objects. Run with GODEBUG=gctrace=1. Read each GC line and identify: total heap, live heap, pause time, pacer target.
  2. Set GOMEMLIMIT=512MiB and GOGC=off. Re-run; observe how the GC is now driven entirely by the memory ceiling.
  3. Set GOGC=50 (no GOMEMLIMIT). Re-run; observe more frequent, smaller GCs.
  4. Capture a go tool pprof -alloc_objects profile. Identify the top five allocation sites. Refactor at least two using sync.Pool or pre-allocated buffers. Re-benchmark.
  5. Capture a go tool trace and locate the GC mark phases visually.

6.4 Idiomatic & golangci-lint Drill

  • staticcheck SA6002 (sync.Pool with non-pointer types), prealloc, gocritic: rangeValCopy (large struct copies in range loops).

6.5 Production Hardening Slice

  • In your service template, set GOMEMLIMIT from a MEMORY_LIMIT environment variable computed at container start (debug.SetMemoryLimit(int64(0.9 * cgroup_memory_limit))). This is the single most impactful production tuning knob; a startup sketch follows this list.
  • Export GC metrics: go_gc_duration_seconds (histogram), go_memstats_*. Use prometheus/client_golang's collectors.NewGoCollector(collectors.WithGoCollections(collectors.GoRuntimeMemStatsCollection | collectors.GoRuntimeMetricsCollection)) for the modern collector.
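
A minimal sketch of the startup wiring described above, assuming MEMORY_LIMIT carries the cgroup limit in bytes; the variable name and format are whatever your orchestrator actually provides:

    package main

    import (
        "os"
        "runtime/debug"
        "strconv"
    )

    // configureMemoryLimit sets GOMEMLIMIT from the MEMORY_LIMIT env var (bytes).
    func configureMemoryLimit() {
        limit, err := strconv.ParseInt(os.Getenv("MEMORY_LIMIT"), 10, 64)
        if err != nil || limit <= 0 {
            return // no limit provided: fall back to GOGC-driven pacing
        }
        // Leave ~10% headroom for non-Go memory (cgo allocations, mmaps, the binary).
        debug.SetMemoryLimit(int64(float64(limit) * 0.9))
    }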

Week 7 - Interface Values, itabs, and Dispatch Cost

7.1 Conceptual Core

  • A Go interface value is a two-word header:
  • For non-empty interfaces (io.Reader, error, etc.): (itab, data).
  • For empty interfaces (any/interface{}): (type, data).
  • itab = interface table = the dynamic dispatch vtable for one (interface, concrete type) pair. Holds the type pointer, the interface type pointer, a hash, and an array of function pointers (the methods).
  • data is a pointer to the concrete value. Very old Go versions could store a word-sized value directly in the data word as an optimization; since Go 1.4 the data word is always a pointer, so be careful with old assumptions.

7.2 Mechanical Detail

  • Read src/runtime/iface.go. Key functions: getitab, convT2I, assertI2I, assertE2I. The getitab cache is keyed by (interface type, concrete type)-the first call may allocate the itab; subsequent calls hit the cache.
  • Cost of an interface call:
  • Load itab from interface header (cache hit).
  • Load function pointer from itab.fun[N].
  • Indirect call. Plus: the call cannot be inlined. So an interface call is 1 indirect call + lost inlining opportunities. On hot paths, this matters.
  • Boxing cost: assigning a non-pointer concrete value to any allocates if the value is larger than a word. var x any = 42 may allocate (depends on Go version's int boxing optimization); var x any = SomeBigStruct{} definitely allocates.
  • Type assertions:
  • v.(T) panics on failure; allocates an itab if T is an interface.
  • v, ok := v.(T) is the same but no panic.
  • switch v := v.(type) is the same machinery, optimized for multiple cases.
  • Type switches are faster than chains of type assertions because the compiler may emit a hashed dispatch table.
  • Generics vs interfaces: generics (since 1.18) compile to a single generic body parameterized by the GC shape ("GCShape stenciling"), with a per-shape dictionary. Generics are not specialized like Rust monomorphization-there is still indirection for method calls on type parameters. The performance vs interface tradeoff is subtle and workload-dependent. Read cmd/compile/internal/types2/ and Russ Cox's GCShape blog post.

7.3 Lab-"Interface Bench"

  1. Build a tight loop calling a method via three paths: concrete type, interface, generic type parameter. Benchmark with -benchmem.
  2. Inspect the disassembly with go tool objdump -s 'main\.benchInterface'. Identify the indirect call.
  3. Refactor a real-world pattern (a Logger interface used 10× in a hot path) into a concrete type or a type-parameterized version. Measure the win or non-win.
  4. Build a worst-case allocation example: passing a stack int into fmt.Println(...). Show with -gcflags=-m that the int escapes (boxing into any). Replace with fmt.Println(strconv.Itoa(x)) and re-measure.

7.4 Idiomatic & golangci-lint Drill

  • gocritic: typeAssertChain, gosimple S1034 (omit comma-ok in type-switch). Re-read the Go FAQ on "Why no implicit type conversions?"-the answer informs API design.

7.5 Production Hardening Slice

  • Add a benchmark to CI that asserts 0 allocs/op on critical paths (e.g., the request-handling hot path of your service template). Use testing.B.ReportAllocs() and a script that diffs allocs/op against a committed baseline. Any PR that introduces an allocation on a 0-alloc path fails CI.
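
One way to express that guard is testing.AllocsPerRun; handleRequest and preparedInput are hypothetical stand-ins for your hot path and its fixture, and since the race detector and coverage instrumentation can add allocations, run this check in a plain test job:

    func TestHotPathZeroAllocs(t *testing.T) {
        allocs := testing.AllocsPerRun(1000, func() {
            handleRequest(preparedInput) // hypothetical hot-path call and fixture
        })
        if allocs != 0 {
            t.Fatalf("hot path allocates: got %v allocs/op, want 0", allocs)
        }
    }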

Week 8 - Allocation Profiling, sync.Pool, GC Tuning

8.1 Conceptual Core

  • The cheapest allocation is the one you do not make. The second cheapest is the one you reuse.
  • sync.Pool is a per-P cache of reusable objects. Items can be reclaimed by the GC at any time (typically at the start of each GC cycle), so it is a cache, not a resource pool. Use it for short-lived, frequently-allocated objects (bytes.Buffer, []byte scratch space, parser nodes).
  • The two production-grade memory tuning knobs: GOGC (heap growth ratio) and GOMEMLIMIT (absolute ceiling). For containerized services, pin GOMEMLIMIT to ~90% of cgroup memory; leave GOGC default unless profiles say otherwise.

8.2 Mechanical Detail

  • sync.Pool mechanics (src/sync/pool.go):
  • Get() returns from the local-P cache, falls back to a victim cache (objects from the previous GC), falls back to New().
  • Put() stores into the local-P cache.
  • At GC, the local cache is moved to victim, victim is freed.
  • Therefore: do not assume a Pool.Get returns recently Put data. Always reset state on Get.
  • Common sync.Pool mistake: putting non-pointer values. The pool stores interface{}, so a non-pointer goes through boxing-net allocation. Always store pointers.
  • bytes.Buffer reuse pattern:
    var bufPool = sync.Pool{New: func() any { return new(bytes.Buffer) }}
    buf := bufPool.Get().(*bytes.Buffer)
    buf.Reset()
    defer bufPool.Put(buf)
    
  • Allocation profile interpretation: pprof -alloc_objects (count) tells you "where churn happens"; -alloc_space (bytes) tells you "where pressure happens"; -inuse_space tells you "what is currently retained." Use all three.
  • runtime/metrics (since 1.16): the modern API for runtime metrics. Replaces ad-hoc MemStats reads. Returns histograms for /gc/pauses:seconds, /sched/latencies:seconds, etc.
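
A minimal sketch of reading one histogram through runtime/metrics, using the /gc/pauses:seconds metric name documented by the package:

    package obs

    import "runtime/metrics"

    // gcPauseHistogram reads the GC pause distribution, returning nil if the
    // metric is not supported by the running Go version.
    func gcPauseHistogram() *metrics.Float64Histogram {
        samples := []metrics.Sample{{Name: "/gc/pauses:seconds"}}
        metrics.Read(samples)
        if samples[0].Value.Kind() != metrics.KindFloat64Histogram {
            return nil
        }
        return samples[0].Value.Float64Histogram()
    }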

8.3 Lab-"Pool the Hot Path"

  1. Take the JSON-handling hot path of any service. Run pprof -alloc_objects under load. Identify the top three allocation sites.
  2. Introduce a sync.Pool for the most appropriate one (typically bytes.Buffer or a decoder).
  3. Re-benchmark. The win should be visible in allocs/op and in p99 latency under load.
  4. Now intentionally misuse: Pool.Put without resetting state. Detect the bug under -race or via a deliberately-inserted assertion.

8.4 Idiomatic & golangci-lint Drill

  • staticcheck SA6002, gocritic: appendAssign, prealloc. Re-read Dave Cheney's "High Performance Go Workshop" notes (a classic standing reference).

8.5 Production Hardening Slice

  • Add a /debug/pprof HTTP endpoint behind an auth-or-build-tag gate (do not expose it on the public listener). Document the on-call runbook for capturing CPU/heap profiles from a misbehaving production process.
  • Add runtime/metrics-based exporters for GC pause histograms and scheduler latencies. These are the signals an SRE wants when a Go service misbehaves.

Month 2 Capstone Deliverable

A memory-and-gc/ workspace:

  1. layout-forensics (week 5)-with fieldalignment enforced in CI.
  2. gc-forensics (week 6)-with annotated gctrace=1 logs and a tuning playbook.
  3. iface-bench (week 7)-concrete vs interface vs generic, three-way benchmark.
  4. pool-the-hot-path (week 8)-before/after profile diff, baseline benchmark in CI.

Workspace-level CI must add: fieldalignment analyzer, 0-alloc regression guard on critical benchmarks, pprof artifacts captured on demand from a make profile target.

Month 3-Concurrency Mastery: Channels, Atomics, Context, Patterns

Goal: by the end of week 12 you can (a) implement a correct lock-free single-producer single-consumer ring buffer using sync/atomic, (b) read the channel send path in runtime/chan.go and explain chansend1 line-by-line, (c) detect goroutine leaks before they reach production, and (d) design a worker-pool that survives backpressure, partial failures, and graceful shutdown.


Weeks

Week 9 - Channels, Deeply

9.1 Conceptual Core

  • A channel is a typed, bounded (or unbounded), thread-safe queue with select integration. Internally it is a struct (hchan) protected by a mutex, with two FIFO wait lists for blocked senders and receivers.
  • The CSP slogan ("share memory by communicating") is partly aspirational. In practice, large Go systems use channels for ownership transfer and signaling, and use mutexes/atomics for shared state. Both are idiomatic-picking the wrong one for a given problem is the bug.
  • Send/receive semantics:
  • Buffered channel with space → non-blocking send.
  • Buffered channel full / unbuffered → block until a receiver is ready (or vice versa).
  • Closed channel → send panics; receive returns zero value with ok=false.
  • nil channel → send and receive block forever. Useful in select to disable a case.
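
A minimal sketch of the nil-channel trick: a merge loop disables a case by setting its channel variable to nil once that source is drained:

    // merge forwards values from a and b to out until both are closed.
    func merge(a, b <-chan int, out chan<- int) {
        defer close(out)
        for a != nil || b != nil {
            select {
            case v, ok := <-a:
                if !ok {
                    a = nil // a is closed: disable this case from now on
                    continue
                }
                out <- v
            case v, ok := <-b:
                if !ok {
                    b = nil // b is closed: disable this case from now on
                    continue
                }
                out <- v
            }
        }
    }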

9.2 Mechanical Detail

Read src/runtime/chan.go. Particularly:

  • hchan struct: qcount, dataqsiz, buf, elemsize, closed, sendx, recvx, recvq, sendq, lock.
  • chansend: lock, then either copy to buffer / hand off to a waiting receiver / park the sender.
  • chanrecv: symmetric.
  • closechan: marks closed, wakes all waiters.
  • The hand-off optimization: if a sender finds a parked receiver, it copies directly into the receiver's stack and parks no goroutine. This is what makes unbuffered channels efficient.
  • Select (runtime/select.go): randomized-fair selection across ready cases. The selectgo function is among the most subtle in the runtime; read it slowly. Note: select with a default is a non-blocking try.
  • Closing discipline: close from the sender side, never from a receiver. Use sync.Once if multiple goroutines might close. The standard idiom for graceful shutdown is a separate done channel (or a context.Context), not closing the data channel.

9.3 Lab-"Channel Internals"

  1. Write a benchmark comparing: unbuffered chan, buffered chan(1), buffered chan(1024), sync.Mutex + slice queue, and a sync/atomic-only SPSC ring buffer. Use 1 producer, 1 consumer, 10M messages.
  2. Plot the throughput. The atomic SPSC should be 5–10× the channel; the mutex queue may beat the buffered channel for small messages.
  3. Reproduce a nil-channel select pattern: a goroutine that toggles between two upstream channels by setting one to nil to disable a case.
  4. Write an "unbounded channel" using a goroutine that bridges an in-channel to an out-channel via an internal slice buffer. Discuss why this exists and why it is dangerous (memory growth on slow consumer).

9.4 Idiomatic & golangci-lint Drill

  • staticcheck SA1015 (time.Tick leak), staticcheck SA1030 (time.After in select-loops leaks), gocritic: emptyDecl, revive: empty-block. The first two are classic concurrency leaks.

9.5 Production Hardening Slice

  • Add goleak.VerifyTestMain(m) (Uber's go.uber.org/goleak) to the test entry point of every package that uses goroutines. CI will now fail any test that leaves a goroutine running.

Week 10 - sync Primitives and sync/atomic

10.1 Conceptual Core

  • sync.Mutex: a fast, fair-but-not-strict mutex with a starvation mode (since 1.9) that switches to FIFO if a goroutine has waited >1ms. Read src/sync/mutex.go.
  • sync.RWMutex: reader-writer lock. Writer-preferring. The read path is fast under low contention, but cache-line bounces under heavy reading; consider sharding before reaching for RWMutex.
  • sync.Once: exactly-once initialization with a memory-barrier guarantee.
  • sync.WaitGroup: not a barrier; a counter with wait-on-zero. Misuse #1: wg.Add(1) inside the goroutine instead of before launching it (race with wg.Wait). Misuse #2: reusing across goroutine generations without resetting.
  • sync.Cond: Mesa-style condition variable. Almost always the wrong tool-channels or chan struct{} + atomic patterns are clearer.
  • sync.Map: optimized for the case where keys are written once and read many times across goroutines. Worse than map + RWMutex for read-modify-write patterns.
  • sync/atomic: low-level atomic operations. Modern API (since 1.19): atomic.Int64, atomic.Pointer[T], atomic.Bool, atomic.Value. Prefer the typed values over the legacy free functions-the typed API prevents most misuse.

10.2 Mechanical Detail

  • The Go memory model: read go.dev/ref/mem once carefully. Key facts:
  • There is happens-before, defined per-channel-op, per-mutex-op, per-atomic-op.
  • Since the Go 1.19 revision of the memory model, sync/atomic operations behave as sequentially consistent (C++ seq_cst) atomics: there is a single total order over all atomic operations, not merely per-location ordering.
  • Reads and writes of word-sized values that are not synchronized are races and have undefined behavior. The race detector is the source of truth.
  • sync.Mutex source walk-through:
  • State is a 32-bit word: locked bit, woken bit, starvation bit, waiter count.
  • Fast path: CAS the locked bit. ~1 ns uncontended.
  • Slow path: spin briefly, then park. Wake order biased toward the most recent waiter except in starvation mode.
  • Atomic patterns:
  • Counter: atomic.Int64.Add(1). Use for stats; do not assume monotonicity across atomic types.
  • Read-only snapshot publish: atomic.Pointer[T].Store(newPtr) paired with Load(). The classic copy-on-write (sketched after this list).
  • CAS loop for lock-free updates: for { old := p.Load(); newV := f(old); if p.CompareAndSwap(old, newV) { break } }. Every CAS retry is wasted work; bound the loop or back off.
  • Memory ordering in Go: sync/atomic operations are sequentially consistent, guaranteed by the memory model since Go 1.19 rather than being an implementation accident. Do not rely on weaker orderings; the spec does not give you the knobs C++ does.
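
A minimal sketch of the copy-on-write snapshot pattern; Config is a hypothetical type, and the pointer must be seeded with an initial value before readers run:

    package config

    import "sync/atomic"

    type Config struct{ MaxConns int } // hypothetical config shape

    // current must be seeded at startup, e.g. current.Store(&Config{MaxConns: 64}).
    var current atomic.Pointer[Config]

    // Snapshot is wait-free; readers always observe a fully built Config.
    func Snapshot() *Config { return current.Load() }

    // Update copies the published value, mutates the copy, and publishes it via CAS.
    func Update(mutate func(*Config)) {
        for {
            old := current.Load()
            next := *old // copy-on-write: never mutate the value readers may hold
            mutate(&next)
            if current.CompareAndSwap(old, &next) {
                return
            }
            // Another writer won the race: retry against the fresh snapshot.
        }
    }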

10.3 Lab-"Lock-Free SPSC Ring"

Build a single-producer, single-consumer ring buffer using only atomic.Uint64 indices. Pad the indices to separate cache lines. Validate with go test -race -count=1000 running 1 producer and 1 consumer. Benchmark against chan T and against a sync.Mutex-protected slice. Document the cache-line padding's effect with a withoutPad variant-expect a 3–10× difference on modern x86.

10.4 Idiomatic & golangci-lint Drill

  • govet: copylocks (mutexes must not be copied), staticcheck SA2000 (WaitGroup.Add after Wait), gocritic: deferUnlambda.

10.5 Production Hardening Slice

  • Run every test with -race in CI. Make this non-negotiable.
  • Add a CI step that runs critical concurrency tests under -race -count=100 to catch low-probability races. Budget the CI time accordingly.

Week 11 - context.Context, Cancellation, errgroup, singleflight

11.1 Conceptual Core

  • context.Context is the cancellation propagation primitive in Go. Every blocking operation that crosses an API boundary should accept a context.Context as its first parameter.
  • A context carries:
  • Deadline (or no deadline).
  • Cancellation channel (<-ctx.Done()) and reason (ctx.Err()).
  • Request-scoped values (ctx.Value(key))-sparingly.
  • Contexts are immutable trees: each derivation (WithCancel, WithTimeout, WithValue) produces a child. Cancelling a parent cancels all descendants.

11.2 Mechanical Detail

  • The cancellation rules:
  • Pass context.Context as the first parameter, named ctx.
  • Do not store context in struct fields except for short-lived adapters. (One narrow exception: long-running services that derive an internal context once from context.Background().)
  • Always call the cancel function returned by WithCancel/WithTimeout/WithDeadline, even on the success path. Otherwise the context's resources (timer, goroutine in propagateCancel) leak.
  • Do not use context.Value for required parameters. It is a request-scoped sidecar, not a function-call mechanism. Type-safe alternatives (function arguments, struct fields) are always better.
  • errgroup (golang.org/x/sync/errgroup): spawn N goroutines, propagate the first error, cancel siblings, wait for all. The standard pattern for parallel sub-tasks (a minimal sketch follows this list). Read the source-it is ~120 lines.
  • singleflight (golang.org/x/sync/singleflight): deduplicate concurrent identical requests. The classic cache-stampede mitigator. Use for expensive lookups (DB, RPC) where a thundering herd is plausible.
  • context.AfterFunc (since Go 1.21): register a callback to fire when a context is cancelled. Replaces the boilerplate of go func() { <-ctx.Done(); cleanup() }().
  • context.Cause (since Go 1.20): retrieve the cancellation reason, including custom errors via WithCancelCause.
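
A minimal sketch of the errgroup fan-out described above; fetchOne is a hypothetical per-item call, and the first error cancels the derived ctx for its siblings:

    package fanout

    import (
        "context"

        "golang.org/x/sync/errgroup"
    )

    // fetchAll runs fetchOne for every URL with bounded parallelism; the first
    // error cancels the derived ctx, which the remaining calls should observe.
    func fetchAll(ctx context.Context, urls []string, fetchOne func(context.Context, string) (string, error)) ([]string, error) {
        g, ctx := errgroup.WithContext(ctx)
        g.SetLimit(8) // mirror the lab's N=8 workers
        results := make([]string, len(urls))
        for i, u := range urls { // per-iteration variables: Go 1.22+ loop semantics
            g.Go(func() error {
                body, err := fetchOne(ctx, u)
                if err != nil {
                    return err
                }
                results[i] = body
                return nil
            })
        }
        if err := g.Wait(); err != nil {
            return nil, err
        }
        return results, nil
    }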

11.3 Lab-"Context Discipline"

  1. Take a small HTTP service. Audit every blocking operation (DB query, downstream RPC, Redis call). Each should accept and propagate ctx. Fail any goroutine that captures a request ctx and outlives the request.
  2. Implement a parallel fan-out using errgroup with N=8 workers, all cancellable on first error.
  3. Implement a cache stampede test: 1000 concurrent requests for the same uncached key. Without singleflight, observe N upstream calls. With singleflight, observe 1.
  4. Demonstrate context.AfterFunc cleanup: register a release-resource callback on cancellation; verify it fires under both timeout and explicit cancel.

11.4 Idiomatic & golangci-lint Drill

  • contextcheck (verifies context propagation), noctx (flags HTTP requests sent without a context), staticcheck SA1029 (context.WithValue with built-in key type-collision hazard).

11.5 Production Hardening Slice

  • Wire context deadlines to your gRPC server's per-RPC timeouts. The pattern: take the incoming RPC deadline, optionally tighten it for downstream calls, and propagate. Document the deadline-budget calculation in your service's RUNBOOK.md.

Week 12 - Worker Pools, Leak Detection, Deadlock Prevention

12.1 Conceptual Core

  • Worker pool is the canonical "bounded concurrency" pattern: N worker goroutines consuming from a shared task channel. Bounds CPU, memory, and downstream RPC concurrency simultaneously.
  • Goroutine leaks are Go's silent OOM. Most common shapes:
  • Goroutine blocked on a channel that is never closed and never sent to.
  • Goroutine blocked on <-ctx.Done() of a context that nobody cancels.
  • Goroutine holding a reference (closure capture) to a request object that is now done.
  • time.After in a select loop (allocates a timer per iteration; the timer leaks until expiry).
  • Deadlocks in Go are detected only by the runtime's "all goroutines asleep" check, which fires only when every goroutine is blocked. Most production deadlocks are partial: a subsystem deadlocks while the rest of the program runs. The race detector does not catch these.

12.2 Mechanical Detail

  • The canonical worker pool:
    // Result pairs a task's output with its error.
    type Result[R any] struct {
        Value R
        Err   error
    }

    func RunPool[T, R any](ctx context.Context, n int, in <-chan T, fn func(context.Context, T) (R, error)) <-chan Result[R] {
        out := make(chan Result[R])
        var wg sync.WaitGroup
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done()
                for {
                    select {
                    case <-ctx.Done():
                        return
                    case task, ok := <-in:
                        if !ok { return }
                        r, err := fn(ctx, task)
                        select {
                        case out <- Result[R]{r, err}:
                        case <-ctx.Done():
                            return
                        }
                    }
                }
            }()
        }
        go func() { wg.Wait(); close(out) }()
        return out
    }
    
    Every line above is load-bearing: the double-select on input and output, the wg.Done in defer, the closer goroutine after wg.Wait.
  • Leak detection tooling:
  • goleak for tests.
  • pprof goroutine for production: curl /debug/pprof/goroutine?debug=2 dumps every goroutine's stack. Read it.
  • runtime.NumGoroutine() exported as a metric. A monotonically growing count is the leak signal.
  • Deadlock detection:
  • go-deadlock (sasha-s/go-deadlock) wraps sync.Mutex with timing-based deadlock detection in dev builds.
  • For partial deadlocks: instrumentation on the lock acquisition path (lock contention metrics from runtime/metrics).
  • Backpressure: when the worker pool is saturated, what should the caller see? Three strategies: block (default), drop (with metric), reject (return error). The choice is application-dependent; document it.

12.3 Lab-"Worker Pool Survival Test"

Build a worker pool that handles:

  1. Backpressure-bounded input channel, drop-with-metric on overflow.
  2. Graceful shutdown-on ctx.Done(), drain in-flight tasks within a deadline, then abandon the rest.
  3. Per-task timeouts-WithTimeout(ctx, 100ms) per task.
  4. Panic isolation-a panic in one task does not kill the worker; recover and report.
  5. Leak-clean-goleak passes after cancel(); pool.Wait().

Stress-test with 1M tasks across 1000 workers under -race.

12.4 Idiomatic & golangci-lint Drill

  • bodyclose (HTTP responses leaked), rowserrcheck (sql.Rows.Err unchecked), sqlclosecheck. All three are leak-class lints; enable them and fail CI on their findings rather than downgrading them to warnings.

12.5 Production Hardening Slice

  • Add a /debug/pprof/goroutine periodic snapshot job to your service template: every 5 minutes, capture the goroutine count and the top-N stacks. Surface as a Prometheus gauge with stack-hash labels (low cardinality). On a leak, you will see which stack is growing without paging anyone.

Month 3 Capstone Deliverable

A concurrency-lab/ workspace:

  1. chan-bench (week 9)-channel vs mutex vs atomic ring, with a markdown writeup.
  2. spsc-ring (week 10)-atomic-only, race-clean, with cache-pad ablation.
  3. context-discipline (week 11)-a refactored HTTP service plus a singleflight cache demo.
  4. survival-pool (week 12)-the worker pool that survives the five failure modes.

CI gate additions: -race on every test, -race -count=100 on critical packages, a goleak baseline, and a 0-alloc regression guard on the SPSC ring's hot path. Open one upstream PR-even a doc fix to errgroup or singleflight-by month end.

Month 4-Reflection, Code Generation, Plugins

Goal: by the end of week 16 you can (a) write a reflect-based serializer that allocates exactly once per top-level call, (b) implement a go generate directive that walks an ast.Package and emits idiomatic Go, (c) ship a custom golangci-lint-compatible analyzer using go/analysis, and (d) build a hot-loadable plugin system using HashiCorp's go-plugin.


Weeks

Week 13 - Reflection: reflect, Performance, and Discipline

13.1 Conceptual Core

  • The reflect package exposes the runtime type system: every Go value has a reflect.Type (the descriptor of its dynamic type) and a reflect.Value (a wrapper holding the value plus its type). Together they let you inspect and manipulate values whose concrete type is known only at runtime.
  • The two reflection use cases that matter:
  • Generic serialization / deserialization (encoding/json, encoding/gob, gorm, sqlx)-when the input is any.
  • Schema-driven adapters-config loaders, ORM tag parsers, validators.
  • Reflection is slow. Roughly 5–50× the cost of direct field access. The standard libraries that use it (encoding/json) compensate by caching reflect.Type lookups and method tables per type.

13.2 Mechanical Detail

  • reflect.Type is comparable (==) by identity-two reflect.Type values are equal iff they describe the same Go type. This makes map[reflect.Type]Cache a load-bearing pattern.
  • reflect.Value.Kind() returns the underlying kind (Struct, Ptr, Slice, etc.). reflect.Value.Type() returns the named type. The two differ for named types: type MyInt int has Kind Int, Type MyInt.
  • Field iteration: t.NumField(), t.Field(i) returns a StructField with Name, Type, Tag, Index, Anonymous, PkgPath. Tag.Get("json") is the canonical tag-parsing path (a short field-walk sketch follows this list).
  • Method invocation: v.Method(i).Call([]reflect.Value{...}). Allocates the slice and the result.
  • unsafe.Pointer shortcut: for performance-critical reflection, take the field address via unsafe.Pointer(v.Field(i).UnsafeAddr()) and read it as the typed value. This is what mapstructure and high-performance JSON libraries do internally. Read the safety contract carefully-it's narrow.
  • Caching pattern:
    var typeInfoCache sync.Map // reflect.Type -> *typeInfo
    func infoFor(t reflect.Type) *typeInfo {
        if v, ok := typeInfoCache.Load(t); ok { return v.(*typeInfo) }
        info := buildInfo(t)            // expensive
        typeInfoCache.Store(t, info)
        return info
    }
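
A minimal sketch of the tag-driven field walk described in the bullets above; no caching here, so the validator lab should layer the sync.Map pattern from the previous snippet on top:

    package validate

    import "reflect"

    // validateTags returns the validate:"..." tag for each field of a struct or
    // pointer-to-struct; other kinds yield an empty map.
    func validateTags(v any) map[string]string {
        tags := map[string]string{}
        t := reflect.TypeOf(v)
        if t == nil {
            return tags
        }
        if t.Kind() == reflect.Pointer {
            t = t.Elem()
        }
        if t.Kind() != reflect.Struct {
            return tags
        }
        for i := 0; i < t.NumField(); i++ {
            f := t.Field(i)
            if tag, ok := f.Tag.Lookup("validate"); ok {
                tags[f.Name] = tag
            }
        }
        return tags
    }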
    

13.3 Lab-"A Reflective Validator"

Build a struct validator that processes validate:"..." tags:

  • Must support: required, min=N, max=N, email, regexp=<re>.
  • Must cache per-type field metadata (one reflect.Type walk per type ever).
  • Must produce structured errors (path, rule, value).
  • Must beat a naive non-cached implementation by 10× in benchmarks.

Compare against go-playground/validator for both ergonomics and performance.

13.4 Idiomatic & golangci-lint Drill

  • staticcheck SA1019 (deprecated reflect APIs), gocritic: hugeParam. The pattern of accepting any then immediately calling reflect.ValueOf is a smell-prefer typed APIs whenever possible.

13.5 Production Hardening Slice

  • Add a benchmark that captures the per-call allocation count for the validator's hot path. The hot path (validating a previously-seen type) must allocate ≤1 time. CI fails on regressions.

Week 14 - go/ast, go/parser, go/types: Static Analysis

14.1 Conceptual Core

  • The go/ast package represents Go source as a syntax tree. The go/parser package parses source files into ast.Files. The go/types package performs type checking and resolves identifiers to declarations.
  • The triad (ast + parser + types) is the foundation for every serious Go tool: gofmt, goimports, gopls, golangci-lint, staticcheck, mockgen, sqlc.
  • golang.org/x/tools/go/packages is the modern entry point for loading a Go program for analysis. It handles modules, build tags, and CGO transparently. Use this; do not call parser.ParseFile directly except for single-file tools.
  • golang.org/x/tools/go/analysis is the framework for writing analyzers-small, composable passes consumed by go vet, golangci-lint, and standalone drivers.

14.2 Mechanical Detail

  • Loading a package:
    cfg := &packages.Config{Mode: packages.NeedTypes | packages.NeedSyntax | packages.NeedTypesInfo}
    pkgs, _ := packages.Load(cfg, "./...")
    
    The Mode flags determine cost; load only what you need.
  • Walking AST:
    ast.Inspect(file, func(n ast.Node) bool {
        if call, ok := n.(*ast.CallExpr); ok { /* ... */ }
        return true
    })
    
  • Type information:
  • pkg.TypesInfo.Types[expr] → the type of an expression.
  • pkg.TypesInfo.Defs[ident] / Uses[ident] → the object an identifier defines or uses.
  • pkg.TypesInfo.ObjectOf(ident) → the resolved object (can be a *types.Var, *types.Func, *types.TypeName, etc.).
  • Writing an analyzer:
    var Analyzer = &analysis.Analyzer{
        Name: "noprintln",
        Doc:  "disallow fmt.Println in production code",
        Run:  func(pass *analysis.Pass) (any, error) { /* walk pass.Files */ },
    }
    
    Compile as a binary using unitchecker or load via the golangci-lint plugin system.
  • Common pitfalls: position information (token.Pos) is meaningless without the token.FileSet it was created from; always pass them together. Comment groups are a separate field on ast.File, not attached to AST nodes by default-ast.CommentMap bridges them.
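
A hedged sketch of a complete Run body for the skeleton above, flagging fmt.Println calls by walking pass.Files and resolving the callee through pass.TypesInfo (imports of go/ast, go/types, and golang.org/x/tools/go/analysis elided):

    func run(pass *analysis.Pass) (any, error) {
        for _, file := range pass.Files {
            ast.Inspect(file, func(n ast.Node) bool {
                call, ok := n.(*ast.CallExpr)
                if !ok {
                    return true
                }
                sel, ok := call.Fun.(*ast.SelectorExpr)
                if !ok {
                    return true
                }
                // Resolve the selector to its object and check it is fmt.Println.
                if fn, ok := pass.TypesInfo.ObjectOf(sel.Sel).(*types.Func); ok &&
                    fn.FullName() == "fmt.Println" {
                    pass.Reportf(call.Pos(), "fmt.Println is not allowed in production code")
                }
                return true
            })
        }
        return nil, nil
    }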

14.3 Lab-"Build a Custom Analyzer"

Write an analyzer that flags:
  1. context.Background() calls outside main and *_test.go files.
  2. time.After inside a select body (the classic timer-leak pattern).
  3. Goroutines launched with closures capturing a context.Context parameter named ctx of an enclosing HTTP handler (heuristic; document the false-positive risk).

Wire as a unitchecker binary. Run on a real codebase and triage findings. Document each false positive in ANALYZER_NOTES.md.

14.4 Idiomatic & golangci-lint Drill

  • Read staticcheck's source for two of its analyzers (e.g., SA1015 and SA4006). Internalize the analyzer-author idioms.

14.5 Production Hardening Slice

  • Publish your analyzer as a module. Add a golangci-lint custom plugin entry so it runs alongside the standard suite. CI now enforces your project's idioms automatically.

Week 15 - go generate and AST-Based Code Generation

15.1 Conceptual Core

  • go generate is a convention, not a feature. It scans source files for //go:generate <command> comments and runs them. The output is normal Go source, committed to the repo.
  • The pattern is preferred over reflection for performance-critical paths: generate exhaustive code at build time, with no reflect cost at runtime.
  • Canonical tools:
  • stringer-String() method for enum-like int types.
  • mockgen-interface mocks for testing.
  • sqlc-SQL → typed Go from query files.
  • ent-schema → typed Go ORM.
  • buf + protoc-gen-go-grpc-protobuf → Go.

15.2 Mechanical Detail

  • Writing a generator (template-based):
    //go:embed tmpl/api.tmpl
    var apiTmpl string
    
    type binding struct{ Name, Method, Path, Result string }
    
    func main() {
        cfg := &packages.Config{Mode: packages.NeedTypes | packages.NeedSyntax | ...}
        pkgs, _ := packages.Load(cfg, ".")
        bindings := extractBindings(pkgs[0]) // walks AST
        var buf bytes.Buffer
        template.Must(template.New("api").Parse(apiTmpl)).Execute(&buf, bindings)
        formatted, _ := format.Source(buf.Bytes()) // gofmt the output
        os.WriteFile("api_generated.go", formatted, 0644)
    }
    
  • format.Source-always run generated bytes through it. Ungofmt'd generated code is an immediate code-review smell.
  • Token-based building (when templates get unwieldy): go/ast + go/printer. Construct AST nodes programmatically; printer.Fprint(w, fset, node) writes them out. More verbose, more correct.
  • Generation hygiene:
  • Add // Code generated by foo. DO NOT EDIT. as the first line. gopls and reviewers honor this convention.
  • Commit the generated files. Do not run generation in CI by default; verify it is up-to-date via go generate ./... && git diff --exit-code.
  • Keep generators small and composable. A 5000-line generator is a sign you should be using a real schema language (protobuf, openapi).
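
A hedged illustration of these conventions: a consumer file carrying the directive and the header line the generator emits (the enumstringer command and file names are this example's assumptions):

    package color

    //go:generate go run ./cmd/enumstringer -type=Color
    type Color int

    const (
        Red Color = iota
        Green
        Blue
    )

    // First line of the emitted color_string_gen.go:
    //
    //     // Code generated by enumstringer. DO NOT EDIT.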

15.3 Lab-"Three Generators"

Build three small generators:
  1. Enum stringer-a from-scratch reimplementation of stringer for one annotation pattern.
  2. Mock generator-for one interface, generate a struct with method recorders and call assertions.
  3. JSON marshaler-generate a type-specific MarshalJSON that allocates zero maps. Compare allocations against encoding/json for the same type.

For each: go vet-clean output, gofmt-formatted, with a go generate directive in the consumer file.

15.4 Idiomatic & golangci-lint Drill

  • revive: file-header (require the DO NOT EDIT line on generated files), gocritic: dupArg. Configure golangci-lint to skip generated files for most lints (exclude-files or per-linter exclude-rules).

15.5 Production Hardening Slice

  • Add a CI step make generate && git diff --exit-code that fails when generated code is stale relative to its inputs. This catches the "I forgot to regenerate" PR antipattern.

Week 16 - Plugins: plugin, go-plugin, gRPC-Based Extensions

16.1 Conceptual Core

  • Go has two plugin stories:
  • plugin package (stdlib)-load .so files at runtime via dlopen. Linux/macOS only; brittle in practice (every dependency must match the host's exact build, including the Go version).
  • HashiCorp go-plugin-out-of-process plugins communicating via gRPC or net/rpc over a local pipe. Used by Terraform, Vault, Packer, Nomad. Robust, polyglot, version-tolerant.
  • For any production extensibility story today, use go-plugin (or its design pattern)-not the stdlib plugin package.

16.2 Mechanical Detail

  • plugin package mechanics:
  • plugin.Open("./plug.so") loads the shared object.
  • p.Lookup("Symbol") returns an interface{} that you type-assert.
  • Constraints: same Go version, same module versions of every shared dependency, same build flags. In practice, used only for narrow, controlled use cases.
  • go-plugin design:
  • Host process spawns plugin as a subprocess.
  • Plugin advertises a "magic cookie" to confirm both sides agree.
  • They negotiate a protocol version and one of (gRPC, net/rpc) as the transport.
  • The host calls the plugin's interface methods, which round-trip over the pipe.
  • On host shutdown, the plugin process is killed.
  • Versioning: declare a HandshakeConfig and one or more Plugin interfaces per protocol version. Drop old versions on major bumps.
  • Performance: per-call latency is microseconds (in-process) or tens of microseconds (cross-process). Not for hot paths; use for control-plane operations (provisioning, configuration, lifecycle).
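
For contrast, the entire stdlib plugin flow fits in a few lines. A hedged sketch of the host side (Storage and the NewStorage symbol are this lab's assumptions):

    // Host side: dlopen the shared object, look up a symbol, type-assert it.
    p, err := plugin.Open("./plug.so")
    if err != nil {
        log.Fatalf("open plugin: %v", err)
    }
    sym, err := p.Lookup("NewStorage") // exported var or func in the plugin's main package
    if err != nil {
        log.Fatalf("lookup: %v", err)
    }
    ctor, ok := sym.(func() Storage)
    if !ok {
        log.Fatalf("unexpected symbol type %T", sym)
    }
    store := ctor()
    _ = store

    // Plugin side (built with `go build -buildmode=plugin`):
    //     package main
    //     func NewStorage() Storage { return &memStorage{} }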

16.3 Lab-"A Pluggable Storage Backend"

Build a service whose storage backend is a plugin. The host defines an interface Storage { Get(key) (val, err); Put(key, val) error; Delete(key) error }. Ship two plugins: an in-memory backend, and a file-system backend. Both communicate via gRPC over go-plugin. Demonstrate hot-swap by killing one plugin process and starting the other.

16.4 Idiomatic & golangci-lint Drill

  • staticcheck SA1019 (deprecated net/rpc patterns), gocritic: ifElseChain. Plugin code paths are often where dependency-injection mistakes accumulate; review with discipline.

16.5 Production Hardening Slice

  • Add structured logging across the host/plugin boundary using slog with consistent attribute keys. Add a health-check method to every plugin interface; the host periodically probes it and ejects unhealthy plugins.

Month 4 Capstone Deliverable

A reflect-codegen-plugins/ workspace:
  1. validator-rs (week 13)-cached reflective validator with the 10× win.
  2. noctx-analyzer (week 14)-unitchecker binary, runs in CI.
  3. three-gens (week 15)-stringer + mock + JSON marshaler generators.
  4. pluggable-storage (week 16)-go-plugin host + two backends.

CI gates additions: custom analyzer in golangci-lint, generated-code freshness check, go-plugin integration test under -race. By end of month, open one PR upstream against golangci-lint (a small custom-analyzer doc fix is sufficient) or go-playground/validator (a benchmark, a doc, anything).

Month 5-Production-Grade Distributed Systems Engineering

Goal: by the end of week 20 you can (a) lay out a non-trivial Go service following hexagonal/DDD principles and justify each boundary, (b) instrument a service with slog, pprof, OpenTelemetry traces, and metrics, (c) implement a gRPC service with proper deadlines, retries, interceptors, and outlier ejection, and (d) build a five-surface test pyramid that will catch races and goroutine leaks before production.


Weeks

Week 17 - DDD in Go: Hexagonal Architecture, Bounded Contexts

17.1 Conceptual Core

  • Domain-Driven Design in Go starts with one observation: Go's package system is a bounded-context tool. A package can hide types, expose only the interfaces consumers need, and the import graph enforces direction.
  • The hexagonal pattern in Go:
  • Domain package: pure types, behaviors, ports (interfaces) for external dependencies. No imports from net/http, database/sql, etc. This is the dependency-direction rule.
  • Adapter packages: one per external system (postgres, kafka, http-client). Each implements the ports the domain defines.
  • Application package: use cases-methods that orchestrate domain operations across adapters.
  • Cmd package: composition root. Wires adapters into a runnable binary.
  • The three Go-specific hazards:
  • Anaemic domain: types are bags of fields with all logic in services. Push behavior into the type.
  • Receiver-method abuse: mutating methods on value receivers (compile-pass, semantic-fail). Pick T vs *T deliberately.
  • internal/ not used: Go's internal/ directory restricts imports to subtrees. Use it aggressively to enforce layering.

17.2 Mechanical Detail

  • Layout for a hexagonal Go service:
    service/
      cmd/
        api/main.go              # composition root
      internal/
        domain/                  # pure types + ports (interfaces)
        application/             # use cases
        adapter/
          postgres/              # impl PostgresUserRepo
          kafka/                 # impl EventBus
          http/                  # impl HTTP handlers
        platform/
          observability/         # slog, otel, prom wiring
      pkg/                       # exported (rare; most things are internal/)
    
    Note internal/: nothing outside service/... can import it. This is the architectural test.
  • Defining ports as interfaces:
    package domain
    
    type UserRepo interface {
        ByID(ctx context.Context, id UserID) (User, error)
        Save(ctx context.Context, u User) error
    }
    
    Define interfaces where they are consumed (in domain or application), not where they are implemented. This is "consumer-defined interfaces," the Go counterpart to dependency inversion.
  • Errors as domain values: declare var ErrUserNotFound = errors.New("user not found") in the domain package so callers can match it with errors.Is. Adapter packages translate sql.ErrNoRows to domain.ErrUserNotFound at the seam.
  • Avoiding leakage: never let a *sql.Tx or a *http.Request cross into domain. The compiler will not stop you; the architectural test will.
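
A hedged sketch of the seam translation using database/sql (PostgresUserRepo and the domain.User fields are this example's assumptions):

    // adapter/postgres: translate driver errors into domain errors at the seam.
    func (r *PostgresUserRepo) ByID(ctx context.Context, id domain.UserID) (domain.User, error) {
        row := r.db.QueryRowContext(ctx, `SELECT id, name FROM users WHERE id = $1`, id)
        var u domain.User
        if err := row.Scan(&u.ID, &u.Name); err != nil {
            if errors.Is(err, sql.ErrNoRows) {
                return domain.User{}, domain.ErrUserNotFound // the domain never sees sql.ErrNoRows
            }
            return domain.User{}, fmt.Errorf("querying user %s: %w", id, err)
        }
        return u, nil
    }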

17.3 Lab-"A Hexagonal URL Shortener"

Build a workspace implementing a URL shortener:
  • internal/domain - ShortURL aggregate, URLRepo and Hasher ports.
  • internal/application - Shorten and Resolve use cases.
  • internal/adapter/postgres - implements URLRepo against a real Postgres (use pgx, not database/sql).
  • internal/adapter/http - REST handlers using application.
  • internal/adapter/memory - in-memory URLRepo for tests.
  • cmd/api - wires everything.

The architectural test (a Go test) walks the import graph and fails if internal/domain imports any adapter package or stdlib networking package.
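
A hedged sketch of such a test using golang.org/x/tools/go/packages (the module path and the forbidden-prefix list are this example's assumptions):

    func TestDomainPurity(t *testing.T) {
        cfg := &packages.Config{Mode: packages.NeedName | packages.NeedImports}
        pkgs, err := packages.Load(cfg, "./internal/domain/...")
        if err != nil {
            t.Fatal(err)
        }
        forbidden := []string{"net/http", "database/sql", "example.com/service/internal/adapter"}
        for _, pkg := range pkgs {
            for imp := range pkg.Imports {
                for _, bad := range forbidden {
                    if strings.HasPrefix(imp, bad) {
                        t.Errorf("%s imports %s; domain must stay pure", pkg.PkgPath, imp)
                    }
                }
            }
        }
    }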

17.4 Idiomatic & golangci-lint Drill

  • depguard (forbids cross-layer imports), revive: empty-block, gocritic: dupCase. The depguard rules become the executable architecture documentation.

17.5 Production Hardening Slice

  • Add depguard rules forbidding internal/domain from importing net/http, database/sql, and any third-party adapter packages (depguard works at package granularity; ban calls like context.Background with forbidigo). CI fails on a violation. This is the architectural test in lint form.

Week 18 - Observability: slog, pprof, trace, OpenTelemetry

18.1 Conceptual Core

  • The "three pillars"-logs, metrics, traces-map cleanly to four Go tools:
  • Logs: log/slog (stdlib, since 1.21). Structured, context-aware.
  • Metrics: prometheus/client_golang plus `runtime/metrics - derived collectors.
  • Traces: go.opentelemetry.io/otel with an OTLP exporter.
  • Profiles: pprof (CPU, heap, allocs, block, mutex, goroutine).
  • Execution traces: `runtime/trace - the tool when none of the above tell you why a goroutine is slow.
  • Two cross-cutting principles:
  • Correlation: every log line, trace span, and metric label uses the same trace_id and request_id. This requires plumbing through context.Context.
  • Cardinality discipline: never put unbounded values (user IDs, URLs with query strings, request IDs) into metric labels.

18.2 Mechanical Detail-slog

  • slog.Default(), slog.New(handler), slog.With(attrs...). Handlers: JSONHandler, TextHandler, custom.
  • Context-aware logging: derive a logger per request, store in context (one of the few legitimate context.Value uses), retrieve at log sites:
    ctx = ContextWithLogger(ctx, slog.With("request_id", id))
    // ...
    Logger(ctx).Info("processed", "took_ms", elapsed.Milliseconds())
    
  • Sensitive-data redaction: implement a slog.LogValuer on types containing PII; the LogValue() method returns a redacted form. This pushes redaction into the type, not the call site.
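
A hedged sketch of the ContextWithLogger / Logger helpers assumed above, plus a LogValuer that pushes redaction into the type (the Email example is illustrative only):

    type loggerKey struct{}

    func ContextWithLogger(ctx context.Context, l *slog.Logger) context.Context {
        return context.WithValue(ctx, loggerKey{}, l)
    }

    func Logger(ctx context.Context) *slog.Logger {
        if l, ok := ctx.Value(loggerKey{}).(*slog.Logger); ok {
            return l
        }
        return slog.Default() // fall back rather than panic on unadorned contexts
    }

    // Redaction pushed into the type: the attribute always renders as "[redacted]".
    type Email string

    func (Email) LogValue() slog.Value { return slog.StringValue("[redacted]") }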

18.3 Mechanical Detail-pprof and trace

  • Endpoints: import _ "net/http/pprof" registers handlers on http.DefaultServeMux. Mount on a separate port and gate behind auth-never on the public listener.
  • CPU profile: go tool pprof http://host:6060/debug/pprof/profile?seconds=30. Top-heavy stacks, flame graphs (pprof -http=:0).
  • Heap profile: pprof http://host:6060/debug/pprof/heap. Default mode inuse_space shows live retention; -alloc_objects shows churn.
  • Block / mutex profiles: runtime.SetBlockProfileRate(1) and runtime.SetMutexProfileFraction(1). Off by default-turn on briefly when investigating.
  • Goroutine profile: ?debug=2 dumps full stacks. Read it line by line when chasing leaks.
  • Execution trace: runtime/trace.Start(w). Captures every G's lifecycle, GC events, syscall durations. Visualize with go tool trace. The most expensive but most informative tool.

18.4 Mechanical Detail-OpenTelemetry

  • SDK setup: otel.SetTracerProvider(...) + otel.SetTextMapPropagator(propagation.TraceContext{}). Use the OTLP gRPC exporter to a local collector.
  • Span creation: ctx, span := tracer.Start(ctx, "operation"); defer span.End(). Always defer span.End() immediately after creation.
  • Span attributes: low-cardinality. Never put PII in attributes.
  • Span events: structured logs attached to a span. Use sparingly; spans-as-logs leads to trace cardinality explosions.
  • Instrumentation libraries: otelhttp, otelgrpc, otelsql. Auto-propagate through standard transports.

18.5 Lab-"Wire the URL Shortener"

Take week 17's URL shortener and add:
  • slog JSON output with request-scoped logger via context.
  • /metrics Prometheus endpoint exposing request count, latency histogram, and Go runtime metrics.
  • OTLP traces exported to a local Jaeger via docker-compose.
  • /debug/pprof/* on a separate admin port, gated by IP allowlist.
  • A 30-second runtime/trace capture under load, committed as trace.out with a markdown analysis.

18.6 Idiomatic & golangci-lint Drill

  • forbidigo (forbid fmt.Println outside tests, log.Print* after slog adoption), loggercheck (uniform slog key conventions).

18.7 Production Hardening Slice

  • Add a redaction layer at the slog.Handler level: any attribute whose key matches an email|password|token regex is redacted. Unit-test the redaction. This is a compliance prerequisite in regulated environments.

Week 19 - gRPC: Streaming, Interceptors, Deadlines, Retries, Outlier Ejection

19.1 Conceptual Core

  • gRPC is HTTP/2 with a binary protocol (Protocol Buffers) and four call shapes: unary, server-streaming, client-streaming, bidirectional-streaming.
  • The Go implementation (google.golang.org/grpc) is the canonical one. Read its source: google.golang.org/grpc/server.go, clientconn.go, stream.go.
  • Production gRPC concerns:
  • Deadlines: every call must have one. Set on the client; propagate via context.
  • Retries: configured via service config, with backoff. Idempotent calls only.
  • Interceptors: cross-cutting middleware (logging, tracing, metrics, auth). Both unary and stream variants.
  • Health checking: grpc.health.v1.Health standard service.
  • Load balancing: client-side, via resolver + balancer plugins.
  • Connection management: connections are HTTP/2 multiplexed; default max concurrent streams is 100-tune up for high-fanout clients.

19.2 Mechanical Detail

  • Server setup:
    s := grpc.NewServer(
        grpc.ChainUnaryInterceptor(
            recoveryInterceptor(),
            loggingInterceptor(logger),
            otelgrpc.UnaryServerInterceptor(),
            authInterceptor(),
        ),
        grpc.KeepaliveParams(keepalive.ServerParameters{
            MaxConnectionIdle: 5 * time.Minute,
        }),
    )
    
  • Client setup:
    cc, err := grpc.NewClient("dns:///service.local:50051",
        grpc.WithTransportCredentials(creds),
        grpc.WithChainUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
        grpc.WithDefaultServiceConfig(`{
            "loadBalancingConfig": [{"round_robin":{}}],
            "methodConfig": [{
                "name": [{"service":"foo.Bar"}],
                "retryPolicy": {
                    "maxAttempts": 3,
                    "initialBackoff": "0.1s",
                    "maxBackoff": "1s",
                    "backoffMultiplier": 2,
                    "retryableStatusCodes": ["UNAVAILABLE"]
                },
                "timeout": "2s"
            }]
        }`),
    )
    
  • Streaming patterns: server-streaming for log tail; client-streaming for batch upload; bidi for chat-like interaction. Always pair stream lifecycle with context.Context so cancellation works.
  • Outlier ejection: at the client balancer level, eject endpoints with high error rates. The xds balancer supports it natively; for simpler setups, implement a Picker wrapper.
  • Backpressure on streams: HTTP/2 has flow control. The Go gRPC implementation respects it. If your server is slow to send, the client will block writes, and vice versa. Do not rely on unbounded internal buffers.
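
A hedged sketch of the recoveryInterceptor chained in the server setup above (status and codes come from google.golang.org/grpc/status and .../codes):

    func recoveryInterceptor() grpc.UnaryServerInterceptor {
        return func(ctx context.Context, req any, info *grpc.UnaryServerInfo,
            handler grpc.UnaryHandler) (resp any, err error) {
            defer func() {
                if r := recover(); r != nil {
                    // Convert the panic into a gRPC error instead of tearing down the server.
                    err = status.Errorf(codes.Internal, "panic in %s: %v", info.FullMethod, r)
                }
            }()
            return handler(ctx, req)
        }
    }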

19.3 Lab-"A Hardened gRPC Service"

Build a minimal Echo service with:
  • Unary + server-streaming + bidi methods.
  • Server interceptors for: panic recovery, request logging, OTel tracing, auth, rate limiting.
  • Client config with retries (UNAVAILABLE only), 2 s default deadline, round-robin load balancing.
  • A grpc.health.v1 health server.
  • A tools/grpc_load_test/ directory with ghz-based load tests; capture latency p50/p95/p99 under 10K QPS.

19.4 Idiomatic & golangci-lint Drill

  • protogetter (use generated getters), goerr113 (use sentinel errors), nilerr, errchkjson.

19.5 Production Hardening Slice

  • Wire deadline propagation tests: a client request with a 500 ms deadline must result in the server seeing a context with a similar deadline (within a budget). Failure here is the single most common gRPC production bug.

Week 20 - Testing Strategy: Five Surfaces, Race-Clean

20.1 Conceptual Core

  • A production Go service has five test surfaces:
  • Unit-*_test.go in the same package, table-driven, fast.
  • Integration-*_test.go with a real Postgres/Kafka/Redis via testcontainers-go.
  • Property-based-gopter or stdlib testing/quick. Less common in Go than in Haskell/Rust, but valuable for parsers and serializers.
  • Fuzz-stdlib func FuzzX(f *testing.F) (since Go 1.18). Native, well-integrated, must be in CI.
  • End-to-end-the binary, with all real dependencies, via go run or compiled artifact.
  • Each surface answers a different question. Skipping one leaves a class of bugs uncovered.

20.2 Mechanical Detail

  • Table-driven test idiom:
    func TestParse(t *testing.T) {
        tests := []struct{
            name string
            in   string
            want Result
            err  error
        }{
            {"empty", "", Result{}, ErrEmpty},
            // ...
        }
        for _, tc := range tests {
            t.Run(tc.name, func(t *testing.T) {
                got, err := Parse(tc.in)
                if !errors.Is(err, tc.err) { t.Fatalf("err: got %v, want %v", err, tc.err) }
                if !cmp.Equal(got, tc.want) { t.Fatalf("got %v, want %v", got, tc.want) }
            })
        }
    }
    
  • testify/require for terse assertions, google/go-cmp/cmp for deep equality with custom comparers.
  • Fuzz tests:
    func FuzzParse(f *testing.F) {
        f.Add("hello")
        f.Fuzz(func(t *testing.T, in string) {
            out, err := Parse(in)
            if err == nil {
                if Roundtrip(out) != in { t.Fatal("not idempotent") }
            }
        })
    }
    
    Run as go test -fuzz=FuzzParse -fuzztime=30s. Persist the corpus.
  • testcontainers-go for integration: spin a real Postgres in-test, get a connection string, run schema migrations, exercise the adapter. Per-test cost is ~1–3 s container startup; amortize via test-package-level setup.
  • Race detector economics: -race slows tests 5–10× and uses ~5–10× memory. Always run in CI; locally optional. Always run on a freshly written concurrent test before committing.
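
A hedged sketch of amortizing the container startup mentioned in the testcontainers bullet above (startPostgres is a hypothetical helper wrapping testcontainers-go):

    var testDSN string // connection string shared by this package's integration tests

    func TestMain(m *testing.M) {
        dsn, terminate, err := startPostgres(context.Background()) // hypothetical testcontainers-go wrapper
        if err != nil {
            log.Fatalf("starting postgres container: %v", err)
        }
        testDSN = dsn
        code := m.Run()
        terminate() // stop the container after all tests in the package
        os.Exit(code)
    }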

20.3 Lab-"Test-Pyramid the URL Shortener"

  • Unit: 100% line coverage on internal/domain and internal/application using mocks for ports.
  • Integration: testcontainers-go Postgres for the postgres adapter.
  • Fuzz: fuzz the alias-generation function, persisting any crashing inputs.
  • Property: gopter test that "shorten then resolve returns original URL."
  • E2E: a make e2e target that spins the full stack via docker-compose, hits the HTTP API, asserts behavior.
  • All five surfaces run in CI under -race -count=1.

20.4 Idiomatic & golangci-lint Drill

  • tparallel, paralleltest (encourages t.Parallel()), thelper (mark test helpers), testifylint (correct testify usage).

20.5 Production Hardening Slice

  • Add a continuous-fuzzing job (e.g., scheduled GitHub Action) that runs each Fuzz* function for 5 minutes against the latest corpus. Persist the corpus as an artifact. Any new crashing input is a P0 issue with a runbook entry.

Month 5 Capstone Deliverable

A production-shaped url-shortener-prod/ workspace with:
  • Hexagonal layout enforced by depguard.
  • Full observability stack (slog + Prometheus + OTel + pprof).
  • gRPC sibling service (e.g., a metrics-export gRPC) with hardened client/server config.
  • Five test surfaces in CI, all -race-clean.
  • A one-page RUNBOOK.md describing alarms, dashboards, deadline budgets, and rollback procedures.

This is the first artifact that resembles a real production service. Treat it as a portfolio piece.

Month 6-Mastery: Consensus, Distributed Storage, Performance Tuning, Defense

Goal: by the end of week 24 you have shipped one capstone deliverable in your chosen track (Raft KV / gRPC mesh / streaming pipeline) and can defend every design decision in a senior-level technical interview.


Weeks

Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos)

21.1 Conceptual Core

  • Consensus is the problem of getting N nodes to agree on a sequence of values despite arbitrary message loss, reordering, and node failure (but not Byzantine failure).
  • Raft is the modern teaching consensus: leader-based, log-replication-centric, decomposed into three sub-problems-leader election, log replication, safety. Read the Ongaro paper.
  • Paxos is the older, denser counterpart. Read the Paxos Made Simple paper for fluency, but use Raft in implementation.
  • The two properties Raft guarantees:
  • Log matching: if two logs contain an entry with the same index and term, they are identical up to that index.
  • Leader completeness: once an entry is committed, it is present in the log of every leader elected in later terms.

21.2 Mechanical Detail

  • Roles: Follower → Candidate (on election timeout) → Leader (on majority vote).
  • Terms: monotonically increasing election epochs. Every RPC carries a term. Stale terms are rejected.
  • Log entries: (term, index, command). The leader appends client commands and replicates via AppendEntries.
  • Commit: an entry is committed when a majority has it in their log. The leader advances commitIndex. Once committed, the state machine can apply it.
  • Snapshots: long logs are compacted via InstallSnapshot. Without snapshots, restart time and storage grow unbounded.
  • Production Raft libraries in Go:
  • hashicorp/raft-used by Consul, Nomad, Vault. Stable, mature, opinionated.
  • etcd-io/raft-used by etcd, CockroachDB, and (via etcd) Kubernetes. More flexible, more low-level.
  • Read both. Pick etcd-io/raft for new builds-it has been hardened by years of CockroachDB and etcd production load.

21.3 Lab-"Read Raft in Anger"

  1. Read etcd-io/raft/node.go and raft.go end-to-end. Annotate the state machine transitions.
  2. Build a minimal in-memory KV store on top: a single goroutine consumes from node.Ready(), applies entries to a map[string]string, persists log entries to a WAL, sends messages to peers, and acknowledges.
  3. Run a 3-node cluster locally. Kill the leader; observe an election. Restart; observe log catchup.
  4. Add a snapshot mechanism every 10K entries.
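
A hedged sketch of the Ready loop from step 2. The Ready/Advance/Tick contract is etcd-io/raft's; wal, transport, apply, applySnapshot, ticker, and n are this lab's own components:

    for {
        select {
        case rd := <-n.Ready():
            wal.Save(rd.HardState, rd.Entries)  // persist before sending or applying
            if !raft.IsEmptySnap(rd.Snapshot) {
                applySnapshot(rd.Snapshot)
            }
            transport.Send(rd.Messages)         // deliver to peers
            for _, ent := range rd.CommittedEntries {
                apply(ent)                      // e.g. decode and mutate the map[string]string
            }
            n.Advance()                         // tell raft this Ready has been processed
        case <-ticker.C:
            n.Tick()                            // drives election and heartbeat timeouts
        case <-ctx.Done():
            n.Stop()
            return
        }
    }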

21.4 Idiomatic & golangci-lint Drill

  • The Raft codebases are dense; do not lint-refactor them. Instead, study their style: small functions, explicit state transitions, testable seams.

21.5 Production Hardening Slice

  • Add jepsen-io/jepsen-style fault injection: random partition, random clock skew, random crash. Run for 30 minutes. Verify linearizability via your etcd-io/raft-derived KV's history.

Week 22 - Distributed Storage Patterns

22.1 Conceptual Core

  • The heart of a distributed storage system is consensus (week 21). The engineering of one is everything around it: durable storage, replication, partitioning, snapshotting, repair, observability.
  • Three patterns to know:
  • Replicated state machine (Raft, Paxos): one consensus group, each node holds the full data set. Linearizable; throughput limited by the leader.
  • Sharded replicated (etcd, CockroachDB ranges): many consensus groups, one per data shard. Horizontal scale.
  • Eventually consistent (Cassandra, DynamoDB): no consensus on the write path; quorum reads, hinted handoff, anti-entropy. Different consistency model.

22.2 Mechanical Detail

  • WAL discipline: every state-changing op is durably logged before acknowledgment. fsync after each batch (or per-op for stricter durability). The WAL is the source of truth for recovery.
  • Snapshots: periodic point-in-time captures of the state machine. Truncate the WAL behind them. Snapshot format must be efficient to ship to a recovering follower.
  • Membership changes: adding/removing nodes is the hardest correctness boundary. Raft's "joint consensus" handles this. Both hashicorp/raft and etcd-io/raft provide APIs; do not roll your own.
  • Linearizable reads: three options-read from the leader after a heartbeat round (etcd "linearizable read" with read-index), read from any node with a lease, or read after a no-op append. Each has tradeoffs.
  • Storage engine choice: BoltDB (simple, single-writer, great for Raft logs), BadgerDB (LSM-based, higher throughput), Pebble (CockroachDB's RocksDB replacement, the modern choice for high-throughput).

22.3 Lab-"Harden the KV Store"

Take the week 21 Raft KV and add:
  1. Pebble as the storage engine for both the WAL and the state machine.
  2. Snapshots every N entries, with InstallSnapshot to recovering followers.
  3. Linearizable reads via read-index.
  4. Membership changes: add and remove nodes online.
  5. Metrics: per-node Raft state, log lag, snapshot duration, apply latency.

22.4 Idiomatic & golangci-lint Drill

  • errcheck, errorlint, wrapcheck. Distributed-systems code is almost entirely error handling; lint rigor is non-optional.

22.5 Production Hardening Slice

  • Add a Jepsen-style "nemesis" goroutine to your test harness that randomly partitions, pauses, and restarts nodes. Verify linearizability over 1M operations.

Week 23 - Performance Tuning: Profile, Tune, Re-Profile

23.1 Conceptual Core

  • The discipline: measure, then change. Never optimize from intuition. The profilers-pprof, trace, runtime/metrics-are the source of truth.
  • A working flow: capture a baseline profile, propose a hypothesis, change one thing, re-profile, accept or reject. Commit each accepted change with the before/after profile linked.

23.2 Mechanical Detail-The Tuning Toolkit

  • CPU profile: pprof http://host/debug/pprof/profile?seconds=60. Top-heavy stacks identify the hot path. Look for runtime.mallocgc, runtime.scanobject, runtime.gcDrain-these mean GC is the bottleneck.
  • Heap profile: identify retention. pprof -inuse_space shows what's live; -alloc_objects shows churn.
  • Block profile: runtime.SetBlockProfileRate(1) then pprof /debug/pprof/block. Identifies channel/syscall waits.
  • Mutex profile: runtime.SetMutexProfileFraction(1) then pprof /debug/pprof/mutex. Identifies contended locks.
  • Goroutine profile: stack distribution. Sudden growth = leak.
  • Execution trace: go tool trace. The expensive but most informative tool. Identifies scheduler latencies, GC pauses, and goroutine-state transitions.
  • PGO (Profile-Guided Optimization): stable since Go 1.21. Capture a representative CPU profile in production, place it as default.pgo, rebuild. ~5–10% throughput win on hot paths.
  • benchstat: compare two go test -bench runs statistically. Reports geomean and significance.

23.3 Mechanical Detail-Common Wins

  • Replace interface{} boxing in hot paths with concrete types or generics.
  • Reuse allocations via sync.Pool (with the discipline from Week 8).
  • Pre-size slices and maps when capacity is known: make([]T, 0, n), make(map[K]V, n).
  • Avoid defer in hot loops (Go 1.14+ made defer ~zero-cost in most cases, but the loop variant still has overhead).
  • strings.Builder over += for building strings.
  • Slice instead of map for small collections (<~50 entries)-linear scan beats hash on modern caches.
  • Goroutine cost: launching is cheap, but the aggregate of millions of goroutines on a long-tail-blocked path is not. Bound concurrency.
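
A hedged micro-example of two of the wins listed above (pre-sizing and strings.Builder); confirm any claimed improvement with benchstat rather than assuming it:

    // Builds "1,2,3,..." with at most one final allocation in the common case.
    func joinIDs(ids []int) string {
        var b strings.Builder
        b.Grow(len(ids) * 8) // rough pre-size; avoids repeated internal growth
        for i, id := range ids {
            if i > 0 {
                b.WriteByte(',')
            }
            b.WriteString(strconv.Itoa(id))
        }
        return b.String()
    }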

23.4 Lab-"Profile-Tune-Profile"

Take your capstone (whatever track) and:
  1. Capture a CPU profile under representative load. Identify the top 5 functions.
  2. Pick one and propose a fix. Estimate the win in advance.
  3. Implement, re-profile, compare with benchstat. Document each change in PERF_LOG.md.
  4. Capture a runtime/trace and identify any GC or scheduler stalls. Fix one.
  5. Apply PGO. Confirm the win.

23.5 Idiomatic & golangci-lint Drill

  • prealloc, gosimple S1024 (time.Until(t) instead of t.Sub(time.Now())), gocritic: rangeValCopy. Final lint pass-your codebase should be near-zero findings.

23.6 Production Hardening Slice

  • Wire PGO into your release pipeline: a "canary" deploy collects a profile, the next "stable" build uses it. Document the procedure in RELEASE.md.

Week 24 - Capstone Integration, Defense, Final Hardening

24.1 Conceptual Core

The final week is integration, not new material. Bring your chosen capstone (see CAPSTONE_PROJECTS.md) to production-defensible quality.

24.2 The Final Hardening Checklist

By now, every previous module has fed the hardening/ template. Roll it up into one final release-checklist.md:

  • gofmt, go vet, golangci-lint run clean (zero findings, all nolint annotations have a documented reason).
  • All tests pass under -race -count=10.
  • Fuzz harnesses for every parser/serializer; CI runs them for ≥30s per fuzzer.
  • goleak passes for every package using goroutines.
  • PGO applied; benchmark deltas committed.
  • pprof endpoints behind admin port + auth; documented.
  • OTel traces, Prometheus metrics, slog JSON logs-wired and tested.
  • GOMEMLIMIT set from cgroup memory at startup.
  • debug.SetMaxStack set to a sane bound (the default of 1 GB on 64-bit is too lenient).
  • Cross-compilation matrix green: linux/amd64, linux/arm64, darwin/arm64 minimum.
  • Build is reproducible: -trimpath, pinned toolchain, deterministic Dockerfile.
  • Binary size optimized: -ldflags="-s -w", optionally upx if startup time is irrelevant (rarely worth it).
  • SBOM generated (cyclonedx-gomod); release artifacts signed (cosign).
  • RUNBOOK.md, THREAT_MODEL.md, ADRs (≥3), and SECURITY.md present.
  • On-call alarms wired to the metrics that matter (p99 latency, error rate, goroutine count, GC pause p99, memory headroom).

24.3 Lab-"Defend the Design"

Schedule a 45-minute mock review with a senior peer (or record yourself). Present:
  • The architecture diagram.
  • One slide per non-obvious decision (e.g., "why etcd-io/raft over hashicorp/raft", "why Pebble over BoltDB", "why server-streaming over polling").
  • A live demo of the test suite (-race, fuzzing, integration).
  • A live demo of the observability stack (Jaeger, Prometheus, pprof).
  • A live demo of fault tolerance (kill the leader, watch recovery).

The deliverable is the defense, not the slides. If you cannot answer "what is the worst-case write latency under leader change?" or "what is your goroutine count under 10× load?", you have not yet finished the curriculum.

24.4 Idiomatic & golangci-lint Drill

  • Final pass: golangci-lint run --enable-all --disable=lll,wsl --timeout=10m. Either fix or //nolint:linter // reason with a justification. Zero unjustified suppressions.

24.5 Production Hardening Slice

  • Tag the capstone repo v1.0.0. Generate a release artifact with goreleaser. Sign with cosign. Publish a CHANGELOG. The final commit hash is the artifact you reference on your resume.

Month 6 Deliverable

The chosen capstone (see CAPSTONE_PROJECTS.md)-running, defensible, hardened. Plus the hardening/ template, now a publishable Go-module starter under your name.

You are done. The next steps are no longer pedagogical; they are professional.

Appendix A-Production Hardening Reference

This appendix consolidates the hardening slices distributed throughout the curriculum. By week 24 the reader's hardening/ template should contain a working example of every section below.


A.1 Build & Release

A.1.1 go build flags worth knowing

  • -trimpath-strips local file paths from the binary. Always on in release builds; required for reproducible builds.
  • -ldflags="-s -w"-strips DWARF and symbol tables. ~30% size reduction. Only enable for production releases (debugging is harder; core dumps less useful).
  • -ldflags="-X main.version=v1.2.3"-embeds version info. Pair with -X main.commit=$(git rev-parse HEAD) and -X main.buildDate=....
  • -buildmode=pie-position-independent executable. Required for ASLR on hardened deployments.
  • -buildvcs=true-embeds VCS info (default on with modules); go version -m reads it back.
  • -tags=netgo,osusergo-pure-Go DNS/user resolvers. Required for fully static binaries on Linux.
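
The -X flags above overwrite package-level string variables at link time; a hedged sketch of the main-package side they target:

// Overwritten at link time, e.g.:
//   go build -trimpath -ldflags "-s -w -X main.version=v1.2.3 -X main.commit=$(git rev-parse HEAD)"
package main

var (
    version   = "dev"
    commit    = "none"
    buildDate = "unknown"
)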

A.1.2 Build tags for cross-platform code

//go:build linux && amd64
// +build linux,amd64

package foo
  • Tags gate file-level compilation.
  • Common patterns: //go:build linux, //go:build !windows, //go:build integration (for slow tests), //go:build debug.
  • Avoid runtime.GOOS checks at run time where a build tag would do-the dead-code path costs binary size.

A.1.3 Cross-compilation

GOOS=linux   GOARCH=amd64 go build -trimpath -o bin/svc-linux-amd64 ./cmd/svc
GOOS=linux   GOARCH=arm64 go build -trimpath -o bin/svc-linux-arm64 ./cmd/svc
GOOS=darwin  GOARCH=arm64 go build -trimpath -o bin/svc-darwin-arm64 ./cmd/svc
GOOS=windows GOARCH=amd64 go build -trimpath -o bin/svc-windows-amd64.exe ./cmd/svc
  • Pure-Go modules cross-compile out of the box.
  • CGO modules require a cross C toolchain-use zig cc via CGO_ENABLED=1 CC="zig cc -target aarch64-linux-musl" for the simplest setup.

A.1.4 Static linking

  • CGO_ENABLED=0 produces a fully static binary on Linux. The default for containerized Go services unless you specifically need CGO (sqlite, libsystemd, etc.).
  • For services that must use CGO: link against musl via Alpine, or use gcc -static carefully; statically linking glibc is fragile.

A.1.5 Reproducible builds

  • Pin toolchain via go.mod toolchain go1.22.X.
  • Build with -trimpath.
  • Avoid time.Now() in init() or in build.go-equivalent generation steps.
  • Build inside a deterministic image: a pinned Alpine, golang:1.22.X-alpine@sha256:... with content hash.
  • Confirm reproducibility: sha256sum bin/svc should match across machines and builds of the same commit.

A.1.6 goreleaser

  • The de-facto Go release tool. One config file produces: cross-compiled binaries, tar.gz/zip archives, Homebrew tap, Linux packages (deb/rpm), Docker images, GitHub Releases, SBOM, signatures.
  • Replaces ~500 lines of Makefile+CI-script glue. Adopt early.

A.2 Linting and Static Analysis

A.2.1 golangci-lint baseline configuration

A reasonable starting .golangci.yml:

run:
  timeout: 5m
  go: "1.22"
linters:
  disable-all: true
  enable:
    - errcheck
    - govet
    - staticcheck
    - gosimple
    - ineffassign
    - unused
    - revive
    - gocritic
    - gosec
    - bodyclose
    - rowserrcheck
    - sqlclosecheck
    - nilerr
    - prealloc
    - unconvert
    - unparam
    - misspell
    - depguard
    - contextcheck
    - errorlint
    - exhaustive
    - forbidigo
    - goerr113
    - testifylint
    - tparallel
    - thelper
    - paralleltest
    - copyloopvar
    - intrange
linters-settings:
  errcheck:
    check-blank: true
  govet:
    enable-all: true
  depguard:
    rules:
      domain-purity:
        list-mode: lax
        files: ["**/internal/domain/**"]
        deny:
          - pkg: net/http
            desc: domain must not import HTTP
          - pkg: database/sql
            desc: domain must not import SQL
issues:
  max-issues-per-linter: 0
  max-same-issues: 0

A.2.2 The race detector is non-negotiable

go test -race -count=1 ./...
  • ~5–10× slowdown, ~5–10× memory.
  • Catches data races by adding a happens-before tracking layer.
  • Never commit code that has not been tested under -race.

A.2.3 go vet

  • Subset of golangci-lint (which runs vet internally), but the standalone command is fast and honest.
  • Critical analyzers: printf, lostcancel, copylocks, loopclosure, nilness, shadow, unsafeptr.

A.2.4 staticcheck

  • The most rigorous Go linter. Maintained separately from go vet. Documented at staticcheck.io.
  • High-value codes: SA1015 (time.Tick leak), SA1029 (context.WithValue collisions), SA4006 (unused write), SA6002 (sync.Pool non-pointer).

A.3 Profiling and Tracing

A.3.1 pprof endpoints-production setup

import _ "net/http/pprof"
// ...
go func() {
    log.Fatal(http.ListenAndServe("127.0.0.1:6060", nil)) // admin port, never public
}()
  • Bind to localhost or an internal interface only.
  • For Kubernetes: use a sidecar or a kubectl port-forward for ad-hoc access.

A.3.2 The pprof commands you will run weekly

go tool pprof -http=:0 http://host:6060/debug/pprof/profile?seconds=30   # CPU
go tool pprof -http=:0 http://host:6060/debug/pprof/heap                  # heap (inuse)
go tool pprof -http=:0 -alloc_objects http://host:6060/debug/pprof/heap   # allocations
go tool pprof -http=:0 http://host:6060/debug/pprof/goroutine             # goroutines
go tool pprof -http=:0 http://host:6060/debug/pprof/block                 # block (after SetBlockProfileRate)
go tool pprof -http=:0 http://host:6060/debug/pprof/mutex                 # mutex contention

A.3.3 runtime/trace

f, _ := os.Create("trace.out")
trace.Start(f); defer trace.Stop()
  • View with go tool trace trace.out.
  • Use when pprof doesn't explain a latency stall-trace shows the exact timeline of every G across every P.

A.3.4 PGO (Profile-Guided Optimization)

  1. Run a representative load against your service.
  2. Capture: curl -o cpu.pprof http://host:6060/debug/pprof/profile?seconds=60.
  3. Place at default.pgo in the package containing main.
  4. Rebuild: go build -pgo=auto.
  5. Expect ~5–15% throughput win on hot paths. Combine with PGO-update cadence in your release flow.

A.4 Observability Standards

A.4.1 Logging

  • Use log/slog (stdlib, since 1.21).
  • JSON handler in production; Text handler locally.
  • Per-request scoped logger via context.Context.
  • Levels: Debug (off in prod), Info, Warn (something to watch), Error (a human should look). Never Panic or Fatal for recoverable errors.
  • Sensitive-attribute redaction at the handler.

A.4.2 Metrics

  • Use prometheus/client_golang with collectors.NewGoCollector(collectors.WithGoCollections(...)) for Go runtime metrics from runtime/metrics.
  • The four golden signals: latency (histogram), traffic (counter), errors (counter), saturation (gauge).
  • Never unbounded labels.

A.4.3 Traces

  • OpenTelemetry SDK + OTLP gRPC exporter.
  • Auto-instrument with otelhttp, otelgrpc, otelsql.
  • Sampling: head-based (e.g., 1%) for high-QPS services; tail-based (via collector) for systems where rare errors matter most.

A.4.4 The "useful errors" hardening pass

  • Wrap with fmt.Errorf("doing X: %w", err) at every layer, preserving %w for errors.Is/errors.As.
  • Sentinel errors at domain boundaries: var ErrNotFound = errors.New("not found").
  • Structured errors only when you need typed fields: type ValidationError struct{ Field, Reason string } with Error() method.
  • Never panic for recoverable conditions. Reserve panic for "the program's invariants are violated" (e.g., a nil pointer that should never be nil).
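
A hedged sketch tying the bullets above together (queryUser is hypothetical):

var ErrNotFound = errors.New("not found") // sentinel at the domain boundary

type ValidationError struct{ Field, Reason string } // typed error, only because callers need the fields

func (e *ValidationError) Error() string { return e.Field + ": " + e.Reason }

func loadUser(id string) error {
    if err := queryUser(id); err != nil {
        return fmt.Errorf("loading user %s: %w", id, err) // wrap, preserving the chain
    }
    return nil
}

// Callers match with errors.Is(err, ErrNotFound), or:
//   var ve *ValidationError
//   errors.As(err, &ve)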

A.5 Memory Tuning

A.5.1 The two knobs

  • GOGC-heap-growth ratio. Default 100 (next-GC = 2× live). Lower = more frequent GC = less memory; higher = less frequent = more throughput, more memory.
  • GOMEMLIMIT-soft memory ceiling. Default off. Set this in containers to ~90% of cgroup memory.

A.5.2 The setup pattern

import _ "go.uber.org/automaxprocs"           // honor cgroup CPU
import "runtime/debug"

func init() {
    if v := os.Getenv("MEMORY_LIMIT_BYTES"); v != "" {
        if n, err := strconv.ParseInt(v, 10, 64); err == nil {
            debug.SetMemoryLimit(n)
        }
    }
}

A.5.3 automaxprocs

  • Uber's small library that sets GOMAXPROCS based on cgroup CPU quota. Without it, a container limited to 0.5 CPUs still sees the host's full CPU count and spawns too many P's.
  • Adopt by default in all containerized services.

A.6 The Hardening Template

By week 24, the hardening/ template should contain:

hardening/
  .golangci.yml
  .goreleaser.yaml
  Dockerfile                    # multi-stage, scratch or distroless final
  Makefile                      # fmt, vet, lint, test, race, bench, profile
  cmd/svc/main.go               # idiomatic composition root
  internal/
    platform/
      observability/            # slog + prom + otel + pprof wiring
      memlimit/                 # GOMEMLIMIT from env
      shutdown/                 # graceful shutdown helper
  ci/
    test.yml                    # fmt + vet + lint + test -race
    bench.yml                   # benchstat against baseline
    fuzz.yml                    # nightly fuzz
    release.yml                 # goreleaser on tag
  RELEASE_CHECKLIST.md
  RUNBOOK.md
  SECURITY.md
  THREAT_MODEL.md

This is the artifact that should accompany every Go service you ship after week 24.

Appendix B-Build-From-Scratch Data Structures and Patterns

A working Go engineer should have implemented each of the following at least once, with -race-clean tests, allocation benchmarks, and goroutine-leak verification (where concurrent). This appendix sketches the minimal-viable design for each.


B.1 Lock-Free SPSC Ring Buffer

When: real-time control loops, log shippers, audio paths, any 1-producer-1-consumer with strict latency budgets.

Design:
  • [Cap]T backing array (Cap a power of two for cheap modulo).
  • head and tail are atomic.Uint64, each on its own cache line ([7]uint64 padding).
  • Producer: load tail (relaxed), check space against head, write slot, store tail with release semantics (in Go: any atomic store).
  • Consumer: symmetric.
  • Wait-free per side; needs no CAS, only atomic loads/stores.

Lab outcomes: cache-line awareness, the Go memory model in practice, why chan T is not always the answer.


B.2 Lock-Free MPMC Bounded Queue

When: work-stealing schedulers, bounded work pools where contention is significant.

Design:
  • Slot array; each slot is atomic.Uint64 encoding (seq, occupied flag).
  • Producers CAS the seq from "empty at lap N" to "occupied at lap N".
  • Consumers CAS from "occupied at lap N" to "empty at lap N+1".
  • Avoids the ABA problem; same algorithm as Vyukov's bounded MPMC queue.

Lab outcomes: encoding state in atomics, lock-free without epoch reclamation, why the modular-arithmetic seq counter is correct.


B.3 Sharded Map

When: high-concurrency read+write workloads. Always consider this before reaching for sync.Map.

Design:
  • N shards (typically 16–64), each struct{ sync.RWMutex; m map[K]V }.
  • Hash key, mod N, lock the shard, perform the op.
  • Iteration: lock shards in order, snapshot or hold each.
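
A minimal generic sketch of that shape (the hash function is supplied by the caller; not a drop-in replacement for sync.Map):

type shard[K comparable, V any] struct {
    mu sync.RWMutex
    m  map[K]V
}

type ShardedMap[K comparable, V any] struct {
    shards []*shard[K, V]
    hash   func(K) uint64
}

func NewShardedMap[K comparable, V any](n int, hash func(K) uint64) *ShardedMap[K, V] {
    s := &ShardedMap[K, V]{shards: make([]*shard[K, V], n), hash: hash}
    for i := range s.shards {
        s.shards[i] = &shard[K, V]{m: make(map[K]V)}
    }
    return s
}

func (s *ShardedMap[K, V]) Get(k K) (V, bool) {
    sh := s.shards[s.hash(k)%uint64(len(s.shards))]
    sh.mu.RLock()
    defer sh.mu.RUnlock()
    v, ok := sh.m[k]
    return v, ok
}

func (s *ShardedMap[K, V]) Put(k K, v V) {
    sh := s.shards[s.hash(k)%uint64(len(s.shards))]
    sh.mu.Lock()
    sh.m[k] = v
    sh.mu.Unlock()
}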

Lab outcomes: sync.Map vs sharded vs RWMutex+map comparison; xxhash for non-string keys; the cost of interface{} boxing at the API boundary (use generics).


B.4 LRU Cache

When: API caches, decoded-value caches, anything where hot data fits and cold should evict.

Design:
  • Doubly-linked list of entries; map from key to list-element.
  • On hit: move to front. On miss + full: evict back; allocate new at front.
  • Concurrent variant: shard by key, per-shard mutex + per-shard list.

Lab outcomes: pointer hygiene in linked structures; the standard container/list is fine but allocates an extra struct per entry-for hot caches, an inlined doubly-linked list of the entries themselves saves ~30% memory.


B.5 Bloom Filter

When: "definitely-not-in-set" pre-checks in front of expensive lookups (DB, network).

Design:
  • Bit array of size m, k hash functions.
  • Add: set k bits.
  • Contains: all k bits set ⇒ probably in set; any unset ⇒ definitely not.
  • Tune m and k for target false-positive rate.

Lab outcomes: hash mixing (hash/maphash is the right primitive in modern Go), the math of false positives, when not to use one (size of input, write-amplification).


B.6 Concurrent Skiplist

When: ordered concurrent maps; the foundation of, e.g., RocksDB-style memtables.

Design:
  • Tower of forward pointers per node, height geometrically distributed.
  • Insert: build new node bottom-up, link levels via CAS.
  • Removal: logical (mark deleted) then physical (unlink).

Lab outcomes: lock-free with non-trivial structure, randomization in algorithm design, the alternative to balanced trees in concurrent settings.


B.7 Worker Pool With Backpressure

When: every production service.

Design:
  • N workers consuming from a buffered task channel.
  • Submission: non-blocking try-send with overflow → drop-with-metric, or blocking send for back-pressure.
  • Per-task timeout: derived context.
  • Panic isolation: each worker's task call is wrapped in defer recover().
  • Graceful shutdown: ctx.Done() triggers drain with a deadline.

Lab outcomes: this is the worker pool from week 12, refined.


B.8 Rate Limiter

When: ingress protection, downstream-call throttling, fair multi-tenant scheduling.

Design:
  • Token bucket (golang.org/x/time/rate): refill at fixed rate up to a burst capacity.
  • Leaky bucket: same shape, different metaphor.
  • For per-key limiting: an LRU-bounded map of token-buckets.

Lab outcomes: study x/time/rate source-it is small and elegant; understand the difference between Allow() (immediate decision), Wait(ctx) (block until token), Reserve() (return a reservation that can be cancelled).
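
A hedged usage sketch of the three call shapes named above, using golang.org/x/time/rate:

lim := rate.NewLimiter(rate.Limit(100), 200) // 100 tokens/s, burst of 200

if lim.Allow() { // immediate decision: take a token now or report false
    // handle request
}

if err := lim.Wait(ctx); err != nil { // block until a token is available or ctx is cancelled
    // ctx cancelled or deadline exceeded
}

r := lim.Reserve() // reservation: learn the required delay, optionally cancel it
if !r.OK() {
    // request exceeds the limiter's burst and can never be satisfied
} else {
    time.Sleep(r.Delay())
}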


B.9 Circuit Breaker

When: any client of an external service.

Design:
  • Three states: Closed (normal), Open (fail fast), HalfOpen (probe).
  • Counts failures; on threshold → Open.
  • After cooldown → HalfOpen; one probe → Closed on success, Open on failure.
  • sony/gobreaker is a battle-tested reference; read its source then build your own.

Lab outcomes: state machines without globals; per-endpoint instances; the metric exports that operators actually want (state changes, failure rate).


B.10 Singleflight

When: cache stampede mitigation, deduplicating concurrent identical requests.

Design:
  • A map[key]*call where call holds the result/error and a sync.WaitGroup.
  • First caller for a key creates the call, runs the work, signals completion.
  • Concurrent callers for the same key wait on the WaitGroup, share the result.

Lab outcomes: study golang.org/x/sync/singleflight; understand why the result-shape is (value, err, shared) (the shared flag matters for caches).
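
A hedged usage sketch of golang.org/x/sync/singleflight for the stampede case above (fetchFromDB is hypothetical):

var group singleflight.Group

func getUser(id string) (any, error) {
    v, err, shared := group.Do(id, func() (any, error) {
        return fetchFromDB(id) // runs once per key, however many concurrent callers arrive
    })
    if shared {
        // This result was handed to other in-flight callers as well; do not treat
        // it as exclusively owned (relevant when callers mutate or cache it).
    }
    return v, err
}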


B.11 Lock-Free Counter / Histogram

When: high-frequency metrics where contention on a single atomic kills throughput.

Design:
  • N per-CPU (or per-P) counters; aggregate on read.
  • Use runtime.GOMAXPROCS + manual sharding, or sync/atomic with cache-line padding per shard.
  • Prometheus's prometheus.NewCounter is single-atomic and is fine for most uses; only build this when profiling shows contention.

Lab outcomes: why atomic.AddInt64 becomes the bottleneck at >10M/s/core; the cost of false sharing in metrics implementations.


Difficulty Ranking

Tier Structures
Warmup Worker pool, LRU, Sharded map
Intermediate Rate limiter, Circuit breaker, Bloom filter, Singleflight
Advanced SPSC ring, MPMC queue, Lock-free counter
Expert Concurrent skiplist

Pick at least one from each tier. Ship with -race tests, allocation benchmarks, and goleak.

Appendix C-Contributing to Go: A Playbook

Most engineers never contribute to a language. The barrier is procedural ("how does Gerrit work?") more than technical. This appendix is the on-ramp.


C.1 Mental Model

The Go project ("golang/go") is a single repository containing the compiler, the runtime, the standard library, the linker, and most of the toolchain. It is mirrored to GitHub for visibility, but the primary development happens at go-review.googlesource.com using Gerrit (not GitHub PRs).

Three implications:
  1. You file changes as CLs (changelists) in Gerrit, not PRs.
  2. Reviewers leave inline comments and a numeric vote (-2..+2). +2 from a maintainer is required to merge.
  3. The maintainer set is small and prioritizes correctness over speed. A two-week review cycle is normal; a six-month cycle is not unheard of.


C.2 The Pipeline, in 30 Seconds

Source (.go)
  │ Lexer ── tokens
  │ Parser ── AST  (cmd/compile/internal/syntax)
  │ Type checker ── typed AST  (cmd/compile/internal/types2)
  │ IR construction (cmd/compile/internal/ir, ssagen)
  │ SSA  (cmd/compile/internal/ssa)
  │ SSA optimizations (rules/*.rules)
  │ Lowering to architecture-specific SSA
  │ Code generation
  │ Object file (.o)
  │ Linker (cmd/link)-combines objects + runtime
Executable

Then at runtime, the executable starts in runtime/asm_*.s → runtime.rt0_go → scheduler bootstrap → runtime.main → user main.


C.3 Setting Up

git clone https://go.googlesource.com/go
cd go/src
./make.bash      # builds the toolchain (~3-5 min)
./run.bash       # full test suite (~20 min)

The new toolchain is at ../bin/go. Use it for testing your changes:

../bin/go test -run TestSomething ./...

For Gerrit, install git-codereview:

go install golang.org/x/review/git-codereview@latest
git config alias.change "codereview change"
git config alias.mail "codereview mail"
git config alias.sync "codereview sync"

Sign the CLA at cla.developers.google.com (individual or corporate). Without it, no CL can merge.


C.4 Where the Easy Wins Are

In rough order of difficulty:

C.4.1 Documentation fixes

  • Typos, unclear sentences, missing examples in stdlib godoc. Search the issue tracker for Documentation label.
  • Touch the .go file's doc comment, send a CL. ~10 lines, ~1-week review.

C.4.2 Stdlib bug fixes

  • Look for NeedsFix + help wanted labels. The time, net/http, encoding/*, database/sql packages have a steady flow of small bugs.
  • Reproduce, write a minimal failing test, fix, send.

C.4.3 New stdlib examples

  • Many functions lack ExampleX testable examples. These render in godoc and are doubly useful as tests.
  • Trivially good first contribution.

C.4.4 New go vet analyzers

  • cmd/vet is small and well-organized. Adding a new analyzer for a recurring bug pattern (with rationale) is a tractable medium contribution.

C.4.5 Compiler diagnostics

  • "Why is this error message confusing?" issues are tagged. Improving cmd/compile/internal/types2's error wording is high-impact and well-bounded.

C.4.6 Compiler optimizations

  • SSA peephole rules in cmd/compile/internal/ssa/_gen/*.rules. Each rule is a pattern → replacement transformation, tested via assembly tests.
  • Higher bar but still single-CL-sized contributions exist.

C.4.7 Don't start here (yet)

  • The runtime scheduler proper (runtime/proc.go).
  • The garbage collector (runtime/mgc.go).
  • The linker.
  • Anything involving language semantics (proposals → go/proposal repo, separate process, ~year-scale).

C.5 The First-CL Workflow

  1. Find an issue. Comment to claim it (no @bot mechanism like rust-bot; just be polite and indicate you're working on it).
  2. Branch from master:
    git sync
    git change <branch-name>
    
  3. Make changes. Run:
    ../bin/go vet ./...
    ../bin/go test -short ./...
    gofmt -l -d .
    
  4. Commit with a Go-style commit message:
    net/http: fix Server.Shutdown deadlock when called from request handler
    
    Previously, calling Server.Shutdown from within an active request
    handler would deadlock because [...].
    
    Fixes #12345
    
    First line: package: short description. Body: rationale and any non-obvious context. Trailer: Fixes #N if applicable.
  5. Mail the CL:
    git mail
    
    This pushes to Gerrit and creates a review thread.
  6. Address review. Reviewers comment inline. Each round: amend the existing commit (do NOT create new commits-re-running git change amends the pending commit), then git mail again. The CL gets new patch sets.
  7. +2 from a maintainer + +1 from the trybots → merged. Your name is in the commit log.

C.6 The Go Source Reading Map

When the compiler/runtime is opaque, these are the highest-yield reads:

File What it teaches
runtime/runtime2.go The data model: g, m, p, schedt, hchan, mutex. Reference.
runtime/proc.go The scheduler. schedule, findrunnable, newproc, gopark.
runtime/mgc.go GC entry points and pacing.
runtime/mbarrier.go The write barrier.
runtime/malloc.go Allocator (mcache → mcentral → mheap).
runtime/chan.go Channel internals.
runtime/iface.go Interface dispatch and itab cache.
runtime/select.go select semantics (subtle; read slowly).
runtime/preempt.go Async preemption mechanism.
runtime/netpoll.go epoll/kqueue/IOCP integration.
cmd/compile/internal/escape/escape.go Escape analysis.
cmd/compile/internal/ssa/ SSA IR, optimization passes.

Read in order: runtime2.go → proc.go → chan.go → iface.go → mgc.go → malloc.go → escape analysis → SSA. Allow weeks, not days.


C.7 Adjacent Targets if golang/go Is Too Heavy

  • golangci-lint-Active, friendly, fast review. Add a linter, fix a false positive.
  • staticcheck-Higher bar than golangci-lint, but smaller surface and Dominik Honnef is a thoughtful reviewer.
  • golang.org/x/* repos-x/tools, x/sync, x/exp. Same Gerrit workflow as golang/go, but sometimes faster review.
  • gopls-language server. High-impact contributions; AST/types fluency from week 14 directly applies.
  • prometheus/client_golang, grpc/grpc-go, etcd-io/etcd-large Go projects with active maintainers and well-documented contribution flows. A merged PR signals real-world Go fluency.

C.8 Calibration

A reasonable goal for a curriculum graduate:

  • By end of week 23: a CL open against golang/go (a stdlib doc fix or small bug fix) or a PR against a well-known Go project.
  • By end of capstone: that CL/PR merged.
  • 6 months post-curriculum: a non-trivial CL-a stdlib API addition, a go vet analyzer, a compiler diagnostic improvement.

These are realistic timelines. The maintainers prioritize stability. Do not be discouraged by a four-week review cycle; that is healthy.

Capstone Projects-Three Tracks, One Choice

The Month 6 capstone is the deliverable that converts this curriculum from study into evidence. Pick one track. The work performed here is what you describe in interviews and link from a portfolio.


Track 1-Distributed Storage: A Raft-Replicated KV Store

Outcome: a 3+ node Raft-replicated key-value store with linearizable reads, snapshots, online membership changes, and a Jepsen-style fault-injection harness verifying linearizability.

Functional spec

  • gRPC API: Get(key), Put(key, value), Delete(key), Watch(prefix) stream.
  • Cluster API: AddNode, RemoveNode, Leadership.
  • Linearizable reads via read-index.
  • Snapshots every N entries (default 10K) with InstallSnapshot to recovering followers.
  • Persistent WAL via Pebble or BoltDB.
  • TLS between nodes; mutual auth via x509.

Non-functional spec

  • Sustained 50K writes/sec on commodity hardware (3-node, NVMe).
  • Sub-10 ms write latency p99 under 50% utilization.
  • Recovery time (leader change → fully available) under 1 s for a 3-node cluster.
  • Survives a single-node crash without data loss; survives a network partition with a clear majority.

Architecture sketch

  • One goroutine per node consumes from the etcd-io/raft Ready channel (see the sketch after this list).
  • Apply loop: stream committed entries → state machine → respond to clients.
  • Network: gRPC with a long-lived bidi stream per peer pair.
  • State machine: a sharded map[string][]byte with versioning for Watch.
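A minimal sketch of that consume loop, assuming etcd-io/raft's Node interface (module go.etcd.io/raft/v3); saveToStorage, sendMessages, and applyToStateMachine are placeholders for the WAL, the peer transport, and the state machine described above. A real node would also apply rd.Snapshot, handle ReadStates for read-index reads, and special-case EntryConfChange entries for membership changes.

// raftloop.go - sketch of the per-node Ready-consume loop.
package kv

import (
	"time"

	"go.etcd.io/raft/v3"
	"go.etcd.io/raft/v3/raftpb"
)

// runNode drives a single raft.Node: tick it, persist what each Ready hands
// us, ship messages to peers, apply committed entries, then Advance.
func runNode(
	n raft.Node,
	stop <-chan struct{},
	saveToStorage func(raftpb.HardState, []raftpb.Entry, raftpb.Snapshot),
	sendMessages func([]raftpb.Message),
	applyToStateMachine func(raftpb.Entry),
) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()

	for {
		select {
		case <-ticker.C:
			n.Tick() // drives election and heartbeat timeouts
		case rd := <-n.Ready():
			// Persist HardState, entries, and snapshot before sending messages.
			saveToStorage(rd.HardState, rd.Entries, rd.Snapshot)
			sendMessages(rd.Messages)
			for _, e := range rd.CommittedEntries {
				applyToStateMachine(e)
			}
			n.Advance() // tell raft this Ready has been fully processed
		case <-stop:
			n.Stop()
			return
		}
	}
}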

Test rigor

  • Unit: state-machine transitions, log-truncation invariants.
  • Integration: 3-node local cluster via t.Run, exercise membership.
  • Fault injection: a "nemesis" goroutine that randomly partitions, pauses, and crashes nodes (one possible shape is sketched after this list); the client operation history is fed to a linearizability checker (Knossos run as a Clojure subprocess, or a lightweight Go checker such as Porcupine).
  • Race-clean under sustained load.
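A possible shape for the nemesis, assuming hypothetical harness hooks (Partition, Heal, Crash, Restart); these methods are not from any library - your integration harness provides them.

// nemesis.go - sketch of a fault-injection goroutine for integration tests.
package chaos

import (
	"math/rand"
	"time"
)

// Cluster is whatever handle the test harness exposes; these methods are
// assumptions for the sketch, not a real API.
type Cluster interface {
	Partition(a, b int) // drop traffic between nodes a and b
	Heal()              // remove all partitions
	Crash(node int)     // kill -9 a node
	Restart(node int)   // bring it back with its on-disk state
	Size() int
}

// RunNemesis injects one random fault per interval until stop is closed.
func RunNemesis(c Cluster, interval time.Duration, stop <-chan struct{}) {
	r := rand.New(rand.NewSource(time.Now().UnixNano()))
	for {
		select {
		case <-stop:
			c.Heal()
			return
		case <-time.After(interval):
		}
		switch r.Intn(3) {
		case 0:
			c.Partition(r.Intn(c.Size()), r.Intn(c.Size()))
		case 1:
			n := r.Intn(c.Size())
			c.Crash(n)
			time.Sleep(interval / 2)
			c.Restart(n)
		default:
			c.Heal()
		}
	}
}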

Hardening pass

  • goreleaser, cosign signing, SBOM via cyclonedx-gomod.
  • GOMEMLIMIT from cgroup; automaxprocs.
  • PGO with a representative workload.
  • pprof + runtime/trace capture endpoints.
  • OTel traces across the Raft RPC layer (custom interceptor).
  • A RUNBOOK.md covering: leader-stuck triage, log-corruption recovery, snapshot-restore procedure.

Acceptance criteria

  • Public repo with all of the above.
  • A README that includes: a topology diagram, a load-test latency CDF, a Jepsen-style report.
  • Defensible answer to: "What happens during a network partition where a majority can elect a new leader but the old leader is still up?"

Skills exercised

  • Months 3 (concurrency), 5 (gRPC, observability), 6.21–6.22 (Raft, distributed storage).

Track 2-Service Mesh: A gRPC Microservices Mesh

Outcome: a multi-service mesh demonstrating a custom service registry, health checking, deadline propagation, retries, outlier ejection, and end-to-end OTel tracing across at least four interconnected services.

Functional spec

  • A Registry service: gRPC interface for Register, Deregister, Watch, LookupHealthy. Backed by an in-memory store with optional Raft replication (composes with Track 1).
  • A Sidecar library that:
  • Resolves service names via the registry (custom gRPC resolver.Builder; sketched after this list).
  • Implements client-side load balancing with round-robin + outlier ejection.
  • Propagates OTel context, deadlines, and a request_id.
  • Adds retry policy via service config.
  • Four demo services (e.g., user, order, inventory, payment) with a fan-out call graph that exercises retries, timeouts, and partial failures.
  • A mesh-cli for service inspection and chaos injection.
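A minimal sketch of the registry-backed resolver. RegistryClient and its LookupHealthy method stand in for the Registry stub above, and the mesh scheme name is arbitrary; the resolver.Builder / resolver.ClientConn types are grpc-go's real API.

// resolver.go - sketch of a registry-backed gRPC name resolver.
package sidecar

import (
	"google.golang.org/grpc/resolver"
)

// RegistryClient is the assumed interface to the Registry service.
type RegistryClient interface {
	LookupHealthy(service string) ([]string, error) // healthy host:port addresses
}

type registryBuilder struct{ reg RegistryClient }

func NewBuilder(reg RegistryClient) resolver.Builder { return &registryBuilder{reg: reg} }

func (b *registryBuilder) Scheme() string { return "mesh" } // dial targets look like mesh:///user

func (b *registryBuilder) Build(target resolver.Target, cc resolver.ClientConn,
	_ resolver.BuildOptions) (resolver.Resolver, error) {
	r := &registryResolver{reg: b.reg, service: target.Endpoint(), cc: cc}
	r.ResolveNow(resolver.ResolveNowOptions{}) // initial resolution
	return r, nil
}

type registryResolver struct {
	reg     RegistryClient
	service string
	cc      resolver.ClientConn
}

func (r *registryResolver) ResolveNow(resolver.ResolveNowOptions) {
	addrs, err := r.reg.LookupHealthy(r.service)
	if err != nil {
		r.cc.ReportError(err)
		return
	}
	var state resolver.State
	for _, a := range addrs {
		state.Addresses = append(state.Addresses, resolver.Address{Addr: a})
	}
	r.cc.UpdateState(state) // the balancer picks from these addresses
}

func (r *registryResolver) Close() {}

Clients then dial with something like grpc.Dial("mesh:///user", grpc.WithResolvers(NewBuilder(reg))); a production resolver would also run a Watch goroutine and push updates through UpdateState rather than resolving only on ResolveNow.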

Non-functional spec

  • Sub-millisecond p99 sidecar overhead per RPC.
  • Outlier ejection within 10 s of an endpoint going bad.
  • Deadline propagation: an inbound 1 s deadline must result in downstream calls seeing strictly less than 1 s remaining.
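gRPC carries the deadline across the wire (the grpc-timeout header) as long as each handler passes its inbound ctx into its outbound calls. The plain-context sketch below (no gRPC types involved) shows why the downstream side always sees strictly less than the original budget.

// deadline.go - why propagated deadlines shrink hop by hop.
package main

import (
	"context"
	"fmt"
	"time"
)

// handle simulates an inbound RPC handler that spends 200 ms of its budget
// before fanning out; the downstream call inherits whatever is left.
func handle(ctx context.Context) {
	time.Sleep(200 * time.Millisecond)
	callDownstream(ctx) // passing the SAME ctx is the propagation
}

func callDownstream(ctx context.Context) {
	if dl, ok := ctx.Deadline(); ok {
		fmt.Printf("downstream sees %v remaining\n", time.Until(dl).Round(10*time.Millisecond))
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second) // the inbound 1 s deadline
	defer cancel()
	handle(ctx) // prints roughly "downstream sees 800ms remaining"
}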

Architecture sketch

  • Each service runs the sidecar library in-process (no separate sidecar binary-keep it simple, defensible).
  • Registry uses etcd-io/raft if Track 1 also chosen; otherwise a single-instance with TLS.
  • Service discovery uses long-poll Watch via gRPC server-streaming.

Test rigor

  • Unit: resolver, balancer, interceptor stacks.
  • Integration: spin all four services in-process, exercise the call graph with testcontainers for the registry's Postgres if used.
  • Chaos: a chaos-injector middleware that drops, delays, or errors a random percentage of requests (one interceptor shape is sketched after this list).
  • Latency tests with ghz at multiple QPS levels.
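One way to realize the chaos middleware is a unary server interceptor; the rates, RNG handling, and package layout here are illustrative.

// chaos.go - sketch of a chaos-injecting unary server interceptor.
package chaos

import (
	"context"
	"math/rand"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// UnaryInterceptor fails a dropRate fraction of requests outright and delays
// a further delayRate fraction by up to maxDelay before handling them.
func UnaryInterceptor(dropRate, delayRate float64, maxDelay time.Duration) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (any, error) {
		switch r := rand.Float64(); {
		case r < dropRate:
			return nil, status.Error(codes.Unavailable, "chaos: injected failure")
		case r < dropRate+delayRate && maxDelay > 0:
			select {
			case <-time.After(time.Duration(rand.Int63n(int64(maxDelay)))):
			case <-ctx.Done():
				return nil, status.FromContextError(ctx.Err()).Err()
			}
		}
		return handler(ctx, req)
	}
}

Install it per service with grpc.NewServer(grpc.ChainUnaryInterceptor(chaos.UnaryInterceptor(0.01, 0.05, 200*time.Millisecond), ...)); a streaming variant follows the same pattern.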

Hardening pass

  • pprof everywhere; OTel everywhere.
  • goleak per-service.
  • A reproducible Docker Compose stack and a one-command make demo that brings it up with Jaeger and Prometheus.
  • Alarms wired: Prometheus rules on per-service error rate, p99 latency, registry watch lag.

Acceptance criteria

  • All four services deployable with make demo.
  • A flame graph demonstrating where sidecar overhead lives.
  • A trace screenshot showing deadline-propagated failure across the call chain.
  • Defensible answer to: "What happens if the registry leader is down for 30 seconds?"

Skills exercised

  • Months 3 (concurrency), 5 (gRPC mastery, observability), 6 (capstone defense, performance).

Track 3-Streaming Pipeline: A Kafka-Compatible Ingestion + Stream Processor

Outcome: a Kafka-protocol-compatible (subset) broker plus a stream-processing framework, with at-least-once delivery, exactly-once-effective consumer offsets, and replay.

Functional spec

  • Broker: implements a subset of the Kafka wire protocol (Produce, Fetch, Metadata, ListOffsets, OffsetCommit, OffsetFetch). Disk-backed log per partition; segment + index files.
  • Stream processor: a small framework letting users write func(input Stream[T]) Stream[U] with operators (Map, Filter, Window, Aggregate, Join); one possible shape is sketched after this list.
  • Consumer: offset management, rebalance protocol (subset).
  • Producer: idempotent producer (within session).
  • Compatibility: works with franz-go (the leading Kafka Go client) for at least Produce/Fetch.
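One possible shape for that operator surface, using channels and generics. The names track the spec but the API is illustrative (Window, Aggregate, and Join are omitted), and note that Map must be a free function because Go methods cannot introduce a new type parameter.

// stream.go - channel-backed sketch of the Stream[T] operator surface.
package stream

// Stream is a consume-once sequence of T values.
type Stream[T any] struct{ ch <-chan T }

// From wraps an existing channel as a Stream.
func From[T any](ch <-chan T) Stream[T] { return Stream[T]{ch: ch} }

// Map transforms each element; it runs in its own goroutine so operators pipeline.
func Map[T, U any](in Stream[T], f func(T) U) Stream[U] {
	out := make(chan U)
	go func() {
		defer close(out)
		for v := range in.ch {
			out <- f(v)
		}
	}()
	return Stream[U]{ch: out}
}

// Filter keeps only the elements for which keep returns true.
func Filter[T any](in Stream[T], keep func(T) bool) Stream[T] {
	out := make(chan T)
	go func() {
		defer close(out)
		for v := range in.ch {
			if keep(v) {
				out <- v
			}
		}
	}()
	return Stream[T]{ch: out}
}

// Collect drains the stream; a terminal operator for tests.
func Collect[T any](in Stream[T]) []T {
	var result []T
	for v := range in.ch {
		result = append(result, v)
	}
	return result
}

A real framework would also thread offsets, checkpoints, and cancellation (context) through these operators; the sketch only shows the type-level shape.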

Non-functional spec

  • 200K msgs/sec sustained on a single partition (commodity NVMe).
  • Sub-50 ms producer ack p99 with acks=all.
  • Replay from arbitrary offset.
  • Crash-recoverable: WAL fsync semantics documented.

Architecture sketch

  • One goroutine per partition for the disk-write path.
  • mmap'd index files; sequential append to log files (a simplified append path is sketched after this list).
  • Replication: Raft per partition (composes with Track 1) or a simpler primary-backup with a documented data-loss window.
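To make the append path concrete, a simplified single-goroutine segment: plain file writes instead of mmap, an in-memory offset index, no CRCs and no fsync policy. Names and framing are illustrative.

// segment.go - sketch of a disk-backed log segment: sequential appends plus
// an in-memory (logical offset -> file position) index. One goroutine per
// partition owns a Segment, so there is no locking here.
package wal

import (
	"encoding/binary"
	"os"
)

type Segment struct {
	f          *os.File
	baseOffset uint64  // first logical offset stored in this segment
	nextOffset uint64  // next logical offset to assign
	positions  []int64 // positions[i] = file position of record baseOffset+i
	size       int64   // bytes written so far
}

// NewSegment creates a fresh segment file (recovery of existing files is omitted).
func NewSegment(path string, baseOffset uint64) (*Segment, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_RDWR|os.O_TRUNC, 0o644)
	if err != nil {
		return nil, err
	}
	return &Segment{f: f, baseOffset: baseOffset, nextOffset: baseOffset}, nil
}

// Append frames the record as [4-byte length][payload], writes it at the end
// of the file, and records where it landed. Returns the assigned offset.
func (s *Segment) Append(record []byte) (uint64, error) {
	var hdr [4]byte
	binary.BigEndian.PutUint32(hdr[:], uint32(len(record)))
	pos := s.size
	if _, err := s.f.Write(hdr[:]); err != nil {
		return 0, err
	}
	if _, err := s.f.Write(record); err != nil {
		return 0, err
	}
	off := s.nextOffset
	s.positions = append(s.positions, pos)
	s.size += int64(4 + len(record))
	s.nextOffset++
	return off, nil
}

// Read returns the record stored at the given logical offset.
func (s *Segment) Read(off uint64) ([]byte, error) {
	pos := s.positions[off-s.baseOffset]
	var hdr [4]byte
	if _, err := s.f.ReadAt(hdr[:], pos); err != nil {
		return nil, err
	}
	buf := make([]byte, binary.BigEndian.Uint32(hdr[:]))
	_, err := s.f.ReadAt(buf, pos+4)
	return buf, err
}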

Test rigor

  • Unit: log segment boundary handling, offset arithmetic, index lookup.
  • Integration: produce-and-consume tests against franz-go.
  • Fuzz: the protocol parser fuzzed against malformed records (a fuzz-test shape is sketched after this list).
  • Crash test: kill -9 during write; restart; verify WAL recovery.
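The fuzz target can use Go's native fuzzing (go test -fuzz); ParseRequest is the hypothetical parser entry point.

// parser_fuzz_test.go - sketch of fuzzing the wire-protocol parser.
package protocol

import "testing"

func FuzzParseRequest(f *testing.F) {
	// Seed corpus: a few well-formed frames captured from franz-go traffic.
	f.Add([]byte{0x00, 0x00, 0x00, 0x01})
	f.Fuzz(func(t *testing.T, data []byte) {
		// The invariant: never panic or hang on malformed input.
		// Returning an error is fine; crashing is a bug.
		req, err := ParseRequest(data)
		if err == nil && req == nil {
			t.Fatal("nil request with nil error")
		}
	})
}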

Hardening pass

  • pprof for the hot path (the produce-write loop must be 0 allocs/op per record; one way to enforce that budget is sketched after this list).
  • PGO with a sustained-throughput profile.
  • runtime/trace artifact showing zero scheduler stalls under load.
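One way to keep the 0 allocs/op budget honest in CI; appendRecord is a stand-in for the real produce-write step, while b.ReportAllocs and testing.AllocsPerRun are standard library.

// produce_bench_test.go - sketch of guarding the hot path's allocation budget.
package broker

import "testing"

// appendRecord stands in for the real produce path: frame the record and
// append it into a preallocated buffer owned by the partition goroutine.
func appendRecord(dst, rec []byte) []byte {
	return append(dst, rec...)
}

func BenchmarkProduceAppend(b *testing.B) {
	rec := make([]byte, 512)
	dst := make([]byte, 0, 1<<20) // preallocated so the hot loop never grows it
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		dst = appendRecord(dst[:0], rec)
	}
}

// TestProduceAppendZeroAlloc fails fast if the hot path starts allocating.
func TestProduceAppendZeroAlloc(t *testing.T) {
	rec := make([]byte, 512)
	dst := make([]byte, 0, 1<<20)
	allocs := testing.AllocsPerRun(1000, func() {
		dst = appendRecord(dst[:0], rec)
	})
	if allocs != 0 {
		t.Fatalf("produce path allocates %.1f objects per record, want 0", allocs)
	}
}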

Acceptance criteria

  • Public repo, a reference-grade README.
  • A throughput/latency benchmark vs. real Kafka on the same hardware.
  • A replay demo showing rewinding consumer offset to a specific timestamp.

Skills exercised

  • Months 2 (memory + GC tuning, allocation discipline), 3 (concurrency at 200K msgs/sec), 5 (observability), 6.22 (storage patterns).

Cross-Track Requirements

Regardless of track:

  • Hardening template integrated. The hardening/ template from Appendix A applies.
  • Architectural Decision Records (ADRs). At least three for the capstone, each ~1 page.
  • Threat model. One page minimum, no matter the track.
  • Defense readiness. You should be able to walk a reviewer through the code in 45 minutes and answer "what fails first under load / fuzzing / a malicious input / a network partition?"

The track choice signals career direction: Track 1 for distributed-systems infrastructure roles, Track 2 for platform/SRE/networking roles, Track 3 for data-infra/streaming roles. Pick based on where you want the next interview loop, not on what looks easiest.

Worked example - Week 6: reading a GODEBUG=gctrace=1 output

Companion to Go Mastery → Month 02 → Week 6: The Garbage Collector. The week explains the tricolor concurrent mark-sweep algorithm. This page walks one real gctrace=1 line from a running program so the next time you see it in production logs, every field has meaning.

The program

// allocator.go
package main

import (
    "fmt"
    "time"
)

func main() {
    var sink []*[1024]byte
    for i := 0; i < 1_000_000; i++ {
        b := new([1024]byte)
        b[0] = byte(i)
        sink = append(sink, b)
        if i%50_000 == 0 {
            time.Sleep(10 * time.Millisecond) // give GC room to breathe
        }
    }
    fmt.Println("allocated", len(sink), "buffers")
}

A small, deliberately-allocating program. Each iteration allocates a 1 KB array and keeps a pointer to it. We expect the heap to grow steadily and GC cycles to happen periodically.

Running it with gctrace

$ GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
gc 1 @0.003s 0%: 0.012+0.31+0.020 ms clock, 0.10+0.054/0.30/0.072+0.16 ms cpu, 4->4->2 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 2 @0.012s 0%: 0.011+0.46+0.030 ms clock, 0.094+0.21/0.45/0.10+0.24 ms cpu, 4->5->3 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 3 @0.024s 0%: 0.014+0.66+0.027 ms clock, 0.11+0.23/0.66/0.18+0.22 ms cpu, 6->7->5 MB, 7 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 5 @0.094s 1%: 0.014+2.4+0.030 ms clock, 0.11+0.49/2.4/0.42+0.24 ms cpu, 18->20->15 MB, 19 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 6 @0.187s 1%: 0.013+3.6+0.025 ms clock, 0.10+0.84/3.6/0.93+0.20 ms cpu, 30->33->24 MB, 31 MB goal, 0 MB stacks, 0 MB globals, 8 P

Take one line - gc 4 - and decode every field.

The fields, in order

gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
  • gc 4 - the 4th GC cycle since program start.
  • @0.046s - this cycle began 46 ms after program start.
  • 1% - the fraction of the program's available CPU time spent in GC since start, cumulative across all cycles so far.
  • 0.015+1.4+0.029 ms clock - wall-clock duration of the three GC phases:
    1. 0.015 ms - stop-the-world sweep termination. The runtime pauses all goroutines briefly to finish any leftover sweeping from the previous cycle.
    2. 1.4 ms - concurrent mark + scan. This is the bulk of the work. Goroutines keep running while the GC marks reachable objects.
    3. 0.029 ms - stop-the-world mark termination. A second brief pause to finalize the mark.

The two pauses (0.015 + 0.029 = ~44 µs total) are what your latency-sensitive code feels. The 1.4 ms middle is concurrent and doesn't block.

  • 0.12+0.46/1.4/0.18+0.23 ms cpu - CPU time summed across all Ps (processors). The layout mirrors the wall-clock field, with the concurrent mark phase split into three parts: assist time (GC work done by allocating goroutines), background worker time, and idle GC time.

  • 10->12->9 MB - heap size:

    1. 10 MB - heap size when GC started.
    2. 12 MB - heap size when marking finished (the peak; allocation continued during the concurrent mark).
    3. 9 MB - heap size after sweep finishes (live heap retained).

So this cycle reclaimed ~3 MB.

  • 11 MB goal - the heap goal for this cycle: the heap size at which the runtime wants marking to finish. With GOGC=100 the goal is roughly the previous cycle's live heap plus 100% headroom (plus scannable stacks and globals, ~0 here): gc 3 ended with ~5 MB live, hence the ~11 MB goal for gc 4. This cycle's 9 MB live set in turn puts the next goal near 9 × 2 = 18 MB - and the gc 5 line indeed shows 19 MB. The actual trigger fires somewhat below the goal so concurrent marking can finish before the heap reaches it.

  • 0 MB stacks - total goroutine stack memory. We have only one goroutine here.

  • 0 MB globals - package-level data. Our program has almost none.

  • 8 P - 8 processors (P) participating. Matches GOMAXPROCS=8.

What changed across cycles

Watch the 4->5->3, 6->7->5, 10->12->9, 18->20->15, 30->33->24 progression. The live heap (third number) grows as our sink slice retains more pointers, and the heap size at which each new cycle starts (first number of the next line) tracks it: roughly 2× the previous live size, because GOGC=100 grants 100% headroom over the live set.

The pause-free middle phase grew too: 0.31 → 0.46 → 0.66 → 1.4 → 2.4 → 3.6 ms. That's expected; marking takes time proportional to the live object graph. The stop-the-world phases stayed flat (~15 µs + ~30 µs) regardless of heap size - that's the whole point of Go's concurrent GC.

What this tells you

  • Go's GC pauses are measured in tens of microseconds. The "GC pause" most people complain about in older runtimes does not apply here.
  • "GC took 3.6 ms" sounds bad, but 99.9% of that was concurrent. Your goroutines kept running.
  • Heap headroom is the lever. GOGC=200 doubles the headroom (less frequent, bigger cycles); GOGC=50 halves it (more frequent, smaller). GOMEMLIMIT (Go 1.19+) caps absolute heap regardless of GOGC.
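Both knobs also have programmatic equivalents in runtime/debug, which is convenient in tests or when deriving the limit from the container's cgroup at startup; the values below are illustrative.

// gcknobs.go - programmatic equivalents of GOGC and GOMEMLIMIT.
package main

import (
	"fmt"
	"runtime/debug"
)

func main() {
	prevPct := debug.SetGCPercent(100)        // same meaning as GOGC=100; returns the old value
	prevLim := debug.SetMemoryLimit(64 << 20) // soft limit in bytes (GOMEMLIMIT), Go 1.19+
	fmt.Println("previous GOGC:", prevPct, "previous limit:", prevLim)
}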

Try tuning

Re-run with different settings:

$ GOGC=50 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# More GC cycles, smaller peaks, slightly more CPU on GC.

$ GOGC=200 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# Fewer GC cycles, larger peaks, lower GC CPU but more memory.

$ GOMEMLIMIT=20MiB GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
# Soft cap: when heap approaches 20 MiB, GC runs more aggressively
# to stay under the limit even if GOGC would let it grow.

The trap

Reading gctrace and concluding "we should tune GOGC in production." Usually no. In 95% of cases, the right answer is:

  1. Use the default GOGC=100.
  2. Set GOMEMLIMIT to ~80% of your container's memory limit so the GC starts pushing back before OOM.
  3. Use runtime/pprof heap profiles to find allocation hotspots and fix the code, not the GC.

Tuning GOGC is a last resort and almost always trades latency for throughput (or vice versa) - not a free win.

Exercise

  1. Run the program above. Identify the GC cycle in which the heap first crossed 100 MB.
  2. Add a sync.Pool for the byte arrays (a starting point is sketched after this list). Re-run with the same GOGC=100. How many GC cycles happen now? How does the heap profile change?
  3. Capture an execution trace (wrap the allocation loop with runtime/trace.Start/Stop, or move it into a test and run go test -trace=trace.out), then open it with go tool trace. Find a GC cycle. See the mark assists, the background GC workers, the STW phases.
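For exercise 2, one hedged starting point for the pooling looks like the sketch below. Note that a pool only pays off if buffers are Put back and reused; the original program retains every buffer in sink, so you will also need to change the retention pattern (process and release each buffer) before the GC cycle count drops.

// poolhint.go - sketch for exercise 2; process and its usage are illustrative.
package main

import "sync"

// bufPool recycles the 1 KB arrays so steady-state allocation approaches zero.
var bufPool = sync.Pool{
	New: func() any { return new([1024]byte) },
}

func process(i int) {
	b := bufPool.Get().(*[1024]byte)
	b[0] = byte(i)
	// ... do the real work with b here ...
	bufPool.Put(b) // return the buffer so later iterations reuse it
}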