Go Mastery¶
Runtime, GMP scheduler, GC, channels, distributed systems.
Go Mastery Blueprint: A 24-Week Master-Level Syllabus¶
Authoring lens: Senior Staff Software Engineer / Distributed Systems Architect.
Target outcome: A graduate of this curriculum should be capable of (a) submitting non-trivial PRs against golang/go (runtime, compiler, or stdlib), (b) owning a high-throughput distributed control plane (Kubernetes-class), or (c) operating a hyperscale fleet of Go services with a coherent observability story and zero-downtime deploys.
This is not "Tour of Go in 24 weeks." It assumes the reader can write working Go and has shipped production code in some language. The premise: most Go bugs at scale are not language bugs-they are runtime, scheduler, allocator, and GC bugs in disguise. This curriculum surfaces all four.
Repository Layout¶
| File | Purpose |
|---|---|
| `00_PRELUDE_AND_PHILOSOPHY.md` | The "Go-ness" of Go; the design ethics; the cost model; reading list. |
| `01_MONTH_RUNTIME_FOUNDATIONS.md` | Weeks 1–4. Toolchain, GMP scheduler, stack management, escape analysis. |
| `02_MONTH_MEMORY_AND_GC.md` | Weeks 5–8. Memory layout, tricolor GC, interface itabs, GOMEMLIMIT. |
| `03_MONTH_CONCURRENCY_MASTERY.md` | Weeks 9–12. Channel internals, atomics, context, leak/deadlock prevention. |
| `04_MONTH_REFLECTION_CODEGEN_PLUGINS.md` | Weeks 13–16. reflect, go/ast, go generate, plugin, go-plugin. |
| `05_MONTH_PRODUCTION_DISTRIBUTED.md` | Weeks 17–20. DDD, observability, gRPC, hardened testing. |
| `06_MONTH_CAPSTONE.md` | Weeks 21–24. Consensus, distributed storage, perf tuning, capstone defense. |
| `APPENDIX_A_PRODUCTION_HARDENING.md` | pprof, trace, golangci-lint, race detector, ldflags, build tags. |
| `APPENDIX_B_DATA_STRUCTURES_AND_PATTERNS.md` | Build-from-scratch reference: lock-free queue, ring buffer, Bloom filter, LRU. |
| `APPENDIX_C_CONTRIBUTING_TO_GO.md` | The Go pipeline; Gerrit; first CL playbook; runtime PR strategy. |
| `CAPSTONE_PROJECTS.md` | Three terminal projects: Raft KV store, gRPC mesh, streaming pipeline. |
How Each Week Is Structured¶
Every weekly module follows the same five-section format so the reader can budget time:
- Conceptual Core-the why, with a mental model.
- Mechanical Detail-the how, down to runtime source where relevant (`src/runtime/proc.go`, `src/runtime/mgc.go`, etc.).
- Lab-a hands-on exercise that cannot be completed without internalizing the concept.
- Idiomatic & `golangci-lint` Drill-read 2–3 lints, refactor a sample to silence them, understand why each lint exists.
- Production Hardening Slice-a `pprof`/`trace`/`-race`/`go vet` micro-task that compounds into a publishable hardening template.
Each week is sized for ~12–16 focused hours. Skip the labs at your peril; the labs are the curriculum.
Progression Strategy¶
The phases form a dependency DAG, not a linear track:
```
Runtime Foundations ──► Memory & GC ──► Concurrency ──► Reflection / Codegen / Plugins
         │                   │               │                        │
         └───────────────────┴─────────┬─────┴────────────────────────┘
                                       ▼
                       Production & Distributed Systems
                                       │
                                       ▼
                              Capstone & Defense
```
The Production Hardening slice is intentionally orthogonal-it accumulates a `hardening/` template that, by week 24, is a publishable Go module starter.
Non-Goals¶
- This curriculum does not cover web frameworks (Gin/Echo/Fiber) as primary subjects. They appear only as integration surfaces in Month 5; `net/http` is sufficient for everything taught here.
- Front-end / GopherJS / TinyGo are out of scope (pointers given in `00_PRELUDE`).
- "Why Go is better than X" advocacy is explicitly avoided. The reader should finish the program able to argue against using Go when it is the wrong tool.
Capstone Tracks (pick one in Month 6)¶
- Distributed Storage Track-a Raft-replicated key-value store with linearizable reads, snapshot/restore, multi-region demo.
- Service Mesh Track-a gRPC-based microservices mesh with a custom service registry, health checking, deadline propagation, and outlier ejection.
- Streaming Pipeline Track-a Kafka-protocol-compatible (or NATS-style) ingestion + stream-processing pipeline with at-least-once delivery and replay.
Details in CAPSTONE_PROJECTS.md.
Versioning Note¶
This curriculum targets Go 1.22+ as the baseline (range-over-int and the fixed loopvar semantics landed in 1.22; range-over-func iterators became stable in 1.23; slog and PGO have been stable since 1.21; GOMEMLIMIT since 1.19; async preemption since 1.14). Do not start this curriculum on a Go older than 1.22-too many of the modern idioms will be unavailable.
Prelude-The Philosophy Behind the Syllabus¶
Sit with this document for an evening before week 1. The rest of the curriculum is mechanically dense; this is the only chapter where we step back and define the shape of the discipline.
1. Go Is a Runtime, Not a Language¶
The most damaging misconception a Go engineer can hold is that "Go is just C with garbage collection and goroutines." A working master-level practitioner thinks the inverse:
Go is a runtime-a sophisticated user-space scheduler, allocator, and concurrent garbage collector-that ships with a small, deliberately under-featured language attached.
Almost every interesting performance bug in production Go has its root in the runtime, not in the language semantics. Almost every elegant high-throughput Go architecture is a thin layer over runtime primitives (runtime.Gosched, runtime.LockOSThread, runtime/pprof, runtime/trace, runtime/debug.SetGCPercent, debug.SetMemoryLimit).
Internalize this and the rest of the curriculum makes sense.
2. The Five-Axis Cost Model¶
A working Go engineer reasons about every line of code along five axes simultaneously:
| Axis | Question to ask |
|---|---|
| Allocation | Does this escape to the heap? Could it stay on the stack? |
| Scheduling | Will this goroutine block? On what-channel, syscall, lock, network? Will it preempt the P? |
| GC pressure | How much live data does this add? How long-lived? Pointer-rich? |
| Concurrency safety | Is this aliasable across goroutines? Is the access pattern visible to the race detector? |
| Failure | What happens on panic? On context.Canceled? On a deadlocked send? |
Beginner courses teach axis 4 only (and incompletely). This curriculum forces all five into your hands by week 12.
3. The "Go Way"-Aesthetic as Engineering Constraint¶
Go's design ethic is, famously, "simplicity." That word is doing more work than newcomers think. Specifically:
- Composition over inheritance. Embedding, not subclassing. Interfaces are implicitly implemented; consumers define interfaces, not producers.
- Errors are values. No exceptions. `if err != nil` is not boilerplate-it is a deliberate choice to make every failure path local and visible.
- Concurrency by communication. "Don't communicate by sharing memory; share memory by communicating." Channels first, mutexes when channels would obscure intent.
- The stdlib is the framework. `net/http`, `encoding/json`, `database/sql`, `context`, `log/slog`-together they cover ~80% of any service. Reach for third-party only when stdlib runs out.
- Tooling is part of the language. `gofmt`, `go vet`, `go test -race`, `go test -fuzz`, `pprof`, `trace`. A Go engineer who does not know these is half-trained.
If you fight these defaults, you will fight the language. If you internalize them, you will write code that any other Go engineer can pick up in under a day. That is the actual deliverable Go optimizes for.
4. The Reading List¶
These are referenced throughout the curriculum. You are not expected to read them cover-to-cover before starting; they are pinned tabs.
Primary
- The Go Programming Language (Donovan & Kernighan). The canonical text.
- 100 Go Mistakes and How to Avoid Them (Teiva Harsanyi). The single best second book.
- Concurrency in Go (Katherine Cox-Buday). Read once at week 9, again at week 12.
- The Go Memory Model-go.dev/ref/mem. Normative spec for memory ordering.
- Effective Go + the Go Code Review Comments wiki-both ~30 minutes of reading, both load-bearing for code-review fluency.
Runtime & internals
- The runtime source itself: src/runtime/proc.go (scheduler), src/runtime/mgc.go (GC), src/runtime/malloc.go (allocator), src/runtime/stack.go, src/runtime/chan.go, src/runtime/iface.go. Treat these as primary literature, not reference.
- Go internals by jhh.io and Russ Cox's "research.swtch.com" archive. Particularly Go's Work-Stealing Scheduler and The Tricolor Garbage Collector.
- Dmitry Vyukov's writings on the scheduler (Vyukov co-designed the work-stealing scheduler).
- Madhav Jivrajani's GoTeam talks ("Go scheduler: a deep dive").
Distributed systems canon (not Go-specific, but mandatory)
- Lamport, Time, Clocks, and the Ordering of Events.
- Diego Ongaro, In Search of an Understandable Consensus Algorithm (the Raft paper). Read in week 21.
- Brewer, CAP Twelve Years Later. The original CAP paper is famously misread; this is the cleaner statement.
- Kleppmann, Designing Data-Intensive Applications. Read chapters 5–9 in the back half of the curriculum.
Adjacent canon
- Drepper, What Every Programmer Should Know About Memory. Re-read in week 5.
- Herlihy & Shavit, The Art of Multiprocessor Programming, chapters 7, 9, 13.
5. Curriculum Philosophy: "Read the Source, Ship the Lab"¶
Three rules govern every module:
- Source first, blog second. When the curriculum says "study the channel send path," it means open `src/runtime/chan.go` and read `chansend1`. Blogs go stale; commits are dated.
- One lab per concept, one CL per phase. By the end of each month, the reader has produced one open-source-quality artifact (module, gist, or upstream contribution)-not a notebook of toy snippets.
- The race detector and `pprof` are the teachers. When you do not understand why a program misbehaves, the first response is `go test -race`, the second is `go tool pprof`, and only the third is to ask another human.
6. What Go Is Not For¶
A graduate of this curriculum should be able to argue these points in a design review without sounding ideological:
- CPU-bound numerical code. No SIMD intrinsics in the language; the compiler's autovectorizer is conservative; the GC tax is non-zero. Use Rust, C++, or call out to BLAS/LAPACK via cgo.
- Hard-real-time systems. GC pauses are short but non-zero. Audio DSP, motor control, kernel drivers-wrong tool.
- Heavy generic-numeric libraries. Generics landed in 1.18 but have constraints (no method-on-type-parameter dispatch, no specialization). This is fine for collections; it is awkward for numpy-equivalent libraries.
- Code where the team will demand inheritance hierarchies. Go has no inheritance. A team that resists composition will fight Go forever.
The signal that Go is the right tool: you have a concurrency, deployability, and team-onboarding-speed constraint that ranks above raw CPU efficiency or expressiveness.
7. A Note on AI-Assisted Workflows¶
Modern Go authors use LLM tooling. Three rules:
- **Never accept generated concurrent code without `-race`.** The most common failure mode of generated Go is plausible-looking but racy patterns (closing a channel from multiple goroutines, reading a `map` from one goroutine while writing from another).
- Verify generated `interface` satisfaction. Models hallucinate methods. Always compile.
- Treat suggested context-handling skeptically. The most common context bug-capturing the request `ctx` in a goroutine that outlives the request-is endemic in generated code.
You are now ready for Week 1. Open 01_MONTH_RUNTIME_FOUNDATIONS.md.
Month 1-Runtime Foundations: Toolchain, GMP, Stacks, Escape Analysis¶
Goal: by the end of week 4 you can (a) describe the GMP scheduler model and trace a goroutine through go func() → runqput → findrunnable → execution, (b) predict whether a value will escape to the heap by reading the source, (c) explain why a goroutine that never yields can stall a P, and (d) ship a small CLI as a statically linked binary with reproducible builds.
Weeks¶
- Week 1 - The Toolchain and the Build Pipeline
- Week 2 - The GMP Scheduler Model
- Week 3 - Stack Management
- Week 4 - Escape Analysis and the Inliner
Week 1 - The Toolchain and the Build Pipeline¶
1.1 Conceptual Core¶
- The Go toolchain is a single binary that bundles the compiler, linker, formatter, dependency manager, test runner, race detector, profiler, and tracer. Every other ecosystem you've used distributes these as separate tools; Go's choice to integrate them is itself a design statement.
- `go build` is not just a compiler invocation. It is a dependency graph walker that:
  - Resolves the module graph (`go.mod` + `go.sum`).
  - Computes the build action graph (run with `go build -x` or `go build -n` to inspect).
  - Compiles each package to an archive (`.a`) cached in `$GOCACHE` (default `$HOME/.cache/go-build`).
  - Links into a final binary or `.so`/`.a`.
- The build cache is content-addressed. Identical inputs → identical outputs → cache hit. This is what makes `go build` feel instantaneous on second invocations.
1.2 Mechanical Detail¶
- Module mode is the only mode. GOPATH mode is dead-do not start a project under `$GOPATH/src` in 2026.
- `go.mod` directives: `module`, `go`, `toolchain`, `require`, `replace`, `exclude`, `retract`. Memorize all of them.
- Minimum Version Selection (MVS): Go's resolver picks the minimum version of each dependency that satisfies all `require` directives. This is the opposite of npm/pip "latest compatible." Read Russ Cox's MVS paper.
- `go.sum` is a content-addressed integrity ledger, not a lock file. It records hashes of every module version ever depended on, including transitively-dropped versions. Never edit by hand.
- The `vendor/` directory: dead in OSS, alive in air-gapped enterprise. Use `go mod vendor` only when offline builds are mandatory.
- Useful introspection commands:
  - `go env` - every environment variable the toolchain consults.
  - `go list -m all` - every module in the build.
  - `go list -deps -json ./...` - the package graph as JSON.
  - `go version -m` - the modules embedded in a built binary (BuildInfo).
1.3 Lab-"Hello World, Audited"¶
- Create `hello-audited`. Set `go 1.22` and a `toolchain go1.22.x` directive.
- Build with `go build -trimpath -ldflags="-s -w -X main.version=v0.1.0"`. Run `go version -m ./hello-audited`.
- Strip with `strip` and compare. Cross-compile to `linux/arm64`, `darwin/arm64`, `windows/amd64` with `GOOS=... GOARCH=... go build`.
- Document the size delta from each flag in `NOTES.md`. `-s -w` typically saves ~30%; `-trimpath` is a reproducibility flag (no local paths in the binary), not a size flag.
- Inspect the binary with `go tool nm` and `go tool objdump`. Identify the runtime symbols (`runtime.main`, `runtime.gcStart`, `runtime.schedule`).
1.4 Idiomatic & golangci-lint Drill¶
- Install `golangci-lint`. Enable a strict config: `errcheck`, `govet`, `staticcheck`, `gosimple`, `ineffassign`, `revive`, `gocritic`, `gosec`, `bodyclose`, `nilerr`, `prealloc`, `unconvert`. Run on a small repo. Read each finding's URL and understand the rationale.
1.5 Production Hardening Slice¶
- Add a `Makefile` (or `Taskfile.yml`) target that runs `gofmt -l -d`, `go vet ./...`, `golangci-lint run`, `go test -race -count=1 ./...`, `go build -trimpath`. This is the baseline CI invocation; every subsequent week's hardening slice extends it.
- Adopt `go-licenses` to scan dependency licenses. Commit the report.
Week 2 - The GMP Scheduler Model¶
2.1 Conceptual Core¶
- G = Goroutine. A user-space concurrent execution context with a stack, program counter, and runtime metadata. Cheap (~2 KB initial stack), millions per process.
- M = Machine. An OS thread (`pthread` on POSIX). Has a kernel-managed stack. Goroutines run on M's.
- P = Processor. A logical execution context that holds a local run queue of runnable G's plus a few caches (per-P mcache for the allocator). `GOMAXPROCS` sets the number of P's; default = number of CPU cores.
- The invariant: an M needs a P to run Go code (specifically, to call into the scheduler). When an M makes a blocking syscall, it hands off its P to another M so other goroutines can run. This is the magic that makes blocking syscalls cheap in Go.
2.2 Mechanical Detail¶
Read these source files in this order:
1. `src/runtime/runtime2.go` - the data structures: `g`, `m`, `p`, `schedt`. Keep this open as a reference.
2. `src/runtime/proc.go::schedule` - the heart of the scheduler: pick a runnable G and switch to it.
3. `src/runtime/proc.go::findrunnable` - the search order: local runq → global runq → netpoll → work-stealing from peer P's.
4. `src/runtime/proc.go::newproc` - what happens at `go func()`. Particularly note `runqput` and the work-stealing-friendly slot ordering.
Key concepts:
- Local run queue: each P has a 256-slot ring buffer of runnable G's. Push to tail, pop from head; work-stealers take from peer P's heads.
- Global run queue: a doubly-linked list under sched.lock. Used as overflow for local queues and for goroutines woken from netpoll.
- Work stealing: when a P's local queue is empty, it picks a victim P at random and steals half its queue. This is what amortizes load across cores.
- runtime.LockOSThread(): pin the calling goroutine to its current M. Necessary for cgo calls to OS APIs that require thread affinity (most GUI toolkits, OpenGL, some signal handlers).
- runtime.Gosched(): cooperative yield. The goroutine is moved back to the global queue.
- Asynchronous preemption (since Go 1.14): tight CPU loops without function-call boundaries used to monopolize a P; now the runtime sends SIGURG to the M to force a safe-point. Read runtime/preempt.go.
- netpoller: integrates with epoll/kqueue/IOCP. When a goroutine blocks on a network read, it parks and the M can run other goroutines. The goroutine is unparked when the FD is ready.
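Async preemption is easy to observe directly. In this sketch (illustrative names, timings assume a lightly loaded machine), a goroutine spins in a loop with no function-call preemption points, yet the main goroutine still gets the single P back on Go 1.14+:

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
	"time"
)

// spin loops with no stack-check preamble in the body: atomic.Bool.Load
// is intrinsified, so there is no cooperative preemption point here.
// Pre-1.14, this could monopolize a P forever; sysmon now forces a
// safe point via signal (SIGURG on POSIX).
func spin(stop *atomic.Bool, counter *int64) {
	for !stop.Load() {
		*counter++
	}
}

// demo reports whether main regained the only P while a spinner ran.
func demo() bool {
	prev := runtime.GOMAXPROCS(1)
	defer runtime.GOMAXPROCS(prev)

	var stop atomic.Bool
	var n int64
	go spin(&stop, &n)

	start := time.Now()
	time.Sleep(20 * time.Millisecond) // waking up requires preempting the spinner
	woke := time.Since(start) < 5*time.Second
	stop.Store(true)
	return woke
}

func main() {
	fmt.Println("main regained the P:", demo())
}
```

Run the same program with `GODEBUG=asyncpreemptoff=1` to see the pre-1.14 behavior.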
2.3 Lab-"Schedule Forensics"¶
Build a tiny program that:
1. Spawns 1,000 goroutines, each computing a busy CPU loop for 10ms.
2. Records the time-to-completion distribution.
3. Re-runs with GOMAXPROCS=1, =2, =N (your core count).
4. Re-runs with runtime.Gosched() inserted in the loop.
5. Re-runs with the loop replaced by time.Sleep(10*time.Millisecond) (the netpoller path).
Tabulate the latency distributions in `NOTES.md`. Explain why `GOMAXPROCS=1` without `Gosched()` produces high tail latency. Then capture an execution trace with `runtime/trace` and open it in `go tool trace`. Identify the per-P timeline, GC pauses, and proc transitions.
2.4 Idiomatic & golangci-lint Drill¶
- `staticcheck SA1019` (deprecated APIs), `staticcheck SA5008` (forgotten `defer` vs loop variables), `revive: confusing-naming`. Less about scheduler correctness here, more about hygiene.
2.5 Production Hardening Slice¶
- Wire `runtime/trace` to a `/debug/trace` HTTP handler (gated by build tag `debug`). Add `pprof` handlers (`net/http/pprof` import for side effect). Document how to capture a 10-second trace from a running process.
Week 3 - Stack Management¶
3.1 Conceptual Core¶
- Every goroutine has its own stack, separate from the OS thread stack. Initial size: 2 KB. Stacks grow (and shrink) dynamically. There is no fixed maximum per goroutine until you hit the limit set by `runtime/debug.SetMaxStack` (default 1 GB on 64-bit).
- Contiguous stacks (since Go 1.4): when a goroutine needs more stack, the runtime allocates a new, larger contiguous region, copies the old stack into it, and rewrites all internal pointers. This is what the compiler-emitted "stack guard" preamble enables.
- The relationship to escape analysis is direct: stack-allocated values are free; heap-allocated values cost an allocation, GC tracking, and a future scan. Master Go performance work is, in large part, the art of keeping values on the stack.
3.2 Mechanical Detail¶
- Stack growth flow (`src/runtime/stack.go`):
  - The function prologue checks `g.stackguard0` against `SP`.
  - If `SP < stackguard0`, jump to `runtime.morestack`.
  - `morestack` calls `newstack`, which allocates a new stack 2× the old size, copies, and rewrites pointers (including pointers to local variables and function parameters).
  - Resume execution.
- Stack shrinking is performed by the GC when it observes the goroutine is using less than 1/4 of its stack.
- Pointer adjustment during copy: this is the reason Go does not let you take stable pointers to stack-allocated locals across goroutine boundaries-moving the stack invalidates them. Escape analysis catches this; values that escape are heap-promoted.
- Unsafe consequences: storing an address as a `uintptr` (rather than `unsafe.Pointer`) does not protect against stack moves-the runtime will not update the stored value. The `unsafe` package docs make the valid conversion patterns explicit.
3.3 Lab-"Stack Growth in the Wild"¶
- Write a recursive function `func depth(n int) int { if n == 0 { return 0 }; var buf [256]byte; _ = buf; return 1 + depth(n-1) }`.
- Run with progressively larger `n`. Use `GODEBUG=gctrace=1,scheddetail=1` and observe stack growth events.
- Re-run under `runtime.ReadMemStats` snapshots, recording `StackInuse` and `StackSys`.
- Now write the same function in a goroutine-per-call style and observe how stack churn changes.
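A sketch of the measurement harness. The MemStats read is taken at the deepest frame, before any unwinding, so the grown stack is actually visible; `depth` consumes `buf[0]` only to keep the array live in each frame (names and sizes are illustrative):

```go
package main

import (
	"fmt"
	"runtime"
)

// depth recurses n frames, each holding a 256-byte buffer on the
// stack, and calls f at the deepest point (before unwinding).
func depth(n int, f func()) byte {
	var buf [256]byte
	buf[0] = byte(n) // touch buf so the frame really holds it
	if n == 0 {
		f()
		return buf[0]
	}
	return buf[0] + depth(n-1, f)
}

// stackInuse reports bytes currently in use by goroutine stacks.
func stackInuse() uint64 {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	return ms.StackInuse
}

func main() {
	before := stackInuse()
	at := make(chan uint64, 1)
	// A fresh goroutine starts at ~2 KB and must grow repeatedly to
	// hold 20,000 frames of ~256+ bytes each.
	go depth(20000, func() { at <- stackInuse() })
	grown := <-at
	fmt.Printf("StackInuse: %d KiB before, %d KiB at depth 20000\n",
		before/1024, grown/1024)
}
```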
3.4 Idiomatic & golangci-lint Drill¶
- `gocritic: deepEqualByteSlice`, `prealloc`. The latter flags ranged loops appending to a slice that could be `make`'d with capacity-relevant to allocator pressure but not stack-specific.
3.5 Production Hardening Slice¶
- Add `runtime/debug.SetMaxStack(64 * 1024 * 1024)` (64 MiB) in your service binaries. The default of 1 GiB is rarely what you want; bounding the per-goroutine stack catches runaway recursion early.
Week 4 - Escape Analysis and the Inliner¶
4.1 Conceptual Core¶
- Escape analysis is the compiler pass that decides whether a variable can live on the stack (cheap, freed on return) or must live on the heap (allocated by `mallocgc`, GC-tracked). It is not a runtime decision; it is purely static.
- The two questions the compiler asks for each `&x` / `new(T)` / `make(...)`:
  - Does the address outlive the current function? (Returned, stored in a heap object, captured by a goroutine, boxed into an interface.)
  - Is the size statically known and bounded? (Variable-length stack allocations are limited.)
- If yes to escape: heap. If no: stack. The compiler dumps its reasoning under `-gcflags=-m`.
4.2 Mechanical Detail¶
- Common escape triggers:
  - Returning `&x` for a local `x`.
  - Storing `&x` in a heap-allocated struct, slice, or map.
  - Capturing `x` by a goroutine closure (the goroutine outlives the frame).
  - Boxing a value into an `interface{}` of nontrivial size-the value is copied to the heap so the interface header can hold a pointer.
  - Calls to functions whose parameters are passed via `interface{}` (e.g., `fmt.Printf("%d", x)` boxes `x`).
  - Slices grown beyond the inlined-make size threshold.
- The inliner: small functions are inlined. Inlining matters for escape analysis because escape decisions are made across inlined call sites-a function that "would escape if not inlined" may stay on the stack when inlined into its caller.
- `//go:noinline` and `//go:nosplit`: directives to suppress inlining or stack-split checks. `//go:nosplit` is reserved for runtime-internal code; both are rarely justified in application code.
- Allocation profile: `go test -bench=. -memprofile=mem.out`, then `go tool pprof -alloc_objects mem.out`. The `-alloc_objects` view counts allocations (escapes); `-inuse_space` counts retained bytes.
4.3 Lab-"Escape Forensics"¶
For each of the following snippets, predict whether the value escapes, then verify with `-gcflags=-m`:

1. `func A() *int { x := 7; return &x }`
2. `func B() *int { x := 7; p := &x; return p }`
3. `func C() { x := 7; go func() { fmt.Println(x) }() }`
4. `func D() { x := bytes.Buffer{}; x.WriteString("hi"); fmt.Println(x.String()) }`
5. `func E(s []int) int { return len(s) }` called as `E(make([]int, 8))`.
6. `func F() any { return 7 }` (boxing into `interface{}`).
7. A method call on an interface value vs the concrete type (covered in Week 7).

For each that escapes, propose a refactor that keeps it on the stack. Then write a benchmark (`testing.B`) and prove the win.
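`-gcflags=-m` shows the compiler's decision; `testing.AllocsPerRun` confirms it at runtime. A sketch for snippet 1 against a non-escaping variant (the `//go:noinline` directives and the package-level `sink` are there to stop the inliner and dead-store elimination from optimizing the escape away):

```go
package main

import (
	"fmt"
	"testing"
)

var sink *int // package-level, so the store cannot be eliminated

//go:noinline
func escapes() *int {
	x := 7 // &x outlives the frame, so x is heap-allocated
	return &x
}

//go:noinline
func stays() int {
	x := 7 // never address-taken across the frame boundary: stack
	return x
}

func main() {
	heap := testing.AllocsPerRun(1000, func() { sink = escapes() })
	var v int
	stack := testing.AllocsPerRun(1000, func() { v = stays() })
	_ = v
	fmt.Printf("escapes: %.0f allocs/op, stays: %.0f allocs/op\n", heap, stack)
}
```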
4.4 Idiomatic & golangci-lint Drill¶
- `staticcheck SA6002` (`sync.Pool` accepting non-pointer types-silent allocation), `gocritic: hugeParam`, `prealloc`, `makezero`. Each maps to an allocation pathology.
4.5 Production Hardening Slice¶
- Configure `golangci-lint` to fail on new escape-related issues introduced by a PR. Add a CI step that runs `go test -bench=. -benchmem` on critical packages and diffs allocations against a baseline (`benchstat`).
Month 1 Capstone Deliverable¶
A workspace runtime-foundations/ with three modules:
1. schedule-forensics (week 2 lab)-produces a labeled trace.out and a markdown latency-distribution report.
2. stack-growth (week 3 lab)-produces a graph of StackInuse over time.
3. escape-clinic (week 4 lab)-six benchmarks with before / after allocation counts.
CI must run: gofmt -l, go vet, golangci-lint, go test -race, go test -bench=. -benchmem | tee bench.txt, benchstat baseline.txt bench.txt. The baseline is captured at week 4's end and tracked from then on.
Month 2-Memory, the Garbage Collector, and Interface Internals¶
Goal: by the end of week 8 you can (a) read a pprof heap profile and explain why an object is retained, (b) describe the tricolor mark-sweep algorithm including write barriers and the mark assist, (c) predict from a type signature whether method calls will allocate, and (d) tune GOGC and GOMEMLIMIT for a real workload.
Weeks¶
- Week 5 - Memory Layout, Padding, Alignment
- Week 6 - The Garbage Collector
- Week 7 - Interface Values, itabs, and Dispatch Cost
- Week 8 - Allocation Profiling, `sync.Pool`, GC Tuning
Week 5 - Memory Layout, Padding, Alignment¶
5.1 Conceptual Core¶
- A Go value lives in exactly one of: a goroutine stack, the heap (managed by `mallocgc`), the data segment (mutable globals), or the read-only data segment (string literals, constants).
- Struct field ordering is significant for memory footprint: Go does not reorder fields. The compiler inserts padding to satisfy alignment requirements. Misordering can double the size of a hot struct.
- False sharing is the silent killer of concurrent Go: two unrelated atomic counters in the same cache line cause cores to evict each other's caches on every update. The fix is padding to 64 bytes (or 128 on Apple Silicon).
5.2 Mechanical Detail¶
- `unsafe.Sizeof`, `unsafe.Alignof`, `unsafe.Offsetof`. Memorize the size of every primitive: `bool` 1, `int8` 1, `int16` 2, `int32`/`float32`/`rune` 4, `int64`/`float64`/`int`/`uintptr` 8 (on 64-bit), pointer 8, slice header 24 (ptr+len+cap), string header 16 (ptr+len), interface header 16 (itab/type+data), map pointer 8, channel pointer 8.
- Field reordering for size: sort fields by alignment, descending. Tools: `fieldalignment` (a vet analyzer in `golang.org/x/tools/go/analysis/passes/fieldalignment`). Wire it into CI.
- The cache line is 64 bytes on most platforms (`runtime/internal/sys.CacheLineSize`). Use `[7]uint64` padding or `golang.org/x/sys/cpu.CacheLinePad` to isolate hot atomics.
- Slice internals: a slice is a 24-byte header `{Data *T, Len int, Cap int}`. `s = append(s, x)` may reallocate; the old backing array is GC'd if no other slice references it. The growth strategy is roughly 2× for small slices, tapering toward ~1.25× for large ones. Read `runtime/slice.go::growslice`.
- Map internals: `runtime/map.go`. A hash table of 8-entry buckets with overflow buckets. Iteration order is deliberately randomized. Maps are never safe for concurrent write+anything; use `sync.Map` (specialized for read-mostly workloads) or a sharded map for general concurrency.
5.3 Lab-"Layout Forensics"¶
- Define five "interestingly bad" structs (e.g., `struct{ a bool; b int64; c bool; d int64; e bool }`). Compute their `unsafe.Sizeof` by hand, then verify.
- Reorder for minimal padding. Re-measure. Document each delta.
- Build a benchmark with a `[]Struct` of 1M elements; compare allocation/scan time with the badly-padded vs the optimally-packed version. Use `runtime.ReadMemStats` to capture `HeapAlloc` and GC pause durations.
- Construct a false-sharing example: two atomic counters incremented by different goroutines, with and without `CacheLinePad` between them. Benchmark contention. Expect a 5–20× difference.
5.4 Idiomatic & golangci-lint Drill¶
- `fieldalignment` (vet analyzer), `unconvert`, `gocritic: builtinShadowDecl`. Wire `fieldalignment` into CI as a hard fail.
5.5 Production Hardening Slice¶
- Add `runtime.ReadMemStats` instrumentation to your service template. Export `HeapAlloc`, `HeapInuse`, `StackInuse`, `NumGC`, `PauseTotalNs` as Prometheus metrics (or `expvar`). This becomes the Month 5 observability baseline.
Week 6 - The Garbage Collector¶
6.1 Conceptual Core¶
- Go's GC is a concurrent, tricolor, mark-sweep, non-generational, non-compacting collector. Each adjective is doing work:
  - Concurrent: marking happens while the application (the "mutator") runs.
  - Tricolor: every object is white (unreached), gray (reached but children unscanned), or black (reached and scanned). The invariant: a black object never points to a white object.
  - Mark-sweep: phase 1 marks reachable objects; phase 2 reclaims unmarked ones.
  - Non-generational: no separate young/old heap. (The pacer compensates.)
  - Non-compacting: objects do not move. This is what allows interior pointers and `unsafe.Pointer` values to remain valid.
- Why these choices: Go optimizes for predictable low-pause behavior at the cost of throughput. A compacting collector would have a lower steady-state heap, but compaction stops the world.
6.2 Mechanical Detail¶
- Phases (`runtime/mgc.go`):
  - Sweep termination (STW, microseconds): finish the previous cycle's sweep.
  - Mark setup (STW, microseconds): enable the write barrier.
  - Concurrent mark: dedicated workers and mutator assists mark the heap while the write barrier intercepts pointer writes.
  - Mark termination (STW, ~100 µs to ms on huge heaps): finalize.
  - Concurrent sweep: lazily reclaim white objects as subsequent allocations request space.
- Write barrier: a hybrid Dijkstra/Yuasa barrier (since Go 1.8) records pointer writes during mark so the mutator cannot "hide" a white object behind a black one. It is implemented as compiler-inserted code around pointer stores. This is why pointer-heavy code is GC-expensive: every pointer write during mark costs a barrier.
- Mark assist: when a goroutine allocates during a GC cycle, it is forced to do proportional GC work. This couples allocation rate to GC progress and is the mechanism that prevents heap blowup.
- The pacer: targets `next_gc = live_after_last_gc * (1 + GOGC/100)`. Default `GOGC=100`: GC when the heap doubles. Tunable via `GOGC=off`, `GOGC=50` (more frequent, lower memory), `GOGC=200` (less frequent, higher memory).
- `GOMEMLIMIT` (since Go 1.19): a soft total-memory ceiling. The GC adjusts pacing to stay under the limit even if `GOGC` would not have triggered. Use it as your primary memory control in containers; leave `GOGC` at default.
- Stack scanning: each goroutine's stack is a GC root. Each goroutine is paused briefly for its own stack scan; the work is per-goroutine and parallelized rather than one long stop-the-world.
6.3 Lab-"GC Forensics"¶
- Write a service that allocates 100 MB/s of short-lived objects. Run with `GODEBUG=gctrace=1`. Read each GC line and identify: total heap, live heap, pause time, pacer target.
- Set `GOMEMLIMIT=512MiB` and `GOGC=off`. Re-run; observe how the GC is now driven entirely by the memory ceiling.
- Set `GOGC=50` (no `GOMEMLIMIT`). Re-run; observe more frequent, smaller GCs.
- Capture a `go tool pprof -alloc_objects` profile. Identify the top five allocation sites. Refactor at least two using `sync.Pool` or pre-allocated buffers. Re-benchmark.
- Capture a `go tool trace` and locate the GC mark phases visually.
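The `sync.Pool` refactor this lab asks for usually takes the shape below (a sketch with illustrative names). Note that the buffer is reset on reuse and that `String()` is called before the deferred `Put`, so in steady state only the returned string copy is allocated per call:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
	"testing"
)

// bufPool recycles *bytes.Buffer values. The pool holds pointers, so
// Get/Put do not themselves copy the buffer (see staticcheck SA6002).
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render formats a line using a pooled buffer instead of allocating a
// fresh one per call.
func render(name string) string {
	b := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(b)
	b.Reset() // a reused buffer still holds old contents
	b.WriteString("hello, ")
	b.WriteString(name)
	return b.String() // copies out before the buffer is returned
}

func main() {
	fmt.Println(render("gopher"))
	n := testing.AllocsPerRun(1000, func() { _ = render("gopher") })
	fmt.Printf("allocs/op: %.0f\n", n)
}
```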
6.4 Idiomatic & golangci-lint Drill¶
- `staticcheck SA6002` (`sync.Pool` with non-pointer types), `prealloc`, `gocritic: rangeValCopy` (large struct copies in range loops).
6.5 Production Hardening Slice¶
- In your service template, set `GOMEMLIMIT` from a `MEMORY_LIMIT` environment variable computed at container start (`debug.SetMemoryLimit(int64(0.9 * cgroup_memory_limit))`). This is the single most impactful production tuning knob.
- Export GC metrics: `go_gc_duration_seconds` (histogram), `go_memstats_*`. Use `prometheus/client_golang`'s `collectors.NewGoCollector(collectors.WithGoCollections(collectors.GoRuntimeMemStatsCollection | collectors.GoRuntimeMetricsCollection))` for the modern collector.
Week 7 - Interface Values, itabs, and Dispatch Cost¶
7.1 Conceptual Core¶
- A Go interface value is a two-word header:
  - For non-empty interfaces (`io.Reader`, `error`, etc.): `(itab, data)`.
  - For empty interfaces (`any`/`interface{}`): `(type, data)`.
- `itab` = interface table = the dynamic dispatch vtable for one (interface, concrete type) pair. Holds the concrete type pointer, the interface type pointer, a hash, and an array of function pointers (the methods).
- `data` is a pointer to the concrete value. (Pre-1.4 Go could store word-sized values directly in the data word as an optimization; modern Go (>=1.4) always uses a pointer, so be careful with old assumptions.)
7.2 Mechanical Detail¶
- Read `src/runtime/iface.go`. Key functions: `getitab`, `convT2I`, `assertI2I`, `assertE2I`. The `getitab` cache is keyed by `(interface_type, concrete_type)`; the first call may allocate the itab, subsequent calls hit the cache.
- Cost of an interface call:
  - Load `itab` from the interface header (cache hit).
  - Load the function pointer from `itab.fun[N]`.
  - Indirect call. Plus: the call cannot be inlined. So an interface call is one indirect call plus lost inlining opportunities. On hot paths, this matters.
- Boxing cost: assigning a non-pointer concrete value to `any` allocates if the value is larger than a word. `var x any = 42` may allocate (depends on the Go version's small-int boxing optimization); `var x any = SomeBigStruct{}` definitely allocates.
- Type assertions: `v.(T)` panics on failure; allocates an `itab` if `T` is an interface. `v, ok := v.(T)` is the same but without the panic. `switch v := v.(type)` is the same machinery, optimized for multiple cases.
- Type switches are faster than chains of type assertions because the compiler may emit a hashed dispatch table.
- Generics vs interfaces: generics (since 1.18) compile to a single generic body parameterized by the GC shape ("GCShape stenciling"), with a per-shape dictionary. Generics are not specialized like Rust monomorphization; there is still indirection for method calls on type parameters. The performance-versus-interface tradeoff is subtle and workload-dependent. Read `cmd/compile/internal/` and the GCShape stenciling design document.
7.3 Lab-"Interface Bench"¶
- Build a tight loop calling a method via three paths: concrete type, interface, generic type parameter. Benchmark with `-benchmem`.
- Inspect the disassembly with
go tool objdump -s 'main\.benchInterface'. Identify the indirect call. - Refactor a real-world pattern (a
Loggerinterface used 10× in a hot path) into a concrete type or a type-parameterized version. Measure the win or non-win. - Build a worst-case allocation example: passing a stack int into
fmt.Println(...). Show with - gcflags=-mthat the int escapes (boxing intoany). Replace withfmt.Println(strconv.Itoa(x))` and re-measure.
7.4 Idiomatic & golangci-lint Drill¶
`gocritic: typeAssertChain`, `gosimple S1034` (simplify type switches with the assigned-variable form). Re-read the Go FAQ on "Why no implicit type conversions?"; the answer informs API design.
7.5 Production Hardening Slice¶
- Add a benchmark to CI that asserts 0 allocs/op on critical paths (e.g., the request-handling hot path of your service template). Use `testing.B.ReportAllocs()` and a script that diffs `allocs/op` against a committed baseline. Any PR that introduces an allocation on a 0-alloc path fails CI.
Week 8 - Allocation Profiling, sync.Pool, GC Tuning¶
8.1 Conceptual Core¶
- The cheapest allocation is the one you do not make. The second cheapest is the one you reuse.
- `sync.Pool` is a per-P cache of reusable objects. Items can be reclaimed by the GC at any time (typically at the start of each GC cycle), so it is a cache, not a resource pool. Use it for short-lived, frequently-allocated objects (`bytes.Buffer`, `[]byte` scratch space, parser nodes).
- The two production-grade memory tuning knobs: `GOGC` (heap growth ratio) and `GOMEMLIMIT` (absolute ceiling). For containerized services, pin `GOMEMLIMIT` to ~90% of cgroup memory; leave `GOGC` default unless profiles say otherwise.
8.2 Mechanical Detail¶
- `sync.Pool` mechanics (`src/sync/pool.go`):
  - `Get()` returns from the local-P cache, falls back to a victim cache (objects from the previous GC), falls back to `New()`.
  - `Put()` stores into the local-P cache.
  - At GC, the local cache is moved to victim, and the old victim is freed.
  - Therefore: do not assume a `Pool.Get` returns recently `Put` data. Always reset state on `Get`.
- Common `sync.Pool` mistake: putting non-pointer values. The pool stores `interface{}`, so a non-pointer goes through boxing, which is a net allocation. Always store pointers.
- `bytes.Buffer` reuse pattern: `Get`, `Reset`, use, `Put` (always via a pointer).
- Allocation profile interpretation: `pprof -alloc_objects` (count) tells you "where churn happens"; `-alloc_space` (bytes) tells you "where pressure happens"; `-inuse_space` tells you "what is currently retained." Use all three.
- `runtime/metrics` (since 1.16): the modern API for runtime metrics. Replaces ad-hoc `MemStats` reads. Returns histograms for `/gc/pauses:seconds`, `/sched/latencies:seconds`, etc.
8.3 Lab-"Pool the Hot Path"¶
- Take the JSON-handling hot path of any service. Run `pprof -alloc_objects` under load. Identify the top three allocation sites.
- Introduce a `sync.Pool` for the most appropriate one (typically `bytes.Buffer` or a decoder).
- Re-benchmark. The win should be visible in allocs/op and in p99 latency under load.
- Now intentionally misuse: `Pool.Put` without resetting state. Detect the bug under `-race` or via a deliberately-inserted assertion.
8.4 Idiomatic & golangci-lint Drill¶
`staticcheck SA6002`, `gocritic: appendAssign`, `prealloc`. Re-read Dave Cheney's "High Performance Go Workshop" notes (a classic standing reference).
8.5 Production Hardening Slice¶
- Add a `/debug/pprof` HTTP endpoint behind an auth-or-build-tag gate (do not expose it on the public listener). Document the on-call runbook for capturing CPU/heap profiles from a misbehaving production process.
- Add `runtime/metrics`-based exporters for GC pause histograms and scheduler latencies. These are the signals an SRE wants when a Go service misbehaves.
Month 2 Capstone Deliverable¶
A memory-and-gc/ workspace:
1. layout-forensics (week 5)-with fieldalignment enforced in CI.
2. gc-forensics (week 6)-with annotated gctrace=1 logs and a tuning playbook.
3. iface-bench (week 7)-concrete vs interface vs generic, three-way benchmark.
4. pool-the-hot-path (week 8)-before/after profile diff, baseline benchmark in CI.
Workspace-level CI must add: fieldalignment analyzer, 0-alloc regression guard on critical benchmarks, pprof artifacts captured on demand from a make profile target.
Month 3-Concurrency Mastery: Channels, Atomics, Context, Patterns¶
Goal: by the end of week 12 you can (a) implement a correct lock-free single-producer single-consumer ring buffer using sync/atomic, (b) read the channel send path in runtime/chan.go and explain chansend1 line-by-line, (c) detect goroutine leaks before they reach production, and (d) design a worker-pool that survives backpressure, partial failures, and graceful shutdown.
Weeks¶
- Week 9 - Channels, Deeply
- Week 10 - `sync` Primitives and `sync/atomic`
- Week 11 - `context.Context`, Cancellation, errgroup, singleflight
- Week 12 - Worker Pools, Leak Detection, Deadlock Prevention
Week 9 - Channels, Deeply¶
9.1 Conceptual Core¶
- A channel is a typed, bounded (possibly unbuffered), thread-safe queue with `select` integration. Internally it is a struct (`hchan`) protected by a mutex, with two FIFO wait lists for blocked senders and receivers.
- The CSP slogan ("share memory by communicating") is partly aspirational. In practice, large Go systems use channels for ownership transfer and signaling, and use mutexes/atomics for shared state. Both are idiomatic; picking the wrong one for a given problem is the bug.
- Send/receive semantics:
  - Buffered channel with space → non-blocking send.
  - Buffered channel full / unbuffered → block until a receiver is ready (or vice versa).
  - Closed channel → send panics; receive returns the zero value with `ok=false`.
  - `nil` channel → send and receive block forever. Useful in `select` to disable a case.
9.2 Mechanical Detail¶
Read src/runtime/chan.go. Particularly:
- hchan struct: qcount, dataqsiz, buf, elemsize, closed, sendx, recvx, recvq, sendq, lock.
- chansend: lock, then either copy to buffer / hand-off to waiting receiver / park sender.
- chanrecv: symmetric.
- closechan: marks closed, wakes all waiters.
- The hand-off optimization: if a sender finds a parked receiver, it copies directly into the receiver's stack and parks no goroutine. This is what makes unbuffered channels efficient.
- Select (runtime/select.go): randomized-fair selection across ready cases. The selectgo function is among the most subtle in the runtime; read it slowly. Note: select with a default is a non-blocking try.
- Closing discipline: close from the sender side, never from a receiver. Use sync.Once if multiple goroutines might close. The standard idiom for graceful shutdown is a separate done channel (or a context.Context), not closing the data channel.
9.3 Lab-"Channel Internals"¶
- Write a benchmark comparing: unbuffered chan, buffered chan(1), buffered chan(1024), `sync.Mutex` + slice queue, and a `sync/atomic`-only SPSC ring buffer. Use 1 producer, 1 consumer, 10M messages.
- Plot the throughput. The atomic SPSC should be 5–10× the channel; the mutex queue may beat the buffered channel for small messages.
- Reproduce a `nil`-channel select pattern: a goroutine that toggles between two upstream channels by setting one to `nil` to disable a case.
- Write an "unbounded channel" using a goroutine that bridges an in-channel to an out-channel via an internal slice buffer. Discuss why this exists and why it is dangerous (memory growth on a slow consumer).
9.4 Idiomatic & golangci-lint Drill¶
`staticcheck SA1015` (`time.Tick` leak), `staticcheck SA1030` (`time.After` in select loops leaks), `gocritic: emptyDecl`, `revive: empty-block`. The first two are classic concurrency leaks.
9.5 Production Hardening Slice¶
- Add
`goleak.VerifyTestMain(m)` (Uber's `go.uber.org/goleak`) to the test entry point of every package that uses goroutines. CI will now fail any test that leaves a goroutine running.
Week 10 - sync Primitives and sync/atomic¶
10.1 Conceptual Core¶
- `sync.Mutex`: a fast, fair-but-not-strict mutex with a starvation mode (since 1.9) that switches to FIFO if a goroutine has waited >1ms. Read `src/sync/mutex.go`.
- `sync.RWMutex`: reader-writer lock. Writer-preferring. The read path is fast under low contention, but cache lines bounce under heavy reading; consider sharding before reaching for `RWMutex`.
- `sync.Once`: exactly-once initialization with a memory-barrier guarantee.
- `sync.WaitGroup`: not a barrier; a counter with wait-on-zero. Misuse #1: `wg.Add(1)` inside the goroutine instead of before launching it (races with `wg.Wait`). Misuse #2: reusing across goroutine generations without resetting.
- `sync.Cond`: Mesa-style condition variable. Almost always the wrong tool; channels or `chan struct{}` + atomic patterns are clearer.
- `sync.Map`: optimized for the case where keys are written once and read many times across goroutines. Worse than `map` + `RWMutex` for read-modify-write patterns.
- `sync/atomic`: low-level atomic operations. Modern API (since 1.19): `atomic.Int64`, `atomic.Pointer[T]`, `atomic.Bool`, `atomic.Value`. Prefer the typed values over the legacy free functions; the typed API prevents most misuse.
10.2 Mechanical Detail¶
- The Go memory model: read `go.dev/ref/mem` once carefully. Key facts:
  - Happens-before is defined per channel op, per mutex op, per atomic op.
  - Since the Go 1.19 revision of the memory model, `sync/atomic` operations behave as sequentially consistent (like C++ `seq_cst`); the spec gives you no weaker-ordering knobs.
  - Unsynchronized reads and writes, even of word-sized values, are data races with undefined behavior. The race detector is the source of truth.
- `sync.Mutex` source walk-through:
  - State is a 32-bit word: locked bit, woken bit, starvation bit, waiter count.
- Fast path: CAS the locked bit. ~1 ns uncontended.
- Slow path: spin briefly, then park. Wake order biased toward the most recent waiter except in starvation mode.
- Atomic patterns:
  - Counter: `atomic.Int64.Add(1)`. Use for stats; do not assume monotonicity across atomic types.
  - Read-only snapshot publish: `atomic.Pointer[T].Store(newPtr)` paired with `Load()`. The classic copy-on-write.
  - CAS loop for lock-free updates: `for { old := p.Load(); newV := f(old); if p.CompareAndSwap(old, newV) { break } }`. Every CAS retry is wasted work; bound the loop or back off.
- Memory ordering in Go: `sync/atomic` operations are sequentially consistent on Go-supported architectures. Do not rely on weaker orderings; the spec does not give you the knobs C++ does.
10.3 Lab-"Lock-Free SPSC Ring"¶
Build a single-producer, single-consumer ring buffer using only `atomic.Uint64` indices. Pad the indices to separate cache lines. Validate with `go test -race -count=1000` running 1 producer and 1 consumer. Benchmark against `chan T` and against a `sync.Mutex`-protected slice. Document the cache-line padding's effect with a `withoutPad` variant; expect a 3–10× difference on modern x86.
10.4 Idiomatic & golangci-lint Drill¶
`govet: copylocks` (mutexes must not be copied), `staticcheck SA2000` (`WaitGroup.Add` after `Wait`), `gocritic: deferUnlambda`.
10.5 Production Hardening Slice¶
- Run every test with `-race` in CI. Make this non-negotiable.
- Add a CI step that runs critical concurrency tests under `-race -count=100` to catch low-probability races. Budget the CI time accordingly.
Week 11 - context.Context, Cancellation, errgroup, singleflight¶
11.1 Conceptual Core¶
- `context.Context` is the cancellation-propagation primitive in Go. Every blocking operation that crosses an API boundary should accept a `context.Context` as its first parameter.
- A context carries:
  - Deadline (or no deadline).
  - Cancellation channel (`<-ctx.Done()`) and reason (`ctx.Err()`).
  - Request-scoped values (`ctx.Value(key)`); use sparingly.
- Contexts are immutable trees: each derivation (`WithCancel`, `WithTimeout`, `WithValue`) produces a child. Cancelling a parent cancels all descendants.
11.2 Mechanical Detail¶
- The cancellation rules:
- Pass
context.Contextas the first parameter, namedctx. - Do not store context in struct fields except for short-lived adapters. (One narrow exception: long-running services that derive an internal context once from
context.Background().) - Always call the
cancelfunction returned byWithCancel/WithTimeout/WithDeadline, even on the success path. Otherwise the context's resources (timer, goroutine inpropagateCancel) leak. - Do not use
context.Valuefor required parameters. It is a request-scoped sidecar, not a function-call mechanism. Type-safe alternatives (function arguments, struct fields) are always better. errgroup(golang.org/x/sync/errgroup): spawn N goroutines, propagate the first error, cancel siblings, wait for all. The standard pattern for parallel sub-tasks. Read the source-it is ~120 lines.singleflight(golang.org/x/sync/singleflight): deduplicate concurrent identical requests. The classic cache-stampede mitigator. Use for expensive lookups (DB, RPC) where a thundering herd is plausible.context.AfterFunc(since Go 1.21): register a callback to fire when a context is cancelled. Replaces the boilerplate ofgo func() { <-ctx.Done(); cleanup() }().context.Cause(since Go 1.20): retrieve the cancellation reason, including custom errors viaWithCancelCause.
11.3 Lab-"Context Discipline"¶
- Take a small HTTP service. Audit every blocking operation (DB query, downstream RPC, Redis call). Each should accept and propagate `ctx`. Fail any goroutine that captures a request `ctx` and outlives the request.
- Implement a parallel fan-out using `errgroup` with N=8 workers, all cancellable on first error.
- Implement a cache-stampede test: 1000 concurrent requests for the same uncached key. Without `singleflight`, observe N upstream calls. With `singleflight`, observe 1.
- Demonstrate `context.AfterFunc` cleanup: register a release-resource callback on cancellation; verify it fires under both timeout and explicit cancel.
11.4 Idiomatic & golangci-lint Drill¶
`contextcheck` (verifies context propagation), `noctx` (flags HTTP requests sent without a context), `staticcheck SA1029` (`context.WithValue` with a built-in key type is a collision hazard).
11.5 Production Hardening Slice¶
- Wire `context` deadlines to your gRPC server's per-RPC timeouts. The pattern: take the incoming RPC deadline, optionally tighten it for downstream calls, and propagate. Document the deadline-budget calculation in your service's `RUNBOOK.md`.
Week 12 - Worker Pools, Leak Detection, Deadlock Prevention¶
12.1 Conceptual Core¶
- Worker pool is the canonical "bounded concurrency" pattern: N worker goroutines consuming from a shared task channel. Bounds CPU, memory, and downstream RPC concurrency simultaneously.
- Goroutine leaks are Go's silent OOM. Most common shapes:
- Goroutine blocked on a channel that is never closed and never sent to.
- Goroutine blocked on `<-ctx.Done()` of a context that nobody cancels.
- Goroutine holding a reference (closure capture) to a request object that is now done.
- `time.After` in a `select` loop (allocates a timer per iteration; before Go 1.23, each timer lived until expiry).
- Deadlocks in Go are detected only by the runtime's "all goroutines are asleep" check, which fires only when every goroutine is blocked. Most production deadlocks are partial: a subsystem deadlocks while the rest of the program runs. The race detector does not catch these.
12.2 Mechanical Detail¶
- The canonical worker pool:

```go
func RunPool[T, R any](ctx context.Context, n int, in <-chan T,
	fn func(context.Context, T) (R, error)) <-chan Result[R] {
	out := make(chan Result[R])
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			for {
				select {
				case <-ctx.Done():
					return
				case task, ok := <-in:
					if !ok {
						return
					}
					r, err := fn(ctx, task)
					select {
					case out <- Result[R]{r, err}:
					case <-ctx.Done():
						return
					}
				}
			}
		}()
	}
	go func() { wg.Wait(); close(out) }()
	return out
}
```

Every line above is load-bearing: the double-select on input and output, the `wg.Done` in `defer`, the closer goroutine after `wg.Wait`.
goleakfor tests.pprof goroutinefor production:curl /debug/pprof/goroutine?debug=2dumps every goroutine's stack. Read it.runtime.NumGoroutine()exported as a metric. A monotonically growing count is the leak signal.- Deadlock detection:
go-deadlock(sasha-s/go-deadlock) wrapssync.Mutexwith timing-based deadlock detection in dev builds.- For partial deadlocks: instrumentation on the lock acquisition path (lock contention metrics from
runtime/metrics). - Backpressure: when the worker pool is saturated, what should the caller see? Three strategies: block (default), drop (with metric), reject (return error). The choice is application-dependent; document it.
12.3 Lab-"Worker Pool Survival Test"¶
Build a worker pool that handles:
1. Backpressure-bounded input channel, drop-with-metric on overflow.
2. Graceful shutdown-on ctx.Done(), drain in-flight tasks within a deadline, then abandon the rest.
3. Per-task timeouts-WithTimeout(ctx, 100ms) per task.
4. Panic isolation-a panic in one task does not kill the worker; recover and report.
5. Leak-clean-goleak passes after cancel(); pool.Wait().
Stress-test with 1M tasks across 1000 workers under `-race`.
12.4 Idiomatic & golangci-lint Drill¶
`bodyclose` (leaked HTTP response bodies), `rowserrcheck` (`sql.Rows.Err` unchecked), `sqlclosecheck`. All three are leak-class lints; enable all three.
12.5 Production Hardening Slice¶
- Add a
`/debug/pprof/goroutine` periodic snapshot job to your service template: every 5 minutes, capture the goroutine count and the top-N stacks. Surface as a Prometheus gauge with stack-hash labels (low cardinality). On a leak, you will see which stack is growing without paging anyone.
Month 3 Capstone Deliverable¶
A concurrency-lab/ workspace:
1. chan-bench (week 9)-channel vs mutex vs atomic ring, with a markdown writeup.
2. spsc-ring (week 10)-atomic-only, race-clean, with cache-pad ablation.
3. context-discipline (week 11)-a refactored HTTP service plus a singleflight cache demo.
4. survival-pool (week 12)-the worker pool that survives the five failure modes.
CI gates additions: `-race` on every test, `-race -count=100` on critical packages, a goleak baseline, and a 0-alloc regression guard on the SPSC ring's hot path. Open one upstream PR, even a doc fix to `errgroup` or `singleflight`, by month end.
Month 4-Reflection, Code Generation, Plugins¶
Goal: by the end of week 16 you can (a) write a `reflect`-based serializer that allocates exactly once per top-level call, (b) implement a `go generate` directive that walks an `ast.Package` and emits idiomatic Go, (c) ship a custom `golangci-lint`-compatible analyzer using `go/analysis`, and (d) build a hot-loadable plugin system using HashiCorp's `go-plugin`.
Weeks¶
- Week 13 - Reflection: `reflect`, Performance, and Discipline
- Week 14 - `go/ast`, `go/parser`, `go/types`: Static Analysis
- Week 15 - `go generate` and AST-Based Code Generation
- Week 16 - Plugins: `plugin`, `go-plugin`, gRPC-Based Extensions
Week 13 - Reflection: reflect, Performance, and Discipline¶
13.1 Conceptual Core¶
- The
`reflect` package exposes the runtime type system: every Go value has a `reflect.Type` (its dynamic type) and a `reflect.Value` (a wrapper holding the value plus its type). Together they let you inspect and manipulate values whose concrete type is known only at runtime.
- Generic serialization / deserialization (encoding/json, encoding/gob, gorm, sqlx)-when the input is
any. - Schema-driven adapters-config loaders, ORM tag parsers, validators.
- Reflection is slow. Roughly 5–50× the cost of direct field access. The standard libraries that use it (
encoding/json) compensate by cachingreflect.Typelookups and method tables per type.
13.2 Mechanical Detail¶
- `reflect.Type` is comparable (`==`) by identity: two `reflect.Type` values are equal iff they describe the same Go type. This makes `map[reflect.Type]Cache` a load-bearing pattern.
- `reflect.Value.Kind()` returns the underlying kind (`Struct`, `Ptr`, `Slice`, etc.). `reflect.Value.Type()` returns the named type. The two differ for named types: `type MyInt int` has Kind `Int`, Type `MyInt`.
- Field iteration: `t.NumField()`, `t.Field(i)` returns a `StructField` with `Name`, `Type`, `Tag`, `Index`, `Anonymous`, `PkgPath`. `Tag.Get("json")` is the canonical tag-parsing path.
- Method invocation: `v.Method(i).Call([]reflect.Value{...})`. Allocates the argument slice and the results.
- `unsafe.Pointer` shortcut: for performance-critical reflection, take the field address via `unsafe.Pointer(v.Field(i).UnsafeAddr())` and read it as the typed value. This is what `mapstructure` and high-performance JSON libraries do internally. Read the safety contract carefully; it is narrow.
- Caching pattern: key per-type field metadata by `reflect.Type` and compute it once.
13.3 Lab-"A Reflective Validator"¶
Build a struct validator that processes validate:"..." tags:
- Must support: required, min=N, max=N, email, regexp=<re>.
- Must cache per-type field metadata (one reflect.Type walk per type ever).
- Must produce structured errors (path, rule, value).
- Must beat a naive non-cached implementation by 10× in benchmarks.
Compare against go-playground/validator for both ergonomics and performance.
13.4 Idiomatic & golangci-lint Drill¶
`staticcheck SA1019` (deprecated reflect APIs), `gocritic: hugeParam`. The pattern of accepting `any` then immediately calling `reflect.ValueOf` is a smell; prefer typed APIs whenever possible.
13.5 Production Hardening Slice¶
- Add a benchmark that captures the per-call allocation count for the validator's hot path. The hot path (validating a previously-seen type) must allocate ≤1 time. CI fails on regressions.
Week 14 - go/ast, go/parser, go/types: Static Analysis¶
14.1 Conceptual Core¶
- The
`go/ast` package represents Go source as a syntax tree. The `go/parser` package parses source files into `ast.File`s. The `go/types` package performs type checking and resolves identifiers to declarations.
- The triad (`ast` + `parser` + `types`) is the foundation for every serious Go tool: `gofmt`, `goimports`, `gopls`, `golangci-lint`, `staticcheck`, `mockgen`, `sqlc`.
- `golang.org/x/tools/go/packages` is the modern entry point for loading a Go program for analysis. It handles modules, build tags, and CGO transparently. Use this; do not call `parser.ParseFile` directly except for single-file tools.
- `golang.org/x/tools/go/analysis` is the framework for writing analyzers: small, composable passes consumed by `go vet`, `golangci-lint`, and standalone drivers.
14.2 Mechanical Detail¶
- Loading a package:

```go
cfg := &packages.Config{Mode: packages.NeedTypes | packages.NeedSyntax | packages.NeedTypesInfo}
pkgs, _ := packages.Load(cfg, "./...")
```

The `Mode` flags determine cost; load only what you need.
- Walking the AST: `ast.Inspect` over each `ast.File` in `pkg.Syntax`.
- Type information:
  - `pkg.TypesInfo.Types[expr]` → the type of an expression.
  - `pkg.TypesInfo.Defs[ident]` / `Uses[ident]` → the object an identifier defines or uses.
  - `pkg.TypesInfo.ObjectOf(ident)` → the resolved object (can be a `*types.Var`, `*types.Func`, `*types.TypeName`, etc.).
- Writing an analyzer:

```go
var Analyzer = &analysis.Analyzer{
	Name: "noprintln",
	Doc:  "disallow fmt.Println in production code",
	Run: func(pass *analysis.Pass) (any, error) {
		/* walk pass.Files */
		return nil, nil
	},
}
```

Compile as a binary using `unitchecker` or load via the `golangci-lint` plugin system.
token.Pos) is meaningless without thetoken.FileSetit was created from; always pass them together. Comment groups are a separate field onast.File, not attached to AST nodes by default-ast.CommentMapbridges them.
14.3 Lab-"Build a Custom Analyzer"¶
Write an analyzer that flags:
1. context.Background() calls outside main and *_test.go files.
2. time.After inside a select body (the classic timer-leak pattern).
3. Goroutines launched with closures capturing a context.Context parameter named ctx of an enclosing HTTP handler (heuristic; document the false-positive risk).
Wire as a unitchecker binary. Run on a real codebase and triage findings. Document each false positive in ANALYZER_NOTES.md.
14.4 Idiomatic & golangci-lint Drill¶
- Read
`staticcheck`'s source for two of its analyzers (e.g., `SA1015` and `SA4006`). Internalize the analyzer-author idioms.
14.5 Production Hardening Slice¶
- Publish your analyzer as a module. Add a
`golangci-lint` custom-plugin entry so it runs alongside the standard suite. CI now enforces your project's idioms automatically.
Week 15 - go generate and AST-Based Code Generation¶
15.1 Conceptual Core¶
- `go generate` is a convention, not a feature. It scans source files for `//go:generate <command>` comments and runs them. The output is normal Go source, committed to the repo.
- The pattern is preferred over reflection for performance-critical paths: generate exhaustive code at build time, with no `reflect` cost at runtime.
- Canonical tools:
  - `stringer` - `String()` method for enum-like int types.
  - `mockgen` - interface mocks for testing.
  - `sqlc` - SQL → typed Go from query files.
  - `ent` - schema → typed Go ORM.
  - `buf` + `protoc-gen-go-grpc` - protobuf → Go.
15.2 Mechanical Detail¶
- Writing a generator (template-based):

```go
//go:embed tmpl/api.tmpl
var apiTmpl string

type binding struct{ Name, Method, Path, Result string }

func main() {
	cfg := &packages.Config{Mode: packages.NeedTypes | packages.NeedSyntax | ...}
	pkgs, _ := packages.Load(cfg, ".")
	bindings := extractBindings(pkgs[0]) // walks AST

	var buf bytes.Buffer
	template.Must(template.New("api").Parse(apiTmpl)).Execute(&buf, bindings)
	formatted, _ := format.Source(buf.Bytes()) // gofmt the output
	os.WriteFile("api_generated.go", formatted, 0644)
}
```

- `format.Source`: always run generated bytes through it. Un-gofmt'd generated code is an immediate code-review smell.
- Token-based building (when templates get unwieldy): `go/ast` + `go/printer`. Construct AST nodes programmatically; `printer.Fprint(w, fset, node)` writes them out. More verbose, more correct.
- Generation hygiene:
  - Add `// Code generated by foo. DO NOT EDIT.` as the first line. `gopls` and reviewers honor this convention.
  - Commit the generated files. Do not run generation in CI by default; verify it is up-to-date via `go generate ./... && git diff --exit-code`.
  - Keep generators small and composable. A 5000-line generator is a sign you should be using a real schema language (protobuf, openapi).
15.3 Lab-"Three Generators"¶
Build three small generators:
1. Enum stringer-a from-scratch reimplementation of stringer for one annotation pattern.
2. Mock generator-for one interface, generate a struct with method recorders and call assertions.
3. JSON marshaler-generate a type-specific MarshalJSON that allocates zero maps. Compare allocations against encoding/json for the same type.
For each: `go vet`-clean output, `gofmt`-formatted, with a `go generate` directive in the consumer file.
15.4 Idiomatic & golangci-lint Drill¶
`revive: file-header` (require the `DO NOT EDIT` line on generated files), `gocritic: dupArg`. Configure `golangci-lint` to skip generated files for most lints (`exclude-files` or per-linter `exclude-rules`).
15.5 Production Hardening Slice¶
- Add a CI step
`make generate && git diff --exit-code` that fails when generated code is stale relative to its inputs. This catches the "I forgot to regenerate" PR antipattern.
Week 16 - Plugins: plugin, go-plugin, gRPC-Based Extensions¶
16.1 Conceptual Core¶
- Go has two plugin stories:
- `plugin` package (stdlib): load `.so` files at runtime via `dlopen`. Linux/macOS only; brittle in practice (every dependency must match the host's exact build, including the Go version).
- HashiCorp `go-plugin`: out-of-process plugins communicating via gRPC or net/rpc over a local pipe. Used by Terraform, Vault, Packer, Nomad. Robust, polyglot, version-tolerant.
- For any production extensibility story today, use `go-plugin` (or its design pattern), not the stdlib `plugin` package.
16.2 Mechanical Detail¶
- `plugin` package mechanics: `plugin.Open("./plug.so")` loads the shared object; `p.Lookup("Symbol")` returns an `interface{}` that you type-assert.
- Constraints: same Go version, same module versions of every shared dependency, same build flags. In practice, used only for narrow, controlled use cases.
- `go-plugin` design:
  - Host process spawns the plugin as a subprocess.
- Plugin advertises a "magic cookie" to confirm both sides agree.
- They negotiate a protocol version and one of (gRPC, net/rpc) as the transport.
- The host calls the plugin's interface methods, which round-trip over the pipe.
- On host shutdown, the plugin process is killed.
- Versioning: declare a
`HandshakeConfig` and one or more `Plugin` interfaces per protocol version. Drop old versions on major bumps.
- Performance: per-call latency is microseconds (in-process) or tens of microseconds (cross-process). Not for hot paths; use for control-plane operations (provisioning, configuration, lifecycle).
16.3 Lab-"A Pluggable Storage Backend"¶
Build a service whose storage backend is a plugin. The host defines an interface Storage { Get(key) (val, err); Put(key, val) error; Delete(key) error }. Ship two plugins: an in-memory backend, and a file-system backend. Both communicate via gRPC over go-plugin. Demonstrate hot-swap by killing one plugin process and starting the other.
16.4 Idiomatic & golangci-lint Drill¶
`staticcheck SA1019` (deprecated `net/rpc` patterns), `gocritic: ifElseChain`. Plugin code paths are often where dependency-injection mistakes accumulate; review with discipline.
16.5 Production Hardening Slice¶
- Add structured logging across the host/plugin boundary using
`slog` with consistent attribute keys. Add a health-check method to every plugin interface; the host periodically probes it and ejects unhealthy plugins.
Month 4 Capstone Deliverable¶
A reflect-codegen-plugins/ workspace:
1. validator-rs (week 13)-cached reflective validator with the 10× win.
2. noctx-analyzer (week 14)-unitchecker binary, runs in CI.
3. three-gens (week 15)-stringer + mock + JSON marshaler generators.
4. pluggable-storage (week 16)-go-plugin host + two backends.
CI gates additions: custom analyzer in `golangci-lint`, generated-code freshness check, `go-plugin` integration test under `-race`. By end of month, open one PR upstream against `golangci-lint` (a small custom-analyzer doc fix is sufficient) or `go-playground/validator` (a benchmark, a doc, anything).
Month 5-Production-Grade Distributed Systems Engineering¶
Goal: by the end of week 20 you can (a) lay out a non-trivial Go service following hexagonal/DDD principles and justify each boundary, (b) instrument a service with slog, pprof, OpenTelemetry traces, and metrics, (c) implement a gRPC service with proper deadlines, retries, interceptors, and outlier ejection, and (d) build a five-surface test pyramid that will catch races and goroutine leaks before production.
Weeks¶
- Week 17 - DDD in Go: Hexagonal Architecture, Bounded Contexts
- Week 18 - Observability: `slog`, `pprof`, `trace`, OpenTelemetry
- Week 19 - gRPC: Streaming, Interceptors, Deadlines, Retries, Outlier Ejection
- Week 20 - Testing Strategy: Five Surfaces, Race-Clean
Week 17 - DDD in Go: Hexagonal Architecture, Bounded Contexts¶
17.1 Conceptual Core¶
- Domain-Driven Design in Go starts with one observation: Go's package system is a bounded-context tool. A package can hide types, expose only the interfaces consumers need, and the import graph enforces direction.
- The hexagonal pattern in Go:
- Domain package: pure types, behaviors, ports (interfaces) for external dependencies. No imports from `net/http`, `database/sql`, etc. This is the dependency-direction rule.
- Adapter packages: one per external system (postgres, kafka, http-client). Each implements the ports the domain defines.
- Application package: use cases-methods that orchestrate domain operations across adapters.
- Cmd package: composition root. Wires adapters into a runnable binary.
- The three Go-specific hazards:
- Anaemic domain: types are bags of fields with all logic in services. Push behavior into the type.
- Receiver-method abuse: mutating methods on value receivers (compile-pass, semantic-fail). Pick `T` vs `*T` deliberately.
- `internal/` not used: Go's `internal/` directory restricts imports to subtrees. Use it aggressively to enforce layering.
17.2 Mechanical Detail¶
- Layout for a hexagonal Go service:
```
service/
  cmd/
    api/main.go          # composition root
  internal/
    domain/              # pure types + ports (interfaces)
    application/         # use cases
    adapter/
      postgres/          # impl PostgresUserRepo
      kafka/             # impl EventBus
      http/              # impl HTTP handlers
    platform/
      observability/     # slog, otel, prom wiring
  pkg/                   # exported (rare; most things are internal/)
```

`internal/`: nothing outside `service/...` can import it. This is the architectural test.
- Defining ports as interfaces: define them where they are consumed (in `domain` or `application`), not where they are implemented. This is "consumer-defined interfaces," the Go counterpart to dependency inversion.

```go
package domain

type UserRepo interface {
	ByID(ctx context.Context, id UserID) (User, error)
	Save(ctx context.Context, u User) error
}
```

- Errors as domain values: a domain error like `var ErrUserNotFound = errors.New("user not found")`, matched with `errors.Is`. Adapter packages translate `sql.ErrNoRows` to `domain.ErrUserNotFound` at the seam.
- Avoiding leakage: never let a `*sql.Tx` or a `*http.Request` cross into `domain`. The compiler will not stop you; the architectural test will.
17.3 Lab-"A Hexagonal URL Shortener"¶
Build a workspace implementing a URL shortener:
- `internal/domain` - `ShortURL` aggregate, `URLRepo` and `Hasher` ports.
- `internal/application` - Shorten and Resolve use cases.
- `internal/adapter/postgres` - implements `URLRepo` against a real Postgres (use `pgx`, not `database/sql`).
- `internal/adapter/http` - REST handlers using `application`.
- `internal/adapter/memory` - in-memory `URLRepo` for tests.
- `cmd/api` - wires everything.
The architectural test (a Go test) walks the import graph and fails if internal/domain imports any adapter package or stdlib networking package.
17.4 Idiomatic & golangci-lint Drill¶
`depguard` (forbids cross-layer imports), `revive: empty-block`, `gocritic: dupCase`. The `depguard` rules become the executable architecture documentation.
17.5 Production Hardening Slice¶
- Add `depguard` rules forbidding `internal/domain` from importing `net/http`, `database/sql`, `context.Background`, and any third-party adapter packages. CI fails on a violation. This is the architectural test in lint form.
Week 18 - Observability: slog, pprof, trace, OpenTelemetry¶
18.1 Conceptual Core¶
- The "three pillars"-logs, metrics, traces-map cleanly to five Go tools:
- Logs: `log/slog` (stdlib, since 1.21). Structured, context-aware.
- Metrics: `prometheus/client_golang` plus `runtime/metrics`-derived collectors.
- Traces: `go.opentelemetry.io/otel` with an OTLP exporter.
- Profiles: `pprof` (CPU, heap, allocs, block, mutex, goroutine).
- Execution traces: `runtime/trace`-the tool when none of the above tell you why a goroutine is slow.
- Two cross-cutting principles:
- Correlation: every log line, trace span, and metric label uses the same `trace_id` and `request_id`. This requires plumbing through `context.Context`.
- Cardinality discipline: never put unbounded values (user IDs, URLs with query strings, request IDs) into metric labels.
18.2 Mechanical Detail-slog¶
- `slog.Default()`, `slog.New(handler)`, `slog.With(attrs...)`. Handlers: `JSONHandler`, `TextHandler`, custom.
- Context-aware logging: derive a logger per request, store it in context (one of the few legitimate `context.Value` uses), retrieve at log sites.
- Sensitive-data redaction: implement a `slog.LogValuer` on types containing PII; the `LogValue()` method returns a redacted form. This pushes redaction into the type, not the call site.
18.3 Mechanical Detail-pprof and trace¶
- Endpoints: `import _ "net/http/pprof"` registers handlers on `http.DefaultServeMux`. Mount on a separate port and gate behind auth-never on the public listener.
- CPU profile: `go tool pprof http://host:6060/debug/pprof/profile?seconds=30`. Top-heavy stacks, flame graphs (`pprof -http=:0`).
- Heap profile: `pprof http://host:6060/debug/pprof/heap`. Default mode `inuse_space` shows live retention; `-alloc_objects` shows churn.
- Block / mutex profiles: `runtime.SetBlockProfileRate(1)` and `runtime.SetMutexProfileFraction(1)`. Off by default-turn on briefly when investigating.
- Goroutine profile: `?debug=2` dumps full stacks. Read it line by line when chasing leaks.
- Execution trace: `runtime/trace.Start(w)`. Captures every G's lifecycle, GC events, syscall durations. Visualize with `go tool trace`. The most expensive but most informative tool.
18.4 Mechanical Detail-OpenTelemetry¶
- SDK setup: `otel.SetTracerProvider(...)` + `otel.SetTextMapPropagator(propagation.TraceContext{})`. Use the OTLP gRPC exporter to a local collector.
- Span creation: `ctx, span := tracer.Start(ctx, "operation"); defer span.End()`. Always `defer span.End()` immediately after creation.
- Span attributes: low-cardinality. Never put PII in attributes.
- Span events: structured logs attached to a span. Use sparingly; spans-as-logs leads to trace cardinality explosions.
- Instrumentation libraries: `otelhttp`, `otelgrpc`, `otelsql`. Auto-propagate through standard transports.
18.5 Lab-"Wire the URL Shortener"¶
Take week 17's URL shortener and add:
- `slog` JSON output with a request-scoped logger via context.
- A `/metrics` Prometheus endpoint exposing request count, latency histogram, and Go runtime metrics.
- OTLP traces exported to a local Jaeger via docker-compose.
- `/debug/pprof/*` on a separate admin port, gated by an IP allowlist.
- A 30-second `runtime/trace` capture under load, committed as `trace.out` with a markdown analysis.
18.6 Idiomatic & golangci-lint Drill¶
`forbidigo` (forbid `fmt.Println` outside tests, `log.Print*` after `slog` adoption), `loggercheck` (uniform `slog` key conventions).
18.7 Production Hardening Slice¶
- Add a sampling-based redaction layer at the `slog.Handler` level: any attribute key matching the `email|password|token` regex is redacted. Unit-test the redaction. This is a compliance prerequisite in regulated environments.
Week 19 - gRPC: Streaming, Interceptors, Deadlines, Retries, Outlier Ejection¶
19.1 Conceptual Core¶
- gRPC is HTTP/2 with a binary protocol (Protocol Buffers) and four call shapes: unary, server-streaming, client-streaming, bidirectional-streaming.
- The Go implementation (`google.golang.org/grpc`) is the canonical one. Read its source: `server.go`, `clientconn.go`, `stream.go`.
- Production gRPC concerns:
- Deadlines: every call must have one. Set on the client; propagate via context.
- Retries: configured via service config, with backoff. Idempotent calls only.
- Interceptors: cross-cutting middleware (logging, tracing, metrics, auth). Both unary and stream variants.
- Health checking: the standard `grpc.health.v1.Health` service.
- Load balancing: client-side, via `resolver` + `balancer` plugins.
- Connection management: connections are HTTP/2 multiplexed; default max concurrent streams is 100-tune up for high-fanout clients.
19.2 Mechanical Detail¶
- Server setup: `grpc.NewServer` with `grpc.ChainUnaryInterceptor`/`grpc.ChainStreamInterceptor`, TLS credentials, keepalive enforcement, and a registered health service.
- Client setup:

```go
cc, err := grpc.NewClient("dns:///service.local:50051",
	grpc.WithTransportCredentials(creds),
	grpc.WithChainUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
	grpc.WithDefaultServiceConfig(`{
	  "loadBalancingConfig": [{"round_robin":{}}],
	  "methodConfig": [{
	    "name": [{"service":"foo.Bar"}],
	    "retryPolicy": {
	      "maxAttempts": 3,
	      "initialBackoff": "0.1s",
	      "maxBackoff": "1s",
	      "backoffMultiplier": 2,
	      "retryableStatusCodes": ["UNAVAILABLE"]
	    },
	    "timeout": "2s"
	  }]
	}`),
)
```

- Streaming patterns: server-streaming for log tail; client-streaming for batch upload; bidi for chat-like interaction. Always pair stream lifecycle with `context.Context` so cancellation works.
- Outlier ejection: at the client `balancer` level, eject endpoints with high error rates. The `xds` balancer supports it natively; for simpler setups, implement a `Picker` wrapper.
- Backpressure on streams: HTTP/2 has flow control. The Go gRPC implementation respects it. If your server is slow to send, the client will block writes, and vice versa. Do not rely on unbounded internal buffers.
19.3 Lab-"A Hardened gRPC Service"¶
Build a minimal Echo service with:
- Unary + server-streaming + bidi methods.
- Server interceptors for: panic recovery, request logging, OTel tracing, auth, rate limiting.
- Client config with retries (UNAVAILABLE only), 2 s default deadline, round-robin load balancing.
- A grpc.health.v1 health server.
- A `tools/grpc_load_test/` directory with `ghz`-based load tests; capture latency p50/p95/p99 under 10K QPS.
19.4 Idiomatic & golangci-lint Drill¶
`protogetter` (use generated getters), `goerr113` (use sentinel errors), `nilerr`, `errchkjson`.
19.5 Production Hardening Slice¶
- Wire deadline propagation tests: a client request with a 500 ms deadline must result in the server seeing a context with a similar deadline (within a budget). Failure here is the single most common gRPC production bug.
Week 20 - Testing Strategy: Five Surfaces, Race-Clean¶
20.1 Conceptual Core¶
- A production Go service has five test surfaces:
- Unit - `*_test.go` in the same package, table-driven, fast.
- Integration - `*_test.go` with a real Postgres/Kafka/Redis via `testcontainers-go`.
- Property-based - `gopter` or stdlib `testing/quick`. Less common in Go than in Haskell/Rust, but valuable for parsers and serializers.
- Fuzz - stdlib `func FuzzX(f *testing.F)` (since Go 1.18). Native, well-integrated, must be in CI.
- End-to-end - the binary, with all real dependencies, via `go run` or a compiled artifact.
- Each surface answers a different question. Skipping one leaves a class of bugs uncovered.
20.2 Mechanical Detail¶
- Table-driven test idiom:

```go
func TestParse(t *testing.T) {
	tests := []struct {
		name string
		in   string
		want Result
		err  error
	}{
		{"empty", "", Result{}, ErrEmpty},
		// ...
	}
	for _, tc := range tests {
		t.Run(tc.name, func(t *testing.T) {
			got, err := Parse(tc.in)
			if !errors.Is(err, tc.err) {
				t.Fatalf("err: got %v, want %v", err, tc.err)
			}
			if !cmp.Equal(got, tc.want) {
				t.Fatalf("got %v, want %v", got, tc.want)
			}
		})
	}
}
```

- `testify/require` for terse assertions, `google/go-cmp/cmp` for deep equality with custom comparers.
- Fuzz tests:

```go
func FuzzParse(f *testing.F) {
	f.Add("hello")
	f.Fuzz(func(t *testing.T, in string) {
		out, err := Parse(in)
		if err == nil {
			if Roundtrip(out) != in {
				t.Fatal("not idempotent")
			}
		}
	})
}
```

Run as `go test -fuzz=FuzzParse -fuzztime=30s`. Persist the corpus.
- `testcontainers-go` for integration: spin a real Postgres in-test, get a connection string, run schema migrations, exercise the adapter. Per-test cost is ~1–3 s container startup; amortize via test-package-level setup.
- Race detector economics: `-race` slows tests 5–10× and uses ~5–10× memory. Always run in CI; locally optional. Always run on a freshly written concurrent test before committing.
20.3 Lab-"Test-Pyramid the URL Shortener"¶
- Unit: 100% line coverage on `internal/domain` and `internal/application` using mocks for ports.
- Integration: `testcontainers-go` Postgres for the postgres adapter.
- Fuzz: fuzz the alias-generation function, persisting any crashing inputs.
- Property: a `gopter` test that "shorten then resolve returns the original URL."
- E2E: a `make e2e` target that spins the full stack via `docker-compose`, hits the HTTP API, asserts behavior.
- All five surfaces run in CI under `-race -count=1`.
20.4 Idiomatic & golangci-lint Drill¶
`tparallel`, `paralleltest` (encourages `t.Parallel()`), `thelper` (mark test helpers), `testifylint` (correct testify usage).
20.5 Production Hardening Slice¶
- Add a continuous-fuzzing job (e.g., a scheduled GitHub Action) that runs each `Fuzz*` function for 5 minutes against the latest corpus. Persist the corpus as an artifact. Any new crashing input is a P0 issue with a runbook entry.
Month 5 Capstone Deliverable¶
A production-shaped url-shortener-prod/ workspace with:
- Hexagonal layout enforced by depguard.
- Full observability stack (slog + Prometheus + OTel + pprof).
- gRPC sibling service (e.g., a metrics-export gRPC) with hardened client/server config.
- Five test surfaces in CI, all `-race`-clean.
- A one-page RUNBOOK.md describing alarms, dashboards, deadline budgets, and rollback procedures.
This is the first artifact that resembles a real production service. Treat it as a portfolio piece.
Month 6-Mastery: Consensus, Distributed Storage, Performance Tuning, Defense¶
Goal: by the end of week 24 you have shipped one capstone deliverable in your chosen track (Raft KV / gRPC mesh / streaming pipeline) and can defend every design decision in a senior-level technical interview.
Weeks¶
- Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos)
- Week 22 - Distributed Storage Patterns
- Week 23 - Performance Tuning: Profile, Tune, Re-Profile
- Week 24 - Capstone Integration, Defense, Final Hardening
Week 21 - Consensus Algorithms: Raft (and a Glance at Paxos)¶
21.1 Conceptual Core¶
- Consensus is the problem of getting N nodes to agree on a sequence of values despite arbitrary message loss, reordering, and node failure (but not Byzantine failure).
- Raft is the modern teaching consensus: leader-based, log-replication-centric, decomposed into three sub-problems-leader election, log replication, safety. Read the Ongaro paper.
- Paxos is the older, denser counterpart. Read the Paxos Made Simple paper for fluency, but use Raft in implementation.
- The two properties Raft guarantees:
- Log matching: if two logs contain an entry with the same index and term, they are identical up to that index.
- Leader completeness: once an entry is committed, it is present in the log of every leader elected in later terms.
21.2 Mechanical Detail¶
- Roles: Follower → Candidate (on election timeout) → Leader (on majority vote).
- Terms: monotonically increasing election epochs. Every RPC carries a term. Stale terms are rejected.
- Log entries: `(term, index, command)`. The leader appends client commands and replicates via `AppendEntries`.
- Commit: an entry is committed when a majority has it in their log. The leader advances `commitIndex`. Once committed, the state machine can apply it.
- Snapshots: long logs are compacted via `InstallSnapshot`. Without snapshots, restart time and storage grow unbounded.
- Production Raft libraries in Go:
- `hashicorp/raft` - used by Consul, Nomad, Vault. Stable, mature, opinionated.
- `etcd-io/raft` - used by etcd and CockroachDB (and thus Kubernetes, via etcd). More flexible, more low-level.
- Read both. Pick `etcd-io/raft` for new builds-it has been hardened by years of CockroachDB and etcd production load.
21.3 Lab-"Read Raft in Anger"¶
- Read `etcd-io/raft`'s `node.go` and `raft.go` end-to-end. Annotate the state machine transitions.
- Build a minimal in-memory KV store on top: a single goroutine consumes from `node.Ready()`, applies entries to a `map[string]string`, persists log entries to a WAL, sends messages to peers, and acknowledges.
- Run a 3-node cluster locally. Kill the leader; observe an election. Restart; observe log catchup.
- Add a snapshot mechanism every 10K entries.
21.4 Idiomatic & golangci-lint Drill¶
- The Raft codebases are dense; do not lint-refactor them. Instead, study their style: small functions, explicit state transitions, testable seams.
21.5 Production Hardening Slice¶
- Add `jepsen-io/jepsen`-style fault injection: random partition, random clock skew, random crash. Run for 30 minutes. Verify linearizability of your `etcd-io/raft`-derived KV's history.
Week 22 - Distributed Storage Patterns¶
22.1 Conceptual Core¶
- The core of a distributed storage system is consensus (week 21). The engineering of one is everything around it: durable storage, replication, partitioning, snapshotting, repair, observability.
- Three patterns to know:
- Replicated state machine (Raft, Paxos): one consensus group, each node holds the full data set. Linearizable; throughput limited by the leader.
- Sharded replicated (etcd, CockroachDB ranges): many consensus groups, one per data shard. Horizontal scale.
- Eventually consistent (Cassandra, DynamoDB): no consensus on the write path; quorum reads, hinted handoff, anti-entropy. Different consistency model.
22.2 Mechanical Detail¶
- WAL discipline: every state-changing op is durably logged before acknowledgment. `fsync` after each batch (or per-op for stricter durability). The WAL is the source of truth for recovery.
- Snapshots: periodic point-in-time captures of the state machine. Truncate the WAL behind them. Snapshot format must be efficient to ship to a recovering follower.
- Membership changes: adding/removing nodes is the hardest correctness boundary. Raft's "joint consensus" handles this. Both `hashicorp/raft` and `etcd-io/raft` provide APIs; do not roll your own.
- Linearizable reads: three options-read from the leader after a heartbeat round (`etcd`'s "linearizable read" via read-index), read from any node with a lease, or read after a no-op append. Each has tradeoffs.
- Storage engine choice: BoltDB (simple, single-writer, great for Raft logs), BadgerDB (LSM-based, higher throughput), Pebble (CockroachDB's RocksDB replacement, the modern choice for high throughput).
22.3 Lab-"Harden the KV Store"¶
Take the week 21 Raft KV and add:
1. Pebble as the storage engine for both the WAL and the state machine.
2. Snapshots every N entries, with InstallSnapshot to recovering followers.
3. Linearizable reads via read-index.
4. Membership changes: add and remove nodes online.
5. Metrics: per-node Raft state, log lag, snapshot duration, apply latency.
22.4 Idiomatic & golangci-lint Drill¶
`errcheck`, `errorlint`, `wrapcheck`. Distributed-systems code is almost entirely error handling; lint rigor is non-optional.
22.5 Production Hardening Slice¶
- Add a Jepsen-style "nemesis" goroutine to your test harness that randomly partitions, pauses, and restarts nodes. Verify linearizability over 1M operations.
Week 23 - Performance Tuning: Profile, Tune, Re-Profile¶
23.1 Conceptual Core¶
- The discipline: measure, then change. Never optimize from intuition. The profilers-`pprof`, `trace`, `runtime/metrics`-are the source of truth.
- A working flow: capture a baseline profile, propose a hypothesis, change one thing, re-profile, accept or reject. Commit each accepted change with the before/after profile linked.
23.2 Mechanical Detail-The Tuning Toolkit¶
- CPU profile: `pprof http://host/debug/pprof/profile?seconds=60`. Top-heavy stacks identify the hot path. Look for `runtime.mallocgc`, `runtime.scanobject`, `runtime.gcDrain`-these mean GC is the bottleneck.
- Heap profile: identify retention. `pprof -inuse_space` shows what's live; `-alloc_objects` shows churn.
- Block profile: `runtime.SetBlockProfileRate(1)` then `pprof /debug/pprof/block`. Identifies channel/syscall waits.
- Mutex profile: `runtime.SetMutexProfileFraction(1)` then `pprof /debug/pprof/mutex`. Identifies contended locks.
- Goroutine profile: stack distribution. Sudden growth = leak.
- Execution trace: `go tool trace`. The expensive but most informative tool. Identifies scheduler latencies, GC pauses, and goroutine-state transitions.
- PGO (Profile-Guided Optimization): stable since Go 1.21. Capture a representative CPU profile in production, place it as `default.pgo`, rebuild. ~5–10% throughput win on hot paths.
- `benchstat`: compare two `go test -bench` runs statistically. Reports geomean and significance.
23.3 Mechanical Detail-Common Wins¶
- Replace `interface{}` boxing in hot paths with concrete types or generics.
- Reuse allocations via `sync.Pool` (with the discipline from Week 8).
- Pre-size slices and maps when capacity is known: `make([]T, 0, n)`, `make(map[K]V, n)`.
- Avoid `defer` in hot loops (Go 1.14+ made `defer` ~zero-cost in most cases, but the loop variant still has overhead).
- `strings.Builder` over `+=` for building strings.
- Slice instead of map for small collections (<~50 entries)-linear scan beats hash on modern caches.
- Goroutine cost: launching is cheap, but the aggregate of millions of goroutines on a long-tail-blocked path is not. Bound concurrency.
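Two of the wins above side by side: naive `+=` concatenation versus a pre-sized `strings.Builder`. A minimal sketch; wrap each function in a `Benchmark*` and compare the runs with `benchstat`:

```go
package main

import (
	"fmt"
	"strings"
)

// buildNaive grows the result with +=: every append copies the whole
// prefix, so total work is quadratic in the output length.
func buildNaive(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// buildFast computes the final length, Grows once, then appends:
// amortized linear work and typically a single allocation.
func buildFast(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var sb strings.Builder
	sb.Grow(n)
	for _, p := range parts {
		sb.WriteString(p)
	}
	return sb.String()
}

func main() {
	parts := []string{"short", "-", "url"}
	fmt.Println(buildFast(parts) == buildNaive(parts)) // true
}
```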
23.4 Lab-"Profile-Tune-Profile"¶
Take your capstone (whatever track) and:
1. Capture a CPU profile under representative load. Identify the top 5 functions.
2. Pick one and propose a fix. Estimate the win in advance.
3. Implement, re-profile, compare with benchstat. Document each change in PERF_LOG.md.
4. Capture a runtime/trace and identify any GC or scheduler stalls. Fix one.
5. Apply PGO. Confirm the win.
23.5 Idiomatic & golangci-lint Drill¶
`prealloc`, `gosimple S1024` (`time.Until(x)` instead of `x.Sub(time.Now())`), `gocritic: rangeValCopy`. Final lint pass-your codebase should be near-zero findings.
23.6 Production Hardening Slice¶
- Wire PGO into your release pipeline: a "canary" deploy collects a profile, the next "stable" build uses it. Document the procedure in `RELEASE.md`.
Week 24 - Capstone Integration, Defense, Final Hardening¶
24.1 Conceptual Core¶
The final week is integration, not new material. Bring your chosen capstone (see CAPSTONE_PROJECTS.md) to production-defensible quality.
24.2 The Final Hardening Checklist¶
By now, every previous module has fed the hardening/ template. Roll it up into one final release-checklist.md:
- `gofmt`, `go vet`, `golangci-lint run` clean (zero findings; all `nolint` annotations have a documented reason).
- All tests pass under `-race -count=10`.
- Fuzz harnesses for every parser/serializer; CI runs them for ≥30s per fuzzer.
- `goleak` passes for every package using goroutines.
- PGO applied; benchmark deltas committed.
- `pprof` endpoints behind admin port + auth; documented.
- OTel traces, Prometheus metrics, `slog` JSON logs-wired and tested.
- `GOMEMLIMIT` set from cgroup memory at startup.
- `debug.SetMaxStack` set to a sane bound (the default 1 GiB is too lenient).
- Cross-compilation matrix green: `linux/amd64`, `linux/arm64`, `darwin/arm64` minimum.
- Build is reproducible: `-trimpath`, pinned toolchain, deterministic `Dockerfile`.
- Binary size optimized: `-ldflags="-s -w"`, optionally `upx` if startup time is irrelevant (rarely worth it).
- SBOM generated (`cyclonedx-gomod`); release artifacts signed (`cosign`).
- `RUNBOOK.md`, `THREAT_MODEL.md`, ADRs (≥3), and `SECURITY.md` present.
- On-call alarms wired to the metrics that matter (p99 latency, error rate, goroutine count, GC pause p99, memory headroom).
24.3 Lab-"Defend the Design"¶
Schedule a 45-minute mock review with a senior peer (or record yourself). Present:
- The architecture diagram.
- One slide per non-obvious decision (e.g., "why etcd-io/raft over hashicorp/raft", "why Pebble over BoltDB", "why server-streaming over polling").
- A live demo of the test suite (`-race`, fuzzing, integration).
- A live demo of the observability stack (Jaeger, Prometheus, pprof).
- A live demo of fault tolerance (kill the leader, watch recovery).
The deliverable is the defense, not the slides. If you cannot answer "what is the worst-case write latency under leader change?" or "what is your goroutine count under 10× load?", you have not yet finished the curriculum.
24.4 Idiomatic & golangci-lint Drill¶
- Final pass: `golangci-lint run --enable-all --disable=lll,wsl --timeout=10m`. Either fix or `//nolint:linter // reason` with a justification. Zero unjustified suppressions.
24.5 Production Hardening Slice¶
- Tag the capstone repo `v1.0.0`. Generate a release artifact with `goreleaser`. Sign with `cosign`. Publish a CHANGELOG. The final commit hash is the artifact you reference on your resume.
Month 6 Deliverable¶
The chosen capstone (see CAPSTONE_PROJECTS.md)-running, defensible, hardened. Plus the hardening/ template, now a publishable Go-module starter under your name.
You are done. The next steps are no longer pedagogical; they are professional.
Appendix A-Production Hardening Reference¶
This appendix consolidates the hardening slices distributed throughout the curriculum. By week 24 the reader's hardening/ template should contain a working example of every section below.
A.1 Build & Release¶
A.1.1 go build flags worth knowing¶
- **`-trimpath`**-strips local file paths from the binary. Always on in release builds; required for reproducible builds.
- **`-ldflags="-s -w"`**-strips DWARF and symbol tables. ~30% size reduction. Only enable for production releases (debugging is harder; core dumps less useful).
- **`-ldflags="-X main.version=v1.2.3"`**-embeds version info. Pair with `-X main.commit=$(git rev-parse HEAD)` and `-X main.buildDate=...`.
- **`-buildmode=pie`**-position-independent executable. Required for ASLR on hardened deployments.
- **`-buildvcs=true`**-embeds VCS info (default on with modules); `go version -m` reads it back.
- **`-tags=netgo,osusergo`**-pure-Go DNS/user resolvers. Required for fully static binaries on Linux.
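The `-X` flag only overrides package-level string variables that already exist. A minimal sketch of the receiving side (the variable names are a common convention, not mandated):

```go
package main

import "fmt"

// Overridden at link time, e.g.:
//   go build -trimpath -ldflags "-X main.version=v1.2.3 -X main.commit=$(git rev-parse HEAD)"
var (
	version = "dev"
	commit  = "none"
)

func main() {
	fmt.Printf("svc %s (%s)\n", version, commit) // prints "svc dev (none)" without ldflags
}
```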
A.1.2 Build tags for cross-platform code¶
- Tags gate file-level compilation.
- Common patterns: `//go:build linux`, `//go:build !windows`, `//go:build integration` (for slow tests), `//go:build debug`.
- Avoid `runtime.GOOS` checks at runtime where a build tag would do-the dead-code path costs binary size.
A.1.3 Cross-compilation¶
```sh
GOOS=linux GOARCH=amd64 go build -trimpath -o bin/svc-linux-amd64 ./cmd/svc
GOOS=linux GOARCH=arm64 go build -trimpath -o bin/svc-linux-arm64 ./cmd/svc
GOOS=darwin GOARCH=arm64 go build -trimpath -o bin/svc-darwin-arm64 ./cmd/svc
GOOS=windows GOARCH=amd64 go build -trimpath -o bin/svc-windows-amd64.exe ./cmd/svc
```
When cgo is required, cross-compiling needs a target C toolchain; `zig cc` via `CGO_ENABLED=1 CC="zig cc -target aarch64-linux-musl"` is the simplest setup.
A.1.4 Static linking¶
- `CGO_ENABLED=0` produces a fully static binary on Linux. The default for containerized Go services unless you specifically need cgo (sqlite, libsystemd, etc.).
- For services that must use cgo: link against musl via Alpine or use `gcc -static` carefully; statically linking glibc is fragile.
A.1.5 Reproducible builds¶
- Pin the toolchain via `go.mod`'s `toolchain go1.22.X` directive.
- `-trimpath`.
- Avoid `time.Now()` in `init()` or `build.go`-equivalents.
- Build inside a deterministic image: a pinned Alpine, `golang:1.22.X-alpine@sha256:...` with content hash.
- Confirm reproducibility: `sha256sum bin/svc` should match across machines and builds of the same commit.
A.1.6 goreleaser¶
- The de-facto Go release tool. One config file produces: cross-compiled binaries, `tar.gz`/`zip` archives, a Homebrew tap, Linux packages (deb/rpm), Docker images, GitHub Releases, SBOM, signatures.
- Replaces ~500 lines of `Makefile` + CI-script glue. Adopt early.
A.2 Linting and Static Analysis¶
A.2.1 golangci-lint baseline configuration¶
A reasonable starting `.golangci.yml`:

```yaml
run:
  timeout: 5m
  go: "1.22"

linters:
  disable-all: true
  enable:
    - errcheck
    - govet
    - staticcheck
    - gosimple
    - ineffassign
    - unused
    - revive
    - gocritic
    - gosec
    - bodyclose
    - rowserrcheck
    - sqlclosecheck
    - nilerr
    - prealloc
    - unconvert
    - unparam
    - misspell
    - depguard
    - contextcheck
    - errorlint
    - exhaustive
    - forbidigo
    - goerr113
    - testifylint
    - tparallel
    - thelper
    - paralleltest
    - fieldalignment
    - copyloopvar
    - intrange

linters-settings:
  errcheck:
    check-blank: true
  govet:
    enable-all: true
  depguard:
    rules:
      domain-purity:
        list-mode: lax
        files: ["**/internal/domain/**"]
        deny:
          - pkg: net/http
            desc: domain must not import HTTP
          - pkg: database/sql
            desc: domain must not import SQL

issues:
  max-issues-per-linter: 0
  max-same-issues: 0
```
A.2.2 The race detector is non-negotiable¶
- ~5–10× slowdown, ~5–10× memory.
- Catches data races by adding a happens-before tracking layer.
- Never commit code that has not been tested under `-race`.
A.2.3 go vet¶
- A subset of `golangci-lint` (which runs `vet` internally), but the standalone command is fast and honest.
- Critical analyzers: `printf`, `lostcancel`, `copylocks`, `loopclosure`, `nilness`, `shadow`, `unsafeptr`.
A.2.4 staticcheck¶
- The most rigorous Go linter. Maintained separately from `go vet`. Documented at staticcheck.io.
- High-value codes: `SA1015` (`time.Tick` leak), `SA1029` (`context.WithValue` key collisions), `SA4006` (unused write), `SA6002` (`sync.Pool` non-pointer).
A.3 Profiling and Tracing¶
A.3.1 pprof endpoints-production setup¶
```go
import _ "net/http/pprof"

// ...
go func() {
	log.Fatal(http.ListenAndServe("127.0.0.1:6060", nil)) // admin port, never public
}()
```

Use `kubectl port-forward` for ad-hoc access.
A.3.2 The pprof commands you will run weekly¶
```sh
go tool pprof -http=:0 http://host:6060/debug/pprof/profile?seconds=30   # CPU
go tool pprof -http=:0 http://host:6060/debug/pprof/heap                 # heap (inuse)
go tool pprof -http=:0 -alloc_objects http://host:6060/debug/pprof/heap  # allocations
go tool pprof -http=:0 http://host:6060/debug/pprof/goroutine            # goroutines
go tool pprof -http=:0 http://host:6060/debug/pprof/block                # block (after SetBlockProfileRate)
go tool pprof -http=:0 http://host:6060/debug/pprof/mutex                # mutex contention
```
A.3.3 runtime/trace¶
- View with `go tool trace trace.out`.
- Use when pprof doesn't explain a latency stall-trace shows the exact timeline of every G across every P.
A.3.4 PGO (Profile-Guided Optimization)¶
- Run a representative load against your service.
- Capture: `curl -o cpu.pprof http://host:6060/debug/pprof/profile?seconds=60`.
- Place it at `default.pgo` in the package containing `main`.
- Rebuild: `go build -pgo=auto`.
- Expect ~5–15% throughput win on hot paths. Build a PGO-update cadence into your release flow.
A.4 Observability Standards¶
A.4.1 Logging¶
- Use `log/slog` (stdlib, since 1.21).
- JSON handler in production; Text handler locally.
- Per-request scoped logger via `context.Context`.
- Levels: `Debug` (off in prod), `Info`, `Warn` (something to watch), `Error` (a human should look). Never `Panic` or `Fatal` for recoverable errors.
- Sensitive-attribute redaction at the handler.
A.4.2 Metrics¶
- Use `prometheus/client_golang` with `collectors.NewGoCollector(collectors.WithGoCollections(...))` for Go runtime metrics from `runtime/metrics`.
- The four golden signals: latency (histogram), traffic (counter), errors (counter), saturation (gauge).
- Never unbounded labels.
A.4.3 Traces¶
- OpenTelemetry SDK + OTLP gRPC exporter.
- Auto-instrument with `otelhttp`, `otelgrpc`, `otelsql`.
- Sampling: head-based (e.g., 1%) for high-QPS services; tail-based (via collector) for systems where rare errors matter most.
A.4.4 The "useful errors" hardening pass¶
- Wrap with `fmt.Errorf("doing X: %w", err)` at every layer, preserving `%w` for `errors.Is`/`errors.As`.
- Sentinel errors at domain boundaries: `var ErrNotFound = errors.New("not found")`.
- Structured errors only when you need typed fields: `type ValidationError struct{ Field, Reason string }` with an `Error()` method.
- Never `panic` for recoverable conditions. Reserve panic for "the program's invariants are violated" (e.g., a nil pointer that should never be nil).
A.5 Memory Tuning¶
A.5.1 The two knobs¶
- `GOGC`-heap-growth ratio. Default 100 (next GC target = 2× live heap). Lower = more frequent GC = less memory; higher = less frequent = more throughput, more memory.
- `GOMEMLIMIT`-soft memory ceiling. Default off. Set this in containers to ~90% of cgroup memory.
A.5.2 The setup pattern¶
```go
import (
	"os"
	"runtime/debug"
	"strconv"

	_ "go.uber.org/automaxprocs" // honor cgroup CPU quota
)

func init() {
	if v := os.Getenv("MEMORY_LIMIT_BYTES"); v != "" {
		if n, err := strconv.ParseInt(v, 10, 64); err == nil {
			debug.SetMemoryLimit(n)
		}
	}
}
```
A.5.3 automaxprocs¶
- Uber's small library that sets `GOMAXPROCS` based on the cgroup CPU quota. Without it, a container limited to 0.5 CPUs still sees the host's full CPU count and spawns too many P's.
- Adopt by default in all containerized services.
A.6 The Hardening Template¶
By week 24, the hardening/ template should contain:
hardening/
.golangci.yml
.goreleaser.yaml
Dockerfile # multi-stage, scratch or distroless final
Makefile # fmt, vet, lint, test, race, bench, profile
cmd/svc/main.go # idiomatic composition root
internal/
platform/
observability/ # slog + prom + otel + pprof wiring
memlimit/ # GOMEMLIMIT from env
shutdown/ # graceful shutdown helper
ci/
test.yml # fmt + vet + lint + test -race
bench.yml # benchstat against baseline
fuzz.yml # nightly fuzz
release.yml # goreleaser on tag
RELEASE_CHECKLIST.md
RUNBOOK.md
SECURITY.md
THREAT_MODEL.md
This is the artifact that should accompany every Go service you ship after week 24.
Appendix B-Build-From-Scratch Data Structures and Patterns¶
A working Go engineer should have implemented each of the following at least once, with race-clean tests (`go test -race`), allocation benchmarks, and goroutine-leak verification (where concurrent). This appendix sketches the minimal-viable design for each.
B.1 Lock-Free SPSC Ring Buffer¶
When: real-time control loops, log shippers, audio paths, any 1-producer-1-consumer with strict latency budgets.
Design:
- [Cap]T backing array (Cap a power of two for cheap modulo).
- head and tail are atomic.Uint64, each on its own cache line ([7]uint64 padding).
- Producer: load tail (relaxed), check space against head, write slot, store tail with release semantics (in Go: any atomic store).
- Consumer: symmetric.
- Wait-free per side; needs no CAS, only atomic loads/stores.
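The design above, as a minimal sketch assuming Go 1.19+ for `atomic.Uint64` (the `SPSC` type and `ringCap` constant are illustrative; Go's atomics provide the release/acquire ordering the design calls for):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

const ringCap = 1 << 10 // power of two: index masking instead of modulo

// SPSC is a single-producer single-consumer ring buffer.
// head and tail are padded onto separate cache lines to avoid false sharing.
type SPSC[T any] struct {
	head atomic.Uint64 // next slot to read; consumer-owned
	_    [7]uint64
	tail atomic.Uint64 // next slot to write; producer-owned
	_    [7]uint64
	buf  [ringCap]T
}

// Push returns false when the ring is full. Producer side only.
func (r *SPSC[T]) Push(v T) bool {
	t := r.tail.Load()
	if t-r.head.Load() == ringCap { // full
		return false
	}
	r.buf[t&(ringCap-1)] = v
	r.tail.Store(t + 1) // publishes the slot write above
	return true
}

// Pop returns false when the ring is empty. Consumer side only.
func (r *SPSC[T]) Pop() (T, bool) {
	var zero T
	h := r.head.Load()
	if h == r.tail.Load() { // empty
		return zero, false
	}
	v := r.buf[h&(ringCap-1)]
	r.head.Store(h + 1)
	return v, true
}

func main() {
	var r SPSC[int]
	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // producer: spin when full
		defer wg.Done()
		for i := 0; i < 10000; i++ {
			for !r.Push(i) {
			}
		}
	}()
	sum := 0
	for n := 0; n < 10000; { // consumer: spin when empty
		if v, ok := r.Pop(); ok {
			sum += v
			n++
		}
	}
	wg.Wait()
	fmt.Println(sum) // 49995000
}
```

No CAS anywhere: each index has exactly one writer, which is what makes both sides wait-free.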
Lab outcomes: cache-line awareness, the Go memory model in practice, why chan T is not always the answer.
B.2 Lock-Free MPMC Bounded Queue¶
When: work-stealing schedulers, bounded work pools where contention is significant.
Design:
- Slot array; each slot is atomic.Uint64 encoding (seq, occupied flag).
- Producer CAS the seq from "empty at lap N" to "occupied at lap N".
- Consumer CAS from "occupied at lap N" to "empty at lap N+1".
- Avoids the ABA problem; same algorithm as Vyukov's bounded MPMC queue.
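A compact Go rendering of Vyukov's algorithm (the `MPMC` type and method shapes are illustrative; capacity must be a power of two, and per-slot cache-line padding is omitted for brevity):

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type slot[T any] struct {
	seq atomic.Uint64 // lap counter doubling as the occupied flag
	val T
}

// MPMC is a bounded multi-producer multi-consumer queue (Vyukov-style).
type MPMC[T any] struct {
	mask  uint64
	slots []slot[T]
	enq   atomic.Uint64
	deq   atomic.Uint64
}

func NewMPMC[T any](capPow2 uint64) *MPMC[T] {
	q := &MPMC[T]{mask: capPow2 - 1, slots: make([]slot[T], capPow2)}
	for i := range q.slots {
		q.slots[i].seq.Store(uint64(i)) // "empty at lap 0"
	}
	return q
}

func (q *MPMC[T]) Push(v T) bool {
	for {
		pos := q.enq.Load()
		s := &q.slots[pos&q.mask]
		switch seq := s.seq.Load(); {
		case seq == pos: // empty at this lap: claim it
			if q.enq.CompareAndSwap(pos, pos+1) {
				s.val = v
				s.seq.Store(pos + 1) // mark occupied
				return true
			}
		case seq < pos: // consumer hasn't freed it yet: full
			return false
		} // seq > pos: another producer advanced; retry
	}
}

func (q *MPMC[T]) Pop() (T, bool) {
	var zero T
	for {
		pos := q.deq.Load()
		s := &q.slots[pos&q.mask]
		switch seq := s.seq.Load(); {
		case seq == pos+1: // occupied at this lap: claim it
			if q.deq.CompareAndSwap(pos, pos+1) {
				v := s.val
				s.seq.Store(pos + q.mask + 1) // empty at next lap
				return v, true
			}
		case seq < pos+1: // nothing produced yet: empty
			return zero, false
		} // seq > pos+1: another consumer advanced; retry
	}
}

func main() {
	q := NewMPMC[int](8)
	q.Push(1)
	v, ok := q.Pop()
	fmt.Println(v, ok) // 1 true
}
```

Because the slot's `seq` encodes the lap, a stale CAS on a recycled slot fails - that is how the ABA problem is avoided without epoch reclamation.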
Lab outcomes: encoding state in atomics, lock-free without epoch reclamation, why the modular-arithmetic seq counter is correct.
B.3 Sharded Map¶
When: high-concurrency read+write workloads. Always consider this before reaching for sync.Map.
Design:
- N shards (typically 16–64), each struct{ sync.RWMutex; m map[K]V }.
- Hash key, mod N, lock the shard, perform the op.
- Iteration: lock shards in order, snapshot or hold each.
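The three bullets above in sketch form, with generic values and FNV as a stand-in shard hash (`ShardedMap` is an illustrative name; a hot path would use `xxhash` and avoid the per-op hasher allocation):

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const nShards = 16

type shard[V any] struct {
	mu sync.RWMutex
	m  map[string]V
}

// ShardedMap spreads lock contention across nShards independent maps.
type ShardedMap[V any] struct {
	shards [nShards]shard[V]
}

func NewShardedMap[V any]() *ShardedMap[V] {
	sm := &ShardedMap[V]{}
	for i := range sm.shards {
		sm.shards[i].m = make(map[string]V)
	}
	return sm
}

func (sm *ShardedMap[V]) shardFor(key string) *shard[V] {
	h := fnv.New32a()
	h.Write([]byte(key))
	return &sm.shards[h.Sum32()%nShards]
}

func (sm *ShardedMap[V]) Set(key string, v V) {
	s := sm.shardFor(key)
	s.mu.Lock()
	s.m[key] = v
	s.mu.Unlock()
}

func (sm *ShardedMap[V]) Get(key string) (V, bool) {
	s := sm.shardFor(key)
	s.mu.RLock()
	v, ok := s.m[key]
	s.mu.RUnlock()
	return v, ok
}

func main() {
	sm := NewShardedMap[int]()
	sm.Set("a", 1)
	v, ok := sm.Get("a")
	fmt.Println(v, ok) // 1 true
}
```

Generics keep the value type concrete, avoiding the `interface{}` boxing cost the lab outcome warns about.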
Lab outcomes: sync.Map vs sharded vs RWMutex+map comparison; xxhash for non-string keys; the cost of interface{} boxing at the API boundary (use generics).
B.4 LRU Cache¶
When: API caches, decoded-value caches, anything where hot data fits and cold should evict.
Design:
- Doubly-linked list of entries; map from key to list element.
- On hit: move to front. On miss + full: evict back; allocate new at front.
- Concurrent variant: shard by key, per-shard mutex + per-shard list.
Lab outcomes: pointer hygiene in linked structures; the standard container/list is fine but allocates an extra struct per entry-for hot caches, an inlined doubly-linked list of the entries themselves saves ~30% memory.
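A non-concurrent sketch using the stdlib `container/list` - accepting the extra per-entry allocation noted above in exchange for brevity (the `LRU` API shape is illustrative):

```go
package main

import (
	"container/list"
	"fmt"
)

type entry[K comparable, V any] struct {
	key K
	val V
}

// LRU is a fixed-capacity least-recently-used cache (not goroutine-safe).
type LRU[K comparable, V any] struct {
	cap   int
	ll    *list.List          // front = most recently used
	items map[K]*list.Element // key -> list element
}

func NewLRU[K comparable, V any](capacity int) *LRU[K, V] {
	return &LRU[K, V]{cap: capacity, ll: list.New(), items: make(map[K]*list.Element)}
}

func (c *LRU[K, V]) Get(k K) (V, bool) {
	if el, ok := c.items[k]; ok {
		c.ll.MoveToFront(el) // hit: promote
		return el.Value.(*entry[K, V]).val, true
	}
	var zero V
	return zero, false
}

func (c *LRU[K, V]) Put(k K, v V) {
	if el, ok := c.items[k]; ok {
		el.Value.(*entry[K, V]).val = v
		c.ll.MoveToFront(el)
		return
	}
	if c.ll.Len() == c.cap { // full: evict least-recent (back)
		back := c.ll.Back()
		delete(c.items, back.Value.(*entry[K, V]).key)
		c.ll.Remove(back)
	}
	c.items[k] = c.ll.PushFront(&entry[K, V]{k, v})
}

func main() {
	c := NewLRU[string, int](2)
	c.Put("a", 1)
	c.Put("b", 2)
	c.Get("a")    // "a" is now most recent
	c.Put("c", 3) // evicts "b"
	_, ok := c.Get("b")
	fmt.Println(ok) // false
}
```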
B.5 Bloom Filter¶
When: "definitely-not-in-set" pre-checks in front of expensive lookups (DB, network).
Design:
- Bit array of size m, k hash functions.
- Add: set k bits.
- Contains: all k bits set ⇒ probably in set; any unset ⇒ definitely not.
- Tune m and k for target false-positive rate.
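A minimal sketch of this design using `hash/maphash` with k independently seeded hashes (the `Bloom` API is illustrative; production filters often derive the k indexes from two hashes instead):

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// Bloom is a minimal Bloom filter: m bits, k seeded hash functions.
type Bloom struct {
	bits  []uint64
	m     uint64
	seeds []maphash.Seed
}

func NewBloom(m uint64, k int) *Bloom {
	b := &Bloom{bits: make([]uint64, (m+63)/64), m: m}
	for i := 0; i < k; i++ {
		b.seeds = append(b.seeds, maphash.MakeSeed())
	}
	return b
}

func (b *Bloom) idx(seed maphash.Seed, s string) uint64 {
	return maphash.String(seed, s) % b.m
}

// Add sets the k bits for s.
func (b *Bloom) Add(s string) {
	for _, seed := range b.seeds {
		i := b.idx(seed, s)
		b.bits[i/64] |= 1 << (i % 64)
	}
}

// Contains: false means definitely absent; true means probably present.
func (b *Bloom) Contains(s string) bool {
	for _, seed := range b.seeds {
		i := b.idx(seed, s)
		if b.bits[i/64]&(1<<(i%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	b := NewBloom(1<<16, 4)
	b.Add("alpha")
	fmt.Println(b.Contains("alpha")) // true: added elements are always reported
}
```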
Lab outcomes: hash mixing (hash/maphash is the right primitive in modern Go), the math of false positives, when not to use one (size of input, write-amplification).
B.6 Concurrent Skiplist¶
When: ordered concurrent maps; the foundation of, e.g., RocksDB-style memtables.
Design:
- Tower of forward pointers per node, height geometrically distributed.
- Insert: build the new node bottom-up, link levels via CAS.
- Removal: logical (mark deleted), then physical (unlink).
Lab outcomes: lock-free with non-trivial structure, randomization in algorithm design, the alternative to balanced trees in concurrent settings.
B.7 Worker Pool With Backpressure¶
When: every production service.
Design:
- N workers consuming from a buffered task channel.
- Submission: non-blocking try-send with overflow → drop-with-metric, or blocking send for back-pressure.
- Per-task timeout: derived context.
- Panic isolation: each worker's task call is wrapped in defer recover().
- Graceful shutdown: ctx.Done() triggers drain with a deadline.
Lab outcomes: this is the worker pool from week 12, refined.
B.8 Rate Limiter¶
When: ingress protection, downstream-call throttling, fair multi-tenant scheduling.
Design:
- Token bucket (golang.org/x/time/rate): refill at fixed rate up to a burst capacity.
- Leaky bucket: same shape, different metaphor.
- For per-key limiting: an LRU-bounded map of token-buckets.
Lab outcomes: study x/time/rate source-it is small and elegant; understand the difference between Allow() (immediate decision), Wait(ctx) (block until token), Reserve() (return a reservation that can be cancelled).
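`x/time/rate` is the production choice; for intuition, here is a stdlib-only token bucket implementing the `Allow()`-style immediate decision (the `Bucket` type is illustrative and omits `Wait`/`Reserve`):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Bucket refills at rate tokens/sec up to burst capacity.
type Bucket struct {
	mu     sync.Mutex
	tokens float64
	burst  float64
	rate   float64
	last   time.Time
}

func NewBucket(rate, burst float64) *Bucket {
	return &Bucket{tokens: burst, burst: burst, rate: rate, last: time.Now()}
}

// Allow reports whether one token is available right now.
func (b *Bucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	// Lazy refill: credit tokens for the elapsed time, capped at burst.
	b.tokens += now.Sub(b.last).Seconds() * b.rate
	if b.tokens > b.burst {
		b.tokens = b.burst
	}
	b.last = now
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

func main() {
	b := NewBucket(1, 3) // 1 token/sec, burst of 3
	allowed := 0
	for i := 0; i < 10; i++ {
		if b.Allow() {
			allowed++
		}
	}
	fmt.Println(allowed) // 3: the burst is consumed; refill is far too slow within this loop
}
```

The lazy-refill-on-access trick (no background ticker goroutine) is the same design `x/time/rate` uses.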
B.9 Circuit Breaker¶
When: any client of an external service.
Design:
- Three states: Closed (normal), Open (fail fast), HalfOpen (probe).
- Counts failures; on threshold → Open.
- After cooldown → HalfOpen; one probe → Closed on success, Open on failure.
- sony/gobreaker is a battle-tested reference; read its source then build your own.
Lab outcomes: state machines without globals; per-endpoint instances; the metric exports that operations actually wants (state changes, failure rate).
B.10 Singleflight¶
When: cache stampede mitigation, deduplicating concurrent identical requests.
Design:
- A map[key]*call where call holds the result/error and a sync.WaitGroup.
- First caller for a key creates the call, runs the work, signals completion.
- Concurrent callers for the same key wait on the WaitGroup, share the result.
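A stripped-down analog of `golang.org/x/sync/singleflight` showing where the map, the `call`, and the `WaitGroup` fit (`Group.Do`'s result order mirrors the real package's `(value, err, shared)`):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type call struct {
	wg  sync.WaitGroup
	val any
	err error
}

// Group deduplicates concurrent calls for the same key.
type Group struct {
	mu sync.Mutex
	m  map[string]*call
}

// Do runs fn once per in-flight key; duplicates wait and share the result.
func (g *Group) Do(key string, fn func() (any, error)) (any, error, bool) {
	g.mu.Lock()
	if g.m == nil {
		g.m = make(map[string]*call)
	}
	if c, ok := g.m[key]; ok {
		g.mu.Unlock()
		c.wg.Wait() // someone is already working: wait and share
		return c.val, c.err, true
	}
	c := &call{}
	c.wg.Add(1)
	g.m[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // first caller does the work
	c.wg.Done()

	g.mu.Lock()
	delete(g.m, key) // allow a fresh call for this key later
	g.mu.Unlock()
	return c.val, c.err, false
}

func main() {
	var g Group
	var calls atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			v, _, _ := g.Do("config", func() (any, error) {
				calls.Add(1)
				time.Sleep(50 * time.Millisecond) // simulate an expensive fetch
				return "value", nil
			})
			if v != "value" {
				panic("wrong value")
			}
		}()
	}
	wg.Wait()
	fmt.Println("underlying calls:", calls.Load()) // typically 1, not 10
}
```

The `shared` return is why the real package matters for caches: only the `shared == false` caller should pay the cost of populating the cache entry.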
Lab outcomes: study golang.org/x/sync/singleflight; understand why the result-shape is (value, err, shared) (the shared flag matters for caches).
B.11 Lock-Free Counter / Histogram¶
When: high-frequency metrics where contention on a single atomic kills throughput.
Design:
- N per-CPU (or per-P) counters; aggregate on read.
- Use runtime.GOMAXPROCS + manual sharding, or sync/atomic with cache-line padding per shard.
- Prometheus's prometheus.NewCounter is single-atomic and is fine for most uses; only build this when profiling shows contention.
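A sketch of the shard-and-aggregate idea with cache-line padding (the caller-supplied `hint` is a stand-in for a per-P index, which user code cannot obtain directly; a goroutine ID or cheap pseudo-random works in practice):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// countShard pads each counter onto its own cache line (8 + 56 = 64 bytes)
// so increments on different shards never contend for the same line.
type countShard struct {
	n atomic.Int64
	_ [7]int64
}

// Counter spreads increments across GOMAXPROCS shards; reads aggregate.
type Counter struct {
	shards []countShard
}

func NewCounter() *Counter {
	return &Counter{shards: make([]countShard, runtime.GOMAXPROCS(0))}
}

// Inc bumps one shard, chosen by the caller's hint.
func (c *Counter) Inc(hint int) {
	c.shards[hint%len(c.shards)].n.Add(1)
}

// Value sums all shards. Reads are rare, so the O(shards) scan is fine.
func (c *Counter) Value() int64 {
	var total int64
	for i := range c.shards {
		total += c.shards[i].n.Load()
	}
	return total
}

func main() {
	c := NewCounter()
	var wg sync.WaitGroup
	for g := 0; g < 8; g++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			for i := 0; i < 100000; i++ {
				c.Inc(id)
			}
		}(g)
	}
	wg.Wait()
	fmt.Println(c.Value()) // 800000
}
```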
Lab outcomes: why atomic.AddInt64 becomes the bottleneck at >10M/s/core; the cost of false sharing in metrics implementations.
Difficulty Ranking¶
| Tier | Structures |
|---|---|
| Warmup | Worker pool, LRU, Sharded map |
| Intermediate | Rate limiter, Circuit breaker, Bloom filter, Singleflight |
| Advanced | SPSC ring, MPMC queue, Lock-free counter |
| Expert | Concurrent skiplist |
Pick at least one from each tier. Ship with `-race` tests, allocation benchmarks, and `goleak`.
Appendix C-Contributing to Go: A Playbook¶
Most engineers never contribute to a language. The barrier is procedural ("how does Gerrit work?") more than technical. This appendix is the on-ramp.
C.1 Mental Model¶
The Go project ("golang/go") is a single repository containing the compiler, the runtime, the standard library, the linker, and most of the toolchain. It is mirrored to GitHub for visibility, but the primary development happens at go-review.googlesource.com using Gerrit (not GitHub PRs).
Three implications:
1. You file changes as CLs (changelists) in Gerrit, not PRs.
2. Reviewers leave inline comments and a numeric vote (-2..+2). A +2 from a maintainer is required to merge.
3. The maintainer set is small and prioritizes correctness over speed. A two-week review cycle is normal; a six-month cycle is not unheard of.
C.2 The Pipeline, in 30 Seconds¶
Source (.go)
│ Lexer ── tokens
│ Parser ── AST (cmd/compile/internal/syntax)
│ Type checker ── typed AST (cmd/compile/internal/types2)
│ IR construction (cmd/compile/internal/ir, ssagen)
│ SSA (cmd/compile/internal/ssa)
│ SSA optimizations (rules/*.rules)
│ Lowering to architecture-specific SSA
│ Code generation
│ Object file (.o)
│
│ Linker (cmd/link)-combines objects + runtime
▼
Executable
Then at runtime, the executable starts in runtime/asm_*.s → runtime.rt0_go → scheduler bootstrap → runtime.main → user main.
C.3 Setting Up¶
git clone https://go.googlesource.com/go
cd go/src
./make.bash # builds the toolchain (~3-5 min)
./run.bash # full test suite (~20 min)
The new toolchain is at `../bin/go`; use it when testing your changes.
For Gerrit, install git-codereview:
go install golang.org/x/review/git-codereview@latest
git config alias.change "codereview change"
git config alias.mail "codereview mail"
git config alias.sync "codereview sync"
Sign the CLA at cla.developers.google.com (individual or corporate). Without it, no CL can merge.
C.4 Where the Easy Wins Are¶
In rough order of difficulty:
C.4.1 Documentation fixes¶
- Typos, unclear sentences, missing examples in stdlib godoc. Search the issue tracker for the `Documentation` label.
- Touch the `.go` file's doc comment, send a CL. ~10 lines, ~1-week review.
C.4.2 Stdlib bug fixes¶
- Look for the `NeedsFix` + `help wanted` labels. The `time`, `net/http`, `encoding/*`, and `database/sql` packages have a steady flow of small bugs.
- Reproduce, write a minimal failing test, fix, send.
C.4.3 New stdlib examples¶
- Many functions lack `ExampleX` testable examples. These render in godoc and are doubly useful as tests.
- A trivially good first contribution.
C.4.4 New go vet analyzers¶
- `cmd/vet` is small and well-organized. Adding a new analyzer for a recurring bug pattern (with rationale) is a tractable medium contribution.
C.4.5 Compiler diagnostics¶
- "Why is this error message confusing?" issues are tagged. Improving `cmd/compile/internal/types2`'s error wording is high-impact and well-bounded.
C.4.6 Compiler optimizations¶
- SSA peephole rules live in `cmd/compile/internal/ssa/_gen/*.rules`. Each rule is a pattern → replacement transformation, tested via assembly tests.
- A higher bar, but single-CL-sized contributions still exist.
C.4.7 Don't start here (yet)¶
- The runtime scheduler proper (`runtime/proc.go`).
- The garbage collector (`runtime/mgc.go`).
- The linker.
- Anything involving language semantics (proposals → the `go/proposal` repo, a separate process, ~year-scale).
C.5 The First-CL Workflow¶
- Find an issue. Comment to claim it (no `@bot` mechanism like rust-bot; just be polite and indicate you're working on it).
- Branch from master.
- Make changes. Run the tests.
- Commit with a Go-style commit message. First line: `package: short description`. Body: rationale and any non-obvious context. Trailer: `Fixes #N` if applicable. For example:

      net/http: fix Server.Shutdown deadlock when called from request handler

      Previously, calling Server.Shutdown from within an active request
      handler would deadlock because [...].

      Fixes #12345

- Mail the CL with `git mail`. This pushes to Gerrit and creates a review thread.
- Address review. Reviewers comment inline. Each round: amend the existing commit (do NOT create new commits - `git change` to amend is the workflow), then `git mail` again. The CL gets new patch sets.
- A +2 from a maintainer plus a +1 from the trybots → merged. Your name is in the commit log.
C.6 The Go Source Reading Map¶
When the compiler/runtime is opaque, these are the highest-yield reads:
| File | What it teaches |
|---|---|
runtime/runtime2.go |
The data model: g, m, p, schedt, hchan, mutex. Reference. |
runtime/proc.go |
The scheduler. schedule, findrunnable, newproc, gopark. |
runtime/mgc.go |
GC entry points and pacing. |
runtime/mbarrier.go |
The write barrier. |
runtime/malloc.go |
Allocator (mcache → mcentral → mheap). |
runtime/chan.go |
Channel internals. |
runtime/iface.go |
Interface dispatch and itab cache. |
runtime/select.go |
select semantics (subtle; read slowly). |
runtime/preempt.go |
Async preemption mechanism. |
runtime/netpoll.go |
epoll/kqueue/IOCP integration. |
cmd/compile/internal/escape/escape.go |
Escape analysis. |
cmd/compile/internal/ssa/ |
SSA IR, optimization passes. |
Read in order: runtime2.go → proc.go → chan.go → iface.go → mgc.go → malloc.go → escape analysis → SSA. Allow weeks, not days.
C.7 Adjacent Targets if golang/go Is Too Heavy¶
- `golangci-lint` - active, friendly, fast review. Add a linter, fix a false positive.
- `staticcheck` - a higher bar than `golangci-lint`, but a smaller surface, and Dominik Honnef is a thoughtful reviewer.
- `golang.org/x/*` repos - `x/tools`, `x/sync`, `x/exp`. Same Gerrit workflow as `golang/go`, but sometimes faster review.
- `gopls` - the language server. High-impact contributions; AST/types fluency from week 14 directly applies.
- `prometheus/client_golang`, `grpc/grpc-go`, `etcd-io/etcd` - large Go projects with active maintainers and well-documented contribution flows. A merged PR signals real-world Go fluency.
C.8 Calibration¶
A reasonable goal for a curriculum graduate:
- By end of week 23: a CL open against `golang/go` (a stdlib doc fix or small bug fix) or a PR against a well-known Go project.
- By end of capstone: that CL/PR merged.
- 6 months post-curriculum: a non-trivial CL - a stdlib API addition, a `go vet` analyzer, a compiler diagnostic improvement.
These are realistic timelines. The maintainers prioritize stability. Do not be discouraged by a four-week review cycle; that is healthy.
Capstone Projects-Three Tracks, One Choice¶
The Month 6 capstone is the deliverable that converts this curriculum from study into evidence. Pick one track. The work performed here is what you describe in interviews and link from a portfolio.
Track 1-Distributed Storage: A Raft-Replicated KV Store¶
Outcome: a 3+ node Raft-replicated key-value store with linearizable reads, snapshots, online membership changes, and a Jepsen-style fault-injection harness verifying linearizability.
Functional spec¶
- gRPC API: `Get(key)`, `Put(key, value)`, `Delete(key)`, `Watch(prefix)` stream.
- Cluster API: `AddNode`, `RemoveNode`, `Leadership`.
- Linearizable reads via read-index.
- Snapshots every N entries (default 10K) with `InstallSnapshot` to recovering followers.
- Persistent WAL via Pebble or BoltDB.
- TLS between nodes; mutual auth via x509.
Non-functional spec¶
- Sustained 50K writes/sec on commodity hardware (3-node, NVMe).
- Sub-10 ms write latency p99 under 50% utilization.
- Recovery time (leader change → fully available) under 1 s for a 3-node cluster.
- Survives a single-node crash without data loss; survives a network partition with a clear majority.
Architecture sketch¶
- One goroutine per node consumes from the `etcd-io/raft` `Ready` channel.
- Apply loop: stream committed entries → state machine → respond to clients.
- Network: gRPC with a long-lived bidi stream per peer pair.
- State machine: a sharded `map[string][]byte` with versioning for `Watch`.
Test rigor¶
- Unit: state-machine transitions, log-truncation invariants.
- Integration: 3-node local cluster via `t.Run`, exercising membership changes.
- Fault injection: a "nemesis" goroutine that randomly partitions, pauses, and crashes nodes; the client operation history is fed to a linearizability checker (Knossos in Clojure via a subprocess, or a lightweight Go port).
- Race-clean under sustained load.
Hardening pass¶
- `goreleaser`, `cosign` signing, SBOM via `cyclonedx-gomod`.
- `GOMEMLIMIT` from the cgroup; `automaxprocs`.
- PGO with a representative workload.
- `pprof` + `runtime/trace` capture endpoints.
- OTel traces across the Raft RPC layer (custom interceptor).
- A `RUNBOOK.md` covering: leader-stuck triage, log-corruption recovery, snapshot-restore procedure.
Acceptance criteria¶
- Public repo with all of the above.
- A README that includes: a topology diagram, a load-test latency CDF, a Jepsen-style report.
- Defensible answer to: "What happens during a network partition where a majority can elect a new leader but the old leader is still up?"
Skills exercised¶
- Months 3 (concurrency), 5 (gRPC, observability), 6.21–6.22 (Raft, distributed storage).
Track 2-Service Mesh: A gRPC Microservices Mesh¶
Outcome: a multi-service mesh demonstrating a custom service registry, health checking, deadline propagation, retries, outlier ejection, and end-to-end OTel tracing across at least four interconnected services.
Functional spec¶
- A Registry service: gRPC interface for `Register`, `Deregister`, `Watch`, `LookupHealthy`. Backed by an in-memory store with optional Raft replication (composes with Track 1).
- A Sidecar library that:
  - Resolves service names via the registry (a custom gRPC `resolver.Builder`).
  - Implements client-side load balancing with round-robin + outlier ejection.
  - Propagates OTel context, deadlines, and a `request_id`.
  - Adds retry policy via service config.
- Four demo services (e.g., `user`, `order`, `inventory`, `payment`) with a fan-out call graph that exercises retries, timeouts, and partial failures.
- A `mesh-cli` for service inspection and chaos injection.
Non-functional spec¶
- Sub-millisecond p99 sidecar overhead per RPC.
- Outlier ejection within 10 s of an endpoint going bad.
- Deadline propagation: an inbound 1 s deadline must result in downstream calls seeing strictly less than 1 s remaining.
Architecture sketch¶
- Each service runs the sidecar library in-process (no separate sidecar binary-keep it simple, defensible).
- Registry uses `etcd-io/raft` if Track 1 is also chosen; otherwise a single instance with TLS.
- Service discovery uses a long-poll `Watch` via gRPC server streaming.
Test rigor¶
- Unit: resolver, balancer, interceptor stacks.
- Integration: spin up all four services in-process and exercise the call graph, using `testcontainers` for the registry's Postgres if used.
- Chaos: a `chaos-injector` middleware that drops/delays/errors a random percentage of calls.
- Latency tests with `ghz` at multiple QPS levels.
Hardening pass¶
- `pprof` everywhere; OTel everywhere. `goleak` per service.
- A reproducible Docker Compose stack and a one-command `make demo` that brings it up with Jaeger and Prometheus.
- Alarms wired: Prometheus rules on per-service error rate, p99 latency, registry watch lag.
Acceptance criteria¶
- All four services deployable with `make demo`.
- A flame graph demonstrating where sidecar overhead lives.
- A trace screenshot showing deadline-propagated failure across the call chain.
- Defensible answer to: "What happens if the registry leader is down for 30 seconds?"
Skills exercised¶
- Months 3 (concurrency), 5 (gRPC mastery, observability), 6 (capstone defense, performance).
Track 3-Streaming Pipeline: A Kafka-Compatible Ingestion + Stream Processor¶
Outcome: a Kafka-protocol-compatible (subset) broker plus a stream-processing framework, with at-least-once delivery, exactly-once-effective consumer offsets, and replay.
Functional spec¶
- Broker: implements a subset of the Kafka wire protocol (Produce, Fetch, Metadata, ListOffsets, OffsetCommit, OffsetFetch). Disk-backed log per partition; segment + index files.
- Stream processor: a small framework letting users write `func(input Stream[T]) Stream[U]` with operators (`Map`, `Filter`, `Window`, `Aggregate`, `Join`).
- Consumer: offset management, rebalance protocol (subset).
- Producer: idempotent producer (within a session).
- Compatibility: works with `franz-go` (the leading Kafka Go client) for at least Produce/Fetch.
Non-functional spec¶
- 200K msgs/sec sustained on a single partition (commodity NVMe).
- Sub-50 ms producer ack p99 with `acks=all`.
- Replay from an arbitrary offset.
- Crash-recoverable: WAL fsync semantics documented.
Architecture sketch¶
- One goroutine per partition for the disk-write path.
- mmap'd index files; sequential append to log files.
- Replication: Raft per partition (composes with Track 1) or a simpler primary-backup with a documented data-loss window.
Test rigor¶
- Unit: log segment boundary handling, offset arithmetic, index lookup.
- Integration: produce-and-consume tests against `franz-go`.
- Fuzz: the protocol parser fuzzed against malformed records.
- Crash test: kill -9 during write; restart; verify WAL recovery.
Hardening pass¶
- `pprof` for the hot path (the produce-write loop must be 0 allocs/op per record).
- PGO with a sustained-throughput profile.
- A `runtime/trace` artifact showing zero scheduler stalls under load.
Acceptance criteria¶
- Public repo, a reference-grade README.
- A throughput/latency benchmark vs. real Kafka on the same hardware.
- A replay demo showing rewinding consumer offset to a specific timestamp.
Skills exercised¶
- Months 2 (memory + GC tuning, allocation discipline), 3 (concurrency at 200K msgs/sec), 5 (observability), 6.22 (storage patterns).
Cross-Track Requirements¶
Regardless of track:
- Hardening template integrated. The `hardening/` template from Appendix A applies.
- Architectural Decision Records (ADRs). At least three for the capstone, each ~1 page.
- Threat model. One page minimum, no matter the track.
- Defense readiness. You should be able to walk a reviewer through the code in 45 minutes and answer "what fails first under load / fuzzing / a malicious input / a network partition?"
The track choice signals career direction: Track 1 for distributed-systems infrastructure roles, Track 2 for platform/SRE/networking roles, Track 3 for data-infra/streaming roles. Pick based on where you want the next interview loop, not on what looks easiest.
Worked example - Week 6: reading a GODEBUG=gctrace=1 output¶
Companion to Go Mastery → Month 02 → Week 6: The Garbage Collector. The week explains the tricolor concurrent mark-sweep algorithm. This page walks one real gctrace=1 line from a running program so the next time you see it in production logs, every field has meaning.
The program¶
// allocator.go
package main
import (
"fmt"
"time"
)
func main() {
var sink []*[1024]byte
for i := 0; i < 1_000_000; i++ {
b := new([1024]byte)
b[0] = byte(i)
sink = append(sink, b)
if i%50_000 == 0 {
time.Sleep(10 * time.Millisecond) // give GC room to breathe
}
}
fmt.Println("allocated", len(sink), "buffers")
}
A small, deliberately-allocating program. Each iteration allocates a 1 KB array and keeps a pointer to it. We expect the heap to grow steadily and GC cycles to happen periodically.
Running it with gctrace¶
$ GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
gc 1 @0.003s 0%: 0.012+0.31+0.020 ms clock, 0.10+0.054/0.30/0.072+0.16 ms cpu, 4->4->2 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 2 @0.012s 0%: 0.011+0.46+0.030 ms clock, 0.094+0.21/0.45/0.10+0.24 ms cpu, 4->5->3 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 3 @0.024s 0%: 0.014+0.66+0.027 ms clock, 0.11+0.23/0.66/0.18+0.22 ms cpu, 6->7->5 MB, 7 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 5 @0.094s 1%: 0.014+2.4+0.030 ms clock, 0.11+0.49/2.4/0.42+0.24 ms cpu, 18->20->15 MB, 19 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 6 @0.187s 1%: 0.013+3.6+0.025 ms clock, 0.10+0.84/3.6/0.93+0.20 ms cpu, 30->33->24 MB, 31 MB goal, 0 MB stacks, 0 MB globals, 8 P
Take one line - gc 4 - and decode every field.
The fields, in order¶
gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
- `gc 4` - the 4th GC cycle since program start.
- `@0.046s` - this cycle began 46 ms after program start.
- `1%` - the program has spent 1% of total wall-clock time in GC (summed across all GC cycles so far).
- `0.015+1.4+0.029 ms clock` - wall-clock duration of the three GC phases:
  - `0.015 ms` - stop-the-world sweep termination. The runtime briefly pauses all goroutines to finish any leftover sweeping from the previous cycle.
  - `1.4 ms` - concurrent mark + scan. This is the bulk of the work. Goroutines keep running while the GC marks reachable objects.
  - `0.029 ms` - stop-the-world mark termination. A second brief pause to finalize the mark.
The two pauses (0.015 + 0.029 = ~44 µs total) are what your latency-sensitive code feels. The 1.4 ms middle is concurrent and doesn't block.
- `0.12+0.46/1.4/0.18+0.23 ms cpu` - total CPU time across all Ps (processors). The layout mirrors the wall-clock field, with the middle phase broken into three parts (assist / background / idle GC time).
- `10->12->9 MB` - heap size:
  - `10 MB` - heap size when GC started.
  - `12 MB` - heap size after marking (the peak; concurrent work added objects).
  - `9 MB` - heap size after the sweep finishes (live heap retained).
  So this cycle reclaimed ~3 MB.
- `11 MB goal` - the heap-size goal for this cycle. `GOGC=100` sets it to roughly 2× the live heap left by the previous cycle (gc 3 retained 5 MB → ~11 MB once stacks and globals are added). The 9 MB this cycle retained implies a ~18 MB goal for the next cycle - and indeed gc 5 starts near 18 MB. The runtime adjusts, so the exact trigger may differ.
- `0 MB stacks` - total goroutine stack memory. We have only one goroutine here.
- `0 MB globals` - package-level data. Our program has almost none.
- `8 P` - 8 processors (P) participating. Matches `GOMAXPROCS=8`.
What changed across cycles¶
Watch the `4->5->3`, `6->7->5`, `10->12->9`, `18->20->15`, `30->33->24` progression. The live heap (third number) grows as our `sink` slice retains more pointers. The start-of-cycle heap (first number of the next line) tracks it: each cycle starts at roughly 2× the previous live size, because `GOGC=100` means 100% headroom over the live set.
The middle pause-free phase grew too: 0.31 → 0.46 → 0.66 → 1.4 → 2.4 → 3.6 ms. That's expected; marking takes time proportional to the live object graph. Stop-the-world phases stayed flat (~15 µs + ~30 µs) regardless of heap size - that's the whole point of Go's concurrent GC.
What this tells you¶
- Go's GC pauses are measured in tens of microseconds. The "GC pause" most people complain about in older runtimes does not apply here.
- "GC took 3.6 ms" sounds bad, but 99.9% of that was concurrent. Your goroutines kept running.
- Heap headroom is the lever. `GOGC=200` doubles the headroom (less frequent, bigger cycles); `GOGC=50` halves it (more frequent, smaller cycles). `GOMEMLIMIT` (Go 1.19+) caps the absolute heap regardless of `GOGC`.
Try tuning¶
Re-run with different settings:
$ GOGC=50 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# More GC cycles, smaller peaks, slightly more CPU on GC.
$ GOGC=200 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# Fewer GC cycles, larger peaks, lower GC CPU but more memory.
$ GOMEMLIMIT=20MiB GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
# Soft cap: when heap approaches 20 MiB, GC runs more aggressively
# to stay under the limit even if GOGC would let it grow.
The trap¶
Reading gctrace and concluding "we should tune GOGC in production." Usually no. In 95% of cases, the right answer is:
1. Use the default GOGC=100.
2. Set GOMEMLIMIT to ~80% of your container's memory limit so the GC starts pushing back before OOM.
3. Use runtime/pprof heap profiles to find allocation hotspots and fix the code, not the GC.
Tuning GOGC is a last resort and almost always trades latency for throughput (or vice versa) - not a free win.
Exercise¶
- Run the program above. Identify the GC cycle in which the heap first crossed 100 MB.
- Add a `sync.Pool` for the byte arrays. Re-run with the same `GOGC=100`. How many GC cycles happen now? How does the heap profile change?
- Capture an execution trace (e.g., wrap `main` with `runtime/trace.Start`/`trace.Stop` writing to a file) and open it in the `go tool trace` UI. Find a GC cycle. See the assists, the background sweepers, the STW phases.
Related reading¶
- The main Week 6 chapter covers the tricolor algorithm, write barriers, and GC pacing.
- The Performance methodology cross-topic page explains how to think about GC in the context of overall latency budgets.
- Glossary: Tricolor marking, Write barrier, STW (stop-the-world), GOGC in the main glossary.