Week 6 - The Garbage Collector
6.1 Conceptual Core
- Go's GC is a concurrent, tricolor, mark-sweep, non-generational, non-compacting collector. Each adjective is doing work:
- Concurrent: marking happens while the application (the "mutator") runs.
- Tricolor: every object is white (unreached), gray (reached but children unscanned), or black (reached and scanned). The invariant: a black object never points to a white object.
- Mark-sweep: phase 1 marks reachable; phase 2 reclaims unmarked.
- Non-generational: no separate young/old heap. (The Pacer compensates.)
- Non-compacting: objects do not move. This is what allows interior pointers and `unsafe.Pointer` values to remain valid across collections.
- Why these choices: Go optimizes for predictable low-pause behavior at the cost of throughput. A compacting collector would keep a smaller steady-state heap, but compaction stops the world.
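The non-compacting guarantee is easy to demonstrate: a pointer taken into the middle of a heap object still reads the same data after a collection. A minimal sketch (the `record` type and its values are invented for illustration):

```go
package main

import (
	"fmt"
	"runtime"
)

// record is an invented type; value sits at an offset inside the allocation.
type record struct {
	header [64]byte
	value  int64
}

// readAfterGC takes an interior pointer into a heap object, forces a
// collection, and reads through the pointer again. Because Go's collector
// never moves objects, the pointer remains valid.
func readAfterGC() int64 {
	r := &record{value: 42}
	p := &r.value // interior pointer: points into the middle of the object
	runtime.GC()  // force a full collection; r is not moved
	return *p
}

func main() {
	fmt.Println(readAfterGC()) // 42
}
```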
6.2 Mechanical Detail
- Phases (`runtime/mgc.go`):
- Sweep termination (STW, microseconds): finish the previous cycle's sweep.
- Mark setup (STW, microseconds): enable write barrier, scan stacks (briefly STW each goroutine).
- Concurrent mark: workers and mutator-assist mark the heap. Write barrier intercepts pointer writes.
- Mark termination (STW, ~100 µs to ms on huge heaps): finalize.
- Concurrent sweep: lazily reclaim white objects as the next allocation requests space.
- Write barrier: the Dijkstra-style barrier records pointer writes during mark so the mutator cannot "hide" a white object behind a black one. Implemented as a runtime call inserted by the compiler around pointer stores. This is why pointer-heavy code is GC-expensive: every write costs a barrier.
- Mark assist: when a goroutine allocates, it is forced to do proportional GC work. This couples allocation rate to GC progress and is the mechanism that prevents heap blowup.
- The Pacer: targets `next_gc = live_after_last_gc * (1 + GOGC/100)`. Default `GOGC=100`: GC when the heap doubles. Tunable via `GOGC=off`, `GOGC=50` (more frequent, lower memory), or `GOGC=200` (less frequent, higher memory).
- `GOMEMLIMIT` (since Go 1.19): a soft total-memory ceiling. The GC adjusts pacing to stay under the limit even if `GOGC` would not have triggered. Use it as your primary memory control in containers; leave `GOGC` at default.
- Stack scanning: each goroutine's stack is a root for marking. Goroutines are paused briefly for their stack scan; this is part of the STW mark-setup phase but is per-goroutine and parallelized.
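The write barrier's tricolor bookkeeping can be sketched in miniature. This toy model (all types and names are invented here, not the runtime's) shows the shading step that preserves the black-never-points-to-white invariant:

```go
package main

import "fmt"

// Toy tricolor model; the names are illustrative, not the runtime's types.
type color int

const (
	white color = iota // unreached
	gray               // reached, children not yet scanned
	black              // reached and scanned
)

type object struct {
	color color
	child *object
}

// writeBarrier sketches the Dijkstra-style insertion barrier: on a pointer
// store during mark, shade the new referent gray so that a black (already
// scanned) object is never left pointing at a white (unmarked) one.
func writeBarrier(slot **object, ptr *object, worklist *[]*object) {
	if ptr != nil && ptr.color == white {
		ptr.color = gray                   // no longer hidden from the marker
		*worklist = append(*worklist, ptr) // marker will scan its children
	}
	*slot = ptr
}

func main() {
	parent := &object{color: black} // already scanned by the marker
	c := &object{color: white}      // not yet reached by the marker
	var worklist []*object
	writeBarrier(&parent.child, c, &worklist)
	fmt.Println(c.color == gray, len(worklist)) // true 1: invariant preserved
}
```

Without the shading step, the store would hide `c` behind an already-black parent and the sweep would reclaim a live object.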
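The pacer formula is plain arithmetic; a hypothetical `pacerTarget` helper makes the `GOGC` trade-off concrete:

```go
package main

import "fmt"

// pacerTarget computes the next-cycle heap goal from the formula above:
// next_gc = live_after_last_gc * (1 + GOGC/100), in integer arithmetic.
func pacerTarget(liveBytes, gogc int64) int64 {
	return liveBytes + liveBytes*gogc/100
}

func main() {
	live := int64(100 << 20)                  // 100 MiB live after the last cycle
	fmt.Println(pacerTarget(live, 100) >> 20) // GOGC=100: next GC at 200 MiB
	fmt.Println(pacerTarget(live, 50) >> 20)  // GOGC=50: next GC at 150 MiB
	fmt.Println(pacerTarget(live, 200) >> 20) // GOGC=200: next GC at 300 MiB
}
```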
6.3 Lab: "GC Forensics"
- Write a service that allocates 100 MB/s of short-lived objects. Run with `GODEBUG=gctrace=1`. Read each GC line and identify: total heap, live heap, pause time, pacer target.
- Set `GOMEMLIMIT=512MiB` and `GOGC=off`. Re-run; observe how the GC is now driven entirely by the memory ceiling.
- Set `GOGC=50` (no `GOMEMLIMIT`). Re-run; observe more frequent, smaller GCs.
- Capture a `go tool pprof -alloc_objects` profile. Identify the top five allocation sites. Refactor at least two using `sync.Pool` or pre-allocated buffers. Re-benchmark.
- Capture a `go tool trace` and locate the GC mark phases visually.
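One of the refactors the lab asks for might look like this sketch: a hypothetical `render` function that draws `*bytes.Buffer` values from a `sync.Pool` instead of allocating a buffer per call (storing a pointer type also keeps staticcheck's SA6002 quiet):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses buffers across calls. Pooling *bytes.Buffer (a pointer
// type) avoids the extra allocation that Put on a value type would cause.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a string using a pooled buffer instead of a fresh one.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // clear contents before returning to the pool
		bufPool.Put(buf)
	}()
	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher")) // hello, gopher
}
```

In the lab's allocation profile, a refactor like this moves the allocation site out of the hot path, which the re-benchmark should make visible.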
6.4 Idiomatic & golangci-lint Drill
- `staticcheck` SA6002 (`sync.Pool` with non-pointer types), `prealloc`, `gocritic: rangeValCopy` (large struct copies in range loops).
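For the `rangeValCopy` drill, the pattern the linter flags and its fix fit in a toy example (the `big` type is invented here):

```go
package main

import "fmt"

// big is deliberately large so a range-value copy is expensive.
type big struct {
	payload [4096]byte
	id      int
}

// sumIDs iterates by index. gocritic's rangeValCopy would flag
// `for _, it := range items`, which copies 4 KiB per iteration;
// indexing reads the field in place.
func sumIDs(items []big) int {
	total := 0
	for i := range items {
		total += items[i].id
	}
	return total
}

func main() {
	fmt.Println(sumIDs([]big{{id: 1}, {id: 2}})) // 3
}
```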
6.5 Production Hardening Slice
- In your service template, set `GOMEMLIMIT` from a `MEMORY_LIMIT` environment variable computed at container start (`debug.SetMemoryLimit(int64(0.9 * cgroup_memory_limit))`). This is the single most impactful production tuning knob.
- Export GC metrics: `go_gc_duration_seconds` (summary), `go_memstats_*`. Use `prometheus/client_golang`'s `collectors.NewGoCollector(collectors.WithGoCollections(collectors.GoRuntimeMemStatsCollection | collectors.GoRuntimeMetricsCollection))` for the modern collector.