Week 8 - Allocation Profiling, sync.Pool, GC Tuning
8.1 Conceptual Core
- The cheapest allocation is the one you do not make. The second cheapest is the one you reuse.
- `sync.Pool` is a per-P cache of objects. Items can be reclaimed by the GC at any time (typically at the start of a GC cycle), so it is a cache, not a resource pool. Use it for short-lived, frequently allocated objects (`bytes.Buffer`, `[]byte` scratch space, parser nodes).
- The two production-grade memory-tuning knobs are `GOGC` (heap growth ratio) and `GOMEMLIMIT` (a soft ceiling on total runtime-managed memory, Go 1.19+). For containerized services, pin `GOMEMLIMIT` to ~90% of the cgroup memory limit; leave `GOGC` at its default unless profiles say otherwise.
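A minimal sketch of the `GOMEMLIMIT` advice, assuming a cgroup v2 container whose limit is readable from `/sys/fs/cgroup/memory.max` (that path, the 0.9 fraction, and the helper name `memoryLimitFromCgroup` are illustrative assumptions). It applies the limit with `runtime/debug.SetMemoryLimit` (Go 1.19+) and leaves `GOGC` untouched. Setting the `GOMEMLIMIT` environment variable from the deployment manifest achieves the same result without code.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
	"strings"
)

// Assumption: cgroup v2 unified hierarchy mounted at /sys/fs/cgroup.
const cgroupMemoryMax = "/sys/fs/cgroup/memory.max"

// memoryLimitFromCgroup parses the contents of memory.max and returns
// fraction * limit in bytes. It reports false for "max" (no limit) or
// anything it cannot parse.
func memoryLimitFromCgroup(raw string, fraction float64) (int64, bool) {
	s := strings.TrimSpace(raw)
	if s == "" || s == "max" {
		return 0, false
	}
	limit, err := strconv.ParseInt(s, 10, 64)
	if err != nil || limit <= 0 {
		return 0, false
	}
	return int64(float64(limit) * fraction), true
}

func main() {
	// An explicit GOMEMLIMIT in the environment wins; the runtime already applied it.
	if os.Getenv("GOMEMLIMIT") != "" {
		fmt.Println("GOMEMLIMIT set in environment, not overriding")
		return
	}
	raw, err := os.ReadFile(cgroupMemoryMax)
	if err != nil {
		fmt.Println("no cgroup v2 memory limit found:", err)
		return
	}
	limit, ok := memoryLimitFromCgroup(string(raw), 0.9)
	if !ok {
		fmt.Println("cgroup reports no memory limit; keeping runtime default")
		return
	}
	prev := debug.SetMemoryLimit(limit)
	fmt.Printf("memory limit %d -> %d bytes (GOGC left at its default)\n", prev, limit)
}
```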
8.2 Mechanical Detail
- `sync.Pool` mechanics (`src/sync/pool.go`): `Get()` returns from the local-P cache, then tries stealing from other Ps, then falls back to a victim cache (objects that survived the previous GC), and finally calls `New()`. `Put()` stores into the local-P cache.
- At GC, the local cache is moved to the victim cache, and the previous victim cache is freed.
- Therefore: do not assume `Pool.Get` returns recently `Put` data. Always reset state on `Get`.
- Common `sync.Pool` mistake: putting non-pointer values. The pool stores `interface{}`, so a non-pointer value is boxed, which costs an allocation on every `Put`. Always store pointers.
- `bytes.Buffer` reuse pattern: `Get`, then `Reset`, use the buffer, then `Put` it back.
- Allocation profile interpretation:
  - `pprof -alloc_objects` (count) tells you "where churn happens";
  - `-alloc_space` (bytes) tells you "where pressure happens";
  - `-inuse_space` tells you "what is currently retained." Use all three.
- `runtime/metrics` (since Go 1.16): the modern API for runtime metrics. Replaces ad-hoc `MemStats` reads. Returns histograms for `/gc/pauses:seconds`, `/sched/latencies:seconds`, etc.
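The `bytes.Buffer` reuse pattern, sketched as a minimal example. `render` and the 64 KiB `maxPooledCap` cutoff are hypothetical; the cutoff follows the common practice of not returning very large buffers to the pool, so one outlier request does not pin a big allocation for the life of the process.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// maxPooledCap is a hypothetical cutoff: larger buffers are left for the GC.
const maxPooledCap = 64 << 10

// bufPool stores *bytes.Buffer, a pointer, so Put does not box a value.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render is a hypothetical hot-path function that formats into a pooled buffer.
func render(name string, n int) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // never assume Get returns a clean object

	fmt.Fprintf(buf, "%s=%d", name, n)
	out := buf.String() // String copies the bytes, so out stays valid after Put

	if buf.Cap() <= maxPooledCap {
		bufPool.Put(buf)
	}
	return out
}

func main() {
	fmt.Println(render("requests", 42))
}
```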
8.3 Lab - "Pool the Hot Path"
- Take the JSON-handling hot path of any service. Run `pprof -alloc_objects` under load. Identify the top three allocation sites.
- Introduce a `sync.Pool` for the most appropriate one (typically a `bytes.Buffer` or a decoder).
- Re-benchmark. The win should be visible in allocs/op and in p99 latency under load.
- Now intentionally misuse it: call `Pool.Put` without resetting state. Detect the bug under `-race` or via a deliberately inserted assertion.
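One way to build the deliberately inserted assertion for the misuse step, as a minimal sketch: a hypothetical `scratch` type and a hypothetical `checkedPool` wrapper that panics if `Get` ever hands back an object with leftover state. With `reset` set to false it reproduces the bug; with it set to true the assertion stays silent. Because `sync.Pool` is free to drop objects, detection is probabilistic on any single call, so the driver loops many times.

```go
package main

import (
	"fmt"
	"sync"
)

// scratch is a hypothetical pooled object carrying per-request state.
type scratch struct {
	buf []byte
}

// checkedPool wraps sync.Pool and asserts that Get never returns dirty state.
type checkedPool struct {
	pool  sync.Pool
	reset bool // false reproduces the bug: Put without resetting state
}

func newCheckedPool(reset bool) *checkedPool {
	cp := &checkedPool{reset: reset}
	cp.pool.New = func() any { return &scratch{buf: make([]byte, 0, 64)} }
	return cp
}

func (cp *checkedPool) Get() *scratch {
	s := cp.pool.Get().(*scratch)
	// Deliberately inserted assertion: a recycled object must come back empty.
	if len(s.buf) != 0 {
		panic(fmt.Sprintf("sync.Pool returned dirty object: %q", s.buf))
	}
	return s
}

func (cp *checkedPool) Put(s *scratch) {
	if cp.reset {
		s.buf = s.buf[:0] // the fix: clear state before the object is reused
	}
	cp.pool.Put(s)
}

// detectsDirty simulates many requests and reports whether the assertion fired.
func detectsDirty(cp *checkedPool) (caught bool) {
	defer func() {
		if recover() != nil {
			caught = true
		}
	}()
	for i := 0; i < 1000; i++ {
		s := cp.Get()
		s.buf = append(s.buf, "user-42"...)
		cp.Put(s)
	}
	return false
}

func main() {
	fmt.Println("buggy Put, assertion fired:", detectsDirty(newCheckedPool(false)))
	fmt.Println("fixed Put, assertion fired:", detectsDirty(newCheckedPool(true)))
}
```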
8.4 Idiomatic & golangci-lint Drill
`staticcheck` SA6002, `gocritic` `appendAssign`, and the `prealloc` linter. Re-read Dave Cheney's "High Performance Go Workshop" notes (a classic standing reference).
8.5 Production Hardening Slice
- Add a `/debug/pprof` HTTP endpoint behind an auth or build-tag gate (do not expose it on the public listener). Document the on-call runbook for capturing CPU/heap profiles from a misbehaving production process.
- Add `runtime/metrics`-based exporters for GC pause histograms and scheduler latencies. These are the signals an SRE wants when a Go service misbehaves.
Month 2 Capstone Deliverable
A `memory-and-gc/` workspace:
1. `layout-forensics` (week 5): `fieldalignment` enforced in CI.
2. `gc-forensics` (week 6): annotated `GODEBUG=gctrace=1` logs and a tuning playbook.
3. `iface-bench` (week 7): concrete vs interface vs generic, three-way benchmark.
4. `pool-the-hot-path` (week 8): before/after profile diff, baseline benchmark in CI.
Workspace-level CI must add: the `fieldalignment` analyzer, a 0-alloc regression guard on critical benchmarks, and pprof artifacts captured on demand from a `make profile` target.