Week 5 - Memory Layout, Padding, Alignment
5.1 Conceptual Core
- A Go value lives in exactly one of: a goroutine stack, the heap (managed by `mallocgc`), the data segment (mutable globals), or the read-only data segment (string literals, constants).
- Struct field ordering is significant for memory footprint: Go does not reorder fields. The compiler inserts padding to satisfy alignment requirements. Misordering can double the size of a hot struct.
- False sharing is the silent killer of concurrent Go: two unrelated atomic counters in the same cache line force cores to invalidate each other's copy of that line on every update. The fix is padding the counters onto separate 64-byte cache lines (128 bytes on Apple Silicon).
5.2 Mechanical Detail
- `unsafe.Sizeof`, `unsafe.Alignof`, `unsafe.Offsetof`. Memorize the size of every primitive: `bool` 1, `int8` 1, `int16` 2, `int32`/`float32`/`rune` 4, `int64`/`float64`/`int`/`uintptr` 8 (on 64-bit), pointer 8, slice header 24 (ptr+len+cap), string header 16 (ptr+len), interface header 16 (itab/type + data), map pointer 8, channel pointer 8.
- Field reordering for size: sort fields by alignment, descending. Tools: `fieldalignment` (a vet analyzer in `golang.org/x/tools/go/analysis/passes/fieldalignment`). Wire it into CI.
- `runtime/internal/sys.CacheLineSize` is 64 on most platforms. Use `[7]uint64` padding (or `cpu.CacheLinePad` from `golang.org/x/sys/cpu`, or a custom `CacheLinePad` type) to isolate hot atomics.
- Slice internals: a slice is a 24-byte header `{Data *T, Len int, Cap int}`. `s = append(s, x)` may reallocate; the old backing array is GC'd once no other slice references it. The growth strategy: roughly 2× for small slices, ~1.25× for large ones (the threshold was 1024 elements before Go 1.18; it is now 256, with a smoother ramp between the two factors). Read `runtime/slice.go::growslice`.
- Map internals: `runtime/map.go`. A bucketed hash table with overflow-bucket chaining (not open addressing); 8 entries per bucket. Iteration order is deliberately randomized. Maps are never safe for concurrent write + anything; use `sync.Map` (specialized for read-mostly workloads) or a sharded map for general concurrency.
5.3 Lab: "Layout Forensics"
- Define five "interestingly bad" structs (e.g., `struct{ a bool; b int64; c bool; d int64; e bool }`). Compute their `unsafe.Sizeof` by hand, then verify.
- Reorder for minimal padding. Re-measure. Document each delta.
- Build a benchmark with a `[]Struct` of 1M elements; compare allocation/scan time for the badly-padded vs the optimally-packed version. Use `runtime.ReadMemStats` to capture `HeapAlloc` and GC pause durations.
- Construct a false-sharing example: two atomic counters incremented by different goroutines, with and without `CacheLinePad` between them. Benchmark contention. Expect a 5–20× difference.
5.4 Idiomatic & golangci-lint Drill
- `fieldalignment` (vet analyzer), `unconvert`, `gocritic: builtinShadowDecl`. Wire `fieldalignment` into CI as a hard fail.
5.5 Production Hardening Slice
- Add `runtime.ReadMemStats` instrumentation to your service template. Export `HeapAlloc`, `HeapInuse`, `StackInuse`, `NumGC`, `PauseTotalNs` as Prometheus metrics (or `expvar`). This becomes the Month 5 observability baseline.