
Week 5 - Memory Layout, Padding, Alignment

5.1 Conceptual Core

  • A Go value lives in exactly one of: a goroutine stack, the heap (managed by mallocgc), the data segment (mutable globals), or the read-only data segment (string literals, constants).
  • Struct field ordering is significant for memory footprint: Go does not reorder fields. The compiler inserts padding to satisfy alignment requirements. Misordering can double the size of a hot struct.
  • False sharing is the silent killer of concurrent Go: two unrelated atomic counters in the same cache line force the owning cores to invalidate each other's cache line on every update. The fix is padding each counter out to 64 bytes (128 on Apple Silicon).

5.2 Mechanical Detail

  • unsafe.Sizeof, unsafe.Alignof, unsafe.Offsetof. Memorize the size of every primitive: bool 1, int8 1, int16 2, int32/float32/rune 4, int64/float64/int/uintptr 8 (on 64-bit), pointer 8, slice header 24 (ptr+len+cap), string header 16 (ptr+len), interface header 16 (itab/type+data), map pointer 8, channel pointer 8.
  • Field reordering for size: sort fields by alignment, descending. Tools: fieldalignment (a vet analyzer in golang.org/x/tools/go/analysis/passes/fieldalignment). Wire it into CI.
  • runtime/internal/sys.CacheLineSize is 64 on most platforms (it is an internal package, so you cannot import it). Use [7]uint64 padding, golang.org/x/sys/cpu.CacheLinePad, or a custom CacheLinePad type to isolate hot atomics.
  • Slice internals: a slice is a 24-byte header {Data *T, Len int, Cap int}. s = append(s, x) may reallocate; the old backing array becomes garbage once no other slice references it. Growth strategy: roughly 2× for small slices, tapering toward ~1.25× for large ones (the threshold is 256 elements since Go 1.18; 1024 before). Read growslice in runtime/slice.go.
  • Map internals: runtime/map.go. Bucketed hashing with chained overflow buckets (not open addressing); 8 entries per bucket. Iteration order is deliberately randomized. Maps are unsafe under any concurrent access that includes a write; use sync.Map (optimized for read-mostly workloads) or a sharded map for general concurrency.

5.3 Lab: "Layout Forensics"

  1. Define five "interestingly bad" structs (e.g., struct{ a bool; b int64; c bool; d int64; e bool }). Compute their unsafe.Sizeof by hand, then verify.
  2. Reorder for minimal padding. Re-measure. Document each delta.
  3. Build a benchmark with []Struct of 1M elements; compare allocation/scan time with the badly-padded vs the optimally-packed version. Use runtime.ReadMemStats to capture HeapAlloc and GC pause durations.
  4. Construct a false-sharing example: two atomic counters incremented by different goroutines, with and without CacheLinePad between them. Benchmark contention. Expect 5–20× difference.
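Step 4 can be sketched as follows. This is a minimal harness, not a rigorous benchmark (use testing.B for real numbers); the shared/padded/hammer names are illustrative, and the 56-byte pads assume a 64-byte cache line, so double them on Apple Silicon:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// shared packs both counters into (very likely) one cache line.
type shared struct {
	a uint64
	b uint64
}

// padded gives each counter its own 64-byte cache line.
type padded struct {
	a uint64
	_ [56]byte // pad a out to a full 64-byte line
	b uint64
	_ [56]byte
}

// hammer increments a and b from two goroutines and reports wall time.
func hammer(a, b *uint64, iters int) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 0; i < iters; i++ {
			atomic.AddUint64(a, 1)
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < iters; i++ {
			atomic.AddUint64(b, 1)
		}
	}()
	wg.Wait()
	return time.Since(start)
}

func main() {
	const iters = 10_000_000
	var s shared
	var p padded
	fmt.Println("shared:", hammer(&s.a, &s.b, iters))
	fmt.Println("padded:", hammer(&p.a, &p.b, iters))
}
```

On a multi-core machine the padded variant should be several times faster; on a single core the two are indistinguishable, which is itself instructive.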

5.4 Idiomatic & golangci-lint Drill

  • Run fieldalignment (the vet analyzer), unconvert, and gocritic's builtinShadowDecl check. Wire fieldalignment into CI as a hard fail.

5.5 Production Hardening Slice

  • Add runtime.ReadMemStats instrumentation to your service template. Export HeapAlloc, HeapInuse, StackInuse, NumGC, PauseTotalNs as Prometheus metrics (or expvar). This becomes the Month 5 observability baseline.
