
Week 23 - Performance Tuning: Profile, Tune, Re-Profile

23.1 Conceptual Core

  • The discipline: measure, then change. Never optimize from intuition. The profilers (`pprof`, `trace`, `runtime/metrics`) are the source of truth.
  • A working flow: capture a baseline profile, propose a hypothesis, change one thing, re-profile, accept or reject. Commit each accepted change with the before/after profile linked.

23.2 Mechanical Detail: The Tuning Toolkit

  • CPU profile: `go tool pprof http://host/debug/pprof/profile?seconds=60`. Top-heavy stacks identify the hot path. Look for `runtime.mallocgc`, `runtime.scanobject`, and `runtime.gcDrain`: these mean the GC is the bottleneck.
  • Heap profile: identify retention. `pprof -inuse_space` shows what's live; `-alloc_objects` shows churn.
  • Block profile: `runtime.SetBlockProfileRate(1)`, then `go tool pprof http://host/debug/pprof/block`. Identifies waits on channels and sync primitives (not syscalls; those show up in the execution trace).
  • Mutex profile: `runtime.SetMutexProfileFraction(1)`, then `go tool pprof http://host/debug/pprof/mutex`. Identifies contended locks.
  • Goroutine profile: stack distribution. Sudden growth = leak.
  • Execution trace: `go tool trace`. The most expensive but most informative tool. Identifies scheduler latencies, GC pauses, and goroutine-state transitions.
  • PGO (Profile-Guided Optimization): stable since Go 1.21. Capture a representative CPU profile in production, place it as `default.pgo` in the main package, rebuild. Typically a 2–7% CPU win on hot paths, per the Go release notes.
  • benchstat: compare two `go test -bench` runs statistically. Reports geomean and significance.

23.3 Mechanical Detail: Common Wins

  • Replace `interface{}` boxing in hot paths with concrete types or generics.
  • Reuse allocations via `sync.Pool` (with the discipline from Week 8).
  • Pre-size slices and maps when capacity is known: `make([]T, 0, n)`, `make(map[K]V, n)`.
  • Avoid `defer` in hot loops (Go 1.14+ made `defer` near-zero-cost in most cases, but the in-loop variant still has overhead).
  • `strings.Builder` over `+=` for building strings.
  • Slice instead of map for small collections (roughly under 50 entries): a linear scan beats hashing on modern caches.
  • Goroutine cost: launching is cheap, but the aggregate of millions of goroutines on a long-tail-blocked path is not. Bound concurrency.

23.4 Lab: "Profile-Tune-Profile"

Take your capstone (whatever track) and:

  1. Capture a CPU profile under representative load. Identify the top 5 functions.
  2. Pick one and propose a fix. Estimate the win in advance.
  3. Implement, re-profile, and compare with benchstat. Document each change in PERF_LOG.md.
  4. Capture a runtime/trace and identify any GC or scheduler stalls. Fix one.
  5. Apply PGO. Confirm the win.

23.5 Idiomatic & golangci-lint Drill

  • `prealloc`; `gosimple` S1024 (`time.Until(x)` instead of `x.Sub(time.Now())`); `gocritic` `rangeValCopy`. Final lint pass: your codebase should be near zero findings.

23.6 Production Hardening Slice

  • Wire PGO into your release pipeline: a "canary" deploy collects a profile, the next "stable" build uses it. Document the procedure in RELEASE.md.
