
Week 12 - Worker Pools, Leak Detection, Deadlock Prevention

12.1 Conceptual Core

  • Worker pool is the canonical "bounded concurrency" pattern: N worker goroutines consuming from a shared task channel. It bounds CPU, memory, and downstream RPC concurrency simultaneously.
  • Goroutine leaks are Go's silent OOM. The most common shapes:
      • Goroutine blocked on a channel that is never closed and never sent to.
      • Goroutine blocked on <-ctx.Done() of a context that nobody cancels.
      • Goroutine holding a reference (closure capture) to a request object that is now done.
      • time.After in a select loop (allocates a timer per iteration; the timer leaks until expiry).
  • Deadlocks in Go are detected only by the runtime's "all goroutines asleep" check, which fires only when every goroutine is blocked. Most production deadlocks are partial: a subsystem deadlocks while the rest of the program runs. The race detector does not catch these.

12.2 Mechanical Detail

  • The canonical worker pool (Result is the pool's output envelope):
    type Result[R any] struct {
        Val R
        Err error
    }

    func RunPool[T, R any](ctx context.Context, n int, in <-chan T, fn func(context.Context, T) (R, error)) <-chan Result[R] {
        out := make(chan Result[R])
        var wg sync.WaitGroup
        wg.Add(n)
        for i := 0; i < n; i++ {
            go func() {
                defer wg.Done() // runs even if the worker exits early
                for {
                    select {
                    case <-ctx.Done():
                        return
                    case task, ok := <-in:
                        if !ok {
                            return // input closed: worker exits cleanly
                        }
                        r, err := fn(ctx, task)
                        select {
                        case out <- Result[R]{r, err}:
                        case <-ctx.Done():
                            return // nobody is reading out; don't block forever
                        }
                    }
                }
            }()
        }
        go func() { wg.Wait(); close(out) }()
        return out
    }

    Every line above is load-bearing: the double select on input and output, the wg.Done in a defer, and the closer goroutine that runs after wg.Wait.
  • Leak detection tooling:
      • goleak for tests.
      • pprof goroutine profiles for production: curl /debug/pprof/goroutine?debug=2 dumps every goroutine's stack. Read it.
      • runtime.NumGoroutine() exported as a metric. A monotonically growing count is the leak signal.
  • Deadlock detection:
      • go-deadlock (sasha-s/go-deadlock) wraps sync.Mutex with timing-based deadlock detection in dev builds.
      • For partial deadlocks: instrumentation on the lock acquisition path (lock contention metrics from runtime/metrics).
  • Backpressure: when the worker pool is saturated, what should the caller see? Three strategies: block (default), drop (with a metric), reject (return an error). The choice is application-dependent; document it.
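The drop and reject strategies both reduce to a non-blocking send on the bounded input channel. A minimal sketch (submitDrop, submitReject, and the dropped counter are hypothetical names; in production the counter would be a real metrics counter):

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// dropped stands in for a Prometheus/metrics counter.
var dropped atomic.Int64

// submitDrop: non-blocking send; on a full queue, count the drop
// and tell the caller the task was shed.
func submitDrop(q chan<- int, task int) bool {
	select {
	case q <- task:
		return true
	default:
		dropped.Add(1)
		return false
	}
}

var ErrSaturated = errors.New("pool saturated")

// submitReject: same shape, but surfaces the overflow as an error
// so the caller can back off or fail the request.
func submitReject(q chan<- int, task int) error {
	select {
	case q <- task:
		return nil
	default:
		return ErrSaturated
	}
}

func main() {
	q := make(chan int, 2) // bounded input channel
	for i := 0; i < 4; i++ {
		submitDrop(q, i)
	}
	fmt.Println(len(q), dropped.Load()) // 2 2
}
```

The blocking strategy is simply the plain `q <- task` send; the select/default turns the channel's capacity into an admission decision.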

12.3 Lab: "Worker Pool Survival Test"

Build a worker pool that handles:

  1. Backpressure: bounded input channel, drop-with-metric on overflow.
  2. Graceful shutdown: on ctx.Done(), drain in-flight tasks within a deadline, then abandon the rest.
  3. Per-task timeouts: context.WithTimeout(ctx, 100*time.Millisecond) per task.
  4. Panic isolation: a panic in one task does not kill the worker; recover and report.
  5. Leak-clean: goleak passes after cancel(); pool.Wait().

Stress-test with 1M tasks across 1000 workers under `-race`.
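Requirement 4, panic isolation, usually reduces to a recover inside a per-task helper so the worker's loop survives. A stdlib-only sketch, with runTask as a hypothetical name:

```go
package main

import (
	"fmt"
)

// runTask isolates a single task: a panic is converted into an
// error, so the worker goroutine that called it keeps running.
// Named return is required so the deferred func can set err.
func runTask(fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("task panicked: %v", r)
		}
	}()
	return fn()
}

func main() {
	tasks := []func() error{
		func() error { return nil },
		func() error { panic("boom") },
		func() error { return nil },
	}
	done := 0
	for _, t := range tasks { // a real worker would range over a channel
		if err := runTask(t); err != nil {
			fmt.Println("recovered:", err)
			continue
		}
		done++
	}
	fmt.Println("completed:", done) // completed: 2
}
```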

12.4 Idiomatic & golangci-lint Drill

  • bodyclose (leaked HTTP response bodies), rowserrcheck (sql.Rows.Err unchecked), sqlclosecheck. All three are leak-class lints; enable all three in your golangci-lint configuration.

12.5 Production Hardening Slice

  • Add a /debug/pprof/goroutine periodic snapshot job to your service template: every 5 minutes, capture the goroutine count and the top-N stacks. Surface as a Prometheus gauge with stack-hash labels (low cardinality). On a leak, you will see which stack is growing without paging anyone.

Month 3 Capstone Deliverable

A concurrency-lab/ workspace:

  1. chan-bench (week 9): channel vs mutex vs atomic ring, with a markdown writeup.
  2. spsc-ring (week 10): atomic-only, race-clean, with a cache-pad ablation.
  3. context-discipline (week 11): a refactored HTTP service plus a singleflight cache demo.
  4. survival-pool (week 12): the worker pool that survives the five failure modes.

CI gate additions: `-race` on every test, `-race -count=100` on critical packages, a goleak baseline, and a 0-alloc regression guard on the SPSC ring's hot path. Open one upstream PR, even a doc fix to errgroup or singleflight, by month end.
