
Worked example - Week 6: reading a GODEBUG=gctrace=1 output

Companion to Go Mastery → Month 02 → Week 6: The Garbage Collector. The week explains the tricolor concurrent mark-sweep algorithm. This page walks through one real gctrace=1 line from a running program, so that the next time you see one in production logs, every field has meaning.

The program

// allocator.go
package main

import (
    "fmt"
    "time"
)

func main() {
    var sink []*[1024]byte
    for i := 0; i < 1_000_000; i++ {
        b := new([1024]byte)
        b[0] = byte(i)
        sink = append(sink, b)
        if i%50_000 == 0 {
            time.Sleep(10 * time.Millisecond) // give GC room to breathe
        }
    }
    fmt.Println("allocated", len(sink), "buffers")
}

A small, deliberately-allocating program. Each iteration allocates a 1 KB array and keeps a pointer to it. We expect the heap to grow steadily and GC cycles to happen periodically.

Running it with gctrace

$ GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
gc 1 @0.003s 0%: 0.012+0.31+0.020 ms clock, 0.10+0.054/0.30/0.072+0.16 ms cpu, 4->4->2 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 2 @0.012s 0%: 0.011+0.46+0.030 ms clock, 0.094+0.21/0.45/0.10+0.24 ms cpu, 4->5->3 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 3 @0.024s 0%: 0.014+0.66+0.027 ms clock, 0.11+0.23/0.66/0.18+0.22 ms cpu, 6->7->5 MB, 7 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 5 @0.094s 1%: 0.014+2.4+0.030 ms clock, 0.11+0.49/2.4/0.42+0.24 ms cpu, 18->20->15 MB, 19 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 6 @0.187s 1%: 0.013+3.6+0.025 ms clock, 0.10+0.84/3.6/0.93+0.20 ms cpu, 30->33->24 MB, 31 MB goal, 0 MB stacks, 0 MB globals, 8 P

Take one line - gc 4 - and decode every field.

The fields, in order

gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
  • gc 4 - the 4th GC cycle since program start.
  • @0.046s - this cycle began 46 ms after program start.
  • 1% - the fraction of the program's available CPU time spent in GC since program start. (Cumulative across all GC cycles so far, not per-cycle.)
  • 0.015+1.4+0.029 ms clock - wall-clock duration of the three GC phases:
    1. 0.015 ms - stop-the-world sweep termination. The runtime pauses all goroutines briefly to finish any leftover sweeping from the previous cycle.
    2. 1.4 ms - concurrent mark + scan. This is the bulk of the work. Goroutines keep running while the GC marks reachable objects.
    3. 0.029 ms - stop-the-world mark termination. A second brief pause to finalize the mark.

The two pauses (0.015 + 0.029 = ~44 µs total) are what your latency-sensitive code feels. The 1.4 ms middle is concurrent and doesn't block.

  • 0.12+0.46/1.4/0.18+0.23 ms cpu - total CPU time across all P (processors). The layout mirrors wall-clock but split by phase, with the middle phase broken into three (assist+background+idle CPU time).

  • 10->12->9 MB - heap size:

    1. 10 MB - heap size when the cycle started.
    2. 12 MB - heap size when the cycle ended (goroutines kept allocating during the concurrent mark).
    3. 9 MB - live heap: what marking found reachable.

So this cycle reclaimed ~3 MB.

  • 11 MB goal - the target heap size for this cycle. GOGC=100 means "let the heap grow ~100% beyond the live set," so the goal is roughly the previous cycle's live heap × (1 + 100/100): gc 3 ended with 5 MB live, and 5 MB × 2 ≈ 10 MB, plus a little for stacks and globals, gives the 11 MB shown. This cycle's 9 MB live set puts the next goal near 18 MB - and indeed gc 5 reports a 19 MB goal.

  • 0 MB stacks - total goroutine stack memory. We have only one goroutine here.

  • 0 MB globals - package-level data. Our program has almost none.

  • 8 P - 8 processors (P) participating. Matches GOMAXPROCS=8.

What changed across cycles

Watch the 4->5->3, 6->7->5, 10->12->9, 18->20->15, 30->33->24 progression. The live heap (third number) grows as our sink slice retains more pointers. The starting heap (first number) tracks it: each cycle begins at roughly 2× the previous cycle's live size, because GOGC=100 means 100% headroom over the live set.

The concurrent mark phase grew too: 0.31 → 0.46 → 0.66 → 1.4 → 2.4 → 3.6 ms. That's expected; marking takes time proportional to the live object graph. The stop-the-world phases stayed flat (~15 µs + ~30 µs) regardless of heap size - that's the whole point of Go's concurrent GC.

What this tells you

  • Go's GC pauses are measured in tens of microseconds. The "GC pause" most people complain about in older runtimes does not apply here.
  • "GC took 3.6 ms" sounds bad, but ~99% of that was concurrent. Your goroutines kept running.
  • Heap headroom is the lever. GOGC=200 doubles the headroom (less frequent, bigger cycles); GOGC=50 halves it (more frequent, smaller). GOMEMLIMIT (Go 1.19+) caps absolute heap regardless of GOGC.

Try tuning

Re-run with different settings:

$ GOGC=50 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# More GC cycles, smaller peaks, slightly more CPU on GC.

$ GOGC=200 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# Fewer GC cycles, larger peaks, lower GC CPU but more memory.

$ GOMEMLIMIT=20MiB GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
# Soft cap: when heap approaches 20 MiB, GC runs more aggressively
# to stay under the limit even if GOGC would let it grow.

The trap

Reading gctrace and concluding "we should tune GOGC in production." Usually, no. In most cases the right answer is:

  1. Use the default GOGC=100.
  2. Set GOMEMLIMIT to ~80% of your container's memory limit so the GC starts pushing back before an OOM kill.
  3. Use runtime/pprof heap profiles to find allocation hotspots and fix the code, not the GC.

Tuning GOGC is a last resort and almost always trades latency for throughput (or vice versa) - not a free win.

Exercise

  1. Run the program above. Identify the GC cycle in which the heap first crossed 100 MB.
  2. Add a sync.Pool for the byte arrays. Re-run with the same GOGC=100. How many GC cycles happen now? How does the heap profile change?
  3. Instrument the program with runtime/trace (trace.Start writing to a file, trace.Stop before exit), then open the file with go tool trace. Find a GC cycle. See the assists, the background workers, the STW phases.
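
For exercise 2, one possible starting point. Note this is a sketch, and pooling changes the program's behavior: a sync.Pool only pays off if buffers are returned instead of being retained in sink forever:

```go
// pooled.go - a sketch for exercise 2: reusing buffers via sync.Pool.
package main

import (
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	// Called when the pool is empty; otherwise Get returns a reused buffer.
	New: func() any { return new([1024]byte) },
}

func main() {
	for i := 0; i < 1_000_000; i++ {
		b := bufPool.Get().(*[1024]byte)
		b[0] = byte(i) // use the buffer
		bufPool.Put(b) // return it instead of retaining it
	}
	fmt.Println("done")
}
```

Because nothing retains the buffers, the live heap stays tiny and GC cycles become rare; compare the gctrace output against the original run.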
