
Worked example - Week 6: reading a GODEBUG=gctrace=1 output

Companion to Go Mastery → Month 02 → Week 6: The Garbage Collector. The week explains the tricolor concurrent mark-sweep algorithm. This page walks through one real gctrace=1 line from a running program, so that the next time you see one in production logs, every field has meaning.

The program

// allocator.go
package main

import (
    "fmt"
    "time"
)

func main() {
    var sink []*[1024]byte
    for i := 0; i < 1_000_000; i++ {
        b := new([1024]byte)
        b[0] = byte(i)
        sink = append(sink, b)
        if i%50_000 == 0 {
            time.Sleep(10 * time.Millisecond) // give GC room to breathe
        }
    }
    fmt.Println("allocated", len(sink), "buffers")
}

A small, deliberately-allocating program. Each iteration allocates a 1 KB array and keeps a pointer to it. We expect the heap to grow steadily and GC cycles to happen periodically.

Running it with gctrace

$ GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
gc 1 @0.003s 0%: 0.012+0.31+0.020 ms clock, 0.10+0.054/0.30/0.072+0.16 ms cpu, 4->4->2 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 2 @0.012s 0%: 0.011+0.46+0.030 ms clock, 0.094+0.21/0.45/0.10+0.24 ms cpu, 4->5->3 MB, 5 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 3 @0.024s 0%: 0.014+0.66+0.027 ms clock, 0.11+0.23/0.66/0.18+0.22 ms cpu, 6->7->5 MB, 7 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 5 @0.094s 1%: 0.014+2.4+0.030 ms clock, 0.11+0.49/2.4/0.42+0.24 ms cpu, 18->20->15 MB, 19 MB goal, 0 MB stacks, 0 MB globals, 8 P
gc 6 @0.187s 1%: 0.013+3.6+0.025 ms clock, 0.10+0.84/3.6/0.93+0.20 ms cpu, 30->33->24 MB, 31 MB goal, 0 MB stacks, 0 MB globals, 8 P

Take one line - gc 4 - and decode every field.

The fields, in order

gc 4 @0.046s 1%: 0.015+1.4+0.029 ms clock, 0.12+0.46/1.4/0.18+0.23 ms cpu, 10->12->9 MB, 11 MB goal, 0 MB stacks, 0 MB globals, 8 P
  • gc 4 - the 4th GC cycle since program start.
  • @0.046s - this cycle began 46 ms after program start.
  • 1% - the fraction of the program's available CPU time spent in GC since program start. (Cumulative across all GC cycles so far, not per-cycle.)
  • 0.015+1.4+0.029 ms clock - wall-clock duration of the three GC phases:
    1. 0.015 ms - stop-the-world sweep termination. The runtime pauses all goroutines briefly to finish any leftover sweeping from the previous cycle.
    2. 1.4 ms - concurrent mark + scan. This is the bulk of the work. Goroutines keep running while the GC marks reachable objects.
    3. 0.029 ms - stop-the-world mark termination. A second brief pause to finalize the mark.

The two pauses (0.015 + 0.029 = ~44 µs total) are what your latency-sensitive code feels. The 1.4 ms middle is concurrent and doesn't block.

  • 0.12+0.46/1.4/0.18+0.23 ms cpu - total CPU time across all P (processors). The layout mirrors wall-clock but split by phase, with the middle phase broken into three (assist+background+idle CPU time).

  • 10->12->9 MB - heap size:

    1. 10 MB - heap size when the cycle started.
    2. 12 MB - heap size when the cycle ended (goroutines kept allocating during the concurrent mark).
    3. 9 MB - live heap: what marking found reachable.

So this cycle reclaimed ~3 MB.

  • 11 MB goal - the target heap size for this cycle. GOGC=100 means "let the heap grow ~100% beyond the live set," so the goal is roughly the previous cycle's live heap × (1 + 100/100): gc 3 ended with 5 MB live, and 5 MB × 2 ≈ 10 MB, plus a little for stacks and globals, gives the 11 MB shown. This cycle's 9 MB live set puts the next goal near 18 MB - and indeed gc 5 reports a 19 MB goal.

  • 0 MB stacks - total goroutine stack memory. We have only one goroutine here.

  • 0 MB globals - package-level data. Our program has almost none.

  • 8 P - 8 processors (P) participating. Matches GOMAXPROCS=8.

What changed across cycles

Watch the 4->5->3, 6->7->5, 10->12->9, 18->20->15, 30->33->24 progression. The live heap (third number) grows as our sink slice retains more pointers. The starting heap (first number) tracks it: each cycle begins at roughly 2× the previous cycle's live size, because GOGC=100 means 100% headroom over the live set.

The concurrent mark phase grew too: 0.31 → 0.46 → 0.66 → 1.4 → 2.4 → 3.6 ms. That's expected; marking takes time proportional to the live object graph. The stop-the-world phases stayed flat (~15 µs + ~30 µs) regardless of heap size - that's the whole point of Go's concurrent GC.

What this tells you

  • Go's GC pauses are measured in tens of microseconds. The "GC pause" most people complain about in older runtimes does not apply here.
  • "GC took 3.6 ms" sounds bad, but ~99% of that was concurrent. Your goroutines kept running.
  • Heap headroom is the lever. GOGC=200 doubles the headroom (less frequent, bigger cycles); GOGC=50 halves it (more frequent, smaller). GOMEMLIMIT (Go 1.19+) caps absolute heap regardless of GOGC.

Try tuning

Re-run with different settings:

$ GOGC=50 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# More GC cycles, smaller peaks, slightly more CPU on GC.

$ GOGC=200 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -10
# Fewer GC cycles, larger peaks, lower GC CPU but more memory.

$ GOMEMLIMIT=20MiB GOGC=100 GODEBUG=gctrace=1 go run allocator.go 2>&1 | head -20
# Soft cap: when heap approaches 20 MiB, GC runs more aggressively
# to stay under the limit even if GOGC would let it grow.

The trap

Reading gctrace and concluding "we should tune GOGC in production." Usually, no. In most cases the right answer is:

  1. Use the default GOGC=100.
  2. Set GOMEMLIMIT to ~80% of your container's memory limit so the GC starts pushing back before an OOM kill.
  3. Use runtime/pprof heap profiles to find allocation hotspots and fix the code, not the GC.

Tuning GOGC is a last resort and almost always trades latency for throughput (or vice versa) - not a free win.

Exercise

  1. Run the program above. Identify the GC cycle in which the heap first crossed 100 MB.
  2. Add a sync.Pool for the byte arrays. Re-run with the same GOGC=100. How many GC cycles happen now? How does the heap profile change?
  3. Instrument the program with runtime/trace (trace.Start writing to a file, trace.Stop before exit), then open the file with go tool trace. Find a GC cycle. See the assists, the background workers, the STW phases.
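
For exercise 2, one possible starting point. Note this is a sketch, and pooling changes the program's behavior: a sync.Pool only pays off if buffers are returned instead of being retained in sink forever:

```go
// pooled.go - a sketch for exercise 2: reusing buffers via sync.Pool.
package main

import (
	"fmt"
	"sync"
)

var bufPool = sync.Pool{
	// Called when the pool is empty; otherwise Get returns a reused buffer.
	New: func() any { return new([1024]byte) },
}

func main() {
	for i := 0; i < 1_000_000; i++ {
		b := bufPool.Get().(*[1024]byte)
		b[0] = byte(i) // use the buffer
		bufPool.Put(b) // return it instead of retaining it
	}
	fmt.Println("done")
}
```

Because nothing retains the buffers, the live heap stays tiny and GC cycles become rare; compare the gctrace output against the original run.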
