Garbage collection

Why it matters

Every memory-safe runtime makes the same trade and resolves it differently. Tracing vs. reference counting. Generational vs. flat. Stop-the-world vs. concurrent. Compacting vs. non-compacting. Region-based vs. free-list. A working systems engineer can name where any production GC lands on each axis - and predict the failure mode that comes from that choice.

The four paths below show four resolutions of the same trade. Read at least two; the contrast is where the learning lives.


The lens, per path

Java - the most engineered GC ecosystem in existence

Month 3 - Memory & GC. Four weeks on object layout (incl. JEP 450 compact headers), the generational hypothesis, the GC family tree (Serial → Parallel → CMS (removed in JDK 14) → G1 → ZGC → Shenandoah → Generational ZGC), container-aware heap sizing, and JFR-driven tuning.

What's unique here: the menu. Java is the only mainstream platform where picking a GC is a real design decision - Generational ZGC for sub-ms pauses on multi-GB heaps, G1 for general-purpose throughput, Parallel for batch, Serial for tiny containers. Every other path has one collector and lives with it.

The trap

assuming -Xmx is the memory budget. It is not. Direct memory, metaspace, code cache, GC overhead, and thread stacks all live outside the heap. Week 11 walks through the full accounting.

Go - the world's least-configurable GC, on purpose

Month 2 - Memory & GC. Four weeks on the heap/stack split via escape analysis, the tricolor concurrent mark-sweep collector, write barriers, GOGC and GOMEMLIMIT (since 1.19), and the deliberate non-generational design.

What's unique here: Go's GC has two knobs (GOGC, GOMEMLIMIT) and refuses to add a third. The design ethic is "tune the allocation rate, not the collector." Most Go GC bugs are allocation bugs - escape-analysis failures, pointer-rich data structures, accidental interface boxing.

The trap

assuming "Go has no generational GC because generational GC isn't worth it." The actual reason is that Go's compiler aggressively stack-allocates short-lived values, so the youngest generation effectively is the stack. The hypothesis still holds; the implementation just lives in a different place.

Python - refcount first, generational second

Month 3 - Runtime & Performance. CPython is reference-counted with a generational cycle collector on top. Every object has a refcount field; when it hits zero, deallocation is immediate. The cycle collector exists only to break unreachable reference cycles that refcounting alone leaks.

What's unique here: GC pauses are predictable (refcount drops are deterministic) until they aren't (a large object with many transitive references triggers a cascade on one decref). The "free-threaded" CPython work (PEP 703, going stable in 3.14) changes the cost model again - atomic refcount ops dominate cache traffic on multi-core.

The trap

assuming Python "doesn't have GC pauses." It does - they're just usually small and frequent rather than large and rare.
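The refcount-first, cycles-second split above is directly observable from the standard library's gc and weakref modules. A minimal demo (class name Node is just illustrative):

```python
import gc
import weakref

gc.disable()  # keep the cycle collector out of the way so only refcounting acts

class Node:
    pass

# 1. Refcount hits zero -> deallocation is immediate, no collector run needed.
a = Node()
a_ref = weakref.ref(a)
del a
freed_immediately = a_ref() is None      # True: gone the moment the last ref died

# 2. A reference cycle never hits refcount zero; only the cycle collector frees it.
b = Node()
b.self_ref = b                           # cycle: b references itself
b_ref = weakref.ref(b)
del b
leaked_after_del = b_ref() is not None   # True: refcounting alone leaks the cycle
gc.collect()                             # cycle collector breaks the cycle
freed_after_collect = b_ref() is None    # True

gc.enable()
print(freed_immediately, leaked_after_del, freed_after_collect)  # True True True
```

The first case is the "predictable" pause profile; the second is exactly the leak the generational cycle collector exists to catch.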

Linux kernel - manual allocators, not GC, but the same problems

Month 2 - Memory & Scheduling. No tracing GC, but the same fragmentation, locality, and pause problems show up in the buddy allocator (page-level) and slab/slub/slob allocators (object-level). Plus reverse-mapping, RCU reclamation, and the page cache lifecycle.

What's unique here: kernel allocation cannot fail-and-retry. Every allocator path has a fallback strategy (GFP flags), and the wrong choice deadlocks the kernel. RCU is garbage collection in disguise - readers are wait-free, writers defer reclamation until all readers exit a grace period.
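The defer-until-grace-period idea is easier to see stripped of kernel machinery. Below is a single-threaded toy model (no real concurrency, no memory barriers; the class and method names mirror the kernel's read_lock / call_rcu vocabulary but are invented for this sketch): a writer's reclamation callback runs only once every reader that was active at publication time has exited its critical section.

```python
class ToyRCU:
    """Toy grace-period model: frees are deferred until all readers that were
    active when the free was requested have finished (a 'grace period')."""

    def __init__(self):
        self.active_readers = set()
        self.deferred = []        # (readers we are waiting on, callback)
        self._next_reader_id = 0

    def read_lock(self):
        rid = self._next_reader_id
        self._next_reader_id += 1
        self.active_readers.add(rid)
        return rid

    def read_unlock(self, rid):
        self.active_readers.discard(rid)
        self._run_ready()

    def call_rcu(self, callback):
        # Reclaim only after every reader currently in a critical section exits.
        self.deferred.append((set(self.active_readers), callback))
        self._run_ready()

    def _run_ready(self):
        still_waiting = []
        for waiting_on, cb in self.deferred:
            remaining = waiting_on & self.active_readers
            if remaining:
                still_waiting.append((remaining, cb))
            else:
                cb()  # grace period elapsed: safe to reclaim
        self.deferred = still_waiting

freed = []
rcu = ToyRCU()
r = rcu.read_lock()
rcu.call_rcu(lambda: freed.append("old-node"))
before_unlock = list(freed)   # [] : reader r is still inside its critical section
rcu.read_unlock(r)
print(before_unlock, freed)   # [] ['old-node']
```

This is the "GC in disguise" claim in miniature: reachability is approximated by "no reader that could still see the old version is running," rather than by tracing.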

The trap

thinking "this isn't GC." RCU absolutely is GC, just with a different reclamation trigger.


The contrasts that teach

| Axis | Java | Go | Python | Linux |
|---|---|---|---|---|
| Mechanism | Tracing | Tracing | Refcount + tracing for cycles | Manual + RCU |
| Generational | Yes (G1, Gen ZGC) | No (stack-allocates instead) | Yes (3 generations) | N/A |
| Concurrent | Yes (G1 mostly, ZGC fully) | Yes (tricolor) | No (refcount is sync) | Yes (RCU) |
| Compacting | Yes (G1, ZGC) | No | No | N/A |
| Pause profile | Sub-ms (ZGC) to 100ms (G1) | <1ms typical | Many small | None (no STW) |
| Tuning surface | Dozens of -XX: flags | Two knobs | gc thresholds only | GFP flags, drop_caches |
| Failure mode | Long pauses on bad sizing | Allocation-rate spikes | Cycle leaks, refcount thrash | OOM-killer, fragmentation |

The single most clarifying read across these: Java's Generational ZGC + Go's tricolor concurrent collector side-by-side. Same problem (concurrent reclamation without stopping mutators), two completely different solutions (colored pointers + load barriers vs. write barriers + assist credit).
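To make the tricolor vocabulary concrete before that read: here is the core marking loop as a stop-the-world toy in Python (white = unvisited, gray = queued, black = scanned; the graph and names are illustrative). The concurrent designs in both ZGC and Go are this exact loop plus barriers to keep it correct while mutators run.

```python
def tricolor_mark(roots, edges):
    """Return the set of reachable ('black') objects.
    edges: dict mapping each object to the objects it references."""
    gray = list(roots)   # worklist: discovered but not yet scanned
    black = set()        # scanned: all outgoing references queued
    while gray:
        node = gray.pop()
        if node in black:
            continue
        black.add(node)
        for child in edges.get(node, []):
            if child not in black:
                gray.append(child)  # shade reachable children gray
    return black  # anything never shaded stays white -> garbage

# 'D' is a self-referential cycle with no path from the root.
heap = {"A": ["B"], "B": ["C"], "C": [], "D": ["D"]}
live = tricolor_mark({"A"}, heap)
print(sorted(live))       # ['A', 'B', 'C']
print(set(heap) - live)   # {'D'}
```

The divergence the side-by-side read exposes: Go protects this invariant with write barriers on pointer stores; ZGC protects it with colored pointers checked by load barriers. Same loop, opposite interception point.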


What to read first

Pick by your current work:

  • You debug GC pauses in production today → Java Month 3 weeks 10–12. Most directly applicable; the JFR workflow transfers to any JVM-shaped problem.
  • You write services that need to scale memory cheaply → Go Month 2 plus Java week 9 (object layout). The contrast is the lesson.
  • You write Python at scale → Python Month 3, then come back to Java's Month 3 for the generational-hypothesis derivation. Python's gen collector is the same idea, smaller and weirder.
  • You hack the kernel → Linux Month 2, then read Go's tricolor section. RCU and tricolor solve adjacent problems.