Week 2 - The GMP Scheduler Model¶
2.1 Conceptual Core¶
- G = Goroutine. A user-space concurrent execution context with a stack, program counter, and runtime metadata. Cheap (~2 KB initial stack), millions per process.
- M = Machine. An OS thread (`pthread` on POSIX). Has a kernel-managed stack. Goroutines run on M's.
- P = Processor. A logical execution context that holds a local run queue of runnable G's plus a few caches (per-P mcache for the allocator). `GOMAXPROCS` sets the number of P's; default = number of CPU cores.
- The invariant: an M needs a P to run Go code (specifically, to call into the scheduler). When an M makes a blocking syscall, it hands off its P to another M so other goroutines can run. This is the magic that makes blocking syscalls cheap in Go.
2.2 Mechanical Detail¶
Read these source files in this order:
1. `src/runtime/runtime2.go` - the data structures: `g`, `m`, `p`, `schedt`. Keep this open as a reference.
2. `src/runtime/proc.go::schedule` - the heart of the scheduler: pick a runnable G and switch to it.
3. `src/runtime/proc.go::findrunnable` - the search order: local runq → global runq → netpoll → work-stealing from peer P's.
4. `src/runtime/proc.go::newproc` - what happens at `go func()`. Particularly note `runqput` and the work-stealing-friendly slot ordering.
Key concepts:
- Local run queue: each P has a 256-slot ring buffer of runnable G's. Push to tail, pop from head; work-stealers take from peer P's heads.
- Global run queue: a doubly-linked list under sched.lock. Used as overflow for local queues and for goroutines woken from netpoll.
- Work stealing: when a P's local queue is empty, it picks a victim P at random and steals half its queue. This is what amortizes load across cores.
- runtime.LockOSThread(): pin the calling goroutine to its current M. Necessary for cgo calls to OS APIs that require thread affinity (most GUI toolkits, OpenGL, some signal handlers).
- runtime.Gosched(): cooperative yield. The goroutine is moved back to the global queue.
- Asynchronous preemption (since Go 1.14): tight CPU loops without function-call boundaries used to monopolize a P; now the runtime sends SIGURG to the M to force a safe-point. Read runtime/preempt.go.
- netpoller: integrates with epoll/kqueue/IOCP. When a goroutine blocks on a network read, it parks and the M can run other goroutines. The goroutine is unparked when the FD is ready.
2.3 Lab: "Schedule Forensics"¶
Build a tiny program that:
1. Spawns 1,000 goroutines, each computing a busy CPU loop for 10ms.
2. Records the time-to-completion distribution.
3. Re-runs with GOMAXPROCS=1, =2, =N (your core count).
4. Re-runs with runtime.Gosched() inserted in the loop.
5. Re-runs with the loop replaced by time.Sleep(10*time.Millisecond) (the netpoller path).
Tabulate the latency distributions in NOTES.md. Explain why GOMAXPROCS=1 without Gosched() produces high tail latency. Then capture an execution trace with `runtime/trace` and open it with `go tool trace`. Identify the per-P timeline, GC pauses, and proc transitions.
2.4 Idiomatic & golangci-lint Drill¶
`staticcheck SA1019` (deprecated APIs), `staticcheck SA5008` (forgotten `defer` vs loop variables), `revive: confusing-naming`. Less about scheduler correctness here, more about hygiene.
2.5 Production Hardening Slice¶
- Wire `runtime/trace` to a `/debug/trace` HTTP handler (gated by build tag `debug`). Add pprof handlers (`net/http/pprof` import for side effect). Document how to capture a 10-second trace from a running process.