Week 2 - The GMP Scheduler Model¶
2.1 Conceptual Core¶
- G = Goroutine. A user-space concurrent execution context with a stack, program counter, and runtime metadata. Cheap (~2 KB initial stack), millions per process.
- M = Machine. An OS thread (`pthread` on POSIX). Has a kernel-managed stack. Goroutines run on M's.
- P = Processor. A logical execution context that holds a local run queue of runnable G's plus a few caches (per-P mcache for the allocator). `GOMAXPROCS` sets the number of P's; default = number of CPU cores.
- The invariant: an M needs a P to run Go code (specifically, to call into the scheduler). When an M makes a blocking syscall, it hands off its P to another M so other goroutines can run. This is the magic that makes blocking syscalls cheap in Go.
2.2 Mechanical Detail¶
Read these source files in this order:
1. `src/runtime/runtime2.go` - the data structures: `g`, `m`, `p`, `schedt`. Keep this open as a reference.
2. `src/runtime/proc.go::schedule` - the heart of the scheduler: pick a runnable G and switch to it.
3. `src/runtime/proc.go::findrunnable` - the search order: local runq → global runq → netpoll → work-stealing from peer P's.
4. `src/runtime/proc.go::newproc` - what happens at `go func()`. Particularly note `runqput` and the work-stealing-friendly slot ordering.
Key concepts:
- Local run queue: each P has a 256-slot ring buffer of runnable G's. Push to tail, pop from head; work-stealers take from peer P's heads.
- Global run queue: a doubly-linked list under sched.lock. Used as overflow for local queues and for goroutines woken from netpoll.
- Work stealing: when a P's local queue is empty, it picks a victim P at random and steals half its queue. This is what amortizes load across cores.
- runtime.LockOSThread(): pin the calling goroutine to its current M. Necessary for cgo calls to OS APIs that require thread affinity (most GUI toolkits, OpenGL, some signal handlers).
- runtime.Gosched(): cooperative yield. The goroutine is moved back to the global queue.
- Asynchronous preemption (since Go 1.14): tight CPU loops without function-call boundaries used to monopolize a P; now the runtime sends SIGURG to the M to force a safe-point. Read runtime/preempt.go.
- netpoller: integrates with epoll/kqueue/IOCP. When a goroutine blocks on a network read, it parks and the M can run other goroutines. The goroutine is unparked when the FD is ready.
2.3 Lab: "Schedule Forensics"¶
Build a tiny program that:
1. Spawns 1,000 goroutines, each computing a busy CPU loop for 10ms.
2. Records the time-to-completion distribution.
3. Re-runs with GOMAXPROCS=1, =2, =N (your core count).
4. Re-runs with runtime.Gosched() inserted in the loop.
5. Re-runs with the loop replaced by time.Sleep(10*time.Millisecond) (the netpoller path).
Tabulate the latency distributions in NOTES.md. Explain why GOMAXPROCS=1 without Gosched() produces high tail latency. Then capture an execution trace with `runtime/trace` and open it with `go tool trace`. Identify the per-P timeline, GC pauses, and proc transitions.
2.4 Idiomatic & golangci-lint Drill¶
`staticcheck SA1019` (deprecated APIs), `staticcheck SA5008` (forgotten `defer` vs loop variables), `revive: confusing-naming`. Less about scheduler correctness here, more about hygiene.
2.5 Production Hardening Slice¶
- Wire `runtime/trace` to a `/debug/trace` HTTP handler (gated by build tag `debug`). Add pprof handlers (`net/http/pprof` import for side effect). Document how to capture a 10-second trace from a running process.