Concurrency

Why it matters

Five languages, five fundamentally different answers to the same question: how do you make a program do more than one thing at once without corrupting state or wasting cores? The answers - channels, locks + virtual threads, ownership-as-type, the GIL + asyncio, futex-backed primitives - encode different bets about which kinds of programs matter most.

Read at least two paths' concurrency chapters. The contrast is where your mental model upgrades.


The lens, per path

Go - concurrency as a language feature

Month 3 - Concurrency Mastery. The GMP scheduler (G: goroutines, M: OS threads, P: processors), channel internals (chan.go), select, the context package, the Go memory model. Goroutines are cheap (~2KB initial stack), preemptible (asynchronous preemption since 1.14), and the default unit of concurrency.

What's unique here: "Don't communicate by sharing memory; share memory by communicating." Channels-first is doctrine. Mutexes exist (sync.Mutex, sync.RWMutex) but reading a Go codebase that uses them heavily is a sign of fighting the language.

The trap

leaking goroutines. Every go func() { ... }() is a future debugging session if you don't have a cancellation path through context.
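
A sketch of the cancellation path, assuming a producer that would otherwise block on a send forever once its consumer stops reading:

```go
// Every long-lived goroutine gets a way out: select on ctx.Done()
// next to the blocking operation. Illustrative sketch.
package main

import (
	"context"
	"fmt"
	"time"
)

func produce(ctx context.Context, out chan<- int) {
	for i := 0; ; i++ {
		select {
		case out <- i: // value delivered
		case <-ctx.Done(): // caller gave up: exit instead of leaking
			fmt.Println("producer exiting:", ctx.Err())
			return
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	out := make(chan int)
	go produce(ctx, out)

	fmt.Println(<-out) // read one value, then stop reading
	time.Sleep(100 * time.Millisecond)
	// Without the ctx case, produce would block on `out <- i` forever.
}
```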

Java - the longest history, the newest paradigm shift

Month 4 - Concurrency & Loom. The Java Memory Model, java.util.concurrent (Doug Lea), the pre-Loom world (CompletableFuture, reactive), then virtual threads (final in 21), structured concurrency (still in preview as of 25), and scoped values (final in 25).

What's unique here: Java is the only mainstream platform that ran the "millions of cheap threads" experiment twice - Quasar/Kotlin coroutines as third-party, then Loom as first-party. Post-Loom, the right Java concurrency pattern often is blocking code on a virtual thread. Reactive's primary motivation evaporates for most workloads.
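
A sketch of the post-Loom default: plain blocking code, one virtual thread per task. fetch here is a stand-in for any blocking I/O call; when it parks, the scheduler unmounts the virtual thread from its carrier.

```java
// Blocking code on virtual threads - no reactive pipeline needed.
// Sketch assuming JDK 21+; fetch() simulates blocking I/O.
import java.time.Duration;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class VirtualThreadsDemo {
    static String fetch(String url) throws InterruptedException {
        Thread.sleep(Duration.ofMillis(200)); // parks: the carrier is freed
        return "body of " + url;
    }

    public static void main(String[] args) throws Exception {
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> results = List.of("a", "b", "c").stream()
                .map(u -> exec.submit(() -> fetch("https://example.com/" + u)))
                .toList();
            for (Future<String> f : results) {
                System.out.println(f.get()); // plain blocking get
            }
        } // close() waits for all tasks
    }
}
```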

The trap

pinning. Pre-24, synchronized blocks pin a virtual thread to its carrier. JEP 491 fixed it in 24+; until your fleet is on 24+, ReentrantLock is the Loom-friendly default.
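
What the Loom-friendly default looks like - the same critical section with ReentrantLock in place of synchronized (illustrative sketch):

```java
// Pre-JDK-24: a virtual thread that blocks while holding a ReentrantLock
// can unmount; inside a synchronized block it would pin its carrier.
import java.util.concurrent.locks.ReentrantLock;

public class Counter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value = 0;

    // Instead of: public synchronized void increment() { value++; }
    public void increment() {
        lock.lock();
        try {
            value++;
        } finally {
            lock.unlock(); // always release, even on exception
        }
    }

    public long get() {
        lock.lock();
        try {
            return value;
        } finally {
            lock.unlock();
        }
    }
}
```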

Rust - concurrency as a type-system feature

Month 3 - Concurrency & Async. Send and Sync marker traits, Arc<Mutex<T>> vs. Arc<RwLock<T>> vs. channels (std::sync::mpsc, crossbeam, tokio::sync), then async/await, executors (Tokio, async-std), and the Pin/Unpin machinery.

What's unique here: data races are a compile error. The type system encodes "this value is safe to send between threads" and "this value is safe to share between threads" as orthogonal capabilities. Most concurrency bugs that exist in Go/Java code cannot be expressed in safe Rust.
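
A sketch of what the compiler is checking here: this compiles because Arc<Mutex<u64>> is Send and Sync; swap Arc for Rc (not Send) or drop the Mutex, and you get a type error instead of a data race.

```rust
// Shared mutable state that the type system has proven safe to share.
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..1000 {
                    // the guard unlocks when it drops at end of statement
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(*counter.lock().unwrap(), 4000); // never torn, never racy
}
```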

The trap

fighting the borrow checker by reaching for Arc<Mutex<T>> everywhere. The idiomatic answer is usually channels (message passing) or actor-like designs. Reach for shared state only when measurement shows you need it.
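
The message-passing alternative as a std::sync::mpsc sketch (names illustrative): each worker owns its data and sends results to a single receiver, so nothing is locked.

```rust
// Channels instead of shared state: many senders, one owning receiver.
use std::sync::mpsc;
use std::thread;

fn main() {
    let (tx, rx) = mpsc::channel();

    for id in 0..4u64 {
        let tx = tx.clone();
        thread::spawn(move || {
            // each worker computes independently; no shared mutable state
            tx.send((id, id * id)).unwrap();
        });
    }
    drop(tx); // close our handle so the receive loop can terminate

    let mut total = 0;
    for (id, square) in rx {
        println!("worker {id} -> {square}");
        total += square;
    }
    assert_eq!(total, 14); // 0 + 1 + 4 + 9
}
```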

Python - the GIL and its workarounds

Month 4 - Concurrency & Parallelism. The Global Interpreter Lock and what it actually protects (CPython bytecode dispatch), the threading model (preemptive but GIL-serialized), multiprocessing, asyncio, concurrent.futures. Then free-threaded CPython (PEP 703, officially supported in 3.14) - the GIL becomes optional.

What's unique here: Python's concurrency story is two stories. CPU-bound work goes to multiprocessing or native extensions; I/O-bound work goes to asyncio or threads. Free-threaded CPython merges them - at the cost of per-object locking overhead that slows single-threaded code by roughly 10%.
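
The two stories side by side, as a sketch: a process pool for CPU-bound work (each process has its own interpreter and GIL), coroutines for I/O-bound work, bridged through the event loop.

```python
# CPU-bound work in processes, I/O-bound work in coroutines. Sketch.
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_bound(n: int) -> int:
    # pure-Python arithmetic holds the GIL, so it goes to another process
    return sum(i * i for i in range(n))


async def io_bound(name: str) -> str:
    await asyncio.sleep(0.2)  # stands in for a network call; yields the loop
    return f"response for {name}"


async def main() -> None:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # runs in a worker process while the loop keeps serving I/O
        cpu_task = loop.run_in_executor(pool, cpu_bound, 1_000_000)
        io_results = await asyncio.gather(io_bound("a"), io_bound("b"))
        print(io_results, await cpu_task)


if __name__ == "__main__":  # required: the pool re-imports this module
    asyncio.run(main())
```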

The trap

assuming asyncio makes everything fast. It only helps I/O concurrency. A CPU-bound coroutine starves the event loop; a blocking syscall in an async function blocks every other coroutine on the loop.
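
A sketch of the trap and its standard escape hatch: time.sleep stands in for any blocking call, and asyncio.to_thread moves it off the event loop so other coroutines keep running.

```python
# A blocking call freezes the whole loop; to_thread un-freezes it. Sketch.
import asyncio
import time


async def heartbeat() -> None:
    for _ in range(4):
        print("tick")
        await asyncio.sleep(0.1)


async def bad() -> None:
    time.sleep(0.5)  # blocking: the heartbeat stops for half a second


async def good() -> None:
    await asyncio.to_thread(time.sleep, 0.5)  # runs in a worker thread


async def main() -> None:
    # swap good() for bad() to watch the ticks stall
    await asyncio.gather(heartbeat(), good())


if __name__ == "__main__":
    asyncio.run(main())
```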

Linux kernel - the primitives everyone else builds on

Month 2 - Memory & Scheduling for the CFS/EEVDF scheduler; Month 3 - Namespaces, Cgroups, eBPF for the isolation primitives. futex (fast userspace mutex) is the syscall every contended mutex in every userspace language ultimately falls back to.

What's unique here: kernel concurrency is the substrate. Spinlocks, sleeping locks (struct mutex), RCU, per-CPU variables, atomic ops with documented memory ordering. The Linux kernel memory model (LKMM) is a published formal spec; the userspace models (Go MM, JMM, Rust's orderings) are conceptually downstream of it.

The trap

thinking spinlocks are always wrong. In the kernel, when a lock is held for less time than a context switch costs, spinning is correct. The choice between sleeping and spinning is workload-driven.
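
A userspace sketch of the shape underneath every mutex above - not the kernel's own code, and far simpler than glibc's real implementation (no error handling, no fairness): spin briefly in case the hold time is short, then sleep in the kernel with FUTEX_WAIT; unlock stores 0 and wakes one sleeper.

```c
/* Minimal futex-backed lock. Linux only; build with: cc -pthread futex.c */
#include <linux/futex.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

static atomic_int lock_word; /* 0 = free, 1 = held */

static void lock(void) {
    for (int spins = 0; spins < 100; spins++) { /* spin phase */
        int expected = 0;
        if (atomic_compare_exchange_strong(&lock_word, &expected, 1))
            return; /* acquired without entering the kernel */
    }
    int expected = 0;
    while (!atomic_compare_exchange_strong(&lock_word, &expected, 1)) {
        /* sleep until the word changes; the kernel rechecks it atomically,
           so a wakeup between our check and the sleep is never lost */
        syscall(SYS_futex, &lock_word, FUTEX_WAIT, 1, NULL, NULL, 0);
        expected = 0;
    }
}

static void unlock(void) {
    atomic_store(&lock_word, 0);
    syscall(SYS_futex, &lock_word, FUTEX_WAKE, 1, NULL, NULL, 0);
}

static long counter;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        lock();
        counter++;
        unlock();
    }
    return NULL;
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter); /* 400000 */
    return 0;
}
```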


The contrasts that teach

Axis                    | Go             | Java (post-Loom)             | Rust                 | Python                     | Kernel
------------------------|----------------|------------------------------|----------------------|----------------------------|-----------------------
Unit of concurrency     | goroutine      | virtual thread               | task / OS thread     | task / thread / process    | thread / IRQ / softirq
Cost per unit           | ~2KB           | ~few KB                      | ~OS thread or ~task  | ~OS thread or ~task        | KB-MB
Primary message passing | channel        | BlockingQueue                | channel (mpsc)       | asyncio.Queue              | per-CPU queues
Primary shared state    | sync.Mutex     | ReentrantLock                | Mutex<T> (typed)     | threading.Lock             | spinlock / mutex
Data-race detection     | -race flag     | jcstress (formal)            | compile-time         | none (until free-threaded) | KCSAN
Memory model            | Go MM          | JMM                          | C++20-derived        | informal                   | LKMM (formal spec)
Default approach        | channels first | blocking on virtual threads  | channels / Send+Sync | event loop or processes    | spinlock / RCU

The most clarifying read across these: Go's GMP scheduler + Java's Loom carriers + Tokio's executor + Linux's CFS/EEVDF - four implementations of the same problem, many tasks multiplexed onto few execution contexts (OS threads in the first three, CPUs in the kernel's case). Different cost models, different fairness guarantees, same shape.


What to read first

  • You're a backend engineer writing services → Go Month 3, then Java Month 4. Two takes on "throughput servers with many concurrent requests."
  • You write systems code where data races are intolerable → Rust Month 3. Then read Java's JMM section to see what it costs to leave Send/Sync out of the type system.
  • You write Python at scale → Python Month 4, then circle back to free-threaded CPython's design once your mental model of the GIL is solid.
  • You write any of the above and want to understand the substrate → Linux Month 2 (scheduler + futex). Every mutex in your day job ends up here.