Week 16 - Lock-Free Patterns, VarHandle Memory Modes, and jcstress
Conceptual Core
You almost never need lock-free code. When you do - high-contention counters, single-producer single-consumer queues, latency-critical paths where the cost of acquiring a mutex is intolerable - VarHandle's memory-ordering modes are the right tool, and jcstress is the practical way to test the result empirically.
The decision rule: if synchronized or ReentrantLock gives you the latency and throughput you need, stop. Lock-free code is one to two orders of magnitude harder to get right than locked code. The few legitimate reasons to write it: documented hot-path contention (>100k ops/sec/thread on the same data), latency tail caused by lock convoy, or memory-ordering needs that the lock primitives don't express.
Mechanical Detail
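- VarHandle modes in order of strictness:
  - Plain - no ordering guarantees, and no atomicity beyond what the hardware naturally provides (long and double accesses may tear on 32-bit JVMs).
  - Opaque - per-variable ordering (accesses to the same variable are not reordered with each other) and eventual visibility, but no ordering across variables.
  - Acquire/Release - one-way ordering: an acquire load that observes a release store also sees every write made before that store.
  - Volatile - full sequential consistency.

  These map roughly onto C++20: Plain is closest to a non-atomic access (without C++'s undefined behavior on data races), Opaque to memory_order_relaxed, Acquire/Release to memory_order_acquire/memory_order_release, and Volatile to memory_order_seq_cst. Use the weakest mode for which you can prove correctness.
- The classical patterns: lock-free SPSC queue (single-producer single-consumer; the LMAX Disruptor is the famous ring-buffer example), Treiber stack (CAS on head), Michael-Scott queue (lock-free FIFO), hazard pointers (safe memory reclamation), RCU-style epoch reclamation.
- LongAdder vs AtomicLong: under contention, AtomicLong thrashes a single cache line; LongAdder stripes updates across multiple cells and sums them on read. It is the right choice for high-contention counters (request counts, metrics) where the read does not need to be an atomic snapshot.
- jcstress (org.openjdk.jcstress) is Shipilëv's harness: declare small concurrent fragments with @JCStressTest, annotate expected outcomes (@Outcome(id = "1, 1", expect = ACCEPTABLE)), and the harness runs each fragment millions of times across JVM and compiler configurations, tallying which outcomes actually occur. It samples interleavings rather than enumerating them, so a clean run is strong evidence, not proof. It is the closest thing to a unit-test framework for the JMM.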
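The LongAdder point above can be sketched with the JDK alone. This is a minimal illustration, not a benchmark: the class name CounterDemo and the method countWith are hypothetical, and the code checks only that both counters arrive at the same total, not which one is faster.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class: increments both counter types from several threads.
final class CounterDemo {

    // Starts `threads` workers, each doing `perThread` increments on both counters,
    // and returns {atomicLongTotal, longAdderTotal} once all workers have finished.
    static long[] countWith(int threads, int perThread) {
        AtomicLong atomic = new AtomicLong();  // one cell: every update contends on it
        LongAdder adder = new LongAdder();     // striped cells: contended updates spread out

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    atomic.incrementAndGet();
                    adder.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
        // sum() is exact here only because every writer has been joined;
        // while updates are in flight it is not an atomic snapshot.
        return new long[] { atomic.get(), adder.sum() };
    }

    public static void main(String[] args) {
        long[] totals = countWith(4, 100_000);
        System.out.println("AtomicLong=" + totals[0] + " LongAdder=" + totals[1]);
    }
}
```

To see the throughput difference rather than just the equal totals, the two increments would need to be timed separately under a proper harness such as JMH.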
The trap
Using Plain mode and hoping it's "atomic enough." It isn't. Plain reads can return stale values forever; plain writes can be reordered with anything. Reach for Volatile or Acquire/Release; drop to Opaque only with a documented justification; never use Plain for shared state.
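As a sketch of the safer alternative, the example below publishes a value through a flag using Release/Acquire instead of Plain access. The class Publication and its methods are illustrative names, not part of any library; the only JDK calls used are MethodHandles.lookup().findVarHandle, VarHandle.setRelease, VarHandle.getAcquire, and Thread.onSpinWait.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

// Hypothetical example class: hands a payload from one thread to another.
final class Publication {
    private int payload;    // plain field; safe only because READY orders access to it
    private boolean ready;  // accessed exclusively through the READY VarHandle

    private static final VarHandle READY;
    static {
        try {
            READY = MethodHandles.lookup()
                    .findVarHandle(Publication.class, "ready", boolean.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void publish(int value) {
        payload = value;
        // Memory mode: Release. The payload write above cannot move below this store.
        READY.setRelease(this, true);
    }

    int awaitPayload() {
        // Memory mode: Acquire. Once this load observes true, the payload write is visible.
        while (!(boolean) READY.getAcquire(this)) {
            Thread.onSpinWait();
        }
        return payload;
    }

    // Publishes `value` from a second thread and returns what the caller observes.
    static int roundTrip(int value) {
        Publication p = new Publication();
        new Thread(() -> p.publish(value)).start();
        return p.awaitPayload();
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(42));
    }
}
```

If both accesses to the flag were Plain, the Java Memory Model would permit the spin loop to never observe the update and would give no guarantee about the payload even if it did.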
Lab
Implement a Treiber stack with VarHandle.compareAndSet (single VarHandle on the head pointer). Write three jcstress tests:
1. Linearizability - concurrent push + pop produces an ordering consistent with some serial schedule.
2. No lost pops - every pushed element is popped exactly once.
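3. ABA exposure - under contention, a pop-then-push cycle that puts a recycled node back at the head can make a stale CAS succeed. In Java this requires node reuse (for example a node pool), because the garbage collector will not recycle a node that a thread still references. Document the scenario even if you don't fix it (the standard fixes are versioned pointers, such as AtomicStampedReference, or hazard pointers in non-GC languages).
Run each test under several -m presets (for example quick, default, and tough). The presets change how long and how hard jcstress runs each test, not the memory model being tested.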
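A starting point for the lab might look like the sketch below. It is not the jcstress suite itself (that needs the jcstress dependency); it is a plain-JDK Treiber stack with a two-phase smoke check for test 2. All names are illustrative. It assumes a fresh node per push, which, together with garbage collection, sidesteps ABA in this form.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.concurrent.atomic.AtomicIntegerArray;
import java.util.function.IntConsumer;

// Minimal Treiber stack sketch: a single VarHandle on the head pointer.
final class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;  // never mutated after construction
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    private Node<T> head;    // accessed only through HEAD

    private static final VarHandle HEAD;
    static {
        try {
            HEAD = MethodHandles.lookup().findVarHandle(TreiberStack.class, "head", Node.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    @SuppressWarnings("unchecked")
    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = (Node<T>) HEAD.getVolatile(this);          // Memory mode: Volatile
            newHead = new Node<>(value, oldHead);
        } while (!HEAD.compareAndSet(this, oldHead, newHead));   // Memory mode: Volatile CAS
    }

    @SuppressWarnings("unchecked")
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = (Node<T>) HEAD.getVolatile(this);          // Memory mode: Volatile
            if (oldHead == null) return null;                    // empty stack
        } while (!HEAD.compareAndSet(this, oldHead, oldHead.next));
        return oldHead.value;
    }

    // Pushes threads*perThread distinct integers concurrently, then pops them all
    // concurrently; returns true if every pushed value was popped exactly once.
    static boolean everyElementPoppedOnce(int threads, int perThread) {
        TreiberStack<Integer> stack = new TreiberStack<>();
        AtomicIntegerArray seen = new AtomicIntegerArray(threads * perThread);
        runAll(threads, id -> {
            for (int i = 0; i < perThread; i++) stack.push(id * perThread + i);
        });
        runAll(threads, id -> {
            Integer v;
            while ((v = stack.pop()) != null) seen.incrementAndGet(v);
        });
        for (int i = 0; i < seen.length(); i++) {
            if (seen.get(i) != 1) return false;
        }
        return true;
    }

    private static void runAll(int threads, IntConsumer body) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> body.accept(id));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while joining", e);
            }
        }
    }
}
```

A check like everyElementPoppedOnce exercises only the interleavings the scheduler happens to produce on one run, which is exactly the gap the jcstress tests are meant to close.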
Idiomatic Drill
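Read the source of ConcurrentHashMap (Doug Lea). You will not understand all of it. Understand enough - specifically, how updates are confined to individual bins (in JDK 8 and later: CAS into an empty bin, a lock on the bin's head node otherwise), how resize hand-off works, and why the shared fields that readers traverse without locking (table, Node.val, Node.next) are volatile. This is the gold standard for production concurrent Java code: lock-free reads combined with fine-grained locking on writes.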
Production Hardening Slice
Add a "concurrency review" checklist to your hardening/ template:
- Every mutable shared field is justified (why not local + return?).
- Every lock has a documented invariant ("monitor protects cache.size <= maxSize").
- Every volatile has a documented happens-before edge ("write in init() happens-before reads in process()").
- Every Future/CompletableFuture chain has a defined cancellation path (no orphan continuations on the common pool).
- Every VarHandle use names its memory mode in a comment.
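In code, several of these checklist items might look like the sketch below. ReviewedCache and everything in it are hypothetical and exist only to show where the comments go; the CompletableFuture item is not covered.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.util.ArrayDeque;

// Hypothetical class annotated the way the checklist asks.
final class ReviewedCache {
    private final int maxSize;

    // Lock invariant: the monitor on `entries` protects entries.size() <= maxSize.
    private final ArrayDeque<String> entries = new ArrayDeque<>();

    // Happens-before: the write in init() happens-before reads in process().
    private volatile boolean initialized;

    // Shared mutable field, justified: the count outlives each process() call and is
    // read by other threads through hits(), so it cannot be a local plus a return value.
    private long hits;

    private static final VarHandle HITS;
    static {
        try {
            HITS = MethodHandles.lookup().findVarHandle(ReviewedCache.class, "hits", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    ReviewedCache(int maxSize) {
        if (maxSize <= 0) throw new IllegalArgumentException("maxSize must be positive");
        this.maxSize = maxSize;
    }

    void init() {
        initialized = true;
    }

    // Records `key`, evicting the oldest entry when full; returns false before init().
    boolean process(String key) {
        if (!initialized) return false;
        synchronized (entries) {
            if (entries.size() == maxSize) entries.removeFirst();
            entries.addLast(key);
        }
        // Memory mode: Volatile. getAndAdd is an atomic read-modify-write.
        HITS.getAndAdd(this, 1L);
        return true;
    }

    long hits() {
        // Memory mode: Volatile. Pairs with the getAndAdd in process().
        return (long) HITS.getVolatile(this);
    }

    int size() {
        synchronized (entries) {
            return entries.size();
        }
    }
}
```

A reviewer can then check each comment against the code instead of reverse-engineering the intent.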