Worked example - Week 16: a lock-free counter, then the ABA bug¶
Companion to Java Mastery → Month 04 → Week 16: Lock-Free Patterns. The week introduces compare-and-swap (CAS) and the building blocks of lock-free data structures. This page walks one concrete example end-to-end: build a counter with CAS, then deliberately introduce the ABA hazard so you can see what it looks like.
A naive shared counter¶
The wrong way:
Two threads calling increment() concurrently lose updates. The ++ decomposes into three bytecodes (iload, iconst_1, iadd, istore) and another thread can land between any of them. Classic data race.
The textbook fix is synchronized, but that takes a lock. Lock-free programming asks: can we do this with hardware atomics instead?
CAS to the rescue¶
import java.util.concurrent.atomic.AtomicInteger;
public class Counter {
private final AtomicInteger value = new AtomicInteger(0);
public int increment() {
int prev, next;
do {
prev = value.get();
next = prev + 1;
} while (!value.compareAndSet(prev, next));
return next;
}
}
The body is a classic CAS loop. Read it carefully:
prev = value.get()- snapshot the current value.next = prev + 1- compute the new value off-line, without holding anything.value.compareAndSet(prev, next)- atomically: if the current value still equalsprev, replace it withnext; otherwise leave it alone. Returnstrueif it swapped.while (!...)- if the swap failed, somebody else beat us to it. Loop and try again with a fresh snapshot.
This compiles down to the CPU's LOCK CMPXCHG instruction (on x86). It's wait-free for the single-CAS case; under heavy contention threads can retry forever, but that's a liveness issue, not a correctness one.
Note what we did not do: we never held a lock. No thread can block another by being slow inside increment(). If a thread is paused by the OS scheduler mid-CAS-loop, every other thread still makes progress.
Now make it break: the ABA bug¶
The CAS-loop pattern is safe when the only thing that matters is the current value. It breaks when the value is == to a previous value but the state behind it has changed. This is the ABA problem.
Concrete demo. Suppose we use CAS to pop from a singly-linked stack:
public class LockFreeStack<T> {
private final AtomicReference<Node<T>> top = new AtomicReference<>(null);
record Node<T>(T value, Node<T> next) {}
public void push(T v) {
Node<T> oldTop, newTop;
do {
oldTop = top.get();
newTop = new Node<>(v, oldTop);
} while (!top.compareAndSet(oldTop, newTop));
}
public T pop() {
Node<T> oldTop, newTop;
do {
oldTop = top.get();
if (oldTop == null) return null;
newTop = oldTop.next();
} while (!top.compareAndSet(oldTop, newTop));
return oldTop.value();
}
}
This looks fine. CAS guarantees we only swap if top is still the node we saw. But consider:
- Thread A reads
topand sees nodeX(X.next = Y). About to CAS. - Thread A is descheduled for a moment.
- Thread B pops
X. Top is nowY. - Thread B pops
Y. Top is nownull. - Thread B reuses node
X's memory to push a new value; top is nowXagain, but withX.next = null(a fresh state). - Thread A wakes up. CAS sees top ==
X(the same reference!), succeeds, and sets top toY. ButYis in nobody's list anymore. Catastrophe.
The CAS check passed because the reference is equal, but the state behind it changed and snapped back. That's the A → B → A in "ABA."
What fixes it¶
In Java the standard fix is a tagged pointer - pair the reference with a counter, so each push/pop increments the counter, and the CAS is over the (ref, counter) tuple:
import java.util.concurrent.atomic.AtomicStampedReference;
private final AtomicStampedReference<Node<T>> top =
new AtomicStampedReference<>(null, 0);
public T pop() {
int[] stamp = new int[1];
Node<T> oldTop, newTop;
do {
oldTop = top.get(stamp);
if (oldTop == null) return null;
newTop = oldTop.next();
} while (!top.compareAndSet(oldTop, newTop, stamp[0], stamp[0] + 1));
return oldTop.value();
}
Now even if a reference is reused, the stamp will be different, and the CAS will fail. This is one of two common fixes (the other is hazard pointers / epoch-based reclamation - heavier-weight, used in JVM-level data structures like ConcurrentLinkedQueue).
The trap¶
It's tempting to assume "compare-and-swap = thread-safe = done." It's neither. CAS is a building block. Building correct lock-free data structures on top of CAS requires reasoning about the entire state space, not just the field you're swapping. ABA is the canonical mistake; there are subtler ones (linearizability violations, memory reclamation hazards, weak-memory-model surprises on non-x86).
The pragmatic advice: don't write your own lock-free data structures unless you have to. Use java.util.concurrent.atomic.*, ConcurrentHashMap, ConcurrentLinkedQueue, LongAdder. Those have had a decade of stress testing under jcstress. Your version hasn't.
Exercise¶
- Write the broken stack from the second example. Wrap
push/popin a loop driven by multiple threads. Usejcstress(referenced in the dense Go week on testing - the JVM equivalent) to provoke and observe the ABA bug. - Apply the
AtomicStampedReferencefix. Re-run. Confirm the test passes. - Read the source of
java.util.concurrent.ConcurrentLinkedQueue. Find thecasItemandcasNextcalls. How does the JDK avoid ABA there?
Related reading¶
- The main Week 16 chapter covers CAS, memory ordering, and Java's concurrency utilities.
- The Concurrency cross-topic page puts CAS next to Rust's atomics and Go's channels.
- The Memory models cross-topic page explains the happens-before guarantees that make CAS correct on weak-memory hardware.
- Glossary entries: CAS, Memory ordering, Happens-before in the main glossary.