Skip to content

Worked example - Week 16: a lock-free counter, then the ABA bug

Companion to Java Mastery → Month 04 → Week 16: Lock-Free Patterns. The week introduces compare-and-swap (CAS) and the building blocks of lock-free data structures. This page walks one concrete example end-to-end: build a counter with CAS, then deliberately introduce the ABA hazard so you can see what it looks like.

A naive shared counter

The wrong way:

public class Counter {
    private int value = 0;
    public int increment() { return ++value; }
}

Two threads calling increment() concurrently lose updates. The ++ decomposes into three bytecodes (iload, iconst_1, iadd, istore) and another thread can land between any of them. Classic data race.

The textbook fix is synchronized, but that takes a lock. Lock-free programming asks: can we do this with hardware atomics instead?

CAS to the rescue

import java.util.concurrent.atomic.AtomicInteger;

public class Counter {
    private final AtomicInteger value = new AtomicInteger(0);
    public int increment() {
        int prev, next;
        do {
            prev = value.get();
            next = prev + 1;
        } while (!value.compareAndSet(prev, next));
        return next;
    }
}

The body is a classic CAS loop. Read it carefully:

  • prev = value.get() - snapshot the current value.
  • next = prev + 1 - compute the new value off-line, without holding anything.
  • value.compareAndSet(prev, next) - atomically: if the current value still equals prev, replace it with next; otherwise leave it alone. Returns true if it swapped.
  • while (!...) - if the swap failed, somebody else beat us to it. Loop and try again with a fresh snapshot.

This compiles down to the CPU's LOCK CMPXCHG instruction (on x86). It's wait-free for the single-CAS case; under heavy contention threads can retry forever, but that's a liveness issue, not a correctness one.

Note what we did not do: we never held a lock. No thread can block another by being slow inside increment(). If a thread is paused by the OS scheduler mid-CAS-loop, every other thread still makes progress.

Now make it break: the ABA bug

The CAS-loop pattern is safe when the only thing that matters is the current value. It breaks when the value is == to a previous value but the state behind it has changed. This is the ABA problem.

Concrete demo. Suppose we use CAS to pop from a singly-linked stack:

public class LockFreeStack<T> {
    private final AtomicReference<Node<T>> top = new AtomicReference<>(null);
    record Node<T>(T value, Node<T> next) {}

    public void push(T v) {
        Node<T> oldTop, newTop;
        do {
            oldTop = top.get();
            newTop = new Node<>(v, oldTop);
        } while (!top.compareAndSet(oldTop, newTop));
    }

    public T pop() {
        Node<T> oldTop, newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;
            newTop = oldTop.next();
        } while (!top.compareAndSet(oldTop, newTop));
        return oldTop.value();
    }
}

This looks fine. CAS guarantees we only swap if top is still the node we saw. But consider:

  1. Thread A reads top and sees node X (X.next = Y). About to CAS.
  2. Thread A is descheduled for a moment.
  3. Thread B pops X. Top is now Y.
  4. Thread B pops Y. Top is now null.
  5. Thread B reuses node X's memory to push a new value; top is now X again, but with X.next = null (a fresh state).
  6. Thread A wakes up. CAS sees top == X (the same reference!), succeeds, and sets top to Y. But Y is in nobody's list anymore. Catastrophe.

The CAS check passed because the reference is equal, but the state behind it changed and snapped back. That's the A → B → A in "ABA."

What fixes it

In Java the standard fix is a tagged pointer - pair the reference with a counter, so each push/pop increments the counter, and the CAS is over the (ref, counter) tuple:

import java.util.concurrent.atomic.AtomicStampedReference;

private final AtomicStampedReference<Node<T>> top =
    new AtomicStampedReference<>(null, 0);

public T pop() {
    int[] stamp = new int[1];
    Node<T> oldTop, newTop;
    do {
        oldTop = top.get(stamp);
        if (oldTop == null) return null;
        newTop = oldTop.next();
    } while (!top.compareAndSet(oldTop, newTop, stamp[0], stamp[0] + 1));
    return oldTop.value();
}

Now even if a reference is reused, the stamp will be different, and the CAS will fail. This is one of two common fixes (the other is hazard pointers / epoch-based reclamation - heavier-weight, used in JVM-level data structures like ConcurrentLinkedQueue).

The trap

It's tempting to assume "compare-and-swap = thread-safe = done." It's neither. CAS is a building block. Building correct lock-free data structures on top of CAS requires reasoning about the entire state space, not just the field you're swapping. ABA is the canonical mistake; there are subtler ones (linearizability violations, memory reclamation hazards, weak-memory-model surprises on non-x86).

The pragmatic advice: don't write your own lock-free data structures unless you have to. Use java.util.concurrent.atomic.*, ConcurrentHashMap, ConcurrentLinkedQueue, LongAdder. Those have had a decade of stress testing under jcstress. Your version hasn't.

Exercise

  1. Write the broken stack from the second example. Wrap push/pop in a loop driven by multiple threads. Use jcstress (referenced in the dense Go week on testing - the JVM equivalent) to provoke and observe the ABA bug.
  2. Apply the AtomicStampedReference fix. Re-run. Confirm the test passes.
  3. Read the source of java.util.concurrent.ConcurrentLinkedQueue. Find the casItem and casNext calls. How does the JDK avoid ABA there?

Comments