14 - Testing at the next level¶

What this session is¶

About ninety minutes. From Scratch taught you JUnit 5 basics - write a test, assert a result. This session is about testing real code: isolating the unit under test with mocks, running the same test over many inputs with parameterized tests, finding edge cases you didn't think of with property-based testing, the test-double vocabulary that shows up in every code review, and how to test the concurrent code you learned to write in chapters 09-11. By the end you can test code that has dependencies, not just pure functions.

The problem: real code has dependencies¶

From Scratch tests looked like this - pure functions, easy to test:

@Test
void addsCorrectly() {
    assertEquals(5, Calculator.add(2, 3));
}

But real code depends on other things - a database, an HTTP client, a clock, a payment gateway:

class OrderService {
    private final PaymentGateway gateway;     // external dependency
    private final InventoryRepo inventory;    // external dependency
    // (constructor that assigns both fields omitted for brevity)

    OrderResult placeOrder(Order order) {
        if (!inventory.inStock(order.itemId())) return OrderResult.outOfStock();
        var charge = gateway.charge(order.amount());   // calls a real payment system!
        return charge.success() ? OrderResult.placed() : OrderResult.declined();
    }
}

You can't unit-test this against the real PaymentGateway (you'd charge real money) or the real database (slow, stateful, requires setup). You need to substitute fake versions of the dependencies. That's what mocking - and the broader idea of test doubles - is for.

Design for testability first¶

The reason the example above is testable at all is a design choice from chapter 01: OrderService receives its dependencies through its constructor (dependency injection) and they're typed as interfaces. That's not an accident - it's what makes substituting fakes possible.

// Testable: dependencies are interfaces, injected. You can pass fakes.
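class OrderService {
    private final PaymentGateway gateway;
    private final InventoryRepo inventory;

    OrderService(PaymentGateway gateway, InventoryRepo inventory) {
        this.gateway = gateway;       // stored as given, never constructed here
        this.inventory = inventory;
    }
}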

// Untestable: dependency created internally, concrete. You're stuck with the real one.
class OrderService {
    private final PaymentGateway gateway = new StripeGateway();   // hardcoded - can't substitute
}

This connects the whole path: "accept interfaces" (chapter 01), "program to contracts" (chapter 02), and "depend on abstractions" all pay off here as testability. Code that's hard to test is usually code with bad dependencies - testing pressure reveals design problems. If a class is painful to test, that's a signal to fix its design, not to skip the test.

The test-double vocabulary¶

"Mock" is used loosely for all fakes, but the precise terms show up in reviews and matter for thinking clearly:

Dummy - a placeholder passed but never used (fills a parameter slot).
Stub - returns canned answers to calls (when asked for stock, say true). No verification.
Fake - a working but simplified implementation (an in-memory Map standing in for a database).
Mock - a stub that also records how it was called, so you can verify interactions (assert charge() was called once with $50).
Spy - a wrapper around a real object that records calls while delegating to the real implementation.

The two you use most: stub (control what a dependency returns) and mock (verify how a dependency was called). Most "mocking" is really one of these two.
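To make the vocabulary concrete, here is a minimal hand-written sketch of three of these doubles, with no library involved. The interfaces mirror the OrderService example, but their exact shapes (a Charge record with a success flag) and the class names AlwaysInStock, InMemoryInventory, and RecordingGateway are assumptions made for illustration, not the course's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Assumed shapes, modeled on the OrderService example (hypothetical, for illustration).
record Charge(boolean success) {}
interface InventoryRepo { boolean inStock(String itemId); }
interface PaymentGateway { Charge charge(double amount); }

// Stub: returns a canned answer. No state, no verification.
class AlwaysInStock implements InventoryRepo {
    @Override public boolean inStock(String itemId) { return true; }
}

// Fake: a working but simplified implementation backed by an in-memory Map.
class InMemoryInventory implements InventoryRepo {
    private final Map<String, Integer> stock = new HashMap<>();

    void add(String itemId, int quantity) { stock.merge(itemId, quantity, Integer::sum); }

    @Override public boolean inStock(String itemId) {
        return stock.getOrDefault(itemId, 0) > 0;
    }
}

// Hand-written mock: gives a canned answer AND records each call for later verification.
class RecordingGateway implements PaymentGateway {
    final List<Double> charges = new ArrayList<>();

    @Override public Charge charge(double amount) {
        charges.add(amount);          // remember how we were called
        return new Charge(true);      // canned "payment succeeded"
    }
}
```

The stub and the fake are judged by what they return; the recording gateway is judged by what it saw (for example, that charges contains exactly one entry of 50.0). That difference is the stub/mock split in practice.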

Mockito: the standard mocking library¶

Mockito is the de facto Java mocking library. It creates fake implementations of interfaces, lets you program their responses, and lets you verify how they were called.

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.*;
import org.junit.jupiter.api.Test;

class OrderServiceTest {

    @Test
    void placesOrderWhenInStockAndPaymentSucceeds() {
        // 1. Create mocks of the dependencies.
        PaymentGateway gateway = mock(PaymentGateway.class);
        InventoryRepo inventory = mock(InventoryRepo.class);

        // 2. Stub their behavior - "when called this way, return that".
        when(inventory.inStock("widget")).thenReturn(true);
        when(gateway.charge(50.0)).thenReturn(new Charge(true));

        // 3. Run the code under test with the fakes injected.
        var service = new OrderService(gateway, inventory);
        var result = service.placeOrder(new Order("widget", 50.0));

        // 4. Assert the result.
        assertEquals(OrderResult.placed(), result);

        // 5. Verify the interaction - the gateway WAS charged, exactly once, with $50.
        verify(gateway).charge(50.0);
        verify(gateway, times(1)).charge(anyDouble());
    }

    @Test
    void doesNotChargeWhenOutOfStock() {
        PaymentGateway gateway = mock(PaymentGateway.class);
        InventoryRepo inventory = mock(InventoryRepo.class);
        when(inventory.inStock("widget")).thenReturn(false);   // out of stock

        var service = new OrderService(gateway, inventory);
        var result = service.placeOrder(new Order("widget", 50.0));

        assertEquals(OrderResult.outOfStock(), result);
        verify(gateway, never()).charge(anyDouble());   // CRUCIAL: we never charged anyone
    }
}

The core verbs:

mock(Type.class)                          // create a fake
when(mock.method(args)).thenReturn(value) // stub a return value
when(mock.method(args)).thenThrow(ex)     // stub an exception
verify(mock).method(args)                 // assert it was called (once, by default)
verify(mock, times(n)).method(args)       // called exactly n times
verify(mock, never()).method(args)        // never called
any(), anyString(), anyDouble(), eq(x)    // argument matchers for flexible matching

The never() verification in the second test shows what mocks make possible: proving the code doesn't do something (charge a card when out of stock) is as important as proving it does.

When not to mock: don't mock value types (just construct them - chapter 03 records are trivial to make real), and don't mock types you don't own in a way that couples your test to their internals. Prefer a real or fake implementation when it's cheap; reach for mocks for genuinely external, expensive, or hard-to-set-up dependencies.

Parameterized tests: one test, many inputs¶

When you'd otherwise copy-paste a test with different values, use a parameterized test - one test method run once per input set. JUnit 5 makes this clean:

import static org.junit.jupiter.api.Assertions.*;
import java.util.stream.Stream;
import org.junit.jupiter.params.ParameterizedTest;
import org.junit.jupiter.params.provider.*;

class ValidationTest {

    @ParameterizedTest
    @ValueSource(strings = {"", " ", "  ", "\t"})       // run once per value
    void blankStringsAreInvalid(String input) {
        assertFalse(Validator.isValidName(input));
    }

    @ParameterizedTest
    @CsvSource({                                          // each row: input, expected
        "alice@example.com, true",
        "no-at-sign,        false",
        "@nodomain,         false",
        "a@b.co,            true"
    })
    void emailValidation(String email, boolean expected) {
        assertEquals(expected, Validator.isValidEmail(email));
    }

    @ParameterizedTest
    @MethodSource("edgeCaseProvider")                    // supply args from a method
    void handlesEdgeCases(int input, int expected) {
        assertEquals(expected, Math.abs(input));
    }
    // Factory for @MethodSource: static by default, one Arguments entry per test run.
    static Stream<Arguments> edgeCaseProvider() {
        return Stream.of(
            Arguments.of(-5, 5),
            Arguments.of(0, 0),
            Arguments.of(Integer.MIN_VALUE, Integer.MIN_VALUE)  // the famous abs overflow!
        );
    }
}

This turns ten near-duplicate test methods into one method and a data table - easier to read, easier to add cases, and each input shows as a separate result so you see exactly which case failed. Reach for parameterized tests whenever you're testing "the same logic across a range of inputs."

Property-based testing: finding cases you didn't think of¶

Example-based tests check the cases you thought of. Property-based testing generates hundreds of random inputs and checks that a property (an invariant) always holds - finding edge cases you'd never have written by hand. The Java library is jqwik.

import static org.junit.jupiter.api.Assertions.*;
import java.util.List;
import net.jqwik.api.*;

class SortProperties {

    @Property
    void sortedListIsSameLength(@ForAll List<Integer> input) {
        var sorted = MySort.sort(input);
        assertEquals(input.size(), sorted.size());        // property: sorting preserves length
    }

    @Property
    void sortedListIsOrdered(@ForAll List<Integer> input) {
        var sorted = MySort.sort(input);
        for (int i = 1; i < sorted.size(); i++) {
            assertTrue(sorted.get(i - 1) <= sorted.get(i)); // property: each <= the next
        }
    }

    @Property
    void encodingRoundTrips(@ForAll String original) {
        assertEquals(original, decode(encode(original)));  // property: decode(encode(x)) == x
    }
}

jqwik runs each @Property with 1000 generated inputs by default - empty lists, single elements, huge lists, negative numbers, weird Unicode strings. When it finds a failure, it shrinks the input to the minimal failing case ("fails on [0, -1]") so you get a tiny reproduction, not a 500-element monster.

The mental shift: instead of "what example should I test," ask "what must always be true?" Round-trip properties (decode(encode(x)) == x), invariants (sorting preserves length and order), and equivalences (two implementations agree) are the sweet spots. Property-based testing famously finds the edge cases - empty input, overflow, Unicode boundaries - that example tests miss. Use it for pure logic with clear invariants; it complements example tests, doesn't replace them.
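To see the mechanics without any library, here is a toy sketch of what a property-based runner does: generate inputs from a seeded Random, check the property, and on failure shrink the input by dropping elements. This is not how jqwik is implemented; TinyProperty, BuggySort, and the generation ranges are all hypothetical names and choices made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.TreeSet;
import java.util.function.Predicate;

// A toy property checker (NOT jqwik's implementation; every name here is hypothetical).
class TinyProperty {

    // Runs the property against `tries` random lists; returns a shrunk counterexample if one fails.
    static Optional<List<Integer>> findCounterexample(
            Predicate<List<Integer>> property, int tries, long seed) {
        Random random = new Random(seed);              // seeded, so a failure is reproducible
        for (int t = 0; t < tries; t++) {
            List<Integer> input = randomList(random);
            if (!property.test(input)) {
                return Optional.of(shrink(input, property));
            }
        }
        return Optional.empty();
    }

    static List<Integer> randomList(Random random) {
        int size = random.nextInt(20);                 // 0..19 elements, so empty lists appear too
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < size; i++) list.add(random.nextInt(201) - 100);  // values -100..100
        return list;
    }

    // Naive shrinking: keep removing single elements for as long as the property still fails.
    static List<Integer> shrink(List<Integer> failing, Predicate<List<Integer>> property) {
        List<Integer> current = new ArrayList<>(failing);
        boolean shrunk = true;
        while (shrunk) {
            shrunk = false;
            for (int i = 0; i < current.size(); i++) {
                List<Integer> candidate = new ArrayList<>(current);
                candidate.remove(i);                   // remove by index
                if (!property.test(candidate)) {       // still fails, so keep the smaller input
                    current = candidate;
                    shrunk = true;
                    break;
                }
            }
        }
        return current;
    }
}

// Deliberately wrong "sort": a TreeSet orders the values but silently drops duplicates.
class BuggySort {
    static List<Integer> sort(List<Integer> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }
}

class TinyPropertyDemo {
    public static void main(String[] args) {
        // Property under test: sorting preserves length.
        Optional<List<Integer>> failure = TinyProperty.findCounterexample(
                input -> BuggySort.sort(input).size() == input.size(), 1000, 42L);
        System.out.println(failure.map(f -> "fails on " + f).orElse("no counterexample found"));
    }
}
```

Any failing list here must contain a duplicate, and the shrinker keeps dropping elements until only that duplicate pair is left. A real library generates far smarter inputs and shrinks values as well as sizes, but the loop is the same idea.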

Testing concurrent code¶

The hard one, building on chapters 09-11. Concurrency bugs are timing-dependent (chapter 09), so a test that passes once proves little. Strategies:

1. Stress with many threads and check the invariant. Hammer the code from many threads and assert the result is correct - this catches races that single-threaded tests can't.

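// Needs java.util.concurrent.CountDownLatch and java.util.concurrent.Executors.
@Test
void counterIsThreadSafe() throws InterruptedException {
    var counter = new Counter();              // the chapter 10 fixed version
    int threads = 8, perThread = 100_000;
    // try-with-resources works here because ExecutorService is AutoCloseable since Java 19.
    try (var pool = Executors.newFixedThreadPool(threads)) {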
        var done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) counter.increment();
                done.countDown();
            });
        }
        done.await();                          // wait for all threads
    }
    assertEquals(threads * perThread, counter.get());  // exact - any race would lose increments
}

Run this against chapter 09's broken counter and it fails (wrong total); against chapter 10's fixed one it passes. The high iteration count and thread count maximize the chance of provoking a race.

2. Use synchronization aids for coordination. CountDownLatch (wait for N things), CyclicBarrier (release all threads at once to maximize collision), Phaser - these let you control timing to make races more likely to surface.

3. Use a real concurrency-testing tool. For serious lock-free or low-level concurrent code, jcstress (the Java Concurrency Stress tool) runs small concurrent test cases an enormous number of times under varied conditions and reports which outcomes it actually observed - far beyond what hand-rolled stress tests catch. It's specialist, but it's the right tool when you're writing genuinely tricky concurrent code.
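Strategy 2 can be sketched with a CyclicBarrier that holds every worker at a start line and releases them together, so they hit the shared code at the same moment rather than trickling in as threads start. BarrierStress and hammer are hypothetical names for this illustration; the counter being hammered is a plain AtomicInteger rather than the course's Counter class.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical harness: starts `threads` workers, parks them at a barrier, releases them at once.
class BarrierStress {

    // Runs `action` perThread times on each thread; returns how many runs happened in total.
    static int hammer(Runnable action, int threads, int perThread) {
        CyclicBarrier startLine = new CyclicBarrier(threads);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    startLine.await();                 // block until all workers have arrived
                } catch (InterruptedException | BrokenBarrierException e) {
                    throw new IllegalStateException(e);
                }
                for (int i = 0; i < perThread; i++) action.run();
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();                         // wait for every worker to finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return threads * perThread;
    }
}

class BarrierStressDemo {
    public static void main(String[] args) {
        AtomicInteger safe = new AtomicInteger();
        int expected = BarrierStress.hammer(safe::incrementAndGet, 8, 100_000);
        System.out.println("expected " + expected + ", got " + safe.get());
    }
}
```

Pointing the same harness at a plain int field incremented with ++ is the interesting experiment: the barrier makes lost updates more likely to show up, though a passing run still proves nothing.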

The honest caveat: concurrency tests are probabilistic. A stress test that passes makes a bug less likely but can't prove absence (the bad interleaving might just not have happened this run). That's why the real defense is correct design (chapters 09-11: minimize shared mutable state, use the right tools), with stress testing and tools like jcstress as backup. You test concurrent code, but you don't rely on tests alone to make it correct.

Test structure and quality¶

A few habits that separate good test suites from brittle ones:

Arrange-Act-Assert (AAA). Structure each test in three clear phases: set up (arrange), call the code (act), check the result (assert). The Mockito example above follows it. Readable tests have visible structure.
One logical assertion per test. A test should verify one behavior. "placesOrderWhenInStockAndPaymentSucceeds" tests one scenario; a separate test handles out-of-stock. When a test fails, the name tells you what broke.
Test behavior, not implementation. Assert what the code does (the return value, the observable effect), not how (don't assert internal calls unless the interaction is the behavior, like "must not charge when out of stock"). Over-mocking and over-verifying make tests break on every refactor - a brittle suite people stop trusting.
Name tests as specifications. doesNotChargeWhenOutOfStock reads as a requirement. A failing test should tell you the violated requirement from its name alone.
Fast and isolated. Unit tests run in milliseconds and don't depend on each other or external state. Slow, order-dependent tests get skipped, and a skipped test protects nothing.

A note on the test pyramid¶

The standard guidance for what to test at which level:

Many unit tests (fast, isolated, mock external dependencies) - the base of the pyramid. Most of your tests.
Fewer integration tests (real database, real HTTP, wired-together components) - verify the pieces work together. Slower, so fewer.
Few end-to-end tests (the whole system) - confirm the critical paths work. Slowest and most brittle, so fewest.

The shape matters: lots of fast unit tests catch most bugs cheaply; a few integration and E2E tests catch the wiring issues units can't. Inverting it (mostly slow E2E tests) gives a suite that's slow, flaky, and hard to debug. Mock dependencies for unit tests; use real ones (often via Testcontainers - real databases in throwaway Docker containers) for the smaller integration layer.

Try it¶

Mock a dependency. Build the OrderService with PaymentGateway and InventoryRepo interfaces. Write the two tests shown: success path (stub both, verify charge) and out-of-stock (verify never() charged). Run them. Then write a third: payment declined (stub charge to return failure, assert declined, verify charge was attempted).
Fake vs mock. Implement InventoryRepo as a fake - a real class backed by an in-memory Map. Rewrite a test using the fake instead of a Mockito mock. Discuss when each is clearer: the fake when you need realistic behavior across many calls, the mock when you need to verify specific interactions.
Parameterize. Take a validation method and write a @CsvSource parameterized test with 6-8 input/expected rows including edge cases (empty, whitespace, boundary values). Run it - note each row is a separate result. Add a row that should fail and confirm only that row reports red.
Property test. If you have jqwik, write @Property void absIsNonNegative(@ForAll int n). Run it - it'll find Integer.MIN_VALUE, where Math.abs returns a negative number (overflow), shrinking to that exact input. A famous bug, found automatically. Then write a round-trip property for any encode/decode pair you have.
Stress-test a counter. Write the 8-thread stress test against chapter 09's broken counter++ and watch it fail (wrong total). Then point it at chapter 10's AtomicInteger version and watch it pass. Run each several times - note the broken one fails by different amounts (probabilistic).
Spot the brittle test. Take a test that over-verifies (verifies every internal method call). Refactor the implementation without changing behavior - watch the brittle test break despite correct behavior. Rewrite it to assert the observable result instead. Feel why "test behavior, not implementation" matters.

What you might wonder¶

"Mockito vs writing fakes by hand?" Mockito is faster to set up for one-off stubbing and is the standard for verifying interactions. Hand-written fakes (an in-memory repo) are better when many tests need the same realistic behavior, or when the fake is reused. Many codebases use both: Mockito for verifying calls and quick stubs, fakes for stateful dependencies used across a test suite. Don't mock value types - just build real ones.

"Is 100% test coverage the goal?" No. Coverage measures lines executed by tests, not lines meaningfully verified - you can have 100% coverage with assertions that prove nothing. Aim to test behavior and edge cases that matter; coverage is a rough hint about untested areas (0% on a complex method is a red flag), not a target to maximize. High coverage of trivial getters while the gnarly logic is untested is the worst of both worlds.

"How do I test code that uses the current time or random numbers?" Inject them. Instead of calling Instant.now() or new Random() directly, take a Clock or a Random/seeded source as a dependency (chapter 01 again). In tests, pass a fixed Clock.fixed(...) or a seeded Random so behavior is deterministic. Hardcoded now()/randomness is the classic "untestable because of bad dependencies" problem.

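Injecting time and randomness, as described in the answer above, might look like the following sketch. Coupon and Dice are hypothetical classes invented for this example; Clock.fixed, Clock.systemUTC, and the seeded Random constructor are standard JDK APIs.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Random;

// Hypothetical class: asks an injected Clock for the time instead of calling Instant.now().
class Coupon {
    private final Instant expiresAt;
    private final Clock clock;

    Coupon(Instant expiresAt, Clock clock) {
        this.expiresAt = expiresAt;
        this.clock = clock;
    }

    boolean isExpired() {
        return clock.instant().isAfter(expiresAt);
    }
}

// Hypothetical class: takes its Random as a dependency, so a test can pass a seeded one.
class Dice {
    private final Random random;

    Dice(Random random) { this.random = random; }

    int roll() { return random.nextInt(6) + 1; }       // always in 1..6
}

class InjectedTimeDemo {
    public static void main(String[] args) {
        Instant expiry = Instant.parse("2024-01-01T00:00:00Z");
        Clock dayBefore = Clock.fixed(expiry.minus(Duration.ofDays(1)), ZoneOffset.UTC);
        Clock dayAfter = Clock.fixed(expiry.plus(Duration.ofDays(1)), ZoneOffset.UTC);

        System.out.println(new Coupon(expiry, dayBefore).isExpired());  // clock frozen before expiry
        System.out.println(new Coupon(expiry, dayAfter).isExpired());   // clock frozen after expiry

        // Two Dice built from the same seed produce the same sequence of rolls.
        System.out.println(new Dice(new Random(7)).roll() == new Dice(new Random(7)).roll());
    }
}
```

Production code would pass Clock.systemUTC() and an unseeded Random; only the tests pin them down.
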
"Should I test private methods?" Generally no - test them through the public methods that use them. If a private method is complex enough to want its own test, that's often a sign it should be extracted into its own class (with a public, testable interface) - the testing pressure revealing a design improvement again.

"Property-based testing sounds great - why isn't it everywhere?" It shines for pure logic with clear invariants (parsers, encoders, data structures, math) and is underused there. It's awkward for code with lots of side effects or unclear properties ("what's the invariant of this UI handler?"). Use it where invariants are clear; it complements example tests rather than replacing them. Many teams don't know it exists - now you do.

"How do I make flaky concurrency tests reliable?" You largely can't make a probabilistic test deterministic - which is exactly why such tests can't prove correctness. Increase iterations/threads to raise the odds of catching a bug, use jcstress to exercise critical code far more thoroughly, and most importantly design concurrency correctly (chapters 09-11) rather than trying to test correctness in afterward. A flaky test that occasionally catches a real race is still valuable as a signal - just don't mistake "passed once" for "correct."

Done¶

You know real code needs testable design: inject dependencies as interfaces (chapters 01-02 paying off).
You know the test-double vocabulary (dummy, stub, fake, mock, spy) and use stubs and mocks most.
You can use Mockito: mock, when().thenReturn(), verify(), including verifying something didn't happen.
You can write parameterized tests (@ValueSource, @CsvSource, @MethodSource) for one-logic-many-inputs.
You know property-based testing finds edge cases by checking invariants over generated inputs.
You can stress-test concurrent code and know such tests are probabilistic, not proof.
You know the quality habits (AAA, behavior-not-implementation, names-as-specs) and the test pyramid.

Next, the final chapter: bridging to mastery - reading harder code, a more ambitious contribution, and where to go from here.
