# Testing
## Why it matters
Five families of tests recur across every path: unit, integration, property-based, fuzz, benchmark. Each language has its own tooling for each, but the strategies transfer. The strongest software-engineering muscles you can build are:
- Knowing which family fits which kind of bug.
- Knowing each ecosystem's canonical answer well enough to read their tests.
This page is the cross-language reading list for both.
## The five families
| Family | Catches | Run frequency | Canonical tools by path |
|---|---|---|---|
| Unit | logic bugs in pure functions, narrow modules | every commit | go test, JUnit 5, cargo test, pytest, kunit |
| Integration | wiring bugs, real-dependency bugs | every PR | Testcontainers, docker-compose, pytest fixtures |
| Property-based | edge cases you didn't think to write | every PR | jqwik (Java), proptest/quickcheck (Rust), hypothesis (Python), gopter (Go) |
| Fuzz | crashes, security bugs, parser bugs | continuous (oss-fuzz, CI nightly) | go test -fuzz, cargo-fuzz / libFuzzer, atheris (Python), syzkaller (kernel) |
| Benchmark | perf regressions | per release / nightly | JMH (Java), criterion (Rust), go test -bench, pytest-benchmark, perf+ftrace |
Plus a sixth, language-specific family: concurrency stress tests - jcstress (Java), loom (Rust), KCSAN (kernel), -race (Go). See memory models.
## The lens, per path
### Go - go test, -race, fuzz, examples
Month 1 - Runtime Foundations for the basics, Appendix A for the discipline.
Built into the toolchain. `go test ./...` runs everything. `-race` is the data-race detector. `go test -fuzz` (Go 1.18+) is the built-in coverage-guided fuzzer. `Example*` functions double as documentation and tests - `go doc` shows them and `go test` runs them.
What's unique here: no test framework. The `testing` package gives you `t.Errorf` and parallelism (`t.Parallel`); subtests via `t.Run("subname", ...)`. That's the API. Third-party libraries (testify, gocheck) exist, but much of the community sticks with the standard library.
**The trap**
Forgetting `t.Parallel()` makes tests serialize unnecessarily on CI. Forgetting to capture the loop variable in subtests pre-Go-1.22 causes every parallel subtest to run with the same input.
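The loop-variable capture bug is not Go-specific - Python closures created in a loop have the same late-binding pitfall. A minimal stdlib sketch of the bug and the default-argument fix (names here are illustrative only):

```python
# Late binding: every closure reads the loop variable *after* the loop
# has finished, so all three see the final value of i.
broken = [lambda: i for i in range(3)]
assert [f() for f in broken] == [2, 2, 2]

# Fix: bind the current value at definition time via a default argument -
# analogous to shadowing the loop variable in pre-1.22 Go subtests.
fixed = [lambda i=i: i for i in range(3)]
assert [f() for f in fixed] == [0, 1, 2]
```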
### Java - JUnit 5, AssertJ, Testcontainers, Mockito, JMH, jcstress, jqwik
Month 1 - Language & Toolchain, week 4. The deepest test ecosystem in any path.
- JUnit 5 (Jupiter) - `@Test`, `@ParameterizedTest`, `@Nested`, `@ExtendWith`. Modern; supersedes JUnit 4 (still common in legacy code).
- AssertJ - fluent assertions. `assertThat(x).isEqualTo(y).hasSize(3).containsExactly(...)`. Way better failure messages than `assertEquals`.
- Mockito - collaborator mocking. Mockito 5+; prefer constructor injection, never `@InjectMocks` on `final` fields.
- Testcontainers - real dependencies in tests via Docker. Postgres, Kafka, Redis, anything with an image. Rebuilds your mental model: most "integration tests" should be Testcontainer tests.
- jqwik - property-based testing. `@Property`-annotated methods receive generated inputs; jqwik shrinks failing cases automatically.
- JMH - the Java microbenchmark harness. The only correct way to measure JVM perf. See observability.
- jcstress - concurrency stress harness (Shipilëv's). For testing memory-model edge cases in your lock-free code.
What's unique here: every test family has a mature, opinionated, JDK-author-blessed tool. The cost is the learning curve - you genuinely need a week to get fluent.
**The trap**
Mocking what you don't own (Mockito's own wiki rule). Don't mock `Connection`/`ResultSet`/`HttpClient`/etc - those are external interfaces. Either use a real implementation in a Testcontainer or wrap them in your own interface and mock that.
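The "wrap it in your own interface" rule transfers to any language. A minimal stdlib-Python sketch, with hypothetical names (`PaymentGateway`, `HttpPaymentGateway`, `checkout`): define the interface you own, adapt the external client behind it, and mock only the owned interface:

```python
from unittest import mock

# Interface we own: the only surface our unit tests ever mock.
class PaymentGateway:
    def charge(self, cents: int) -> str:
        raise NotImplementedError

# Thin adapter over the external HTTP client. Not mocked in unit tests;
# covered instead by an integration test against a real endpoint.
class HttpPaymentGateway(PaymentGateway):
    def __init__(self, http_client):
        self._http = http_client

    def charge(self, cents: int) -> str:
        return self._http.post("/charge", {"amount": cents})["id"]

def checkout(gateway: PaymentGateway, cents: int) -> str:
    return gateway.charge(cents)

# Unit test: autospec the owned interface, never the raw client.
gateway = mock.create_autospec(PaymentGateway, instance=True)
gateway.charge.return_value = "txn-42"
assert checkout(gateway, 999) == "txn-42"
gateway.charge.assert_called_once_with(999)
```

`create_autospec` keeps the fake honest: calling `charge` with a wrong signature fails the test instead of silently passing.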
### Rust - cargo test, criterion, proptest, loom, MIRI, cargo-fuzz
Month 1 - Foundations, week on testing.
`#[test]` inside any module. `cargo test` runs them. Doctests in `///` comments are real tests - they compile and run.
- criterion - benchmark crate; statistical analysis of variance, regression detection, HTML reports.
- proptest / quickcheck - property-based testing.
- loom - model-checks concurrent code by exhaustively exploring thread interleavings. The Rust equivalent of jcstress.
- MIRI - interpreter for Rust's mid-level IR; detects undefined behavior in `unsafe` code. `cargo +nightly miri test`.
- cargo-fuzz - libFuzzer integration.
What's unique here: the correctness tooling - MIRI + loom + the borrow checker - gives Rust a stronger "your concurrent code is actually correct" story than any other path.
**The trap**
`cargo test` runs tests in parallel by default. Shared global state (env vars, working directory, file paths) creates flaky tests. Use `serial_test` or properly isolate state.
### Python - pytest, hypothesis, tox/nox, atheris
Month 1 - Foundations, week on testing.
- pytest - the de facto test runner. Fixtures, parametrize, plugins (`pytest-asyncio`, `pytest-cov`, `pytest-mock`, `pytest-benchmark`, `pytest-xdist`).
- hypothesis - property-based testing; arguably the best in any language. `@given(integers(), text())` generates inputs and shrinks failures.
- tox / nox - matrix runners (test across Python versions, dependency versions).
- atheris - Google's coverage-guided Python fuzzer.
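hypothesis automates two things: generating inputs and shrinking failures to a minimal counterexample. The core loop it automates can be sketched in stdlib Python (hypothetical property: sorting is idempotent and length-preserving; the shrinker here is a deliberately naive greedy one):

```python
import random

def prop_sort_ok(xs):
    s = sorted(xs)
    return sorted(s) == s and len(s) == len(xs)

def check_property(prop, trials=200, seed=0):
    """Return None if prop held on all trials, else a shrunk counterexample."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if not prop(xs):
            # Greedy shrink: drop one element at a time while the
            # property still fails, so the report is minimal-ish.
            shrunk, i = xs, 0
            while i < len(shrunk):
                cand = shrunk[:i] + shrunk[i + 1:]
                if not prop(cand):
                    shrunk = cand
                else:
                    i += 1
            return shrunk
    return None

assert check_property(prop_sort_ok) is None  # property holds

# A deliberately wrong property, to show shrinking produce a small witness:
cex = check_property(lambda xs: sum(xs) >= 0)
assert cex is not None and sum(cex) < 0
```

hypothesis does all of this better - smarter generators, integrated shrinking, a failure database - but the mental model is exactly this loop.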
What's unique here: hypothesis's stateful testing - model your system as a state machine, let hypothesis explore the state space. Comparable to model-checking but more accessible.
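The stateful idea in a stdlib sketch (hypothetical system under test: a bounded counter, checked after every operation against a plain-integer model; hypothesis's `RuleBasedStateMachine` generalizes this pattern):

```python
import random

class BoundedCounter:
    """System under test: a counter that must clamp to [0, limit]."""
    def __init__(self, limit):
        self.limit, self.n = limit, 0
    def incr(self):
        self.n = min(self.n + 1, self.limit)
    def decr(self):
        self.n = max(self.n - 1, 0)

def stateful_check(trials=100, steps=50, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        sut, model = BoundedCounter(limit=10), 0
        for _ in range(steps):
            # Random walk over the operation space.
            if rng.random() < 0.5:
                sut.incr(); model = min(model + 1, 10)
            else:
                sut.decr(); model = max(model - 1, 0)
            # Invariant: the system tracks the model after every op.
            assert sut.n == model

stateful_check()
```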
**The trap**
Slow tests because fixtures are session-scoped when they should be function-scoped (or vice versa). `pytest --setup-show` reveals the actual setup graph.
### Linux kernel - kunit, kselftest, syzkaller, ktap, KCSAN
Month 1 - Kernel Foundations, test week.
- kunit - in-kernel unit tests, run during kernel build or in a tiny qemu instance via `kunit_tool.py`.
- kselftest - userspace-driven kernel feature tests; lives in `tools/testing/selftests/`.
- syzkaller - Google's kernel fuzzer; the source of a huge fraction of recent kernel CVEs.
- KCSAN - Kernel Concurrency Sanitizer; data-race detector.
- KASAN / KMSAN - Address / Memory Sanitizer for the kernel.
What's unique here: testing a kernel means testing inside a kernel. kunit literally runs in-tree; kselftest boots a userspace and pokes the kernel from outside; syzkaller runs hundreds of VMs and aggregates crashes.
**The trap**
Reproducing a syzkaller crash needs the exact same kernel config - random fuzzed reproducers are not portable. Always capture `.config` alongside the bug report.
### AI Systems - pytest + GPU-aware fixtures, eval harnesses
Month 3 - Framework Internals and Deep Dive 08 - Evaluation Systems (in tutoriaal/DEEP_DIVES/).
Standard pytest stack, with GPU detection (`pytest.mark.skipif(not torch.cuda.is_available())`) and numerical-tolerance assertions (`torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-4)`). For LLM applications: an evaluation harness (separate from unit tests) - see Deep Dive 08.
**The trap**
Asserting on exact floating-point equality across GPU runs. Different reduction orders, different tensor cores, different precisions → never bit-exact. Always use `assert_close` with documented tolerances.
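What `assert_close` actually checks is a combined tolerance: `|actual - expected| <= atol + rtol * |expected|`, elementwise. A stdlib sketch of that rule (plain floats only, no tensors; the helper name mirrors torch's but this is not its implementation):

```python
def assert_close(actual, expected, rtol=1e-3, atol=1e-4):
    """Elementwise |a - b| <= atol + rtol * |b| - the tolerance shape
    torch.testing.assert_close documents, sketched for lists of floats."""
    for a, b in zip(actual, expected):
        assert abs(a - b) <= atol + rtol * abs(b), (a, b)

# Exact float equality is fragile even on CPU: accumulation order matters.
assert sum([0.1] * 10) != 1.0

# A tolerance-based check absorbs that kind of drift.
assert_close([sum([0.1] * 10)], [1.0])
```

The point of *documented* tolerances: `rtol`/`atol` in a test are a claim about how much numerical drift your kernel is allowed, not a number you loosen until CI goes green.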
## The contrasts that teach
The strongest cross-language reading list for testing:
| Want to learn… | Read… | Then transfer to… |
|---|---|---|
| Property-based testing | hypothesis (Python) | jqwik (Java), proptest (Rust), gopter (Go) |
| Concurrency stress | jcstress (Java) + Shipilëv's writeups | loom (Rust), KCSAN (kernel), -race (Go) |
| Real-dependency integration | Testcontainers (Java) | testcontainers-python, testcontainers-go |
| Fuzzing | syzkaller (kernel) for the gold standard | cargo-fuzz, go fuzz, atheris |
| Benchmarking discipline | JMH (Java) + criterion (Rust) | pytest-benchmark, go test -bench |
| Data-race detection | -race (Go) + MIRI (Rust) | KCSAN (kernel), TSAN (C) |
## What to read first
- You write Go services → Go testing in Appendix A. Then run `-race` on your existing suite - most non-trivial Go codebases have at least one race nobody noticed.
- You write JVM services → Java Month 1 week 4 + Testcontainers' docs. Replace every "mocked database" test with a Testcontainer one and watch your bug counts converge.
- You write Rust → `cargo test` + criterion + proptest. Add loom or MIRI if you have any `unsafe` or any lock-free code.
- You write Python at scale → pytest first, then hypothesis. The hypothesis tutorial is the single highest-leverage 90 minutes of testing reading you can do.
- You hack the kernel → kunit + kselftest, then read 100 syzkaller crash reports until the failure patterns are intuitive.