
Testing

Why it matters

Five families of tests recur across every path: unit, integration, property-based, fuzz, benchmark. Each language has its own tooling for each, but the strategies transfer. The strongest software-engineering muscles you can build are:

  1. Knowing which family fits which kind of bug.
  2. Knowing each ecosystem's canonical answer well enough to read their tests.

This page is the cross-language reading list for both.


The five families

| Family | Catches | Run frequency | Canonical tools by path |
| --- | --- | --- | --- |
| Unit | logic bugs in pure functions, narrow modules | every commit | go test, JUnit 5, cargo test, pytest, kunit |
| Integration | wiring bugs, real-dependency bugs | every PR | Testcontainers, docker-compose, pytest fixtures |
| Property-based | edge cases you didn't think to write | every PR | jqwik (Java), proptest/quickcheck (Rust), hypothesis (Python), gopter (Go) |
| Fuzz | crashes, security bugs, parser bugs | continuous (oss-fuzz, CI nightly) | go test -fuzz, cargo-fuzz / libFuzzer, atheris (Python), syzkaller (kernel) |
| Benchmark | perf regressions | per release / nightly | JMH (Java), criterion (Rust), go test -bench, pytest-benchmark, perf+ftrace |

Plus a sixth, language-specific family: concurrency stress tests - jcstress (Java), loom (Rust), KCSAN (kernel), -race (Go). See memory models.


The lens, per path

Go - go test, -race, fuzz, examples

Month 1 - Runtime Foundations for the basics, Appendix A for the discipline.

Built into the toolchain. go test ./... runs everything. -race is the data-race detector. -fuzz (1.18+) is the built-in coverage-guided fuzzer. Example* functions double as documentation and tests - go doc shows them and go test runs them.

What's unique here: no test framework. The testing package gives you t.Errorf and parallelism (t.Parallel); subtests via t.Run("subname", ...). That's the whole API. Third-party libraries (testify, gocheck) exist, but the standard-library-only style remains the idiomatic default.

The trap

Forgetting t.Parallel() makes tests serialize unnecessarily on CI. Before Go 1.22 (which made loop variables per-iteration), forgetting to re-declare the loop variable (tc := tc) in table-driven subtests made every parallel subtest run with the final loop value instead of its own input.

Java - JUnit 5, AssertJ, Testcontainers, Mockito, JMH, jcstress, jqwik

Month 1 - Language & Toolchain, week 4. The deepest test ecosystem of any path.

  • JUnit 5 (Jupiter) - @Test, @ParameterizedTest, @Nested, @ExtendWith. Modern; supersedes JUnit 4 (still common in legacy code).
  • AssertJ - fluent assertions. assertThat(x).isEqualTo(y).hasSize(3).containsExactly(...). Way better failure messages than assertEquals.
  • Mockito - collaborator mocking. Mockito 5+, prefer constructor injection, never @InjectMocks on final fields.
  • Testcontainers - real dependencies in tests via Docker. Postgres, Kafka, Redis, anything with an image. Rebuilds your mental model: most "integration tests" should be Testcontainer tests.
  • jqwik - property-based testing. @Property-annotated methods receive generated inputs; jqwik shrinks failing cases automatically.
  • JMH - the Java microbenchmark harness. The only correct way to measure JVM perf. See observability.
  • jcstress - concurrency stress harness (Shipilëv's). For testing memory-model edge cases in your lock-free code.

What's unique here: every test family has a mature, opinionated, JDK-author-blessed tool. The cost is the learning curve - you genuinely need a week to get fluent.

The trap

Mocking what you don't own (the "don't mock types you don't own" rule from Mockito's own wiki). Don't mock Connection/ResultSet/HttpClient/etc - those are external interfaces. Either use a real implementation in a Testcontainer or wrap them in your own interface and mock that.

Rust - cargo test, criterion, proptest, loom, Miri, cargo-fuzz

Month 1 - Foundations, week on testing.

#[test] inside any module. cargo test runs them. Doctests in /// comments are real tests - they compile and run.

  • criterion - benchmark crate; statistical analysis of variance, regression detection, HTML reports.
  • proptest / quickcheck - property-based testing.
  • loom - model-checks concurrent code by exhaustively exploring thread interleavings. The Rust equivalent of jcstress.
  • Miri - an interpreter for Rust's mid-level IR (MIR); detects undefined behavior in unsafe code. cargo +nightly miri test.
  • cargo-fuzz - libFuzzer integration.

What's unique here: the correctness tooling - Miri + loom + the borrow checker - gives Rust a stronger "your concurrent code is actually correct" story than any other path.

The trap

cargo test runs tests in parallel by default. Shared global state (env vars, working directory, file paths) creates flaky tests. Use serial_test or properly isolate state.

Python - pytest, hypothesis, tox/nox, atheris

Month 1 - Foundations, week on testing.

  • pytest - the de facto test runner. Fixtures, parametrize, plugins (pytest-asyncio, pytest-cov, pytest-mock, pytest-benchmark, pytest-xdist).
  • hypothesis - property-based testing; arguably the best in any language. @given(integers(), text()) generates inputs and shrinks failures.
  • tox / nox - matrix runners (test across Python versions, dependency versions).
  • atheris - Google's coverage-guided Python fuzzer.
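What hypothesis automates is a generate-and-check loop over a property that must hold for all inputs, not hand-picked ones. A minimal hand-rolled sketch of that loop, using an illustrative run-length codec (rle_encode/rle_decode are made up for this example, not from any library):

```python
import random

def rle_encode(s: str) -> list[tuple[str, int]]:
    """Run-length encode a string into (char, count) pairs."""
    runs: list[tuple[str, int]] = []
    for ch in s:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * n for ch, n in runs)

def test_roundtrip_property(trials: int = 500) -> None:
    # The property: decode(encode(s)) == s for ANY s.
    # Generation is random but seeded, so a failure is reproducible.
    rng = random.Random(0)
    for _ in range(trials):
        s = "".join(rng.choice("ab ") for _ in range(rng.randrange(0, 20)))
        assert rle_decode(rle_encode(s)) == s, f"round-trip failed for {s!r}"

test_roundtrip_property()
```

With hypothesis the whole harness collapses to @given(text()) over the single assert, and a failing input is automatically shrunk to a minimal counterexample - the part a hand-rolled loop doesn't give you.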

What's unique here: hypothesis's stateful testing - model your system as a state machine, let hypothesis explore the state space. Comparable to model-checking but more accessible.
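The idea behind stateful testing can be sketched without hypothesis: drive the system under test and a trivially-correct model through the same random operations and assert they agree after every step. BoundedStack here is a toy system invented for the example; hypothesis's RuleBasedStateMachine does the same thing with declared rules plus shrinking:

```python
import random

class BoundedStack:
    """Toy system under test: a stack that refuses to grow past a capacity."""
    def __init__(self, cap: int):
        self.cap = cap
        self._items: list[int] = []

    def push(self, x: int) -> bool:
        if len(self._items) >= self.cap:
            return False
        self._items.append(x)
        return True

    def pop(self):
        return self._items.pop() if self._items else None

def test_against_model(steps: int = 1000) -> None:
    rng = random.Random(1)
    real = BoundedStack(cap=4)
    model: list[int] = []          # the model: a plain list with the same rules
    for _ in range(steps):
        if rng.random() < 0.6:     # random walk over the operation space
            x = rng.randrange(100)
            ok = real.push(x)
            assert ok == (len(model) < 4)
            if ok:
                model.append(x)
        else:
            assert real.pop() == (model.pop() if model else None)

test_against_model()
```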

The trap

Wrong fixture scope, in either direction: function-scoped fixtures that should be session-scoped make suites slow; session-scoped fixtures that should be function-scoped make them flaky via shared state. pytest --setup-show reveals the actual setup graph.

Linux kernel - kunit, kselftest, syzkaller, ktap, KCSAN

Month 1 - Kernel Foundations, test week.

  • kunit - in-kernel unit tests, run during a kernel build or in a tiny QEMU instance via tools/testing/kunit/kunit.py (kunit_tool).
  • kselftest - userspace-driven kernel feature tests; lives in tools/testing/selftests/.
  • syzkaller - Google's kernel fuzzer; the source of a huge fraction of recent kernel CVEs.
  • KCSAN - Kernel Concurrency Sanitizer; data-race detector.
  • KASAN / KMSAN - Address / Memory Sanitizer for the kernel.
  • KTAP - the Kernel Test Anything Protocol; the common result format kunit and kselftest emit.

What's unique here: testing a kernel means testing from inside a kernel. kunit runs inside the kernel itself; kselftest boots a userspace and pokes the kernel from outside; syzkaller runs hundreds of VMs and aggregates crashes.

The trap

Reproducing a syzkaller crash needs the exact same kernel config - random fuzzed reproducers are not portable. Always capture .config alongside the bug report.

AI Systems - pytest + GPU-aware fixtures, eval harnesses

Month 3 - Framework Internals and Deep Dive 08 - Evaluation Systems (in tutoriaal/DEEP_DIVES/).

Standard pytest stack, with GPU detection (pytest.mark.skipif(not torch.cuda.is_available(), reason="needs a GPU")) and numerical-tolerance assertions (torch.testing.assert_close(actual, expected, rtol=1e-3, atol=1e-4)). For LLM applications: an evaluation harness (separate from unit tests) - see Deep Dive 08.
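The rtol/atol pair encodes the acceptance rule |actual − expected| ≤ atol + rtol·|expected|, the same per-element check numpy.isclose and torch.testing.assert_close apply. A plain-Python sketch of why both knobs exist (close is an illustrative helper; no torch required):

```python
def close(actual: float, expected: float,
          rtol: float = 1e-3, atol: float = 1e-4) -> bool:
    # Tolerated error grows with the magnitude of `expected` (rtol term);
    # atol is the floor that keeps the check meaningful near zero.
    return abs(actual - expected) <= atol + rtol * abs(expected)

# A 0.05% error on a large activation passes...
assert close(1000.5, 1000.0)   # |err| = 0.5 <= 1e-4 + 1e-3 * 1000
# ...but the same absolute error at zero is caught: only atol applies there.
assert not close(0.5, 0.0)     # |err| = 0.5 >  1e-4
```

This is why tolerances belong in the test, documented: they are a claim about how much numerical drift your pipeline legitimately produces.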

The trap

Asserting on exact floating-point equality across GPU runs. Different reduction orders, different tensor cores, different precisions → never bit-exact. Always use assert_close with documented tolerances.
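Why bit-exactness is unachievable: floating-point addition is not associative, so any change in reduction order changes the low bits. A two-line CPU demonstration of the same effect:

```python
# Same three numbers, two reduction orders - different bits.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
assert a != b                  # exact equality fails
assert abs(a - b) < 1e-9       # ...yet they agree within any sane tolerance
```

A GPU kernel reorders sums far more aggressively than this (thread-block tiling, warp shuffles, tensor-core accumulation), which is exactly what makes run-to-run bit-equality a false expectation.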


The contrasts that teach

The strongest cross-language reading list for testing:

| Want to learn… | Read… | Then transfer to… |
| --- | --- | --- |
| Property-based testing | hypothesis (Python) | jqwik (Java), proptest (Rust), gopter (Go) |
| Concurrency stress | jcstress (Java) + Shipilëv's writeups | loom (Rust), KCSAN (kernel), -race (Go) |
| Real-dependency integration | Testcontainers (Java) | testcontainers-python, testcontainers-go |
| Fuzzing | syzkaller (kernel) for the gold standard | cargo-fuzz, go test -fuzz, atheris |
| Benchmarking discipline | JMH (Java) + criterion (Rust) | pytest-benchmark, go test -bench |
| Data-race detection | -race (Go) + Miri (Rust) | KCSAN (kernel), TSAN (C) |

What to read first

  • You write Go services → Go testing in Appendix A. Then run -race on your existing suite - most non-trivial Go codebases have at least one race nobody noticed.
  • You write JVM services → Java Month 1 week 4 + Testcontainers' docs. Replace every "mocked database" test with a Testcontainer one and watch a whole class of wiring bugs disappear.
  • You write Rust → cargo test + criterion + proptest. Add loom or Miri if you have any unsafe or any lock-free code.
  • You write Python at scale → pytest first, then hypothesis. The hypothesis tutorial is the single highest-leverage 90 minutes of testing reading you can do.
  • You hack the kernel → kunit + kselftest, then read 100 syzkaller crash reports until the failure patterns are intuitive.