
Week 8 - JMH and Microbenchmarking

Conceptual Core

You will be wrong about Java performance until you measure with JMH. Hand-rolled benchmarks lie systematically, in five specific ways (a naive example follows the list):

  • Dead-code elimination - JIT removes computation whose result is unused.
  • Constant folding - JIT replaces compute(42) with the precomputed constant.
  • On-stack replacement (OSR) - long-running loops get JIT-compiled mid-flight; the average across iterations is meaningless.
  • Warmup curves - interpreter → C1 → C2 takes thousands of iterations. Anything earlier is dominated by compilation cost.
  • GC noise - allocations trigger minor GCs at unpredictable points.
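
To see how easily these bite, here is a deliberately naive timing loop; it trips the first four failure modes at once (NaiveBench and compute are illustrative stand-ins, not from the course repo):

public class NaiveBench {
    public static void main(String[] args) {
        long start = System.nanoTime();
        for (int i = 0; i < 1_000_000; i++) {
            compute(42);  // result unused -> dead-code elimination; constant arg -> folding
        }                 // a loop this long gets OSR-compiled mid-flight
        long elapsed = System.nanoTime() - start;
        // No warmup: interpreter, C1, C2, and compilation cost all land in one number.
        System.out.println((elapsed / 1_000_000.0) + " ns/op");
    }

    static long compute(long x) { return x * x; }
}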

JMH generates per-benchmark wrapper code that defeats every one of these. Hand-rolled benchmarks are not "less accurate" - they are systematically wrong in a direction that flatters your code.

The trap

Running JMH from your IDE and trusting the output. JMH needs its annotation processor to run AND it needs to fork fresh JVMs; an IDE launch may skip both. Always run via mvn/gradle jmh or java -jar bench/target/benchmarks.jar.
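
If you do launch from code, go through the Runner API so forking still happens - a minimal sketch, with a placeholder include pattern:

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchMain {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include("StreamVsForBench")  // regex over benchmark names (placeholder)
                .forks(3)                     // forks(0) runs in this JVM - the IDE trap
                .build();
        new Runner(opts).run();
    }
}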

Mechanical Detail

  • Project setup: separate Maven/Gradle module. Use the shade plugin to produce benchmarks.jar, the canonical runner.
  • Core annotations: @Benchmark, @State(Scope.Benchmark) or @State(Scope.Thread), @Setup/@TearDown, @Warmup(iterations = 5, time = 1), @Measurement(iterations = 10, time = 1), @Fork(value = 3), @BenchmarkMode({Mode.Throughput, Mode.AverageTime}), @OutputTimeUnit(TimeUnit.NANOSECONDS). See the annotated example after this list.
  • Blackhole.consume(result) defeats DCE. Or just return the value - JMH consumes it.
  • @CompilerControl(CompilerControl.Mode.DONT_INLINE) forces a real call when measuring call-site overhead.
  • Profilers: -prof gc (allocation rate), -prof stack (sampling), -prof async (flame graphs), -prof perfasm (Linux - JIT assembly with perf annotations). perfasm is the killer when a result surprises you.
  • The hard rules: always 3+ forks (a single fork sees one GC schedule and one thermal state - never trust it). An error-to-score ratio below 5% is trustworthy; above 20% it is noise. Use taskset -c 3 to pin the JVM to one core, and disable Turbo Boost for reproducibility.
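
Putting those annotations together - a minimal, self-contained example (HashBench and its payload are illustrative, not from the course repo):

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 10, time = 1)
@Fork(3)
@State(Scope.Benchmark)
public class HashBench {
    @Param({"16", "1024"}) int size;     // JMH runs the full matrix of param values
    byte[] data;

    @Setup
    public void setup() {
        data = new byte[size];
        new Random(42).nextBytes(data);  // fixed seed: same payload in every fork
    }

    @Benchmark
    public int returned() {
        return Arrays.hashCode(data);    // returning the value is enough to defeat DCE
    }

    @Benchmark
    public void consumed(Blackhole bh) {
        bh.consume(Arrays.hashCode(data)); // explicit Blackhole, same effect
    }
}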

Lab

Put the Week 3 Stream vs for-loop comparison under JMH. Skeleton:

@State(Scope.Benchmark)
public static class Inputs {
    @Param({"100", "10000", "1000000"}) int n;
    List<Event> events;
    @Setup public void setup() { events = generate(n); } // generate(n): the Week 3 event generator
}
@Benchmark public Map<Hour, Long> streamWay(Inputs in) { /* ... */ }
@Benchmark public Map<Hour, Long> forWay(Inputs in)    { /* ... */ }
Run with fork 3, warmup 5×1s, measurement 10×1s, and -prof gc. There IS a crossover - find it and explain it.
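
One possible shape for the two bodies - assuming the Week 3 Event type exposes an hour() accessor, which is an assumption here, not a given:

// Assumes: import java.util.*; import java.util.stream.Collectors;
@Benchmark
public Map<Hour, Long> streamWay(Inputs in) {
    return in.events.stream()
            .collect(Collectors.groupingBy(Event::hour, Collectors.counting()));
}

@Benchmark
public Map<Hour, Long> forWay(Inputs in) {
    Map<Hour, Long> counts = new HashMap<>();
    for (Event e : in.events) {
        counts.merge(e.hour(), 1L, Long::sum);  // merge avoids the get/put dance
    }
    return counts;
}

The -prof gc output is where the explanation lives: compare per-op allocation rates across the three values of n.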

Idiomatic Drill

Read Aleksey Shipilëv's "JMH - Like a Boss" (slides + talk). Audit one of your own benchmarks against the warmup, Blackhole, and "don't return raw Object" checklist.

Production Hardening Slice

Add a bench/ module to hardening/ with one canonical @Benchmark example and a README.md showing how to run it. CI: a nightly job that publishes JSON results as an artifact. Do not gate on absolute numbers - microbenchmark noise in CI is too high. Track trends instead: a 15% regression sustained over a week is real; a 30% one-run spike is noise.
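
One way to produce that JSON artifact; the CLI equivalent is java -jar benchmarks.jar -rf json -rff bench-results.json (class and file names here are placeholders):

import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

// Nightly CI entry point: run every benchmark in the jar, emit JSON for the trend job.
public class NightlyBench {
    public static void main(String[] args) throws RunnerException {
        Options opts = new OptionsBuilder()
                .include(".*")                        // all @Benchmark methods on the classpath
                .resultFormat(ResultFormatType.JSON)
                .result("bench-results.json")         // the artifact CI publishes
                .build();
        new Runner(opts).run();
    }
}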
