12 - Performance-aware coding

What this session is

About an hour. Not premature optimization - awareness. The goal isn't to make every line fast; it's to recognize the patterns that quietly cost performance, avoid the obvious mistakes, and know that real optimization comes from measuring (chapter 13), not guessing. By the end you'll write code that doesn't have gratuitous performance problems, and you'll know the difference between "this matters" and "this is fine."

The first rule: measure, don't guess

Start here because it governs everything else:

"Premature optimization is the root of all evil." - Donald Knuth (the full quote: "we should forget about small efficiencies, say about 97% of the time").

Most code doesn't need to be fast. It needs to be correct and clear. The 3% that's genuinely performance-critical, you find by measuring (profiling, chapter 13) - not by guessing, because human intuition about what's slow is famously wrong. The cost of optimizing the wrong thing is doubled: you spend effort, and you make the code more complex and bug-prone for no benefit.

So this chapter is not "optimize everything." It's "know the patterns that have real cost, avoid the free mistakes, and measure before optimizing anything specific." Awareness, not obsession.

Autoboxing: the invisible allocation

The most common quiet performance cost. Java has primitives (int, long, double) and their wrapper objects (Integer, Long, Double). Autoboxing silently converts between them - and each box is a heap allocation (chapter 08).

// This loop allocates roughly a million Long objects. Silently.
Long sum = 0L;                      // wrapper type - the bug
for (long i = 0; i < 1_000_000; i++) {
    sum += i;                       // unbox sum, add, BOX the result - new Long every iteration
}

sum is a Long (wrapper). Each sum += i unboxes sum to a primitive, adds, and boxes the result back into a Long object - close to a million allocations and a million pieces of garbage (only small values, -128 to 127, come from Long's built-in cache). Fix: use the primitive.

long sum = 0L;                      // primitive - zero allocations
for (long i = 0; i < 1_000_000; i++) {
    sum += i;                       // pure primitive arithmetic
}

This single change can make a hot loop many times faster. The lesson: prefer primitives over wrappers in performance-sensitive code, especially in loops and large collections.

Where boxing hides:

  • Collections. List<Integer>, Map<Integer, ...> - generics can't hold primitives (chapter 04's erasure), so every element is boxed. A List<Integer> of a million numbers is a million Integer objects plus the list. For primitive-heavy data, use primitive arrays (int[]) or specialized libraries (Eclipse Collections, fastutil) instead of List<Integer>.
  • Streams. stream().map(i -> i + 1) on a Stream<Integer> boxes constantly. Use IntStream/mapToInt (chapter 07) for primitive streams.
  • Wrapper-typed accumulators, as above.

You don't eliminate all boxing - List<Integer> is fine for small or non-hot collections. You avoid it where volume × hotness makes it matter.
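To make the streams and collections points concrete, here is a minimal sketch of the same computation written three ways: fully boxed, fully primitive, and boxed input unboxed once with mapToInt. The class and method names are illustrative, not from any library, and the sketch shows where boxing happens rather than measuring it.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class BoxingInStreams {

    // Boxed pipeline: every element is an Integer, and map() produces
    // a boxed result for each one.
    static int sumIncrementedBoxed(List<Integer> values) {
        return values.stream()
                .map(i -> i + 1)              // unbox, add, box again
                .reduce(0, Integer::sum);     // more unboxing and boxing
    }

    // Primitive pipeline: int[] in, IntStream throughout, no wrapper objects.
    static int sumIncrementedPrimitive(int[] values) {
        return IntStream.of(values)
                .map(i -> i + 1)              // operates on plain ints
                .sum();
    }

    // Bridge: the input is already a List<Integer>, so unbox once up front
    // and stay primitive for the rest of the pipeline.
    static int sumIncrementedBridged(List<Integer> values) {
        return values.stream()
                .mapToInt(Integer::intValue)
                .map(i -> i + 1)
                .sum();
    }

    public static void main(String[] args) {
        int[] primitive = IntStream.rangeClosed(1, 1_000).toArray();
        List<Integer> boxed = IntStream.rangeClosed(1, 1_000)
                .boxed()
                .collect(Collectors.toList());

        // All three compute the same value; only the allocation behaviour differs.
        System.out.println(sumIncrementedBoxed(boxed));
        System.out.println(sumIncrementedPrimitive(primitive));
        System.out.println(sumIncrementedBridged(boxed));
    }
}
```

Whether the boxed version is a problem still depends on volume and hotness; for a thousand elements it almost certainly is not.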

String handling: the concatenation trap

The classic. String concatenation in a loop with + is O(n²) because strings are immutable (chapter 03) - each + creates a new string copying all previous characters.

// O(n^2) - each += copies the entire string so far. Catastrophic for large n.
String result = "";
for (String word : words) {
    result += word + ", ";          // new String allocated and fully copied every iteration
}

For words of size n, this allocates and copies progressively larger strings - total work proportional to n². For 100,000 words it can take seconds. Fix with StringBuilder, which has a growable internal buffer (like ArrayList):

// O(n) - one growable buffer, appended to.
StringBuilder sb = new StringBuilder();
for (String word : words) {
    sb.append(word).append(", ");
}
String result = sb.toString();      // build the final string once

Or, more idiomatically for joining, String.join or a stream collector:

String result = String.join(", ", words);                          // cleanest for a collection
String result = words.stream().collect(Collectors.joining(", "));  // when you're already streaming

Important nuance: a single + or concatenation in straight-line code is fine - the compiler turns a + b + c into one efficient concatenation for you (a StringBuilder chain on older compilers, an invokedynamic-based concatenation since Java 9). The problem is only concatenation in a loop, where each iteration still builds a brand-new string from the previous one. Don't reflexively replace every + with StringBuilder; do replace loop-accumulated concatenation.
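One detail worth checking when you switch between these forms: the loop versions above append ", " after every word, including the last, while String.join only puts the separator between elements. The sketch below (hypothetical class and method names) shows both behaviours and one simple way to get separator-between-elements output from a StringBuilder loop.

```java
import java.util.List;

class JoinDemo {

    // Mirrors the loop above: the separator follows every word,
    // so the result ends with a trailing ", ".
    static String joinWithTrailingSeparator(List<String> words) {
        StringBuilder sb = new StringBuilder();
        for (String word : words) {
            sb.append(word).append(", ");
        }
        return sb.toString();
    }

    // Separator only between elements: start with an empty separator
    // and switch to ", " after the first word.
    static String joinWithBuilder(List<String> words) {
        StringBuilder sb = new StringBuilder();
        String separator = "";
        for (String word : words) {
            sb.append(separator).append(word);
            separator = ", ";
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "c");
        System.out.println("[" + joinWithTrailingSeparator(words) + "]"); // [a, b, c, ]
        System.out.println("[" + joinWithBuilder(words) + "]");           // [a, b, c]
        System.out.println("[" + String.join(", ", words) + "]");         // [a, b, c]
    }
}
```

If you only need plain joining, String.join remains the shortest option; the hand-written loop earns its place when each element needs extra formatting.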

Pre-size your collections

From chapter 05, applied as performance: if you know roughly how many elements you'll add, tell the collection up front so it doesn't repeatedly resize (each resize copies the whole backing array).

// Resizes ~log(n) times as it grows, copying each time.
List<String> list = new ArrayList<>();

// One allocation, no resizing, if you know the size.
List<String> list = new ArrayList<>(expectedSize);
Map<String, Integer> map = new HashMap<>(expectedSize * 4 / 3 + 1);  // account for load factor

For an ArrayList that grows to a million elements, pre-sizing avoids roughly 30 resize-and-copy cycles (OpenJDK's ArrayList grows its backing array by about half each time). In a hot path, measurable; in cold code, irrelevant (but harmless and clear).
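How many resize-and-copy cycles you avoid depends on the growth policy. The sketch below is a back-of-the-envelope simulation, not a look inside the real class. It assumes a default capacity of 10 and compares growth by half (what OpenJDK's ArrayList does, as an implementation detail rather than a specified guarantee) with plain doubling. All names are illustrative.

```java
class GrowthSteps {

    // Number of reallocations needed to reach targetSize when the
    // capacity grows by 50% each time (capacity + capacity / 2).
    static int resizesWhenGrowingByHalf(int initialCapacity, int targetSize) {
        if (initialCapacity < 2) {
            throw new IllegalArgumentException("initialCapacity must be at least 2");
        }
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < targetSize) {
            capacity += capacity >> 1;   // grow by half
            resizes++;
        }
        return resizes;
    }

    // Same count for a policy that doubles the capacity each time.
    static int resizesWhenDoubling(int initialCapacity, int targetSize) {
        if (initialCapacity < 1) {
            throw new IllegalArgumentException("initialCapacity must be at least 1");
        }
        int capacity = initialCapacity;
        int resizes = 0;
        while (capacity < targetSize) {
            capacity *= 2;
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println(resizesWhenGrowingByHalf(10, 1_000_000)); // 29
        System.out.println(resizesWhenDoubling(10, 1_000_000));      // 17
    }
}
```

Either way the count is logarithmic in the final size. Each resize copies everything stored so far, which is the work pre-sizing removes.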

Choose the right data structure

The biggest performance wins usually come not from micro-optimizing code but from picking the right collection (chapter 05) and algorithm. A contains check:

// O(n) per check - scanning a list. In a loop, O(n*m). Death by a thousand cuts.
List<String> seen = new ArrayList<>();
if (seen.contains(item)) { ... }          // linear scan every time

// O(1) per check - a hash set.
Set<String> seen = new HashSet<>();
if (seen.contains(item)) { ... }          // constant time

If you contains-check a collection repeatedly, a HashSet instead of an ArrayList turns O(n) lookups into O(1). This kind of structural choice dwarfs any micro-optimization. The chapter 05 Big-O table is a performance tool: reaching for the wrong collection is the most common real performance bug, far more than "should I use ++i or i++" (which doesn't matter at all).
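As a small worked case of the "seen" pattern, here is a sketch of duplicate detection with a HashSet. It relies on Set.add returning false when the element is already present, so each item costs one hash operation instead of a list scan. The class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateFinder {

    // Returns every item that occurs more than once, in order of
    // first repeated occurrence.
    static Set<String> findDuplicates(List<String> items) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String item : items) {
            // add() returns false if the set already contained the item,
            // so membership test and insertion are a single O(1) step.
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "a", "c", "b", "a");
        System.out.println(findDuplicates(items)); // [a, b]
    }
}
```

Written with an ArrayList for seen, the same loop would scan the list on every iteration, which is the O(n*m) shape the snippet above warns about.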

Allocation awareness (without allocation paranoia)

From chapter 08: short-lived objects are cheap (the generational GC is built for them), so don't contort code to avoid every allocation. But in genuinely hot paths, reducing allocation reduces GC pressure. Patterns:

  • Reuse buffers in hot loops instead of allocating per iteration (a byte[] you reuse, or an object pool in the style of Go's sync.Pool - Java has no built-in equivalent, so pooling is manual or library-provided).
  • Avoid creating objects you immediately discard in tight loops (a new Comparator or boxed value per iteration).
  • Use primitive streams and arrays for numeric bulk data (the boxing point above).

But measure first. The advice "reduce allocation" applies to the 3% that profiling flags, not everywhere. Allocating a few objects per request in a web handler is completely fine; allocating a million in a tight numeric kernel is the thing to fix.
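For the buffer-reuse pattern, a minimal sketch might look like the following. It reads a stream in chunks through one byte[] allocated before the loop rather than a fresh array per read. The class name, the 8192-byte buffer size, and the toy "sum the bytes" workload are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferReuse {

    // Sums all bytes (as unsigned values) from the stream, reading
    // through a single buffer that is allocated once and reused.
    static long sumBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];      // one allocation for the whole loop
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            // Only the first `read` bytes are valid on this pass.
            for (int i = 0; i < read; i++) {
                total += buffer[i] & 0xFF;   // treat the byte as unsigned
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, (byte) 255};
        System.out.println(sumBytes(new ByteArrayInputStream(data))); // 261
    }
}
```

The point is where the allocation sits, not the buffer size; whether this matters in your code is still a question for the profiler.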

Lazy initialization and computation

Don't compute what you might not need. Two patterns:

// Compute on first use, then cache. (Be careful with threads - chapter 10.)
private List<String> cached;
public List<String> getExpensiveList() {
    if (cached == null) {
        cached = computeExpensiveList();   // only runs once, on first call
    }
    return cached;
}

And short-circuit evaluation - put the cheap check first so the expensive one is skipped when possible:

// && short-circuits: if isCached(key) is false, expensiveValidation(key) never runs.
if (isCached(key) && expensiveValidation(key)) { ... }
//   ^ cheap, first        ^ expensive, only if needed

Order && conditions cheap-to-expensive and likely-to-fail-first (for ||, likely-to-succeed-first); the operators skip the remaining operands as soon as the result is decided.
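The effect of operand order is easy to observe by counting calls. In this sketch, cheapCheck and expensiveCheck are hypothetical stand-ins (the "expensive" one just increments a counter), and the static counter is for demonstration only, not a pattern to copy into real code.

```java
class ShortCircuitDemo {

    static int expensiveCalls = 0;

    // Stand-in for a cheap test that usually fails: true for 1 in 10 inputs.
    static boolean cheapCheck(int n) {
        return n % 10 == 0;
    }

    // Stand-in for costly work; here it only records that it was called.
    static boolean expensiveCheck(int n) {
        expensiveCalls++;
        return n >= 50;
    }

    // Evaluates both checks for 0..limit-1 in the chosen order and returns
    // how many inputs passed. expensiveCalls holds the call count afterwards.
    static int countMatches(boolean cheapFirst, int limit) {
        expensiveCalls = 0;
        int matches = 0;
        for (int n = 0; n < limit; n++) {
            boolean ok = cheapFirst
                    ? cheapCheck(n) && expensiveCheck(n)
                    : expensiveCheck(n) && cheapCheck(n);
            if (ok) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        int a = countMatches(true, 100);
        System.out.println(a + " matches, " + expensiveCalls + " expensive calls");  // 5 matches, 10 expensive calls
        int b = countMatches(false, 100);
        System.out.println(b + " matches, " + expensiveCalls + " expensive calls");  // 5 matches, 100 expensive calls
    }
}
```

Both orders give the same answer; only the number of expensive calls changes.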

Things that DON'T matter (stop worrying about them)

Awareness includes knowing what's a non-issue, so you don't waste effort or sacrifice clarity for imaginary gains:

  • i++ vs ++i in a loop - identical performance. Use whichever reads better.
  • final on local variables - no runtime performance effect (it's a readability/safety choice).
  • One-line method extraction - the JIT inlines small methods; don't inline by hand for "speed."
  • System.out.println micro-costs - irrelevant unless you're printing in a hot loop (where the cost is the I/O, not the call).
  • Manual loop unrolling, bit-twiddling tricks - the JIT does these; hand-doing them usually just obscures the code and sometimes defeats the JIT's own optimizations.
  • Caching list.size() in a loop variable - the JIT handles it; for (int i = 0; i < list.size(); i++) is fine.

The JIT compiler (Java Mastery covers it deeply) is very good at low-level optimization. Your job is the high-level choices it can't make for you: the right data structure, the right algorithm, avoiding gratuitous allocation and O(n²) patterns. Leave the instruction-level stuff to the JIT.

The performance mindset

Put it together into a discipline:

  1. Write clear, correct code first. Don't optimize while writing - it's premature and usually wrong.
  2. Avoid the free mistakes as you go: don't concatenate strings in loops, don't box in hot loops, don't use O(n) contains repeatedly, do pre-size known collections, do pick the right data structure. These cost nothing in clarity and avoid the common real problems.
  3. If it's too slow, measure (chapter 13). Profile to find the actual hot spot - it's almost never where you'd guess.
  4. Optimize the measured hot spot, and only that. Re-measure to confirm it helped.
  5. Stop when it's fast enough. "Fast enough" is a real, definable target (a latency budget, a throughput goal). Optimizing past it is wasted effort.

This is the difference between a junior who either ignores performance entirely or optimizes everything blindly, and an engineer who writes clean code, sidesteps the known traps, and surgically fixes what measurement proves slow.

Try it

  1. Measure boxing. Sum 10,000,000 longs into a Long (wrapper) accumulator and into a long (primitive). Time both (System.nanoTime). The primitive version is dramatically faster - you're watching roughly ten million allocations cost real time. Confirm with -verbose:gc that the wrapper version triggers far more GC.

  2. The O(n^2) string trap. Concatenate 100,000 short strings with += in a loop, then with StringBuilder, then with String.join. Time all three. The += version takes seconds; the others milliseconds. Feel the quadratic blowup.

  3. Wrong collection. Check membership 100,000 times against a 100,000-element ArrayList (O(n) each) vs a HashSet (O(1) each). Time both. The list version is thousands of times slower. This is the most common real performance bug in beginner code.

  4. Pre-sizing. Build a 10,000,000-element ArrayList with and without an initial capacity. Time both and count allocations. Pre-sizing avoids the resize-copy cycles.

  5. Prove a non-issue. Time i++ vs ++i in a billion-iteration loop. Identical. Time final int x vs int x. Identical. Convince yourself these don't matter so you stop thinking about them.

  6. Short-circuit ordering. Write if (cheapCheck() && expensiveCheck()) where cheapCheck usually returns false. Add prints to confirm expensiveCheck rarely runs. Swap the order and watch expensiveCheck run every time. Same logic, different cost.
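For the timing in these exercises, a crude harness along the following lines is usually enough to see the large differences. It is a sketch under stated assumptions, not a proper benchmark: it runs a few untimed warm-up passes so the JIT has a chance to compile the code, keeps the results alive so the work cannot be discarded, and reports the best of several timed runs. The names CrudeTimer and bestNanos are made up for this example.

```java
import java.util.function.LongSupplier;

class CrudeTimer {

    // Runs `task` untimed `warmups` times, then returns the fastest of
    // `runs` timed executions, in nanoseconds.
    static long bestNanos(LongSupplier task, int warmups, int runs) {
        long sink = 0;                       // accumulate results so the work stays live
        for (int i = 0; i < warmups; i++) {
            sink += task.getAsLong();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink += task.getAsLong();
            best = Math.min(best, System.nanoTime() - start);
        }
        if (sink == Long.MIN_VALUE) {        // practically never true; only uses `sink`
            System.out.println(sink);
        }
        return best;
    }

    // Exercise 1, wrapper accumulator: boxes on every iteration.
    static long sumBoxed(int n) {
        Long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Exercise 1, primitive accumulator.
    static long sumPrimitive(int n) {
        long sum = 0L;
        for (long i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000_000;
        long boxed = bestNanos(() -> sumBoxed(n), 3, 5);
        long primitive = bestNanos(() -> sumPrimitive(n), 3, 5);
        System.out.println("boxed:     " + boxed / 1_000_000.0 + " ms");
        System.out.println("primitive: " + primitive / 1_000_000.0 + " ms");
    }
}
```

Treat the numbers as rough indicators only; for close comparisons the caveats about naive timing still apply.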

What you might wonder

"How do I know if something is in the hot 3%?" You profile (chapter 13). You cannot reliably know by reading. Studies repeatedly show developers guess the hot spot wrong most of the time - the bottleneck is in a place you didn't suspect. That's the whole reason "measure, don't guess" is rule #1.

"Isn't avoiding boxing/string-traps itself premature optimization?" No - these are not optimizations, they're avoiding pessimizations. Using long instead of Long, StringBuilder in a loop, or HashSet for repeated lookups costs nothing in clarity (often it's clearer) and avoids a known-bad pattern. Premature optimization is sacrificing clarity for speculative gain. Avoiding O(n²) is just not writing O(n²). Different thing.

"What about the JIT - doesn't it fix everything?" The JIT (just-in-time compiler) optimizes low-level code brilliantly: inlining, dead-code elimination, loop optimizations, escape analysis (it can even avoid heap-allocating objects that don't escape). It does not fix your algorithm or data-structure choices - it can't turn an O(n²) string concatenation into O(n), or a list contains into a hash lookup. Those high-level choices are yours. Trust the JIT for instruction-level work; own the structural decisions.

"Should I use StringBuilder everywhere then?" No. For loop-accumulated concatenation, yes. For straight-line a + b + c, no - the compiler already emits an efficient concatenation for you (a StringBuilder chain before Java 9, an invokedynamic-based one since), and writing it manually just adds noise. The rule is specifically about concatenation inside loops.

"Are micro-benchmarks like my nanoTime timing reliable?" Roughly, for big differences (the boxing and string examples are so dramatic that crude timing shows them). But for subtle comparisons, naive timing lies - JIT warmup, dead-code elimination, and GC noise distort results (this is exactly the JMH lesson). For anything close, use a real benchmark harness. Chapter 13 covers measuring properly.

"What's the single highest-impact performance habit?" Picking the right data structure and algorithm (chapter 05). A wrong collection or an accidental O(n²) loop dwarfs every micro-optimization combined. Get the Big-O right and most code is fast enough without any further effort.

Done

  • You know the prime directive: measure, don't guess; most code doesn't need optimizing.
  • You can spot and fix autoboxing in hot loops, collections, and streams.
  • You avoid the O(n²) string-concatenation-in-a-loop trap (and know single + is fine).
  • You pre-size known collections and pick the right data structure for the access pattern.
  • You're allocation-aware without being allocation-paranoid.
  • You know what doesn't matter (i++ vs ++i, etc.) so you don't waste effort.
  • You have the five-step performance mindset: clear first, avoid free mistakes, measure, fix the hot spot, stop at good enough.

Next: profiling basics - how to actually measure, so the "find the hot spot" step is something you can do.
