Saltar a contenido

05 - Collections deep

What this session is

About ninety minutes. In From Scratch you used List, Set, and Map. This session is about choosing the right one - because Java gives you a dozen implementations, each with different performance characteristics, and reaching for the wrong one is one of the most common quiet performance bugs. By the end you'll know which collection to use for any situation and why, with the Big-O costs in your head.

The map of the collections framework

Everything descends from a few interfaces. Hold this picture:

Iterable
  └─ Collection
       ├─ List      - ordered, indexed, allows duplicates
       ├─ Set       - no duplicates
       └─ Queue/Deque - ordered for adding/removing at ends

Map (separate hierarchy - not a Collection)
  └─ key -> value pairs, unique keys

You program to the interface (List, Set, Map) and choose the implementation (ArrayList, HashSet, HashMap) based on performance needs. This is "program to interfaces" (chapter 01) in daily practice:

List<String> names = new ArrayList<>();   // declare interface, pick implementation
Map<String, Integer> counts = new HashMap<>();

If you later need a different implementation, you change one word (new ArrayList<>() to new LinkedList<>()) and nothing else, because all callers depend on List.

Lists: ArrayList vs LinkedList

Both are Lists. They perform completely differently.

ArrayList is a growable array. Elements live in a contiguous block of memory.

Operation Cost Why
get(i) / set(i) O(1) direct array index
add(item) (at end) O(1) amortized usually just writes to the next slot; occasionally resizes
add(i, item) (middle) O(n) must shift everything after i
remove(i) (middle) O(n) must shift everything after i
contains(item) O(n) linear scan

LinkedList is a doubly-linked list. Each element is a separate node with pointers to its neighbors.

Operation Cost Why
get(i) O(n) must walk from an end to index i
add/remove at ends O(1) just relink the end node
add/remove in middle (with iterator) O(1) relink neighbors
contains(item) O(n) linear scan

The practical verdict: use ArrayList by default. Always. Its O(1) random access and cache-friendly contiguous memory beat LinkedList for almost everything real code does, even insertion-heavy work, because CPU cache locality matters more than the Big-O suggests. LinkedList's only real edge is constant-time add/remove at both ends - and for that, ArrayDeque (below) is faster anyway.

In practice: reach for LinkedList almost never. The fact that beginner tutorials present them as equal alternatives is misleading. ArrayList is the answer ~95% of the time.

Sets: HashSet vs LinkedHashSet vs TreeSet

A Set holds unique elements. Three implementations, three tradeoffs:

HashSet - backed by a HashMap. O(1) add/contains/remove. No order - iteration order is unspecified and can change. Use when you just need uniqueness and fast membership tests.

Set<String> seen = new HashSet<>();
seen.add("a"); seen.add("b"); seen.add("a");
System.out.println(seen.size());          // 2 - duplicate ignored
System.out.println(seen.contains("a"));   // true, O(1)

LinkedHashSet - like HashSet but remembers insertion order. Slightly more memory (it maintains a linked list through the entries). Use when you need uniqueness and predictable iteration order.

Set<String> ordered = new LinkedHashSet<>();
ordered.add("c"); ordered.add("a"); ordered.add("b");
// iterates c, a, b - insertion order, every time

TreeSet - keeps elements sorted (by natural order or a Comparator). O(log n) add/contains/remove - slower than hash sets, but you get sorted iteration and range queries (first(), last(), headSet(), subSet()).

TreeSet<Integer> sorted = new TreeSet<>(List.of(5, 1, 3, 2, 4));
System.out.println(sorted);             // [1, 2, 3, 4, 5] - always sorted
System.out.println(sorted.first());     // 1
System.out.println(sorted.headSet(3));  // [1, 2] - everything below 3

The decision: HashSet for plain uniqueness (fastest). LinkedHashSet when iteration order must match insertion. TreeSet when you need sorted order or range queries (and can pay O(log n)).

Critical reminder from chapter 03: HashSet and LinkedHashSet depend on correct equals/hashCode. TreeSet depends on correct compareTo/Comparator. A broken contract silently breaks the set.

Maps: the workhorse family

Map is the most-used collection after List. Same three flavors as sets, for the same reasons (sets are literally implemented on maps):

HashMap - O(1) get/put/remove, no order. The default map. Use it unless you need order.

LinkedHashMap - insertion-order (or access-order) iteration. The access-order mode makes it a ready-made LRU cache:

// Access-order LinkedHashMap that evicts the least-recently-used entry past capacity.
var lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
    protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
        return size() > 100;     // keep at most 100 entries
    }
};

TreeMap - sorted by key, O(log n). Gives you firstKey(), lastKey(), floorKey(), ceilingKey(), subMap() - range queries on keys.

TreeMap<Integer, String> byScore = new TreeMap<>();
byScore.put(90, "A"); byScore.put(70, "C"); byScore.put(80, "B");
System.out.println(byScore.firstKey());        // 70
System.out.println(byScore.ceilingKey(75));    // 80 - smallest key >= 75

The decision mirrors sets: HashMap by default; LinkedHashMap for ordered iteration or LRU; TreeMap for sorted keys / range queries.

The modern Map methods you should be using

Pre-Java-8 map code is full of get-check-put patterns. Modern methods replace them:

Map<String, Integer> counts = new HashMap<>();

// Counting - the classic. Old way: get, null-check, put.
for (String word : words) {
    counts.merge(word, 1, Integer::sum);   // add 1, or start at 1 if absent
}

// Grouping into lists without the null dance:
Map<String, List<String>> byFirstLetter = new HashMap<>();
for (String name : names) {
    String key = name.substring(0, 1);
    byFirstLetter.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
}

// Default for missing keys:
int n = counts.getOrDefault("missing", 0);   // 0 instead of null

// Compute only if present:
counts.computeIfPresent("apple", (k, v) -> v * 2);

merge, computeIfAbsent, getOrDefault, putIfAbsent, computeIfPresent - learn these five. They turn five-line patterns into one line and avoid a whole class of null bugs.

Queues and Deques: ArrayDeque and PriorityQueue

ArrayDeque - a double-ended queue backed by a resizable array. Add/remove at both ends in O(1). This is your stack and your queue - faster than the legacy Stack (chapter 01's cautionary tale) and faster than LinkedList for the same job.

Deque<Integer> stack = new ArrayDeque<>();
stack.push(1); stack.push(2);
stack.pop();                 // 2 - LIFO, this is a stack

Deque<Integer> queue = new ArrayDeque<>();
queue.offer(1); queue.offer(2);
queue.poll();                // 1 - FIFO, this is a queue

Use ArrayDeque whenever you need a stack or a queue. Never use the legacy Stack class.

PriorityQueue - a heap. poll() always returns the smallest element (by natural order or Comparator), in O(log n). Use for "always process the most urgent next":

PriorityQueue<Integer> pq = new PriorityQueue<>();
pq.addAll(List.of(5, 1, 3, 2));
pq.poll();   // 1 - smallest first
pq.poll();   // 2

// Max-heap with a reversed comparator:
var maxHeap = new PriorityQueue<Integer>(Comparator.reverseOrder());

Iteration and the fail-fast trap

A bug everyone hits once: modifying a collection while iterating it.

List<String> list = new ArrayList<>(List.of("a", "b", "c"));
for (String s : list) {
    if (s.equals("b")) list.remove(s);   // ConcurrentModificationException!
}

The for-each loop uses an iterator, and most collections are fail-fast: they detect structural modification during iteration and throw ConcurrentModificationException rather than silently corrupting. (The name is misleading - it happens single-threaded too.) Three correct ways to remove during iteration:

// 1. Iterator.remove() - the explicit way
Iterator<String> it = list.iterator();
while (it.hasNext()) {
    if (it.next().equals("b")) it.remove();
}

// 2. removeIf - the modern one-liner
list.removeIf(s -> s.equals("b"));

// 3. Collect what to remove, remove after the loop

removeIf is almost always the right answer. Reach for it.

Immutable and unmodifiable collections

Three ways to get a read-only collection, with different guarantees:

// 1. List.of / Set.of / Map.of - truly immutable, can't be changed by anyone (Java 9+)
List<String> a = List.of("x", "y");        // a.add(...) throws UnsupportedOperationException

// 2. List.copyOf - immutable snapshot of an existing collection
List<String> b = List.copyOf(existing);    // independent, immutable

// 3. Collections.unmodifiableList - a read-only VIEW (the backing list can still change!)
List<String> c = Collections.unmodifiableList(backing);
// you can't change c directly, but if someone changes `backing`, c reflects it

The trap is #3: an unmodifiable view doesn't copy - it wraps. If the underlying collection changes, the view changes. For real immutability (chapter 03), use List.of or List.copyOf, which fully decouple. Return List.copyOf(...) from getters to protect your internal collections (the defensive-copy lesson from chapter 03).

Choosing: a decision table

When you need a... Reach for Because
ordered list, indexed access ArrayList O(1) get, cache-friendly, the default
stack or queue ArrayDeque O(1) both ends, beats Stack and LinkedList
unique elements, fast lookup HashSet O(1), no order needed
unique elements, insertion order LinkedHashSet predictable iteration
unique elements, sorted TreeSet O(log n), sorted + range queries
key-value, fast lookup HashMap O(1), the default map
key-value, insertion order or LRU LinkedHashMap ordered iteration / eviction hook
key-value, sorted by key TreeMap O(log n), range queries on keys
"most urgent first" PriorityQueue O(log n) heap

Print this table mentally. "Which collection?" should take you two seconds.

Try it

  1. Measure ArrayList vs LinkedList. Build both with 1,000,000 integers. Time get(500_000) a million times on each. The ArrayList will be dramatically faster (O(1) vs O(n)). Then time adding a million elements at the end of each. ArrayList still wins (cache locality). This kills the "LinkedList is for inserts" myth.

  2. Word frequency three ways. Count word frequencies in a paragraph using (a) the old get-null-check-put pattern, (b) merge, (c) a stream groupingBy + counting (you'll meet streams next chapter). Compare line counts and readability.

  3. Pick the right set. You need to dedupe a list of names but preserve the order they first appeared. Which set? Implement it. Then change the requirement to "deduped and alphabetical" - which set now?

  4. Hit the fail-fast. Reproduce the ConcurrentModificationException with a for-each remove. Fix it three ways: Iterator.remove, removeIf, and collect-then-remove. Note which is cleanest.

  5. Unmodifiable view trap. Create a backing ArrayList, wrap it with Collections.unmodifiableList. Confirm you can't add to the view. Then add to backing directly and print the view - watch it change. Now do the same with List.copyOf and confirm the copy is truly frozen.

  6. TreeMap range query. Put exam scores (key) to names (value) in a TreeMap. Use subMap, ceilingKey, and headMap to answer "who scored between 70 and 90?" and "what's the lowest passing score >= 60?".

What you might wonder

"Is LinkedList ever the right choice?" Rarely. If you need a List and frequent O(1) add/remove at both ends and you specifically need List semantics (indexed-ish access via iterator), maybe. But ArrayDeque covers the both-ends case better, and ArrayList covers everything else. In years of Java you'll reach for LinkedList a handful of times, usually then reconsider.

"HashMap initial capacity - should I set it?" If you know roughly how many entries you'll add, yes: new HashMap<>(expectedSize / 0.75 + 1) avoids rehashing as it grows (same idea as pre-sizing an ArrayList). For small maps it doesn't matter. For large ones built in a hot path, it's a real win - same lesson as the Go pre-sizing chapter if you've seen it.

"What's the load factor (0.75)?" A HashMap resizes when it's 75% full, to keep collisions low. 0.75 is the default sweet spot between memory and speed. You almost never change it. It's the second constructor arg you saw in the LRU example.

"Are these collections thread-safe?" No - ArrayList, HashMap, etc. are not safe for concurrent modification. That's a chapter 10 topic (ConcurrentHashMap, CopyOnWriteArrayList, and friends). For now: a plain HashMap shared across threads without synchronization is a bug, even if it seems to work in testing.

"Vector and Hashtable - what about those?" Legacy, from Java 1.0. They're synchronized (slow) versions of ArrayList/HashMap. Don't use them in new code. If you need thread safety, use the java.util.concurrent collections (chapter 10), not these. They survive only for backward compatibility - the same "sediment" the Java Mastery path's legacy appendix covers.

"Should I always use List.of for constants?" Yes for fixed, never-changing collections - it's immutable, compact, and clear. Just remember List.of rejects nulls and is immutable, so don't use it where you need to mutate or store nulls.

Done

  • You know the framework map: List, Set, Queue/Deque, Map.
  • You know ArrayList is the default list and why LinkedList rarely wins.
  • You can choose among HashSet/LinkedHashSet/TreeSet and the matching Map trio by their order and performance tradeoffs.
  • You know ArrayDeque for stacks/queues and PriorityQueue for "most urgent first."
  • You know the modern map methods (merge, computeIfAbsent, ...) and the fail-fast iteration trap.
  • You know unmodifiable views vs truly immutable collections.
  • You have the decision table in your head.

Next: exceptions done right - the strategy for checked vs unchecked, custom hierarchies, and clean error handling.

Next: Exceptions done right →

Comments