05 - Collections deep¶
What this session is¶
About ninety minutes. In From Scratch you used List, Set, and Map. This session is about choosing the right one - because Java gives you a dozen implementations, each with different performance characteristics, and reaching for the wrong one is one of the most common quiet performance bugs. By the end you'll know which collection to use for any situation and why, with the Big-O costs in your head.
The map of the collections framework¶
Everything descends from a few interfaces. Hold this picture:
Iterable
└─ Collection
├─ List - ordered, indexed, allows duplicates
├─ Set - no duplicates
└─ Queue/Deque - ordered for adding/removing at ends
Map (separate hierarchy - not a Collection)
└─ key -> value pairs, unique keys
You program to the interface (List, Set, Map) and choose the implementation (ArrayList, HashSet, HashMap) based on performance needs. This is "program to interfaces" (chapter 01) in daily practice:
List<String> names = new ArrayList<>(); // declare interface, pick implementation
Map<String, Integer> counts = new HashMap<>();
If you later need a different implementation, you change one word (new ArrayList<>() to new LinkedList<>()) and nothing else, because all callers depend on List.
Lists: ArrayList vs LinkedList¶
Both are Lists. They perform completely differently.
ArrayList is a growable array. Elements live in a contiguous block of memory.
| Operation | Cost | Why |
|---|---|---|
get(i) / set(i) | O(1) | direct array index |
add(item) (at end) | O(1) amortized | usually just writes to the next slot; occasionally resizes |
add(i, item) (middle) | O(n) | must shift everything after i |
remove(i) (middle) | O(n) | must shift everything after i |
contains(item) | O(n) | linear scan |
LinkedList is a doubly-linked list. Each element is a separate node with pointers to its neighbors.
| Operation | Cost | Why |
|---|---|---|
get(i) | O(n) | must walk from an end to index i |
add/remove at ends | O(1) | just relink the end node |
add/remove in middle (with iterator) | O(1) | relink neighbors |
contains(item) | O(n) | linear scan |
The practical verdict: use ArrayList by default. Always. Its O(1) random access and cache-friendly contiguous memory beat LinkedList for almost everything real code does, even insertion-heavy work, because CPU cache locality matters more than the Big-O suggests. LinkedList's only real edge is constant-time add/remove at both ends - and for that, ArrayDeque (below) is faster anyway.
In practice: reach for LinkedList almost never. The fact that beginner tutorials present them as equal alternatives is misleading. ArrayList is the answer ~95% of the time.
Sets: HashSet vs LinkedHashSet vs TreeSet¶
A Set holds unique elements. Three implementations, three tradeoffs:
HashSet - backed by a HashMap. O(1) add/contains/remove. No order - iteration order is unspecified and can change. Use when you just need uniqueness and fast membership tests.
Set<String> seen = new HashSet<>();
seen.add("a"); seen.add("b"); seen.add("a");
System.out.println(seen.size()); // 2 - duplicate ignored
System.out.println(seen.contains("a")); // true, O(1)
LinkedHashSet - like HashSet but remembers insertion order. Slightly more memory (it maintains a linked list through the entries). Use when you need uniqueness and predictable iteration order.
Set<String> ordered = new LinkedHashSet<>();
ordered.add("c"); ordered.add("a"); ordered.add("b");
// iterates c, a, b - insertion order, every time
TreeSet - keeps elements sorted (by natural order or a Comparator). O(log n) add/contains/remove - slower than hash sets, but you get sorted iteration and range queries (first(), last(), headSet(), subSet()).
TreeSet<Integer> sorted = new TreeSet<>(List.of(5, 1, 3, 2, 4));
System.out.println(sorted); // [1, 2, 3, 4, 5] - always sorted
System.out.println(sorted.first()); // 1
System.out.println(sorted.headSet(3)); // [1, 2] - everything below 3
The decision: HashSet for plain uniqueness (fastest). LinkedHashSet when iteration order must match insertion. TreeSet when you need sorted order or range queries (and can pay O(log n)).
Critical reminder from chapter 03: HashSet and LinkedHashSet depend on correct equals/hashCode. TreeSet depends on correct compareTo/Comparator. A broken contract silently breaks the set.
Maps: the workhorse family¶
Map is the most-used collection after List. Same three flavors as sets, for the same reasons (sets are literally implemented on maps):
HashMap - O(1) get/put/remove, no order. The default map. Use it unless you need order.
LinkedHashMap - insertion-order (or access-order) iteration. The access-order mode makes it a ready-made LRU cache:
// Access-order LinkedHashMap that evicts the least-recently-used entry past capacity.
var lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
return size() > 100; // keep at most 100 entries
}
};
TreeMap - sorted by key, O(log n). Gives you firstKey(), lastKey(), floorKey(), ceilingKey(), subMap() - range queries on keys.
TreeMap<Integer, String> byScore = new TreeMap<>();
byScore.put(90, "A"); byScore.put(70, "C"); byScore.put(80, "B");
System.out.println(byScore.firstKey()); // 70
System.out.println(byScore.ceilingKey(75)); // 80 - smallest key >= 75
The decision mirrors sets: HashMap by default; LinkedHashMap for ordered iteration or LRU; TreeMap for sorted keys / range queries.
The modern Map methods you should be using¶
Pre-Java-8 map code is full of get-check-put patterns. Modern methods replace them:
Map<String, Integer> counts = new HashMap<>();
// Counting - the classic. Old way: get, null-check, put.
for (String word : words) {
counts.merge(word, 1, Integer::sum); // add 1, or start at 1 if absent
}
// Grouping into lists without the null dance:
Map<String, List<String>> byFirstLetter = new HashMap<>();
for (String name : names) {
String key = name.substring(0, 1);
byFirstLetter.computeIfAbsent(key, k -> new ArrayList<>()).add(name);
}
// Default for missing keys:
int n = counts.getOrDefault("missing", 0); // 0 instead of null
// Compute only if present:
counts.computeIfPresent("apple", (k, v) -> v * 2);
merge, computeIfAbsent, getOrDefault, putIfAbsent, computeIfPresent - learn these five. They turn five-line patterns into one line and avoid a whole class of null bugs.
Queues and Deques: ArrayDeque and PriorityQueue¶
ArrayDeque - a double-ended queue backed by a resizable array. Add/remove at both ends in O(1). This is your stack and your queue - faster than the legacy Stack (chapter 01's cautionary tale) and faster than LinkedList for the same job.
Deque<Integer> stack = new ArrayDeque<>();
stack.push(1); stack.push(2);
stack.pop(); // 2 - LIFO, this is a stack
Deque<Integer> queue = new ArrayDeque<>();
queue.offer(1); queue.offer(2);
queue.poll(); // 1 - FIFO, this is a queue
Use ArrayDeque whenever you need a stack or a queue. Never use the legacy Stack class.
PriorityQueue - a heap. poll() always returns the smallest element (by natural order or Comparator), in O(log n). Use for "always process the most urgent next":
PriorityQueue<Integer> pq = new PriorityQueue<>();
pq.addAll(List.of(5, 1, 3, 2));
pq.poll(); // 1 - smallest first
pq.poll(); // 2
// Max-heap with a reversed comparator:
var maxHeap = new PriorityQueue<Integer>(Comparator.reverseOrder());
Iteration and the fail-fast trap¶
A bug everyone hits once: modifying a collection while iterating it.
List<String> list = new ArrayList<>(List.of("a", "b", "c"));
for (String s : list) {
if (s.equals("b")) list.remove(s); // ConcurrentModificationException!
}
The for-each loop uses an iterator, and most collections are fail-fast: they detect structural modification during iteration and throw ConcurrentModificationException rather than silently corrupting. (The name is misleading - it happens single-threaded too.) Three correct ways to remove during iteration:
// 1. Iterator.remove() - the explicit way
Iterator<String> it = list.iterator();
while (it.hasNext()) {
if (it.next().equals("b")) it.remove();
}
// 2. removeIf - the modern one-liner
list.removeIf(s -> s.equals("b"));
// 3. Collect what to remove, remove after the loop
removeIf is almost always the right answer. Reach for it.
Immutable and unmodifiable collections¶
Three ways to get a read-only collection, with different guarantees:
// 1. List.of / Set.of / Map.of - truly immutable, can't be changed by anyone (Java 9+)
List<String> a = List.of("x", "y"); // a.add(...) throws UnsupportedOperationException
// 2. List.copyOf - immutable snapshot of an existing collection
List<String> b = List.copyOf(existing); // independent, immutable
// 3. Collections.unmodifiableList - a read-only VIEW (the backing list can still change!)
List<String> c = Collections.unmodifiableList(backing);
// you can't change c directly, but if someone changes `backing`, c reflects it
The trap is #3: an unmodifiable view doesn't copy - it wraps. If the underlying collection changes, the view changes. For real immutability (chapter 03), use List.of or List.copyOf, which fully decouple. Return List.copyOf(...) from getters to protect your internal collections (the defensive-copy lesson from chapter 03).
Choosing: a decision table¶
| When you need a... | Reach for | Because |
|---|---|---|
| ordered list, indexed access | ArrayList | O(1) get, cache-friendly, the default |
| stack or queue | ArrayDeque | O(1) both ends, beats Stack and LinkedList |
| unique elements, fast lookup | HashSet | O(1), no order needed |
| unique elements, insertion order | LinkedHashSet | predictable iteration |
| unique elements, sorted | TreeSet | O(log n), sorted + range queries |
| key-value, fast lookup | HashMap | O(1), the default map |
| key-value, insertion order or LRU | LinkedHashMap | ordered iteration / eviction hook |
| key-value, sorted by key | TreeMap | O(log n), range queries on keys |
| "most urgent first" | PriorityQueue | O(log n) heap |
Print this table mentally. "Which collection?" should take you two seconds.
Try it¶
-
Measure ArrayList vs LinkedList. Build both with 1,000,000 integers. Time
get(500_000)a million times on each. TheArrayListwill be dramatically faster (O(1) vs O(n)). Then time adding a million elements at the end of each.ArrayListstill wins (cache locality). This kills the "LinkedList is for inserts" myth. -
Word frequency three ways. Count word frequencies in a paragraph using (a) the old get-null-check-put pattern, (b)
merge, (c) a streamgroupingBy+counting(you'll meet streams next chapter). Compare line counts and readability. -
Pick the right set. You need to dedupe a list of names but preserve the order they first appeared. Which set? Implement it. Then change the requirement to "deduped and alphabetical" - which set now?
-
Hit the fail-fast. Reproduce the
ConcurrentModificationExceptionwith a for-each remove. Fix it three ways:Iterator.remove,removeIf, and collect-then-remove. Note which is cleanest. -
Unmodifiable view trap. Create a
backingArrayList, wrap it withCollections.unmodifiableList. Confirm you can't add to the view. Then add tobackingdirectly and print the view - watch it change. Now do the same withList.copyOfand confirm the copy is truly frozen. -
TreeMap range query. Put exam scores (key) to names (value) in a
TreeMap. UsesubMap,ceilingKey, andheadMapto answer "who scored between 70 and 90?" and "what's the lowest passing score >= 60?".
What you might wonder¶
"Is LinkedList ever the right choice?" Rarely. If you need a List and frequent O(1) add/remove at both ends and you specifically need List semantics (indexed-ish access via iterator), maybe. But ArrayDeque covers the both-ends case better, and ArrayList covers everything else. In years of Java you'll reach for LinkedList a handful of times, usually then reconsider.
"HashMap initial capacity - should I set it?" If you know roughly how many entries you'll add, yes: new HashMap<>(expectedSize / 0.75 + 1) avoids rehashing as it grows (same idea as pre-sizing an ArrayList). For small maps it doesn't matter. For large ones built in a hot path, it's a real win - same lesson as the Go pre-sizing chapter if you've seen it.
"What's the load factor (0.75)?" A HashMap resizes when it's 75% full, to keep collisions low. 0.75 is the default sweet spot between memory and speed. You almost never change it. It's the second constructor arg you saw in the LRU example.
"Are these collections thread-safe?" No - ArrayList, HashMap, etc. are not safe for concurrent modification. That's a chapter 10 topic (ConcurrentHashMap, CopyOnWriteArrayList, and friends). For now: a plain HashMap shared across threads without synchronization is a bug, even if it seems to work in testing.
"Vector and Hashtable - what about those?" Legacy, from Java 1.0. They're synchronized (slow) versions of ArrayList/HashMap. Don't use them in new code. If you need thread safety, use the java.util.concurrent collections (chapter 10), not these. They survive only for backward compatibility - the same "sediment" the Java Mastery path's legacy appendix covers.
"Should I always use List.of for constants?" Yes for fixed, never-changing collections - it's immutable, compact, and clear. Just remember List.of rejects nulls and is immutable, so don't use it where you need to mutate or store nulls.
Done¶
- You know the framework map:
List,Set,Queue/Deque,Map. - You know
ArrayListis the default list and whyLinkedListrarely wins. - You can choose among
HashSet/LinkedHashSet/TreeSetand the matchingMaptrio by their order and performance tradeoffs. - You know
ArrayDequefor stacks/queues andPriorityQueuefor "most urgent first." - You know the modern map methods (
merge,computeIfAbsent, ...) and the fail-fast iteration trap. - You know unmodifiable views vs truly immutable collections.
- You have the decision table in your head.
Next: exceptions done right - the strategy for checked vs unchecked, custom hierarchies, and clean error handling.