Skip to content

Week 18 - Data Structures Beyond list/dict

18.1 The Menu

Need Structure
Ordered, indexable, mutable list
Immutable record tuple, NamedTuple, frozen dataclass
Membership / dedup set / frozenset
Key-value dict
Counts collections.Counter
Default-on-miss collections.defaultdict
FIFO / deque collections.deque
Priority queue heapq (no class - operates on list)
Sorted container sortedcontainers.SortedList/SortedDict (third-party but de facto standard)
Disjoint set hand-rolled (~20 lines) or networkx.utils.UnionFind
LRU cache functools.lru_cache for functions; cachetools.LRUCache for objects
Bloom filter pybloom-live or roll your own (Appendix B)
Trie pygtrie or roll your own
Interval tree intervaltree
Graph networkx (development), igraph/graph-tool (scale)
DataFrame pandas (legacy), polars (modern, lazy, multi-threaded)
Tensor numpy (CPU), torch.Tensor (CPU/GPU/autograd)
Sparse vector scipy.sparse, torch.sparse
Vector index faiss, hnswlib, usearch

18.2 Lab - "Right Tool for Right Workload"

  1. A leaderboard with frequent insert + top-K query: implement with list (naive), heapq (better), SortedList (best). Bench at 10k/100k/1M elements.
  2. A rolling-window deduplicator: set (memory-unbounded), Bloom filter (memory-bounded, false positives), cachetools.TTLCache. Pick one with justification.
  3. A nearest-neighbor lookup over 1M 768-dim vectors: brute-force NumPy, hnswlib, faiss. Note recall/latency trade-offs.

18.3 Production Hardening Slice

  • Add a "data-structure decision log" to docs/. Every non-trivial collection choice gets one paragraph: what was rejected and why.

Comments