Appendix B - Build-From-Scratch Data Structures and Patterns¶
A working Python engineer should have implemented each of the following at least once, with pyright-clean types, pytest + hypothesis tests, and a pytest-benchmark micro-bench. This appendix sketches the minimal-viable design.
B.1 LRU Cache (with TTL)¶
When: function memoization, decoded-payload caches, embedding caches.
Design:
- OrderedDict from collections. move_to_end on hit; popitem(last=False) on evict.
- Optional TTL via (value, expiry_monotonic) tuples; lazy expiration on access.
- Concurrent variant: a threading.Lock (or asyncio.Lock for async) around mutations.
Lab: compare to functools.lru_cache and cachetools.TTLCache. Bench miss/hit costs.
B.2 Trie (Prefix Tree)¶
When: autocomplete, IP routing tables, tokenizer prefix lookup, dictionary spell-check.
Design:
- Node = dict[str, Node] + is_end: bool (+ optional payload).
- Insert/lookup O(len(key)); prefix iteration O(prefix + matches).
Lab: implement add, contains, iter_prefix. Bench against set for membership and against linear scan for prefix queries.
B.3 Bloom Filter¶
When: dedup at scale, "definitely-not-seen" checks before expensive lookups.
Design:
- bitarray of size m; k hash functions derived from one (mmh3 or hashlib) via double hashing.
- Sized for target false-positive rate p over expected n: m = -n*ln(p)/(ln(2)^2), k = m/n * ln(2).
Lab: empirically verify FP rate matches predicted. Compare memory to set for 10M items.
B.4 SPSC Ring Buffer (asyncio)¶
When: backpressure between a producer task and a consumer task, fixed-memory pipelines.
Design:
- list[T | None] of capacity N (power of two).
- head/tail integers; full when tail - head == N; empty when equal.
- asyncio.Event for "not full" and "not empty"; set() on transition.
Lab: compare to asyncio.Queue(maxsize=N). The stdlib version is fine; build this once to internalize the contract.
B.5 Bounded Concurrent Map¶
When: caches with tight memory budgets, multi-writer state.
Design:
- N shards of (threading.RLock, dict). Hash key, mod N, lock the shard.
- Eviction: per-shard LRU list.
Lab: compare to a single-lock dict and to a CAS-y "lock-free" attempt. The simple sharded design wins almost always.
B.6 Vector Index (Brute-Force, Then HNSW Wrapper)¶
When: nearest-neighbor over embeddings.
Design - brute force:
- np.ndarray of shape (N, D), L2-normalized.
- Query: (N, D) @ (D,) dot product, np.argpartition for top-K.
Design - HNSW wrapper:
- Wrap hnswlib.Index with a typed Pythonic API.
- Persist with index.save_index(path).
Lab: Build both. Verify recall@10 vs. brute-force ground truth. Plot recall/QPS trade-off.
B.7 Token Bucket Rate Limiter (asyncio)¶
When: client-side rate limiting against an LLM API.
Design:
- tokens: float, last_refill: float (monotonic).
- On acquire(n): refill (now - last_refill) * rate, cap at capacity. If tokens >= n, deduct and return; else await asyncio.sleep for the deficit.
Lab: ensure bursts don't exceed capacity; ensure long-run average matches rate. Compare to aiolimiter.
B.8 Circuit Breaker¶
When: protecting downstream services from cascading failure.
Design: - States: CLOSED (normal), OPEN (fail fast), HALF_OPEN (probe). - Counters: consecutive failures threshold, reset timeout. - On call: if OPEN, raise immediately; if HALF_OPEN, allow one probe.
Lab: integrate with the LLMClient from week 21. Verify behavior under simulated 503 storms.
B.9 Async Worker Pool with Backpressure¶
When: ingestion pipelines, batch-embedding workloads.
Design:
- Bounded asyncio.Queue, N worker tasks consuming, producer awaits put.
- TaskGroup owns the workers; cancellation cleanly drains.
- Each worker: try/except around the unit of work; metrics per outcome.
Lab: process 1M items at a controlled QPS without OOM. Tune N and queue size.
B.10 Domain Patterns Worth Building¶
- Repository + UnitOfWork over SQLite, with an in-memory fake for tests.
- Result type (
Ok[T] | Err[E]) for code where exceptions obscure flow. Don't over-apply - Python has exceptions for a reason. - Saga / state machine for multi-step durable workflows. State table in Postgres + idempotency keys.
- Outbox pattern for reliable event publishing alongside DB writes.