Week 3 - Collections, Comprehensions, Iterators, and Generators¶
3.1 Conceptual Core¶
- The four collection workhorses:
list(dynamic array),tuple(immutable record),dict(open-addressed hash table, insertion-ordered),set(hash table). Internalize O() costs: listappendamortized O(1),insert(0, x)O(n); dictinO(1) avg; listinO(n). - An iterator is a stateful cursor; an iterable is anything you can call
iter()on.for x in xs:desugars toit = iter(xs); while True: try: x = next(it) except StopIteration: break. - Generators are coroutines that yield values, not just iterators. They preserve local state across
yield; they accept values via.send(); they handle exceptions via.throw(); they clean up via.close().
3.2 Mechanical Detail¶
collections.deque(O(1) both ends),collections.Counter,collections.defaultdict,collections.ChainMap,collections.OrderedDict(now mostly redundant; still useful formove_to_end),collections.namedtuple(legacy; preferdataclass(slots=True, frozen=True)ortyping.NamedTuple).itertoolsmastery:chain,islice,takewhile,dropwhile,groupby(note: requires sorted input),tee,product,permutations,combinations,accumulate,pairwise(3.10+),batched(3.12+).- Comprehensions in all four flavors: list, set, dict, generator. Generator expression vs. list comprehension: when the consumer is
sum,any,all,min,max, or anything that doesn't need a materialized list, prefer the genexp - same syntax minus the brackets. yield from(PEP 380): delegate to a sub-generator, propagate sends/throws.- The buffer protocol primer:
memoryviewoverbytes/bytearray/array.array/numpy.ndarraylets you slice without copying. Critical in the AI months.
3.3 Lab - "Streaming Word Count"¶
- Implement
wc -wover arbitrarily large files using a generator pipeline: file → lines → words → counts. Constant memory regardless of file size. - Add a
--top Kflag usingheapq.nlargest. Note that you must materialize the counter - discuss why. - Replace your hand-rolled tokenizer with
re.finditerand benchmark. Then benchmark astr.split()version. Explain the difference. - Add a
--parallel Nflag usingconcurrent.futures.ProcessPoolExecutoranditertools.batched. (We will revisit in Month 4.)
3.4 Idiomatic & Linter Drill¶
- Enable
ruffC4(comprehensions) andPERFrules. Refactorfor-with-appendpatterns into comprehensions. Identify cases where the comprehension is less readable and document them.
3.5 Production Hardening Slice¶
- Add
hypothesisas a dev dep. Write property tests for your word counter: invariants like "total = sum of counts," "tokens are non-empty," "shuffling input lines doesn't change the count."