
08 - Iterators, Generators, Comprehensions

What this session is

About an hour and a half. This is the longest page so far because it's where Python's "secret sauce" lives. By the end you'll understand iterators (the protocol behind every for loop), generators (lazy iterators you write with yield), and comprehensions (Python's compact way to build a list, dict, or set from another collection).

Don't skip this page. Real Python code uses these everywhere.

The big idea: lazy sequences

When you write:

for x in range(1_000_000_000):
    do_something(x)

Python doesn't build a billion-element list in memory. range(1_000_000_000) is a lazy sequence - an object that produces values one at a time, on demand. The for loop asks for the next value, uses it, throws it away, and asks for the next. The memory cost stays constant.

This idea - lazy evaluation - runs through Python. Most "collection operations" return iterators, not lists, so you can chain them without materializing huge intermediate results.

The iterator protocol (briefly)

An iterator is an object with a __next__ method that produces the next value, or raises StopIteration when there's nothing left. (Iterators also define __iter__ returning themselves, so every iterator is itself iterable.) An iterable is anything you can call iter(...) on to get an iterator - that's why lists, tuples, dicts, strings, and files all work in for loops.

nums = [1, 2, 3]
it = iter(nums)
print(next(it))     # 1
print(next(it))     # 2
print(next(it))     # 3
next(it)            # raises StopIteration - the iterator is exhausted

You won't usually call iter/next by hand. for does it for you. The protocol exists so any object can opt in.
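As a sketch of what opting in looks like, here's a minimal hand-rolled iterator class (the name Countdown is made up for illustration):

```python
class Countdown:
    """Counts down from n to 1 - an illustrative hand-rolled iterator."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # Returning self makes this object its own iterator.
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

print(list(Countdown(3)))   # [3, 2, 1]
```

In practice you'd almost never write this by hand - the next section shows the shortcut.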

Generators with yield

Writing a class with __next__ and __iter__ is annoying. Python has a shortcut: a function that uses yield instead of return is a generator - it produces values lazily.

def count_up_to(n):
    i = 1
    while i <= n:
        yield i
        i += 1

for x in count_up_to(5):
    print(x)

Output:

1
2
3
4
5

What's new:

  • yield i - produce i as the next value, then pause. The next call resumes where we left off.
  • The function body acts like a coroutine - it runs a bit, yields, sleeps until asked again, runs a bit more.

Memory: only one value lives at a time. count_up_to(1_000_000) doesn't build a million-item list; it produces them one by one as needed.

A generator is one of Python's most powerful patterns. Use it whenever you want "produce a sequence of things lazily" - reading lines from a huge file, paginating an API, walking a tree.

A real-world example: reading a large file line by line is built in:

with open("huge_file.txt") as f:
    for line in f:           # f is iterable; yields one line at a time
        process(line)

If the file is 100 GB, this works - Python doesn't load it all into RAM.

List comprehensions

A list comprehension is a compact way to build a list from another iterable:

nums = [1, 2, 3, 4, 5]
squares = [n * n for n in nums]
print(squares)              # [1, 4, 9, 16, 25]

That one line replaces:

squares = []
for n in nums:
    squares.append(n * n)

Read [expression for var in iterable] as: "for each var in iterable, produce expression."

You can filter with if:

evens = [n for n in nums if n % 2 == 0]
print(evens)                # [2, 4]

Or both - transform AND filter:

squared_evens = [n * n for n in nums if n % 2 == 0]
print(squared_evens)        # [4, 16]

Read order is: outer-to-inner, just like the loop form. You can also nest:

matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [n for row in matrix for n in row]
print(flat)                 # [1, 2, 3, 4, 5, 6, 7, 8, 9]

(That nested form gets confusing fast. Two levels is fine; three is usually a sign to use a regular for loop for clarity.)

Dict and set comprehensions

Same idea, different brackets:

nums = [1, 2, 3, 4, 5]
squares_dict = {n: n * n for n in nums}
print(squares_dict)         # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}

unique_lengths = {len(w) for w in ["foo", "bar", "baz", "quux"]}
print(unique_lengths)       # {3, 4}

Generator expressions

Parentheses instead of brackets make a generator expression - same syntax, but lazy:

nums = [1, 2, 3, 4, 5]
squares_gen = (n * n for n in nums)
print(squares_gen)          # <generator object ...>
print(next(squares_gen))    # 1
print(list(squares_gen))    # [4, 9, 16, 25] - the rest, consumed in one go

Use generator expressions when the consumer will iterate once and you want to save memory:

total = sum(n * n for n in range(1_000_000))

That doesn't build a million-item list - it streams the squares through sum. Same result, constant memory.

Useful built-ins that work with iterators

These appear in real code constantly:

  • sum(iter) - total.
  • min(iter), max(iter) - extremes.
  • len(seq) - count. Works on sequences and other containers, not on plain iterators.
  • sorted(iter) - returns a sorted list. Optional key=lambda x: ... and reverse=True.
  • reversed(seq) - iterator over the sequence backward.
  • zip(a, b) - pair up two iterables: zip([1,2,3], ["a","b","c"]) yields (1, "a"), (2, "b"), (3, "c").
  • enumerate(iter) - yields (index, value) pairs (page 06).
  • any(iter) / all(iter) - short-circuit "is any/all true?".

A working example:

names = ["Alice", "Bob", "Chioma"]
ages = [30, 25, 35]
adults = [(name, age) for name, age in zip(names, ages) if age >= 18]
print(adults)               # [('Alice', 30), ('Bob', 25), ('Chioma', 35)]
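any and all from the list above pair naturally with a generator expression - one condition checked per element, stopping as soon as the answer is known:

```python
ages = [30, 25, 35]
print(all(age >= 18 for age in ages))   # True - every age passes
print(any(age > 60 for age in ages))    # False - no age passes
```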

lambda: anonymous functions

Briefly: lambda x: expression is a short way to write a function inline. Used with sorted, filter, map, etc.

words = ["banana", "apple", "cherry"]
sorted_by_length = sorted(words, key=lambda w: len(w))
print(sorted_by_length)     # ['apple', 'banana', 'cherry']

lambda w: len(w) is a function taking w and returning len(w). Used as the sort key - sort by length.

Rule of thumb: use lambda only for one-line transformations. If it's longer than that, def a named function.
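To make the rule of thumb concrete, a sketch contrasting the two (vowel_count is a made-up helper):

```python
words = ["banana", "apple", "cherry"]

# One-line transform: lambda is fine (here, key=len would also work directly).
by_length = sorted(words, key=lambda w: len(w))

# Anything longer: give it a name with def.
def vowel_count(word):
    return sum(1 for ch in word if ch in "aeiou")

by_vowels = sorted(words, key=vowel_count)
print(by_vowels)   # ['cherry', 'apple', 'banana'] - 1, 2, 3 vowels
```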

Exercise

In a new file iter_practice.py:

  1. Use a list comprehension to build a list of the first 20 cubes (1³, 2³, ..., 20³). Print it.

  2. Use a list comprehension with a filter to build the list of cubes that are even. Print it.

  3. Use a dict comprehension to build {n: n³ for n in range(1, 11)}. Print it.

  4. Write a generator function fib(n) that yields the first n Fibonacci numbers (1, 1, 2, 3, 5, 8, 13, ...). Loop over fib(10) and print each.

  5. Use sum and a generator expression to compute the sum of cubes from 1 to 100. (Expected: 25502500.) Don't build a list.

  6. Stretch: Open 01-setup.md from this path (if you have it locally) or any text file. Use a with block and a generator-style iteration to count lines without loading the whole file into memory.

What you might wonder

"Why bother with generators if comprehensions exist?" Two reasons. (1) Memory - a generator can produce billions of items without building a billion-item list. (2) Composability - generators can read from other generators, building pipelines like sum(x*x for x in filter(is_even, range(1_000_000))).

"List comprehension vs for loop - which?" If the transformation is a single expression, the comprehension is more readable. If it has multiple statements (mutating state, multiple side effects, complex logic), use the for loop. Comprehensions should produce a collection, not do work.

"Generator vs list comprehension?" If you'll iterate once: generator (saves memory). If you need to index, length, or iterate multiple times: list (you'll need it materialized eventually).

"What's map and filter?" Older Python idioms: map(func, iter) returns an iterator of func(x) for each x; filter(func, iter) keeps only items where func(x) is truthy. Replaced in modern style by comprehensions: (func(x) for x in iter) and (x for x in iter if func(x)). You'll see map/filter in older code; recognize them, prefer comprehensions in new code.

"Are list comprehensions hard to read?" Two-clause ones (transform + optional filter) - easy. Triple-nested ones - yes, often. The fix: if a comprehension reads like noise, write it as a loop. Clarity beats compactness.

Done

You can now:

  • Recognize iterators and iterables.
  • Write generator functions with yield.
  • Build lists, dicts, and sets with comprehensions.
  • Write generator expressions for streaming computation.
  • Use built-ins like zip, enumerate, sorted, sum correctly.
  • Use lambda for one-line transformations.

These features make Python feel different from most languages. Real Python code uses them on every page. Internalizing them is what separates "I write Python" from "I write Pythonic code."

Next page: working with files and the standard library.

