
06 - Collections

What this session is

About an hour. You'll learn Python's four built-in collection types: lists, tuples, dictionaries (dicts), and sets. Each is good at different things. Real Python code uses all four constantly.

Lists: ordered, mutable

fruits = ["apple", "banana", "cherry"]
print(fruits)            # ['apple', 'banana', 'cherry']
print(fruits[0])         # apple - lists are 0-indexed
print(fruits[2])         # cherry
print(len(fruits))       # 3

What's new:

  • ["apple", "banana", "cherry"] - a list. Created with square brackets, comma-separated.
  • fruits[0] - read the first element. Indexing starts at 0.
  • len(fruits) - built-in function for the number of elements.
  • Negative indices count from the end: fruits[-1] is cherry, fruits[-2] is banana.

Common mistake: going off the end. fruits[10] raises IndexError. Always know how many elements you have.
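One way to stay inside the list is to compare the index against len() before reading. The sketch below is only an illustration: safe_get is a hypothetical helper, not a built-in.

```python
fruits = ["apple", "banana", "cherry"]

def safe_get(items, index, default=None):
    """Hypothetical helper: return items[index], or default if index is out of range."""
    # Valid indices run from -len(items) up to len(items) - 1.
    if -len(items) <= index < len(items):
        return items[index]
    return default

print(safe_get(fruits, 1))      # banana
print(safe_get(fruits, -1))     # cherry
print(safe_get(fruits, 10))     # None - instead of an IndexError
```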

Lists grow and shrink

fruits = ["apple", "banana"]
fruits.append("cherry")          # add to end
fruits.insert(0, "apricot")      # insert at index
fruits.remove("banana")          # remove by value
last = fruits.pop()              # remove and return last item
print(fruits)                    # ['apricot', 'apple']
print(last)                      # cherry

Lists are mutable - they change in place. fruits.append(...) modifies fruits; it doesn't return a new list. (Returns None, in fact.)
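A short sketch of what "returns None" means in practice, and the bug it tends to cause. The variable name result is just for illustration.

```python
fruits = ["apple", "banana"]

result = fruits.append("cherry")   # changes fruits in place
print(result)                      # None
print(fruits)                      # ['apple', 'banana', 'cherry']

# Pitfall: assigning the result back would replace your list with None.
# fruits = fruits.append("date")   # fruits would now be None
```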

Slicing

A powerful Python feature - take a slice of a list:

nums = [10, 20, 30, 40, 50]
print(nums[1:4])     # [20, 30, 40] - start inclusive, stop exclusive
print(nums[:3])      # [10, 20, 30] - start defaults to 0
print(nums[2:])      # [30, 40, 50] - stop defaults to end
print(nums[::2])     # [10, 30, 50] - every other element
print(nums[::-1])    # [50, 40, 30, 20, 10] - reversed

Slicing returns a new list; the original is untouched. Works on strings too (a string is a sequence of characters).
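To see both claims in action, here is a small sketch: changing a slice leaves the original list alone, and the same slice syntax works on a string. The names first_three and word are made up for this example.

```python
nums = [10, 20, 30, 40, 50]

first_three = nums[:3]      # a new list, not a view of nums
first_three[0] = 99
print(first_three)          # [99, 20, 30]
print(nums)                 # [10, 20, 30, 40, 50] - unchanged

word = "banana"
print(word[:3])             # ban
print(word[::-1])           # ananab
```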

Iterating a list

for fruit in fruits:
    print(fruit)

If you need the index too:

for i, fruit in enumerate(fruits):
    print(i, fruit)

enumerate yields (index, value) pairs. The shape for x, y in something is tuple unpacking (page 04) - Python takes the 2-element tuple and unpacks it into two names.
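If you want to look at the pairs themselves, or want numbering that starts at 1 for display, something like this should work. start is a real parameter of enumerate; the list contents are just sample data.

```python
fruits = ["apricot", "apple"]

pairs = list(enumerate(fruits))
print(pairs)                       # [(0, 'apricot'), (1, 'apple')]

# start=1 changes the first number, not which elements you get.
for number, fruit in enumerate(fruits, start=1):
    print(number, fruit)
# 1 apricot
# 2 apple
```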

Tuples: ordered, immutable

point = (3, 4)
print(point[0])         # 3
print(point[1])         # 4
# point[0] = 99         # ERROR - tuples can't be modified

A tuple is like a list that can't change. Created with parentheses (or just commas - point = 3, 4 is the same tuple).

Why bother? Three reasons:

  1. Communicates "this won't change." A function returning (width, height) is signaling "you can rely on these two values together."
  2. Usable as dictionary keys (lists aren't, because they could change underneath the hash).
  3. Slightly faster than lists for fixed-size data.

You've seen tuples already: returning multiple values from a function (page 04) returns a tuple.

Tuple unpacking:

point = (3, 4)
x, y = point
print(x, y)     # 3 4
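Unpacking is also how you receive several return values at once, and it gives you a one-line swap. A minimal sketch; image_size is a hypothetical function with made-up numbers.

```python
def image_size():
    """Hypothetical function: returns two related values as one tuple."""
    return 640, 480

width, height = image_size()    # unpack the returned tuple
print(width, height)            # 640 480

a, b = 1, 2
a, b = b, a                     # the right side builds a tuple, the left side unpacks it
print(a, b)                     # 2 1
```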

Dictionaries: lookups by key

ages = {"Alice": 30, "Bob": 25, "Chioma": 35}
print(ages["Alice"])      # 30
print(len(ages))          # 3

A dict maps keys to values. Created with {key: value, ...}. Keys must be hashable, which in practice means immutable: strings, numbers, and tuples (as long as everything inside the tuple is hashable too) all work. Values can be anything.

ages["Dimeji"] = 40       # add
ages["Alice"] = 31        # update
del ages["Bob"]           # remove

Check whether a key is there:

if "Alice" in ages:
    print("found")

The safe lookup (no crash on missing key):

score = ages.get("Zara")              # None if missing
score = ages.get("Zara", 0)           # 0 if missing
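For contrast, a small sketch of what .get() does and does not do. The names missing and fallback are only for this example.

```python
ages = {"Alice": 30, "Bob": 25}

missing = ages.get("Zara")         # no default given
fallback = ages.get("Zara", 0)     # default given
print(missing)                     # None
print(fallback)                    # 0

# .get() only reads; it never adds the key.
print("Zara" in ages)              # False

# ages["Zara"] would raise KeyError, because square brackets insist the key exists.
```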

Iterate:

for name in ages:
    print(name, ages[name])

for name, age in ages.items():        # name + value at once
    print(name, age)

for age in ages.values():             # just values
    print(age)

Modern Python (3.7+): dicts preserve insertion order. The order you put items in is the order you get them out.
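A quick sketch of what insertion order means once you start updating and deleting. The names and numbers are sample data.

```python
ages = {}
ages["Chioma"] = 35
ages["Alice"] = 30
ages["Bob"] = 25
print(list(ages))        # ['Chioma', 'Alice', 'Bob']

ages["Alice"] = 31       # updating a value keeps the key's position
print(list(ages))        # ['Chioma', 'Alice', 'Bob']

del ages["Chioma"]
ages["Chioma"] = 36      # deleting and re-adding moves the key to the end
print(list(ages))        # ['Alice', 'Bob', 'Chioma']
```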

Sets: unique, unordered

letters = {"a", "b", "c", "a"}     # duplicate dropped
print(letters)                     # {'a', 'b', 'c'} - order varies
print(len(letters))                # 3

A set is an unordered collection of unique values. Created with { ... } (or set() for empty - {} makes an empty dict, not an empty set, because dicts were using the braces first).
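You can check the empty-braces rule yourself with type(). The variable names here are only for illustration.

```python
empty_dict = {}
empty_set = set()

print(type(empty_dict))     # <class 'dict'>
print(type(empty_set))      # <class 'set'>

empty_set.add("a")          # sets grow with .add(), not .append()
empty_set.add("a")          # adding a duplicate changes nothing
print(empty_set)            # {'a'}
```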

What sets are good at:

  • Membership checks ("a" in letters) - much faster than scanning a list when the collection is large.
  • De-duplication: set(my_list) gives you the unique values.
  • Set math: a | b (union), a & b (intersection), a - b (difference), a ^ b (symmetric difference).

weekday = {"mon", "tue", "wed", "thu", "fri"}
busy = {"mon", "wed", "fri"}
free = weekday - busy
print(free)                        # {'thu', 'tue'} - order varies
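The de-duplication and set-math operators listed above can be sketched like this. The data is made up, and sorted() is used only so the printed order is predictable.

```python
visitors = ["ana", "ben", "ana", "cy", "ben"]    # sample data with repeats

unique = set(visitors)
print(len(unique))          # 3
print(sorted(unique))       # ['ana', 'ben', 'cy']

a = {1, 2, 3}
b = {3, 4}
print(sorted(a | b))        # [1, 2, 3, 4] - in either set
print(sorted(a & b))        # [3]          - in both sets
print(sorted(a - b))        # [1, 2]       - in a but not in b
print(sorted(a ^ b))        # [1, 2, 4]    - in exactly one of the two
```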

Quick comparison

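| Type  | Syntax    | Ordered?   | Mutable? | Duplicates? | Use when                        |
|-------|-----------|------------|----------|-------------|---------------------------------|
| list  | [1, 2, 3] | yes        | yes      | yes         | ordered collection, will change |
| tuple | (1, 2, 3) | yes        | no       | yes         | fixed group, won't change       |
| dict  | {"a": 1}  | yes (3.7+) | yes      | keys unique | lookup by key                   |
| set   | {1, 2, 3} | no         | yes      | no          | membership, unique values       |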

Nested collections

You can put any type in any collection:

people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob",   "age": 25},
]
print(people[0]["name"])     # Alice

A list of dicts is the most common shape. It mirrors a JSON array of objects, which is often where the data came from.
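A sketch of working with that shape: loop over the list, then index into each dict. The second half uses the standard library's json module to show the JSON connection; the JSON text is sample data.

```python
import json

people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob",   "age": 25},
]

names = []
for person in people:              # each person is one dict
    names.append(person["name"])
print(names)                       # ['Alice', 'Bob']

# A JSON array of objects loads as a list of dicts.
text = '[{"name": "Chioma", "age": 35}]'
loaded = json.loads(text)
print(loaded[0]["name"])           # Chioma
```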

Exercise

In a new file wordcount.py:

Write a program that counts how many times each word appears in a sentence.

  1. Hardcode this sentence: "the quick brown fox jumps over the lazy dog the end".

  2. Split it into words:

    words = sentence.split()
    
    .split() on a string with no arguments splits on whitespace, returning a list of strings.

  3. Build a dict counts mapping each word to how many times it appeared.

  4. Print each word and its count, one per line.

Expected output (dicts iterate in insertion order, so each word appears in the order it was first seen):

the 3
quick 1
brown 1
fox 1
jumps 1
over 1
lazy 1
dog 1
end 1

Stretch: Sort the output by count (most-frequent first). Use:

sorted_items = sorted(counts.items(), key=lambda x: x[1], reverse=True)

We'll explain lambda properly on a later page; for now, read it as "sort by the second element of each tuple, biggest first."
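To see what that line produces without giving away the exercise, here is the same call on a different, made-up dict (stock is hypothetical data).

```python
stock = {"pears": 4, "apples": 12, "plums": 7}

# .items() yields (key, value) tuples; x[1] is the value in each one.
sorted_items = sorted(stock.items(), key=lambda x: x[1], reverse=True)
print(sorted_items)        # [('apples', 12), ('plums', 7), ('pears', 4)]

for name, count in sorted_items:
    print(name, count)
# apples 12
# plums 7
# pears 4
```

The result is a list of tuples, not a dict, so you loop over it with tuple unpacking.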

What you might wonder

"Why is {} an empty dict and set() an empty set?" Historical accident - dicts predate sets in Python. {} was already taken. Live with it.

"What's the difference between a tuple and a list?" Tuples can't change after creation; lists can. Use tuples for "this is one immutable group of related things" (like a coordinate); lists for "an evolving collection of items."

"Can dict keys be lists?" No - keys must be hashable, which essentially means immutable. Strings, numbers, tuples, frozen sets - yes. Lists, dicts, sets - no. Trying to use a list as a key raises TypeError.

"Why preserve insertion order in dicts? Other languages don't." Python's BDFL (Benevolent Dictator) was convinced after CPython's implementation incidentally preserved order in 3.6. Made official in 3.7. It's now relied on heavily, so it's permanent.

"What's a frozenset?" An immutable set. Usable as a dict key (since it's hashable, like a tuple). Rare; mention it for recognition.

Done

You can now:

  • Build ordered lists; index, slice, append, remove.
  • Build immutable tuples; unpack them.
  • Build dicts; set, get, check membership, iterate by keys/values/items.
  • Build sets; do membership checks, set operations, de-duplication.
  • Pick the right collection for the access pattern.

Collections are most of what Python code does - slicing data, looking it up, transforming it. You now have the basic vocabulary.

Next page: how Python handles things going wrong - exceptions.

