Python From Scratch - Beginner to OSS Contributor¶
This path takes you from "I have never written code" to "I can clone a real Python project, read most of it, and submit a pull request for a small fix." It is unhurried, honest, and assumes nothing.
Who this is for¶
- You have never written code, OR
- You have copy-pasted Python code from tutorials but couldn't explain it line by line.
That's it. No "you should already know X." If you need to know something, this path will teach it.
What you'll need¶
- A computer (any OS - macOS, Linux, Windows all work).
- A text editor. VS Code is a good free default. Notepad/TextEdit work too, just less comfortable.
- A terminal. Built into every operating system.
- About 5 hours per week. The path is sized for ~4-6 months at that pace. The path doesn't expire.
Why Python (and not some other language)¶
A few reasons that matter when you're starting:
- The syntax is gentle. No semicolons, no curly braces, indentation tells the structure. Code looks close to how you'd describe the task in English.
- You can experiment instantly. Python has a REPL - type a line, see the result. No "compile, run, repeat" cycle for small explorations.
- It's the dominant language for data, ML, scripting, and automation. If your interest is anywhere near AI, scientific computing, scripting, or "glue this thing to that thing," Python is the answer.
- The OSS ecosystem is enormous. Tens of thousands of well-maintained projects on GitHub, ranging from one-file libraries to whole frameworks. You'll have no shortage of places to contribute.
How this path works¶
Each page does one thing:
- Says what you'll learn this session.
- Shows you a small program.
- Walks through the code line by line.
- Gives you a tiny exercise.
Do the exercises. Reading without doing won't stick. Type the code yourself; don't copy-paste.
The deal¶
- I will not pretend things are easy when they aren't. When something is confusing the first time, I'll say so.
- I will not send you to "consult the primary source." If you need to know it, this path will teach it.
- There are no stupid questions, only stupid skipped exercises.
- The goal at the end is real: a pull request to a real open-source project.
The pages¶
| # | Title | What you'll know after |
|---|---|---|
| 00 | Introduction | What we're doing and why |
| 01 | Setup | Python installed, virtual environment, hello world |
| 02 | First real program | Variables, numbers, text, f-strings |
| 03 | Decisions and loops | if, for, while |
| 04 | Functions | Named reusable blocks |
| 05 | Classes | Custom types, methods, self |
| 06 | Collections | Lists, tuples, dicts, sets |
| 07 | Errors and exceptions | try/except, raising, custom exception types |
| 08 | Iterators, generators, comprehensions | Python's superpower |
| 09 | Files and the standard library | Practical I/O |
| 10 | Tests | Writing your first pytest |
| 11 | Modules, packages, pip, venv | Using code other people wrote |
| 12 | Reading other people's code | The bridge |
| 13 | Picking a project | What "manageable" looks like |
| 14 | Anatomy of a small Python OSS repo | Case study |
| 15 | Your first contribution | Workflow + PR |
Start with Introduction.
00 - Introduction¶
What this session is¶
A 10-minute read. No code yet. The point is to set expectations honestly so you can decide if this path is for you.
What you're going to build, eventually¶
Programming is not a thing you watch - it's a thing you do. By the end of this path, you'll have done all of these:
- Written and run small programs that print, calculate, and make decisions.
- Built a little command-line script that takes input and produces output.
- Written tests for your own code and watched them pass and fail.
- Cloned an open-source Python project off the internet, browsed its code, run its tests, and understood roughly what it does.
- Submitted a small fix to one of those open-source projects as a real pull request.
That last point is the goal. Everything else is preparation.
The deal we're making¶
A few things you should know about how this path works:
It's slow on purpose. Most beginner Python tutorials drop you into "build a Flask app" by page three. That works for some people. For most, it leaves them able to copy code without understanding it. This path is the opposite: one concept per page, with time to actually internalize each one.
It assumes nothing. If a word appears that you haven't seen, it'll be defined right there. No glossary lookups. No "see chapter 12."
It does the work where the work is. Some pages are short because the concept is small. Some are long because the concept is hard. We don't pad.
You have to type the code. Reading code without typing it has roughly the same effect as reading sheet music without playing it. Type every example, even when you "get it" from reading.
You will be confused. Often. Especially in the first month. That's normal. Programming is unusual in how often you feel stuck - the trick is not to panic when it happens. Re-read the page. Run the code. Change one thing and see what changes. Confusion is not a sign you're bad at this; it's a sign you're doing it.
What you need to start¶
- A computer. Any operating system works.
- A text editor. VS Code is free, multi-platform, and what I'd suggest unless you already love something else.
- A terminal. (On macOS it's the Terminal app. On Windows it's PowerShell or the Windows Terminal app. On Linux you already know.)
- ~5 hours per week. Less is fine; the path just takes longer.
- A specific notebook or text file where you can write down questions as they come up. You'll have lots. Writing them down lets you keep going past them; when you come back you can answer them with what you've since learned.
What you do NOT need¶
- Math beyond basic arithmetic. Programming uses arithmetic; it's not "advanced math."
- A computer-science degree, or any plan to get one.
- A "gift" for computers. There is no such thing. People who seem to "just get it" have spent more hours doing it than you have. That's all.
- To know any other programming language first. Python is a fine first language - arguably the best.
How long this realistically takes¶
The honest answer: 4 to 6 months at 5 focused hours per week, to get to the "submit a pull request" goal.
If you have less time, take longer. If you have more time, take less. The path doesn't expire.
I cannot make the time go faster. Nobody can. The thing that takes weeks is not "absorbing information" - it's your brain getting used to a new way of thinking. That happens at biology speed, not internet speed.
What success looks like at the end¶
You'll be able to:
- Open a Python file you've never seen and read it like a recipe - knowing what each piece does.
- Open a project on GitHub written in Python and tell me, in two paragraphs, what it does and how.
- Find a small bug or missing feature in such a project and fix it.
- Submit that fix as a pull request that follows the project's conventions.
You will not be able to:
- Build a self-driving car. (Not in 6 months. Maybe ever - that's its own multi-year career.)
- Win Kaggle competitions. (Different skill, mostly orthogonal.)
- Tell people you're a "senior Python engineer." (That takes years of doing the work after this path ends.)
What you will have: the foundation to keep going.
One last thing before we start¶
If at any point a page feels too dense, stop and re-read it. If you re-read it and it's still too dense, that's a bug in the page - note it, skip forward, and come back. The path is alive; it gets fixed when readers say "this part lost me."
Ready? Next: Setup →
01 - Setup¶
What this session is¶
About 30 minutes. By the end you'll have Python installed, the terminal open, your first program running, and a virtual environment - a small thing that prevents a large category of future pain.
Step 1: Install Python¶
Most operating systems already have Python somewhere - but often the wrong version, or one used by the system that you shouldn't disturb. The safe move is to install a fresh recent Python yourself.
- macOS: download the installer from python.org/downloads. Double-click; follow the prompts.
- Windows: download the installer from python.org. Important: during install, check the box that says "Add python.exe to PATH." Without it, the terminal won't find Python.
- Linux: your distro has Python in its package manager. `sudo apt install python3 python3-pip python3-venv` on Debian/Ubuntu; `sudo dnf install python3 python3-pip` on Fedora.
Once it's done, open a terminal:
- macOS: press ⌘ Space, type "terminal", hit enter.
- Windows: press the Windows key, type "powershell", hit enter.
- Linux: you know how.
Check: run `python3 --version`. You should see something like `Python 3.12.4`. The minor version will differ. As long as it's 3.10 or higher, you're fine. (On some systems the command is just `python`, not `python3`. Try both.)
If you get command not found, the install didn't work. On Windows, make sure you ticked the PATH box; you may need to reinstall. On macOS, try opening a new terminal window after install.
Step 2: Pick a folder for your code¶
You're going to write a lot of small programs. They need somewhere to live. Make a folder and enter it: `mkdir -p ~/python-learning && cd ~/python-learning`. (`~` is your home folder. `mkdir -p` creates the folder and any missing parents; `cd` enters it.)
Check you're there with `pwd` - you should see the folder's full path.
Step 3: Create a virtual environment¶
This is the step that surprises beginners and prevents months of "why is my Python broken" pain.
Python projects often need third-party libraries. If you install them globally (system-wide), eventually two projects need different versions of the same library and the whole thing collapses. The solution: a virtual environment - a private, project-local Python install where you can put libraries without affecting anything else.
Create one: `python3 -m venv .venv`
That creates a folder called `.venv` (yes, the dot is intentional - it's a hidden folder by convention). Inside it is a private Python installation.
Activate it so your terminal uses it: `source .venv/bin/activate` on macOS/Linux, or `.venv\Scripts\Activate.ps1` in PowerShell on Windows.
Your terminal prompt should now have `(.venv)` at the front. That's how you know you're inside the virtual environment.
From now on:
- Always activate the venv before working (source .venv/bin/activate).
- When you're done, type deactivate to leave (rare; mostly you stay in).
- Each project gets its own .venv. Don't share.
If you forget to activate, your python commands will use the system Python - and pip install will try to install globally, sometimes failing, sometimes succeeding-and-causing-problems-later.
Step 4: The smallest possible Python program¶
Open your text editor (VS Code, or whatever you chose). Create a new file. Save it as hello.py inside the python-learning folder.
Type this - type it, don't copy-paste: `print("hello, world")`
That's the whole file. One line. Save it.
Step 5: Run it¶
Back in your terminal, with the venv active, run `python hello.py`. You should see `hello, world`.
That's your first program. Take a moment.
What just happened¶
You typed one line. print(...) is a built-in function that takes whatever you give it and prints it followed by a newline. "hello, world" is text in quotes - a string.
In a language like Go, the simplest program needs several lines of scaffolding (package, import, func main, braces). In Python it's one. Python trades structure for brevity. You'll feel both sides over the next pages.
Try changing things¶
The way to learn is to break things on purpose and see what happens. Try each:
- Change `"hello, world"` to your name. Run again.
- Add a second line that prints something else, e.g. `print("one more line")`.
- Try printing a number - no quotes: `print(42)`.
- Now break it on purpose. Remove the closing `)` and run. Read the error Python gives you. (Don't worry about understanding all of it - just notice that Python tells you which line.)
- Put the `)` back. Now mistype `print` as `Print` (capital P). Run. Read the error. Python is case-sensitive: `print` and `Print` are different names.
Reading errors is most of programming. Get comfortable seeing them.
The REPL - instant gratification¶
Python has a thing Go doesn't: a REPL (Read-Eval-Print Loop). Type lines, get results instantly. Try it: run `python` with no filename.
You'll see a `>>>` prompt. Type something, say `2 + 2`, and hit enter - you get `4` back immediately.
Press Ctrl-D (macOS/Linux) or Ctrl-Z then Enter (Windows) to exit.
The REPL is great for testing one-line ideas. "How does X behave?" → open REPL → try → see. Use it constantly.
What you might wonder¶
"Why do I need a virtual environment for a one-line program?"
You don't - yet. The habit pays off the first time you install a third-party library (page 11). Get used to seeing (.venv) in your prompt now, and you'll never get bitten by the cross-project version-conflict bug.
"Do I need to compile?" No. Python reads and runs your file line by line. There's no separate compile step. The trade: Python is slower than compiled languages (like Go or Rust) at runtime, but faster at "edit, run, see result."
"Are tabs or spaces required for the indentation?" You haven't seen indentation matter yet - but in Python, indentation defines the structure of your code. Most editors insert 4 spaces when you press Tab. Stick to that. The first time you mix tabs and spaces, Python will complain and you'll lose 30 minutes finding the issue.
"My terminal says python3 but examples use python. Which one?"
Inside an activated venv, python is your venv's Python - use it. Outside the venv, python3 is safer (some systems alias python to Python 2, which you don't want).
Done¶
You have:
- Python installed.
- A folder to put your code in.
- A virtual environment (activate it before any session).
- One working program.
- The REPL as a scratchpad.
This was the boring infrastructure step. Next page is where the real learning starts.
02 - First Real Program¶
What this session is¶
About 45 minutes. You'll learn three things: variables (storing data), types (kinds of data), and the f-strings Python uses for building text with values in it. By the end you'll have written a program that uses all three.
Why variables exist¶
Programs do things with data. To do things with data, you have to store it somewhere named, so you can refer to it later.
That's all a "variable" is: a name attached to a piece of data.
A small program with variables¶
Create a new file called greet.py. Type this in:
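Here is a minimal `greet.py` consistent with the line-by-line walkthrough that follows (the values are the ones the walkthrough uses):

```python
name = "Alice"   # a string variable
age = 30         # an int variable

# print takes multiple things, separated by commas
print(name, "is", age, "years old")
```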
Run it: `python greet.py`
You should see: `Alice is 30 years old`
What's new here¶
Two lines you haven't seen before: `name = "Alice"` and `age = 30`.
Let's unpack `name = "Alice"`:
- `name` is the name of a new variable.
- `=` is the assignment operator. Read it as "is set to."
- `"Alice"` is the value we're putting into it. The double quotes mean it's text (a "string").
So name = "Alice" reads as: "set name to the text Alice."
age = 30 does the same with a number. No quotes around 30 - numbers don't take quotes.
The last line: `print(name, "is", age, "years old")`
`print` can take multiple things separated by commas, and it prints them with spaces in between. We give it four things - the value of `name`, the text `"is"`, the value of `age`, and the text `"years old"`. Out comes one line with all four glued together by spaces.
Types: what kind of thing is this?¶
Every value in Python has a type. The type tells Python what kind of thing it is and what you can do with it.
You'll meet many types over time. The first four you need to know:
| Type | What it holds | Example values |
|---|---|---|
| `int` | whole numbers (positive, negative, or zero) | `0`, `42`, `-7`, `1000` |
| `float` | numbers with a decimal point | `3.14`, `-0.5`, `0.0` |
| `str` | text in quotes | `"hello"`, `''`, `"a long sentence"` |
| `bool` | one of two values | `True`, `False` (note capital T/F) |
Notice you didn't have to tell Python the type. Python figured it out from the value. This is called dynamic typing - types are tracked at runtime, not declared in source.
In the REPL you can ask:
```python
>>> type("hello")
<class 'str'>
>>> type(42)
<class 'int'>
>>> type(3.14)
<class 'float'>
>>> type(True)
<class 'bool'>
```
What you can do with numbers¶
The usual arithmetic works:
```python
x = 10
y = 3
print(x + y)   # 13
print(x - y)   # 7
print(x * y)   # 30
print(x / y)   # 3.3333... - true division (returns float)
print(x // y)  # 3 - integer division (drops remainder)
print(x % y)   # 1 - modulo (remainder)
print(x ** y)  # 1000 - exponentiation (10 to the 3rd)
```
That # 13 part is a comment - anything after # on a line is ignored by Python. Comments are how you leave notes for yourself in the code.
Two things to notice:
- Python has two division operators. `/` always returns a float (decimal); `//` does integer division (drops remainder). `10 / 3` is `3.333...`; `10 // 3` is `3`. Many other languages have only one division operator, which surprises beginners; Python's choice is friendlier.
- `**` is exponentiation. `x ** 2` is `x` squared.
What you can do with strings¶
You can stick two strings together with `+`: `"py" + "thon"` gives `"python"`.
The technical word for "stick two strings together" is concatenate.
You can repeat a string with `*`: `"ha" * 3` gives `"hahaha"`.
You cannot mix freely: `"age: " + 30` fails.
The error: `TypeError: can only concatenate str (not "int") to str`. The fix: convert the number to a string first. Two ways.
Way one - `str()`: `"age: " + str(30)` gives `"age: 30"`.
Way two - f-strings (the modern, preferred way): `f"age: {age}"`.
An f-string (formatted string literal) starts with `f` before the opening quote. Inside the string, anything in `{ }` is treated as Python code - its value gets inserted. You'll use f-strings constantly.
You can put expressions in the braces, not just variable names: `f"next year: {age + 1}"`.
And you can format numbers: `f"{price:.2f}"`. The `:.2f` after the variable means "format as a float with 2 decimal places." There are many such format codes; you don't need to memorize them - look them up when you need them.
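All of the string operations above, collected into one runnable sketch (the variable names and the specific values are illustrative, not from the original examples):

```python
word = "py" + "thon"               # concatenation
laugh = "ha" * 3                   # repetition

age = 30
try:
    bad = "age: " + age            # mixing str and int fails...
except TypeError as err:
    print(err)                     # can only concatenate str (not "int") to str

way_one = "age: " + str(age)       # fix 1: convert with str()
way_two = f"age: {age}"            # fix 2: an f-string
expr = f"next year: {age + 1}"     # expressions work inside the braces

price = 9.5
formatted = f"price: {price:.2f}"  # :.2f -> two decimal places

print(word, laugh, way_one, way_two, expr, formatted)
```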
What you can do with booleans¶
A bool is just True or False (note the capitals - Python is picky). You'll use them in decisions (next page). For now:
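A tiny taste (the variable names are my own illustration): comparisons produce bools, and you can store them like any other value.

```python
age = 20
is_adult = age >= 18    # a comparison produces a bool
print(is_adult)         # True
print(type(is_adult))   # <class 'bool'>
```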
Exercise¶
In a new file called me.py:
Write a program that:
- Has a variable for your name (a string).
- Has a variable for your favorite number (an int).
- Has a variable for whether it's morning right now (a bool).
- Prints a line like:
"Hi, I'm Victor, my favorite number is 7, and yes (True) it's morning."
Try it two ways:
- First with multiple arguments to print (print("Hi, I'm", name, ...)).
- Then with an f-string: print(f"Hi, I'm {name}, ...").
Don't skip this. The act of typing is the learning.
What you might wonder¶
"What's the difference between single and double quotes?"
None. 'hello' and "hello" are identical. Pick one and be consistent; switch when you need the other quote inside ("don't" is easier than 'don\'t').
"Why does 10 / 3 return a float?"
Python's designers picked the friendlier default: math behaves like math. If you want integer division, use //. Some other languages (Go, C) make / integer-only and surprise beginners by losing decimals; Python doesn't.
"What happens if I never use a variable I declared?"
Python doesn't complain. (Go does.) This is a small downside - you can have typo bugs (naem instead of name) that don't surface until you actually use the typo'd variable. Tools called "linters" (page 11) catch this for you.
"Can a variable hold different types over time?"
Yes - Python is dynamically typed. You can do x = 5 then later x = "hello" and Python doesn't mind. This is flexibility; it's also a source of bugs. The discipline is: pick a type per variable and stick with it. (We'll meet type hints in page 04 - Python's optional way to declare what type a variable holds.)
Done¶
You can now:
- Make variables and give them values.
- Tell apart the four basic types: int, float, str, bool.
- Do arithmetic, including Python's two division operators.
- Stick strings together with + and repeat with *.
- Build a string with f-strings and {expression} placeholders.
Next page: making your program decide and repeat.
03 - Decisions and Loops¶
What this session is¶
About an hour. You'll learn how to make your program decide between options (with if) and how to make it repeat something (with for and while). These two things are the building blocks of every program that does anything more than print a fixed message.
This page is also where you meet Python's defining feature: indentation matters. We'll come back to that.
Decisions with if¶
The world's smallest decision:
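A minimal version consistent with the walkthrough that follows:

```python
age = 18
if age >= 18:       # the condition; the colon ends the line
    print("adult")  # runs only when the condition is true
else:
    print("minor")  # runs when the condition was false
```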
Run it. You'll see adult.
Now change age = 18 to age = 15 and run again. You'll see minor.
What's happening:
- `if age >= 18:` - the colon ends the condition line. The indented block that follows runs only when the condition is true.
- `else:` - the indented block under it runs when the condition was false.
Notice: no curly braces, no end if. The indentation IS the block. The colon at the end of if/else is required.
Indentation is the syntax¶
This is the thing other-language programmers find weirdest about Python. There's no way around it: indentation defines structure.
A typical convention is 4 spaces per level. Most editors do this when you press Tab. Stick to 4 spaces; mix tabs and spaces and Python will (rightly) complain.
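A sketch consistent with the description that follows (the printed strings are my own illustrative choices):

```python
x = 5
if x > 0:
    print("x is positive")   # inside the if block (4 spaces in)
    print("still inside")    # also inside - only runs when x > 0
print("this always runs")    # outside the block - no indentation
```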
The two print calls under if are inside its block (4 spaces in). The third is outside (no indentation). Run it with x = 5 and x = -1 to see the difference.
You'll meet indentation everywhere from now on. After 30 minutes of writing Python, it'll feel natural.
Comparison operators¶
The operators that produce True/False:
| Operator | Meaning |
|---|---|
| `==` | equal to |
| `!=` | not equal to |
| `<` | less than |
| `<=` | less than or equal to |
| `>` | greater than |
| `>=` | greater than or equal to |
A common mistake: writing `=` (one equals sign) when you mean `==` (two). `=` is assignment; `==` is comparison. Python catches this immediately: `if x = 5:` isn't valid syntax, so you get an error instead of a silent bug.
Chaining decisions with elif¶
Python uses elif (short for "else if"):
```python
score = 75
if score >= 90:
    print("A")
elif score >= 80:
    print("B")
elif score >= 70:
    print("C")
else:
    print("F")
```
Reads top to bottom. First condition that's true wins; everything else is skipped. If none match, the else block runs.
Combining conditions: and, or, not¶
Python spells these as words, not symbols:
| Operator | Meaning | Example |
|---|---|---|
| `and` | both true | `age >= 18 and has_license` |
| `or` | at least one true | `is_weekend or is_holiday` |
| `not` | flip true/false | `not is_ready` |
(Other languages use &&, ||, !. Python's word form is more readable.)
Truthy and falsy values¶
A useful Python quirk: many values can be tested directly with if, without an explicit comparison.
```python
name = ""
if name:   # an empty string is treated as False
    print(f"hello, {name}")
else:
    print("name is empty")
```
These count as falsy:
- False
- None (Python's "nothing" value)
- 0, 0.0
- "" (empty string)
- [] (empty list), {} (empty dict), () (empty tuple), set() (empty set)
Everything else is truthy. So if my_list: is the idiomatic way to say "if my_list has any items." if name: means "if name is non-empty." You'll see this everywhere in Python code.
Repetition: for (over things)¶
Python's for loop is for iterating over a collection:
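A minimal example matching the explanation that follows (the specific fruit strings are my own illustration):

```python
fruits = ["apple", "banana", "cherry"]  # a list - note the [ ... ]
for fruit in fruits:
    print(fruit)   # fruit takes each value in turn
```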
Output: each item in the list, one per line.
fruit is the loop variable - each time through, it takes the next value from fruits. We'll meet lists properly in page 06; for now, recognize [ ... ] as a list.
To iterate a fixed number of times, use `range`: `for i in range(5): print(i)`
Output: 0, 1, 2, 3, 4. (`range(5)` produces 0, 1, 2, 3, 4 - five numbers starting at 0.)
You can also specify start and stop:
```python
for i in range(1, 6):       # 1, 2, 3, 4, 5
    print(i)

for i in range(0, 10, 2):   # 0, 2, 4, 6, 8 (step of 2)
    print(i)
```
Repetition: while (until)¶
while loops keep going while a condition stays true:
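A minimal countdown consistent with the output described just below:

```python
n = 10
while n > 0:     # keep going while this stays true
    print(n)
    n = n - 1    # without this line the condition never changes: infinite loop
```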
This prints 10, 9, 8, ..., 1. The body has to do something that changes the condition, otherwise the loop runs forever. (Forever loops are sometimes useful - for waiting on events - and you exit them with break.)
Breaking out: break and continue¶
break stops the loop entirely. continue skips to the next iteration.
```python
for i in range(1, 11):
    if i == 5:
        break      # stop the whole loop when i is 5
    if i % 2 == 0:
        continue   # skip the print for even numbers
    print(i)
```
Output: 1, 3. Why?
- i=1: not 5, not even → print 1.
- i=2: not 5, even → continue (skip print).
- i=3: not 5, not even → print 3.
- i=4: even → skip.
- i=5: break → stop entirely.
Putting it together¶
A program that classifies numbers 1 to 10:
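One plausible version of such a program, built only from constructs on this page (the exact labels are my own choices):

```python
for n in range(1, 11):
    if n % 2 == 0:
        parity = "even"
    else:
        parity = "odd"
    if n < 5:
        size = "small"
    else:
        size = "big"
    print(n, parity, size)
```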
Type and run. Read the output. Read the code. Look at each line and ask: which line produced this output?
Exercise¶
In a new file called classify.py:
Write the classic FizzBuzz. For each number from 1 to 20:
- If divisible by 3, print
Fizzinstead of the number. - If divisible by 5, print
Buzzinstead. - If divisible by both 3 and 5, print
FizzBuzz. - Otherwise print the number.
Hint: check the "both 3 and 5" case first. Why? Think about what would happen if you checked "divisible by 3" first.
Expected output starts: 1, 2, Fizz, 4, Buzz, Fizz, 7, 8 - one item per line.
Don't move on until your program prints exactly the right thing.
What you might wonder¶
"Why no parentheses around the condition?"
Python's grammar doesn't need them. if (age >= 18): works (the parens are just grouping) but feels noisy. The convention is no parens unless you need them for grouping a complex expression.
"Why and/or/not instead of &&/||/!?"
Python's design favors readability. Symbols save keystrokes; words read aloud better. Python has &, |, ~ too - but those are bitwise operators (for working with binary numbers), not logical operators. Don't mix them up.
"What's None?"
Python's "no value" - like null in other languages. Used to signal "this variable hasn't been set" or "this function had no result to return." We'll meet it again in pages 04 and 07.
"What about switch/case statements?"
Python didn't have them for decades. As of 3.10, there's match/case (called "structural pattern matching"). For beginners, if/elif/else chains do the same job and are clearer. We'll see match if it comes up in real code.
Done¶
You can now:
- Make a program take different actions based on conditions (if, elif, else).
- Combine conditions with and, or, not.
- Recognize truthy and falsy values (if my_list: idiom).
- Iterate a collection with for x in things:.
- Iterate a fixed number of times with for i in range(...).
- Repeat until a condition fails with while.
- Exit a loop early with break, skip an iteration with continue.
- Read and respect Python's indentation rules.
You now have the basic shapes that every program is built from. The next page is the abstraction that lets your programs stay short as they get bigger: functions.
04 - Functions¶
What this session is¶
About an hour. You'll learn how to define your own functions, default arguments, keyword arguments, return values, and a glimpse of type hints - Python's optional way to declare what types your functions expect and return. By the end you can break programs into named pieces.
The problem functions solve¶
So far every program has been a single block of statements. That works for tiny programs. Once you get past 30-40 lines, it stops working - you can't see structure, you can't reuse anything, and you can't test pieces in isolation.
A function is a named, reusable block of code that takes some input and (usually) returns some output.
The shape¶
In general: `def name(parameters):` followed by an indented body, usually ending in `return`. Concrete:
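The function the walkthrough below takes apart:

```python
def double(x):      # define a function named double with one parameter
    return x * 2    # send the result back to the caller

print(double(5))    # 10
```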
Type and run.
Walk through:
- `def double(x):` - defines a function called `double` with one parameter `x`. The colon ends the signature; the indented body follows.
- `return x * 2` - compute `x * 2` and send it back to whoever called the function.
- `double(5)` - call the function. The result is `10`.
def is short for "define." The function exists after the def runs, ready to be called.
Multiple parameters¶
Parameters are separated by commas.
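A two-parameter sketch (the function name `add` is my own illustration):

```python
def add(a, b):       # two parameters, separated by a comma
    return a + b

print(add(2, 3))     # 5
```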
Default values¶
A parameter can have a default - used when the caller doesn't supply one:
```python
def greet(name, greeting="hello"):
    return f"{greeting}, {name}"

print(greet("Alice"))                  # hello, Alice
print(greet("Alice", "hi"))            # hi, Alice
print(greet("Alice", greeting="hey"))  # hey, Alice
```
The third call uses a keyword argument - naming the parameter explicitly. Useful when a function has many parameters; the call site is self-documenting.
Default values must come after any non-default parameters: def f(a, b=2): is fine; def f(a=1, b): is a syntax error.
The trap
Don't use mutable defaults like `def f(items=[]):`. The list is created once, when the function is defined, and shared between all calls - modifying it changes the default for all future callers. The universal advice: use `def f(items=None):` and, inside the function, `if items is None: items = []`. This is the classic Python beginner bug.
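A demonstration of the trap and the fix (the function names `append_bad`/`append_good` are my own illustration):

```python
def append_bad(item, items=[]):     # BUG: the default list is created once and shared
    items.append(item)
    return items

def append_good(item, items=None):  # fix: default to None, make a fresh list per call
    if items is None:
        items = []
    items.append(item)
    return items

print(append_bad(1))   # [1]
print(append_bad(2))   # [1, 2] - the 1 from the previous call is still there
print(append_good(1))  # [1]
print(append_good(2))  # [2]
```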
Functions that don't return anything¶
If you don't write a return, the function returns None (Python's "nothing" value):
```python
def say_hi(name):
    print(f"Hi, {name}")

result = say_hi("Alice")  # prints, but returns nothing
print(result)             # None
```
Type hints (optional, increasingly standard)¶
Python lets you annotate the types of parameters and return values. These are hints - Python doesn't enforce them at runtime - but tools (mypy, pyright, IDE inspections) catch type bugs before you run.
```python
def double(x: int) -> int:
    return x * 2

def greet(name: str, greeting: str = "hello") -> str:
    return f"{greeting}, {name}"
```
Reading the syntax:
- `x: int` - parameter `x` is an `int`.
- `-> int` after the parameter list - the function returns an `int`.
You can use type hints from the start (recommended) or never (also fine - old code rarely has them). Modern Python projects use them. We'll use them lightly in this path; you'll get used to seeing them.
Returning multiple values¶
In some languages (Go) functions can return multiple values. In Python they "can't" - but they can return a tuple (an ordered group), which the caller can unpack:
```python
def divide(a: int, b: int) -> tuple[int, int]:
    quotient = a // b
    remainder = a % b
    return quotient, remainder

q, r = divide(17, 5)
print(f"quotient: {q}, remainder: {r}")  # quotient: 3, remainder: 2
```
The return quotient, remainder actually creates a 2-tuple (quotient, remainder). The q, r = ... on the receiving end unpacks it back into two variables. You'll see this pattern often.
Functions calling functions¶
```python
def square(x: int) -> int:
    return x * x

def sum_of_squares(a: int, b: int) -> int:
    return square(a) + square(b)

print(sum_of_squares(3, 4))  # 9 + 16 = 25
```
Functions calling functions is how programs get built up - small named pieces, composed.
Why functions matter¶
- Naming. `double(7)` reads better than re-typing `7 * 2`, especially when the operation is more complex.
- Reuse. Write once, call many times.
- Testing. You can test `double` by itself, separately from the rest of the program (page 10).
- Structure. Reading a 500-line script is awful. Reading 20 small named functions tells you what the program does at a glance.
Variables inside vs outside¶
A variable created inside a function exists only inside that function. The technical word is scope.
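A small sketch of scope (the names are my own illustration):

```python
def f():
    inside = "local"   # exists only while f runs
    return inside

print(f())             # local

try:
    print(inside)      # the name doesn't exist out here
except NameError:
    print("inside is not visible outside f")
```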
Each function has its own world. Get information in via parameters; get information out via return values.
(There's a way to share variables across functions called "global" state - generally avoided. Pass values explicitly.)
Exercise¶
In a new file iseven.py:
- Write a function `is_even(n: int) -> bool` that returns `True` if `n` is even, `False` otherwise. Use the `%` operator.
- Print `is_even(4)` and `is_even(7)`. You should see `True` and `False`.
- Write a function `count_evens(max: int) -> int` that counts the even numbers in `1, 2, ..., max`. Use a `for` loop and call `is_even`.
- Print `count_evens(10)`. Expected: `5`.
- Print `count_evens(100)`. Expected: `50`.
- Now write a function `greet(name, greeting="hello")` with a default argument. Call it as `greet("Alice")` and `greet("Alice", "hey")`.
What you might wonder¶
"What's *args and **kwargs?"
A way to accept any number of positional or keyword arguments. def f(*args, **kwargs): - args becomes a tuple of all extra positional args; kwargs becomes a dict of extra named args. You'll see them constantly in framework code. For your own functions, prefer named parameters when you know what you're accepting.
"Are type hints required?" No. Old code rarely has them. New code increasingly does. They help tools catch bugs and make code self-documenting. Use them when you can.
"What if I don't write a return?"
The function returns None. Calling f() and assigning the result gives you None.
"Can a function call itself?" Yes - that's recursion. A useful tool for certain problems (tree traversal, divide-and-conquer). We'll meet a use case later.
"Why is the mutable-default thing a trap?"
Because of when the default value is created. Python evaluates default expressions once, when def runs. The list survives across calls; every call modifying it sees changes from previous calls. It's surprising. Use None as the default and create the list inside.
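The trap and its fix side by side - a minimal sketch with hypothetical append_bad/append_good helpers:

```python
def append_bad(item, items=[]):      # the trap: ONE list, created when def runs
    items.append(item)
    return items

print(append_bad("a"))  # ['a']
print(append_bad("b"))  # ['a', 'b'] - the previous call's item is still there

def append_good(item, items=None):   # the fix: create a fresh list per call
    if items is None:
        items = []
    items.append(item)
    return items

print(append_good("a"))  # ['a']
print(append_good("b"))  # ['b']
```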
Done¶
You can now:
- Define your own functions with def.
- Use default and keyword arguments.
- Optionally add type hints.
- Return zero, one, or many values (via tuple unpacking).
- Call functions from other functions.
- Avoid the mutable-default trap.
You now have all the fundamentals: variables, types, control flow, functions. Every Python program is built from these. The next pages extend the toolkit: classes (your own types), collections (lists/dicts/sets), errors, iterators, files, tests, and packages.
05 - Classes¶
What this session is¶
About an hour. You'll learn how to define your own types in Python using classes. By the end you'll know how to bundle data together, attach behavior to it, and you'll see Python's self, __init__, and the slightly newer @dataclass shortcut.
The problem this solves¶
Every variable so far has held one value - one int, one str. Real things have many properties at once: a person has a name, an age, a city. You could pass each property as a separate parameter:
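For example, a hypothetical describe function taking each property as its own parameter (the name and fields here are illustrative):

```python
# Every property passed separately - manageable at three, painful at twelve
def describe(name, age, city):
    return f"{name} is {age} and lives in {city}"

print(describe("Alice", 30, "Lagos"))  # Alice is 30 and lives in Lagos
```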
That works for two or three properties. At six, you're sad. At twelve, you're lost. A class lets you bundle them.
A class¶
class Person:
def __init__(self, name, age, city):
self.name = name
self.age = age
self.city = city
alice = Person("Alice", 30, "Lagos")
print(alice.name) # Alice
print(alice.age) # 30
print(alice.city) # Lagos
Type and run.
What's new:
- class Person: - defines a new class called Person. The convention is CamelCase for class names.
- def __init__(self, name, age, city): - a special method called when you create a Person. Its job is to set up the new object. The double-underscore name (__init__) is convention for "called by Python machinery, not by you directly."
- self - the first parameter of every method. Refers to the object being operated on. You don't pass it explicitly when calling; Python supplies it.
- self.name = name - set the new object's name attribute to the value passed in.
- Person("Alice", 30, "Lagos") - create a new Person. Python calls __init__ for you, passing your arguments. The result is the newly-built object.
- alice.name - read an attribute. (Setting works the same way: alice.name = "Alicia".)
Methods¶
A method is a function defined inside a class. Like __init__, it takes self as the first parameter:
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def greet(self):
return f"Hi, I'm {self.name}"
def birthday(self):
self.age += 1
alice = Person("Alice", 30)
print(alice.greet()) # Hi, I'm Alice
alice.birthday()
print(alice.age) # 31
Methods can read and modify self's attributes. birthday() mutates alice in place - no need to return anything.
When you call alice.greet(), Python implicitly passes alice as self. You write greet(self); you call alice.greet(). Don't get tripped up by this.
A useful trick: __repr__¶
If you print an object without overriding anything, you get something ugly: <__main__.Person object at 0x10502f140>. Useless.
Define __repr__ to make it print nicely:
class Person:
def __init__(self, name, age):
self.name = name
self.age = age
def __repr__(self):
return f"Person(name={self.name!r}, age={self.age})"
alice = Person("Alice", 30)
print(alice) # Person(name='Alice', age=30)
!r in an f-string calls repr() on the value - which for strings adds quotes. Useful for debug output.
(There's also __str__, used by str(obj). If you only define __repr__, it's used for both. Define __repr__ first; add __str__ only if you need a different "friendly" form.)
A modern shortcut: @dataclass¶
The class above has a lot of boilerplate. Almost every class with data in it starts the same way: take values in __init__, store as attributes, add __repr__. Python has a built-in shortcut: @dataclass.
from dataclasses import dataclass
@dataclass
class Person:
name: str
age: int
city: str = "unknown"
alice = Person("Alice", 30, "Lagos")
print(alice) # Person(name='Alice', age=30, city='Lagos')
print(alice.name) # Alice
The @dataclass decorator (a thing applied to a class - page 08 explains decorators) auto-generates __init__, __repr__, and equality (==). You declare the fields as class-level annotations with optional defaults.
Modern Python code uses @dataclass heavily for "things with data and not much else." Reach for it before writing a class with a long __init__ of self.x = x lines.
You can still add methods normally:
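For instance, the same Person dataclass with an ordinary method added (is_adult is a hypothetical example):

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int
    city: str = "unknown"

    def is_adult(self) -> bool:   # a normal method, defined alongside the fields
        return self.age >= 18

alice = Person("Alice", 30, "Lagos")
print(alice.is_adult())  # True
```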
Inheritance (briefly)¶
Classes can inherit from other classes - pick up their attributes and methods. Used heavily in some codebases, sparingly in others.
class Animal:
def __init__(self, name):
self.name = name
def speak(self):
return "(generic animal sound)"
class Dog(Animal):
def speak(self):
return f"{self.name} says woof"
class Cat(Animal):
def speak(self):
return f"{self.name} says meow"
for pet in [Dog("Rex"), Cat("Whiskers")]:
print(pet.speak())
Output:
Rex says woof
Whiskers says meow
class Dog(Animal): means "Dog is an Animal, plus customizations." Dog inherits __init__ from Animal (so we didn't need to define it). The speak method in Dog overrides the one in Animal.
Modern Python advice: prefer composition over inheritance. Inheritance is a tight coupling that bites later. Use it when the relationship is naturally "is-a" (a Dog IS an Animal); reach for "has-a" (a Garage HAS a Car) by storing instances as attributes instead.
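A minimal composition sketch (Garage and Car are illustrative classes, not from this page's earlier examples):

```python
class Car:
    def __init__(self, model):
        self.model = model

class Garage:
    def __init__(self):
        self.cars = []        # "has-a": instances stored as an attribute

    def add(self, car):
        self.cars.append(car)

g = Garage()
g.add(Car("Corolla"))
print(len(g.cars))            # 1
```

No subclassing needed: a Garage isn't a kind of Car, it just holds them.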
Exercise¶
In a new file shapes.py:
1. Define a class Rectangle with two attributes: width and height.
2. Add an __init__ taking both as parameters.
3. Add an area() method returning width * height.
4. Add a perimeter() method returning 2 * (width + height).
5. Create a Rectangle(5, 3). Print its area and perimeter. Expected: 15 and 16.
6. Now rewrite it as a @dataclass - should be ~5 lines.
7. Stretch: add a __repr__ (or let @dataclass give you one). Print a rectangle; confirm the output is readable.
8. Stretch: write a function larger_of(a, b) that returns whichever rectangle has the bigger area. Test with two rectangles.
What you might wonder¶
"Why self? Other languages use this."
Convention from the language's first design (1991). The Python community settled on self; you'll see it everywhere. You can name it differently - this, obj, anything - but don't. Sticking to self is one of the strongest conventions in Python.
"What's __init__ vs __new__?"
__init__ initializes an already-created object. __new__ actually creates the object. You will essentially never write __new__. Forget it for now.
"What if I don't write __init__?"
You get a default one that takes no arguments. You can still set attributes after creation: p = Person(); p.name = "Alice". Useful sometimes; less clear than constructor-injection.
"Are there private attributes?"
Not enforced. By convention, attributes starting with _ (one underscore) are "internal - don't touch from outside." Attributes starting with __ (two underscores) get name-mangled to discourage external access. Python's philosophy: "we're all consenting adults" - the convention is a contract, not a wall.
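A quick demonstration of both conventions (Account is a hypothetical class):

```python
class Account:
    def __init__(self):
        self._internal = 1    # one underscore: "internal - don't touch" by convention
        self.__secret = 2     # two underscores: name-mangled to _Account__secret

a = Account()
print(a._internal)            # 1 - nothing stops you; it's just a convention
print(a._Account__secret)     # 2 - mangling discourages access, doesn't prevent it
# print(a.__secret)           # AttributeError - the unmangled name doesn't exist
```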
"Should I use @dataclass everywhere?"
For "things with data and minor logic" - yes. For things with significant behavior, or that need custom validation, or that don't quite fit the dataclass mold - regular classes are fine. Mixing both in a project is normal.
Done¶
You can now:
- Define your own types with class.
- Use __init__ to set up new objects.
- Read and write attributes via self.x.
- Define methods that operate on self.
- Add __repr__ for useful debug output.
- Use @dataclass for the common "bundle of data" case.
- Know that inheritance exists; prefer composition.
You can now model real things - people, accounts, points, rectangles, anything with structure. Combined with what came before, you can write programs that work with non-trivial domains.
Next page: how Python handles collections - many things at once.
06 - Collections¶
What this session is¶
About an hour. You'll learn Python's four built-in collection types: lists, tuples, dictionaries (dicts), and sets. Each is good at different things. Real Python code uses all four constantly.
Lists: ordered, mutable¶
fruits = ["apple", "banana", "cherry"]
print(fruits) # ['apple', 'banana', 'cherry']
print(fruits[0]) # apple - lists are 0-indexed
print(fruits[2]) # cherry
print(len(fruits)) # 3
What's new:
["apple", "banana", "cherry"]- a list. Created with square brackets, comma-separated.fruits[0]- read the first element. Indexing starts at 0.len(fruits)- built-in function for the number of elements.- Negative indices count from the end:
fruits[-1]ischerry,fruits[-2]isbanana.
Common mistake: going off the end. fruits[10] raises IndexError. Always know how many elements you have.
Lists grow and shrink¶
fruits = ["apple", "banana"]
fruits.append("cherry") # add to end
fruits.insert(0, "apricot") # insert at index
fruits.remove("banana") # remove by value
last = fruits.pop() # remove and return last item
print(fruits) # ['apricot', 'apple']
print(last) # cherry
Lists are mutable - they change in place. fruits.append(...) modifies fruits; it doesn't return a new list. (Returns None, in fact.)
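A quick check worth running once, since assigning the result of append is a classic beginner trip-up:

```python
fruits = ["apple", "banana"]
result = fruits.append("cherry")   # append mutates fruits in place...
print(result)                      # None - it does NOT return the new list
print(fruits)                      # ['apple', 'banana', 'cherry']
```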
Slicing¶
A powerful Python feature - take a slice of a list:
nums = [10, 20, 30, 40, 50]
print(nums[1:4]) # [20, 30, 40] - start inclusive, stop exclusive
print(nums[:3]) # [10, 20, 30] - start defaults to 0
print(nums[2:]) # [30, 40, 50] - stop defaults to end
print(nums[::2]) # [10, 30, 50] - every other element
print(nums[::-1]) # [50, 40, 30, 20, 10] - reversed
Slicing returns a new list; the original is untouched. Works on strings too (a string is a sequence of characters).
Iterating a list¶
for fruit in fruits:
    print(fruit)
If you need the index too:
for i, fruit in enumerate(fruits):
    print(i, fruit)
enumerate yields (index, value) pairs. The shape for x, y in something is tuple unpacking (page 04) - Python takes the 2-element tuple and unpacks it into two names.
Tuples: ordered, immutable¶
point = (3, 4)
print(point[0]) # 3
print(point[1]) # 4
# point[0] = 99 # ERROR - tuples can't be modified
A tuple is a list that can't change. Created with parentheses (or just commas - point = 3, 4 is the same tuple).
Why bother? Three reasons:
1. Communicates "this won't change." A function returning (width, height) is signaling "you can rely on these two values together."
2. Usable as dictionary keys (lists aren't, because they could change underneath the hash).
3. Slightly faster than lists for fixed-size data.
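Reason 2 in action - a hypothetical grid keyed by coordinate tuples:

```python
# Tuples work as dict keys because they can't change after creation
grid = {(0, 0): "origin", (3, 4): "point"}
print(grid[(3, 4)])   # point
# grid[[3, 4]]        # TypeError - a list isn't hashable, so it can't be a key
```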
You've seen tuples already: returning multiple values from a function (page 04) returns a tuple.
Tuple unpacking:
x, y = point
print(x)  # 3
print(y)  # 4
Dictionaries: lookups by key¶
A dict maps keys to values. Created with {key: value, ...}. Keys can be strings, numbers, tuples (anything hashable - immutable types). Values can be anything.
ages = {"Alice": 30, "Bob": 25}
print(ages["Alice"])   # 30
ages["Chioma"] = 35    # add or update a key
Check whether a key is there:
print("Alice" in ages)  # True
print("Dave" in ages)   # False
The safe lookup (no crash on missing key):
print(ages.get("Dave"))      # None
print(ages.get("Dave", 0))   # 0 - default if the key is missing
Iterate:
for name in ages:
print(name, ages[name])
for name, age in ages.items(): # name + value at once
print(name, age)
for age in ages.values(): # just values
print(age)
Modern Python (3.7+): dicts preserve insertion order. The order you put items in is the order you get them out.
Sets: unique, unordered¶
letters = {"a", "b", "c", "a"} # duplicate dropped
print(letters) # {'a', 'b', 'c'} - order varies
print(len(letters)) # 3
A set is an unordered collection of unique values. Created with { ... } (or set() for empty - {} makes an empty dict, not an empty set, because braces had to mean something).
What sets are good at:
- Membership checks ("a" in letters) - much faster than scanning a list when the collection is large.
- De-duplication: set(my_list) gives you the unique values.
- Set math: a | b (union), a & b (intersection), a - b (difference), a ^ b (symmetric difference).
weekday = {"mon", "tue", "wed", "thu", "fri"}
busy = {"mon", "wed", "fri"}
free = weekday - busy
print(free) # {'thu', 'tue'} - order varies
Quick comparison¶
| Type | Syntax | Ordered? | Mutable? | Duplicates? | Use when |
|---|---|---|---|---|---|
| list | [1, 2, 3] | yes | yes | yes | ordered collection, will change |
| tuple | (1, 2, 3) | yes | no | yes | fixed group, won't change |
| dict | {"a": 1} | yes (3.7+) | yes | keys unique | lookup by key |
| set | {1, 2, 3} | no | yes | no | membership, unique values |
Nested collections¶
You can put any type in any collection:
people = [
{"name": "Alice", "age": 30},
{"name": "Bob", "age": 25},
]
print(people[0]["name"]) # Alice
A list of dicts is the most common shape - close to a JSON array of objects, which it often is.
Exercise¶
In a new file wordcount.py:
Write a program that counts how many times each word appears in a sentence.
1. Hardcode this sentence: "the quick brown fox jumps over the lazy dog the end".
2. Split it into words: .split() on a string with no arguments splits on whitespace, returning a list of strings.
3. Build a dict counts mapping each word to how many times it appeared.
4. Print each word and its count, one per line.
Expected output (order may differ since dict iteration is insertion-order):
Stretch: Sort the output by count (most-frequent first). Use:
sorted(counts.items(), key=lambda item: item[1], reverse=True)
We'll explain lambda properly in a later page; for now, that's "sort by the second element of each tuple, biggest first."
What you might wonder¶
"Why is {} an empty dict and set() an empty set?"
Historical accident - dicts predate sets in Python. {} was already taken. Live with it.
"What's the difference between a tuple and a list?" Tuples can't change after creation; lists can. Use tuples for "this is one immutable group of related things" (like a coordinate); lists for "an evolving collection of items."
"Can dict keys be lists?"
No - keys must be hashable, which essentially means immutable. Strings, numbers, tuples, frozen sets - yes. Lists, dicts, sets - no. Trying to use a list as a key raises TypeError.
"Why preserve insertion order in dicts? Other languages don't." Python's BDFL (Benevolent Dictator) was convinced after CPython's implementation incidentally preserved order in 3.6. Made official in 3.7. It's now relied on heavily, so it's permanent.
"What's a frozenset?"
An immutable set. Usable as a dict key (since it's hashable, like a tuple). Rare; mention it for recognition.
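For recognition, a tiny sketch (the vowels example is illustrative):

```python
# frozenset: an immutable set - hashable, so it can be a dict key
vowels = frozenset({"a", "e", "i", "o", "u"})
groups = {vowels: "vowel letters"}   # a plain set here would raise TypeError
print(groups[vowels])                # vowel letters
```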
Done¶
You can now:
- Build ordered lists; index, slice, append, remove.
- Build immutable tuples; unpack them.
- Build dicts; set, get, check membership, iterate by keys/values/items.
- Build sets; do membership checks, set operations, de-duplication.
- Pick the right collection for the access pattern.
Collections are most of what Python code does - slicing data, looking it up, transforming it. You now have the basic vocabulary.
Next page: how Python handles things going wrong - exceptions.
07 - Errors and Exceptions¶
What this session is¶
About an hour. You'll learn how Python handles things going wrong - files that don't exist, numbers that can't be parsed, keys missing from dicts. Python's model is exceptions: when something goes wrong, the language raises an exception that flies up the call stack until someone catches it.
Heads-up if you've come from a Go-style language: Python is the opposite. Functions don't return error values; they raise exceptions. The discipline is different.
A small example¶
n = int("hello")
Run it. Python prints something like:
Traceback (most recent call last):
File "test.py", line 1, in <module>
n = int("hello")
~~~^^^^^^^^^
ValueError: invalid literal for int() with base 10: 'hello'
ValueError is the type of exception. The error message says what happened. The "Traceback" shows the call chain that led to it. Read tracebacks bottom-up - the last line is what actually failed; the lines above show how you got there.
If nothing catches the exception, the program crashes.
Catching exceptions: try/except¶
try:
n = int(input("Enter a number: "))
print(f"You entered {n}")
except ValueError:
print("That wasn't a number.")
How it reads:
- Try the code in the try: block.
- If it raises ValueError, jump to the except ValueError: block.
- If no exception, skip the except.
Run it. Enter 42 - get "You entered 42". Run again, enter hello - get "That wasn't a number."
Catching multiple exception types¶
try:
risky_thing()
except ValueError:
handle_bad_value()
except (FileNotFoundError, PermissionError):
handle_io_problem()
except Exception as e:
print(f"Something else went wrong: {e}")
Patterns:
- Multiple except clauses - first match wins (top to bottom).
- Group exceptions in a tuple: except (A, B):.
- as e captures the exception object - you can print it, log it, inspect it.
- Exception is the catch-all. Use sparingly (you might swallow bugs you'd rather see).
The trap
Don't write a bare except: (no exception type). It also catches KeyboardInterrupt (Ctrl-C) and SystemExit - which means your program won't exit cleanly and Ctrl-C won't kill it. If you need a catch-all, write except Exception: at minimum.
The full shape: try/except/else/finally¶
try:
f = open("data.txt")
except FileNotFoundError:
print("file not found")
else:
# runs only if try succeeded with no exception
contents = f.read()
f.close()
finally:
# runs no matter what (success, handled exception, unhandled exception)
print("cleaning up")
In practice you'll mostly write try/except, sometimes try/except/finally. The else clause is occasionally useful - it lets you separate "the success path" from the try block clearly.
with statements: the modern way to clean up¶
The try/finally for "open a resource, use it, close it" gets old. Python's context managers + with statement automate the cleanup:
with open("data.txt") as f:
contents = f.read()
## file is automatically closed here, even if an exception occurred
Reads as: "open data.txt as f, use it inside, automatically close when leaving the block." Any object that supports the with protocol (called a "context manager") works. You'll see it for files, network connections, database transactions, locks. Use with whenever you have a "must clean up after" resource.
Common exception types¶
You'll meet these often:
| Exception | Meaning |
|---|---|
| ValueError | wrong value (e.g., int("hello")) |
| TypeError | wrong type (e.g., len(42)) |
| KeyError | dict key missing |
| IndexError | list index out of range |
| AttributeError | object doesn't have that attribute |
| FileNotFoundError | file doesn't exist |
| ZeroDivisionError | divided by zero |
| RuntimeError | generic "something went wrong" |
| NotImplementedError | placeholder for "I haven't written this yet" |
| Exception | base of all the above (catch-all) |
When in doubt, look at the traceback's last line - it names the type.
Raising your own exceptions¶
You can throw an exception with raise:
def withdraw(balance, amount):
if amount > balance:
raise ValueError(f"can't withdraw {amount} from {balance}")
return balance - amount
new_balance = withdraw(100, 200) # raises ValueError
Use existing exception types when they fit (ValueError, TypeError, KeyError). Create your own when you want callers to catch this specific kind of failure:
class InsufficientFundsError(Exception):
pass
def withdraw(balance, amount):
if amount > balance:
raise InsufficientFundsError(f"can't withdraw {amount} from {balance}")
return balance - amount
try:
withdraw(100, 200)
except InsufficientFundsError as e:
print(f"transaction declined: {e}")
Custom exception classes inherit from Exception. The pass body means "no extra code; just be a distinct type."
Re-raising¶
Sometimes you want to catch, do something, then re-raise:
try:
risky()
except ValueError as e:
log.error(f"value error in risky(): {e}")
raise # re-raise the same exception, preserving the traceback
A bare raise inside an except block re-raises. Use when you want to add logging/context but not change the failure path.
EAFP vs LBYL¶
A Python idiom worth knowing: EAFP - "Easier to Ask Forgiveness than Permission."
The contrast: LBYL ("Look Before You Leap") - check preconditions, then do.
## LBYL - check first
if "Alice" in ages:
print(ages["Alice"])
else:
print("not found")
## EAFP - try, catch failure
try:
print(ages["Alice"])
except KeyError:
print("not found")
Both work. The Pythonic preference is EAFP for most cases - it's faster (no double lookup) and handles race conditions better (the key might disappear between the check and the use). LBYL is fine when the check is cheap and clearer to read.
A real example: safely parse an integer¶
def parse_positive(s: str) -> int:
"""Parse s as a positive integer. Raise ValueError on failure."""
try:
n = int(s)
except ValueError:
raise ValueError(f"not a number: {s!r}")
if n <= 0:
raise ValueError(f"must be positive: {n}")
return n
Notice: we catch ValueError, then raise our own ValueError with a better message. The caller still sees a ValueError - but with context that the underlying int("hello") failure didn't have.
Exercise¶
In a new file parse.py:
1. Write parse_positive(s: str) -> int (above).
2. In the main script, loop over these inputs and call parse_positive on each: "42", "hello", "-5", "0", "100". For each, print either the parsed number or the error message.
3. Use try/except. Print like: 42 -> 42 for success, hello -> error: not a number: 'hello' for failure.
Expected output:
42 -> 42
hello -> error: not a number: 'hello'
-5 -> error: must be positive: -5
0 -> error: must be positive: 0
100 -> 100
- Stretch: write a custom exception BadInputError(Exception). Have parse_positive raise it instead of ValueError. Update the loop's except to catch BadInputError.
What you might wonder¶
"Should I always wrap things in try/except, just in case?"
No. The Pythonic approach is to catch only what you can meaningfully recover from. try: x = int(input()) except ValueError: prompt_again() makes sense. try: x = 1 + 1 except Exception: is just noise - there's nothing to recover from. Let unexpected exceptions propagate and crash the program loudly; that's how you find bugs.
"What about assert?"
assert condition raises AssertionError if the condition is false. Useful for "this should never happen - if it does, fail loudly so I notice." Not for input validation - assertions can be disabled with python -O (optimizations on), and you don't want validation to disappear in production.
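A sketch of that "should never happen" use (the average function is hypothetical):

```python
def average(nums):
    # Internal sanity check, not input validation - callers should never
    # pass an empty list; if they do, fail loudly so the bug is visible.
    assert len(nums) > 0, "average of empty list"
    return sum(nums) / len(nums)

print(average([2, 4, 6]))  # 4.0
```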
"Why does Python use exceptions instead of error values like Go does?" Design choice. Exceptions hide control flow (a function call may secretly jump to a handler 10 frames up). The trade: less code in the happy path, but harder to see all failure paths. Different philosophies - Python's been exceptions-first since 1991. You learn it.
"What's the traceback chain thing I sometimes see?"
If an exception happens inside an except block, Python prints both. The default link is "during handling of the above exception, another exception occurred." Useful for debugging cascading failures. You can also explicitly chain with raise NewError(...) from old_error.
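A sketch of explicit chaining, using a hypothetical ConfigError and load_port:

```python
class ConfigError(Exception):
    pass

def load_port(raw):
    try:
        return int(raw)
    except ValueError as e:
        # "from e" links the new error to the original one in the traceback
        raise ConfigError(f"bad port value: {raw!r}") from e

try:
    load_port("eighty")
except ConfigError as e:
    print(e)                           # bad port value: 'eighty'
    print(type(e.__cause__).__name__)  # ValueError - the original, via __cause__
```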
Done¶
You can now:
- Recognize exceptions and read tracebacks (bottom-up).
- Catch exceptions with try/except.
- Use try/except/else/finally correctly.
- Use with statements for automatic cleanup.
- Raise your own exceptions.
- Define custom exception types.
- Know the EAFP idiom (try, catch failure) vs LBYL.
You've now seen Python's distinctive failure-handling idiom. Real Python code is mostly: data shaping, control flow, function composition, exception handling, and the things on the next few pages.
Next page: Python's most distinctive positive feature - iterators, generators, and comprehensions.
Next: Iterators, generators, comprehensions →
08 - Iterators, Generators, Comprehensions¶
What this session is¶
About an hour and a half. This is the longest page so far because it's where Python's "secret sauce" lives. By the end you'll understand iterators (the protocol behind every for loop), generators (lazy iterators you write with yield), and comprehensions (Python's compact way to build a list, dict, or set from another collection).
Don't skip this page. Real Python code uses these everywhere.
The big idea: lazy sequences¶
When you write:
for n in range(1_000_000_000):
    ...
Python doesn't build a billion-element list in memory. range(1_000_000_000) is a lazy sequence - an object that produces values one at a time, on demand. The for loop asks for the next value, uses it, throws it away, asks for the next. The memory cost stays constant.
This idea - lazy evaluation - runs through Python. Most "collection operations" return iterators, not lists, so you can chain them without materializing huge intermediate results.
The iterator protocol (briefly)¶
An iterator is any object with a __next__ method that produces the next value or raises StopIteration when there's nothing left. An iterable is something you can call iter(...) on to get an iterator - that's why lists, tuples, dicts, strings, files all work in for loops.
nums = [1, 2, 3]
it = iter(nums)
print(next(it)) # 1
print(next(it)) # 2
print(next(it)) # 3
print(next(it)) # raises StopIteration - nothing left
You won't usually call iter/next by hand. for does it for you. The protocol exists so any object can opt in.
Generators with yield¶
Writing a class with __next__ and __iter__ is annoying. Python has a shortcut: a function that uses yield instead of return is a generator - it produces values lazily.
def count_up_to(limit):
    i = 1
    while i <= limit:
        yield i
        i += 1
for n in count_up_to(5):
    print(n)
Output:
1
2
3
4
5
What's new:
yield i- produceias the next value, then pause. The next call resumes where we left off.- The function body acts like a coroutine - it runs a bit, yields, sleeps until asked again, runs a bit more.
Memory: only one value lives at a time. count_up_to(1_000_000) doesn't build a million-item list; it produces them one by one as needed.
A generator is one of Python's most powerful patterns. Use it whenever you want "produce a sequence of things lazily" - reading lines from a huge file, paginating an API, walking a tree.
A real-world example: reading a large file line by line is built in:
with open("huge_file.txt") as f:
for line in f: # f is iterable; yields one line at a time
process(line)
If the file is 100 GB, this works - Python doesn't load it all into RAM.
List comprehensions¶
A list comprehension is a compact way to build a list from another iterable:
nums = [1, 2, 3, 4, 5]
squares = [n * n for n in nums]
print(squares)  # [1, 4, 9, 16, 25]
That one line replaces:
squares = []
for n in nums:
    squares.append(n * n)
Read [expression for var in iterable] as: "for each var in iterable, produce expression."
You can filter with if:
evens = [n for n in nums if n % 2 == 0]
print(evens)  # [2, 4]
Or both - transform AND filter:
even_squares = [n * n for n in nums if n % 2 == 0]
print(even_squares)  # [4, 16]
Read order is: outer-to-inner, just like the loop form. You can also nest:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [n for row in matrix for n in row]
print(flat) # [1, 2, 3, 4, 5, 6, 7, 8, 9]
(That nested form gets confusing fast. Two levels is fine; three is usually a sign to use a regular for loop for clarity.)
Dict and set comprehensions¶
Same idea, different brackets:
nums = [1, 2, 3, 4, 5]
squares_dict = {n: n * n for n in nums}
print(squares_dict) # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
unique_lengths = {len(w) for w in ["foo", "bar", "baz", "quux"]}
print(unique_lengths) # {3, 4}
Generator expressions¶
Parentheses instead of brackets make a generator expression - same syntax, but lazy:
nums = [1, 2, 3, 4, 5]
squares_gen = (n * n for n in nums)
print(squares_gen) # <generator object ...>
print(next(squares_gen)) # 1
print(list(squares_gen)) # [4, 9, 16, 25] - rest, consumed in one go
Use generator expressions when the consumer will iterate once and you want to save memory:
total = sum(n * n for n in range(1_000_000))
That doesn't build a million-item list - it streams the squares through sum. Same result, constant memory.
Useful built-ins that work with iterators¶
These appear in real code constantly:
- sum(iter) - total.
- min(iter), max(iter) - extremes.
- len(seq) - count (for sequences, not all iterators).
- sorted(iter) - returns a sorted list. Optional key=lambda x: ... and reverse=True.
- reversed(seq) - iterator over the sequence backward.
- zip(a, b) - pair up two iterables: zip([1,2,3], ["a","b","c"]) yields (1, "a"), (2, "b"), (3, "c").
- enumerate(iter) - yields (index, value) pairs (page 06).
- any(iter) / all(iter) - short-circuit "is any/all true?".
A working example:
names = ["Alice", "Bob", "Chioma"]
ages = [30, 25, 35]
adults = [(name, age) for name, age in zip(names, ages) if age >= 18]
print(adults) # [('Alice', 30), ('Bob', 25), ('Chioma', 35)]
lambda: anonymous functions¶
Briefly: lambda x: expression is a short way to write a function inline. Used with sorted, filter, map, etc.
words = ["banana", "apple", "cherry"]
sorted_by_length = sorted(words, key=lambda w: len(w))
print(sorted_by_length) # ['apple', 'banana', 'cherry']
lambda w: len(w) is a function taking w and returning len(w). Used as the sort key - sort by length.
Rule of thumb: use lambda only for one-line transformations. If it's longer than that, def a named function.
Exercise¶
In a new file iter_practice.py:
1. Use a list comprehension to build a list of the first 20 cubes (1³, 2³, ..., 20³). Print it.
2. Use a list comprehension with a filter to build the list of cubes that are even. Print it.
3. Use a dict comprehension to build {n: n³ for n in range(1, 11)}. Print it.
4. Write a generator function fib(n) that yields the first n Fibonacci numbers (1, 1, 2, 3, 5, 8, 13, ...). Loop over fib(10) and print each.
5. Use sum and a generator expression to compute the sum of cubes from 1 to 100. (Expected: 25502500.) Don't build a list.
6. Stretch: Open 01-setup.md from this path (if you have it locally) or any text file. Use a with block and a generator-style iteration to count lines without loading the whole file into memory.
What you might wonder¶
"Why bother with generators if comprehensions exist?"
Two reasons. (1) Memory - a generator can produce billions of items without building a billion-item list. (2) Composability - generators can read from other generators, building pipelines like sum(x*x for x in filter(is_even, range(1_000_000))).
"List comprehension vs for loop - which?"
If the transformation is a single expression, the comprehension is more readable. If it has multiple statements (mutating state, multiple side effects, complex logic), use the for loop. Comprehensions should produce a collection, not do work.
"Generator vs list comprehension?" If you'll iterate once: generator (saves memory). If you need to index, length, or iterate multiple times: list (you'll need it materialized eventually).
"What's map and filter?"
Older Python idioms: map(func, iter) returns an iterator of func(x) for each x; filter(func, iter) keeps only items where func(x) is truthy. Replaced in modern style by comprehensions: (func(x) for x in iter) and (x for x in iter if func(x)). You'll see map/filter in older code; recognize them, prefer comprehensions in new code.
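Side by side, the equivalence (illustrative values):

```python
nums = [1, 2, 3, 4, 5]

# older style: map and filter with lambdas
doubled_old = list(map(lambda n: n * 2, nums))
evens_old = list(filter(lambda n: n % 2 == 0, nums))

# modern style: the equivalent comprehensions
doubled_new = [n * 2 for n in nums]
evens_new = [n for n in nums if n % 2 == 0]

print(doubled_old == doubled_new)  # True
print(evens_old == evens_new)      # True
```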
"Are list comprehensions hard to read?" Two-clause ones (transform + optional filter) - easy. Triple-nested ones - yes, often. The fix: if a comprehension reads like noise, write it as a loop. Clarity beats compactness.
Done¶
You can now:
- Recognize iterators and iterables.
- Write generator functions with yield.
- Build lists, dicts, and sets with comprehensions.
- Write generator expressions for streaming computation.
- Use built-ins like zip, enumerate, sorted, sum correctly.
- Use lambda for one-line transformations.
These features make Python feel different from most languages. Real Python code uses them on every page. Internalizing them is what separates "I write Python" from "I write Pythonic code."
Next page: working with files and the standard library.
Next: Files and the standard library →
09 - Files and the Standard Library¶
What this session is¶
About an hour. You'll learn how to read and write files, work with file paths portably, parse JSON, handle dates and times, and get acquainted with Python's standard library - the giant collection of useful modules that ship with Python itself.
The standard library is one of Python's biggest selling points. "Batteries included" is the slogan. Whatever you need - HTTP, JSON, CSV, dates, sockets, subprocesses, regular expressions, threading - there's a module for it.
Reading a file¶
The basic pattern (already met in page 07):
open returns a file object; with closes it automatically when the block ends. .read() returns the entire contents as one string.
For large files, iterate line by line - Python streams (page 08):
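A sketch (again creating a sample file first):

```python
from pathlib import Path

Path("big.txt").write_text("alpha\nbeta\ngamma\n")  # sample file for the demo

count = 0
with open("big.txt") as f:
    for line in f:    # one line at a time; the whole file is never loaded into memory
        count += 1
print(count)  # 3
```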
To get a list of lines:
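One way is readlines() (list(f) is equivalent):

```python
from pathlib import Path

Path("big.txt").write_text("alpha\nbeta\ngamma\n")  # sample file

with open("big.txt") as f:
    lines = f.readlines()    # list of strings, each keeping its trailing \n

print(lines)  # ['alpha\n', 'beta\n', 'gamma\n']
```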
Writing a file¶
Open in write mode ("w"):
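For example:

```python
with open("out.txt", "w") as f:      # "w" creates the file if it doesn't exist
    f.write("first line\n")          # write() does NOT add a newline for you
    f.write("second line\n")
```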
"w" truncates the file if it exists (you lose what was there). For append, use "a". For read-and-write, "r+".
You can also print to a file:
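For example:

```python
with open("log.txt", "w") as f:
    print("hello", file=f)   # print writes to the file and appends \n
    print("world", file=f)
```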
print adds a newline automatically (which is sometimes nicer than f.write with manual \n).
Text vs binary mode¶
open(path) opens in text mode by default - Python decodes bytes to a string. For binary data (images, archives, anything non-text), open in binary mode ("rb", "wb"):
with open("photo.jpg", "rb") as f:
data = f.read() # bytes, not str
print(type(data)) # <class 'bytes'>
print(len(data)) # size in bytes
When in doubt: text mode for text, binary for everything else.
File paths: pathlib¶
You'll see two ways to handle paths in Python:
Old way - strings + os.path:
import os
path = os.path.join("data", "files", "notes.txt")
exists = os.path.exists(path)
parent = os.path.dirname(path)
Modern way - pathlib:
from pathlib import Path
path = Path("data") / "files" / "notes.txt"
exists = path.exists()
parent = path.parent
pathlib's Path object overrides / to mean "join path components." That makes building paths intuitive. It also has useful methods:
p = Path("notes.txt")
p.read_text() # whole file as str
p.write_text("hello") # write str to file (creates file)
p.read_bytes() # whole file as bytes
p.exists() # does it exist?
p.is_file() # is it a file?
p.is_dir() # is it a directory?
p.stat().st_size # size in bytes
p.suffix # ".txt"
p.stem # "notes"
p.parent # Path(".")
for child in Path(".").iterdir():
print(child) # every file/dir in current folder
Use pathlib for all new code. It's clearer than os.path and works portably across Windows / macOS / Linux.
JSON: structured data on disk and over the wire¶
Most APIs and config files use JSON. Python has a built-in json module:
import json
# Python -> JSON string
data = {"name": "Alice", "age": 30, "languages": ["Python", "Go"]}
text = json.dumps(data)
print(text) # {"name": "Alice", "age": 30, "languages": ["Python", "Go"]}
# JSON string -> Python
loaded = json.loads(text)
print(loaded) # {'name': 'Alice', 'age': 30, 'languages': ['Python', 'Go']}
print(loaded["age"]) # 30
Reading/writing JSON files:
import json
from pathlib import Path
# Write
data = {"name": "Alice", "age": 30}
Path("data.json").write_text(json.dumps(data, indent=2))
# Read
loaded = json.loads(Path("data.json").read_text())
print(loaded)
indent=2 makes the output human-readable (pretty-printed).
JSON maps cleanly to Python:
- JSON object → Python dict.
- JSON array → Python list.
- JSON string → str.
- JSON number → int or float.
- JSON true/false/null → Python True/False/None.
Anything that can be expressed in JSON can round-trip through json.dumps/json.loads. Things that can't: dates, custom classes, sets, tuples (become lists). For complex types, you write custom encoders or use a richer format (msgpack, pickle, protobuf).
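One common route for the "can't" cases is the default= hook on json.dumps - a sketch:

```python
import json
from datetime import datetime, timezone

def encode_extra(obj):
    # Called by json.dumps for any object it can't serialize natively.
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)                  # sets become sorted lists
    if isinstance(obj, datetime):
        return obj.isoformat()              # datetimes become ISO strings
    raise TypeError(f"not JSON serializable: {type(obj).__name__}")

data = {"tags": {"b", "a"}, "when": datetime(2026, 5, 17, tzinfo=timezone.utc)}
text = json.dumps(data, default=encode_extra)
print(text)  # {"tags": ["a", "b"], "when": "2026-05-17T00:00:00+00:00"}
```

Note the asymmetry: loading it back gives you a list and a string, not a set and a datetime. The hook only helps on the way out.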
Dates and times: datetime¶
The standard module is datetime:
from datetime import datetime, timedelta, timezone
now = datetime.now(timezone.utc)
print(now) # 2026-05-17 14:23:45.123456+00:00
# Construct a specific time
launch = datetime(2026, 12, 1, 9, 0, 0, tzinfo=timezone.utc)
print(launch) # 2026-12-01 09:00:00+00:00
# Arithmetic
diff = launch - now
print(diff) # 196 days, 18:36:14.876544
print(diff.days) # 196
# Add/subtract
later = now + timedelta(hours=3, minutes=15)
print(later)
# Format
print(now.strftime("%Y-%m-%d %H:%M")) # "2026-05-17 14:23"
# Parse
parsed = datetime.strptime("2026-05-17 14:23", "%Y-%m-%d %H:%M")
print(parsed)
Lessons:
- Always use timezone-aware datetimes for anything that touches more than one machine. datetime.now() without timezone.utc is "naive" - no zone info - and silently breaks across timezones.
- timedelta is the type for durations. Use it for arithmetic.
- strftime formats; strptime parses. The format codes (%Y, %m, %d, %H, ...) are the same as C's strftime.
Other essential standard library modules¶
You don't need to learn all of these now - just know they exist. Each is an import away (the module name is the import name).
| Module | What it does |
|---|---|
| os | OS-level operations (env vars, processes, working directory) |
| sys | Python interpreter info, argv, stdin/stdout/stderr |
| pathlib | File paths (you met it) |
| json | JSON encoding/decoding (you met it) |
| datetime | Dates and times (you met it) |
| csv | CSV files |
| re | Regular expressions |
| urllib, urllib.request | Basic HTTP (use requests or httpx for anything serious) |
| http.server | A simple HTTP server. python -m http.server 8000 serves the current directory. |
| subprocess | Run external commands |
| argparse | Command-line argument parsing |
| logging | Structured logging |
| collections | Specialized data types (Counter, defaultdict, deque, namedtuple) |
| itertools | Combinatorics and iterator helpers (chain, groupby, combinations, product) |
| functools | Higher-order function helpers (partial, reduce, lru_cache) |
| unittest | Built-in testing framework (most code uses pytest instead) |
Two especially useful ones to know about:
collections.Counter - count things easily:
from collections import Counter
words = "the quick brown fox jumps over the lazy dog the end".split()
counts = Counter(words)
print(counts) # Counter({'the': 3, 'quick': 1, ...})
print(counts.most_common(2)) # [('the', 3), ('quick', 1)]
(Counting by hand takes a loop and a dict; Counter is the one-liner.)
itertools - useful iterator combinators:
from itertools import chain, groupby, combinations
list(chain([1, 2, 3], [4, 5, 6]))    # [1, 2, 3, 4, 5, 6]
list(combinations([1, 2, 3, 4], 2))  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
Exercise¶
In a new file summarize.py:
Write a program that:
- Reads a small JSON file events.json containing a list of events, where each event has name, timestamp (ISO format like "2026-05-17T14:23:00+00:00"), and severity ("low", "medium", "high").
Example data:
[
{"name": "login", "timestamp": "2026-05-17T08:00:00+00:00", "severity": "low"},
{"name": "error", "timestamp": "2026-05-17T08:15:00+00:00", "severity": "high"},
{"name": "login", "timestamp": "2026-05-17T09:00:00+00:00", "severity": "low"},
{"name": "error", "timestamp": "2026-05-17T10:30:00+00:00", "severity": "high"},
{"name": "warn", "timestamp": "2026-05-17T11:00:00+00:00", "severity": "medium"}
]
Save the example data above as events.json first.
- Parses the timestamps with datetime.fromisoformat.
- Counts events by severity (Counter). Print the result.
- Finds the earliest and latest event timestamps. Print them.
- Writes a summary to summary.json containing:
  - total: total event count.
  - by_severity: the counts.
  - first: ISO timestamp of the earliest.
  - last: ISO timestamp of the latest.
Use pathlib, json, datetime, and collections.Counter.
What you might wonder¶
"Why two ways to do file paths?"
History. os.path is the original; pathlib was added in 3.4 and is now the recommended way. Old code uses os.path; new code uses pathlib. You'll see both.
"Are there standard library bits I should NOT use?"
A few. urllib.request for HTTP is awkward - use the third-party httpx or requests instead. xml.etree is OK but lxml is faster for serious XML work. pickle is convenient but unsafe - never pickle.loads untrusted data. (Pickle can execute arbitrary code during deserialization - security CVE territory.)
"What's the right way to do dates?"
For business logic: datetime with explicit UTC. For date-only (no time): datetime.date. For more complex calendar work or natural-language parsing: third-party arrow or pendulum. Avoid naive datetimes (no timezone) like the plague.
"How big should my standard library tour be?" Don't try to read all of it. Skim the index at docs.python.org/3/library/ once so you know what categories exist. Then look up specifics when you have a real need.
Done¶
You can now:
- Read and write text and binary files with with open(...).
- Manipulate file paths portably with pathlib.
- Encode and decode JSON for both API payloads and config files.
- Work with timezone-aware datetimes and durations.
- Reach for Counter, defaultdict, itertools when they fit.
- Know that the standard library is huge and worth skimming.
You can now do practical I/O work. Next page: testing your own code with pytest.
10 - Tests¶
What this session is¶
About an hour. You'll learn how to write tests for your Python code with pytest - the standard test framework. By the end you can verify your own code works, watch it fail when you break it, and read the tests in any Python OSS project to understand what the code is supposed to do.
Why tests¶
When you change code, you might break something that used to work. The change you made looks fine. The thing that broke is in a file you haven't opened in three weeks. Without tests, you find out when a user does.
A test is a small program that calls your code with known inputs and checks the outputs match expectations. You run them after every change. If they pass, you keep going. If one fails, you know what broke.
This sounds obvious. Beginner programmers skip it for years because it feels like extra work. It isn't. It's the work that prevents three hours of debugging next week.
Install pytest¶
Python ships with a built-in unittest framework, but the wider community uses pytest - friendlier syntax, better error messages, more flexibility. Install in your active venv:
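With the venv active:

```shell
pip install pytest
```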
Verify:
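A version check confirms pytest is on your PATH:

```shell
pytest --version
```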
Your first test¶
Create a folder:
Create a mathutils.py:
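A minimal sketch of mathutils.py consistent with the tests on this page (divide shows up later, in the exceptions section):

```python
# mathutils.py
def add(a, b):
    return a + b

def is_even(n):
    return n % 2 == 0    # works for negatives too: -4 % 2 == 0

def divide(a, b):
    return a / b         # raises ZeroDivisionError when b == 0
```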
Create test_mathutils.py (note the test_ prefix - pytest auto-discovers these):
from mathutils import add, is_even
def test_add():
assert add(2, 3) == 5
def test_is_even():
assert is_even(4)
assert not is_even(7)
Run:
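From the folder containing both files:

```shell
pytest
```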
You should see:
============================ test session starts ============================
collected 2 items
test_mathutils.py .. [100%]
============================= 2 passed in 0.01s =============================
Each . is a passing test. [100%] means all of them passed.
The mechanics¶
- File naming: files must start with test_ (or end with _test). pytest finds them automatically.
- Function naming: test functions must start with test_.
- Assertions: plain Python assert. No special API. pytest rewrites failed assert statements to give you rich error messages.
That's it. No setUp, no test classes, no special inheritance. The simplest possible thing that works.
Watching a test fail (do this)¶
Open mathutils.py. Change add to return a - b. Save. Run pytest.
You should see:
============================== FAILURES ==============================
______________________________ test_add ______________________________
def test_add():
> assert add(2, 3) == 5
E assert -1 == 5
E + where -1 = add(2, 3)
test_mathutils.py:4: AssertionError
=================== short test summary info ==========================
FAILED test_mathutils.py::test_add - assert -1 == 5
Notice how informative: it shows the line that failed, the actual value (-1), the expected value (5), and which call produced what. Pytest's assert rewriting is what gives you this.
Change add back. Re-run. Green again.
Parametrize: many cases, one function¶
When you have many cases for the same function, don't write test_x_1, test_x_2. Parametrize:
import pytest
from mathutils import is_even
@pytest.mark.parametrize("n, expected", [
(0, True),
(1, False),
(2, True),
(-4, True),
(-7, False),
(1000, True),
])
def test_is_even(n, expected):
assert is_even(n) == expected
What's happening:
- @pytest.mark.parametrize runs the same test multiple times with different inputs.
- First argument: a string naming the parameters.
- Second argument: a list of tuples - one tuple per case.
- pytest generates one test per case, each with a distinct name like test_is_even[2-True].
This is the idiomatic Python testing shape. You'll see it in most real-world test files.
Run with pytest -v to see each case named individually.
Fixtures: shared setup¶
Many tests need the same setup - a temporary file, a fresh database connection, a particular object. Pytest has fixtures for this:
import pytest
@pytest.fixture
def sample_data():
return {"name": "Alice", "age": 30}
def test_name(sample_data):
assert sample_data["name"] == "Alice"
def test_age(sample_data):
assert sample_data["age"] == 30
Fixtures are functions decorated with @pytest.fixture. Tests "request" them by listing them as parameters. Pytest runs the fixture, passes the return value to the test.
Built-in fixtures you'll meet often:
- tmp_path - a unique temp directory (pathlib.Path). Cleaned up after the test.
- monkeypatch - modify env vars, attributes, dict items; auto-undone after the test.
- capsys - capture print output for assertion.
Example: testing a function that reads a file.
def read_file(path):                  # the function under test (a stand-in)
    return path.read_text()

def test_read_file(tmp_path):
    p = tmp_path / "data.txt"
    p.write_text("hello")
    assert read_file(p) == "hello"
tmp_path gives you a fresh, isolated directory; you write a file there, run your code against it, the directory disappears after. No cleanup boilerplate, no global state.
Testing exceptions¶
Use pytest.raises:
import pytest
from mathutils import divide
def test_divide_by_zero():
with pytest.raises(ZeroDivisionError):
divide(10, 0)
The test passes if the call inside the with block raises ZeroDivisionError. It fails if no exception is raised, or a different type is raised.
You can also assert on the message:
def test_divide_by_zero_message():
with pytest.raises(ZeroDivisionError, match="division by zero"):
divide(10, 0)
match is a regex applied to the error message.
Useful pytest commands¶
| Command | What it does |
|---|---|
| pytest | Run all tests in the current directory and subdirectories. |
| pytest -v | Verbose - show each test by name. |
| pytest -x | Stop at the first failure. |
| pytest -k pattern | Run only tests whose name matches the pattern. |
| pytest path/to/test_file.py | Run one file. |
| pytest test_x.py::test_func | Run one function. |
| pytest --tb=short | Compact tracebacks. |
| pytest --pdb | Drop into the Python debugger on failure. |
| pytest -q | Quiet - minimal output. |
| pytest --collect-only | List what would run without running it. |
pytest -v is the most useful during development.
Running tests as you change code¶
Install pytest-watch and let it re-run tests every time you save a file:
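pytest-watch installs a ptw command:

```shell
pip install pytest-watch
ptw    # re-runs pytest every time a file changes
```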
Or simpler: run pytest -v after every save. The instant feedback loop is the productive way to work.
A note on coverage¶
pytest-cov shows what percentage of your code your tests touch:
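A sketch (the mathutils module name is the example from earlier on this page):

```shell
pip install pytest-cov
pytest --cov=mathutils    # run tests and report coverage for mathutils
```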
100% coverage is a misleading goal - you can hit it with tests that don't actually catch bugs. A better target: "every code path has at least one test, and every bug fix gets a regression test." Coverage gives you a floor, not a ceiling.
Exercise¶
Set up and test a small package.
- Make a folder ~/code/python-learning/wordtools and cd in.
- Create wordtools.py with two functions: word_count(s: str) -> int and is_palindrome(s: str) -> bool.
- Create test_wordtools.py. Write parametrized tests for both:
  - word_count: "" → 0, "hello" → 1, "hello world" → 2, " many spaces here " → 3.
  - is_palindrome: "" → True, "a" → True, "racecar" → True, "hello" → False, "Racecar" → True (lowercase first).
- Run pytest -v. All tests should pass.
- Break each function on purpose, watch the relevant test fail, fix it, watch it pass.
- Stretch: add a function most_common_word(s: str) -> str (returns the word appearing most). Use collections.Counter. Write a parametrized test for it, including a tie-breaking case.
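If you want something to check your tests against, here is one wordtools.py consistent with the cases above (an illustrative sketch, not the only correct answer):

```python
# wordtools.py - one possible implementation matching the test cases above
def word_count(s: str) -> int:
    return len(s.split())        # split() collapses runs of whitespace

def is_palindrome(s: str) -> bool:
    s = s.lower()                # "Racecar" -> "racecar" (lowercase first)
    return s == s[::-1]
```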
What you might wonder¶
"Where do tests live in real projects?"
Three common layouts:
- Next to the code (mathutils.py, test_mathutils.py in the same folder). Common for small projects.
- In a tests/ directory at the top level, mirroring the source layout. Common for medium-to-large projects.
- In src/<package>/ + tests/ (the "src layout"). The modern best practice. Avoids a class of import bugs.
All three are valid. The README or pytest configuration tells you which a project uses.
"What about unittest, the stdlib framework?"
Older. More boilerplate (test classes, self.assertEqual). Some projects (especially within Python itself) use it. Recognize it; prefer pytest for new code.
"Should I write the test first or the code first?" Either works. Any tests are infinitely better than no tests. Start by writing the code, then writing a test. After a few months, try writing the test first sometimes; see which feels better.
"How much testing is enough?" A useful heuristic: every bug you fix gets a test that would have caught it. Every important code path has at least one test. Don't chase 100% coverage; chase confidence.
"What about mocking?"
Mocking means replacing real dependencies (databases, APIs) with fake ones during a test. The stdlib unittest.mock is the standard tool; pytest has pytest-mock as a nicer wrapper. Use sparingly - overuse leads to tests that pass on broken code.
Done¶
You can now:
- Install pytest and write tests in test_*.py files.
- Use assert with rich pytest error messages.
- Parametrize tests with @pytest.mark.parametrize.
- Share setup via fixtures (including tmp_path, monkeypatch, capsys).
- Test that exceptions are raised with pytest.raises.
- Drive pytest from the command line for fast iteration.
You can now verify your own code. More importantly, you can read the test files in any real Python project and understand what they're checking - that's most of what makes a real codebase legible.
Next page: how Python projects are organized into modules and packages, and how to use code other people wrote.
Next: Modules, packages, pip, venv →
11 - Modules, Packages, pip, venv¶
What this session is¶
About an hour. You'll learn how Python code is organized (modules and packages), how to use code other people wrote (pip and PyPI), how virtual environments isolate dependencies, and what pyproject.toml is for. This is the page that bridges you from "I write standalone scripts" to "I work with real codebases."
A module is a file¶
Any .py file IS a module. The filename (without .py) is the module name. To use code from another file, import it.
Make a folder ~/code/python-learning/greetapp/. Inside, two files:
greet.py:
main.py:
import greet
print(greet.hello("Alice")) # Hello, Alice!
print(greet._internal("Alice")) # works, but you shouldn't (see below)
Run from the greetapp/ folder:
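The command:

```shell
python main.py
```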
Three import shapes you'll see:
import greet # use as greet.hello
from greet import hello # use directly as hello
from greet import hello as say_hi # rename on import
from greet import * # import everything (avoid)
The * form pulls in everything not starting with _ (and pollutes your namespace). Avoid it in real code; you'll see it occasionally in scripts.
The leading-underscore convention¶
Names starting with _ (one underscore) are conventionally private - "internal use, don't touch from outside." Python doesn't enforce this; it's a contract.
def public_thing(): # use freely
pass
def _internal_thing(): # "don't use from outside this module"
pass
Names with __ (two underscores) at the start of a class trigger name mangling - Python rewrites them to discourage external access. You don't need to write __names yourself for a while.
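A tiny illustration (class and attribute names are made up):

```python
class Account:
    def __init__(self):
        self.__token = "secret"      # stored as _Account__token (mangled)

a = Account()
# print(a.__token)                   # AttributeError: no attribute '__token'
print(a._Account__token)             # "secret" - mangling discourages, doesn't prevent
```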
The slogan: "we're all consenting adults." Python trusts you to respect the contract.
A package is a folder of modules¶
When you have several related modules, group them in a folder. Add an __init__.py to make it a package:
greet/english.py:
greet/yoruba.py:
greet/__init__.py (can be empty, or can re-export):
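Minimal sketches of the three files, consistent with the outputs below. They're shown together in one block; the __init__.py re-export lines appear as comments because relative imports only execute when the package itself is imported:

```python
# greet/english.py
def hello_english(name):
    return f"Hello, {name}!"

# greet/yoruba.py
def hello_yoruba(name):
    return f"Bawo ni, {name}!"

# greet/__init__.py - re-export so users can do `from greet import ...`:
#   from .english import hello_english
#   from .yoruba import hello_yoruba
```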
main.py:
from greet import hello_english, hello_yoruba
print(hello_english("Alice")) # Hello, Alice!
print(hello_yoruba("Alice")) # Bawo ni, Alice!
The .english in the __init__.py is a relative import - "from this package's english module." Use relative imports inside packages; the absolute form (from greet.english import ...) works too, but gets verbose in deeply nested packages.
Modern Python (3.3+) actually allows packages without __init__.py ("namespace packages") - but writing an __init__.py is still the safer, more explicit choice.
pip and PyPI¶
PyPI (Python Package Index, pypi.org) hosts hundreds of thousands of third-party libraries. pip is the tool that installs them.
Inside your active venv (you remembered to activate, right?):
That downloads requests (a popular HTTP library) and its dependencies into your venv. Now you can:
import requests
response = requests.get("https://api.github.com/users/octocat")
print(response.json()["name"])
Useful pip commands:
| Command | What it does |
|---|---|
| pip install <pkg> | Install a package. |
| pip install <pkg>==1.2.3 | Pin to a specific version. |
| pip install -U <pkg> | Upgrade to latest. |
| pip install -r requirements.txt | Install from a requirements file. |
| pip list | List installed packages. |
| pip show <pkg> | Show details (version, location, dependencies). |
| pip uninstall <pkg> | Remove a package. |
| pip freeze | Print all installed packages with exact versions (suitable as a requirements.txt). |
pip freeze > requirements.txt is the old way to capture exact versions for reproducibility. Modern projects use pyproject.toml + a lockfile instead.
requirements.txt: the traditional way¶
A simple text file listing dependencies:
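For example (the packages and version pins are illustrative):

```
requests>=2.31.0
httpx==0.27.0
```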
Install everything with pip install -r requirements.txt. Common in older projects and quick scripts.
pyproject.toml: the modern way¶
Modern Python projects use a single pyproject.toml file at the project root for everything: dependencies, build configuration, tool settings.
[project]
name = "myapp"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"requests>=2.31.0",
"httpx",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"ruff",
"mypy",
]
With this:
pip install -e . # install your project + its dependencies
pip install -e .[dev] # also install dev dependencies
The -e (editable) install means: install in development mode - Python imports your local code, so edits show up immediately without reinstalling.
uv: the modern alternative to pip¶
uv (astral.sh/uv) is a Rust-implemented replacement for pip + venv + a lot more. ~10-100× faster than pip. The 2026 default for new Python projects.
pip install uv
uv venv # create .venv
source .venv/bin/activate
uv pip install requests # install (much faster than pip)
uv pip install -r requirements.txt
uv add httpx # add to pyproject.toml + install
uv lock # generate lockfile
uv sync # install exactly what's in the lockfile
If you start a new project, use uv. If you're working in an existing project that uses pip, keep using pip - don't mix tools mid-project.
Lockfiles¶
A lockfile records the exact version of every direct and transitive dependency. Reproducible installs: if I have your lockfile, I get exactly the same package versions you have.
Tools that produce lockfiles:
- pip-tools (pip-compile → requirements.txt with pinned versions).
- poetry (poetry.lock).
- uv (uv.lock).
- pipenv (Pipfile.lock) - older, less used now.
For applications (something you deploy): always use a lockfile. For libraries (something other people import): don't ship a lockfile; let downstream users resolve.
Standard project layout¶
A typical small Python project:
myapp/
├── README.md
├── LICENSE
├── pyproject.toml
├── src/
│ └── myapp/
│ ├── __init__.py
│ ├── core.py
│ └── cli.py
├── tests/
│ ├── test_core.py
│ └── test_cli.py
└── .gitignore
The src/myapp/ layout (vs putting myapp/ at the top level) prevents a class of import bugs and is the modern recommendation. Old projects often have myapp/ at the top.
Exercise¶
Two parts.
Part 1 - your own multi-module package:
- Create ~/code/python-learning/bank/. cd in. Create a venv.
- Create a package: a bank/ folder containing __init__.py and account.py.
- In bank/account.py, define a dataclass Account with owner and balance fields, plus deposit(amount) and withdraw(amount) methods (page 05).
- Create main.py at the top level that imports from bank.account import Account, creates one, deposits and withdraws.
- Run python main.py.
Part 2 - install and use a third-party library:
- Activate the same venv.
- pip install requests.
- Write a script that fetches https://api.github.com/users/octocat and prints the user's name and bio.
- pip freeze > requirements.txt. Open the file. Find the requests line and its version.
- Stretch: create a pyproject.toml with requests as a dependency. Run pip install -e . and confirm it works.
What you might wonder¶
"Why do I need both requests and httpx libraries?"
requests is the venerable HTTP client - battle-tested, sync only. httpx is the modern one - sync + async, HTTP/2, mostly drop-in compatible. New projects often pick httpx; existing projects keep requests.
"What's the difference between a script and a package?"
A script is a single .py file you run. A package is a structured folder you import. Scripts grow into packages when they become unwieldy.
"Why so many ways to manage dependencies?"
Python's packaging history is messy. Each generation tried to fix the last. pip → pip + virtualenv → pipenv → poetry → uv. As of 2026, uv is the front-runner; the others still work.
"What if I pip install something globally by accident?"
Probably fine; uninstall it (pip uninstall <pkg>) and try again in a venv. On Linux you may also run into --user installs or sudo permission problems. The fix is always the same: activate a venv first.
"What's conda / Anaconda?"
A separate package manager popular in scientific computing. Manages Python itself + non-Python dependencies (C libraries). Different ecosystem from pip/PyPI. If you'll do heavy data-science work involving compiled scientific libraries (NumPy, scikit-learn, JAX), conda is sometimes easier; otherwise stick with pip/uv.
Done¶
You can now:
- Split code across modules (.py files) and import between them.
- Group modules into packages with __init__.py.
- Install third-party libraries with pip (or uv).
- Use requirements.txt or pyproject.toml for dependency declarations.
- Recognize the standard project layout.
- Know the convention: leading _ is "internal."
You've now covered every fundamental Python concept needed to read and write real codebases. The remaining pages are about applying them - reading real OSS code, picking a project, contributing.
Next: Reading other people's code →
12 - Reading Other People's Code¶
What this session is¶
About 45 minutes. You'll learn the strategy for reading code you didn't write - a different skill from writing your own. This page has less code than usual; what it teaches is how to approach a new codebase without drowning.
The mistake most beginners make¶
When you open a new codebase, the temptation is to start reading the first file you see and try to understand every line. By line 50 you're lost; by line 200 you've given up.
This doesn't work because real code isn't a story - it's a graph. Every function calls others. Every class is defined somewhere else. Trying to load it all into your head at once is impossible, even for experienced engineers.
The trick is to not try. Pick a small thread; follow only it; let the rest stay fuzzy.
The five-minute orientation¶
Whenever you open a new Python project, do exactly this, in order:
1. Read the README. What does this project DO? What is the one-sentence elevator pitch? If you can't answer this, the project is too unfinished - pick another.
2. List the top-level directories and files. Common layout:
   - README.md, LICENSE, pyproject.toml, requirements.txt, .gitignore - meta.
   - src/<package>/ - the actual code. Modern projects use src/; older ones may have <package>/ at the top.
   - tests/ - tests.
   - docs/ - documentation source.
   - examples/ - runnable usage examples.
   - .github/ - GitHub workflows, issue templates.
   - scripts/ - helper scripts.
3. Open pyproject.toml (or setup.py / setup.cfg for older projects). What's the package name? What are the dependencies? This tells you which ecosystem the project lives in.
4. Find the entry point. For a CLI tool, look at pyproject.toml's [project.scripts] section - it tells you where main lives. For a library, the entry point is the top-level package's __init__.py. Read that file - it often re-exports the public API and tells you the project's shape.
5. Read one test file. Pick a small test_*.py and read it. Tests show you what the code is supposed to do, with concrete examples. Often clearer than the code being tested.
After this five-minute pass, you should be able to write a one-paragraph summary of what the project does. If you can't, repeat.
Tools for reading¶
A few things make reading 10× faster:
help(thing) - show docstrings inside the REPL.
python -m pydoc <name> - same docs from the command line.
Your editor's "Go to definition" / "Find references." Right-click a name → "Go to Definition" jumps to where it's defined. "Find All References" shows everywhere it's used. This is how you trace a name through a project quickly.
grep -r 'pattern' . - old-school but unbeatable. Find every place a string appears.
pytest -k <pattern> -v - run one specific test. Tests are the most reliable "what does this actually do?" diagnostic.
Reading recent merged PRs on GitHub. PRs are bite-sized - a few files, a clear description, a discussion. Often the best way to understand a project is to read its five most recent merged PRs.
A real session (worked example)¶
Let's read a piece of a real Python project: the standard library's json.dumps function. Pretend this is a project we just opened.
Step 1: what does it do?
Output starts: "Serialize obj to a JSON formatted str." Clear: takes a Python value, returns a JSON string.
Step 2: where is it defined?
Command-click on json.dumps in your editor, or locate the source directly: python -c "import json; print(json.__file__)" prints the path. You'll find dumps defined in json/__init__.py. It's a thin wrapper that creates a JSONEncoder and calls its .encode().
Step 3: follow one thread.
Open json/encoder.py. Read the top docstring and the JSONEncoder.encode method. Don't try to understand the C-accelerated fast path at the bottom. Recognize: "it traverses the value tree and emits JSON text."
Step 4: confirm with a test.
Find Lib/test/test_json/. Open test_dump.py. Read a few test cases. Now you know - and have verified you know - what dumps does.
Step 5: write the one-line summary.
json.dumps(obj) returns a JSON string for any Python value that maps cleanly to JSON (dicts, lists, strings, numbers, booleans, None). Implementation is in json/encoder.py.
That whole investigation took ~5 minutes. You did not understand every byte of the C extension. That's fine. You understood enough to use it.
Things you will see that look scary¶
Real codebases use language features you haven't met yet. A few common ones with "don't panic" notes:
- Decorators (@decorator) - a function that wraps another function. You met @dataclass and @pytest.fixture. There are many: @property (turns a method into attribute-like access), @staticmethod / @classmethod, framework-specific decorators (@app.route in Flask, @app.command in Typer). For reading: a decorator is "this function gets wrapped by that one." Don't worry about the wrapping mechanics; recognize what each decorator commonly means.
- Type hints with generics - list[int], dict[str, int], Optional[str], Union[X, Y], Callable[..., T]. Read them like declarations: "a list of ints," "a string-to-int dict," "a string or None."
- async def / await - asynchronous code. Used heavily in modern web frameworks (FastAPI, Starlette) and async libraries (httpx's async client). For reading: async def f defines a function that returns a coroutine; await x waits for an async operation to finish. You can read async code linearly; just notice the awaits.
- Context managers (with) - you met them. When you see with foo as x:, foo sets something up, the block does its thing, and foo cleans up at the end.
- Magic methods (__getattr__, __call__, __iter__, ...) - special methods Python calls on your behalf. __init__ you know. The others customize how an object behaves with built-in syntax. Recognize them; look up specifics when you need to.
- Metaclasses, __init_subclass__, descriptors - deep Python features used in frameworks (Django ORM, Pydantic). You will encounter them in major libraries; you almost never need to write them yourself. For reading: "this is doing something fancy at class creation time."
- C extensions / Cython - files like _speedups.c or *.pyx. Performance-critical code. Read the Python wrappers, skip the C unless you specifically care.
You will hit things you don't recognize. That's normal. The skill is knowing when to dig in and when to skim past. Most of the time: skim past.
Reading vs understanding¶
A useful distinction:
- Reading code means following what it does, line by line. You can read code without understanding it deeply.
- Understanding code means knowing why it's shaped the way it is. You don't need to understand to contribute.
A first PR to a project often involves reading 1000 lines, understanding 100, modifying 5. That ratio is normal.
Exercise¶
No coding this time. Reading.
Pick a small Python project on GitHub. Three suggestions:
- `hynek/structlog` (~5k LOC) - structured logging.
- `pallets/click` - CLI library, well-documented, well-organized.
- `encode/httpx` (~10k LOC) - modern HTTP client.
Pick one. Do the five-minute orientation:
- Read the README.
- List the top-level directories. What does the layout suggest?
- Open `pyproject.toml`. What does it depend on?
- Find the entry point. Trace the most-public function for 5 minutes.
- Open the test file for the main code file. Pick three test cases; understand them.
Write a paragraph (in a note file, for yourself) answering:
- What does this project do?
- How is it organized?
- What's the most interesting thing you noticed?
That paragraph is your start point for everything in pages 13-15.
What you might wonder¶
"What if I don't understand something even after reading it three times?" Write down what you don't understand, skip it, keep going. Come back later. Often the thing that confused you on page 1 makes sense after you've seen page 50. If it still doesn't, ask in the project's discussion forum - but only after you've tried for an hour.
"What about huge projects like Django or Flask?" The same techniques work, scaled. You won't read all of Django; nobody has. Pick one sub-area (URL routing, ORM, middleware) and read just that slice.
"How do I know which tests are 'representative'?"
The ones with the simplest names usually exercise the basic case: `test_simple`, `test_basic`, `test_empty`. Start there. Save `test_edge_case_unicode_in_nested_url_with_query_params` for later.
Done¶
You can now:
- Apply a five-minute orientation to any new Python project.
- Use `help()`, `python -m pydoc`, the project's hosted docs (usually Read the Docs), and editor navigation to read code efficiently.
- Distinguish reading from understanding.
- Recognize "looks scary, isn't" patterns: decorators, async, type hints, magic methods.
- Pick a small project and write a one-paragraph summary.
The skill on this page is what separates "people who learned a language" from "people who can contribute to software." Practice it on three projects, not one.
Next page: how to choose a project worth your time.
13 - Picking a Project to Contribute To¶
What this session is¶
About 30 minutes plus your own browsing. You'll learn what makes a project a good first target, how to evaluate one in 10 minutes, and we'll list several real Python projects that consistently welcome new contributors.
Why the wrong project will burn you out¶
A first contribution to the wrong project goes like this:
- You pick something you use (Django, say).
- You spend three hours setting up the dev environment.
- You find a "good first issue" that hasn't been touched in six months.
- You spend two weeks understanding enough of the codebase to make a change.
- You submit a PR.
- Nobody reviews it for three weeks. A maintainer asks for changes you don't understand.
- You give up.
Every step in that story is normal. The fix isn't to be smarter; the fix is to pick a smaller, more responsive project first.
What "manageable" means¶
The criteria, in priority order:
- The project is small enough to comprehend. Under ~10k lines of Python is great for a first contribution. Under ~50k is doable. Above 100k, the orientation phase alone is a week.
- The maintainers are active. PRs get reviewed within a week, ideally a few days. Issues get responses.
- There are labeled "good first issue" or "help wanted" tickets. These are pre-screened to be approachable.
- There's a `CONTRIBUTING.md`. It tells you the project's conventions - coding style, tests they expect, the PR process.
- The tests run cleanly on a fresh clone. If `pytest` fails after `git clone && pip install -e .[dev]`, that's a red flag about how careful the maintainers are.
- You actually understand or care about what the project does. Bonus, but real - motivation matters when you're stuck.
How to evaluate a project in 10 minutes¶
Open the GitHub page. Check, in order:
| Signal | What you're looking for |
|---|---|
| Stars | More than ~100, less than ~50000. (Too few = abandoned, too many = crowded.) |
| Last commit date | Within the last month. Older = inactive. |
| Open PRs | Some, but not 200+. Look at how recent the most recent merged PR is. |
| PR merge time | Pick 3 recently merged PRs. How many days from open to merge? Under 14 is healthy. |
| Open issues with `good first issue` label | At least 5 is comfortable. |
| CONTRIBUTING.md | Exists and is readable. |
| CI status | Green ✓ on the main branch. Means tests pass. |
| Code of conduct | Means maintainers think about how contributors are treated. |
If a project fails on multiple of these, find another. There are thousands of Python projects on GitHub; you don't have to settle.
Several real candidates¶
These are Python projects that, as of 2026, have a track record of welcoming new contributors. Verify their current state with the 10-minute evaluation before you commit.
Tier 1: very small, very gentle¶
- `hynek/structlog` - structured logging. ~5k LOC. Clean code; one of the friendliest maintainers in the ecosystem.
- `python-attrs/attrs` - the predecessor to (and inspiration for) dataclasses; still widely used. ~6k LOC.
- `asottile/pre-commit-hooks` - small hooks for the pre-commit framework. Tiny PRs welcome.
- Niche libraries under small orgs (e.g. `tartiflette/...`) - often <3k LOC and welcoming.
Tier 2: small to medium, well-organized¶
- `pallets/click` - CLI framework. Well-documented, responsive.
- `encode/httpx` - modern HTTP client. ~10k LOC.
- `fastapi/typer` - CLI library built on Click. Beginner-friendly issue labels.
- `pydantic/pydantic` - data validation. Larger (~30k LOC) but an excellent maintainer ratio.
- `Textualize/rich` - terminal formatting library. Wonderful README. Lots of issues at varying difficulty.
Tier 3: larger, more visible¶
After you've done a Tier 1 or 2 contribution.
- `pytest-dev/pytest` - the test framework itself. Plenty of room for first contributions, especially in plugins.
- `pallets/flask` - the web framework. Large but very welcoming.
- `scikit-learn/scikit-learn` - ML library. Big, but documentation contributions are very accessible.
- `numpy/numpy` - foundational. Documentation issues are the on-ramp.
Tier 4: massive - don't start here¶
- `django/django` - yes, eventually. Not first.
- `python/cpython` - Python itself. The process is slow even for senior contributors.
- `pytorch/pytorch` - gigantic; mostly C++.
How to find issues¶
Once you've picked a project, visit its Issues tab.
Click "Labels." Filter by:
- `good first issue`
- `help wanted`
- `documentation` (often the easiest first contribution)
- `easy` or `beginner` (some projects use these instead)
Read 5-10 issues. Look for one where:
- The description is clear ("X happens when Y, expected Z").
- The fix is contained ("update this string", "add a test for...").
- Nobody has claimed it (no comment like "I'm working on this").
- It hasn't been open for a year (older = harder than it looks).
Add a comment: "I'd like to take this. Can you confirm it's still wanted?" Wait for the maintainer's reply. Don't start work until they confirm.
What counts as a contribution¶
Don't underestimate small contributions. Real first contributions look like:
- Fixing a typo in the README.
- Adding a missing example in the documentation.
- Adding a test case for an existing function.
- Improving an error message to include more context.
- Adding a type hint to an old function that lacks one.
- Removing a deprecated dependency.
- Fixing a small bug with a clear reproduction.
These are not "cheating." Every contribution is real. Maintainers prefer ten small clean PRs to one giant murky one. Your first PR's job is to get you through the workflow.
Exercise¶
Pick a project. Evaluate three; commit to one.
- Browse three projects from Tiers 1-2 above. For each, do the 10-minute evaluation. Write down the numbers in a notes file.
- Compare. Pick the one with the most responsive maintainers and at least 3 unclaimed first issues.
- Read its `CONTRIBUTING.md` end to end. Note unusual requirements (signed commits, specific PR templates, dev container).
- Clone it locally.
- Set up the dev environment per the CONTRIBUTING. If tests don't pass on a fresh clone, consider a different project.
- Browse the open `good first issue` tickets. Pick two candidates. Don't claim either yet.
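The clone-and-setup steps, as a shell sketch. `hynek/structlog` here is a stand-in for whichever project you picked, and the `[dev]` extras name varies by project - check its CONTRIBUTING.md:

```shell
git clone https://github.com/hynek/structlog
cd structlog
python -m venv .venv && source .venv/bin/activate
pip install -e '.[dev]'   # extras name is project-specific
pytest                    # should be green on a fresh clone
```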
What you might wonder¶
"What if I don't see a good first issue label?"
Some projects use other labels (help wanted, beginner-friendly, easy). Some don't label at all - look at recently closed PRs and see what kind of changes get merged. Documentation fixes are almost always welcome.
"What if my favorite project is too big?" Find a sub-project of it. The Pallets ecosystem (Flask) includes click, jinja2, werkzeug, itsdangerous - each smaller than Flask itself. Django has many smaller third-party packages (django-rest-framework's sub-packages, django-allauth, etc.) that are Tier 2.
"What if I find a bug but there's no issue for it?" File one first. Describe what you saw, what you expected, how to reproduce. Wait for acknowledgement. Then say "I'd like to send a fix."
"I'm worried about being judged for asking a basic question." (1) Most maintainers remember being new. (2) A polite, specific question is welcome. ("I tried X, expected Y, got Z" beats "doesn't work.") (3) A bad reception in the issue is itself useful information about the project. Try another.
Done¶
You can now:
- Articulate what makes a project a "good first target."
- Run a 10-minute evaluation on any GitHub project.
- Recognize tiers and start at Tier 1 or 2.
- Find issues appropriately sized for a first contribution.
- Avoid the most common first-contribution traps.
You've chosen your target. The next page goes through the file structure of a real Python project so you know what every piece is for.
Next: Anatomy of a small Python OSS repo →
14 - Anatomy of a Small Python OSS Repo¶
What this session is¶
About 45 minutes. We'll walk through the file layout of a real (small) Python open-source project, file by file, so you know what every common piece is for. The next page asks you to make a contribution; this page makes the project feel less like a maze.
We'll use the modern Python project layout as our template. There's no single official spec, but the conventions are stable enough that you can predict where things live.
A typical small Python project, from the top¶
After git clone and cd into it, you'll usually see something like:
.
├── README.md
├── LICENSE
├── CONTRIBUTING.md
├── CODE_OF_CONDUCT.md
├── pyproject.toml
├── .gitignore
├── .pre-commit-config.yaml
├── .github/
│ ├── workflows/ (GitHub Actions CI files)
│ ├── ISSUE_TEMPLATE/
│ └── PULL_REQUEST_TEMPLATE.md
├── src/
│ └── mypackage/
│ ├── __init__.py
│ ├── core.py
│ └── cli.py
├── tests/
│ ├── conftest.py
│ ├── test_core.py
│ └── test_cli.py
├── docs/
│ ├── conf.py
│ ├── index.rst (or .md if using MyST)
│ └── ...
├── examples/
│ └── basic.py
└── tox.ini (older projects)
Not every project has all of these. The shape varies, but the roles are consistent.
What each piece is for¶
Root-level files¶
- `README.md` - the project's homepage. Should give you: one-line description, install instructions, smallest working example. If the README isn't useful, the project is incomplete.
- `LICENSE` - legal terms (MIT, Apache 2.0, BSD, GPL). Know the license before contributing. Some projects (Apache Foundation, CNCF) require signing a CLA (Contributor License Agreement) - the bot will prompt you on your first PR.
- `CONTRIBUTING.md` - the most important file for you right now. Spells out how to propose changes, conventions, branch naming, commit message style, how tests should look. Read it before doing anything.
- `CODE_OF_CONDUCT.md` - community standards. Usually the Contributor Covenant. "Be respectful, no harassment" is the gist.
- `pyproject.toml` - project metadata, dependencies, build config, tool config (ruff, mypy, pytest, coverage). Modern projects put nearly all configuration here.
- `setup.py` / `setup.cfg` - older alternatives to `pyproject.toml`. You'll see them in projects predating ~2021.
- `requirements.txt` / `requirements-dev.txt` - pinned dependencies (often for applications, less for libraries). Some projects have both `pyproject.toml` and `requirements.txt`.
- `tox.ini` - config for `tox`, an older multi-env test runner. Increasingly replaced by `nox` (Python-based config). If a project uses one, `tox -e py311` runs the test suite against Python 3.11.
- `.gitignore` - files git should ignore.
- `.pre-commit-config.yaml` - config for `pre-commit`, a tool that runs linters/formatters before each commit. If the project uses it, run `pre-commit install` after cloning - it'll catch style issues automatically.
.github/¶
GitHub-specific configuration:
- `workflows/` - CI pipelines (YAML files). One file per workflow. Reading them tells you what the project considers "the green path" - the exact commands your PR will be measured against.
- `ISSUE_TEMPLATE/` - templates for issue types.
- `PULL_REQUEST_TEMPLATE.md` - what GitHub pre-fills the PR description with. Address every checkbox.
- `CODEOWNERS` - who automatically reviews PRs touching a file.
src/<package>/ (or <package>/ at top level)¶
The actual code. The src/ layout (vs top-level) is the modern best practice - it forces you to install the package to use it, which catches a class of "works on my machine but breaks in CI" bugs.
Inside, every folder with .py files needs __init__.py to be a package (though modern Python allows "namespace packages" without it).
An `__init__.py` often:
- Re-exports the public API: `from .core import MainClass, main_function`.
- Sets `__version__ = "1.2.3"` for runtime version access.
- Sometimes is empty (when the package is just a folder grouping).
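The re-export convention can be shown end to end. This sketch writes a throwaway package to a temp directory and imports it; all names (`mypackage`, `main_function`) are hypothetical - real projects just ship these files:

```python
import sys
import tempfile
from pathlib import Path

# Build a tiny package on disk: mypackage/core.py plus an __init__.py
# that re-exports the public API and sets __version__.
root = Path(tempfile.mkdtemp())
pkg = root / "mypackage"
pkg.mkdir()
(pkg / "core.py").write_text("def main_function():\n    return 'hello'\n")
(pkg / "__init__.py").write_text(
    'from .core import main_function\n__version__ = "1.2.3"\n'
)

sys.path.insert(0, str(root))
import mypackage  # importing the package runs its __init__.py

print(mypackage.__version__)      # 1.2.3
print(mypackage.main_function())  # hello
```

This is why `import mypackage; mypackage.main_function` works even though the function lives in `core.py`: the `__init__.py` lifted it to the package's top level.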
tests/¶
Tests, mirroring the source layout. test_*.py files; conftest.py for shared fixtures (page 10).
Common shape:
tests/
├── conftest.py # shared fixtures, available to all tests below
├── test_core.py # tests for src/mypackage/core.py
├── test_cli.py
└── integration/
└── test_end_to_end.py
Tests are usually run from the project root with pytest.
docs/¶
Documentation source. Common Python tools:
- Sphinx - the original. Files in .rst (reStructuredText) or .md (with the MyST extension). Generates HTML, PDF, ePub.
- MkDocs (often with the Material theme) - Markdown-only, simpler. The platform you're reading is built with this.
The hosted docs are usually on Read the Docs (free for OSS) or GitHub Pages.
examples/¶
Runnable example code showing how to use the project. Read these - they're the "official" way to use the API. Often the fastest way to understand a library.
Makefile or noxfile.py or tox.ini¶
A script of common dev commands. Open it and read the targets:
- make test or nox -s test - run tests.
- make lint - run linters.
- make docs - build docs locally.
- make format - auto-format with black/ruff.
These commands often pass project-specific flags you'd get wrong from memory. Use them.
Common tools you'll meet¶
Modern Python projects use a stack of tooling. Recognize the names:
- `ruff` (Rust-implemented) - linter and formatter, ~100× faster than the old options. Replaces `flake8`, `isort`, sometimes `black`. Increasingly the default since ~2024.
- `black` - opinionated formatter. Older but still widely used.
- `mypy` or `pyright` - static type checkers. Run them to catch type bugs without running the code.
- `pytest` - the test framework (page 10).
- `coverage` / `pytest-cov` - measure test coverage.
- `pre-commit` - runs these on every commit.
When the CI workflow runs ruff check && mypy && pytest, that's what your PR will be measured against. Run them locally first.
A worked walkthrough: hynek/structlog¶
Let's apply the above to a real project: hynek/structlog, a structured logging library. Clone it:
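The clone, assuming the canonical GitHub location:

```shell
git clone https://github.com/hynek/structlog
cd structlog
ls
```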
You should see roughly:
README.md LICENSE CHANGELOG.md CONTRIBUTING.md
pyproject.toml tox.ini
src/structlog/
tests/
docs/
.github/
Apply what you just learned:
- `README.md` - read it. What does structlog do? (Structured logging for Python.)
- `pyproject.toml` - package name? (structlog.) Dependencies? (Almost none - a quality signal.)
- `src/structlog/` - the code. Open `__init__.py`. Note what's re-exported - that's the public API.
- `tests/` - tests mirroring the per-source-file structure. Standard layout.
- `docs/` - Sphinx-based docs (`conf.py`, `.rst` files).
- `.github/workflows/` - open the workflow YAML. CI runs on multiple Python versions; runs `pytest`, `mypy`, `ruff`.
- `tox.ini` - alternative test runner. `tox -e py312` runs tests on Python 3.12.
- `CONTRIBUTING.md` - read it end to end.
Five minutes later, you have a map. You haven't read the implementation; you don't need to.
The conventions in CONTRIBUTING.md¶
Open the file and look for:
- Setup instructions. Usually `pip install -e .[dev]` or `pip install -e .[tests,docs]`.
- How to run tests. `pytest`, `tox`, `nox`.
- Code style. Usually "run `pre-commit install` and the rest is automated."
- Type-checking. Run `mypy` or `pyright` - your PR must pass.
- Commit message format. Some require Conventional Commits, most don't.
- CHANGELOG. Some require you to add a line to `CHANGELOG.md` describing your change.
- Sign-off / CLA. Some require `git commit -s` for DCO; some require signing a CLA via a bot.
Follow them. The maintainers will be relieved.
Exercise¶
Use the project you picked in page 13.
- Clone it locally.
- Walk the layout, file by file, mapping each piece to the categories above.
- Read `CONTRIBUTING.md` end to end.
- Open one CI workflow YAML in `.github/workflows/`. Identify: what commands does CI run? On what Python versions?
- Run those CI commands locally. Adjust to match whatever the project's CONTRIBUTING says.
- Open the issue you tentatively picked. Identify the three files most likely to be involved in the fix (guess based on file names and `grep`).
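For that last step, one rough way to guess the files: search for distinctive text from the issue - an error message, a function name, a config key:

```shell
grep -rn "some distinctive string from the issue" src/ tests/
```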
You're now ready to actually make a change.
What you might wonder¶
"What if a project doesn't follow the standard layout?"
Some don't. Read the README and CONTRIBUTING.md; they'll explain. If neither does, follow the entry point and see where it leads.
"What's src/ vs no src/?"
Cosmetic, but src/<pkg>/ prevents a subtle bug: you can accidentally import the local source instead of the installed package. Modern projects use src/; older ones often don't.
"What's __init_subclass__ and other dunders?"
Magic methods. Recognize them; understand what each does when you need to.
"What's noxfile.py vs tox.ini?"
Both are matrix runners (run tests across Python versions, dep versions). nox is Python-based config (more flexible); tox is INI (older). Pick whichever the project uses; don't mix.
"What if CI breaks on main when I clone?"
A red flag about project health. Consider another project. At minimum, ask in the issue tracker whether main is in a known-broken state.
Done¶
You can now:
- Recognize the typical Python project layout.
- Locate every common file/folder by role.
- Read CONTRIBUTING.md for conventions you'll need to follow.
- Read CI workflows to know exactly what your PR will be measured against.
- Make a confident guess at which files a given change will touch.
You're ready to actually do the thing.
Next: Your first contribution →
15 - Your First Contribution¶
What this session is¶
The whole thing. Maybe two sessions. We're going to walk through the workflow of making a real contribution to a real open-source Python project, end to end: fork, branch, change, test, push, PR, review, merge. This is the page the whole path has been building toward.
By the end you'll have submitted a pull request. When it merges (which may be days or weeks later), you'll be an open-source contributor - a small, real one. Welcome.
The whole workflow at a glance¶
Eight steps:
- Fork the project on GitHub.
- Clone your fork.
- Add upstream as a remote.
- Branch off `main`.
- Set up the dev environment, including `pre-commit` if used.
- Change the code; add a test.
- Run tests + linters locally.
- Push to your fork; open the PR.
Each step is short. The whole sequence takes 30 minutes the first time; 5 minutes once it's automatic.
Step 1: Fork¶
On the project's GitHub page, click Fork (top right). GitHub creates a copy at github.com/<you>/<project>. This is your personal copy.
Step 2: Clone¶
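Clone your fork, not the upstream repo (placeholders to fill in):

```shell
git clone https://github.com/<you>/<project>
cd <project>
```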
Step 3: Add upstream as a remote¶
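One command, plus a check (placeholders again):

```shell
git remote add upstream https://github.com/<original-owner>/<project>
git remote -v
```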
You should see origin (your fork) and upstream (the original).
To pull updates from upstream later:
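A common sync sequence - one of several equivalent ways:

```shell
git fetch upstream
git checkout main
git merge upstream/main   # fast-forward your main to match upstream
git push origin main      # keep your fork's main in sync too
```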
Step 4: Branch¶
Never commit directly to main. Always branch.
The name should hint at the change. Some projects have conventions; follow them.
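For example (branch name hypothetical):

```shell
git checkout -b fix/typo-in-readme
```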
Step 5: Set up the dev environment¶
Create a venv and install dev dependencies:
python -m venv .venv
source .venv/bin/activate
pip install -e .[dev] # most projects
# or: pip install -e .[tests,docs] # depends on what extras the project defines
If the project uses pre-commit, install the git hooks:
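One command, run once per clone:

```shell
pre-commit install
```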
Now every git commit will auto-run formatters/linters and fail if there's an issue (which is what you want - better to know now than after pushing).
Run the test suite to make sure everything works on your machine:
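Usually just:

```shell
pytest
```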
All green? Good. If not, stop. Figure out why before changing anything. Common causes: missing system dependencies, wrong Python version, the project's main is currently broken.
Step 6: Make the change¶
Edit the files involved. The change should be:
- Small. Touch as few files and lines as possible. A 5-line diff is easier to review than 500.
- Focused. One issue per PR. Don't bundle unrelated fixes.
- Tested. If your change has logic, add a test. Even one is enough.
For code changes, follow the project's style. The pre-commit hooks usually handle formatting automatically.
Step 7: Run tests + linters locally¶
Replay exactly what CI runs. You looked at the CI workflow in page 14; run the same commands:
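For example, if the workflow runs lint, type-check, and tests - substitute the project's actual commands and paths:

```shell
ruff check .
mypy src/
pytest
```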
Every command should be green. If something fails, fix it before pushing. Pushing red CI is rude - it makes reviewers babysit your PR through cycles of "now please fix this, now please fix that."
If the project has multiple Python versions in CI (3.10, 3.11, 3.12, 3.13), you don't have to run them all locally. Tools like tox or nox automate this:
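If the project ships a `tox.ini` or `noxfile.py`, one of these runs the whole matrix (environment and session names vary per project):

```shell
tox            # every configured environment
tox -e py312   # just one
# or:
nox            # every session
nox -s tests   # just one, if the project defines a "tests" session
```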
But for a small change, "passes on my local Python" is enough; CI will catch anything version-specific.
Step 8: Commit and push¶
Stage and commit:
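A minimal commit - filenames, message, and issue number are all illustrative:

```shell
git add src/mypackage/core.py tests/test_core.py
git commit -m "Fix off-by-one in range handling (#123)"
```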
If pre-commit is installed, it runs now. If it modifies files (formatting), the commit aborts; re-stage and re-commit.
Commit message conventions:
- First line, short. ~50 chars. Imperative mood ("Add", not "Added").
- Optional body. Blank line, then a longer description.
- Reference the issue. "#123" auto-links to it.
If CONTRIBUTING.md mandates Conventional Commits (feat:, fix:, chore:), follow it.
If DCO is required:
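That means committing with the sign-off flag:

```shell
git commit -s -m "Fix off-by-one in range handling (#123)"
```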
Adds Signed-off-by: Your Name <your@email> to the commit.
Push to your fork:
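The `-u` flag sets up tracking so later pushes are just `git push` (branch name hypothetical):

```shell
git push -u origin fix/typo-in-readme
```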
GitHub prints a URL - click it to open a pre-filled PR page.
Step 9: Open the PR¶
On the upstream project's GitHub, you'll see a banner suggesting "Compare & pull request." Click it.
Fill out:
- Title. Mirror the commit message. Or match the issue title.
- Description. What does this change? Why? What did you test? Reference the issue: "Closes #123" or "Fixes #123" - GitHub auto-closes the issue when the PR merges.
- Checklist. Address every item in the PR template.
Submit. CI starts. Wait for green; if red, look at the failing step and fix locally, then push more commits (they automatically attach to the PR).
If the project requires a CHANGELOG entry, add a line under "Unreleased" describing your change.
What happens next: review¶
A maintainer will look. Possible outcomes:
- "LGTM, merging." Best case.
- "Could you make these changes?" Most common. They leave inline comments. Address each - either by changing code or replying with a reason. Push more commits.
- "Thanks, but we don't want this." Rare for `good first issue` work. Don't take it personally. Ask if there's a related issue.
- Silence. Sometimes. After a week, leave a polite comment: "Friendly bump - anything I should address?"
Code review is not personal. Even senior engineers get review comments. The skill is addressing feedback efficiently without arguing about style preferences. Disagree only on substance.
After the merge¶
When your PR merges:
- Update your fork's `main` (the workflow from step 3).
- Delete the branch. Locally (`git branch -d ...`) and on your fork (`git push origin --delete ...`).
- Take a screenshot. Really. You'll be glad later.
- Sit with it for a day. Re-read the merged code, the review comments. The learning is in the loop.
A copy-paste sequence (template)¶
Full sequence for a small docs fix on example-org/example-repo issue #42:
# 1-2. Fork on GitHub, then clone:
git clone https://github.com/<you>/example-repo
cd example-repo
# 3. Add upstream:
git remote add upstream https://github.com/example-org/example-repo
git fetch upstream
# 4. Branch:
git checkout -b docs/fix-typo-in-readme
# 5. Set up dev env:
python -m venv .venv && source .venv/bin/activate
pip install -e .[dev]
pre-commit install
pytest # baseline green
# 6. Make the change. Edit README.md.
# 7. Run linters locally:
ruff check .
pytest
# 8. Commit and push:
git add README.md
git commit -m "docs: fix typo in installation section (#42)"
git push origin docs/fix-typo-in-readme
# 9. Open the PR on github.com. Wait. Respond to review.
That's the whole thing.
After your first contribution: what next¶
Once you've landed one PR:
- Pick another issue in the same project. Familiarity compounds; your second PR will be much faster.
- After 3-5 PRs, consider becoming a regular. Watch the issue tracker. Answer issues you can. Review other people's PRs (you don't need to be a maintainer to leave helpful comments).
- Branch out. Use what you learned for a Tier 2 or 3 project from page 13.
- Build something of your own. Use Python to scratch a personal itch. Publish it (`pip install -e .` while developing, then `python -m build && twine upload` to release). Iterate based on real use.
- Read the "AI Expert Roadmap" path on this site when you want to grow into Python's biggest current niche.
What you might wonder¶
"What if my PR sits unreviewed for weeks?" Polite check-in after ~1 week. After 3 weeks of silence, ask in any community channel whether to redirect. Some projects are slow; some are abandoned.
"What if a maintainer is rude?" Disengage. There are thousands of projects.
"What if I disagree with a review comment?" Two questions: (1) Is it about correctness or style? Style: do what they ask. Correctness: explain your reasoning with a specific example. (2) Are they more experienced with this codebase than you? Yes: probably right. No: reasonable to push back. Either way: stay polite, stay specific.
"What if I can't make the tests pass locally?"
Re-read CONTRIBUTING.md for missed setup. Check the CI workflow for env vars. Stuck after an hour: ask in the issue or PR, with specifics about what you tried.
"What if I introduce a bug in my fix?" Comes up. Push another commit fixing it. Don't squash or rewrite history unless asked.
"Can I list this on my CV?" Yes. "Open-source contributor, projects: X, Y, Z" is a real signal. Link to your specific merged PRs.
Done¶
You can now:
- Walk through the full GitHub contribution workflow.
- Run a project's CI commands locally before pushing.
- Write a contribution-ready commit message.
- Read and address code-review feedback.
- Recover when a PR sits or gets pushback.
Done with the path¶
You started this path being told that programming was something you could learn from scratch. You've now:
- Installed Python and written your first program.
- Learned every fundamental concept: variables, types, control flow, functions, classes, collections, errors, iterators, files, tests, packages.
- Read a real Python OSS project and made sense of its layout.
- Picked a project, prepared a change, submitted a pull request.
What you should not do next: feel like you "know Python" now. You know what you've been taught. There is much more - web frameworks, async programming, scientific computing, ML, packaging at scale. Each is a path of its own.
What you should do: keep contributing. The way you become an engineer is by doing real work on real codebases over time. There is no shortcut.
Two recommended next paths if you want to keep going on this site:
- Python Mastery - the 24-week deep dive into CPython internals, performance, concurrency, AI runtimes. Assumes you're past where this path leaves you.
- AI Expert Roadmap - Python is the dominant language for AI/ML. This 12-month companion takes you from math foundations through transformers, RAG, evals, and fine-tuning.
Or just go build something. Programming pays you back when you build, not when you read.
Congratulations. You are no longer a beginner.