
12 - Reading Other People's Code

What this session is

About 45 minutes. You'll learn the strategy for reading code you didn't write - a different skill from writing your own. This page has less code than usual; what it teaches is how to approach a new codebase without drowning.

The mistake most beginners make

When you open a new codebase, the temptation is to start reading the first file you see and try to understand every line. By line 50 you're lost; by line 200 you've given up.

This doesn't work because real code isn't a story - it's a graph. Every function calls others. Every class is defined somewhere else. Trying to load it all into your head at once is impossible, even for experienced engineers.

The trick is to not try. Pick a small thread; follow only it; let the rest stay fuzzy.

The five-minute orientation

Whenever you open a new Python project, do exactly this, in order:

  1. Read the README. What does this project DO? What is the one-sentence elevator pitch? If you can't answer this, the project is too unfinished - pick another.

  2. List the top-level directories and files. Common layout:

     • README.md, LICENSE, pyproject.toml, requirements.txt, .gitignore - meta.
     • src/<package>/ - the actual code. Modern projects use src/; older ones may have <package>/ at the top.
     • tests/ - tests.
     • docs/ - documentation source.
     • examples/ - runnable usage examples.
     • .github/ - GitHub workflows, issue templates.
     • scripts/ - helper scripts.

  3. Open pyproject.toml (or setup.py/setup.cfg for older projects). What's the package name? What are the dependencies? This tells you which ecosystem the project lives in.

  4. Find the entry point. For a CLI tool, look at pyproject.toml's [project.scripts] section - it tells you where main lives. For a library, the entry point is the top-level package's __init__.py. Read that file - it often re-exports the public API and tells you the project's shape.

  5. Read one test file. Pick a small test_*.py and read it. Tests show you what the code is supposed to do, with concrete examples. Often clearer than the code being tested.

After this five-minute pass, you should be able to write a one-paragraph summary of what the project does. If you can't, repeat.

Tools for reading

A few things make reading much faster:

help(thing) - show docstrings inside the REPL.

>>> import json
>>> help(json.dumps)

python -m pydoc <name> - same docs from the command line.
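The same information is available from inside a script, which helps when you want to inspect several names at once. This is a small sketch using the standard library's inspect and pydoc modules; the exact text printed depends on your Python version.

```python
import inspect
import json
import pydoc

# The cleaned-up docstring: the same text help() shows.
doc = inspect.getdoc(json.dumps)
print(doc.splitlines()[0])

# The call signature, without opening the source file.
signature = inspect.signature(json.dumps)
print(signature)

# Plain-text rendering of the page that `python -m pydoc json.dumps` prints.
text = pydoc.render_doc(json.dumps, renderer=pydoc.plaintext)
print(text.splitlines()[0])
```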

Your editor's "Go to definition" / "Find references." Right-click a name → "Go to Definition" jumps to where it's defined. "Find All References" shows everywhere it's used. This is how you trace a name through a project quickly.

grep -r 'pattern' . - old-school but unbeatable. Find every place a string appears.
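If grep is not available (for example on a plain Windows shell), a rough equivalent takes a few lines of Python. This is a sketch, not a replacement: the function name grep is our own, it matches plain substrings rather than regular expressions, and it only looks at files matching the glob you give it.

```python
from pathlib import Path


def grep(pattern: str, root: str = ".", glob: str = "*.py"):
    """Yield (path, line_number, line) for every line containing `pattern`."""
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        try:
            lines = path.read_text(encoding="utf-8").splitlines()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(lines, start=1):
            if pattern in line:
                yield path, number, line.strip()
```

Usage: for path, number, line in grep("def dumps", "some/project"): print(f"{path}:{number}: {line}").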

pytest -k <pattern> -v - run one specific test. Tests are the most reliable "what does this actually do?" diagnostic.

Reading recent merged PRs on GitHub. PRs are bite-sized - a few files, a clear description, a discussion. Often the best way to understand a project is to read its five most recent merged PRs.

A real session (worked example)

Let's read a piece of a real Python project: the standard library's json.dumps function. Pretend this is a project we just opened.

Step 1: what does it do?

>>> import json
>>> help(json.dumps)

Output starts: "Serialize obj to a JSON formatted str." Clear: takes a Python value, returns a JSON string.

Step 2: where is it defined?

Command-click on json.dumps in your editor, or find the standard library on disk: running python -c "import json; print(json.__file__)" prints the path to json/__init__.py, which is where dumps is defined. It's a thin wrapper that hands the work to a JSONEncoder and calls its .encode().
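If you have no editor to hand, the inspect module can answer "where is it defined?" directly. A minimal sketch; the printed paths and line number depend on your installation.

```python
import inspect
import json

# The package's __init__.py on disk.
print(json.__file__)

# The file that defines dumps, and the line where its definition starts.
source_file = inspect.getsourcefile(json.dumps)
_, first_line = inspect.getsourcelines(json.dumps)
print(f"{source_file}:{first_line}")
```

Both paths should point at the same __init__.py, which confirms that dumps is defined in the package itself rather than re-exported from a submodule.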

Step 3: follow one thread.

Open json/encoder.py. Read the top docstring and the JSONEncoder.encode method. Don't try to understand the C-accelerated fast path (the c_-prefixed names imported from the _json module). Recognize: "it traverses the value tree and emits JSON text."
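You can check the "thin wrapper" reading without opening the C code at all. The sketch below compares json.dumps with a default JSONEncoder on a sample value of our own choosing.

```python
import json
from json.encoder import JSONEncoder

value = {"name": "Ada", "tags": ["x", "y"], "active": True, "score": None}

# With default arguments, dumps and a default JSONEncoder give the same text.
via_dumps = json.dumps(value)
via_encoder = JSONEncoder().encode(value)
print(via_dumps)
# → {"name": "Ada", "tags": ["x", "y"], "active": true, "score": null}
assert via_dumps == via_encoder

# iterencode yields the same output piece by piece as it walks the value tree.
chunks = list(JSONEncoder().iterencode(value))
assert "".join(chunks) == via_dumps
```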

Step 4: confirm with a test.

Find Lib/test/test_json/ in the CPython source tree (an installed Python does not always ship the test suite). Open test_dump.py. Read a few test cases. Now you know - and have verified you know - what dumps does.
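Writing two or three tests of your own is another way to confirm what you think you learned. These are illustrative tests written for this page, not copies of what is in test_dump.py; save them in a file such as test_my_json.py (a hypothetical name) and run it with pytest.

```python
import json


def test_dumps_simple_dict():
    assert json.dumps({"a": 1}) == '{"a": 1}'


def test_dumps_round_trip():
    # Whatever dumps writes, loads should read back unchanged.
    value = {"items": [1, 2.5, "three", None, True]}
    assert json.loads(json.dumps(value)) == value


def test_dumps_sort_keys():
    assert json.dumps({"b": 1, "a": 2}, sort_keys=True) == '{"a": 2, "b": 1}'
```

If one of your tests fails, your mental model was wrong somewhere, which is exactly the thing you wanted to find out.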

Step 5: write the one-line summary.

json.dumps(obj) returns a JSON string for any Python value that maps cleanly to JSON (dicts, lists, strings, numbers, booleans, None). Implementation is in json/encoder.py.
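The "maps cleanly to JSON" part of that summary is worth checking from both sides. A short sketch with sample values of our own:

```python
import json

# Values that map cleanly: dict, list, str, numbers, booleans, None.
ok = json.dumps({"n": 1, "ok": True, "missing": None, "items": [1.5, "a"]})
print(ok)
# → {"n": 1, "ok": true, "missing": null, "items": [1.5, "a"]}

# A value with no JSON equivalent: a set raises TypeError.
error = None
try:
    json.dumps({1, 2, 3})
except TypeError as exc:
    error = str(exc)
print(error)
# → Object of type set is not JSON serializable
```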

That whole investigation took ~5 minutes. You did not understand every byte of the C extension. That's fine. You understood enough to use it.

Things you will see that look scary

Real codebases use language features you haven't met yet. A few common ones with "don't panic" notes:

  • Decorators (@decorator) - a function that wraps another function. You met @dataclass and @pytest.fixture. There are many: @property (turn a method into an attribute-like access), @staticmethod/@classmethod, framework-specific decorators (@app.route in Flask, @app.command in Typer). For reading: a decorator is "this function gets wrapped by that one." Don't worry about the wrapping mechanics; recognize what each decorator commonly means.

  • Type hints with generics - list[int], dict[str, int], Optional[str], Union[X, Y], Callable[..., T]. Read them like declarations: "a list of ints," "a string-to-int dict," "a string or None."

  • async def / await - asynchronous code. Used heavily in modern web (FastAPI, Starlette) and async libraries (httpx async client). For reading: async def f is a function that returns a coroutine; await x waits for an async operation to finish. You can read async code linearly; just notice the awaits.

  • Context managers (with) - you met them. When you see with foo as x:, foo sets something up on entry, hands you x to work with, and cleans up when the block ends, even if an exception is raised.

  • Magic methods (__getattr__, __call__, __iter__, ...) - special methods Python calls on your behalf. __init__ you know. The others customize how an object behaves with built-in syntax. Recognize them; look up specifics when you need to.

  • Metaclasses, __init_subclass__, descriptors - deep Python features used in frameworks (Django ORM, Pydantic). You will encounter them in major libraries; you almost never need to write them yourself. For reading: "this is doing something fancy at class creation time."

  • C extensions / Cython - files like _speedups.c or *.pyx. Performance-critical code. Read the Python wrappers, skip the C unless you specifically care.
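To make several of these less abstract, here is a small self-contained sketch that uses a decorator, type hints, a context manager, a magic method, and async/await in one place. Every name in it (logged, find, section, Doubler, fetch_number) is made up for illustration, and it assumes Python 3.9+ for the list[str] and dict[str, int] syntax.

```python
import asyncio
import functools
from contextlib import contextmanager
from typing import Callable, Optional

calls: list[str] = []  # type hint: a list of strings


def logged(func: Callable) -> Callable:
    """Decorator: wraps func so each call is recorded before it runs."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


@logged
def find(items: dict[str, int], key: str) -> Optional[int]:
    """Type hints: takes a string-to-int dict, returns an int or None."""
    return items.get(key)


@contextmanager
def section(name: str):
    """Context manager: set up before the block, clean up after it."""
    calls.append(f"enter {name}")
    try:
        yield name
    finally:
        calls.append(f"exit {name}")


class Doubler:
    def __call__(self, x: int) -> int:  # magic method: makes instances callable
        return x * 2


async def fetch_number() -> int:
    await asyncio.sleep(0)  # stand-in for a real async operation
    return 42


with section("demo") as label:
    found = find({"a": 1}, "a")
doubled = Doubler()(21)
number = asyncio.run(fetch_number())
print(found, doubled, number, calls)
# → 1 42 42 ['enter demo', 'find', 'exit demo']
```

Notice that you can follow what the script does without knowing how functools.wraps or contextmanager work internally. That is the reading skill this list is about.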

You will hit things you don't recognize. That's normal. The skill is knowing when to dig in and when to skim past. Most of the time: skim past.

Reading vs understanding

A useful distinction:

  • Reading code means following what it does, line by line. You can read code without understanding it deeply.
  • Understanding code means knowing why it's shaped the way it is. You don't need to understand all of it to contribute.

A first PR to a project often involves reading 1000 lines, understanding 100, modifying 5. That ratio is normal.

Exercise

No coding this time. Reading.

Pick a small Python project on GitHub. Three suggestions:

  • hynek/structlog (~5k LOC) - structured logging.
  • pallets/click - CLI library, well-documented, well-organized.
  • encode/httpx (~10k LOC) - modern HTTP client.

Pick one. Do the five-minute orientation:

  1. Read the README.
  2. List the top-level directories. What does the layout suggest?
  3. Open pyproject.toml. What does it depend on?
  4. Find the entry point. Trace the most-public function for 5 minutes.
  5. Open the test file for the main code file. Pick three test cases; understand them.

Write a paragraph (in a note file, for yourself) answering:

  • What does this project do?
  • How is it organized?
  • What's the most interesting thing you noticed?

That paragraph is your starting point for everything in pages 13-15.

What you might wonder

"What if I don't understand something even after reading it three times?" Write down what you don't understand, skip it, keep going. Come back later. Often the thing that confused you on page 1 makes sense after you've seen page 50. If it still doesn't, ask in the project's discussion forum - but only after you've tried for an hour.

"What about huge projects like Django or Flask?" The same techniques work, scaled. You won't read all of Django; nobody has. Pick one sub-area (URL routing, ORM, middleware) and read just that slice.

"How do I know which tests are 'representative'?" The ones with the simplest names usually exercise the basic case. test_simple, test_basic, test_empty. Start there. Save test_edge_case_unicode_in_nested_url_with_query_params for later.

Done

You can now:

  • Apply a five-minute orientation to any new Python project.
  • Use help(), python -m pydoc, the project's hosted documentation (often on Read the Docs), and editor navigation to read code efficiently.
  • Distinguish reading from understanding.
  • Recognize "looks scary, isn't" patterns: decorators, async, type hints, magic methods.
  • Pick a small project and write a one-paragraph summary.

The skill on this page is what separates "people who learned a language" from "people who can contribute to software." Practice it on three projects, not one.

Next page: how to choose a project worth your time.

