
Python Mastery

CPython internals, performance, concurrency, AI runtimes.



Python Mastery Blueprint - A 24-Week Beginner-to-Senior Syllabus (AI-Systems Track)

Authoring lens: Senior Staff AI/Platform Engineer. Target outcome: A graduate of this curriculum should be capable of (a) writing, reviewing, and shipping idiomatic, production-grade Python at a senior level, (b) reasoning about CPython internals well enough to debug GIL contention, allocator pathologies, and asyncio event-loop stalls in production, and (c) designing AI systems end-to-end - RAG services, agent orchestration platforms, training/inference pipelines - with clear opinions on the trade-offs.

This curriculum is not "Learn Python in 24 hours stretched to 24 weeks." It assumes the reader can write working code in some language. The premise: most Python performance and correctness bugs at scale are not language bugs - they are interpreter, GIL, allocator, and event-loop bugs in disguise, layered on top of glue-language assumptions about NumPy, PyTorch, and CUDA. This curriculum surfaces all of them.


Repository Layout

  • 00_PRELUDE_AND_PHILOSOPHY.md - The "Python-ness" of Python; the data model; the cost model; the reading list.
  • 01_MONTH_FOUNDATIONS.md - Weeks 1–4. Syntax, the data model (__dunder__), control flow, idioms, packaging basics.
  • 02_MONTH_INTERMEDIATE_IDIOMS.md - Weeks 5–8. Iterators, generators, decorators, context managers, dataclasses, the type system.
  • 03_MONTH_RUNTIME_AND_PERFORMANCE.md - Weeks 9–12. CPython internals: bytecode, eval loop, refcounting, GC, allocator, GIL, dis, sys, tracemalloc.
  • 04_MONTH_CONCURRENCY_AND_PARALLELISM.md - Weeks 13–16. threading, multiprocessing, asyncio, concurrent.futures, free-threaded 3.13+, subinterpreters, native extensions.
  • 05_MONTH_PATTERNS_AND_ARCHITECTURE.md - Weeks 17–20. Pythonic design patterns, data structures, packaging, testing, observability, FastAPI/Pydantic.
  • 06_MONTH_AI_SYSTEMS_SENIOR.md - Weeks 21–24. LLM-app architecture, RAG, agents, evals, training/serving, distributed inference, capstone.
  • APPENDIX_A_PRODUCTION_HARDENING.md - ruff, mypy/pyright, pytest/hypothesis, profilers (py-spy, scalene, memray), packaging with uv/hatch.
  • APPENDIX_B_DATA_STRUCTURES_AND_PATTERNS.md - Build-from-scratch reference: LRU, trie, bloom filter, ring buffer, async queue, vector index.
  • APPENDIX_C_DEEP_DIVE_CPYTHON_AND_AI_RUNTIMES.md - The deep-dive session: CPython eval loop, asyncio internals, NumPy strides/buffer protocol, PyTorch autograd, CUDA streams.
  • CAPSTONE_PROJECTS.md - Three terminal projects: production RAG service, agent orchestration platform, training/serving pipeline.

How Each Week Is Structured

Every weekly module follows the same five-section format so the reader can budget time:

  1. Conceptual Core - the why, with a mental model.
  2. Mechanical Detail - the how, down to CPython source where relevant (Python/ceval.c, Objects/dictobject.c, Modules/_asynciomodule.c, etc.) or to the relevant PEP.
  3. Lab - a hands-on exercise that cannot be completed without internalizing the concept.
  4. Idiomatic & Linter Drill - read 2–3 ruff/pyright rules, refactor a sample to silence them, understand why each rule exists.
  5. Production Hardening Slice - a profiling, typing, or testing micro-task that compounds into a publishable hardening template by week 24.

Each week is sized for ~12–16 focused hours. Skip the labs at your peril; the labs are the curriculum.


Progression Strategy

The phases form a dependency DAG, not a linear track:

Foundations ──► Intermediate Idioms ──► Runtime & Perf ──► Concurrency & Parallelism
     │                  │                      │                        │
     └──────────────────┴───────────┬──────────┴────────────────────────┘
                                    ▼
                        Patterns & Architecture
                                    ▼
                      AI Systems & Senior Design
                                    ▼
                            Capstone Defense

The Production Hardening slice is intentionally orthogonal - it accumulates a hardening/ template that, by week 24, is a publishable Python project starter (uv-managed, ruff+pyright-clean, pytest+hypothesis, structured logging, OpenTelemetry, Dockerfile, CI).


Non-Goals

  • This curriculum does not teach data analysis as a primary subject. Pandas/Polars appear only as tools in service of AI pipelines.
  • Web-framework breadth is out of scope. We pick FastAPI + Pydantic v2 and go deep; Django/Flask appear only as comparison points.
  • "Why Python is better than X" advocacy is explicitly avoided. The reader should finish the program able to argue against using Python when it is the wrong tool (CPU-bound numeric kernels without NumPy/Cython, hard-real-time, mobile, anything where 200ms cold-start matters).

Capstone Tracks (pick one in Month 6)

  1. Production RAG Service - multi-tenant retrieval-augmented generation with hybrid search, reranking, streaming responses, evals, and a staged rollout harness.
  2. Agent Orchestration Platform - tool-using LLM agents with durable execution, retries, observability, cost ceilings, and a permissions model.
  3. Training/Serving Pipeline - fine-tune a small open model (LoRA), serve with vLLM or TGI behind a FastAPI gateway, with autoscaling, batching, and continuous evaluation.

Details in CAPSTONE_PROJECTS.md.


Versioning Note

This curriculum targets Python 3.13+ as the baseline (PEP 703 free-threaded build available, PEP 684 per-interpreter GIL stable, PEP 669 low-impact monitoring, faster CPython work from 3.11–3.13 fully landed, typing module modernized, match statements stable since 3.10, tomllib since 3.11). Where 3.14 features matter, they are flagged inline. Do not start this curriculum on a Python older than 3.12 - too many of the modern idioms and the new typing semantics will be unavailable.


Senior-Level Exit Criteria

By week 24, the graduate should be able to, in a design review:

  • Argue from CPython memory layout why a hot path allocates and how to fix it (__slots__, NumPy arrays, Cython, struct-of-arrays).
  • Diagnose GIL contention vs. I/O blocking vs. event-loop stalls from a single py-spy dump without re-running the program.
  • Design a RAG pipeline with explicit choices on chunking, embedding model, index type, reranker, and eval methodology - and defend each choice against an alternative.
  • Choose between threads, processes, asyncio, free-threaded, and subinterpreters with a one-paragraph justification per choice.
  • Run a fine-tune, evaluate it offline and online, and ship it behind a gradual rollout with cost and quality guardrails.

Prelude - The Philosophy Behind the Syllabus

Sit with this document for an evening before week 1. The rest of the curriculum is mechanically dense; this is the only chapter where we step back and define the shape of the discipline.


1. Python Is a Glue Language Riding on a Reference-Counted VM

The most damaging misconception a Python engineer can hold is that "Python is a slow scripting language with libraries." A working senior practitioner thinks the inverse:

Python is a glue language - a small, dynamically typed surface - bolted to a reference-counted bytecode VM (CPython) whose superpower is calling into native code (C, C++, Rust, Fortran, CUDA) without paying for a heavyweight FFI. That is why Python won data, ML, and AI: not because Python is fast, but because it makes fast things addressable from a REPL.

Almost every interesting performance question in production Python reduces to "does this loop stay in C, or does it cross back into Python bytecode?" Almost every elegant high-throughput Python architecture is a thin layer over numpy, torch, polars, asyncio, uvloop, or a C extension - with Python orchestrating, not computing.
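The boundary cost can be felt with nothing but the stdlib. A minimal sketch (absolute timings are machine-dependent): the built-in sum iterates entirely in C, while the hand-written loop re-enters the eval loop on every element.

```python
import time

def py_sum(xs: list[int]) -> int:
    # Each iteration executes several bytecode instructions in the eval loop.
    total = 0
    for x in xs:
        total += x
    return total

xs = list(range(1_000_000))

t0 = time.perf_counter()
a = py_sum(xs)
t1 = time.perf_counter()
b = sum(xs)  # the loop runs inside C; no per-item bytecode dispatch
t2 = time.perf_counter()

assert a == b == 499_999_500_000
print(f"bytecode loop: {t1 - t0:.4f}s, builtin sum: {t2 - t1:.4f}s")
```

The same shape of question ("does this stay in C?") scales up to NumPy ufuncs and torch kernels.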

Internalize this and the rest of the curriculum makes sense.


2. The Five-Axis Cost Model

A working senior Python engineer reasons about every line of code along five axes simultaneously:

  • Allocation & object overhead - Does this create Python objects in a hot loop? Could it stay as a NumPy/torch array, a bytes, or a memoryview?
  • Bytecode boundaries - How many trips through the eval loop does this take? Can it be vectorized, pushed into C, compiled with Cython, or JITed (PyPy / Numba)?
  • Concurrency model - Is this CPU-bound (→ processes / free-threaded / native release-the-GIL) or I/O-bound (→ asyncio / threads)?
  • Type integrity - Will pyright --strict accept this? Are runtime contracts (Pydantic, attrs validators) enforced at the right boundary?
  • Failure semantics - What happens on KeyboardInterrupt? On asyncio.CancelledError? On a partially consumed generator that holds a file handle? On an OOM in a forked worker?

Beginner courses teach axis 1 only (and incompletely). This curriculum forces all five into your hands by week 12.


3. The "Pythonic Way" - Aesthetic as Engineering Constraint

Python's design ethic, captured in import this, is "explicit, simple, readable." That phrase is doing more work than newcomers think. Specifically:

  • Duck typing, then static typing. Protocols and structural typing (typing.Protocol) win over nominal hierarchies. Inheritance is fine, deep inheritance is not.
  • EAFP, not LBYL. "Easier to ask forgiveness than permission" - try/except is idiomatic, if hasattr(...) is usually a smell.
  • Comprehensions, generators, iterators. A for loop that builds a list with .append in idiomatic Python is almost always a comprehension or a generator expression in disguise.
  • The stdlib is enormous and underused. itertools, functools, collections, dataclasses, contextlib, pathlib, concurrent.futures, asyncio, logging, argparse, sqlite3, unittest.mock, typing - these cover ~70% of any service. Reach for third-party only when stdlib runs out, and know when it does.
  • Tooling is opinionated. ruff (lint+format), pyright/mypy (types), pytest (test), uv or hatch (build/dep), py-spy/scalene/memray (profile). A Python engineer who does not know these is half-trained.

If you fight these defaults, you will write Java in Python. If you internalize them, your code will look like the stdlib - which is the actual deliverable Python optimizes for.
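Two of those defaults - EAFP and comprehensions - can be shown in a few lines. A stdlib-only sketch (the config dict and default port are illustrative):

```python
# LBYL ("if k in d: ...") does two lookups and races under threads.
# EAFP is one lookup and atomic with respect to the dict:
def get_port(config: dict[str, str]) -> int:
    try:
        return int(config["port"])
    except KeyError:
        return 8080  # illustrative default

# A for loop that builds a list with .append is a comprehension in disguise:
words = ["alpha", "beta", "gamma"]
lengths = [len(w) for w in words]              # list comprehension
long_words = {w for w in words if len(w) > 4}  # set comprehension
total = sum(len(w) for w in words)             # genexp: no intermediate list

assert get_port({"port": "9000"}) == 9000
assert get_port({}) == 8080
assert lengths == [5, 4, 5]
assert total == 14
```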


4. The Reading List

These are referenced throughout the curriculum. You are not expected to read them cover-to-cover before starting; they are pinned tabs.

Primary
  • Fluent Python, 2nd ed. (Luciano Ramalho). The canonical text. Read chapters 1–6 in Month 1, 14–21 in Month 2, the rest as referenced.
  • Effective Python, 3rd ed. (Brett Slatkin). The single best companion to Fluent Python.
  • High Performance Python, 2nd ed. (Gorelick & Ozsvald). Read in Month 3 alongside the runtime chapter.
  • Architecture Patterns with Python (Percival & Gregory). Read in Month 5 alongside the patterns chapter.

Runtime & internals
  • The CPython source itself - treat as primary literature, not reference: Python/ceval.c (the eval loop); Objects/object.c, Objects/typeobject.c, Objects/dictobject.c, Objects/listobject.c, Objects/longobject.c; Python/gc.c (the cyclic GC); Modules/_asynciomodule.c (the C accelerator for asyncio); Include/internal/pycore_*.h (interpreter state, frame layout).
  • Brandt Bucher's "Python 3.11 specializing adaptive interpreter" talk and the PEP 659 text.
  • Anthony Shaw, CPython Internals (Real Python). The most accessible treatment.
  • PEPs that are mandatory reading (the curriculum points to each at the right moment): 8, 20, 257, 318, 343, 380, 484, 492, 525, 530, 544, 557, 585, 593, 612, 634, 646, 654, 657, 659, 669, 684, 692, 695, 703, 709.

AI systems canon (not Python-specific, but mandatory by Month 6)
  • Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP. The original RAG paper.
  • Sumers et al., Cognitive Architectures for Language Agents (CoALA).
  • Designing Machine Learning Systems (Chip Huyen). Especially chapters 7–10.
  • AI Engineering (Chip Huyen, 2024). The most current treatment of LLM-app design.
  • The vLLM paper (Efficient Memory Management for LLM Serving with PagedAttention).
  • Anthropic's Building effective agents and OpenAI's A practical guide to building agents.

Adjacent canon
  • Drepper, What Every Programmer Should Know About Memory. Re-read in week 9.
  • Kleppmann, Designing Data-Intensive Applications. Read chapters 5–9 in Month 5.


5. Curriculum Philosophy: "Read the Source, Ship the Lab"

Three rules govern every module:

  1. Source first, blog second. When the curriculum says "study how dict resolves a key," it means open Objects/dictobject.c and read lookdict_unicode_nodummy. Blogs go stale; CPython commits are dated.
  2. One lab per concept, one artifact per phase. By the end of each month, the reader has produced one open-source-quality artifact (library, gist, or blog post) - not a notebook of toy snippets.
  3. py-spy, pytest -x, and pyright --strict are the teachers. When you do not understand why a program misbehaves, the first response is py-spy dump --pid <pid>, the second is a failing pytest with hypothesis, and only the third is to ask another human.

6. What Python Is Not For

A graduate of this curriculum should be able to argue these points in a design review without sounding ideological:

  • Tight CPU-bound loops without a vectorized library. The interpreter overhead is real. Either vectorize, drop to Cython/Rust/C, or use Numba/PyPy.
  • Hard-real-time systems. GC pauses are short but non-zero, refcount drops can cascade, and the GIL adds tail-latency variance. Wrong tool.
  • Mobile, sandboxed, or aggressively cold-started serverless. A Python interpreter + numpy + torch is a 1+ GB image and a 1+ second cold start. Choose Go, Rust, or a pre-warmed runtime.
  • Code where the team will not adopt typing. Untyped Python over ~5k lines becomes archaeology. A team that resists pyright --strict will fight Python at scale forever.

The signal that Python is the right tool: you have a glue, AI/data, or developer-velocity constraint that ranks above raw single-thread CPU efficiency.


7. A Note on AI-Assisted Workflows

Modern Python authors use LLM tooling. Three rules:

  1. Never accept generated async code without reading it. The most common failure mode of generated Python is "looks async, blocks the event loop" - time.sleep instead of asyncio.sleep, sync requests inside async def, blocking file I/O without run_in_executor.
  2. Verify generated type annotations. Models hallucinate from typing import paths and confuse list[int] (3.9+) with List[int]. Always run pyright.
  3. Treat suggested context-handling skeptically. Generators that hold file handles, async with mismatches, and unclosed httpx.AsyncClient instances are endemic in generated code. Use pytest --tb=short plus tracemalloc to catch leaks.
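The first failure mode is cheap to demonstrate. A minimal sketch (the 50 ms slow_io stands in for a sync HTTP or file call) of the fix: never await code that blocks; move it to a worker thread with asyncio.to_thread.

```python
import asyncio
import time

def slow_io() -> str:
    time.sleep(0.05)  # blocking call: stalls the event loop if run on it directly
    return "done"

async def bad() -> str:
    return slow_io()  # "looks async, blocks the loop" - nothing else can run here

async def good() -> str:
    # Push the blocking call onto the default thread-pool executor.
    return await asyncio.to_thread(slow_io)

async def main() -> None:
    # While good() waits in a worker thread, the loop stays free to run the sleep:
    result, _ = await asyncio.gather(good(), asyncio.sleep(0.01))
    assert result == "done"

asyncio.run(main())
```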

You are now ready for Week 1. Open 01_MONTH_FOUNDATIONS.md.

Month 1 - Foundations: The Data Model, Idioms, and Packaging

Goal: by the end of week 4 you can (a) explain the Python data model in terms of __dunder__ protocols and the type / object relationship, (b) write idiomatic comprehensions, generators, and iterators without reaching for indices, (c) wield try/except/else/finally and context managers correctly, and (d) ship a Python project as a uv-managed, ruff+pyright-clean, pytest-tested package installable via pipx.

This month is the only month aimed at the beginner. After week 4 the curriculum assumes fluency.


Weeks

Week 1 - Syntax, Values, Names, and the Data Model

1.1 Conceptual Core

  • Everything is an object, including types. type(int) is type and type(type) is type. Functions, modules, classes, exceptions, even None - all objects with attributes, addressable by name.
  • Names are not variables. A "variable" in Python is a binding from a name (in a namespace dict) to an object. Assignment never copies; it rebinds. Function arguments are pass-by-binding (often confusingly called "pass by object reference").
  • The data model is a protocol catalog. Every operator and built-in (len, iter, +, [], with, async with, repr) dispatches to a __dunder__ method. Mastering the data model is mastering the language.
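A minimal sketch of the protocol catalog in action (the Deck class is illustrative): implementing just __len__ and __getitem__ makes len(), indexing, membership, and iteration all work, because each built-in dispatches to a dunder.

```python
class Deck:
    """len(), [], in, and for all dispatch to dunder methods."""

    def __init__(self) -> None:
        self._cards = [f"{rank}{suit}" for suit in "SH" for rank in ("A", "K", "Q")]

    def __len__(self) -> int:
        return len(self._cards)

    def __getitem__(self, i):
        return self._cards[i]

deck = Deck()
assert len(deck) == 6                   # len() -> __len__
assert deck[0] == "AS"                  # [] -> __getitem__
assert "KH" in deck                     # in falls back to __getitem__ iteration
assert [c for c in deck][-1] == "QH"    # for also works via the __getitem__ fallback
```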

1.2 Mechanical Detail

  • Built-in types you must internalize: int (arbitrary precision), float (IEEE 754 double), bool (subclass of int), str (Unicode, immutable), bytes (immutable), bytearray (mutable), list, tuple, dict (insertion-ordered since 3.7), set, frozenset, None, Ellipsis, NotImplemented.
  • Mutability vs. hashability: hashable ⇔ __hash__ defined and stable ⇔ usable as dict key / set element. Mutable built-ins (list, dict, set) are unhashable on purpose.
  • Identity vs. equality: is checks identity (id(a) == id(b)); == checks __eq__. Small int caching (-5..256) and string interning make is comparisons accidentally work - which is precisely why you must never use is for value comparison except against None, True, False.
  • Truthiness: __bool__ then __len__ then True. Falsy: 0, 0.0, "", b"", [], (), {}, set(), None, False.
  • f-strings (3.6+); debug f-strings (f"{x=}", 3.8+), which accept a format spec after the =; lazy logging: logger.info("x=%s", x), not logger.info(f"x={x}") - the f-string formats its argument even when the message is suppressed by level.
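The lazy-logging point is observable: a sketch using a counter in __str__ (the Expensive class is illustrative) shows that %-style arguments are never formatted at a suppressed level, while an f-string formats unconditionally.

```python
import logging

class Expensive:
    calls = 0
    def __str__(self) -> str:
        Expensive.calls += 1
        return "formatted"

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("demo")

x = Expensive()
log.debug("x=%s", x)   # lazy: formatting deferred; DEBUG is suppressed, so never runs
assert Expensive.calls == 0

log.debug(f"x={x}")    # eager: the f-string formats before the level check
assert Expensive.calls == 1
```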

1.3 Lab - "The REPL Audit"

  1. In an interactive session, evaluate: a = [1,2,3]; b = a; b.append(4); print(a). Explain in writing.
  2. x = 256; y = 256; x is y → True. x = 257; y = 257; x is y → may be False. Explain.
  3. Write a class Money with __init__, __repr__, __eq__, __hash__, __lt__. Verify it sorts and deduplicates in a set. Add functools.total_ordering; observe what disappears.
  4. Write a class Vector2 with __add__, __sub__, __mul__ (scalar), __rmul__, __abs__, __iter__, __len__. Verify 2 * v works and list(v) works.

1.4 Idiomatic & Linter Drill

  • Install ruff. Configure with rule sets E, F, W, I, B, UP, SIM, RUF, PL. Run on a sample file; read each finding's URL.
  • Read PEP 8 once. Read PEP 20 (import this) and pin it.

1.5 Production Hardening Slice

  • Initialize a project with uv init. Add ruff and pyright as dev deps. Add pyproject.toml with [tool.ruff] and [tool.pyright] strict configurations. Add a Makefile target make check that runs ruff check, ruff format --check, pyright, pytest. This is the baseline; every subsequent week extends it.

Week 2 - Control Flow, Functions, Errors, and the Call Model

2.1 Conceptual Core

  • Functions are first-class objects with attributes (__name__, __doc__, __annotations__, __defaults__, __closure__, __code__).
  • Default arguments are evaluated once, at definition time. The single most common Python footgun: def f(x, acc=[]): - acc is shared across calls. Use None sentinel + body-side default.
  • EAFP over LBYL. try: d[k] over if k in d: d[k]. The exception path is fast in CPython when not raised, and the LBYL form races under threads.

2.2 Mechanical Detail

  • Argument passing: positional, keyword, *args, **kwargs, positional-only (/, PEP 570), keyword-only (*). Know the order: def f(pos1, pos2, /, both, *, kw_only).
  • Closures and the nonlocal keyword. Late binding in closures ([lambda: i for i in range(3)] returns three lambdas all returning 2); fix with default-arg trick lambda i=i: i or with comprehension scoping.
  • The exception hierarchy: BaseException → Exception → everything user-catchable. KeyboardInterrupt and SystemExit are siblings of Exception, not subclasses - except Exception does not catch them, by design.
  • Exception chaining: raise NewError("...") from cause, implicit chaining via __context__. Exception groups (PEP 654, except*) for concurrent code.
  • try / except / else / finally: the else runs only if no exception; finally always runs, even on return.
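A compact sketch tying the last two bullets together (EvalError and the events list are illustrative): else runs only on success, finally runs on every exit path, and raise ... from preserves the original cause.

```python
events: list[str] = []

class EvalError(Exception):
    pass

def safe_div(a: float, b: float) -> float:
    try:
        result = a / b
    except ZeroDivisionError as exc:
        events.append("except")
        raise EvalError("division failed") from exc   # explicit chaining
    else:
        events.append("else")      # only when the try body raised nothing
        return result
    finally:
        events.append("finally")   # always, even on return or raise

assert safe_div(6, 3) == 2.0
assert events == ["else", "finally"]

events.clear()
try:
    safe_div(1, 0)
except EvalError as e:
    assert isinstance(e.__cause__, ZeroDivisionError)  # the chained original
assert events == ["except", "finally"]
```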

2.3 Lab - "The Calculator and the Cancel"

  1. Build a tiny expression evaluator over + - * / using ast.parse + a custom NodeVisitor. Reject anything else. (Do not use eval.)
  2. Add a --repl mode. Make Ctrl-C interrupt the current expression but not exit. Make Ctrl-D exit cleanly.
  3. Wrap division; raise a custom EvalError chained from ZeroDivisionError via from.
  4. Add a --time-budget flag using signal.SIGALRM (POSIX) or a watchdog thread (cross-platform). Document the trade-off.

2.4 Idiomatic & Linter Drill

  • Enable ruff rule set TRY (try/except hygiene). Refactor a sample with broad except: and bare raise into PEP-compliant code.

2.5 Production Hardening Slice

  • Add pytest with pytest-cov. Write tests for the calculator. Aim for 100% line and branch coverage of the AST evaluator. Commit a coverage badge target.

Week 3 - Collections, Comprehensions, Iterators, and Generators

3.1 Conceptual Core

  • The four collection workhorses: list (dynamic array), tuple (immutable record), dict (open-addressed hash table, insertion-ordered), set (hash table). Internalize O() costs: list append amortized O(1), insert(0, x) O(n); dict in O(1) avg; list in O(n).
  • An iterator is a stateful cursor; an iterable is anything you can call iter() on. for x in xs: desugars to it = iter(xs); while True: try: x = next(it) except StopIteration: break.
  • Generators are coroutines that yield values, not just iterators. They preserve local state across yield; they accept values via .send(); they handle exceptions via .throw(); they clean up via .close().
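A minimal sketch of a generator used as a coroutine (running_mean is illustrative): .send() both resumes the generator and delivers a value into the yield expression.

```python
def running_mean():
    """Receives values via .send(), yields the running mean back."""
    total = 0.0
    count = 0
    mean = None
    while True:
        x = yield mean          # suspends here; .send(x) resumes with x bound
        total += x
        count += 1
        mean = total / count

g = running_mean()
next(g)                 # prime: advance to the first yield
assert g.send(10) == 10.0
assert g.send(20) == 15.0
assert g.send(30) == 20.0
g.close()               # raises GeneratorExit at the yield; the generator finishes
```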

3.2 Mechanical Detail

  • collections.deque (O(1) both ends), collections.Counter, collections.defaultdict, collections.ChainMap, collections.OrderedDict (now mostly redundant; still useful for move_to_end), collections.namedtuple (legacy; prefer dataclass(slots=True, frozen=True) or typing.NamedTuple).
  • itertools mastery: chain, islice, takewhile, dropwhile, groupby (note: requires sorted input), tee, product, permutations, combinations, accumulate, pairwise (3.10+), batched (3.12+).
  • Comprehensions in all four flavors: list, set, dict, generator. Generator expression vs. list comprehension: when the consumer is sum, any, all, min, max, or anything that doesn't need a materialized list, prefer the genexp - same syntax minus the brackets.
  • yield from (PEP 380): delegate to a sub-generator, propagate sends/throws.
  • The buffer protocol primer: memoryview over bytes/bytearray/array.array/numpy.ndarray lets you slice without copying. Critical in the AI months.

3.3 Lab - "Streaming Word Count"

  1. Implement wc -w over arbitrarily large files using a generator pipeline: file → lines → words → counts. Constant memory regardless of file size.
  2. Add a --top K flag using heapq.nlargest. Note that you must materialize the counter - discuss why.
  3. Replace your hand-rolled tokenizer with re.finditer and benchmark. Then benchmark a str.split() version. Explain the difference.
  4. Add a --parallel N flag using concurrent.futures.ProcessPoolExecutor and itertools.batched. (We will revisit in Month 4.)

3.4 Idiomatic & Linter Drill

  • Enable ruff C4 (comprehensions) and PERF rules. Refactor for-with-append patterns into comprehensions. Identify cases where the comprehension is less readable and document them.

3.5 Production Hardening Slice

  • Add hypothesis as a dev dep. Write property tests for your word counter: invariants like "total = sum of counts," "tokens are non-empty," "shuffling input lines doesn't change the count."

Week 4 - Modules, Packaging, Virtual Environments, and the Import System

4.1 Conceptual Core

  • A module is a .py file (or .so / .pyd extension) that becomes a singleton object on first import, cached in sys.modules. Re-importing returns the cached object; importlib.reload re-executes (with caveats - old references to old objects persist).
  • A package is a directory with an __init__.py (or a namespace package, PEP 420, with no __init__.py).
  • Virtual environments are not optional. A modern Python project lives in a per-project .venv/, managed by uv, hatch, poetry, or pip-tools. System Python is for the OS, not your code.

4.2 Mechanical Detail

  • Import resolution order: sys.modules cache → finders in sys.meta_path → loaders. The default finders are BuiltinImporter, FrozenImporter, PathFinder (which searches sys.path).
  • Absolute vs. relative imports (from . import sibling, from ..pkg import x). Prefer absolute.
  • __main__: python -m mypkg runs mypkg/__main__.py as __main__. The if __name__ == "__main__": idiom exists because a module imported as a library has a different __name__ than one run as a script.
  • pyproject.toml (PEP 517, 518, 621, 660): the single source of truth for project metadata, build backend, dependencies, and tool configuration. setup.py is dead for new projects.
  • Build backends: hatchling, setuptools, flit-core, poetry-core, maturin (for Rust extensions), scikit-build-core (for C/C++/CMake).
  • Dependency resolution: pip (legacy, slow), uv (fast, Rust, drop-in pip replacement and resolver), poetry (lockfile-first). The curriculum standardizes on uv for speed and ecosystem direction.

4.3 Lab - "Ship a CLI"

  1. Build a CLI tool - e.g., a Markdown table of contents generator. Project layout: src/toctool/{__init__.py,__main__.py,cli.py,core.py}, tests/, pyproject.toml.
  2. Configure [project.scripts] toctool = "toctool.cli:main". Verify pipx install . makes toctool available system-wide.
  3. Add a [project.optional-dependencies] dev = [...] group. uv sync --extra dev installs the dev tools.
  4. Tag v0.1.0. Build wheel + sdist with uv build. Inspect the wheel with unzip -l. Confirm no test files leaked in.
  5. (Optional, sets up later weeks) Publish to TestPyPI.

4.4 Idiomatic & Linter Drill

  • Enable ruff rule set TID (banned-imports), INP (implicit namespace packages). Configure your __init__.py to re-export a curated public API (__all__).

4.5 Production Hardening Slice

  • Add a pre-commit config running ruff check, ruff format, pyright, and pytest -x. Add a GitHub Actions (or equivalent) CI workflow that runs make check on push and matrix-tests over Python 3.12 and 3.13.

Month-1 Exit Criteria

Before starting Month 2, the reader should be able to, on a whiteboard:

  1. Diagram the namespace lookup order for a name in a function inside a class inside a module (LEGB, with the class scope wrinkle).
  2. Explain the difference between is, ==, and __eq__.
  3. Write a generator pipeline that processes a 100GB log file in constant memory.
  4. Bootstrap a publishable Python package with uv, ruff, pyright, pytest, and CI in under 30 minutes.

Month 2 - Intermediate Idioms: Decorators, Context Managers, Dataclasses, Typing

Goal: by the end of week 8 you can (a) write decorators that preserve type signatures and stack cleanly with functools.wraps and ParamSpec, (b) build correct context managers (sync and async) and reason about their teardown order, (c) model domain objects with dataclasses and pydantic knowing when each is appropriate, and (d) write pyright --strict-clean code using Protocol, generics, TypedDict, Literal, overload, and TypeGuard.


Weeks

Week 5 - Object Model Deep Dive: Classes, Descriptors, Metaclasses

5.1 Conceptual Core

  • Attribute lookup is a protocol, not a field read. obj.x calls type(obj).__getattribute__(obj, "x"), which checks the data-descriptor chain on the type, then the instance dict, then non-data descriptors, then __getattr__.
  • Descriptors (__get__, __set__, __delete__) are how @property, staticmethod, classmethod, and __slots__ actually work. Understanding descriptors is understanding 80% of Python's "magic."
  • Metaclasses (type is the default) intercept class creation. They are over-used; __init_subclass__ (PEP 487) and class decorators cover most legitimate use cases.

5.2 Mechanical Detail

  • __slots__: replaces the per-instance __dict__ with a fixed-size struct of slot descriptors. Saves ~40–60% memory per instance and speeds attribute access. Cost: no dynamic attributes, multiple-inheritance gotchas. Mandatory in hot-path data classes.
  • MRO (method resolution order) and the C3 linearization: MyClass.__mro__. Diamond inheritance is solvable but signals over-design.
  • super(): a proxy object that walks the MRO of type(self) starting after the current class. Always cooperative; do not pass super() __init__ arguments unless you know the MRO.
  • @property, @cached_property (3.8+, requires writable __dict__ - incompatible with default __slots__ unless you slot __dict__).
  • __init_subclass__(cls, **kwargs) runs at subclass creation. Used for plugin registration, validation of subclass invariants. The 90%-case alternative to a metaclass.
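The two mechanisms the lab builds on can be sketched together (Field, Model, and User are illustrative names, not a library API): a data descriptor that validates on assignment, plus __init_subclass__ doing the registration work a metaclass would otherwise do.

```python
class Field:
    """A minimal data descriptor with type validation."""

    def __init__(self, typ: type, default=None):
        self.typ, self.default = typ, default

    def __set_name__(self, owner, name):     # called at class-creation time
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name, self.default)

    def __set__(self, obj, value):
        if not isinstance(value, self.typ):
            raise TypeError(f"{self.name} expects {self.typ.__name__}")
        obj.__dict__[self.name] = value

class Model:
    registry: dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):    # runs per subclass; no metaclass needed
        super().__init_subclass__(**kwargs)
        Model.registry[cls.__name__] = cls

class User(Model):
    name = Field(str, default="anon")
    age = Field(int, default=0)

u = User()
assert u.name == "anon"
u.age = 30
assert u.age == 30
assert Model.registry == {"User": User}
try:
    u.age = "thirty"
except TypeError:
    pass
else:
    raise AssertionError("validation should have fired")
```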

5.3 Lab - "Build a Tiny ORM"

  1. Implement a Field descriptor with type validation and a default. class User: name = Field(str); age = Field(int, default=0).
  2. Use __init_subclass__ to collect declared fields into cls._fields. Auto-generate __init__ and __repr__.
  3. Compare your hand-rolled version to @dataclass(slots=True). Note where dataclass is better (PEP 557 field ordering, generated __eq__ and __hash__).
  4. Implement a RegistryMeta metaclass that records every subclass in a class-level dict. Then re-implement using __init_subclass__. Defend the simpler version in writing.

5.4 Idiomatic & Linter Drill

  • Enable ruff SLOT rule. Add __slots__ to every internal data class in your project. Note size delta with pympler.asizeof.

5.5 Production Hardening Slice

  • Add pyright strict mode. Add from __future__ import annotations and switch to PEP 604 union syntax (X | Y). Resolve all type errors.

Week 6 - Decorators, functools, and contextlib

6.1 Conceptual Core

  • A decorator is just f = decorator(f). The @ is sugar.
  • A useful decorator preserves: name, docstring, signature, type annotations, async-ness, and __wrapped__ for introspection. functools.wraps handles the first three; preserving signature and type requires ParamSpec (PEP 612).
  • Class decorators decorate the class object itself. @dataclass is the canonical example.

6.2 Mechanical Detail

  • functools.wraps, functools.partial, functools.partialmethod, functools.lru_cache (and cache in 3.9+ for unbounded), functools.singledispatch, functools.singledispatchmethod, functools.reduce (rarely the right tool - usually a comprehension or sum).
  • Type-preserving decorators with ParamSpec and TypeVar:
    from typing import Callable, ParamSpec, TypeVar
    P = ParamSpec("P"); R = TypeVar("R")
    def timed(fn: Callable[P, R]) -> Callable[P, R]: ...
    
  • contextlib.contextmanager for generator-based context managers; contextlib.asynccontextmanager for async.
  • contextlib.ExitStack / AsyncExitStack: the right tool for a dynamic number of context managers (e.g., opening a list of files determined at runtime).
  • contextlib.suppress, contextlib.closing, contextlib.redirect_stdout.
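A sketch of the ExitStack case from the bullet above (concat and the temp files are illustrative): N files open at once where N is unknown until runtime, all closed in reverse order even if a later open() fails.

```python
import tempfile
from contextlib import ExitStack
from pathlib import Path

def concat(paths: list[Path]) -> str:
    with ExitStack() as stack:
        # enter_context registers each file for teardown as it opens
        files = [stack.enter_context(p.open()) for p in paths]
        return "".join(f.read() for f in files)

with tempfile.TemporaryDirectory() as d:
    paths = []
    for i, text in enumerate(["alpha\n", "beta\n"]):
        p = Path(d) / f"{i}.txt"
        p.write_text(text)
        paths.append(p)
    assert concat(paths) == "alpha\nbeta\n"
```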

6.3 Lab - "The Retry Decorator That Doesn't Lie About Its Type"

  1. Write @retry(times=3, on=(IOError,), backoff=0.1). Make it work on both sync and async functions (detect with asyncio.iscoroutinefunction).
  2. Use ParamSpec so that pyright --strict preserves the wrapped signature.
  3. Add structured logging on each retry. Add a tenacity-style backoff strategy (constant, exponential, jittered).
  4. Compare to tenacity library; document where yours is simpler / worse / better.

6.4 Idiomatic & Linter Drill

  • Enable ruff FBT (boolean-trap), ARG (unused arguments). Refactor decorators to take keyword-only configuration.

6.5 Production Hardening Slice

  • Add mypy (in addition to pyright) with strict_optional, disallow_any_generics. The two type checkers disagree on edge cases; configuring both surfaces those.

Week 7 - Dataclasses, attrs, Pydantic, and the Validation Boundary

7.1 Conceptual Core

  • The single most important architectural decision in a typed Python codebase: where is the validation boundary? Internal types should be cheap (@dataclass(slots=True, frozen=True)); boundary types (HTTP request bodies, message-bus payloads, LLM outputs) should validate (pydantic.BaseModel).
  • "Parse, don't validate." Once a value is past the boundary, it should be a typed object that cannot be malformed; checks afterward are dead code.

7.2 Mechanical Detail

  • dataclasses.dataclass parameters: frozen, slots, kw_only, eq, order, repr, match_args. Defaults that should be field(default_factory=list) - never bare [].
  • attrs (the original): faster validators, evolve, slots by default. Still relevant; dataclass won the stdlib slot but attrs keeps innovating.
  • pydantic v2 (Rust core, ~10x faster than v1): BaseModel, Field(..., gt=0, le=100), model_validator, field_validator, discriminated unions, Annotated[..., AfterValidator(...)]. JSON schema export for free.
  • TypedDict (PEP 589): for dict-shaped data with known keys (e.g., LLM tool-call payloads). Cheaper than Pydantic, no runtime validation. Pair with cast at the boundary or with pydantic.TypeAdapter.

7.3 Lab - "The Three-Layer Cake"

  1. Build an HTTP service (FastAPI, but kept small):
  2. Boundary layer: Pydantic RequestModel / ResponseModel.
  3. Domain layer: @dataclass(slots=True, frozen=True) value objects.
  4. Persistence layer: TypedDict rows from sqlite3.
  5. Write explicit converters between each layer. Resist the urge to make them the same type.
  6. Benchmark a 10k-request loop with Pydantic v1 (via the pydantic.v1 compatibility namespace, if available) vs. v2. Document the 10x.

7.4 Idiomatic & Linter Drill

  • Enable ruff D (pydocstyle). Document every public class and function. Enforce Google or NumPy docstring style.

7.5 Production Hardening Slice

  • Add schemathesis or property-based tests against your FastAPI app. Generate inputs from the OpenAPI schema; confirm 5xx never occurs on valid input shapes.

Week 8 - The Type System: Generics, Protocols, Variance, and typing.*

8.1 Conceptual Core

  • Python's type system is gradual and structural-where-it-matters. Protocol lets duck typing meet static checking - the type system catches up to the language's actual semantics.
  • Variance: list[Cat] is not a list[Animal] (mutable containers are invariant). Sequence[Cat] is a Sequence[Animal] (read-only, so covariant). And a Callable[[Cat], None] cannot be used where a Callable[[Animal], None] is expected - parameters are contravariant, so the substitution runs the other way.
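The variance rules, made concrete (`Animal`/`Cat` are illustrative; the commented-out call is the line pyright rejects):

```python
from collections.abc import Sequence

class Animal: ...

class Cat(Animal):
    def meow(self) -> str:
        return "meow"

def feed_all(animals: list[Animal]) -> None:
    animals.append(Animal())      # legal for list[Animal]...

cats: list[Cat] = [Cat()]
# feed_all(cats)                  # ...so pyright rejects this: list is invariant.
# If it were allowed, cats would now contain a plain Animal with no .meow().

def count(animals: Sequence[Animal]) -> int:
    return len(animals)           # read-only access, so covariance is safe

assert count(cats) == 1           # accepted: Sequence[Cat] is a Sequence[Animal]
```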

8.2 Mechanical Detail

  • Generics (PEP 695, 3.12+): the new clean syntax - def first[T](xs: list[T]) -> T: ... and class Box[T]: .... Old TypeVar syntax still works.
  • Protocol, runtime_checkable. Structural typing for the Iterable, Sized, SupportsIndex, etc., families.
  • Literal, LiteralString (PEP 675, security-relevant), Final, NewType, TypeAlias (PEP 613) - superseded in 3.12 by the PEP 695 type statement: type Vector = list[float].
  • overload: multiple stubs for one implementation. Use sparingly; usually a sign of conflated responsibilities, sometimes legitimately needed (e.g., typing.cast, JSON parser return).
  • TypeGuard (3.10) and TypeIs (3.13): user-defined narrowing predicates. TypeIs is the strictly better one going forward - it narrows in the negative branch too.
  • Annotated[T, metadata] (PEP 593): the foundation of FastAPI/Pydantic field metadata, validators, and dependency injection.

8.3 Lab - "Make Pyright Strict"

  1. Take a 500-LOC module of your existing code. Run pyright --strict. Resolve every error.
  2. Add a Protocol for a "thing-with-an-id" and refactor a function that previously took Any.
  3. Use TypeIs to narrow dict | list returned from json.loads into safe shapes for downstream use.
  4. Where you find yourself reaching for cast, document why and consider whether the boundary belongs at a Pydantic model.

8.4 Idiomatic & Linter Drill

  • Enable ruff ANN (annotation hygiene), PYI (stub files). Aim for 100% annotated public surface; private may use inference.

8.5 Production Hardening Slice

  • Add mypy --strict to CI. Generate Sphinx docs from docstrings. Publish to GitHub Pages. By end of week 8, the project has a public docs site.

Month-2 Exit Criteria

Before starting Month 3, the reader should be able to:

  1. Write a decorator that wraps both sync and async functions and preserves their type signatures under pyright --strict.
  2. Choose between dataclass, attrs, pydantic, TypedDict, and NamedTuple for a given use case and defend the choice.
  3. Add Protocols to make an old codebase amenable to dependency injection without any code change at call sites.
  4. Articulate the validation boundary in their own architecture and where the parse-don't-validate principle is or isn't held.

Month 3 - Runtime and Performance: CPython Internals, GIL, GC, the Allocator

Goal: by the end of week 12 you can (a) read CPython bytecode and predict where the eval loop will spend its time, (b) explain refcounting, the cyclic GC, and generational thresholds, (c) characterize a workload as GIL-bound, allocator-bound, or I/O-bound from py-spy/scalene/memray output, and (d) apply the four tiers of optimization (algorithmic → vectorize → C extension → JIT) with judgment.

This is the hardest month of the curriculum. Take it seriously.


Weeks

Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop

9.1 Conceptual Core

  • CPython is a stack-based bytecode interpreter with reference counting + a generational cyclic GC. Every PyObject is a 16-byte header (ob_refcnt, ob_type) + type-specific tail.
  • The eval loop (Python/ceval.c::_PyEval_EvalFrameDefault) is a giant computed-goto dispatch over opcodes. Since 3.11, the loop is specializing and adaptive (PEP 659): hot opcodes get rewritten in place to type-specialized variants (LOAD_ATTR_INSTANCE_VALUE, BINARY_OP_ADD_INT).

9.2 Mechanical Detail

  • dis.dis(fn): disassemble a function. Memorize the common opcodes: LOAD_FAST, STORE_FAST, LOAD_GLOBAL, LOAD_CONST, CALL, RETURN_VALUE, BINARY_OP, COMPARE_OP, FOR_ITER, POP_JUMP_IF_FALSE, LOAD_ATTR, STORE_SUBSCR.
  • Why local lookups are fast and global lookups are slow: locals are a fixed-size array indexed by integer (fast locals), globals are a dict lookup. Hot functions often hoist globals to locals (def f(_len=len): ...).
  • Frame objects, code objects, and the difference. func.__code__.co_consts, co_names, co_varnames, co_flags.
  • The specializing interpreter: read PEP 659 once. Call a function enough times to warm it up, then dis.dis(fn, adaptive=True) to see the type-specialized opcodes it was rewritten to.
  • Free lists and small-int / interned-string caches.
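A runnable sketch of the globals-hoisting idiom (note: on 3.11+ the specializing interpreter caches LOAD_GLOBAL, so the measured win is smaller than on older versions):

```python
import dis

def hot_global(xs: list[int]) -> int:
    total = 0
    for x in xs:
        total += len(str(x))     # len/str resolved via LOAD_GLOBAL each pass
    return total

def hot_local(xs: list[int], _len=len, _str=str) -> int:
    total = 0
    for x in xs:
        total += _len(_str(x))   # LOAD_FAST: indexed access into fast locals
    return total

assert hot_global([7, 42, 256]) == hot_local([7, 42, 256]) == 6
dis.dis(hot_local)               # compare LOAD_FAST here vs LOAD_GLOBAL above
```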

9.3 Lab - "Bytecode Forensics"

  1. Write three implementations of "sum of squares": a for loop, a sum() + genexp, and numpy.dot(a, a). dis.dis each. Benchmark with timeit. Explain the gap.
  2. Take a function with a global lookup in its hot loop. Refactor to a default-argument cache. Re-bench. Quantify the win.
  3. Use sys.settrace with frame.f_trace_opcodes = True to count opcode-level events on a small program. Disassemble with dis.dis(fn, adaptive=True) before and after warm-up to observe specialization.

9.4 Idiomatic & Linter Drill

  • Enable ruff PERF. Read every rule. Identify cases in your codebase where the rule applies but readability suffers.

9.5 Production Hardening Slice

  • Add pytest-benchmark to CI as a non-failing job that publishes JSON results. Build a script that flags >10% regressions on PRs.

Week 10 - Memory: Refcounts, Cyclic GC, the pymalloc Allocator

10.1 Conceptual Core

  • Reference counting is eager - most objects die at refcount 0, deterministically, often without invoking the GC at all. This is why Python file handles can be closed by del f and why context managers are the right answer for resources that cannot tolerate non-determinism.
  • The cyclic GC handles only objects that might form cycles (containers). It runs in three generations with thresholds. It does not free memory; it breaks cycles so refcounting can free memory.
  • The CPython allocator (pymalloc) is an arena/pool/block allocator tuned for small (<512B) objects. Large allocations go to the system malloc.

10.2 Mechanical Detail

  • sys.getrefcount(obj): returns refcount + 1 (the temporary on the call stack). weakref.ref to break cycles.
  • gc.set_threshold, gc.disable, gc.collect. Disabling GC during a known short-lived high-allocation phase (e.g., model loading) and re-enabling after is a real production technique.
  • Memory leaks in pure Python are almost always (a) caches without bounds, (b) closures capturing large objects, (c) __del__ methods on cyclic objects (legacy issue; mostly fixed since 3.4 / PEP 442). Find with tracemalloc or memray.
  • __slots__ revisited: per-instance memory savings, attribute-access speed-ups, the inheritance gotcha.
  • array.array, bytes, bytearray, memoryview, numpy.ndarray: when not to make Python objects in the first place.
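A quick __slots__ demonstration (exact byte counts vary across versions and builds, so only the shape of the result is asserted):

```python
import sys

class Boxed:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y     # attributes live in a per-instance __dict__

class Slotted:
    __slots__ = ("x", "y")        # fixed layout, no per-instance __dict__
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

b, s = Boxed(1, 2), Slotted(1, 2)
assert hasattr(b, "__dict__")
assert not hasattr(s, "__dict__")
# instance + its dict vs. slotted instance alone
print(sys.getsizeof(b) + sys.getsizeof(b.__dict__), sys.getsizeof(s))
```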

10.3 Lab - "Find the Leak"

  1. Write a service that has a deliberate leak: an unbounded dict cache, a leaking closure, and a circular reference with a __del__. Run under memray and tracemalloc. Identify each leak from the output.
  2. Bound the cache with functools.lru_cache(maxsize=...). Confirm with memray that growth flatlines.
  3. Profile a NumPy-heavy workload. Observe that pymalloc and Python refcounts are largely unused - most memory is in NumPy buffers. Internalize: "NumPy is a different memory world."

10.4 Idiomatic & Linter Drill

  • Enable ruff B008, B023. Catch closure-capture bugs at lint time.

10.5 Production Hardening Slice

  • Add a memray smoke job to CI: run the service against a fixture, fail if peak RSS exceeds a threshold.

Week 11 - The GIL, Free-Threaded Python, and the Concurrency Model

11.1 Conceptual Core

  • The Global Interpreter Lock serializes Python bytecode execution. It does not serialize C extensions that release it (NumPy, PyTorch, time.sleep, most I/O). This is why "Python is single-threaded" is wrong on the parts that matter.
  • CPU-bound, pure-Python → multiprocessing or free-threaded build (PEP 703).
  • CPU-bound, native (NumPy/torch) → threads are fine; the GIL is released.
  • I/O-bound → asyncio (preferred) or threads.

11.2 Mechanical Detail

  • The GIL is a single mutex around the interpreter state. Released on I/O syscalls and on every ~5ms timeslice (sys.setswitchinterval).
  • Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS in C extensions: how NumPy/torch escape.
  • PEP 703 free-threaded build (3.13 experimental, 3.14 stable target): per-object locking, biased reference counting, immortal objects, deferred reference counting. Trade-off: ~10–20% single-threaded slowdown, true parallel threads. Build with --disable-gil or use python3.13t.
  • PEP 684 per-interpreter GIL: subinterpreters with their own GIL, exposed to Python by the interpreters module (PEP 734; stdlib as concurrent.interpreters in 3.14). Promising for embedded multitenancy.
  • threading.Lock, RLock, Semaphore, Condition, Event, Barrier. The atomicity guarantees of CPython on dict/list ops are implementation details, not language-level - under free-threaded these change. Always use a lock if you require atomicity.
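A sketch of why the lock matters: counter += 1 compiles to a read-modify-write sequence, so interleavings can lose updates without it:

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:                # += is LOAD / BINARY_OP / STORE, not atomic
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 80_000          # deterministic only because of the lock
```

Drop the `with lock:` and the final count may come up short - and under the free-threaded build the implementation-detail atomicity you were leaning on changes further.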

11.3 Lab - "GIL Awareness"

  1. Compute primes up to 1M three ways: (a) single thread, (b) threading with 8 threads, (c) multiprocessing with 8 procs. Bench all three on stock CPython.
  2. Run (b) on python3.13t (free-threaded). Compare.
  3. Replace the prime-test inner loop with a NumPy expression. Re-run (b) on stock CPython. Note the GIL-release effect.
  4. Capture py-spy record flame graphs for each. Identify GIL contention visually.

11.4 Idiomatic & Linter Drill

  • Enable ruff ASYNC. Catch time.sleep inside async def, blocking I/O inside async, etc.

11.5 Production Hardening Slice

  • Add py-spy continuous profiling to your CI'd service. On every PR, attach a flame graph artifact.

Week 12 - The Optimization Ladder: Algorithm → Vectorize → Native → JIT

12.1 Conceptual Core

The ladder, in order of expected ROI per hour:

  1. Algorithm and data structure. Big-O still wins by orders of magnitude. set membership over list membership. heapq over re-sorting. bisect over linear scan.
  2. Stay in the C layer. Replace Python loops with NumPy / Polars / itertools / built-in sum/min/max/map. The eval loop is your enemy; comprehensions and built-ins minimize trips through it.
  3. Native extension. Cython, mypyc, Rust+PyO3, C+Pybind11. Write Python, profile, then rewrite the hot 5%.
  4. JIT. PyPy (drop-in for many pure-Python workloads; C extensions via cpyext are slow), Numba (NumPy-aware JIT), or wait for the upcoming CPython copy-and-patch JIT (PEP 744, built on the tier-2 micro-op interpreter; experimental in 3.13).

12.2 Mechanical Detail

  • Profilers, in order of use:
  • cProfile + snakeviz: function-level, deterministic, ~10% overhead.
  • py-spy: sampling, attaches to running process, no code changes, ~1% overhead. Use this in production.
  • scalene: CPU + memory + GPU, line-level, low overhead.
  • memray: memory-focused, flamegraphs.
  • pyinstrument: low-overhead sampling, beautiful HTML output.
  • Vectorization patterns: replace for x in xs: ys.append(x*2 + 1) with np.asarray(xs) * 2 + 1. Replace for row in df.iterrows(): with column expressions in Polars / Pandas.
  • Cython mental model: write Python, add cdef int i, recompile, get 10–100x. Worth it for hot inner kernels.
  • mypyc (compiles type-annotated Python to C extensions; powers mypy itself, black).
  • PyO3 + maturin: Rust extensions with maturin develop. The right tool when you also need true threads and predictable memory.
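Rung 2 in miniature, stdlib-only (the NumPy version would sit one rung higher still):

```python
import timeit

xs = list(range(100_000))

def py_loop() -> int:
    total = 0
    for x in xs:                  # every iteration round-trips the eval loop
        total += x * x
    return total

def builtin_sum() -> int:
    # genexp + built-in sum: the accumulation happens inside C
    return sum(x * x for x in xs)

assert py_loop() == builtin_sum()
for fn in (py_loop, builtin_sum):
    print(fn.__name__, timeit.timeit(fn, number=20))
```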

12.3 Lab - "Climb the Ladder"

Take a deliberately slow workload - e.g., compute pairwise cosine similarity between 10k 768-dim vectors with a pure-Python triple loop. Time it. Then climb:

  1. Algorithmic: skip pairs already computed.
  2. Vectorize: NumPy batched matmul with norm.
  3. Cython rewrite of the inner kernel.
  4. Numba @njit on the same.
  5. (Stretch) Rust + PyO3 implementation.
  6. Compare to faiss / hnswlib.

Tabulate speedups in NOTES.md. The lesson is that step 2 usually wins by 100x and step 3+ by ~2x more - but step 6 (use the right library) wins by 1000x. Algorithm > implementation > tuning.

12.4 Idiomatic & Linter Drill

  • Enable ruff NPY (NumPy-specific rules). Refactor numerical code to use modern NumPy idioms.

12.5 Production Hardening Slice

  • Add pytest-benchmark regression gates. Add a perf/ directory of benchmarks tracked in CI with historical data.

Month-3 Exit Criteria

Before starting Month 4:

  1. Read dis.dis output and predict relative cost of two Python implementations.
  2. Identify a real memory leak from a memray flamegraph.
  3. Choose between threads / processes / asyncio / free-threaded for a given workload, with one paragraph of justification.
  4. Apply the optimization ladder in order - and refuse to skip step 1.

Month 4 - Concurrency and Parallelism: asyncio, Threads, Processes, Free-Threaded, Subinterpreters

Goal: by the end of week 16 you can (a) build a non-trivial asyncio service without ever blocking the event loop, (b) reason about cancellation, timeouts, and structured concurrency in both asyncio and anyio, (c) move work between threads, processes, and subinterpreters with clear cost/benefit, and (d) write a C-extension or Rust binding that releases the GIL and parallelizes correctly.

This month and Month 6 are where the senior-level signal really lives.


Weeks

Week 13 - asyncio Foundations: Event Loop, Tasks, Coroutines

13.1 Conceptual Core

  • Calling an async def function returns a coroutine object; nothing runs until it is awaited or scheduled. Awaiting yields control to the event loop, which drives many coroutines, switching at every suspending await.
  • The cardinal sin: blocking the event loop. A single time.sleep(1), sync DB call, or CPU-heavy loop in a coroutine stalls every other task. This is the most common production asyncio bug.
  • Task vs. Coroutine. A coroutine is a description; a Task is a coroutine scheduled on the loop. await coro runs it inline; asyncio.create_task(coro) runs it concurrently and returns a handle.

13.2 Mechanical Detail

  • asyncio.run, asyncio.create_task, asyncio.gather, asyncio.wait, asyncio.as_completed, asyncio.wait_for, asyncio.TaskGroup (3.11+, the idiomatic way to write structured concurrency).
  • async with, async for, async generators, __aenter__/__aexit__, __aiter__/__anext__.
  • asyncio.Queue, asyncio.Lock, asyncio.Event, asyncio.Semaphore, asyncio.Condition. None are thread-safe; for thread-safe inter-loop comm, use asyncio.run_coroutine_threadsafe or janus.
  • Cancellation is cooperative and exception-based: task.cancel() injects CancelledError at the next await. Code that catches Exception swallowing CancelledError is the asyncio anti-pattern; since 3.8, CancelledError subclasses BaseException rather than Exception - but old code written against earlier versions remains.
  • Timeouts: async with asyncio.timeout(5): (3.11+) is the idiomatic form. asyncio.wait_for is older and has subtle cancellation pitfalls.
  • loop.run_in_executor(None, blocking_fn, args): the escape hatch for blocking calls. Use for legacy DB drivers, file I/O if not using aiofiles, and CPU work.

13.3 Lab - "The Crawler That Doesn't Lie"

  1. Build an async HTTP crawler with httpx.AsyncClient and a TaskGroup. Limit concurrency with a Semaphore(N).
  2. Add a 5-second per-request timeout using asyncio.timeout. Verify cancellation propagates cleanly to the httpx request.
  3. Inject a deliberately blocking time.sleep(2) somewhere. Detect it by enabling debug mode (loop.set_debug(True) or PYTHONASYNCIODEBUG=1), lowering loop.slow_callback_duration to 0.1, and watching the resulting log warnings.
  4. Replace the blocker with asyncio.sleep. Confirm via py-spy dump that the loop never stalls.

13.4 Idiomatic & Linter Drill

  • Enable ruff ASYNC rule set in full. Catch every blocking call inside async def.

13.5 Production Hardening Slice

  • Add aiomonitor or aiodebug to your dev environment. Add a request-id ContextVar and structured logging that propagates across await boundaries.

Week 14 - Structured Concurrency, Cancellation, ExceptionGroups, anyio

14.1 Conceptual Core

  • Structured concurrency: a parent task does not exit until its children have finished. No orphaned tasks, no leaked work. asyncio.TaskGroup (3.11+) and anyio.create_task_group are the canonical implementations; both inspired by Trio.
  • Cancellation must be honored. A coroutine that catches BaseException (or worse, Exception in <3.11) and ignores it breaks structured concurrency. The contract: if you must catch, re-raise CancelledError.
  • ExceptionGroup (PEP 654, 3.11+): when multiple sibling tasks fail, you get an ExceptionGroup containing all of them. except* ValueError: matches the subset.

14.2 Mechanical Detail

  • anyio as the portable abstraction: works on asyncio or trio backends, gives you create_task_group, move_on_after, fail_after, to_thread.run_sync, from_thread.run. If you write libraries, prefer anyio over raw asyncio.
  • contextvars.ContextVar: the async-safe replacement for threading.local. Used by tracing libraries, request-id propagation, FastAPI dependencies.
  • Backpressure: bounded asyncio.Queue is your friend. Unbounded queues are how async services OOM in production.
  • The eager_task_factory (3.12+): start tasks eagerly when possible, reducing scheduling overhead.

14.3 Lab - "The Fan-Out That Cleans Up After Itself"

  1. Refactor your week-13 crawler to use TaskGroup (or anyio task group).
  2. Add a "first-error wins" mode: as soon as any task raises, all siblings are cancelled and the group raises an ExceptionGroup.
  3. Add a "best-effort" mode: collect all results and exceptions, return both.
  4. Verify via test that cancelling the parent cancels every in-flight HTTP request within 100ms.

14.4 Idiomatic & Linter Drill

  • Add ruff RUF006 (asyncio dangling tasks). Refactor any create_task not held in a TaskGroup or kept-reference set.

14.5 Production Hardening Slice

  • Add OpenTelemetry instrumentation. Verify trace context propagates across TaskGroup boundaries.

Week 15 - Threads, Processes, Subinterpreters, concurrent.futures

15.1 Conceptual Core

  • concurrent.futures is the unified high-level API. ThreadPoolExecutor for I/O or GIL-releasing C; ProcessPoolExecutor for pure-Python CPU.
  • multiprocessing start methods: fork (fast, dangerous with threads/locks/CUDA), spawn (safe default on macOS/Windows, slower), forkserver (good middle ground on Linux).
  • Pickle is the IPC currency for multiprocessing. Things that can't be pickled (lambdas, locally defined classes, open file handles) cannot cross the boundary. cloudpickle is the third-party escape hatch.
  • Subinterpreters (PEP 684 per-interpreter GIL, landed at the C level in 3.12; PEP 734 adds the Python-level interpreters API): each interpreter has its own GIL, its own modules, its own sys. Communication via interpreters.Queue or shared memory. Lighter than processes, heavier than threads.
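A sketch of the GIL-releasing-C case: hashlib drops the GIL while hashing large buffers, so a thread pool scales even on stock CPython (blob sizes are illustrative):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

blobs = [bytes([i]) * 1_000_000 for i in range(8)]   # 8 distinct 1MB payloads

def digest(blob: bytes) -> str:
    # hashlib releases the GIL for large inputs, so threads run in parallel
    return hashlib.sha256(blob).hexdigest()

with ThreadPoolExecutor(max_workers=8) as pool:
    digests = list(pool.map(digest, blobs))          # order-preserving

assert len(set(digests)) == 8        # distinct inputs, distinct hashes
```

Swap `digest` for a pure-Python hash and the same pool serializes on the GIL - which is exactly the distinction the 15.1 bullets draw.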

15.2 Mechanical Detail

  • multiprocessing.shared_memory.SharedMemory (3.8+): zero-copy buffers across processes. Pair with numpy.ndarray(buffer=shm.buf) for big-array IPC.
  • multiprocessing.Manager: proxy objects for list, dict, etc. Convenient but slow - every op is an IPC.
  • os.fork() directly is rarely correct in modern Python; use multiprocessing or subprocess.
  • The free-threaded build (PEP 703): with python3.13t, ThreadPoolExecutor becomes a true parallel CPU executor for pure-Python code. The future-state replacement for many ProcessPoolExecutor use cases.

15.3 Lab - "Pick Your Parallelism"

For each workload, pick a model and justify:

  1. Compress 10k JPEGs in parallel.
  2. Run 10k HTTP requests against an external API (rate-limited).
  3. Compute SHA-256 of 10k 1MB blobs.
  4. Train 10 small models concurrently sharing a GPU.

Implement at least two of them three ways: threads, processes, asyncio. Bench. Write up the right answer.

15.4 Idiomatic & Linter Drill

  • Add ruff S (security, bandit-style). Catch subprocess.run(..., shell=True) and the unpickling of untrusted input.

15.5 Production Hardening Slice

  • Add a deadlock-detection probe: a watchdog thread that dumps py-spy if the main loop hasn't ticked in 30s. Ship it as part of the hardening template.

Week 16 - Native Extensions, Releasing the GIL, FFI

16.1 Conceptual Core

  • The fastest Python is Python that calls into C. The fastest correct Python is Python that calls into C and releases the GIL while it's there. NumPy, PyTorch, and hashlib do this; many third-party C extensions don't.
  • Three FFI options today:
  • Cython for inner-loop kernels (write Python with type hints, compile to C).
  • Rust + PyO3 + maturin for everything else: thread-safe, memory-safe, modern build.
  • ctypes / cffi for calling existing .so/.dll libraries without writing an extension.

16.2 Mechanical Detail

  • The CPython C API: PyObject *, refcount discipline (Py_INCREF/Py_DECREF), the GIL macros (Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS), exception handling (PyErr_SetString).
  • The Stable ABI (PEP 384) and the Limited API. Building wheels that work across CPython versions.
  • numpy's C API: PyArrayObject, PyArray_DATA, contiguity flags. The buffer protocol (PEP 3118): memoryview, __buffer__ on Python types.
  • PyO3 idioms: #[pyfunction], #[pyclass], Bound<'py, PyAny>, py.allow_threads(|| {...}) to release the GIL.
  • HPy: a successor C API, portable across PyPy/CPython/GraalPy. Worth knowing about; not yet load-bearing.
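The buffer protocol from the Python side, stdlib-only - a memoryview over an array.array shows the zero-copy semantics NumPy relies on at the C level:

```python
import array

# A typed C buffer: no per-element PyObjects
a = array.array("d", [1.0, 2.0, 3.0, 4.0])

view = memoryview(a)             # zero-copy view via the buffer protocol
half = view[2:]                  # slicing a memoryview copies nothing

half[0] = 99.0                   # writes through to the underlying buffer
assert a[2] == 99.0

# Metadata the buffer protocol exposes (what PyArray_* reads on the C side)
assert (view.format, view.itemsize, view.contiguous) == ("d", 8, True)
```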

16.3 Lab - "Write the Hot Kernel in Rust"

  1. Take the cosine-similarity workload from week 12. Implement it in Rust with PyO3.
  2. Use py.allow_threads(|| ...) around the SIMD loop. Verify with a Python ThreadPoolExecutor(8) that you get ~8x speedup.
  3. Compare to NumPy and to your Cython version. Write up the cost in code complexity.
  4. Bonus: expose a Vector #[pyclass] and benchmark crossing the FFI per-call vs. per-batch. Internalize the per-call FFI cost.

16.4 Idiomatic & Linter Drill

  • Add cargo clippy to your Rust crate. Add maturin develop --release to your dev workflow.

16.5 Production Hardening Slice

  • Build manylinux wheels with cibuildwheel. Add a CI matrix: cp312, cp313, cp313t, cp314 across linux/x86_64, linux/aarch64, macos/arm64. This is the modern wheel-distribution baseline.

Month-4 Exit Criteria

Before starting Month 5:

  1. Build an asyncio service that survives kill -INT mid-flight without dropping requests or leaking tasks.
  2. Pick - and justify - between threads, processes, asyncio, free-threaded, and subinterpreters for any workload.
  3. Write a Rust extension that releases the GIL and verify parallel scaling.
  4. Diagnose an event-loop stall from a py-spy dump alone.

Month 5 - Patterns and Architecture: Pythonic Design, Testing, Observability, Service Shape

Goal: by the end of week 20 you can (a) translate the Gang-of-Four catalog into Pythonic forms (and reject the ones that don't translate), (b) choose the right data structure for a given problem from a much larger menu than list/dict, (c) ship a FastAPI service with structured logging, metrics, traces, and a credible test pyramid, and (d) lay out a multi-package monorepo whose import graph doesn't cycle.


Weeks

Week 17 - Pythonic Design Patterns

17.1 Conceptual Core

The GoF book describes patches around C++ and Java limitations: no first-class functions, no closures, no duck typing, mandatory class hierarchies. Many "patterns" in Python collapse to language features. Some still apply. Knowing which is which is senior-level taste.

17.2 The Catalog, Translated

GoF Pattern → Pythonic Form
Strategy → A function passed as an argument. Or a Protocol.
Template Method → A function with hooks; ABC + abstractmethod only when you need enforcement.
Factory / Abstract Factory → A function. Or an __init_subclass__ registry. Or functools.singledispatch.
Singleton → A module. import is the singleton. @lru_cache(maxsize=None) on a constructor for parameterized singletons.
Observer → signal/blinker, an asyncio.Queue, or just a list of callbacks.
Iterator → Built into the language (__iter__).
Decorator (GoF) → Decorators (Python).
Adapter → A function. Or Protocol + a thin wrapper class.
Visitor → functools.singledispatch for type dispatch; match statement for ADTs.
Command → A callable. functools.partial for binding.
Chain of Responsibility → Middleware. ASGI/WSGI middleware is exactly this.
State → Functions returning functions; or a match over an Enum.
Builder → Keyword args + dataclass. Rarely a builder class.
Flyweight → sys.intern for strings; __slots__ + class-level constants.
Proxy → __getattr__-based forwarding; or weakref.proxy.
Composite → A type that contains itself: tree: Node | list[Node].
Memento → dataclasses.replace + immutability.
Mediator → An event bus or a domain service.

The patterns that survive idiomatically: Strategy (via Protocol), Observer (via async queues), Chain (via middleware), Visitor (via singledispatch / match).
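The Visitor row in miniature - functools.singledispatch dispatching over a tiny expression ADT (`Num`/`Add` are illustrative):

```python
from dataclasses import dataclass
from functools import singledispatch

@dataclass(frozen=True)
class Num:
    value: float

@dataclass(frozen=True)
class Add:
    left: "Num | Add"
    right: "Num | Add"

@singledispatch
def evaluate(node: object) -> float:
    # the "visitor": one open function, one implementation per node type
    raise TypeError(f"unsupported node: {type(node).__name__}")

@evaluate.register
def _(node: Num) -> float:
    return node.value

@evaluate.register
def _(node: Add) -> float:
    return evaluate(node.left) + evaluate(node.right)

assert evaluate(Add(Num(1.0), Add(Num(2.0), Num(3.0)))) == 6.0
```

No accept() plumbing, no double dispatch: new node types register their own case without touching existing code.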

17.3 Architectural Patterns

  • Hexagonal / Ports and Adapters. Domain at the core, adapters at the edge (HTTP, DB, message bus). Test the domain in isolation. Architecture Patterns with Python (Percival/Gregory) is the canonical Python treatment.
  • Repository pattern: abstract persistence behind a Protocol. Tests use an in-memory fake; production uses SQL.
  • Unit of Work: collect domain mutations, commit atomically. Pairs with SQLAlchemy session.
  • CQRS-lite: separate read models from write models when their shapes diverge. Don't over-apply.

17.4 Lab - "Refactor a Junk Drawer"

  1. Take a 1k-LOC script of mixed responsibilities. Extract: domain/, adapters/, service/, entrypoints/. Write Protocols for the seams.
  2. Add a fake repository for tests; the real one talks to SQLite. Run the same test suite against both.
  3. Document, in a docs/architecture.md, why each module exists and what it depends on.

17.5 Idiomatic & Linter Drill

  • import-linter to enforce the dependency direction (entrypoints → service → domain ← adapters).

Week 18 - Data Structures Beyond list/dict

18.1 The Menu

Need → Structure
Ordered, indexable, mutable → list
Immutable record → tuple, NamedTuple, frozen dataclass
Membership / dedup → set / frozenset
Key-value → dict
Counts → collections.Counter
Default-on-miss → collections.defaultdict
FIFO / deque → collections.deque
Priority queue → heapq (no class - operates on a list)
Sorted container → sortedcontainers.SortedList/SortedDict (third-party but de facto standard)
Disjoint set → hand-rolled (~20 lines) or networkx.utils.UnionFind
LRU cache → functools.lru_cache for functions; cachetools.LRUCache for objects
Bloom filter → pybloom-live or roll your own (Appendix B)
Trie → pygtrie or roll your own
Interval tree → intervaltree
Graph → networkx (prototyping), igraph/graph-tool (scale)
DataFrame → pandas (legacy), polars (modern, lazy, multi-threaded)
Tensor → numpy (CPU), torch.Tensor (CPU/GPU/autograd)
Sparse vector → scipy.sparse, torch.sparse
Vector index → faiss, hnswlib, usearch
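One menu row made concrete: heapq for top-K without a full sort (`scores` is synthetic data):

```python
import heapq
import random

random.seed(0)
scores = [(random.random(), f"player{i}") for i in range(10_000)]

# Top-K without sorting all N: O(N log K) instead of O(N log N)
top3 = heapq.nlargest(3, scores)

# nlargest is documented as equivalent to sorted(..., reverse=True)[:n]
assert top3 == sorted(scores, reverse=True)[:3]
```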

18.2 Lab - "Right Tool for Right Workload"

  1. A leaderboard with frequent insert + top-K query: implement with list (naive), heapq (better), SortedList (best). Bench at 10k/100k/1M elements.
  2. A rolling-window deduplicator: set (memory-unbounded), Bloom filter (memory-bounded, false positives), cachetools.TTLCache. Pick one with justification.
  3. A nearest-neighbor lookup over 1M 768-dim vectors: brute-force NumPy, hnswlib, faiss. Note recall/latency trade-offs.

18.3 Production Hardening Slice

  • Add a "data-structure decision log" to docs/. Every non-trivial collection choice gets one paragraph: what was rejected and why.

Week 19 - Testing, Property-Based Testing, Mutation Testing, Fakes vs. Mocks

19.1 Conceptual Core

  • The test pyramid still applies: many fast unit tests, some integration, few end-to-end. In Python, the unit tier is so cheap there's almost never an excuse for skipping it.
  • Prefer fakes to mocks. A fake is a working implementation with simpler internals (in-memory repo, fake clock). A mock is a script of expected calls. Mocks couple tests to implementation; fakes don't.

19.2 Mechanical Detail

  • pytest: fixtures, conftest.py, parametrize, marks, pytest.raises, pytest.warns, tmp_path, caplog, capsys. Plugin ecosystem: pytest-asyncio, pytest-cov, pytest-xdist (parallel), pytest-benchmark, pytest-mock, pytest-randomly.
  • hypothesis: property-based testing. Strategies, @given, @settings, assume, stateful tests with RuleBasedStateMachine. The single biggest force-multiplier in this curriculum.
  • mutmut / cosmic-ray: mutation testing. Verifies that your tests fail when production code is broken - surfaces vacuous tests.
  • unittest.mock: when you must. Prefer monkeypatch fixture and respx/vcr.py for HTTP.
  • Test doubles taxonomy: dummy, stub, spy, mock, fake. Know the difference.

19.3 Lab - "The Tests Find Bugs You Didn't Know You Had"

  1. Add hypothesis property tests to your week-3 word counter. Watch them find a UTF-8 boundary bug or an empty-input issue.
  2. Add a stateful hypothesis test against your tiny ORM from week 5.
  3. Run mutmut. Identify untested branches.
  4. Replace any Mock you used with a fake implementing a Protocol.

19.4 Idiomatic & Linter Drill

  • Enable ruff PT (pytest style). Refactor assert x == 1; assert y == 2 patterns; prefer one assertion per test.

19.5 Production Hardening Slice

  • CI gate: coverage ≥ 90%, mutmut killed-mutant ratio ≥ 80%, and a registered Hypothesis settings profile for CI (settings.load_profile("ci")).

Week 20 - Observability, FastAPI, Production Service Shape

20.1 Conceptual Core

A production Python service has, at minimum: structured logs, metrics, distributed traces, health checks, graceful shutdown, configuration via env, secrets via a vault, and a dependency-injection seam for tests. None are optional.

20.2 Mechanical Detail

  • Logging: logging stdlib, configured once at startup, JSON formatter (python-json-logger or structlog). Attach request_id, user_id, trace_id via ContextVar.
  • Metrics: prometheus_client for pull; OpenTelemetry metrics for push. Histograms over averages - averages lie about tail latency.
  • Traces: opentelemetry-api + opentelemetry-sdk + opentelemetry-instrumentation-fastapi/-httpx/-sqlalchemy. Auto-instrumentation gets you 80% for free.
  • FastAPI: ASGI app, Pydantic-typed request/response, dependencies as Annotated[T, Depends(...)], lifespan context manager for startup/shutdown, BackgroundTasks only for fire-and-forget (use a real queue for durable work).
  • Configuration: pydantic-settings reading .env + env vars. Never hardcode. Never read env vars directly in domain code.
  • Graceful shutdown: SIGTERM → drain in-flight → close DB pools → exit. uvicorn --timeout-graceful-shutdown.
  • Health: /healthz (liveness, returns 200 if the process is up) vs. /readyz (readiness, returns 200 only if dependencies are reachable). Distinct, not interchangeable.
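The ContextVar-based correlation from the logging bullet can be sketched with nothing but the stdlib. The filter and logger names here are illustrative; in practice the formatter would be structlog or python-json-logger emitting JSON:

```python
import logging
import uuid
from contextvars import ContextVar

# One ContextVar per correlation field; set once per request, visible to
# every log call in that task and its children.
request_id_var: ContextVar[str] = ContextVar("request_id", default="-")

class RequestContextFilter(logging.Filter):
    """Copy the current request_id from the ContextVar onto every record."""
    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True

def configure_logging() -> logging.Logger:
    # Configured once at startup, as the text prescribes.
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s %(request_id)s %(message)s"))
    handler.addFilter(RequestContextFilter())
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

def handle_request(logger: logging.Logger) -> str:
    """Stand-in for middleware: bind the id, then log freely downstream."""
    rid = uuid.uuid4().hex[:8]
    request_id_var.set(rid)
    logger.info("handling request")
    return rid
```

Because ContextVar is task-local, concurrent asyncio requests each see their own request_id without passing it through every function signature.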

20.3 Lab - "Production-Shaped Service"

Build a FastAPI service that:

  1. Accepts a POST /jobs, persists to SQLite, returns a job ID.
  2. Processes jobs in an asyncio.TaskGroup background worker with bounded concurrency.
  3. Emits structured JSON logs with trace correlation.
  4. Exposes /metrics (Prometheus), /healthz, and /readyz.
  5. Handles SIGTERM by draining in-flight jobs.
  6. Runs under uvicorn with --workers 4 (multi-process). Document why workers > 1 helps CPU-light, I/O-bound services on stock CPython.
  7. Ships a docker-compose stack including Prometheus, Grafana, and Jaeger.
  8. Includes a k6 or locust load test in loadtest/ reproducing the latency SLO.

20.4 Idiomatic & Linter Drill

  • Add ruff LOG and G rules. G004 catches logger.info(f"..."); use %-style placeholders (logger.info("x=%s", x)) for lazy interpolation.

20.5 Production Hardening Slice

  • Deploy to a free-tier cloud (Fly.io, Render, or a Hetzner VM with Caddy). Run for a week, watch the dashboard, write a one-page postmortem of what the dashboard taught you.

Month-5 Exit Criteria

Before starting Month 6:

  1. Translate any GoF pattern to its Pythonic form, or argue it doesn't apply.
  2. Pick the right data structure from the menu without defaulting to dict/list.
  3. Ship a FastAPI service with full observability and graceful shutdown in under a day.
  4. Defend a hexagonal architecture in a code review.

Month 6 - AI Systems at a Senior Level: RAG, Agents, Evals, Training, Serving

Goal: by the end of week 24 you can (a) design a RAG pipeline end-to-end with explicit choices on chunking, embedding, indexing, retrieval, reranking, and prompting, (b) build a tool-using agent with durable execution, observability, and cost ceilings, (c) run an offline + online evaluation harness that catches regressions before users do, and (d) fine-tune, serve, and roll out a small open-weights model behind a FastAPI gateway.

This is the synthesis month. Every prior month feeds in.


Weeks

Week 21 - LLM-App Foundations: Prompts, Tokens, Streaming, Cost

21.1 Conceptual Core

  • An LLM call is an autoregressive generation over a token stream, billed per token, latency-bound by output length, and probabilistic in output. All four facts shape the system around it.
  • Tokens, not characters or words. Costs, context windows, and rate limits are all in tokens. Always budget in tokens. tiktoken or the model's own tokenizer is the source of truth.
  • Streaming is product-critical. Time-to-first-token (TTFT) usually matters more than total time. Design APIs streaming-first; convert to batch only if needed.
  • Caching the prompt prefix is free latency. Anthropic and OpenAI both expose prompt caching; structure prompts so the cacheable prefix is large and stable.

21.2 Mechanical Detail

  • SDKs: anthropic, openai, plus litellm as a normalization layer if you need provider portability. Async clients always - sync clients block your event loop and waste capacity.
  • Streaming: SSE on the wire; in code, an async iterator of events. Render incrementally. Handle mid-stream errors and retries (resume is hard; usually you re-issue from scratch).
  • Structured output: JSON mode, tool use / function calling, or constrained decoding (outlines, instructor, lm-format-enforcer). Pydantic models as the schema.
  • Failure modes: rate limits (429), token limits, content filters, schema-violating outputs, hallucinated tool arguments. Each has a distinct retry strategy.
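As a sketch of one of those distinct strategies - retry-with-backoff for rate limits - here is a minimal asyncio version with full jitter. RateLimitError and the attempt numbers are stand-ins; a real client would catch the SDK's own 429 exception type and honor any Retry-After header:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for a provider 429; real SDKs raise their own types."""

async def with_backoff(call, *, max_attempts: int = 5, base: float = 0.5):
    """Retry only retryable errors, with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # budget exhausted: surface the error
            # full jitter: sleep a uniform amount in [0, base * 2**attempt)
            await asyncio.sleep(random.uniform(0, base * 2 ** attempt))

# Usage: a flaky call that succeeds on the third attempt.
async def demo() -> str:
    attempts = 0
    async def flaky():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise RateLimitError
        return "ok"
    return await with_backoff(flaky, base=0.01)
```

Schema-violating outputs and content filters deliberately fall through: retrying them unchanged rarely helps, so they deserve a different handler than 429s.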

21.3 Lab - "A Disciplined LLM Client"

  1. Build an LLMClient abstraction over anthropic and openai async SDKs. Methods: generate, stream, with_tools.
  2. Add token accounting: pre-call estimate, post-call actual, running cost meter.
  3. Add caching headers (Anthropic prompt caching). Measure latency delta.
  4. Add structured-output mode using instructor + a Pydantic schema. Test on a deliberately ambiguous prompt; observe schema enforcement.
  5. Add timeout, retry-with-backoff, and circuit breaker (pybreaker or hand-rolled).

21.4 Production Hardening Slice

  • Add per-request trace_id, model, prompt_tokens, completion_tokens, cached_tokens, cost_usd to your structured logs. This is the only way you'll keep cost under control in production.

Week 22 - Retrieval-Augmented Generation: Doing It Properly

22.1 Conceptual Core

RAG fails in seven places, and a senior engineer must know each:

  1. Ingestion: garbage-in (bad PDFs, lost layout, OCR errors).
  2. Chunking: too big → diluted relevance; too small → loss of context. Try semantic / recursive / sentence-window strategies; benchmark them.
  3. Embedding: model choice (text-embedding-3-large, bge-large, nomic-embed, voyage-3), normalization, dimension, multilingual support.
  4. Indexing: HNSW (hnswlib, faiss, usearch), IVF-PQ for scale, keyword (bm25s, tantivy), hybrid (dense + sparse + reranker).
  5. Retrieval: top-K, MMR for diversity, query rewriting / HyDE, query routing.
  6. Reranking: a cross-encoder reranker (bge-reranker, Cohere rerank, Voyage rerank) on the top-50 → top-5. Often the single biggest quality win.
  7. Prompting: how the chunks are presented, citation format, instructions for "don't answer if not in context."

22.2 Mechanical Detail

  • Vector DBs: pgvector (Postgres extension, the boring-and-correct choice), qdrant, weaviate, milvus, chroma (dev), lance/lancedb (good for local), turbopuffer (cheap, serverless).
  • Hybrid search: RRF (reciprocal rank fusion) over dense + BM25.
  • Embedding pipelines with backpressure: don't OOM your provider, batch carefully, retry idempotently.
  • Evals for RAG: retrieval recall@K, answer faithfulness (LLM-as-judge), answer relevance, context precision (ragas, trulens, custom).
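RRF itself is only a few lines. The following sketch (function name is mine; k=60 is the conventional constant from the original RRF paper) fuses best-first ranked lists of document IDs, one list per retriever:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank).

    `rankings` holds doc IDs ordered best-first, one list per retriever
    (e.g. dense and BM25). Only ranks matter, so incomparable raw scores
    (cosine vs. BM25) fuse cleanly without normalization.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.__getitem__, reverse=True)

# A doc ranked mid-list by both retrievers can beat one ranked
# highly by only a single retriever.
fused = rrf_fuse([["a", "b", "c"], ["b", "c", "d"]])
```

This is why RRF is the default hybrid-fusion choice: it needs no score calibration between the dense and sparse legs.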

22.3 Lab - "End-to-End RAG with Honest Evals"

  1. Pick a corpus (your own docs, a Wikipedia subset, or a publicly available QA dataset). Ingest with at least two chunking strategies.
  2. Stand up pgvector or qdrant. Index with two embedding models.
  3. Implement hybrid retrieval (dense + BM25 + RRF) and add a reranker.
  4. Build a 50-question gold eval set with reference answers. Score with ragas. Iterate retrieval until faithfulness > 0.85.
  5. Plot the impact of each pipeline change in a results table. Resist the urge to tune blindly.

22.4 Production Hardening Slice

  • Add eval-on-CI: every PR runs the gold set against the changed pipeline; regressions block merge.

Week 23 - Agents, Tools, Durable Execution, Cost & Safety

23.1 Conceptual Core

An "agent" is an LLM in a loop over a tool-use protocol with state, exit conditions, and observability. The dangerous failure modes:

  • Runaway loops: turn caps, cost caps, time caps. All three.
  • Bad tool inputs: validate aggressively at the tool boundary; treat the LLM as untrusted input.
  • Silent quality drift: log every step; replay traces in tests.
  • Permission escalation: an agent with bash is a remote-code-execution surface. Sandbox.
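A minimal sketch of enforcing all three caps in one loop - every name here is hypothetical, `step()` stands in for "one LLM turn plus any tool calls", and a real loop would also emit a span per turn:

```python
import time

class CapExceeded(Exception):
    pass

def run_agent(step, *, max_turns: int = 10, max_cost_usd: float = 0.50,
              max_wall_s: float = 120.0):
    """Drive a hypothetical `step() -> (answer | None, cost_usd)` loop.

    All three caps are checked independently; relying on any single one
    leaves a runaway path open."""
    spent, start = 0.0, time.monotonic()
    for _turn in range(max_turns):           # turn cap: bounded iteration
        if time.monotonic() - start > max_wall_s:
            raise CapExceeded("wall-time cap")
        if spent >= max_cost_usd:
            raise CapExceeded("cost cap")
        answer, cost = step()
        spent += cost
        if answer is not None:
            return answer, spent
    raise CapExceeded("turn cap")

# Usage: a fake step that answers on the third turn at $0.01 per turn.
def make_step():
    turns = {"n": 0}
    def step():
        turns["n"] += 1
        return ("done" if turns["n"] == 3 else None), 0.01
    return step
```

The same structure extends naturally to a token cap: accumulate prompt + completion tokens per turn and check it alongside cost.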

Read Anthropic's Building effective agents once. The taxonomy (workflows vs. agents; chains, routers, parallelization, orchestrator-workers, evaluator-optimizer) is load-bearing.

23.2 Mechanical Detail

  • Frameworks worth knowing (in 2026): pydantic-ai, dspy, instructor, langgraph (for graph-shaped flows), and the "build your own" path. Default to "build your own" until you've felt the pain - most frameworks add accidental complexity.
  • Durable execution: temporal, inngest, or a state-machine table in Postgres. Critical when agents take minutes-to-hours and processes can crash.
  • Tool definitions: Pydantic schemas → JSON Schema → tool spec. Use pydantic-ai or instructor to generate.
  • Sandboxing: e2b, Docker, gVisor, Firecracker. Never exec LLM-generated code on the host.
  • Observability for agents: langfuse, arize phoenix, or roll your own with OpenTelemetry spans-per-step.

23.3 Lab - "An Agent That Doesn't Burn Money"

  1. Build a research agent: takes a question, plans, calls web_search and fetch_url tools, synthesizes an answer with citations.
  2. Add: max-turns=10, max-tokens=200k, max-wall-time=120s, max-cost=$0.50. Verify each cap fires correctly.
  3. Persist agent state (turn-by-turn) to Postgres. Recover after a kill -9.
  4. Write replay tests: feed a saved trace to a test, mock the LLM, assert tool calls happen in the right order.
  5. Add an evaluator-optimizer loop: a critic LLM grades the answer; if score < threshold, revise once.

23.4 Production Hardening Slice

  • Add a "kill switch": a feature flag that immediately disables agent execution. Verify it works via end-to-end test.

Week 24 - Training, Serving, Rollout, and the Capstone Defense

24.1 Conceptual Core

You are unlikely to pretrain a foundation model. You will, repeatedly: (a) fine-tune with LoRA/QLoRA, (b) serve open-weights models, (c) roll out behind a gateway with shadow / canary / staged-percent traffic.

24.2 Mechanical Detail

  • Fine-tuning stack: transformers, peft (LoRA/QLoRA), trl (SFT, DPO), bitsandbytes (4-bit), accelerate (multi-GPU), unsloth (faster LoRA). Datasets via datasets (HuggingFace).
  • Serving stack: vLLM (PagedAttention, the default choice for throughput), TGI, SGLang, llama.cpp/ollama (for tiny / local), Triton Inference Server (when you need the matrix). Quantization: GPTQ, AWQ, GGUF.
  • Gateway shape: FastAPI in front of vLLM. Streaming passthrough. Per-tenant rate limits. Cost accounting per request. Model routing (route cheap queries to small models).
  • Rollout: shadow (mirror traffic, compare), canary (1% → 10% → 50% → 100%), feature flags per cohort. Eval-on-rollout: keep the offline eval running against the canary.
  • Continuous evaluation: a daily replay of N production samples (PII-scrubbed) against the new model. Block promotion on regression.
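The staged-percent mechanics rest on deterministic cohort assignment: hash the user (or tenant) ID into [0, 100) so the same user always lands in the same cohort across requests and processes. A stdlib sketch, with illustrative names:

```python
import hashlib

def cohort_percent(key: str, salt: str = "rollout-v1") -> float:
    """Deterministically map a key (user/tenant ID) to [0, 100).

    Hash-based assignment is stable across processes and restarts;
    bumping the salt reshuffles all cohorts for the next experiment."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 * 100

def route_model(user_id: str, canary_pct: float) -> str:
    """Canary gets users whose cohort falls below the current percentage,
    so raising 1% -> 10% -> 50% only ever adds users, never swaps them."""
    return "canary" if cohort_percent(user_id) < canary_pct else "stable"
```

Because raising canary_pct is monotone over the same hash, a user who saw the canary at 1% still sees it at 10% - which keeps per-user experience consistent during the ramp.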

24.3 Capstone Defense

You picked a track from CAPSTONE_PROJECTS.md at the start of Month 6. You have been building it incrementally. Week 24 is the defense:

  1. Architecture review. Whiteboard the system. Defend each component choice.
  2. Performance review. py-spy flame graph, vLLM throughput numbers, end-to-end p50/p95/p99 latency.
  3. Eval review. Show the eval harness, the regressions caught, the rollout policy.
  4. Cost review. $/request, $/user, projected $/month at 10x scale.
  5. Failure-mode review. What happens on: provider outage, vector-DB down, OOM in worker, agent runaway, prompt injection, tokenizer mismatch.

Pass = you can answer every question without hand-waving.


Month-6 Exit Criteria - and the Senior Bar

A graduate of this curriculum, in a senior-AI-engineer interview loop, should be able to:

  1. Whiteboard a RAG service for 1M docs / 1k QPS in 30 minutes, with explicit pgvector vs. qdrant trade-offs, hybrid retrieval, reranking, eval methodology, and cost projection.
  2. Diagnose a production agent that's burning $1k/hr by reading traces, identifying the runaway loop, and shipping a fix with caps and a kill switch - same day.
  3. Fine-tune a 7B model with LoRA on a domain dataset, evaluate offline, serve with vLLM, and roll out behind a canary in under a week.
  4. Defend the choice not to use Python for a given component - model routing, GPU scheduler, streaming proxy - when Go or Rust is the right answer.

That last bullet is the actual signal of seniority: you have stopped being a Python advocate and started being an engineer.

Appendix A - Production Hardening Toolkit

The hardening template you accumulate over 24 weeks. By the end, this should be a publishable python-project-template repo.


A.1 Project Layout

my-project/
├── pyproject.toml          # PEP 621 metadata, ruff/pyright/pytest config
├── uv.lock                 # uv-managed lockfile
├── src/
│   └── my_project/
│       ├── __init__.py
│       ├── __main__.py
│       ├── domain/         # pure, no I/O
│       ├── adapters/       # DB, HTTP, LLM clients
│       ├── service/        # orchestration
│       └── entrypoints/    # FastAPI, CLI
├── tests/
│   ├── unit/
│   ├── integration/
│   └── property/
├── perf/                   # pytest-benchmark suites
├── loadtest/               # k6 / locust
├── docs/
└── .github/workflows/ci.yml

A.2 Tooling Stack (canonical 2026)

Concern Tool
Build / dep uv (primary), hatch (alt)
Lint + format ruff
Type check pyright (strict), mypy (secondary)
Test pytest, pytest-asyncio, pytest-cov, pytest-xdist, pytest-benchmark, pytest-randomly
Property test hypothesis
Mutation test mutmut
Profile (CPU) py-spy, scalene, pyinstrument
Profile (mem) memray, tracemalloc
Security bandit (via ruff S), pip-audit, safety
Docs mkdocs-material + mkdocstrings
Pre-commit pre-commit
Container distroless or python:3.13-slim, multi-stage with uv pip install --system
Observability structlog, prometheus_client, opentelemetry-*

A.3 The make check Target

check: lint format-check typecheck test
lint:
    ruff check src tests
format-check:
    ruff format --check src tests
typecheck:
    pyright src
test:
    pytest -x --cov=src --cov-report=term-missing
bench:
    pytest perf/ --benchmark-only
load:
    k6 run loadtest/scenario.js

A.4 CI Matrix

  • Python: 3.12, 3.13, 3.13t (free-threaded), 3.14 (when stable).
  • OS: ubuntu-latest, macos-latest.
  • Steps: make check, make bench (non-failing, archived), mutmut run --max-children 4 (weekly cron).

A.5 Profiling Recipes

  • "Why is my service slow?" → py-spy record -o flame.svg -- python -m my_project
  • "Where is my memory going?" → memray run --live -m my_project
  • "Is the event loop stalling?" → set loop.slow_callback_duration = 0.05; watch logs.
  • "Why is import slow?" → python -X importtime -c "import my_project" 2> import.log
  • "What's the GC doing?" → gc.set_debug(gc.DEBUG_STATS) for an hour in staging.

A.6 Deployment Hardening

  • python -O strips asserts; never rely on asserts for security checks.
  • Hash randomization is on by default since 3.3; never disable it (PYTHONHASHSEED=0) in production.
  • PYTHONFAULTHANDLER=1 for crash tracebacks on segfault from C extensions.
  • PYTHONMALLOC=malloc if running under valgrind.
  • Drop privileges (gosu, setuid) before exec'ing the Python process.
  • Distroless or slim base; pin via SHA, not tag.
  • One worker per CPU for CPU-light I/O-bound on stock CPython; one process for free-threaded once stable.

Appendix B - Build-From-Scratch Data Structures and Patterns

A working Python engineer should have implemented each of the following at least once, with pyright-clean types, pytest + hypothesis tests, and a pytest-benchmark micro-bench. This appendix sketches the minimal-viable design.


B.1 LRU Cache (with TTL)

When: function memoization, decoded-payload caches, embedding caches.

Design:

  • OrderedDict from collections. move_to_end on hit; popitem(last=False) on evict.
  • Optional TTL via (value, expiry_monotonic) tuples; lazy expiration on access.
  • Concurrent variant: a threading.Lock (or asyncio.Lock for async) around mutations.

Lab: compare to functools.lru_cache and cachetools.TTLCache. Bench miss/hit costs.


B.2 Trie (Prefix Tree)

When: autocomplete, IP routing tables, tokenizer prefix lookup, dictionary spell-check.

Design:

  • Node = dict[str, Node] + is_end: bool (+ optional payload).
  • Insert/lookup O(len(key)); prefix iteration O(prefix + matches).
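A sketch of that node shape with the three lab operations (names mirror the lab; the payload slot is omitted for brevity):

```python
from __future__ import annotations
from collections.abc import Iterator

class TrieNode:
    __slots__ = ("children", "is_end")   # per-node memory matters in tries
    def __init__(self) -> None:
        self.children: dict[str, TrieNode] = {}
        self.is_end = False

class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def __contains__(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_end

    def iter_prefix(self, prefix: str) -> Iterator[str]:
        """Yield every stored word starting with `prefix`."""
        node = self._walk(prefix)
        if node is None:
            return
        stack = [(node, prefix)]           # explicit DFS: no recursion limit
        while stack:
            node, word = stack.pop()
            if node.is_end:
                yield word
            for ch, child in node.children.items():
                stack.append((child, word + ch))

    def _walk(self, key: str) -> TrieNode | None:
        node = self.root
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return None
        return node
```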

Lab: implement add, contains, iter_prefix. Bench against set for membership and against linear scan for prefix queries.


B.3 Bloom Filter

When: dedup at scale, "definitely-not-seen" checks before expensive lookups.

Design:

  • bitarray of size m; k hash functions derived from one (mmh3 or hashlib) via double hashing.
  • Sized for target false-positive rate p over expected n: m = -n*ln(p)/(ln(2)^2), k = m/n * ln(2).
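A dependency-free sketch of that design, substituting a plain bytearray for bitarray so it runs anywhere (swap bitarray back in for real memory savings). Double hashing follows the Kirsch-Mitzenmacher construction h_i = h1 + i*h2:

```python
import hashlib
import math

class BloomFilter:
    """Sized from the target FP rate p and expected item count n."""

    def __init__(self, n: int, p: float) -> None:
        self.m = max(8, int(-n * math.log(p) / math.log(2) ** 2))  # bits
        self.k = max(1, round(self.m / n * math.log(2)))           # hashes
        self.bits = bytearray((self.m + 7) // 8)

    def _indexes(self, item: bytes):
        # Two 64-bit halves of one sha256 digest; h2 forced odd so the
        # probe sequence covers the table.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: bytes) -> None:
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indexes(item))
```

No false negatives by construction; the lab's job is to verify the false-positive rate empirically matches p.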

Lab: empirically verify FP rate matches predicted. Compare memory to set for 10M items.


B.4 SPSC Ring Buffer (asyncio)

When: backpressure between a producer task and a consumer task, fixed-memory pipelines.

Design:

  • list[T | None] of capacity N (power of two).
  • head/tail integers; full when tail - head == N; empty when equal.
  • asyncio.Event for "not full" and "not empty"; set() on transition.

Lab: compare to asyncio.Queue(maxsize=N). The stdlib version is fine; build this once to internalize the contract.


B.5 Bounded Concurrent Map

When: caches with tight memory budgets, multi-writer state.

Design:

  • N shards of (threading.RLock, dict). Hash key, mod N, lock the shard.
  • Eviction: per-shard LRU list.
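The core sharding mechanism, without the per-shard eviction (names are mine). For uniformly distributed keys, contention drops roughly N-fold because two threads only collide when their keys hash to the same shard:

```python
import threading

class ShardedDict:
    """N independently locked shards; lock only the shard a key hashes to."""

    def __init__(self, n_shards: int = 16) -> None:
        self._shards = [(threading.RLock(), {}) for _ in range(n_shards)]

    def _shard(self, key):
        # hash -> shard index; each shard has its own lock and dict
        return self._shards[hash(key) % len(self._shards)]

    def set(self, key, value) -> None:
        lock, data = self._shard(key)
        with lock:
            data[key] = value

    def get(self, key, default=None):
        lock, data = self._shard(key)
        with lock:
            return data.get(key, default)

    def __len__(self) -> int:
        # Unlocked sum: a cheap, slightly stale size, fine for metrics.
        return sum(len(data) for _, data in self._shards)
```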

Lab: compare to a single-lock dict and to a CAS-y "lock-free" attempt. The simple sharded design wins almost always.


B.6 Vector Index (Brute-Force, Then HNSW Wrapper)

When: nearest-neighbor over embeddings.

Design - brute force:

  • np.ndarray of shape (N, D), L2-normalized.
  • Query: (N, D) @ (D,) dot product, np.argpartition for top-K.

Design - HNSW wrapper:

  • Wrap hnswlib.Index with a typed Pythonic API.
  • Persist with index.save_index(path).
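Assuming NumPy is available, the brute-force half fits in a short class (names are mine). It is exact, so it doubles as the ground truth the HNSW wrapper's recall@K is measured against:

```python
import numpy as np

class BruteForceIndex:
    """Exact top-K by dot product over L2-normalized rows."""

    def __init__(self, vectors: np.ndarray) -> None:
        norms = np.linalg.norm(vectors, axis=1, keepdims=True)
        self._vecs = vectors / norms               # (N, D), unit rows

    def search(self, query: np.ndarray, k: int) -> np.ndarray:
        q = query / np.linalg.norm(query)
        scores = self._vecs @ q                    # (N,) cosine similarities
        top = np.argpartition(scores, -k)[-k:]     # unordered top-K in O(N)
        return top[np.argsort(scores[top])[::-1]]  # sort only those K
```

argpartition is the point of the exercise: full np.argsort is O(N log N), while selecting K then sorting K is O(N + K log K).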

Lab: Build both. Verify recall@10 vs. brute-force ground truth. Plot recall/QPS trade-off.


B.7 Token Bucket Rate Limiter (asyncio)

When: client-side rate limiting against an LLM API.

Design:

  • tokens: float, last_refill: float (monotonic).
  • On acquire(n): refill (now - last_refill) * rate, cap at capacity. If tokens >= n, deduct and return; else await asyncio.sleep for the deficit.
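That design, directly transcribed (class name is mine). The loop re-checks after sleeping because another caller may have drained the bucket in the meantime:

```python
import asyncio
import time

class TokenBucket:
    """Client-side limiter: refill at `rate` tokens/sec, cap at `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    async def acquire(self, n: float = 1.0) -> None:
        while True:
            self._refill()
            if self.tokens >= n:
                self.tokens -= n
                return
            # sleep exactly long enough for the deficit to refill
            await asyncio.sleep((n - self.tokens) / self.rate)
```

Bursts are bounded by capacity; the long-run throughput converges to rate, which is exactly what the lab asks you to verify.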

Lab: ensure bursts don't exceed capacity; ensure long-run average matches rate. Compare to aiolimiter.


B.8 Circuit Breaker

When: protecting downstream services from cascading failure.

Design:

  • States: CLOSED (normal), OPEN (fail fast), HALF_OPEN (probe).
  • Counters: consecutive-failure threshold, reset timeout.
  • On call: if OPEN, raise immediately; if HALF_OPEN, allow one probe.
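A synchronous sketch of that state machine (names are mine; an asyncio variant is the same logic with an async call path):

```python
import time

class CircuitOpenError(Exception):
    pass

class CircuitBreaker:
    """CLOSED -> OPEN after `threshold` consecutive failures;
    OPEN -> HALF_OPEN after `reset_timeout`; one probe then decides."""

    def __init__(self, threshold: int = 5, reset_timeout: float = 30.0) -> None:
        self.threshold, self.reset_timeout = threshold, reset_timeout
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "HALF_OPEN"      # allow a single probe through
            else:
                raise CircuitOpenError("failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed probe, or crossing the threshold, (re)opens the circuit.
            if self.state == "HALF_OPEN" or self.failures >= self.threshold:
                self.state = "OPEN"
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                     # any success fully resets
        self.state = "CLOSED"
        return result
```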

Lab: integrate with the LLMClient from week 21. Verify behavior under simulated 503 storms.


B.9 Async Worker Pool with Backpressure

When: ingestion pipelines, batch-embedding workloads.

Design:

  • Bounded asyncio.Queue, N worker tasks consuming, producer awaits put.
  • TaskGroup owns the workers; cancellation cleanly drains.
  • Each worker: try/except around the unit of work; metrics per outcome.

Lab: process 1M items at a controlled QPS without OOM. Tune N and queue size.


B.10 Domain Patterns Worth Building

  • Repository + UnitOfWork over SQLite, with an in-memory fake for tests.
  • Result type (Ok[T] | Err[E]) for code where exceptions obscure flow. Don't over-apply - Python has exceptions for a reason.
  • Saga / state machine for multi-step durable workflows. State table in Postgres + idempotency keys.
  • Outbox pattern for reliable event publishing alongside DB writes.

Appendix C - Deep-Dive Session: CPython Internals and the AI Runtime Stack

This is the single-sit deep dive the curriculum promises. Schedule a full day (8 hours, with breaks) at the end of Month 3 (after the runtime chapter) and re-read it at the end of Month 6. The goal: see clearly through every layer between print("hello") and the silicon.

The format is six "stations." Each station has: what to read, what to run, what you should be able to explain afterward.


Station 1 - From python foo.py to a Frame on the Stack (90 min)

Read:

  • Python/pythonrun.c::_PyRun_SimpleFileObject - the entry path.
  • Python/compile.c (skim) - AST → bytecode.
  • Include/internal/pycore_frame.h - frame layout in 3.11+.
  • Python/ceval.c::_PyEval_EvalFrameDefault - the interpreter loop.

Run:

python -c "
import dis
def f(x): return x*x + 1
for i in range(1000): f(i)  # warm up so the specializing interpreter quickens f
dis.dis(f, adaptive=True)
print(f.__code__.co_consts, f.__code__.co_names, f.__code__.co_varnames)
"

Explain afterwards:

  • Why LOAD_FAST is faster than LOAD_GLOBAL.
  • What "specialization" means in PEP 659 and how to observe it.
  • The lifecycle of a frame object - when it's allocated, when it's freed, why exception tracebacks pin frames.


Station 2 - Memory: Refcount, Cyclic GC, pymalloc, Arenas (75 min)

Read:

  • Include/object.h - the PyObject header.
  • Objects/obmalloc.c - the small-object allocator.
  • Modules/gcmodule.c / Python/gc.c - the cyclic GC.

Run:

import sys, gc, tracemalloc
tracemalloc.start()
xs = [object() for _ in range(10_000)]
print(sys.getsizeof(xs), sum(sys.getsizeof(x) for x in xs))
print(gc.get_count(), gc.get_threshold())

Explain:

  • Why del xs deterministically frees memory but gc.collect() is needed for cycles.
  • Why __slots__ saves ~40% per instance.
  • The interaction between refcounts and the free-threaded build's "biased reference counting".


Station 3 - The GIL and Its Successors (60 min)

Read:

  • Python/ceval_gil.c - the GIL implementation.
  • PEP 703 (no-GIL) and PEP 684 (per-interpreter GIL).
  • The "biased reference counting" paper (Choi et al.).

Run:

# stock
python -c "import threading, time; ..."
# free-threaded (3.13+)
python3.13t -c "..."

Reproduce the prime-counting benchmark from Month 4, Week 11.

Explain:

  • Why a NumPy-heavy ThreadPoolExecutor scales on stock CPython.
  • What changes for pure-Python code under python3.13t.
  • When subinterpreters beat both threads and processes.


Station 4 - asyncio Internals (75 min)

Read:

  • Lib/asyncio/base_events.py - the loop's run_forever.
  • Lib/asyncio/tasks.py - Task machinery, cancellation.
  • Modules/_asynciomodule.c - the C accelerator (Tasks, Futures).
  • The selectors module - epoll/kqueue glue.

Run:

import asyncio, time
async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05
    # deliberately stall the loop with blocking work
    time.sleep(0.2)
asyncio.run(main(), debug=True)  # slow-callback warnings only fire in debug mode

Watch the warning fire.

Explain:

  • The exact path from await coro to a callback scheduled on the loop.
  • How Task cancellation delivers CancelledError precisely at the next await.
  • Why uvloop is faster (libuv, C event loop, fewer Python frames per I/O).


Station 5 - NumPy and the Buffer Protocol (60 min)

Read:

  • PEP 3118 - the buffer protocol.
  • NumPy's numpy/core/src/multiarray/arrayobject.c (skim).
  • The strides/shape/dtype model: NumPy User Guide → Internals.

Run:

import numpy as np
a = np.arange(12).reshape(3, 4)
print(a.strides, a.flags['C_CONTIGUOUS'])
b = a.T
print(b.strides, b.flags['C_CONTIGUOUS'])
mv = memoryview(a)
print(mv.format, mv.itemsize, mv.shape, mv.strides)

Explain:

  • Why a transpose is O(1) - it changes strides, not data.
  • Why a.T.copy() is sometimes necessary before passing to a C library.
  • How the buffer protocol lets bytes, array.array, numpy.ndarray, and torch.Tensor share memory without copies.


Station 6 - PyTorch, Autograd, CUDA Streams (90 min)

Read:

  • PyTorch internals (Edward Yang's blog).
  • torch.autograd overview docs; the Function / Variable machinery.
  • The vLLM PagedAttention paper (sets up the serving questions in Month 6).

Run:

import torch
x = torch.randn(4, 4, requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)
print(torch.cuda.is_available(), torch.cuda.current_stream() if torch.cuda.is_available() else None)

Explain:

  • The autograd tape: forward builds the graph, backward walks it.
  • Why .detach() and with torch.no_grad(): matter for inference latency.
  • CPU↔GPU synchronization: when .item() blocks, why torch.cuda.synchronize() exists.
  • How vLLM's PagedAttention reduces KV-cache fragmentation and why that translates directly to throughput.


Synthesis: The Mental Model

After this deep dive, hold this picture:

   your code ──► AST ──► bytecode ──► eval loop (specializing) ──► C function ──► syscall / GPU kernel
        │              │                    │                          │                    │
        └─ ruff/pyright └─ dis              └─ py-spy                   └─ py-spy --native   └─ nsys / nvprof
                                            └─ GIL / free-threaded                          
                                            └─ refcount / GC

Every performance question maps to one of those columns. Every correctness question maps to a boundary between two of them. That is what "senior" looks like.

Capstone Projects - Three Tracks

Pick one in week 21 and build incrementally through Month 6. Defend in week 24.

Every track must, by week 24, exhibit:

  • pyright --strict clean.
  • ruff clean with the curriculum's full rule set.
  • pytest with ≥85% coverage and a hypothesis test suite.
  • Structured logs, Prometheus metrics, OpenTelemetry traces, /healthz + /readyz.
  • Containerized, deployed somewhere reachable, with a load test and a postmortem doc.
  • A docs/architecture.md that another senior engineer could read in 30 minutes.


Track 1 - Production RAG Service

Pitch: a multi-tenant retrieval-augmented generation service over a 100k–1M-document corpus with hybrid search, reranking, streaming responses, and an honest eval harness.

Must-have:

  • Ingestion pipeline: PDF/HTML/Markdown → chunks → embeddings → pgvector (or qdrant).
  • Retrieval: dense + BM25 + RRF, then a cross-encoder reranker.
  • Streaming SSE answers with citations linking back to source chunks.
  • Per-tenant isolation (row-level filters, separate collections, or both).
  • Eval harness (ragas or custom): faithfulness, answer relevance, context precision, retrieval recall@K. CI gate on regressions.
  • Cost accounting per request; per-tenant rate limits; cache (prompt prefix + retrieval result).

Stretch:

  • Query rewriting (HyDE) and routing (small queries → small model).
  • Multimodal: support image-bearing PDFs via VLM-extracted captions.
  • Continuous learning: a feedback loop that promotes/demotes chunks based on user signal.


Track 2 - Agent Orchestration Platform

Pitch: a platform for running tool-using LLM agents reliably - with durable execution, observability, cost ceilings, and a permissions model.

Must-have:

  • Agent definitions as Pydantic schemas: tools, system prompt, model, caps (turns, tokens, cost, wall-time).
  • Durable execution: state machine in Postgres; recover after process kill.
  • Tool sandbox: at minimum, an e2b-or-Docker-isolated bash tool with an allowlist.
  • Permissions model: per-agent, per-tenant tool access. Audit log.
  • Observability: per-step spans, full trace replay in tests.
  • Kill switch: a feature flag that immediately halts execution. End-to-end test for it.
  • Replay testing: saved traces become regression tests.

Stretch:

  • Multi-agent orchestration (orchestrator + workers).
  • Evaluator-optimizer loops with automated prompt revision.
  • A small UI (Streamlit or Next.js) for inspecting runs.


Track 3 - Training & Serving Pipeline

Pitch: fine-tune a small open-weights model with LoRA, evaluate it, serve it with vLLM behind a FastAPI gateway, with autoscaling and continuous eval.

Must-have:

  • Dataset prep: HuggingFace datasets, schema validation with Pydantic, dedup, deterministic train/val/test split with hash-based assignment.
  • LoRA fine-tune (peft + trl) on a 7B–8B base. Document the VRAM math.
  • Offline eval: at minimum, a held-out set with task-appropriate metrics; ideally lm-eval-harness on relevant subsets.
  • Serve: vLLM behind a FastAPI gateway. Streaming, batching, structured output.
  • Routing: cheap queries → small model; complex → large; A/B harness.
  • Continuous eval: daily replay of N production samples (PII-scrubbed) against the new checkpoint; block promotion on regression.
  • Rollout: shadow → canary 1% → 10% → 50% → 100% via feature flag.

Stretch:

  • DPO / KTO post-training on preference data.
  • Quantization (GPTQ/AWQ) and a serving comparison.
  • Multi-GPU serving with tensor parallelism.


Defense (Week 24)

Each track defends the same five reviews:

  1. Architecture review - whiteboard, defend each component.
  2. Performance review - flame graphs, throughput, p50/p95/p99.
  3. Eval review - harness, regressions caught, rollout policy.
  4. Cost review - $/request, $/user, projected $/month at 10x.
  5. Failure-mode review - provider outage, vector DB down, OOM, runaway agent, prompt injection, tokenizer mismatch.

The bar: every question gets a substantive answer without hand-waving. That is the senior signal.