Python Mastery¶
CPython internals, performance, concurrency, AI runtimes.
Python Mastery Blueprint - A 24-Week Beginner-to-Senior Syllabus (AI-Systems Track)¶
Authoring lens: Senior Staff AI/Platform Engineer. Target outcome: A graduate of this curriculum should be capable of (a) writing, reviewing, and shipping idiomatic, production-grade Python at a senior level, (b) reasoning about CPython internals well enough to debug GIL contention, allocator pathologies, and asyncio event-loop stalls in production, and (c) designing AI systems end-to-end - RAG services, agent orchestration platforms, training/inference pipelines - with clear opinions on the trade-offs.
This curriculum is not "Learn Python in 24 hours stretched to 24 weeks." It assumes the reader can write working code in some language. The premise: most Python performance and correctness bugs at scale are not language bugs - they are interpreter, GIL, allocator, and event-loop bugs in disguise, layered on top of glue-language assumptions about NumPy, PyTorch, and CUDA. This curriculum surfaces all of them.
Repository Layout¶
| File | Purpose |
|---|---|
| `00_PRELUDE_AND_PHILOSOPHY.md` | The "Python-ness" of Python; the data model; the cost model; the reading list. |
| `01_MONTH_FOUNDATIONS.md` | Weeks 1–4. Syntax, the data model (`__dunder__`), control flow, idioms, packaging basics. |
| `02_MONTH_INTERMEDIATE_IDIOMS.md` | Weeks 5–8. Iterators, generators, decorators, context managers, dataclasses, the type system. |
| `03_MONTH_RUNTIME_AND_PERFORMANCE.md` | Weeks 9–12. CPython internals: bytecode, eval loop, refcounting, GC, allocator, GIL, `dis`, `sys`, `tracemalloc`. |
| `04_MONTH_CONCURRENCY_AND_PARALLELISM.md` | Weeks 13–16. `threading`, `multiprocessing`, `asyncio`, `concurrent.futures`, free-threaded 3.13+, subinterpreters, native extensions. |
| `05_MONTH_PATTERNS_AND_ARCHITECTURE.md` | Weeks 17–20. Pythonic design patterns, data structures, packaging, testing, observability, FastAPI/Pydantic. |
| `06_MONTH_AI_SYSTEMS_SENIOR.md` | Weeks 21–24. LLM-app architecture, RAG, agents, evals, training/serving, distributed inference, capstone. |
| `APPENDIX_A_PRODUCTION_HARDENING.md` | `ruff`, `mypy`/`pyright`, `pytest`/`hypothesis`, profilers (`py-spy`, `scalene`, `memray`), packaging with `uv`/`hatch`. |
| `APPENDIX_B_DATA_STRUCTURES_AND_PATTERNS.md` | Build-from-scratch reference: LRU, trie, bloom filter, ring buffer, async queue, vector index. |
| `APPENDIX_C_DEEP_DIVE_CPYTHON_AND_AI_RUNTIMES.md` | The deep-dive session: CPython eval loop, asyncio internals, NumPy strides/buffer protocol, PyTorch autograd, CUDA streams. |
| `CAPSTONE_PROJECTS.md` | Three terminal projects: production RAG service, agent orchestration platform, training/serving pipeline. |
How Each Week Is Structured¶
Every weekly module follows the same five-section format so the reader can budget time:
- Conceptual Core - the why, with a mental model.
- Mechanical Detail - the how, down to CPython source where relevant (`Python/ceval.c`, `Objects/dictobject.c`, `Modules/_asynciomodule.c`, etc.) or to the relevant PEP.
- Lab - a hands-on exercise that cannot be completed without internalizing the concept.
- Idiomatic & Linter Drill - read 2–3 `ruff`/`pyright` rules, refactor a sample to silence them, understand why each rule exists.
- Production Hardening Slice - a profiling, typing, or testing micro-task that compounds into a publishable hardening template by week 24.
Each week is sized for ~12–16 focused hours. Skip the labs at your peril; the labs are the curriculum.
Progression Strategy¶
The phases form a dependency DAG, not a linear track:
```
Foundations ──► Intermediate Idioms ──► Runtime & Perf ──► Concurrency & Parallelism
     │                  │                    │                        │
     └──────────────────┴───────────┬────────┴────────────────────────┘
                                    ▼
                        Patterns & Architecture
                                    │
                                    ▼
                       AI Systems & Senior Design
                                    │
                                    ▼
                           Capstone Defense
```
The Production Hardening slice is intentionally orthogonal - it accumulates a hardening/ template that, by week 24, is a publishable Python project starter (uv-managed, ruff+pyright-clean, pytest+hypothesis, structured logging, OpenTelemetry, Dockerfile, CI).
Non-Goals¶
- This curriculum does not teach data analysis as a primary subject. Pandas/Polars appear only as tools in service of AI pipelines.
- Web-framework breadth is out of scope. We pick FastAPI + Pydantic v2 and go deep; Django/Flask appear only as comparison points.
- "Why Python is better than X" advocacy is explicitly avoided. The reader should finish the program able to argue against using Python when it is the wrong tool (CPU-bound numeric kernels without NumPy/Cython, hard-real-time, mobile, anything where 200ms cold-start matters).
Capstone Tracks (pick one in Month 6)¶
- Production RAG Service - multi-tenant retrieval-augmented generation with hybrid search, reranking, streaming responses, evals, and a staged rollout harness.
- Agent Orchestration Platform - tool-using LLM agents with durable execution, retries, observability, cost ceilings, and a permissions model.
- Training/Serving Pipeline - fine-tune a small open model (LoRA), serve with vLLM or TGI behind a FastAPI gateway, with autoscaling, batching, and continuous evaluation.
Details in CAPSTONE_PROJECTS.md.
Versioning Note¶
This curriculum targets Python 3.13+ as the baseline (PEP 703 free-threaded build available, PEP 684 per-interpreter GIL stable, PEP 669 low-impact monitoring, faster CPython work from 3.11–3.13 fully landed, typing module modernized, match statements stable since 3.10, tomllib since 3.11). Where 3.14 features matter, they are flagged inline. Do not start this curriculum on a Python older than 3.12 - too many of the modern idioms and the new typing semantics will be unavailable.
Senior-Level Exit Criteria¶
By week 24, the graduate should be able to, in a design review:
- Argue from CPython memory layout why a hot path allocates and how to fix it (`__slots__`, NumPy arrays, Cython, struct-of-arrays).
- Diagnose GIL contention vs. I/O blocking vs. event-loop stalls from a single `py-spy dump` without re-running the program.
- Design a RAG pipeline with explicit choices on chunking, embedding model, index type, reranker, and eval methodology - and defend each choice against an alternative.
- Choose between threads, processes, asyncio, free-threaded, and subinterpreters with a one-paragraph justification per choice.
- Run a fine-tune, evaluate it offline and online, and ship it behind a gradual rollout with cost and quality guardrails.
Prelude - The Philosophy Behind the Syllabus¶
Sit with this document for an evening before week 1. The rest of the curriculum is mechanically dense; this is the only chapter where we step back and define the shape of the discipline.
1. Python Is a Glue Language Riding on a Reference-Counted VM¶
The most damaging misconception a Python engineer can hold is that "Python is a slow scripting language with libraries." A working senior practitioner thinks the inverse:
Python is a glue language - a small, dynamically typed surface - bolted to a reference-counted bytecode VM (CPython) whose superpower is calling into native code (C, C++, Rust, Fortran, CUDA) without paying for a heavyweight FFI. That is why Python won data, ML, and AI: not because Python is fast, but because it makes fast things addressable from a REPL.
Almost every interesting performance question in production Python reduces to "does this loop stay in C, or does it cross back into Python bytecode?" Almost every elegant high-throughput Python architecture is a thin layer over numpy, torch, polars, asyncio, uvloop, or a C extension - with Python orchestrating, not computing.
Internalize this and the rest of the curriculum makes sense.
2. The Five-Axis Cost Model¶
A working senior Python engineer reasons about every line of code along five axes simultaneously:
| Axis | Question to ask |
|---|---|
| Allocation & object overhead | Does this create Python objects in a hot loop? Could it stay as a NumPy/torch array, a bytes, or a memoryview? |
| Bytecode boundaries | How many trips through the eval loop does this take? Can it be vectorized, pushed into C, or JITed (PyPy / Numba / Cython)? |
| Concurrency model | Is this CPU-bound (→ processes / free-threaded / native release-the-GIL) or I/O-bound (→ asyncio / threads)? |
| Type integrity | Will pyright --strict accept this? Are runtime contracts (Pydantic, attrs validators) enforced at the right boundary? |
| Failure | What happens on KeyboardInterrupt? On asyncio.CancelledError? On a partially consumed generator that holds a file handle? On an OOM in a forked worker? |
Beginner courses teach axis 1 only (and incompletely). This curriculum forces all five into your hands by week 12.
3. The "Pythonic Way" - Aesthetic as Engineering Constraint¶
Python's design ethic, captured in import this, is "explicit, simple, readable." That phrase is doing more work than newcomers think. Specifically:
- Duck typing, then static typing. Protocols and structural typing (`typing.Protocol`) win over nominal hierarchies. Inheritance is fine; deep inheritance is not.
- EAFP, not LBYL. "Easier to ask forgiveness than permission" - `try`/`except` is idiomatic; `if hasattr(...)` is usually a smell.
- Comprehensions, generators, iterators. A `for` loop that builds a list with `.append` in idiomatic Python is almost always a comprehension or a generator expression in disguise.
- The stdlib is enormous and underused. `itertools`, `functools`, `collections`, `dataclasses`, `contextlib`, `pathlib`, `concurrent.futures`, `asyncio`, `logging`, `argparse`, `sqlite3`, `unittest.mock`, `typing` - these cover ~70% of any service. Reach for third-party only when stdlib runs out, and know when it does.
- Tooling is opinionated. `ruff` (lint+format), `pyright`/`mypy` (types), `pytest` (test), `uv` or `hatch` (build/dep), `py-spy`/`scalene`/`memray` (profile). A Python engineer who does not know these is half-trained.
If you fight these defaults, you will write Java in Python. If you internalize them, your code will look like the stdlib - which is the actual deliverable Python optimizes for.
4. The Reading List¶
These are referenced throughout the curriculum. You are not expected to read them cover-to-cover before starting; they are pinned tabs.
Primary
- Fluent Python, 2nd ed. (Luciano Ramalho). The canonical text. Read chapters 1–6 in Month 1, 14–21 in Month 2, the rest as referenced.
- Effective Python, 3rd ed. (Brett Slatkin). The single best companion to Fluent Python.
- High Performance Python, 2nd ed. (Gorelick & Ozsvald). Read in Month 3 alongside the runtime chapter.
- Architecture Patterns with Python (Percival & Gregory). Read in Month 5 alongside the patterns chapter.
Runtime & internals
- The CPython source itself - treat as primary literature, not reference:
- Python/ceval.c (the eval loop)
- Objects/object.c, Objects/typeobject.c, Objects/dictobject.c, Objects/listobject.c, Objects/longobject.c
- Python/gc.c (the cyclic GC)
- Modules/_asynciomodule.c (the C accelerator for asyncio)
- Include/internal/pycore_*.h (interpreter state, frame layout)
- Brandt Bucher's "Python 3.11 specializing adaptive interpreter" talk and the PEP 659 text.
- Anthony Shaw, CPython Internals (Real Python). The most accessible treatment.
- PEPs that are mandatory reading (curriculum points to each at the right moment): 8, 20, 257, 318, 343, 380, 484, 492, 525, 530, 544, 557, 585, 593, 612, 634, 646, 654, 657, 659, 669, 684, 692, 695, 703, 709.
AI systems canon (not Python-specific, but mandatory by Month 6)
- Lewis et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP. The original RAG paper.
- Sumers et al., Cognitive Architectures for Language Agents (CoALA).
- Designing Machine Learning Systems (Chip Huyen). Especially chapters 7–10.
- AI Engineering (Chip Huyen, 2024). The most current treatment of LLM-app design.
- The vLLM paper (Efficient Memory Management for LLM Serving with PagedAttention).
- Anthropic's Building effective agents and OpenAI's A practical guide to building agents.
Adjacent canon
- Drepper, What Every Programmer Should Know About Memory. Re-read in week 9.
- Kleppmann, Designing Data-Intensive Applications. Read chapters 5–9 in Month 5.
5. Curriculum Philosophy: "Read the Source, Ship the Lab"¶
Three rules govern every module:
- Source first, blog second. When the curriculum says "study how dict resolves a key," it means open `Objects/dictobject.c` and read `lookdict_unicode_nodummy`. Blogs go stale; CPython commits are dated.
- One lab per concept, one artifact per phase. By the end of each month, the reader has produced one open-source-quality artifact (library, gist, or blog post) - not a notebook of toy snippets.
- `py-spy`, `pytest -x`, and `pyright --strict` are the teachers. When you do not understand why a program misbehaves, the first response is `py-spy dump --pid <pid>`, the second is a failing pytest with `hypothesis`, and only the third is to ask another human.
6. What Python Is Not For¶
A graduate of this curriculum should be able to argue these points in a design review without sounding ideological:
- Tight CPU-bound loops without a vectorized library. The interpreter overhead is real. Either vectorize, drop to Cython/Rust/C, or use Numba/PyPy.
- Hard-real-time systems. GC pauses are short but non-zero, refcount drops can cascade, and the GIL adds tail-latency variance. Wrong tool.
- Mobile, sandboxed, or aggressively cold-started serverless. A Python interpreter + numpy + torch is a 1+ GB image and a 1+ second cold start. Choose Go, Rust, or a pre-warmed runtime.
- Code where the team will not adopt typing. Untyped Python over ~5k lines becomes archaeology. A team that resists `pyright --strict` will fight Python at scale forever.
The signal that Python is the right tool: you have a glue, AI/data, or developer-velocity constraint that ranks above raw single-thread CPU efficiency.
7. A Note on AI-Assisted Workflows¶
Modern Python authors use LLM tooling. Three rules:
- Never accept generated async code without reading it. The most common failure mode of generated Python is "looks async, blocks the event loop" - `time.sleep` instead of `asyncio.sleep`, sync `requests` inside `async def`, blocking file I/O without `run_in_executor`.
- Verify generated type annotations. Models hallucinate `from typing import` paths and confuse `list[int]` (3.9+) with `List[int]`. Always run `pyright`.
- Treat suggested context-handling skeptically. Generators that hold file handles, `async with` mismatches, and unclosed `httpx.AsyncClient` instances are endemic in generated code. Use `pytest --tb=short` plus `tracemalloc` to catch leaks.
You are now ready for Week 1. Open 01_MONTH_FOUNDATIONS.md.
Month 1 - Foundations: The Data Model, Idioms, and Packaging¶
Goal: by the end of week 4 you can (a) explain the Python data model in terms of __dunder__ protocols and the type / object relationship, (b) write idiomatic comprehensions, generators, and iterators without reaching for indices, (c) wield try/except/else/finally and context managers correctly, and (d) ship a Python project as a uv-managed, ruff+pyright-clean, pytest-tested package installable via pipx.
This month is the only month aimed at the beginner. After week 4 the curriculum assumes fluency.
Weeks¶
- Week 1 - Syntax, Values, Names, and the Data Model
- Week 2 - Control Flow, Functions, Errors, and the Call Model
- Week 3 - Collections, Comprehensions, Iterators, and Generators
- Week 4 - Modules, Packaging, Virtual Environments, and the Import System
Week 1 - Syntax, Values, Names, and the Data Model¶
1.1 Conceptual Core¶
- Everything is an object, including types. `type(int) is type` and `type(type) is type`. Functions, modules, classes, exceptions, even `None` - all objects with attributes, addressable by name.
- Names are not variables. A "variable" in Python is a binding from a name (in a namespace dict) to an object. Assignment never copies; it rebinds. Function arguments are pass-by-binding (often confusingly called "pass by object reference").
- The data model is a protocol catalog. Every operator and built-in (`len`, `iter`, `+`, `[]`, `with`, `async with`, `repr`) dispatches to a `__dunder__` method. Mastering the data model is mastering the language.
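A minimal sketch of that dispatch (the `Deck` class is illustrative, not from the curriculum): implementing two dunders gives `len`, indexing, iteration, and membership for free via the sequence protocol.

```python
class Deck:
    """Built-ins dispatch to __dunder__ methods defined on the type."""

    def __init__(self, cards):
        self._cards = list(cards)

    def __len__(self):         # powers len(deck)
        return len(self._cards)

    def __getitem__(self, i):  # powers deck[i]; iteration and `in`
        return self._cards[i]  # fall back to this sequence protocol


deck = Deck(["A", "K", "Q"])
assert len(deck) == 3
assert deck[0] == "A"
assert "K" in deck            # membership via repeated __getitem__
assert list(deck) == ["A", "K", "Q"]
```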
1.2 Mechanical Detail¶
- Built-in types you must internalize: `int` (arbitrary precision), `float` (IEEE 754 double), `bool` (subclass of `int`), `str` (Unicode, immutable), `bytes` (immutable), `bytearray` (mutable), `list`, `tuple`, `dict` (insertion-ordered since 3.7), `set`, `frozenset`, `None`, `Ellipsis`, `NotImplemented`.
- Mutability vs. hashability: hashable ⇔ `__hash__` defined and stable ⇔ usable as dict key / set element. Mutable built-ins (`list`, `dict`, `set`) are unhashable on purpose.
- Identity vs. equality: `is` checks identity (`id(a) == id(b)`); `==` checks `__eq__`. Small-int caching (-5..256) and string interning make `is` comparisons accidentally work - which is precisely why you must never use `is` for value comparison except against `None`, `True`, `False`.
- Truthiness: `__bool__`, then `__len__`, then `True`. Falsy: `0`, `0.0`, `""`, `b""`, `[]`, `()`, `{}`, `set()`, `None`, `False`.
- f-strings (3.6+), debug f-strings (`f"{x=}"`, 3.8+), the `=` format spec, lazy logging (`logger.info("x=%s", x)`, not `logger.info(f"x={x}")` - the latter formats even at suppressed levels).
1.3 Lab - "The REPL Audit"¶
- In an interactive session, evaluate `a = [1,2,3]; b = a; b.append(4); print(a)`. Explain in writing.
- `x = 256; y = 256; x is y` → True. `x = 257; y = 257; x is y` → may be False. Explain.
- Write a class `Money` with `__init__`, `__repr__`, `__eq__`, `__hash__`, `__lt__`. Verify it sorts and deduplicates in a `set`. Add `functools.total_ordering`; observe what disappears.
- Write a class `Vector2` with `__add__`, `__sub__`, `__mul__` (scalar), `__rmul__`, `__abs__`, `__iter__`, `__len__`. Verify `2 * v` works and `list(v)` works.
1.4 Idiomatic & Linter Drill¶
- Install `ruff`. Configure with rule sets `E`, `F`, `W`, `I`, `B`, `UP`, `SIM`, `RUF`, `PL`. Run on a sample file; read each finding's URL.
- Read PEP 8 once. Read PEP 20 (`import this`) and pin it.
1.5 Production Hardening Slice¶
- Initialize a project with `uv init`. Add `ruff` and `pyright` as dev deps. Add a `pyproject.toml` with `[tool.ruff]` and `[tool.pyright]` strict configurations. Add a `Makefile` target `make check` that runs `ruff check`, `ruff format --check`, `pyright`, `pytest`. This is the baseline; every subsequent week extends it.
Week 2 - Control Flow, Functions, Errors, and the Call Model¶
2.1 Conceptual Core¶
- Functions are first-class objects with attributes (`__name__`, `__doc__`, `__annotations__`, `__defaults__`, `__closure__`, `__code__`).
- Default arguments are evaluated once, at definition time. The single most common Python footgun: `def f(x, acc=[]):` - `acc` is shared across calls. Use a `None` sentinel + body-side default.
- EAFP over LBYL. `try: d[k]` over `if k in d: d[k]`. The exception path is fast in CPython when not raised, and the LBYL form races under threads.
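The mutable-default footgun and its idiomatic fix, side by side (function names are illustrative):

```python
def append_bad(x, acc=[]):      # one list, created once at def time
    acc.append(x)
    return acc


def append_good(x, acc=None):   # None sentinel + body-side default
    if acc is None:
        acc = []                # fresh list per call
    acc.append(x)
    return acc


append_bad(1)
assert append_bad(2) == [1, 2]   # state leaks across calls
append_good(1)
assert append_good(2) == [2]     # each call is independent
```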
2.2 Mechanical Detail¶
- Argument passing: positional, keyword, `*args`, `**kwargs`, positional-only (`/`, PEP 570), keyword-only (`*`). Know the order: `def f(pos1, pos2, /, both, *, kw_only)`.
- Closures and the `nonlocal` keyword. Late binding in closures (`[lambda: i for i in range(3)]` returns three lambdas all returning `2`); fix with the default-arg trick `lambda i=i: i` or with comprehension scoping.
- The exception hierarchy: `BaseException` → `Exception` → everything user-catchable. `KeyboardInterrupt` and `SystemExit` are siblings of `Exception`, not subclasses - `except Exception` does not catch them, by design.
- Exception chaining: `raise NewError("...") from cause`; implicit chaining via `__context__`. Exception groups (PEP 654, `except*`) for concurrent code.
- `try` / `except` / `else` / `finally`: the `else` runs only if no exception; `finally` always runs, even on `return`.
2.3 Lab - "The Calculator and the Cancel"¶
- Build a tiny expression evaluator over `+ - * /` using `ast.parse` + a custom `NodeVisitor`. Reject anything else. (Do not use `eval`.)
- Add a `--repl` mode. Make `Ctrl-C` interrupt the current expression but not exit. Make `Ctrl-D` exit cleanly.
- Wrap division; raise a custom `EvalError` chained from `ZeroDivisionError` via `from`.
- Add a `--time-budget` flag using `signal.SIGALRM` (POSIX) or a watchdog thread (cross-platform). Document the trade-off.
2.4 Idiomatic & Linter Drill¶
- Enable the `ruff` rule set `TRY` (try/except hygiene). Refactor a sample with broad `except:` and bare `raise` into PEP-compliant code.
2.5 Production Hardening Slice¶
- Add `pytest` with `pytest-cov`. Write tests for the calculator. Aim for 100% line and branch coverage of the AST evaluator. Commit a coverage badge target.
Week 3 - Collections, Comprehensions, Iterators, and Generators¶
3.1 Conceptual Core¶
- The four collection workhorses: `list` (dynamic array), `tuple` (immutable record), `dict` (open-addressed hash table, insertion-ordered), `set` (hash table). Internalize O() costs: list `append` amortized O(1), `insert(0, x)` O(n); dict `in` O(1) average; list `in` O(n).
- An iterator is a stateful cursor; an iterable is anything you can call `iter()` on. `for x in xs:` desugars to `it = iter(xs)` plus a loop of `next(it)` calls that breaks on `StopIteration`.
- Generators are coroutines that yield values, not just iterators. They preserve local state across `yield`; they accept values via `.send()`; they handle exceptions via `.throw()`; they clean up via `.close()`.
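A small sketch of the coroutine side of generators (the `running_mean` name is illustrative): `.send()` resumes the generator with a value, and local state persists across yields.

```python
def running_mean():
    total, count, mean = 0.0, 0, None
    while True:
        value = yield mean   # .send(x) resumes here, binding x to value
        total += value
        count += 1
        mean = total / count


g = running_mean()
assert next(g) is None       # prime: advance to the first yield
assert g.send(10) == 10.0    # state (total, count) survives each yield
assert g.send(20) == 15.0
g.close()                    # raises GeneratorExit inside the generator
```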
3.2 Mechanical Detail¶
- `collections.deque` (O(1) at both ends), `collections.Counter`, `collections.defaultdict`, `collections.ChainMap`, `collections.OrderedDict` (now mostly redundant; still useful for `move_to_end`), `collections.namedtuple` (legacy; prefer `dataclass(slots=True, frozen=True)` or `typing.NamedTuple`).
- `itertools` mastery: `chain`, `islice`, `takewhile`, `dropwhile`, `groupby` (note: requires sorted input), `tee`, `product`, `permutations`, `combinations`, `accumulate`, `pairwise` (3.10+), `batched` (3.12+).
- Comprehensions in all four flavors: list, set, dict, generator. Generator expression vs. list comprehension: when the consumer is `sum`, `any`, `all`, `min`, `max`, or anything that doesn't need a materialized list, prefer the genexp - same syntax minus the brackets.
- `yield from` (PEP 380): delegate to a sub-generator, propagate sends/throws.
- The buffer protocol primer: `memoryview` over `bytes`/`bytearray`/`array.array`/`numpy.ndarray` lets you slice without copying. Critical in the AI months.
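A minimal zero-copy sketch: a `memoryview` slice is a window over the same buffer, so mutations to the underlying `bytearray` are visible through it.

```python
data = bytearray(b"hello world")
view = memoryview(data)
head = view[:5]               # no copy: a new view over the same buffer

data[0] = ord("H")            # mutate the underlying bytearray
assert head.tobytes() == b"Hello"   # the slice sees the change
assert bytes(view[6:]) == b"world"
```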
3.3 Lab - "Streaming Word Count"¶
- Implement `wc -w` over arbitrarily large files using a generator pipeline: file → lines → words → counts. Constant memory regardless of file size.
- Add a `--top K` flag using `heapq.nlargest`. Note that you must materialize the counter - discuss why.
- Replace your hand-rolled tokenizer with `re.finditer` and benchmark. Then benchmark a `str.split()` version. Explain the difference.
- Add a `--parallel N` flag using `concurrent.futures.ProcessPoolExecutor` and `itertools.batched`. (We will revisit in Month 4.)
3.4 Idiomatic & Linter Drill¶
- Enable the `ruff` `C4` (comprehensions) and `PERF` rules. Refactor `for`-with-`append` patterns into comprehensions. Identify cases where the comprehension is less readable and document them.
3.5 Production Hardening Slice¶
- Add `hypothesis` as a dev dep. Write property tests for your word counter: invariants like "total = sum of counts," "tokens are non-empty," "shuffling input lines doesn't change the count."
Week 4 - Modules, Packaging, Virtual Environments, and the Import System¶
4.1 Conceptual Core¶
- A module is a `.py` file (or a `.so`/`.pyd` extension) that becomes a singleton object on first import, cached in `sys.modules`. Re-importing returns the cached object; `importlib.reload` re-executes (with caveats - old references to old objects persist).
- A package is a directory with an `__init__.py` (or a namespace package, PEP 420, with no `__init__.py`).
- Virtual environments are not optional. A modern Python project lives in a per-project `.venv/`, managed by `uv`, `hatch`, `poetry`, or `pip-tools`. System Python is for the OS, not your code.
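The module-singleton behavior can be observed directly (a stdlib module is used here just for demonstration): deleting the name does not evict the module, and a re-import is served from the `sys.modules` cache.

```python
import sys
import json

first = json          # keep a reference to the module object
del json              # removes the *name* binding, not the module
import json           # re-import: served from the sys.modules cache

assert json is first                  # same singleton object
assert sys.modules["json"] is json    # the cache holds it
```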
4.2 Mechanical Detail¶
- Import resolution order: `sys.modules` cache → finders in `sys.meta_path` → loaders. The default finders are `BuiltinImporter`, `FrozenImporter`, `PathFinder` (which searches `sys.path`).
- Absolute vs. relative imports (`from . import sibling`, `from ..pkg import x`). Prefer absolute.
- `__main__`: `python -m mypkg` runs `mypkg/__main__.py` as `__main__`. The `if __name__ == "__main__":` idiom exists because a module imported as a library has a different `__name__` than one run as a script.
- `pyproject.toml` (PEP 517, 518, 621, 660): the single source of truth for project metadata, build backend, dependencies, and tool configuration. `setup.py` is dead for new projects.
- Build backends: `hatchling`, `setuptools`, `flit-core`, `poetry-core`, `maturin` (for Rust extensions), `scikit-build-core` (for C/C++/CMake).
- Dependency resolution: `pip` (legacy, slow), `uv` (fast, Rust, drop-in `pip` replacement and resolver), `poetry` (lockfile-first). The curriculum standardizes on `uv` for speed and ecosystem direction.
4.3 Lab - "Ship a CLI"¶
- Build a CLI tool - e.g., a Markdown table-of-contents generator. Project layout: `src/toctool/{__init__.py,__main__.py,cli.py,core.py}`, `tests/`, `pyproject.toml`.
- Configure `[project.scripts] toctool = "toctool.cli:main"`. Verify `pipx install .` makes `toctool` available system-wide.
- Add a `[project.optional-dependencies] dev = [...]` group. `uv sync --extra dev` installs the dev tools.
- Tag `v0.1.0`. Build wheel + sdist with `uv build`. Inspect the wheel with `unzip -l`. Confirm no test files leaked in.
- (Optional, sets up later weeks) Publish to TestPyPI.
4.4 Idiomatic & Linter Drill¶
- Enable the `ruff` rule sets `TID` (banned imports) and `INP` (implicit namespace packages). Configure your `__init__.py` to re-export a curated public API (`__all__`).
4.5 Production Hardening Slice¶
- Add a `pre-commit` config running `ruff check`, `ruff format`, `pyright`, and `pytest -x`. Add a GitHub Actions (or equivalent) CI workflow that runs `make check` on push and matrix-tests over Python 3.12 and 3.13.
Month-1 Exit Criteria¶
Before starting Month 2, the reader should be able to, on a whiteboard:
- Diagram the namespace lookup order for a name in a function inside a class inside a module (LEGB, with the class scope wrinkle).
- Explain the difference between `is`, `==`, and `__eq__`.
- Write a generator pipeline that processes a 100GB log file in constant memory.
- Bootstrap a publishable Python package with `uv`, `ruff`, `pyright`, `pytest`, and CI in under 30 minutes.
Month 2 - Intermediate Idioms: Decorators, Context Managers, Dataclasses, Typing¶
Goal: by the end of week 8 you can (a) write decorators that preserve type signatures and stack cleanly with functools.wraps and ParamSpec, (b) build correct context managers (sync and async) and reason about their teardown order, (c) model domain objects with dataclasses and pydantic knowing when each is appropriate, and (d) write pyright --strict-clean code using Protocol, generics, TypedDict, Literal, overload, and TypeGuard.
Weeks¶
- Week 5 - Object Model Deep Dive: Classes, Descriptors, Metaclasses
- Week 6 - Decorators, `functools`, and `contextlib`
- Week 7 - Dataclasses, `attrs`, Pydantic, and the Validation Boundary
- Week 8 - The Type System: Generics, Protocols, Variance, and `typing.*`
Week 5 - Object Model Deep Dive: Classes, Descriptors, Metaclasses¶
5.1 Conceptual Core¶
- Attribute lookup is a protocol, not a field read. `obj.x` calls `type(obj).__getattribute__(obj, "x")`, which checks the data-descriptor chain on the type, then the instance dict, then non-data descriptors, then `__getattr__`.
- Descriptors (`__get__`, `__set__`, `__delete__`) are how `@property`, `staticmethod`, `classmethod`, and `__slots__` actually work. Understanding descriptors is understanding 80% of Python's "magic."
- Metaclasses (`type` is the default) intercept class creation. They are over-used; `__init_subclass__` (PEP 487) and class decorators cover most legitimate use cases.
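A minimal data-descriptor sketch (class names are illustrative, anticipating the Week 5 lab): `__set_name__` learns the attribute name, `__set__` validates, `__get__` reads from the instance dict.

```python
class Positive:
    """A data descriptor enforcing value > 0 on assignment."""

    def __set_name__(self, owner, name):
        self.name = name                 # learn the attribute's name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self                  # class access returns the descriptor
        return obj.__dict__[self.name]

    def __set__(self, obj, value):
        if value <= 0:
            raise ValueError(f"{self.name} must be positive")
        obj.__dict__[self.name] = value


class Order:
    quantity = Positive()


o = Order()
o.quantity = 3          # routed through Positive.__set__
assert o.quantity == 3  # routed through Positive.__get__
```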
5.2 Mechanical Detail¶
- `__slots__`: replaces the per-instance `__dict__` with a fixed-size struct of slot descriptors. Saves ~40–60% memory per instance and speeds attribute access. Cost: no dynamic attributes, multiple-inheritance gotchas. Mandatory in hot-path data classes.
- MRO (method resolution order) and the C3 linearization: `MyClass.__mro__`. Diamond inheritance is solvable but signals over-design.
- `super()`: a proxy object that walks the MRO of `type(self)` starting after the current class. Always cooperative; do not hand-pick `super().__init__` arguments unless you know the MRO.
- `@property`, `@cached_property` (3.8+; requires a writable `__dict__` - incompatible with default `__slots__` unless you slot `__dict__`).
- `__init_subclass__(cls, **kwargs)` runs at subclass creation. Used for plugin registration and validation of subclass invariants. The 90%-case alternative to a metaclass.
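The `__slots__` trade-off in a short sketch (class names are illustrative): the slotted class loses its per-instance `__dict__`, and with it dynamic attributes.

```python
import sys


class PointDict:
    def __init__(self, x: float, y: float):
        self.x, self.y = x, y            # stored in a per-instance __dict__


class PointSlots:
    __slots__ = ("x", "y")               # fixed slot descriptors instead

    def __init__(self, x: float, y: float):
        self.x, self.y = x, y


p = PointSlots(1.0, 2.0)
assert not hasattr(p, "__dict__")        # no per-instance dict at all
try:
    p.z = 3.0                            # dynamic attributes are gone
except AttributeError:
    pass

# the slotted instance is smaller even before counting the separate dict object
q = PointDict(1.0, 2.0)
assert sys.getsizeof(p) <= sys.getsizeof(q) + sys.getsizeof(q.__dict__)
```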
5.3 Lab - "Build a Tiny ORM"¶
- Implement a `Field` descriptor with type validation and a `default`. `class User: name = Field(str); age = Field(int, default=0)`.
- Use `__init_subclass__` to collect declared fields into `cls._fields`. Auto-generate `__init__` and `__repr__`.
- Compare your hand-rolled version to `@dataclass(slots=True)`. Note where dataclass is better (field ordering, `__eq__`, `__hash__` per PEP 557).
- Implement a `RegistryMeta` metaclass that records every subclass in a class-level dict. Then re-implement using `__init_subclass__`. Defend the simpler version in writing.
5.4 Idiomatic & Linter Drill¶
- Enable the `ruff` `SLOT` rule. Add `__slots__` to every internal data class in your project. Note the size delta with `pympler.asizeof`.
5.5 Production Hardening Slice¶
- Add `pyright` strict mode. Add `from __future__ import annotations` and switch to PEP 604 union syntax (`X | Y`). Resolve all type errors.
Week 6 - Decorators, functools, and contextlib¶
6.1 Conceptual Core¶
- A decorator is just `f = decorator(f)`. The `@` is sugar.
- A useful decorator preserves: name, docstring, signature, type annotations, async-ness, and `__wrapped__` for introspection. `functools.wraps` handles the first three; preserving signature and type requires `ParamSpec` (PEP 612).
- Class decorators decorate the class object itself. `@dataclass` is the canonical example.
6.2 Mechanical Detail¶
- `functools.wraps`, `functools.partial`, `functools.partialmethod`, `functools.lru_cache` (and `cache` in 3.9+ for unbounded), `functools.singledispatch`, `functools.singledispatchmethod`, `functools.reduce` (rarely the right tool - usually a comprehension or `sum`).
- Type-preserving decorators with `ParamSpec` and `TypeVar`.
- `contextlib.contextmanager` for generator-based context managers; `contextlib.asynccontextmanager` for async.
- `contextlib.ExitStack` / `AsyncExitStack`: the right tool for a dynamic number of context managers (e.g., opening a list of files determined at runtime).
- `contextlib.suppress`, `contextlib.closing`, `contextlib.redirect_stdout`.
6.3 Lab - "The Retry Decorator That Doesn't Lie About Its Type"¶
- Write `@retry(times=3, on=(IOError,), backoff=0.1)`. Make it work on both sync and async functions (detect with `asyncio.iscoroutinefunction`).
- Use `ParamSpec` so that `pyright --strict` preserves the wrapped signature.
- Add structured logging on each retry. Add a `tenacity`-style backoff strategy (constant, exponential, jittered).
- Compare to the `tenacity` library; document where yours is simpler / worse / better.
6.4 Idiomatic & Linter Drill¶
- Enable the `ruff` `FBT` (boolean-trap) and `ARG` (unused arguments) rules. Refactor decorators to take keyword-only configuration.
6.5 Production Hardening Slice¶
- Add `mypy` (in addition to `pyright`) with `strict_optional` and `disallow_any_generics`. The two type checkers disagree on edge cases; configuring both surfaces those.
Week 7 - Dataclasses, attrs, Pydantic, and the Validation Boundary¶
7.1 Conceptual Core¶
- The single most important architectural decision in a typed Python codebase: where is the validation boundary? Internal types should be cheap (`@dataclass(slots=True, frozen=True)`); boundary types (HTTP request bodies, message-bus payloads, LLM outputs) should validate (`pydantic.BaseModel`).
- "Parse, don't validate." Once a value is past the boundary, it should be a typed object that cannot be malformed; checks afterward are dead code.
7.2 Mechanical Detail¶
- `dataclasses.dataclass` parameters: `frozen`, `slots`, `kw_only`, `eq`, `order`, `repr`, `match_args`. Defaults that should be `field(default_factory=list)` - never a bare `[]`.
- `attrs` (the original): faster validators, `evolve`, slots by default. Still relevant; `dataclass` won the stdlib slot, but `attrs` keeps innovating.
- `pydantic` v2 (Rust core, ~10x faster than v1): `BaseModel`, `Field(..., gt=0, le=100)`, `model_validator`, `field_validator`, discriminated unions, `Annotated[..., AfterValidator(...)]`. JSON schema export for free.
- `TypedDict` (PEP 589): for dict-shaped data with known keys (e.g., LLM tool-call payloads). Cheaper than Pydantic, no runtime validation. Pair with `cast` at the boundary or with `pydantic.TypeAdapter`.
7.3 Lab - "The Three-Layer Cake"¶
- Build an HTTP service (FastAPI, but kept small):
  - Boundary layer: Pydantic `RequestModel` / `ResponseModel`.
  - Domain layer: `@dataclass(slots=True, frozen=True)` value objects.
  - Persistence layer: `TypedDict` rows from `sqlite3`.
- Write explicit converters between each layer. Resist the urge to make them the same type.
- Benchmark a 10k-request loop with Pydantic v1 (if installed) vs. v2. Document the 10x.
7.4 Idiomatic & Linter Drill¶
- Enable `ruff` `D` (pydocstyle) rules. Document every public class and function. Enforce Google or NumPy docstring style.
7.5 Production Hardening Slice¶
- Add `schemathesis` or property-based tests against your FastAPI app. Generate inputs from the OpenAPI schema; confirm 5xx never occurs on valid input shapes.
Week 8 - The Type System: Generics, Protocols, Variance, and typing.*¶
8.1 Conceptual Core¶
- Python's type system is gradual and structural-where-it-matters. `Protocol` lets duck typing meet static checking - the type system catches up to the language's actual semantics.
- Variance: `list[Cat]` is not a `list[Animal]` (mutable → invariant). `Sequence[Cat]` is a `Sequence[Animal]` (read-only → covariant). Is a `Callable[[Cat], None]` usable where `Callable[[Animal], None]` is expected? No - parameters are contravariant, so only the reverse direction holds.
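The variance rules can be made concrete in a few lines. The `Animal`/`Cat` classes and function names are illustrative; the commented-out call is the one `pyright` rejects:

```python
from collections.abc import Sequence

class Animal: ...
class Cat(Animal): ...

def feed_all(animals: Sequence[Animal]) -> None:
    for animal in animals:      # read-only access: covariance is safe
        pass

def adopt_out(animals: list[Animal]) -> None:
    animals.append(Animal())    # mutation: this is why list is invariant

cats: list[Cat] = [Cat()]
feed_all(cats)                  # OK: Sequence[Cat] is a Sequence[Animal]
# adopt_out(cats)               # pyright error: list[Cat] is not list[Animal] -
#                               # it would smuggle a plain Animal into the cats
```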
8.2 Mechanical Detail¶
- Generics (PEP 695, 3.12+): the new clean syntax - `def first[T](xs: list[T]) -> T: ...` and `class Box[T]: ...`. Old `TypeVar` syntax still works.
- `Protocol`, `runtime_checkable`. Structural typing for the `Iterable`, `Sized`, `SupportsInt`, etc., families.
- `Literal`, `LiteralString` (PEP 675, security-relevant), `Final`, `NewType`, `TypeAlias` (PEP 613), and the PEP 695 `type` statement: `type Vector = list[float]`.
- `overload`: multiple stubs for one implementation. Use sparingly; usually a sign of conflated responsibilities, sometimes legitimately needed (e.g., `typing.cast`, JSON parser return types).
- `TypeGuard` (3.10) and `TypeIs` (3.13): user-defined narrowing predicates. `TypeIs` is the strictly better one going forward - it narrows in the negative branch too.
- `Annotated[T, metadata]` (PEP 593): the foundation of FastAPI/Pydantic field metadata, validators, and dependency injection.
8.3 Lab - "Make Pyright Strict"¶
- Take a 500-LOC module of your existing code. Run `pyright --strict`. Resolve every error.
- Add a `Protocol` for a "thing-with-an-id" and refactor a function that previously took `Any`.
- Use `TypeIs` to narrow the `dict | list` returned from `json.loads` into safe shapes for downstream use.
- Where you find yourself reaching for `cast`, document why and consider whether the boundary belongs at a Pydantic model.
8.4 Idiomatic & Linter Drill¶
- Enable `ruff` `ANN` (annotation hygiene) and `PYI` (stub files) rules. Aim for a 100% annotated public surface; private code may use inference.
8.5 Production Hardening Slice¶
- Add `mypy --strict` to CI. Generate Sphinx docs from docstrings. Publish to GitHub Pages. By the end of week 8, the project has a public docs site.
Month-2 Exit Criteria¶
Before starting Month 3, the reader should be able to:
- Write a decorator that wraps both sync and async functions and preserves their type signatures under `pyright --strict`.
- Choose between `dataclass`, `attrs`, `pydantic`, `TypedDict`, and `NamedTuple` for a given use case and defend the choice.
- Add `Protocol`s to make an old codebase amenable to dependency injection without any code change at call sites.
- Articulate the validation boundary in their own architecture and where the parse-don't-validate principle is or isn't held.
Month 3 - Runtime and Performance: CPython Internals, GIL, GC, the Allocator¶
Goal: by the end of week 12 you can (a) read CPython bytecode and predict where the eval loop will spend its time, (b) explain refcounting, the cyclic GC, and generational thresholds, (c) characterize a workload as GIL-bound, allocator-bound, or I/O-bound from py-spy/scalene/memray output, and (d) apply the four tiers of optimization (algorithmic → vectorize → C extension → JIT) with judgment.
This is the hardest month of the curriculum. Take it seriously.
Weeks¶
- Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop
- Week 10 - Memory: Refcounts, Cyclic GC, the `pymalloc` Allocator
- Week 11 - The GIL, Free-Threaded Python, and the Concurrency Model
- Week 12 - The Optimization Ladder: Algorithm → Vectorize → Native → JIT
Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop¶
9.1 Conceptual Core¶
- CPython is a stack-based bytecode interpreter with reference counting + a generational cyclic GC. Every `PyObject` is a 16-byte header (`ob_refcnt`, `ob_type`) + a type-specific tail.
- The eval loop (`Python/ceval.c::_PyEval_EvalFrameDefault`) is a giant computed-goto dispatch over opcodes. Since 3.11, the loop is specializing and adaptive (PEP 659): hot opcodes get rewritten in place to type-specialized variants (`LOAD_ATTR_INSTANCE_VALUE`, `BINARY_OP_ADD_INT`).
9.2 Mechanical Detail¶
- `dis.dis(fn)`: disassemble a function. Memorize the common opcodes: `LOAD_FAST`, `STORE_FAST`, `LOAD_GLOBAL`, `LOAD_CONST`, `CALL`, `RETURN_VALUE`, `BINARY_OP`, `COMPARE_OP`, `FOR_ITER`, `POP_JUMP_IF_FALSE`, `LOAD_ATTR`, `STORE_SUBSCR`.
- Why local lookups are fast and global lookups are slow: locals are a fixed-size array indexed by integer ("fast locals"); globals are a dict lookup. Hot functions often hoist globals to locals (`def f(_len=len): ...`).
- Frame objects, code objects, and the difference. `func.__code__.co_consts`, `co_names`, `co_varnames`, `co_flags`.
- The specializing interpreter: read PEP 659 once. Use `dis.dis(fn, adaptive=True)` (3.11+) to see specialized opcodes after warm-up.
- Free lists and the small-int / interned-string caches.
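The locals-vs-globals point can be observed directly. `slow`/`fast` are illustrative names, and the default-argument hoist is the classic trick described above:

```python
import dis

def slow(xs) -> int:
    total = 0
    for x in xs:
        total += len(str(x))       # len/str: a LOAD_GLOBAL on every iteration
    return total

def fast(xs, _len=len, _str=str) -> int:
    # Default-argument hoist: the hot loop now hits LOAD_FAST instead.
    total = 0
    for x in xs:
        total += _len(_str(x))
    return total

dis.dis(slow)                      # look for LOAD_GLOBAL inside the loop body
assert slow(range(1000)) == fast(range(1000))
```

The behavior is identical; only the bytecode in the loop body changes, which is exactly what the week-9 lab asks you to quantify with `timeit`.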
9.3 Lab - "Bytecode Forensics"¶
- Write three implementations of "sum of squares": a `for` loop, a `sum()` + genexp, and `numpy.dot(a, a)`. `dis.dis` each. Benchmark with `timeit`. Explain the gap.
- Take a function with a global lookup in its hot loop. Refactor to a default-argument cache. Re-bench. Quantify the win.
- Use `sys.settrace` with `f_trace_opcodes = True` to count opcode-level events on a small program. Compare counts before and after warm-up to observe specialization.
9.4 Idiomatic & Linter Drill¶
- Enable `ruff` `PERF` rules. Read every rule. Identify cases in your codebase where the rule applies but readability suffers.
9.5 Production Hardening Slice¶
- Add `pytest-benchmark` to CI as a non-failing job that publishes JSON results. Build a script that flags >10% regressions on PRs.
Week 10 - Memory: Refcounts, Cyclic GC, the pymalloc Allocator¶
10.1 Conceptual Core¶
- Reference counting is eager - most objects die at refcount 0, deterministically, often without invoking the GC at all. This is why a Python file handle can be closed by `del f` and why context managers are the right answer for resources that cannot tolerate non-determinism.
- The cyclic GC handles only objects that might form cycles (containers). It runs in three generations with thresholds. It does not free memory; it breaks cycles so refcounting can free memory.
- The CPython allocator (`pymalloc`) is an arena/pool/block allocator tuned for small (<512 B) objects. Larger allocations go to the system `malloc`.
10.2 Mechanical Detail¶
- `sys.getrefcount(obj)`: returns refcount + 1 (the temporary reference on the call stack). `weakref.ref` to break cycles.
- `gc.set_threshold`, `gc.disable`, `gc.collect`. Disabling GC during a known short-lived high-allocation phase (e.g., model loading) and re-enabling after is a real production technique.
- Memory leaks in pure Python are almost always (a) caches without bounds, (b) closures capturing large objects, (c) `__del__` methods on cyclic objects (a legacy issue; mostly fixed since 3.4 / PEP 442). Find them with `tracemalloc` or `memray`.
- `__slots__` revisited: per-instance memory savings, attribute-access speed-ups, the inheritance gotcha.
- `array.array`, `bytes`, `bytearray`, `memoryview`, `numpy.ndarray`: when not to make Python objects in the first place.
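A minimal demonstration of eager refcounting vs. cycle collection, using `weakref` to watch objects die (`Node` is an illustrative class):

```python
import gc
import weakref

class Node:
    def __init__(self) -> None:
        self.ref = None

# Eager refcounting: the object dies the moment its last reference goes.
n = Node()
alive = weakref.ref(n)
del n
assert alive() is None          # freed deterministically, no GC pass involved

# A reference cycle keeps refcounts above zero; only the cyclic GC frees it.
a, b = Node(), Node()
a.ref, b.ref = b, a
alive = weakref.ref(a)
del a, b
gc.collect()                    # the collector breaks the cycle
assert alive() is None
```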
10.3 Lab - "Find the Leak"¶
- Write a service that has a deliberate leak: an unbounded `dict` cache, a leaking closure, and a circular reference with a `__del__`. Run under `memray` and `tracemalloc`. Identify each leak from the output.
- Bound the cache with `functools.lru_cache(maxsize=...)`. Confirm with `memray` that growth flatlines.
- Profile a NumPy-heavy workload. Observe that pymalloc and Python refcounts are largely unused - most memory is in NumPy buffers. Internalize: "NumPy is a different memory world."
10.4 Idiomatic & Linter Drill¶
- Enable `ruff` `B008` and `B023`. Catch closure-capture bugs at lint time.
10.5 Production Hardening Slice¶
- Add a `memray` smoke job to CI: run the service against a fixture; fail if peak RSS exceeds a threshold.
Week 11 - The GIL, Free-Threaded Python, and the Concurrency Model¶
11.1 Conceptual Core¶
- The Global Interpreter Lock serializes Python bytecode execution. It does not serialize C extensions that release it (NumPy, PyTorch, `time.sleep`, most I/O). This is why "Python is single-threaded" is wrong on the parts that matter.
- CPU-bound, pure Python → multiprocessing or the free-threaded build (PEP 703).
- CPU-bound, native (NumPy/torch) → threads are fine; the GIL is released.
- I/O-bound → asyncio (preferred) or threads.
11.2 Mechanical Detail¶
- The GIL is a single mutex around the interpreter state. It is released on I/O syscalls and at every ~5 ms timeslice (`sys.setswitchinterval`).
- `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS` in C extensions: how NumPy/torch escape.
- The PEP 703 free-threaded build (experimental in 3.13, officially supported in 3.14): per-object locking, biased reference counting, deferred reference counting, immortal objects. Trade-off: ~10-20% single-threaded slowdown for truly parallel threads. Build with `--disable-gil` or use `python3.13t`.
- PEP 684 per-interpreter GIL: subinterpreters with their own GILs; the `interpreters` module (PEP 734). Promising for embedded multitenancy.
- `threading.Lock`, `RLock`, `Semaphore`, `Condition`, `Event`, `Barrier`. The atomicity guarantees of CPython on `dict`/`list` ops are implementation details, not language-level - under the free-threaded build these change. Always use a lock if you require atomicity.
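A sketch of the always-use-a-lock rule: `counter += 1` compiles to separate load and store opcodes, so it is the lock, not the GIL, that makes this correct (`bump` is an illustrative name):

```python
import threading

counter = 0
lock = threading.Lock()

def bump(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:
            # `counter += 1` is a LOAD / ADD / STORE sequence; the GIL can
            # switch threads between those opcodes, and the free-threaded
            # build removes even that accidental protection.
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert counter == 80_000
```

Delete the `with lock:` line and rerun a few times to watch the count come up short under contention.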
11.3 Lab - "GIL Awareness"¶
- Compute primes up to 1M three ways: (a) a single thread, (b) `threading` with 8 threads, (c) `multiprocessing` with 8 processes. Bench all three on stock CPython.
- Run (b) on `python3.13t` (free-threaded). Compare.
- Replace the prime-test inner loop with a NumPy expression. Re-run (b) on stock CPython. Note the GIL-release effect.
- Capture `py-spy record` flame graphs for each. Identify GIL contention visually.
11.4 Idiomatic & Linter Drill¶
- Enable `ruff` `ASYNC` rules. Catch `time.sleep` inside `async def`, blocking I/O inside async code, etc.
11.5 Production Hardening Slice¶
- Add `py-spy` continuous profiling to your CI'd service. On every PR, attach a flame-graph artifact.
Week 12 - The Optimization Ladder: Algorithm → Vectorize → Native → JIT¶
12.1 Conceptual Core¶
The ladder, in order of expected ROI per hour:
- Algorithm and data structure. Big-O still wins by orders of magnitude. `set` membership over `list` membership. `heapq` over re-sorting. `bisect` over linear scan.
- Stay in the C layer. Replace Python loops with NumPy / Polars / itertools / built-in `sum`/`min`/`max`/`map`. The eval loop is your enemy; comprehensions and built-ins minimize trips through it.
- Native extension. Cython, mypyc, Rust + PyO3, C++ + pybind11. Write Python, profile, then rewrite the hot 5%.
- JIT. PyPy (a drop-in for many workloads, though NumPy runs slowly through `cpyext`), Numba (a NumPy-aware JIT), or the upcoming CPython JIT (PEP 744's copy-and-patch JIT, experimental since 3.13).
12.2 Mechanical Detail¶
- Profilers, in order of use:
  - `cProfile` + `snakeviz`: function-level, deterministic, ~10% overhead.
  - `py-spy`: sampling, attaches to a running process, no code changes, ~1% overhead. Use this in production.
  - `scalene`: CPU + memory + GPU, line-level, low overhead.
  - `memray`: memory-focused, flame graphs.
  - `pyinstrument`: low-overhead sampling, beautiful HTML output.
- Vectorization patterns: replace `for x in xs: ys.append(x*2 + 1)` with `np.asarray(xs) * 2 + 1`. Replace `for row in df.iterrows():` with column expressions in Polars / Pandas.
- Cython mental model: write Python, add `cdef int i`, recompile, get 10-100x. Worth it for hot inner kernels.
- `mypyc`: compiles type-annotated Python to C extensions; powers `mypy` itself and `black`.
- PyO3 + maturin: Rust extensions with `maturin develop`. The right tool when you also need true threads and predictable memory.
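The stay-in-the-C-layer rung, sketched with the stdlib alone (function names are illustrative; the NumPy form `np.asarray(xs) * 2 + 1` removes the per-element bytecode entirely):

```python
import timeit

xs = list(range(10_000))

def python_loop() -> list[int]:
    ys = []
    for x in xs:
        ys.append(x * 2 + 1)        # bytecode dispatch + method lookup per element
    return ys

def comprehension() -> list[int]:
    return [x * 2 + 1 for x in xs]  # specialized list-building path

def c_layer() -> int:
    return sum(x * 2 + 1 for x in xs)  # sum() drives the iteration in C

assert python_loop() == comprehension()
loop_t = timeit.timeit(python_loop, number=50)
comp_t = timeit.timeit(comprehension, number=50)
# comp_t is typically noticeably smaller than loop_t; measure, don't assume.
```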
12.3 Lab - "Climb the Ladder"¶
Take a deliberately slow workload - e.g., compute pairwise cosine similarity between 10k 768-dim vectors with a pure-Python triple loop. Time it. Then climb:
1. Algorithmic: skip pairs already computed.
2. Vectorize: NumPy batched matmul with norm.
3. Cython rewrite of the inner kernel.
4. Numba @njit on the same.
5. (Stretch) Rust + PyO3 implementation.
6. Compare to faiss / hnswlib.
Tabulate speedups in NOTES.md. The lesson is that step 2 usually wins by 100x and step 3+ by ~2x more - but step 6 (use the right library) wins by 1000x. Algorithm > implementation > tuning.
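Rung 2 of this lab in miniature, assuming NumPy is available (the shape is shrunk from 10k × 768 for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16)).astype(np.float32)   # stand-in for 10k x 768

# One batched matmul replaces the pure-Python triple loop.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)       # row-normalize
sims = Xn @ Xn.T                                        # pairwise cosine similarities

assert sims.shape == (100, 100)
assert np.allclose(np.diag(sims), 1.0, atol=1e-4)       # self-similarity is 1
```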
12.4 Idiomatic & Linter Drill¶
- Enable `ruff` `NPY` (NumPy-specific) rules. Refactor numerical code to use modern NumPy idioms.
12.5 Production Hardening Slice¶
- Add `pytest-benchmark` regression gates. Add a `perf/` directory of benchmarks tracked in CI with historical data.
Month-3 Exit Criteria¶
Before starting Month 4:
- Read `dis.dis` output and predict the relative cost of two Python implementations.
- Identify a real memory leak from a `memray` flame graph.
- Choose between threads / processes / asyncio / free-threaded for a given workload, with one paragraph of justification.
- Apply the optimization ladder in order - and refuse to skip step 1.
Month 4 - Concurrency and Parallelism: asyncio, Threads, Processes, Free-Threaded, Subinterpreters¶
Goal: by the end of week 16 you can (a) build a non-trivial asyncio service without ever blocking the event loop, (b) reason about cancellation, timeouts, and structured concurrency in both asyncio and anyio, (c) move work between threads, processes, and subinterpreters with clear cost/benefit, and (d) write a C-extension or Rust binding that releases the GIL and parallelizes correctly.
This month and Month 6 are where the senior-level signal really lives.
Weeks¶
- Week 13 - `asyncio` Foundations: Event Loop, Tasks, Coroutines
- Week 14 - Structured Concurrency, Cancellation, ExceptionGroups, `anyio`
- Week 15 - Threads, Processes, Subinterpreters, `concurrent.futures`
- Week 16 - Native Extensions, Releasing the GIL, FFI
Week 13 - asyncio Foundations: Event Loop, Tasks, Coroutines¶
13.1 Conceptual Core¶
- An async function is a function that returns a coroutine object. Awaiting yields control to the event loop. The event loop drives many coroutines, switching at every `await`.
- The cardinal sin: blocking the event loop. A single `time.sleep(1)`, sync DB call, or CPU-heavy loop in a coroutine stalls every other task. This is the most common production asyncio bug.
- Task vs. coroutine. A coroutine is a description; a `Task` is a coroutine scheduled on the loop. `await coro` runs it inline; `asyncio.create_task(coro)` runs it concurrently and returns a handle.
13.2 Mechanical Detail¶
- `asyncio.run`, `asyncio.create_task`, `asyncio.gather`, `asyncio.wait`, `asyncio.as_completed`, `asyncio.wait_for`, `asyncio.TaskGroup` (3.11+, the way to write structured concurrency since 3.11).
- `async with`, `async for`, async generators, `__aenter__`/`__aexit__`, `__aiter__`/`__anext__`.
- `asyncio.Queue`, `asyncio.Lock`, `asyncio.Event`, `asyncio.Semaphore`, `asyncio.Condition`. None are thread-safe; for thread-safe inter-loop communication, use `asyncio.run_coroutine_threadsafe` or `janus`.
- Cancellation is cooperative and exception-based: `task.cancel()` injects `CancelledError` at the next `await`. Code that catches `Exception` and swallows `CancelledError` is the classic asyncio anti-pattern; since 3.8, `CancelledError` is no longer a subclass of `Exception` - but old code remains.
- Timeouts: `async with asyncio.timeout(5):` (3.11+) is the idiomatic form. `asyncio.wait_for` is older and has subtle cancellation pitfalls.
- `loop.run_in_executor(None, blocking_fn, *args)`: the escape hatch for blocking calls. Use it for legacy DB drivers, file I/O if not using `aiofiles`, and CPU work.
13.3 Lab - "The Crawler That Doesn't Lie"¶
- Build an async HTTP crawler with `httpx.AsyncClient` and a `TaskGroup`. Limit concurrency with a `Semaphore(N)`.
- Add a 5-second per-request timeout using `asyncio.timeout`. Verify cancellation propagates cleanly to the `httpx` request.
- Inject a deliberately blocking `time.sleep(2)` somewhere. Detect it by enabling loop debug mode, setting `loop.slow_callback_duration = 0.1`, and watching the resulting log warnings.
- Replace the blocker with `asyncio.sleep`. Confirm via `py-spy dump` that the loop never stalls.
13.4 Idiomatic & Linter Drill¶
- Enable the `ruff` `ASYNC` rule set in full. Catch every blocking call inside `async def`.
13.5 Production Hardening Slice¶
- Add `aiomonitor` or `aiodebug` to your dev environment. Add a request-id `ContextVar` and structured logging that propagates across `await` boundaries.
Week 14 - Structured Concurrency, Cancellation, ExceptionGroups, anyio¶
14.1 Conceptual Core¶
- Structured concurrency: a parent task does not exit until its children have finished. No orphaned tasks, no leaked work. `asyncio.TaskGroup` (3.11+) and `anyio.create_task_group` are the canonical implementations; both are inspired by Trio.
- Cancellation must be honored. A coroutine that catches `BaseException` (or, before 3.8, `Exception`) and ignores it breaks structured concurrency. The contract: if you must catch, re-raise `CancelledError`.
- `ExceptionGroup` (PEP 654, 3.11+): when multiple sibling tasks fail, you get an `ExceptionGroup` containing all of them. `except* ValueError:` matches the subset.
14.2 Mechanical Detail¶
- `anyio` as the portable abstraction: works on `asyncio` or `trio` backends; gives you `create_task_group`, `move_on_after`, `fail_after`, `to_thread.run_sync`, `from_thread.run`. If you write libraries, prefer `anyio` over raw `asyncio`.
- `contextvars.ContextVar`: the async-safe replacement for `threading.local`. Used by tracing libraries, request-id propagation, FastAPI dependencies.
- Backpressure: a bounded `asyncio.Queue` is your friend. Unbounded queues are how async services OOM in production.
- `asyncio.eager_task_factory` (3.12+): start tasks eagerly when possible, reducing scheduling overhead.
14.3 Lab - "The Fan-Out That Cleans Up After Itself"¶
- Refactor your week-13 crawler to use `TaskGroup` (or an `anyio` task group).
- Add a "first-error wins" mode: as soon as any task raises, all siblings are cancelled and the group raises an `ExceptionGroup`.
- Add a "best-effort" mode: collect all results and exceptions, return both.
- Verify via test that cancelling the parent cancels every in-flight HTTP request within 100ms.
14.4 Idiomatic & Linter Drill¶
- Add `ruff` `RUF006` (dangling asyncio tasks). Refactor any `create_task` whose result is not held in a `TaskGroup` or a kept-reference set.
14.5 Production Hardening Slice¶
- Add OpenTelemetry instrumentation. Verify trace context propagates across `TaskGroup` boundaries.
Week 15 - Threads, Processes, Subinterpreters, concurrent.futures¶
15.1 Conceptual Core¶
- `concurrent.futures` is the unified high-level API. `ThreadPoolExecutor` for I/O or GIL-releasing C; `ProcessPoolExecutor` for pure-Python CPU work.
- `multiprocessing` start methods: `fork` (fast, dangerous with threads/locks/CUDA), `spawn` (the safe default on macOS/Windows, slower), `forkserver` (a good middle ground on Linux).
- Pickle is the IPC currency for `multiprocessing`. Things that can't be pickled (lambdas, locally defined classes, open file handles) cannot cross the boundary. `cloudpickle` is the third-party escape hatch.
- Subinterpreters (PEP 684/734, 3.13+): each interpreter has its own GIL, its own modules, its own `sys`. Communication via `interpreters.Queue` or shared memory. Lighter than processes, heavier than threads.
15.2 Mechanical Detail¶
- `multiprocessing.shared_memory.SharedMemory` (3.8+): zero-copy buffers across processes. Pair with `numpy.ndarray(..., buffer=shm.buf)` for big-array IPC.
- `multiprocessing.Manager`: proxy objects for `list`, `dict`, etc. Convenient but slow - every operation is an IPC round-trip.
- `os.fork()` directly is rarely correct in modern Python; use `multiprocessing` or `subprocess`.
- The free-threaded build (PEP 703): under `python3.13t`, `ThreadPoolExecutor` becomes a true parallel CPU executor for pure-Python code - the future-state replacement for many `ProcessPoolExecutor` use cases.
15.3 Lab - "Pick Your Parallelism"¶
For each workload, pick a model and justify:
1. Compress 10k JPEGs in parallel.
2. Run 10k HTTP requests against an external API (rate-limited).
3. Compute the SHA-256 of 10k 1 MB blobs.
4. Train 10 small models concurrently sharing a GPU.
Implement at least two of them three ways: threads, processes, asyncio. Bench. Write up the right answer.
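Workload 3 is the instructive one: a sketch of why threads suffice there - `hashlib` releases the GIL while hashing large buffers, so a thread pool overlaps the work even on stock CPython (sizes shrunk for illustration):

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

blobs = [bytes([i]) * 100_000 for i in range(8)]   # stand-ins for the 1 MB blobs

def digest(b: bytes) -> str:
    # hashlib drops the GIL for large buffers, so these threads genuinely
    # run in parallel - no process pool (and no pickling) needed.
    return hashlib.sha256(b).hexdigest()

with ThreadPoolExecutor(max_workers=8) as pool:
    digests = list(pool.map(digest, blobs))

assert digests[0] == hashlib.sha256(blobs[0]).hexdigest()
```

The same shape with a pure-Python `digest` would serialize on the GIL - which is exactly the contrast the benchmark should surface.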
15.4 Idiomatic & Linter Drill¶
- Add `ruff` `S` (security, bandit-style) rules. Catch `subprocess.run(..., shell=True)` and the unpickling of untrusted input.
15.5 Production Hardening Slice¶
- Add a deadlock-detection probe: a watchdog thread that dumps `py-spy` output if the main loop hasn't ticked in 30 s. Ship it as part of the hardening template.
Week 16 - Native Extensions, Releasing the GIL, FFI¶
16.1 Conceptual Core¶
- The fastest Python is Python that calls into C. The fastest correct Python is Python that calls into C and releases the GIL while it's there. NumPy, PyTorch, and `hashlib` do this; many third-party C extensions don't.
- Three FFI options today:
  - Cython for inner-loop kernels (write Python with type hints, compile to C).
  - Rust + PyO3 + maturin for everything else: thread-safe, memory-safe, modern build tooling.
  - `ctypes`/`cffi` for calling existing `.so`/`.dll` libraries without writing an extension.
16.2 Mechanical Detail¶
- The CPython C API: `PyObject *`, refcount discipline (`Py_INCREF`/`Py_DECREF`), the GIL macros (`Py_BEGIN_ALLOW_THREADS`/`Py_END_ALLOW_THREADS`), exception handling (`PyErr_SetString`).
- The Stable ABI (PEP 384) and the Limited API: building wheels that work across CPython versions.
- NumPy's C API: `PyArrayObject`, `PyArray_DATA`, contiguity flags. The buffer protocol (PEP 3118): `memoryview`, and `__buffer__` on Python types (PEP 688, 3.12+).
- PyO3 idioms: `#[pyfunction]`, `#[pyclass]`, `Bound<'py, PyAny>`, `py.allow_threads(|| {...})` to release the GIL.
- HPy: a successor C API, portable across PyPy/CPython/GraalPy. Worth knowing about; not yet load-bearing.
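For the `ctypes` route, a minimal sketch calling an existing shared library without writing an extension (assumes a Unix-like system where the C math library is findable, or already linked into the process, as it is for CPython itself):

```python
import ctypes
import ctypes.util

# Locate the C math library; fall back to the current process's own
# symbol table if find_library comes up empty.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)

# ctypes knows nothing about C signatures: declare them, or get garbage.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

assert abs(libm.cos(0.0) - 1.0) < 1e-12
```

This is the "call an existing `.so` without an extension" tier; the per-call overhead it illustrates is the same per-call FFI cost the lab below asks you to benchmark.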
16.3 Lab - "Write the Hot Kernel in Rust"¶
- Take the cosine-similarity workload from week 12. Implement it in Rust with PyO3.
- Use `py.allow_threads(|| ...)` around the SIMD loop. Verify with a Python `ThreadPoolExecutor(8)` that you get ~8x speedup.
- Compare to NumPy and to your Cython version. Write up the cost in code complexity.
- Bonus: expose a `Vector` `#[pyclass]` and benchmark crossing the FFI per-call vs. per-batch. Internalize the per-call FFI cost.
16.4 Idiomatic & Linter Drill¶
- Add `cargo clippy` to your Rust crate. Add `maturin develop --release` to your dev workflow.
16.5 Production Hardening Slice¶
- Build manylinux wheels with `cibuildwheel`. Add a CI matrix: cp312, cp313, cp313t, cp314 across linux/x86_64, linux/aarch64, macos/arm64. This is the modern wheel-distribution baseline.
Month-4 Exit Criteria¶
Before starting Month 5:
- Build an asyncio service that survives `kill -INT` mid-flight without dropping requests or leaking tasks.
- Pick - and justify - between threads, processes, asyncio, free-threaded, and subinterpreters for any workload.
- Write a Rust extension that releases the GIL and verify parallel scaling.
- Diagnose an event-loop stall from a `py-spy dump` alone.
Month 5 - Patterns and Architecture: Pythonic Design, Testing, Observability, Service Shape¶
Goal: by the end of week 20 you can (a) translate the Gang-of-Four catalog into Pythonic forms (and reject the ones that don't translate), (b) choose the right data structure for a given problem from a much larger menu than list/dict, (c) ship a FastAPI service with structured logging, metrics, traces, and a credible test pyramid, and (d) lay out a multi-package monorepo whose import graph doesn't cycle.
Weeks¶
- Week 17 - Pythonic Design Patterns
- Week 18 - Data Structures Beyond `list`/`dict`
- Week 19 - Testing, Property-Based Testing, Mutation Testing, Fakes vs. Mocks
- Week 20 - Observability, FastAPI, Production Service Shape
Week 17 - Pythonic Design Patterns¶
17.1 Conceptual Core¶
The GoF book describes patches around C++ and Java limitations: no first-class functions, no closures, no duck typing, mandatory class hierarchies. Many "patterns" in Python collapse to language features. Some still apply. Knowing which is which is senior-level taste.
17.2 The Catalog, Translated¶
| GoF Pattern | Pythonic Form |
|---|---|
| Strategy | A function passed as an argument. Or a Protocol. |
| Template Method | A function with hooks; ABC + abstractmethod only when you need enforcement. |
| Factory / Abstract Factory | A function. Or __init_subclass__ registry. Or functools.singledispatch. |
| Singleton | A module. import is the singleton. @lru_cache(maxsize=None) on a constructor for parameterized singletons. |
| Observer | signal/blinker, asyncio Queue, or just a list of callbacks. |
| Iterator | Built into the language (__iter__). |
| Decorator (GoF) | Decorators (Python). |
| Adapter | A function. Or Protocol + a thin wrapper class. |
| Visitor | functools.singledispatch for type dispatch; match statement for ADTs. |
| Command | A callable. functools.partial for binding. |
| Chain of Responsibility | Middleware. ASGI/WSGI middleware is exactly this. |
| State | Functions returning functions; or a match over an Enum. |
| Builder | Keyword args + dataclass. Rarely a builder class. |
| Flyweight | sys.intern for strings; __slots__ + class-level constants. |
| Proxy | __getattr__-based forwarding; or weakref.proxy. |
| Composite | A type that contains itself: tree: Node | list[Node]. |
| Memento | dataclasses.replace + immutability. |
| Mediator | An event bus or a domain service. |
The patterns that survive idiomatically: Strategy (via Protocol), Observer (via async queues), Chain (via middleware), Visitor (via singledispatch / match).
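Strategy via `Protocol`, in miniature (class and method names are illustrative): any object with the right shape satisfies the contract - no base class, and `pyright` checks conformance statically:

```python
from typing import Protocol

class Pricing(Protocol):
    def price(self, base: float) -> float: ...

class Flat:
    def price(self, base: float) -> float:
        return base

class Discounted:
    def __init__(self, pct: float) -> None:
        self.pct = pct
    def price(self, base: float) -> float:
        return base * (1 - self.pct)

def checkout(total: float, strategy: Pricing) -> float:
    # The "strategy" is just an argument; neither class inherits anything.
    return strategy.price(total)

assert checkout(100.0, Flat()) == 100.0
assert checkout(100.0, Discounted(0.1)) == 90.0
```

For the degenerate case where the strategy is a single method, pass a plain function (`Callable[[float], float]`) instead of a class.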
17.3 Architectural Patterns¶
- Hexagonal / Ports and Adapters. Domain at the core, adapters at the edge (HTTP, DB, message bus). Test the domain in isolation. Architecture Patterns with Python (Percival/Gregory) is the canonical Python treatment.
- Repository pattern: abstract persistence behind a `Protocol`. Tests use an in-memory fake; production uses SQL.
- Unit of Work: collect domain mutations, commit atomically. Pairs with the SQLAlchemy session.
- CQRS-lite: separate read models from write models when their shapes diverge. Don't over-apply.
17.4 Lab - "Refactor a Junk Drawer"¶
- Take a 1k-LOC script of mixed responsibilities. Extract: `domain/`, `adapters/`, `service/`, `entrypoints/`. Write `Protocol`s for the seams.
- Add a fake repository for tests; the real one talks to SQLite. Run the same test suite against both.
- Document, in `docs/architecture.md`, why each module exists and what it depends on.
17.5 Idiomatic & Linter Drill¶
- `import-linter` to enforce the dependency direction (entrypoints → service → domain ← adapters).
Week 18 - Data Structures Beyond list/dict¶
18.1 The Menu¶
| Need | Structure |
|---|---|
| Ordered, indexable, mutable | list |
| Immutable record | tuple, NamedTuple, frozen dataclass |
| Membership / dedup | set / frozenset |
| Key-value | dict |
| Counts | collections.Counter |
| Default-on-miss | collections.defaultdict |
| FIFO / deque | collections.deque |
| Priority queue | heapq (no class - operates on list) |
| Sorted container | sortedcontainers.SortedList/SortedDict (third-party but de facto standard) |
| Disjoint set | hand-rolled (~20 lines) or networkx.utils.UnionFind |
| LRU cache | functools.lru_cache for functions; cachetools.LRUCache for objects |
| Bloom filter | pybloom-live or roll your own (Appendix B) |
| Trie | pygtrie or roll your own |
| Interval tree | intervaltree |
| Graph | networkx (development), igraph/graph-tool (scale) |
| DataFrame | pandas (legacy), polars (modern, lazy, multi-threaded) |
| Tensor | numpy (CPU), torch.Tensor (CPU/GPU/autograd) |
| Sparse vector | scipy.sparse, torch.sparse |
| Vector index | faiss, hnswlib, usearch |
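Two of the menu rows in action: `heapq` operates on a plain `list`, and `nlargest` gives top-K in O(n log k) without sorting everything:

```python
import heapq

scores = [51, 3, 97, 12, 88, 64, 97, 5]

# Top-K without a full sort: O(n log k) vs. O(n log n).
top3 = heapq.nlargest(3, scores)
assert top3 == [97, 97, 88]

# Priority queue: heapq has no class - it heapifies a plain list in place.
pq: list[tuple[int, str]] = []
heapq.heappush(pq, (2, "mid"))
heapq.heappush(pq, (1, "high"))
heapq.heappush(pq, (3, "low"))
assert heapq.heappop(pq) == (1, "high")   # smallest priority first
```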
18.2 Lab - "Right Tool for Right Workload"¶
- A leaderboard with frequent inserts + top-K queries: implement with `list` (naive), `heapq` (better), `SortedList` (best). Bench at 10k/100k/1M elements.
- A rolling-window deduplicator: `set` (memory-unbounded), Bloom filter (memory-bounded, false positives), `cachetools.TTLCache`. Pick one with justification.
- A nearest-neighbor lookup over 1M 768-dim vectors: brute-force NumPy, `hnswlib`, `faiss`. Note the recall/latency trade-offs.
18.3 Production Hardening Slice¶
- Add a "data-structure decision log" to `docs/`. Every non-trivial collection choice gets one paragraph: what was rejected and why.
Week 19 - Testing, Property-Based Testing, Mutation Testing, Fakes vs. Mocks¶
19.1 Conceptual Core¶
- The test pyramid still applies: many fast unit tests, some integration, few end-to-end. In Python, the unit tier is so cheap there's almost never an excuse for skipping it.
- Prefer fakes to mocks. A fake is a working implementation with simpler internals (in-memory repo, fake clock). A mock is a script of expected calls. Mocks couple tests to implementation; fakes don't.
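The fakes-over-mocks point, sketched with a hypothetical `FakeClock` behind a `Protocol`: the fake really works, so the test exercises behavior instead of scripting expected calls:

```python
from typing import Protocol

class Clock(Protocol):
    def now(self) -> float: ...

class FakeClock:
    # A fake: a working implementation with simpler internals. Tests can
    # drive it (advance time) instead of asserting on call sequences.
    def __init__(self, start: float = 0.0) -> None:
        self._t = start
    def now(self) -> float:
        return self._t
    def advance(self, dt: float) -> None:
        self._t += dt

class RateLimiter:
    def __init__(self, clock: Clock, min_interval: float) -> None:
        self._clock = clock
        self._min_interval = min_interval
        self._last = float("-inf")
    def allow(self) -> bool:
        t = self._clock.now()
        if t - self._last >= self._min_interval:
            self._last = t
            return True
        return False

clock = FakeClock()
rl = RateLimiter(clock, min_interval=1.0)
assert rl.allow() is True
assert rl.allow() is False     # too soon
clock.advance(1.0)
assert rl.allow() is True      # deterministic, no sleep, no Mock
```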
19.2 Mechanical Detail¶
- `pytest`: fixtures, `conftest.py`, parametrize, marks, `pytest.raises`, `pytest.warns`, `tmp_path`, `caplog`, `capsys`. Plugin ecosystem: `pytest-asyncio`, `pytest-cov`, `pytest-xdist` (parallel), `pytest-benchmark`, `pytest-mock`, `pytest-randomly`.
- `hypothesis`: property-based testing. Strategies, `@given`, `@settings`, `assume`, stateful tests with `RuleBasedStateMachine`. The single biggest force-multiplier in this curriculum.
- `mutmut` / `cosmic-ray`: mutation testing. Verifies that your tests fail when production code is broken - surfaces vacuous tests.
- `unittest.mock`: when you must. Prefer the `monkeypatch` fixture, and `respx`/`vcr.py` for HTTP.
- Test-double taxonomy: dummy, stub, spy, mock, fake. Know the difference.
19.3 Lab - "The Tests Find Bugs You Didn't Know You Had"¶
- Add `hypothesis` property tests to your week-3 word counter. Watch them find a UTF-8 boundary bug or an empty-input issue.
- Add a stateful `hypothesis` test against your tiny ORM from week 5.
- Run `mutmut`. Identify untested branches.
- Replace any `Mock` you used with a fake implementing a `Protocol`.
19.4 Idiomatic & Linter Drill¶
- Enable `ruff` `PT` (pytest style) rules. Refactor `assert x == 1; assert y == 2` patterns; prefer one assertion per test.
19.5 Production Hardening Slice¶
- CI gate: coverage ≥ 90%, `mutmut` killed-mutant ratio ≥ 80%, and a dedicated `hypothesis` CI settings profile.
Week 20 - Observability, FastAPI, Production Service Shape¶
20.1 Conceptual Core¶
A production Python service has, at minimum: structured logs, metrics, distributed traces, health checks, graceful shutdown, configuration via env, secrets via a vault, and a dependency-injection seam for tests. None are optional.
20.2 Mechanical Detail¶
- Logging: stdlib `logging`, configured once at startup, with a JSON formatter (`python-json-logger` or `structlog`). Attach `request_id`, `user_id`, `trace_id` via `ContextVar`.
- Metrics: `prometheus_client` for pull; OpenTelemetry metrics for push. Histograms over averages - averages lie about tail latency.
- Traces: `opentelemetry-api` + `opentelemetry-sdk` + `opentelemetry-instrumentation-fastapi`/`-httpx`/`-sqlalchemy`. Auto-instrumentation gets you 80% for free.
- FastAPI: ASGI app, Pydantic-typed request/response, dependencies as `Annotated[T, Depends(...)]`, a lifespan context manager for startup/shutdown, `BackgroundTasks` only for fire-and-forget (use a real queue for durable work).
- Configuration: `pydantic-settings` reading `.env` + env vars. Never hardcode. Never read env vars directly in domain code.
- Graceful shutdown: SIGTERM → drain in-flight → close DB pools → exit. `uvicorn --timeout-graceful-shutdown`.
- Health: `/healthz` (liveness: returns 200 if the process is up) vs. `/readyz` (readiness: returns 200 only if dependencies are reachable). Distinct, not interchangeable.
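A minimal stdlib sketch of structured logging with `ContextVar` request-id propagation (`JsonFormatter` is hand-rolled here; `structlog` or `python-json-logger` replace it, and add timestamps and `exc_info`, in production):

```python
import contextvars
import json
import logging

request_id: contextvars.ContextVar[str] = contextvars.ContextVar(
    "request_id", default="-"
)

class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": request_id.get(),  # follows the task, not the thread
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
log = logging.getLogger("svc")
log.addHandler(handler)
log.setLevel(logging.INFO)

request_id.set("req-123")
log.info("job accepted")   # {"level": "INFO", "msg": "job accepted", "request_id": "req-123"}
```

Because `ContextVar` values are copied per asyncio task, every `await`-separated log line in one request carries the same id with no explicit plumbing.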
20.3 Lab - "Production-Shaped Service"¶
Build a FastAPI service that:
1. Accepts a POST /jobs, persists to SQLite, returns a job ID.
2. Processes jobs in an asyncio.TaskGroup background worker with bounded concurrency.
3. Emits structured JSON logs with trace correlation.
4. Exposes /metrics (Prometheus) and /healthz//readyz.
5. Handles SIGTERM by draining in-flight jobs.
6. Runs under uvicorn with --workers 4 (multi-process). Document why workers > 1 for CPU-light I/O-bound services on stock CPython.
7. Has a docker-compose stack including Prometheus, Grafana, and Jaeger.
8. Has a k6 or locust load test in loadtest/ reproducing the latency SLO.
20.4 Idiomatic & Linter Drill¶
- Add `ruff`'s `LOG` and `G` (flake8-logging-format) rule sets. `G004` catches `logger.info(f"...")` - use `%`-style formatting for lazy interpolation instead.
20.5 Production Hardening Slice¶
- Deploy to a free-tier cloud (Fly.io, Render, or a Hetzner VM with Caddy). Run for a week, watch the dashboard, write a one-page postmortem of what the dashboard taught you.
Month-5 Exit Criteria¶
Before starting Month 6:
- Translate any GoF pattern to its Pythonic form, or argue it doesn't apply.
- Pick the right data structure from the menu without defaulting to `dict`/`list`.
- Ship a FastAPI service with full observability and graceful shutdown in under a day.
- Defend a hexagonal architecture in a code review.
Month 6 - AI Systems at a Senior Level: RAG, Agents, Evals, Training, Serving¶
Goal: by the end of week 24 you can (a) design a RAG pipeline end-to-end with explicit choices on chunking, embedding, indexing, retrieval, reranking, and prompting, (b) build a tool-using agent with durable execution, observability, and cost ceilings, (c) run an offline + online evaluation harness that catches regressions before users do, and (d) fine-tune, serve, and roll out a small open-weights model behind a FastAPI gateway.
This is the synthesis month. Every prior month feeds in.
Weeks¶
- Week 21 - LLM-App Foundations: Prompts, Tokens, Streaming, Cost
- Week 22 - Retrieval-Augmented Generation: Doing It Properly
- Week 23 - Agents, Tools, Durable Execution, Cost & Safety
- Week 24 - Training, Serving, Rollout, and the Capstone Defense
Week 21 - LLM-App Foundations: Prompts, Tokens, Streaming, Cost¶
21.1 Conceptual Core¶
- An LLM call is an autoregressive generation over a token stream, billed per token, latency-bound by output length, and probabilistic in output. All four facts shape the system around it.
- Tokens, not characters or words. Costs, context windows, and rate limits are all in tokens. Always budget in tokens. `tiktoken` or the model's own tokenizer is the source of truth.
- Streaming is product-critical. Time-to-first-token (TTFT) usually matters more than total time. Design APIs streaming-first; convert to batch only if needed.
- Caching the prompt prefix is free latency. Anthropic and OpenAI both expose prompt caching; structure prompts so the cacheable prefix is large and stable.
21.2 Mechanical Detail¶
- SDKs: `anthropic`, `openai`, plus `litellm` as a normalization layer if you need provider portability. Async clients always - sync clients block your event loop and waste capacity.
- Streaming: SSE on the wire; in code, an async iterator of events. Render incrementally. Handle mid-stream errors and retries (resume is hard; usually you re-issue from scratch).
- Structured output: JSON mode, tool use / function calling, or constrained decoding (`outlines`, `instructor`, `lm-format-enforcer`). Pydantic models as the schema.
- Failure modes: rate limits (429), token limits, content filters, schema-violating outputs, hallucinated tool arguments. Each has a distinct retry strategy.
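The distinct-retry-strategy point is easiest to see for 429s. A hedged sketch of retry-with-exponential-backoff and full jitter; `RateLimitError` and the `flaky` call are stand-ins for a real SDK's typed error and API call:

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for an SDK's 429 error type."""

async def with_backoff(call, *, retries: int = 5, base: float = 0.01):
    """Retry an async call on rate limits with exponential backoff + full jitter."""
    for attempt in range(retries):
        try:
            return await call()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of budget: surface the 429 to the caller
            # full jitter: sleep a random amount in [0, base * 2**attempt]
            await asyncio.sleep(random.uniform(0, base * 2 ** attempt))

async def demo():
    attempts = 0
    async def flaky():
        nonlocal attempts
        attempts += 1
        if attempts < 3:
            raise RateLimitError("429")
        return "ok"
    return await with_backoff(flaky), attempts

result, attempts = asyncio.run(demo())
print(result, attempts)  # ok 3
```

Schema violations get a different policy (re-prompt with the validation error appended), and content filters usually get no retry at all - retrying identical input burns money for the same refusal.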
21.3 Lab - "A Disciplined LLM Client"¶
- Build an `LLMClient` abstraction over the `anthropic` and `openai` async SDKs. Methods: `generate`, `stream`, `with_tools`.
- Add token accounting: pre-call estimate, post-call actual, running cost meter.
- Add caching headers (Anthropic prompt caching). Measure the latency delta.
- Add a structured-output mode using `instructor` + a Pydantic schema. Test on a deliberately ambiguous prompt; observe schema enforcement.
- Add timeout, retry-with-backoff, and a circuit breaker (`pybreaker` or hand-rolled).
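The cost-meter piece of the lab is pure bookkeeping. A minimal sketch - the `PRICES` table and the `small-model` name are made up for illustration; real per-token rates come from your provider's price sheet:

```python
from dataclasses import dataclass

# Hypothetical prices: ($ per 1M input tokens, $ per 1M output tokens).
PRICES = {"small-model": (0.25, 1.25)}

@dataclass
class CostMeter:
    """Running cost meter fed from post-call usage numbers."""
    total_usd: float = 0.0
    calls: int = 0

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> float:
        in_rate, out_rate = PRICES[model]
        cost = prompt_tokens / 1e6 * in_rate + completion_tokens / 1e6 * out_rate
        self.total_usd += cost
        self.calls += 1
        return cost

meter = CostMeter()
cost = meter.record("small-model", prompt_tokens=1_000_000, completion_tokens=200_000)
print(round(cost, 4), meter.calls)  # 0.5 1
```

Hang this off the client so every `generate`/`stream` call records into it; the running total is what feeds the `cost_usd` log field in 21.4.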
21.4 Production Hardening Slice¶
- Add per-request `trace_id`, `model`, `prompt_tokens`, `completion_tokens`, `cached_tokens`, `cost_usd` to your structured logs. This is the only way you'll keep cost under control in production.
Week 22 - Retrieval-Augmented Generation: Doing It Properly¶
22.1 Conceptual Core¶
RAG fails in seven places, and a senior engineer must know each:
- Ingestion: garbage-in (bad PDFs, lost layout, OCR errors).
- Chunking: too big → diluted relevance; too small → loss of context. Try semantic / recursive / sentence-window strategies; benchmark them.
- Embedding: model choice (`text-embedding-3-large`, `bge-large`, `nomic-embed`, `voyage-3`), normalization, dimension, multilingual support.
- Indexing: HNSW (`hnswlib`, `faiss`, `usearch`), IVF-PQ for scale, keyword (`bm25s`, `tantivy`), hybrid (dense + sparse + reranker).
- Retrieval: top-K, MMR for diversity, query rewriting / HyDE, query routing.
- Reranking: a cross-encoder reranker (`bge-reranker`, Cohere rerank, Voyage rerank) on the top-50 → top-5. Often the single biggest quality win.
- Prompting: how the chunks are presented, citation format, instructions for "don't answer if not in context."
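To make the chunking trade-off benchmarkable, a minimal recursive splitter (one of the strategies named above) fits in a few lines - a sketch, with separator order and `max_len` as the knobs to tune:

```python
def chunk(text: str, max_len: int,
          seps: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Recursive splitter: try the coarsest separator first, recurse on
    oversized pieces with finer separators, hard-slice as a last resort."""
    if len(text) <= max_len:
        return [text] if text.strip() else []  # drop whitespace-only chunks
    if not seps:
        # no separators left: hard slice into max_len windows
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    head, *rest = seps
    if head not in text:
        return chunk(text, max_len, tuple(rest))
    pieces: list[str] = []
    for part in text.split(head):
        pieces.extend(chunk(part, max_len, tuple(rest)))
    return pieces

text = "a" * 50 + "\n\n" + "b" * 50
chunks = chunk(text, 60)
print([len(c) for c in chunks])  # [50, 50]
```

Real pipelines add overlap between adjacent chunks and budget `max_len` in tokens rather than characters, but the shape of the algorithm is the same.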
22.2 Mechanical Detail¶
- Vector DBs: `pgvector` (Postgres extension, the boring-and-correct choice), `qdrant`, `weaviate`, `milvus`, `chroma` (dev), `lance`/`lancedb` (good for local), `turbopuffer` (cheap, serverless).
- Hybrid search: RRF (reciprocal rank fusion) over dense + BM25.
- Embedding pipelines with backpressure: don't OOM your provider, batch carefully, retry idempotently.
- Evals for RAG: retrieval recall@K, answer faithfulness (LLM-as-judge), answer relevance, context precision (`ragas`, `trulens`, custom).
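RRF is small enough to hand-roll. This sketch implements the standard formula - each document scores the sum of `1 / (k + rank)` over every ranking it appears in, with the customary `k = 60`:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion over multiple ranked lists of document IDs."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d1", "d2", "d3"]   # dense-retrieval ranking
bm25 = ["d3", "d1", "d4"]    # keyword ranking
print(rrf([dense, bm25]))    # ['d1', 'd3', 'd2', 'd4']
```

Note that the fused order rewards documents that appear in *both* lists, which is exactly why hybrid beats either retriever alone on most corpora.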
22.3 Lab - "End-to-End RAG with Honest Evals"¶
- Pick a corpus (your own docs, a Wikipedia subset, or a publicly available QA dataset). Ingest with at least two chunking strategies.
- Stand up `pgvector` or `qdrant`. Index with two embedding models.
- Implement hybrid retrieval (dense + BM25 + RRF) and add a reranker.
- Build a 50-question gold eval set with reference answers. Score with `ragas`. Iterate retrieval until faithfulness > 0.85.
- Plot the impact of each pipeline change in a results table. Resist the urge to tune blindly.
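For the retrieval half of the gold set, recall@K is the first metric to wire up. A minimal version, assuming chunk IDs as strings:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of gold-relevant chunks that appear in the top-K retrieved."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

# Toy check: 2 of the 3 gold chunks show up in the top 5.
score = recall_at_k(["c9", "c1", "c4", "c2", "c8"], {"c1", "c2", "c3"}, k=5)
print(round(score, 4))
```

Averaged over the 50 gold questions, this one number tells you whether a chunking or embedding change helped *before* you spend LLM-as-judge tokens on faithfulness.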
22.4 Production Hardening Slice¶
- Add eval-on-CI: every PR runs the gold set against the changed pipeline; regressions block merge.
Week 23 - Agents, Tools, Durable Execution, Cost & Safety¶
23.1 Conceptual Core¶
An "agent" is an LLM in a loop over a tool-use protocol with state, exit conditions, and observability. The dangerous failure modes:
- Runaway loops: turn caps, cost caps, time caps. All three.
- Bad tool inputs: validate aggressively at the tool boundary; treat the LLM as untrusted input.
- Silent quality drift: log every step; replay traces in tests.
- Permission escalation: an agent with bash is a remote-code-execution surface. Sandbox.
Read Anthropic's "Building effective agents" once. The taxonomy (workflows vs. agents; chains, routers, parallelization, orchestrator-workers, evaluator-optimizer) is load-bearing.
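The three caps compose into a loop skeleton. A hedged sketch - `CapExceeded` and the per-turn dict returned by the fake client are stand-ins for your platform's error type and real model calls:

```python
import time

class CapExceeded(Exception):
    """Raised when a turn, cost, or wall-time cap fires."""

def run_agent(llm, *, max_turns: int = 10, max_cost_usd: float = 0.50,
              max_wall_s: float = 120.0):
    """LLM-in-a-loop with all three caps checked every turn."""
    start, cost = time.monotonic(), 0.0
    for turn in range(max_turns):
        if time.monotonic() - start > max_wall_s:
            raise CapExceeded("wall-time")
        step = llm()  # one model turn: {"cost_usd": ..., "done": ..., "answer": ...}
        cost += step["cost_usd"]
        if cost > max_cost_usd:
            raise CapExceeded("cost")
        if step["done"]:
            return step["answer"], turn + 1, cost
    raise CapExceeded("turns")

# Fake client: finishes on the second turn.
calls = iter([
    {"cost_usd": 0.01, "done": False, "answer": None},
    {"cost_usd": 0.01, "done": True, "answer": "42"},
])
answer, turns, cost = run_agent(lambda: next(calls))
print(answer, turns, round(cost, 2))  # 42 2 0.02
```

The lab in 23.3 is essentially this loop plus real tools, persistence, and a token cap; the point is that every cap has a test proving it fires.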
23.2 Mechanical Detail¶
- Frameworks worth knowing (in 2026): `pydantic-ai`, `dspy`, `instructor`, `langgraph` (for graph-shaped flows), and the "build your own" path. Default to "build your own" until you've felt the pain - most frameworks add accidental complexity.
- Durable execution: `temporal`, `inngest`, or a state-machine table in Postgres. Critical when agents take minutes-to-hours and processes can crash.
- Tool definitions: Pydantic schemas → JSON Schema → tool spec. Use `pydantic-ai` or `instructor` to generate them.
- Sandboxing: `e2b`, Docker, gVisor, Firecracker. Never `exec` LLM-generated code on the host.
- Observability for agents: `langfuse`, `arize phoenix`, or roll your own with OpenTelemetry spans per step.
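The schema-to-tool-spec step can be approximated with the stdlib to see the shape. `tool_spec` and its type mapping below are a hand-rolled illustration - `pydantic-ai` and `instructor` generate the same structure from Pydantic models with far more fidelity (nested models, field descriptions, constraints):

```python
import json
from typing import get_type_hints

# Minimal Python-type → JSON-Schema-type mapping for the sketch.
PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_spec(fn) -> dict:
    """Derive a JSON-Schema tool spec from a typed function signature."""
    hints = get_type_hints(fn)
    params = {
        name: {"type": PY_TO_JSON[tp]}
        for name, tp in hints.items() if name != "return"
    }
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "input_schema": {
            "type": "object",
            "properties": params,
            "required": list(params),
        },
    }

def web_search(query: str, max_results: int) -> str:
    """Search the web and return a JSON list of result snippets."""
    ...

spec = tool_spec(web_search)
print(json.dumps(spec, indent=2))
```

Whatever generates the spec, validate the model's arguments against it at the tool boundary - the LLM is untrusted input, per the failure modes above.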
23.3 Lab - "An Agent That Doesn't Burn Money"¶
- Build a research agent: takes a question, plans, calls `web_search` and `fetch_url` tools, synthesizes an answer with citations.
- Add caps: max-turns=10, max-tokens=200k, max-wall-time=120s, max-cost=$0.50. Verify each cap fires correctly.
- Persist agent state (turn-by-turn) to Postgres. Recover after a `kill -9`.
- Write replay tests: feed a saved trace to a test, mock the LLM, assert tool calls happen in the right order.
- Add an evaluator-optimizer loop: a critic LLM grades the answer; if score < threshold, revise once.
23.4 Production Hardening Slice¶
- Add a "kill switch": a feature flag that immediately disables agent execution. Verify it works via end-to-end test.
Week 24 - Training, Serving, Rollout, and the Capstone Defense¶
24.1 Conceptual Core¶
You are unlikely to pretrain a foundation model. You will, repeatedly: (a) fine-tune with LoRA/QLoRA, (b) serve open-weights models, (c) roll out behind a gateway with shadow / canary / staged-percent traffic.
24.2 Mechanical Detail¶
- Fine-tuning stack: `transformers`, `peft` (LoRA/QLoRA), `trl` (SFT, DPO), `bitsandbytes` (4-bit), `accelerate` (multi-GPU), `unsloth` (faster LoRA). Datasets via `datasets` (Hugging Face).
- Serving stack: `vLLM` (PagedAttention, the default choice for throughput), `TGI`, `SGLang`, `llama.cpp`/`ollama` (for tiny / local), `Triton Inference Server` (when you need the matrix). Quantization: GPTQ, AWQ, GGUF.
- Gateway shape: FastAPI in front of vLLM. Streaming passthrough. Per-tenant rate limits. Cost accounting per request. Model routing (route cheap queries to small models).
- Rollout: shadow (mirror traffic, compare), canary (1% → 10% → 50% → 100%), feature flags per cohort. Eval-on-rollout: keep the offline eval running against the canary.
- Continuous evaluation: a daily replay of N production samples (PII-scrubbed) against the new model. Block promotion on regression.
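Hash-based assignment makes both the canary rollout and the dataset split deterministic and sticky. A sketch; the `salt` keeps independent experiments uncorrelated:

```python
import hashlib

def canary_bucket(user_id: str, percent: float, *, salt: str = "rollout-v1") -> bool:
    """Deterministic canary assignment: hash the user into [0, 1) and compare
    against the rollout percentage. The same user always gets the same answer."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).digest()
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return fraction < percent / 100

hits = sum(canary_bucket(f"user-{i}", 10) for i in range(10_000))
print(hits)  # ≈ 1,000 of 10k users land in a 10% canary
```

Bumping `percent` from 1 → 10 → 50 → 100 only *adds* users to the canary (anyone below the old threshold stays below the new one), which is exactly the staged-percent behavior you want from a feature flag.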
24.3 Capstone Defense¶
You picked a track from `CAPSTONE_PROJECTS.md` at the start of Month 6. You have been building it incrementally. Week 24 is the defense:
- Architecture review. Whiteboard the system. Defend each component choice.
- Performance review. `py-spy` flame graph, vLLM throughput numbers, end-to-end p50/p95/p99 latency.
- Eval review. Show the eval harness, the regressions caught, the rollout policy.
- Cost review. $/request, $/user, projected $/month at 10x scale.
- Failure-mode review. What happens on: provider outage, vector-DB down, OOM in worker, agent runaway, prompt injection, tokenizer mismatch.
Pass = you can answer every question without hand-waving.
Month-6 Exit Criteria - and the Senior Bar¶
A graduate of this curriculum, in a senior-AI-engineer interview loop, should be able to:
- Whiteboard a RAG service for 1M docs / 1k QPS in 30 minutes, with explicit pgvector vs. qdrant trade-offs, hybrid retrieval, reranking, eval methodology, and cost projection.
- Diagnose a production agent that's burning $1k/hr by reading traces, identifying the runaway loop, and shipping a fix with caps and a kill switch - same day.
- Fine-tune a 7B model with LoRA on a domain dataset, evaluate offline, serve with vLLM, and roll out behind a canary in under a week.
- Defend the choice not to use Python for a given component - model routing, GPU scheduler, streaming proxy - when Go or Rust is the right answer.
That last bullet is the actual signal of seniority: you have stopped being a Python advocate and started being an engineer.
Appendix A - Production Hardening Toolkit¶
The hardening template you accumulate over 24 weeks. By the end, this should be a publishable python-project-template repo.
A.1 Project Layout¶
my-project/
├── pyproject.toml # PEP 621 metadata, ruff/pyright/pytest config
├── uv.lock # uv-managed lockfile
├── src/
│ └── my_project/
│ ├── __init__.py
│ ├── __main__.py
│ ├── domain/ # pure, no I/O
│ ├── adapters/ # DB, HTTP, LLM clients
│ ├── service/ # orchestration
│ └── entrypoints/ # FastAPI, CLI
├── tests/
│ ├── unit/
│ ├── integration/
│ └── property/
├── perf/ # pytest-benchmark suites
├── loadtest/ # k6 / locust
├── docs/
└── .github/workflows/ci.yml
A.2 Tooling Stack (canonical 2026)¶
| Concern | Tool |
|---|---|
| Build / dep | uv (primary), hatch (alt) |
| Lint + format | ruff |
| Type check | pyright (strict), mypy (secondary) |
| Test | pytest, pytest-asyncio, pytest-cov, pytest-xdist, pytest-benchmark, pytest-randomly |
| Property test | hypothesis |
| Mutation test | mutmut |
| Profile (CPU) | py-spy, scalene, pyinstrument |
| Profile (mem) | memray, tracemalloc |
| Security | bandit (via ruff S), pip-audit, safety |
| Docs | mkdocs-material + mkdocstrings |
| Pre-commit | pre-commit |
| Container | distroless or python:3.13-slim, multi-stage with uv pip install --system |
| Observability | structlog, prometheus_client, opentelemetry-* |
A.3 The make check Target¶
check: lint format-check typecheck test
lint:
ruff check src tests
format-check:
ruff format --check src tests
typecheck:
pyright src
test:
pytest -x --cov=src --cov-report=term-missing
bench:
pytest perf/ --benchmark-only
load:
k6 run loadtest/scenario.js
A.4 CI Matrix¶
- Python: 3.12, 3.13, 3.13t (free-threaded), 3.14 (when stable).
- OS: ubuntu-latest, macos-latest.
- Steps: `make check`, `make bench` (non-failing, archived), `mutmut run --max-children 4` (weekly cron).
A.5 Profiling Recipes¶
- "Why is my service slow?" → `py-spy record -o flame.svg -- python -m my_project`
- "Where is my memory going?" → `memray run --live -m my_project`
- "Is the event loop stalling?" → set `loop.slow_callback_duration = 0.05` (with loop debug mode on); watch the logs.
- "Why is import slow?" → `python -X importtime -c "import my_project" 2> import.log`
- "What's the GC doing?" → `gc.set_debug(gc.DEBUG_STATS)` for an hour in staging.
A.6 Deployment Hardening¶
- `python -O` strips asserts; never rely on asserts for security checks.
- Hash randomization (`PYTHONHASHSEED=random`) is the default since 3.3; never pin it to a fixed value in production.
- `PYTHONFAULTHANDLER=1` for crash tracebacks on segfaults from C extensions.
- `PYTHONMALLOC=malloc` if running under `valgrind`.
- Drop privileges (`gosu`, `setuid`) before exec'ing the Python process.
- Distroless or slim base image; pin via SHA, not tag.
- One worker per CPU for CPU-light I/O-bound services on stock CPython; one process for free-threaded once stable.
Appendix B - Build-From-Scratch Data Structures and Patterns¶
A working Python engineer should have implemented each of the following at least once, with pyright-clean types, pytest + hypothesis tests, and a pytest-benchmark micro-bench. This appendix sketches the minimal-viable design.
B.1 LRU Cache (with TTL)¶
When: function memoization, decoded-payload caches, embedding caches.
Design:
- OrderedDict from collections. move_to_end on hit; popitem(last=False) on evict.
- Optional TTL via (value, expiry_monotonic) tuples; lazy expiration on access.
- Concurrent variant: a threading.Lock (or asyncio.Lock for async) around mutations.
Lab: compare to functools.lru_cache and cachetools.TTLCache. Bench miss/hit costs.
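A minimal-viable sketch along the lines above - `OrderedDict` ordering as the LRU list, `(value, expiry)` tuples for TTL, lazy expiration on access:

```python
import time
from collections import OrderedDict

class TTLCache:
    """LRU + TTL: move_to_end on hit, popitem(last=False) on evict."""
    def __init__(self, maxsize: int, ttl: float):
        self.maxsize, self.ttl = maxsize, ttl
        self._data: OrderedDict[object, tuple[object, float]] = OrderedDict()

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expiry = item
        if time.monotonic() >= expiry:   # lazy expiration on access
            del self._data[key]
            return default
        self._data.move_to_end(key)      # mark as most recently used
        return value

    def put(self, key, value) -> None:
        self._data[key] = (value, time.monotonic() + self.ttl)
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

cache = TTLCache(maxsize=2, ttl=60)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a", so "b" becomes least recently used
cache.put("c", 3)     # evicts "b"
print(cache.get("a"), cache.get("b"), cache.get("c"))  # 1 None 3
```

The concurrent variant wraps `get`/`put` bodies in a `threading.Lock`; note the lock must cover the whole read-check-move sequence, not just the dict access.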
B.2 Trie (Prefix Tree)¶
When: autocomplete, IP routing tables, tokenizer prefix lookup, dictionary spell-check.
Design:
- Node = dict[str, Node] + is_end: bool (+ optional payload).
- Insert/lookup O(len(key)); prefix iteration O(prefix + matches).
Lab: implement add, contains, iter_prefix. Bench against set for membership and against linear scan for prefix queries.
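A sketch of the node design above, with an iterative `iter_prefix` to avoid recursion limits on deep tries:

```python
from typing import Iterator, Optional

class Trie:
    """Prefix tree: node = children dict + is_end flag."""
    def __init__(self) -> None:
        self.children: dict[str, "Trie"] = {}
        self.is_end = False

    def add(self, word: str) -> None:
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def contains(self, word: str) -> bool:
        node = self._walk(word)
        return node is not None and node.is_end

    def iter_prefix(self, prefix: str) -> Iterator[str]:
        """Yield every stored word starting with prefix."""
        node = self._walk(prefix)
        if node is None:
            return
        stack = [(node, prefix)]
        while stack:
            node, word = stack.pop()
            if node.is_end:
                yield word
            for ch, child in node.children.items():
                stack.append((child, word + ch))

    def _walk(self, key: str) -> Optional["Trie"]:
        node = self
        for ch in key:
            child = node.children.get(ch)
            if child is None:
                return None
            node = child
        return node

t = Trie()
for w in ("car", "cart", "care", "dog"):
    t.add(w)
print(t.contains("car"), t.contains("ca"), sorted(t.iter_prefix("car")))
```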
B.3 Bloom Filter¶
When: dedup at scale, "definitely-not-seen" checks before expensive lookups.
Design:
- bitarray of size m; k hash functions derived from one (mmh3 or hashlib) via double hashing.
- Sized for target false-positive rate p over expected n: m = -n*ln(p)/(ln(2)^2), k = m/n * ln(2).
Lab: empirically verify FP rate matches predicted. Compare memory to set for 10M items.
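A stdlib-only sketch of the design: a `bytearray` as the bitset and double hashing derived from one SHA-256 digest (the design names `bitarray`/`mmh3`, which are drop-in upgrades for space and speed):

```python
import hashlib
import math

class BloomFilter:
    """Bloom filter sized from target FP rate p and expected item count n."""
    def __init__(self, n: int, p: float):
        # m = -n*ln(p)/(ln 2)^2 bits; k = m/n * ln 2 hash functions
        self.m = max(1, int(-n * math.log(p) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n * math.log(2)))
        self._bits = bytearray((self.m + 7) // 8)

    def _indexes(self, item: str):
        d = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(d[:8], "big")
        h2 = int.from_bytes(d[8:16], "big") | 1  # force odd so the step is never 0
        # double hashing: k indexes from two base hashes
        return ((h1 + i * h2) % self.m for i in range(self.k))

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self._bits[idx // 8] |= 1 << (idx % 8)

    def __contains__(self, item: str) -> bool:
        return all(self._bits[idx // 8] >> (idx % 8) & 1 for idx in self._indexes(item))

bf = BloomFilter(n=1_000, p=0.01)
for i in range(1_000):
    bf.add(f"doc-{i}")
print("doc-5" in bf, "doc-999999" in bf)  # True, and almost certainly False
```

Membership of added items is guaranteed; the lab's job is to count false positives over items you *didn't* add and check the rate lands near `p`.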
B.4 SPSC Ring Buffer (asyncio)¶
When: backpressure between a producer task and a consumer task, fixed-memory pipelines.
Design:
- list[T | None] of capacity N (power of two).
- head/tail integers; full when tail - head == N; empty when equal.
- asyncio.Event for "not full" and "not empty"; set() on transition.
Lab: compare to asyncio.Queue(maxsize=N). The stdlib version is fine; build this once to internalize the contract.
B.5 Bounded Concurrent Map¶
When: caches with tight memory budgets, multi-writer state.
Design:
- N shards of (threading.RLock, dict). Hash key, mod N, lock the shard.
- Eviction: per-shard LRU list.
Lab: compare to a single-lock dict and to a CAS-y "lock-free" attempt. The simple sharded design wins almost always.
B.6 Vector Index (Brute-Force, Then HNSW Wrapper)¶
When: nearest-neighbor over embeddings.
Design - brute force:
- np.ndarray of shape (N, D), L2-normalized.
- Query: (N, D) @ (D,) dot product, np.argpartition for top-K.
Design - HNSW wrapper:
- Wrap hnswlib.Index with a typed Pythonic API.
- Persist with index.save_index(path).
Lab: Build both. Verify recall@10 vs. brute-force ground truth. Plot recall/QPS trade-off.
B.7 Token Bucket Rate Limiter (asyncio)¶
When: client-side rate limiting against an LLM API.
Design:
- tokens: float, last_refill: float (monotonic).
- On acquire(n): refill (now - last_refill) * rate, cap at capacity. If tokens >= n, deduct and return; else await asyncio.sleep for the deficit.
Lab: ensure bursts don't exceed capacity; ensure long-run average matches rate. Compare to aiolimiter.
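The refill arithmetic above, as a runnable sketch (single consumer; a shared limiter would guard the token math with an `asyncio.Lock`):

```python
import asyncio
import time

class TokenBucket:
    """Async token bucket: refill on acquire, sleep out the deficit."""
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()

    async def acquire(self, n: float = 1.0) -> None:
        while True:
            now = time.monotonic()
            # refill proportionally to elapsed time, capped at capacity
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last_refill) * self.rate)
            self.last_refill = now
            if self.tokens >= n:
                self.tokens -= n
                return
            await asyncio.sleep((n - self.tokens) / self.rate)  # wait out the deficit

async def demo() -> float:
    bucket = TokenBucket(rate=100.0, capacity=5.0)  # 100 tokens/s, burst of 5
    start = time.monotonic()
    for _ in range(10):
        await bucket.acquire()
    return time.monotonic() - start

elapsed = asyncio.run(demo())
print(f"{elapsed:.3f}s")  # ~0.05s: the burst of 5 is free, the rest paces at 100/s
```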
B.8 Circuit Breaker¶
When: protecting downstream services from cascading failure.
Design:
- States: CLOSED (normal), OPEN (fail fast), HALF_OPEN (probe).
- Counters: consecutive-failures threshold, reset timeout.
- On call: if OPEN, raise immediately; if HALF_OPEN, allow one probe.
Lab: integrate with the LLMClient from week 21. Verify behavior under simulated 503 storms.
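A synchronous sketch of the three states; `pybreaker` packages the same idea with more policy knobs:

```python
import time
from typing import Optional

class CircuitBreaker:
    """CLOSED → OPEN after N consecutive failures; OPEN → HALF_OPEN after a timeout."""
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "CLOSED"
        if time.monotonic() - self.opened_at >= self.reset_timeout:
            return "HALF_OPEN"  # timeout elapsed: allow one probe through
        return "OPEN"

    def call(self, fn):
        if self.state == "OPEN":
            raise RuntimeError("circuit open: failing fast")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0       # any success closes the circuit
        self.opened_at = None
        return result

cb = CircuitBreaker(failure_threshold=2, reset_timeout=30.0)
def boom():
    raise ValueError("503")
for _ in range(2):
    try:
        cb.call(boom)
    except ValueError:
        pass
print(cb.state)  # OPEN: further calls fail fast without touching the backend
```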
B.9 Async Worker Pool with Backpressure¶
When: ingestion pipelines, batch-embedding workloads.
Design:
- Bounded asyncio.Queue, N worker tasks consuming, producer awaits put.
- TaskGroup owns the workers; cancellation cleanly drains.
- Each worker: try/except around the unit of work; metrics per outcome.
Lab: process 1M items at a controlled QPS without OOM. Tune N and queue size.
B.10 Domain Patterns Worth Building¶
- Repository + UnitOfWork over SQLite, with an in-memory fake for tests.
- Result type (`Ok[T] | Err[E]`) for code where exceptions obscure flow. Don't over-apply - Python has exceptions for a reason.
- Saga / state machine for multi-step durable workflows. State table in Postgres + idempotency keys.
- Outbox pattern for reliable event publishing alongside DB writes.
Appendix C - Deep-Dive Session: CPython Internals and the AI Runtime Stack¶
This is the single-sit deep dive the curriculum promises. Schedule a full day (8 hours, with breaks) at the end of Month 3 (after the runtime chapter) and re-read it at the end of Month 6. The goal: see clearly through every layer between print("hello") and the silicon.
The format is six "stations." Each station has: what to read, what to run, what you should be able to explain afterward.
Station 1 - From python foo.py to a Frame on the Stack (90 min)¶
Read:
- Python/pythonrun.c::_PyRun_SimpleFileObject - the entry path.
- Python/compile.c (skim) - AST → bytecode.
- Include/internal/pycore_frame.h - frame layout in 3.11+.
- Python/ceval.c::_PyEval_EvalFrameDefault - the interpreter loop.
Run:
python -c "
import dis
def f(x): return x*x + 1
dis.dis(f, adaptive=True)
print(f.__code__.co_consts, f.__code__.co_names, f.__code__.co_varnames)
"
Explain afterwards:
- Why LOAD_FAST is faster than LOAD_GLOBAL.
- What "specialization" means in PEP 659 and how to observe it.
- The lifecycle of a frame object - when it's allocated, when it's freed, why exception tracebacks pin frames.
Station 2 - Memory: Refcount, Cyclic GC, pymalloc, Arenas (75 min)¶
Read:
- Include/object.h - PyObject header.
- Objects/obmalloc.c - the small-object allocator.
- Modules/gcmodule.c / Python/gc.c - the cyclic GC.
Run:
import sys, gc, tracemalloc
tracemalloc.start()
xs = [object() for _ in range(10_000)]
print(sys.getsizeof(xs), sum(sys.getsizeof(x) for x in xs))
print(gc.get_count(), gc.get_threshold())
Explain:
- Why del xs deterministically frees memory but gc.collect() is needed for cycles.
- Why __slots__ saves ~40% per instance.
- The interaction between refcounts and the free-threaded build's "biased reference counting."
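The `__slots__` claim is easy to verify empirically: a slotted instance stores its attributes in fixed slots and drops the per-instance `__dict__` entirely:

```python
import sys

class Plain:
    def __init__(self):
        self.x, self.y = 1, 2

class Slotted:
    __slots__ = ("x", "y")
    def __init__(self):
        self.x, self.y = 1, 2

p, s = Plain(), Slotted()
# A plain instance pays for the object header plus a per-instance dict;
# a slotted one has no __dict__ at all.
plain_total = sys.getsizeof(p) + sys.getsizeof(p.__dict__)
slotted_total = sys.getsizeof(s)
print(plain_total, slotted_total, hasattr(s, "__dict__"))
```

Exact numbers vary by CPython version (3.11+ key-sharing dicts shrink the plain case considerably), which is why the "~40%" figure is a rule of thumb, not a constant.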
Station 3 - The GIL and Its Successors (60 min)¶
Read:
- Python/ceval_gil.c - the GIL implementation.
- PEP 703 (no-GIL) and PEP 684 (per-interpreter GIL).
- The "biased reference counting" paper (Choi et al.).
Run:
Reproduce the prime-counting benchmark from Month 4, Week 11.
Explain:
- Why a NumPy-heavy ThreadPoolExecutor scales on stock CPython.
- What changes for pure-Python code under python3.13t.
- When subinterpreters beat both threads and processes.
Station 4 - asyncio Internals (75 min)¶
Read:
- Lib/asyncio/base_events.py - the loop's run_forever.
- Lib/asyncio/tasks.py - Task machinery, cancellation.
- Modules/_asynciomodule.c - the C accelerator (Tasks, Futures).
- The selectors module - epoll/kqueue glue.
Run:
import asyncio, time

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.05  # warn on callbacks slower than 50 ms
    time.sleep(0.2)  # deliberately stall the event loop

# slow-callback warnings only fire with debug mode enabled
asyncio.run(main(), debug=True)
Watch the warning fire.
Explain:
- The exact path from await coro to a callback scheduled on the loop.
- How Task cancellation delivers CancelledError precisely at the next await.
- Why uvloop is faster (libuv, C event loop, fewer Python frames per I/O).
Station 5 - NumPy and the Buffer Protocol (60 min)¶
Read:
- PEP 3118 - buffer protocol.
- NumPy's numpy/core/src/multiarray/arrayobject.c (skim).
- The strides/shape/dtype model: NumPy User Guide → Internals.
Run:
import numpy as np
a = np.arange(12).reshape(3, 4)
print(a.strides, a.flags['C_CONTIGUOUS'])
b = a.T
print(b.strides, b.flags['C_CONTIGUOUS'])
mv = memoryview(a)
print(mv.format, mv.itemsize, mv.shape, mv.strides)
Explain:
- Why a transpose is O(1) - it changes strides, not data.
- Why a.T.copy() is sometimes necessary before passing to a C library.
- How the buffer protocol lets bytes, array.array, numpy.ndarray, and torch.Tensor share memory without copies.
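The zero-copy claim is checkable with the stdlib alone - `array.array` and `memoryview` speak the same PEP 3118 buffer protocol that `numpy.frombuffer` and `torch.frombuffer` ride on:

```python
import array

buf = array.array("i", range(8))  # contiguous C ints
view = memoryview(buf)            # no copy: a typed window onto buf's memory
print(view.format, view.itemsize, view.shape)

view[0] = 99                      # writes through to the underlying array
print(buf[0])                     # 99

half = view[2:6]                  # slicing a memoryview is also zero-copy
half[0] = -1
print(buf[2])                     # -1
```

Every mutation through the view is visible in the original array because there is only one buffer; that is exactly the mechanism that lets a NumPy array wrap a `bytes`-like region, or a Torch tensor share storage with NumPy, without copying.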
Station 6 - PyTorch, Autograd, CUDA Streams (90 min)¶
Read:
- PyTorch internals (Edward Yang's blog).
- torch.autograd overview docs; the Function / Variable machinery.
- vLLM PagedAttention paper (sets up serving questions in Month 6).
Run:
import torch
x = torch.randn(4, 4, requires_grad=True)
y = (x * x).sum()
y.backward()
print(x.grad)
print(torch.cuda.is_available(), torch.cuda.current_stream() if torch.cuda.is_available() else None)
Explain:
- The autograd tape: forward builds the graph, backward walks it.
- Why .detach() and with torch.no_grad(): matter for inference latency.
- CPU↔GPU synchronization: when .item() blocks, why torch.cuda.synchronize() exists.
- How vLLM's PagedAttention reduces KV-cache fragmentation and why that translates directly to throughput.
Synthesis: The Mental Model¶
After this deep dive, hold this picture:
your code ──► AST ──► bytecode ──► eval loop (specializing) ──► C function ──► syscall / GPU kernel
│ │ │ │ │
└─ ruff/pyright └─ dis └─ py-spy └─ py-spy --native └─ nsys / nvprof
└─ GIL / free-threaded
└─ refcount / GC
Every performance question maps to one of those columns. Every correctness question maps to a boundary between two of them. That is what "senior" looks like.
Capstone Projects - Three Tracks¶
Pick one in week 21 and build incrementally through Month 6. Defend in week 24.
Every track must, by week 24, exhibit:
- pyright --strict clean.
- ruff clean with the curriculum's full rule set.
- pytest with ≥85% coverage and a hypothesis test suite.
- Structured logs, Prometheus metrics, OpenTelemetry traces, `/healthz` + `/readyz`.
- Containerized, deployed somewhere reachable, with a load test and a postmortem doc.
- A docs/architecture.md that another senior engineer could read in 30 minutes.
Track 1 - Production RAG Service¶
Pitch: a multi-tenant retrieval-augmented generation service over a 100k–1M-document corpus with hybrid search, reranking, streaming responses, and an honest eval harness.
Must-have:
- Ingestion pipeline: PDF/HTML/Markdown → chunks → embeddings → pgvector (or qdrant).
- Retrieval: dense + BM25 + RRF, then a cross-encoder reranker.
- Streaming SSE answers with citations linking back to source chunks.
- Per-tenant isolation (row-level filters, separate collections, or both).
- Eval harness (ragas or custom): faithfulness, answer relevance, context precision, retrieval recall@K. CI gate on regressions.
- Cost accounting per request; per-tenant rate limits; cache (prompt prefix + retrieval result).
Stretch:
- Query rewriting (HyDE) and routing (small queries → small model).
- Multimodal: support image-bearing PDFs via VLM-extracted captions.
- Continuous learning: a feedback loop that promotes/demotes chunks based on user signal.
Track 2 - Agent Orchestration Platform¶
Pitch: a platform for running tool-using LLM agents reliably - with durable execution, observability, cost ceilings, and a permissions model.
Must-have:
- Agent definitions as Pydantic schemas: tools, system prompt, model, caps (turns, tokens, cost, wall-time).
- Durable execution: state machine in Postgres; recover after process kill.
- Tool sandbox: at minimum, an e2b-or-Docker-isolated bash tool with allowlist.
- Permissions model: per-agent, per-tenant tool access. Audit log.
- Observability: per-step spans, full trace replay in tests.
- Kill switch: a feature flag that immediately halts execution. End-to-end test for it.
- Replay testing: saved traces become regression tests.
Stretch:
- Multi-agent orchestration (orchestrator + workers).
- Evaluator-optimizer loops with automated prompt revision.
- A small UI (Streamlit or Next.js) for inspecting runs.
Track 3 - Training & Serving Pipeline¶
Pitch: fine-tune a small open-weights model with LoRA, evaluate it, serve it with vLLM behind a FastAPI gateway, with autoscaling and continuous eval.
Must-have:
- Dataset prep: HuggingFace datasets, schema validation with Pydantic, dedup, deterministic train/val/test split with hash-based assignment.
- LoRA fine-tune (peft + trl) on a 7B–8B base. Document VRAM math.
- Offline eval: at minimum, a held-out set with task-appropriate metrics; ideally lm-eval-harness on relevant subsets.
- Serve: vLLM behind FastAPI gateway. Streaming, batching, structured output.
- Routing: cheap queries → small model; complex → large; A/B harness.
- Continuous eval: daily replay of N production samples (PII-scrubbed) against the new checkpoint; block promotion on regression.
- Rollout: shadow → canary 1% → 10% → 50% → 100% via feature flag.
Stretch:
- DPO / KTO post-training on preference data.
- Quantization (GPTQ/AWQ) and a serving comparison.
- Multi-GPU serving with tensor parallelism.
Defense (Week 24)¶
Each track defends the same five reviews:
- Architecture review - whiteboard, defend each component.
- Performance review - flame graphs, throughput, p50/p95/p99.
- Eval review - harness, regressions caught, rollout policy.
- Cost review - $/request, $/user, projected $/month at 10x.
- Failure-mode review - provider outage, vector DB down, OOM, runaway agent, prompt injection, tokenizer mismatch.
The bar: every question gets a substantive answer without hand-waving. That is the senior signal.