04-Python for ML
Why this matters in the journey
You probably know Python. The question is whether you know it the way ML engineers use it: NumPy idioms, type hints in 2026 style, async, dataclasses/Pydantic, virtualenvs that don't fight you, and packaging that doesn't hurt. ML codebases have a particular flavor; closing the gap takes a focused week.
The rungs
Rung 01-Modern Python project hygiene
- What: `uv` (or `pip` + `venv`), `pyproject.toml`, `ruff` for lint, `mypy` for types, `pytest` for tests.
- Why it earns its place: A reproducible env is the difference between "training works" and "training works on your machine." `uv` from Astral is the new standard in 2025+: fast and reliable.
- Resource: Astral docs for `uv` (search "uv astral docs"). Plus the official `pyproject.toml` reference.
- Done when: You can `uv init` a new project, add deps, and run a test in one minute flat (see the sketch after this list).
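A minimal sketch of that loop, assuming a recent `uv` release (project name and test are placeholders; check the Astral docs if the commands have drifted):

```python
# tests/test_smoke.py - the first test in a fresh project.
#
# Shell setup (assuming a recent uv release):
#   uv init ml-sandbox && cd ml-sandbox
#   uv add numpy
#   uv add --dev pytest ruff mypy
#   uv run pytest
def test_environment() -> None:
    # If this fails, the dependency install is broken, not your code.
    import numpy  # noqa: F401
```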
Rung 02-NumPy fluency
- What: Array creation, broadcasting, indexing, slicing, vectorized ops, `axis` arguments.
- Why it earns its place: Every ML engineer reads NumPy code daily. PyTorch tensor APIs are NumPy-shaped on purpose.
- Resource: Python Data Science Handbook (Jake VanderPlas, free online), chapter 2 (NumPy). Plus the widely used "100 numpy exercises" repo (search "100 numpy exercises").
- Done when: Without docs you can create a 5×5 random array, normalize each row, and compute pairwise Euclidean distances between the rows of two arrays (a sketch follows this list).
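One way to sketch that exercise (array names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# 5x5 random array with each row normalized to unit L2 norm.
a = rng.standard_normal((5, 5))
a_normed = a / np.linalg.norm(a, axis=1, keepdims=True)

# Pairwise Euclidean distances between rows of (n, d) and (m, d) arrays:
# broadcasting (n, 1, d) against (1, m, d) yields an (n, m, d) difference tensor.
x = rng.standard_normal((4, 3))
y = rng.standard_normal((6, 3))
dists = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)  # shape (4, 6)
```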
Rung 03-Broadcasting deeply
- What: Operations between arrays of different shapes "broadcast" along compatible axes.
- Why it earns its place: Broadcasting bugs are the #1 silent error in ML code. Either you understand it or you debug forever.
- Resource: NumPy docs page on broadcasting (search "numpy broadcasting"). Plus implement attention by hand using only broadcasting and matmul.
- Done when: Given two arrays of shapes `(B, N, D)` and `(D,)`, you can predict the output shape of their sum without running it (the attention sketch after this list exercises the same skill).
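One possible broadcasting-and-matmul-only attention sketch (single head, no masking; names and shapes are illustrative), with the shape question from the "Done when" at the end:

```python
import numpy as np


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Scaled dot-product attention for q, k, v of shape (B, N, D)."""
    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (B, N, N)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over last axis
    return weights @ v                               # (B, N, D)


# The shape question: (B, N, D) + (D,) broadcasts the vector across B and N.
x = np.zeros((2, 4, 8)) + np.ones(8)
assert x.shape == (2, 4, 8)
```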
Rung 04-Pandas for data wrangling
- What: DataFrames, groupby, merge, melt/pivot, `.apply` patterns.
- Why it earns its place: Eval data, training data, evaluation reports: all flow through Pandas (or Polars). You'll use it for analysis even if your training pipeline doesn't.
- Resource: Python Data Science Handbook chapter 3. Plus the Pandas "10 minutes to pandas" tutorial.
- Done when: You can read a CSV, filter to a subset, group by a column, compute means, and plot the result (see the sketch after this list).
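A sketch against a hypothetical `results.csv` with `model`, `task`, and `score` columns (the plot assumes matplotlib is installed):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical eval results: one row per (model, task) run.
df = pd.read_csv("results.csv")
subset = df[df["task"] == "summarization"]        # filter to a subset
means = subset.groupby("model")["score"].mean()   # group and aggregate
means.sort_values().plot(kind="barh", title="Mean score by model")
plt.tight_layout()
plt.show()
```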
Rung 05-Polars (the modern alternative)
- What: Polars is a Rust-backed DataFrame library, much faster than Pandas, with a cleaner expression API.
- Why it earns its place: New ML codebases often default to Polars now. Learn at least the basics.
- Resource: Polars official "Getting started" guide (search "polars getting started").
- Done when: You can do the Pandas exercise in Polars (see the sketch after this list).
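The same hypothetical file in Polars, assuming a recent release (older versions spelled `group_by` as `groupby`):

```python
import polars as pl

df = pl.read_csv("results.csv")
means = (
    df.filter(pl.col("task") == "summarization")
    .group_by("model")
    .agg(pl.col("score").mean().alias("mean_score"))
    .sort("mean_score")
)
print(means)
```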
Rung 06-Type hints (the 2026 way)
- What: `list[int]`, `dict[str, Any]`, `Optional[X]`, `TypedDict`, `Protocol`, `NewType`. Plus `mypy` to enforce.
- Why it earns its place: Production ML code uses types. Pydantic uses them. LLM library APIs (Anthropic, OpenAI SDKs) use them. Reading typed code is faster than reading untyped.
- Resource: `mypy` cheatsheet (search "mypy cheatsheet"). Plus Pydantic docs.
- Done when: You can type-annotate a non-trivial function and have `mypy --strict` pass (see the sketch after this list).
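A small sketch of the style that should pass `mypy --strict` (all names are invented):

```python
from typing import Protocol, TypedDict


class EvalRecord(TypedDict):
    prompt: str
    completion: str
    score: float


class Scorer(Protocol):
    def __call__(self, prompt: str, completion: str) -> float: ...


def rescore(records: list[EvalRecord], scorer: Scorer) -> dict[str, float]:
    """Re-run a scorer over records and return the mean score per prompt."""
    totals: dict[str, list[float]] = {}
    for rec in records:
        totals.setdefault(rec["prompt"], []).append(scorer(rec["prompt"], rec["completion"]))
    return {prompt: sum(xs) / len(xs) for prompt, xs in totals.items()}
```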
Rung 07-Pydantic and dataclasses
- What: Structured data containers with validation. `dataclass` for simple records; Pydantic for runtime-validated, JSON-friendly data.
- Why it earns its place: LLM tool use, structured outputs, eval datasets, agent state: all are Pydantic models. This is the workhorse of LLM application engineering.
- Resource: Pydantic v2 docs (search "pydantic docs"). Plus the FastAPI tutorial as a worked example.
- Done when: You can define a `class IncidentReport(BaseModel): ...` with nested fields, JSON-serialize it, and use it as the schema for an LLM structured output call (a sketch follows this list).
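A minimal sketch assuming Pydantic v2 (model and field names are invented; the JSON Schema is what you would hand to a structured-output API):

```python
from pydantic import BaseModel


class AffectedService(BaseModel):
    name: str
    region: str


class IncidentReport(BaseModel):
    title: str
    severity: int
    services: list[AffectedService]
    resolved: bool = False


report = IncidentReport(
    title="Latency spike",
    severity=2,
    services=[{"name": "inference-api", "region": "us-east-1"}],
)
print(report.model_dump_json())            # JSON serialization
print(IncidentReport.model_json_schema())  # JSON Schema for a structured output call
```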
Rung 08-async / await
- What: Concurrency primitives for I/O-bound tasks. `asyncio.gather`, `async for`, `aiohttp`.
- Why it earns its place: LLM API calls are I/O-bound and slow. Batching with async is how you keep an evaluation harness fast. Agentic systems are full of concurrent tool calls.
- Resource: Real Python's "Python Async Features" article (search "real python async features"). Plus the official `asyncio` docs.
- Done when: You can write a function that fires 100 LLM calls concurrently with a semaphore for rate limiting (see the sketch after this list).
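A sketch of the pattern with a stand-in coroutine instead of a real SDK call; the semaphore caps how many requests are in flight at once:

```python
import asyncio


async def call_llm(prompt: str) -> str:
    """Stand-in for a real async SDK call (hypothetical)."""
    await asyncio.sleep(0.1)  # simulate network latency
    return f"response to: {prompt}"


async def run_batch(prompts: list[str], max_concurrency: int = 10) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        async with sem:  # at most max_concurrency calls in flight
            return await call_llm(prompt)

    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(run_batch([f"prompt {i}" for i in range(100)]))
print(len(results))
```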
Rung 09-Streaming and generators
- What: Generators (`yield`), async generators (`async for`), iterating over a stream.
- Resource: Fluent Python (Luciano Ramalho) chapters on iteration and generators. Plus Anthropic SDK streaming examples.
- Done when: You can consume a streamed LLM response and accumulate tokens into a final string (a sketch follows this list).
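A self-contained sketch with a fake token stream standing in for an SDK's async iterator:

```python
import asyncio
from collections.abc import AsyncIterator


async def fake_token_stream(text: str) -> AsyncIterator[str]:
    """Stand-in for a streaming LLM response (hypothetical)."""
    for token in text.split():
        await asyncio.sleep(0.01)
        yield token + " "


async def consume() -> str:
    chunks: list[str] = []
    async for token in fake_token_stream("streamed tokens arrive one at a time"):
        print(token, end="", flush=True)  # show partial output as it arrives
        chunks.append(token)
    return "".join(chunks)


final = asyncio.run(consume())
```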
Rung 10-Testing and debugging ML code
- What: `pytest` patterns, `pytest-snapshot` for golden tests, `pdb`/`ipdb` for debugging, shape assertions.
- Why it earns its place: ML bugs are silent. Tests that pin shapes and behavior are the only safety net.
- Resource: `pytest` official docs. Plus the "Property-based testing with Hypothesis" intro for advanced cases.
- Done when: You have a test suite for one of your build projects with at least one shape assertion test and one snapshot test (a shape-assertion sketch follows this list).
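A shape-assertion sketch in `pytest`, inlining the Rung 03 attention function so it runs standalone (in a real project you would import it); golden tests are covered by the `pytest-snapshot` docs:

```python
import numpy as np
import pytest


def attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    # In a real project, import this from your package instead of redefining it.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


@pytest.mark.parametrize("b,n,d", [(1, 1, 8), (2, 5, 16)])
def test_attention_output_shape(b: int, n: int, d: int) -> None:
    rng = np.random.default_rng(0)
    q, k, v = (rng.standard_normal((b, n, d)) for _ in range(3))
    out = attention(q, k, v)
    # Shape assertions are the cheapest guard against silent broadcasting bugs.
    assert out.shape == (b, n, d)
    # Each output row is a convex combination of v's rows, so it cannot exceed v's per-dim max.
    assert np.all(out.max(axis=1) <= v.max(axis=1) + 1e-9)
```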
Minimum required to leave this sequence
- Spin up a `uv`-managed project in 60 seconds.
- NumPy: implement attention from scratch with only broadcasting + matmul.
- Pandas: read, group, aggregate, plot.
- Type-hinted Python passing `mypy --strict`.
- Pydantic models for an LLM structured output.
- Async function that batches 100 LLM API calls.
Going further
- Fluent Python (Ramalho): read cover to cover when you have time.
- Effective Python (Brett Slatkin): short, high-density tips.
- Astral's `uv` and `ruff` blog: keep up with tooling evolution.
How this sequence connects to the year
- Months 1–3: NumPy and basic Python keep you unblocked while doing the math sequences.
- Month 4 onwards: Pydantic, async, types, and testing become daily tools.
- Month 6: async + streaming + Pydantic are the tooling underneath any LLM app and eval harness you build.