04-Python for ML

Why this matters in the journey

You probably know Python. The question is whether you know it the way ML engineers use it: NumPy idioms, type hints in 2026 style, async, dataclasses/Pydantic, virtualenvs that don't fight you, and packaging that doesn't hurt. ML codebases have a particular flavor; closing the gap takes a focused week.

The rungs

Rung 01-Modern Python project hygiene

  • What: uv (or pip + venv), pyproject.toml, ruff for lint, mypy for types, pytest for tests.
  • Why it earns its place: A reproducible env is the difference between "training works" and "training works on your machine." uv from Astral is the new standard as of 2025: fast and reliable.
  • Resource: Astral docs for uv (search "uv astral docs"). Plus the official pyproject.toml reference.
  • Done when: You can uv init a new project, add deps, and run a test in one minute flat.
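A minimal pyproject.toml for such a project looks roughly like this (the project name and pins are illustrative; uv init generates the [project] table and you add the dev tooling):

```toml
[project]
name = "ml-sandbox"            # illustrative name
version = "0.1.0"
requires-python = ">=3.12"
dependencies = ["numpy"]

[dependency-groups]
dev = ["ruff", "mypy", "pytest"]
```

The [dependency-groups] table (PEP 735) keeps lint/type/test tooling out of your runtime dependencies; uv installs both by default in a dev environment.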

Rung 02-NumPy fluency

  • What: Array creation, broadcasting, indexing, slicing, vectorized ops, axis arguments.
  • Why it earns its place: Every ML engineer reads NumPy code daily. PyTorch tensor APIs are NumPy-shaped on purpose.
  • Resource: Python Data Science Handbook (Jake VanderPlas, free online), chapter 2-NumPy. Plus the popular "100 numpy exercises" repo (search "100 numpy exercises").
  • Done when: Without docs you can: create a 5×5 random array, normalize each row, compute pairwise Euclidean distances between rows of two arrays.
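The "done when" exercise fits in a few lines; a sketch of one way to do it, with the pairwise-distance trick spelled out in broadcasting terms:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 5))

# Normalize each row to unit L2 norm: divide by a (5, 1) column of norms.
A_unit = A / np.linalg.norm(A, axis=1, keepdims=True)

# Pairwise Euclidean distances between rows of X (m, d) and Y (n, d):
# (m, 1, d) - (1, n, d) broadcasts to (m, n, d); norm over the last axis.
def pairwise_dist(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)

D = pairwise_dist(A, A)  # shape (5, 5); the diagonal is all zeros
```

The `[:, None, :]` indexing inserts a length-1 axis, which is the standard way to set up a broadcasted "all pairs" computation without loops.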

Rung 03-Broadcasting deeply

  • What: Operations between arrays of different shapes "broadcast" along compatible axes.
  • Why it earns its place: Broadcasting bugs are the #1 silent error in ML code. Either you understand it or you debug forever.
  • Resource: NumPy docs page on broadcasting (search "numpy broadcasting"). Plus implement attention by hand using only broadcasting and matmul.
  • Done when: Given two arrays of shapes (B, N, D) and (D,), you can predict the output shape of their sum without running it.

Rung 04-Pandas for data wrangling

  • What: DataFrames, groupby, merge, melt/pivot, .apply patterns.
  • Why it earns its place: Eval data, training data, evaluation reports-all flow through Pandas (or Polars). You'll use it for analysis even if your training pipeline doesn't.
  • Resource: Python Data Science Handbook chapter 3. Plus the Pandas "10 minutes to pandas" tutorial.
  • Done when: You can read a CSV, filter to a subset, group by a column, compute means, and plot the result.
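The read-filter-group-aggregate loop in miniature; the CSV here is an inline stand-in for a real eval-results file, and the column names are illustrative:

```python
import io
import pandas as pd

# Inline CSV standing in for a file on disk; pd.read_csv("results.csv") is identical.
csv = io.StringIO("model,task,score\na,qa,0.8\na,math,0.4\nb,qa,0.9\nb,math,0.6\n")
df = pd.read_csv(csv)

qa_only = df[df["task"] == "qa"]             # filter to a subset
means = df.groupby("model")["score"].mean()  # one mean score per model
print(means)
# means.plot.bar() renders the chart in a notebook
```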

Rung 05-Polars (the modern alternative)

  • What: Polars is a Rust-backed DataFrame library-much faster than Pandas, with a cleaner expression API.
  • Why it earns its place: New ML codebases often default to Polars now. Learn at least the basics.
  • Resource: Polars official "Getting started" guide (search "polars getting started").
  • Done when: You can do the Pandas exercise in Polars.

Rung 06-Type hints (the 2026 way)

  • What: list[int], dict[str, Any], X | None (preferred over Optional[X]), TypedDict, Protocol, NewType. Plus mypy to enforce.
  • Why it earns its place: Production ML code uses types. Pydantic uses them. LLM library APIs (Anthropic, OpenAI SDKs) use them. Reading typed code is faster than reading untyped.
  • Resource: mypy cheatsheet (search "mypy cheatsheet"). Plus Pydantic docs.
  • Done when: You can type-annotate a non-trivial function and have mypy --strict pass.
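A small sketch of what "typed ML code" looks like in practice, combining TypedDict for records and Protocol for pluggable scorers (all names here are illustrative; this passes mypy --strict):

```python
from typing import Protocol, TypedDict

class ScoreRecord(TypedDict):
    example_id: str
    score: float

class Scorer(Protocol):
    """Anything callable as (prediction, target) -> float satisfies this."""
    def __call__(self, prediction: str, target: str) -> float: ...

def exact_match(prediction: str, target: str) -> float:
    return 1.0 if prediction.strip() == target.strip() else 0.0

def run_eval(pairs: list[tuple[str, str]], scorer: Scorer) -> list[ScoreRecord]:
    return [
        {"example_id": str(i), "score": scorer(p, t)}
        for i, (p, t) in enumerate(pairs)
    ]
```

Protocol gives you structural typing: `exact_match` is accepted as a `Scorer` without inheriting from anything, which is how typed ML codebases keep metrics pluggable.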

Rung 07-Pydantic and dataclasses

  • What: Structured data containers with validation. dataclass for simple records; Pydantic for runtime-validated, JSON-friendly data.
  • Why it earns its place: LLM tool-use, structured outputs, eval datasets, agent state-all are Pydantic models. This is the workhorse of LLM application engineering.
  • Resource: Pydantic v2 docs (search "pydantic docs"). Plus the FastAPI tutorial as a worked example.
  • Done when: You can define a class IncidentReport(BaseModel): ... with nested fields, JSON-serialize it, and use it as the schema for an LLM structured output call.
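A sketch of that "done when" model using Pydantic v2 (the field names are illustrative; `model_json_schema()` produces the JSON Schema you hand to a structured-output API):

```python
from pydantic import BaseModel, Field

class AffectedService(BaseModel):
    name: str
    downtime_minutes: int = Field(ge=0)   # validated at construction time

class IncidentReport(BaseModel):
    title: str
    severity: int = Field(ge=1, le=5)
    services: list[AffectedService]

report = IncidentReport(
    title="API latency spike",
    severity=2,
    services=[AffectedService(name="inference-gateway", downtime_minutes=12)],
)
payload = report.model_dump_json()                   # JSON string for transport/logging
round_trip = IncidentReport.model_validate_json(payload)
schema = IncidentReport.model_json_schema()          # schema for a structured-output call
```

The validate-on-parse step is the point: an LLM's JSON output either becomes a well-typed object or raises a `ValidationError` you can retry on.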

Rung 08-async / await

  • What: Concurrency primitives for I/O-bound tasks. asyncio.gather, async for, aiohttp.
  • Why it earns its place: LLM API calls are I/O-bound and slow. Batching with async is how you keep an evaluation harness fast. Agentic systems are full of concurrent tool calls.
  • Resource: Real Python's "Python Async Features" article (search "real python async features"). Plus the official asyncio docs.
  • Done when: You can write a function that fires 100 LLM calls concurrently with a semaphore for rate limiting.
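The shape of that batching function, with a stand-in for the real SDK call (the `fake_llm_call` here just sleeps; swap in your client's async method):

```python
import asyncio

async def fake_llm_call(prompt: str) -> str:
    # Stand-in for a real SDK call; the sleep simulates network latency.
    await asyncio.sleep(0.01)
    return f"response to {prompt}"

async def batch_calls(prompts: list[str], max_concurrent: int = 10) -> list[str]:
    sem = asyncio.Semaphore(max_concurrent)

    async def limited(prompt: str) -> str:
        async with sem:                      # at most max_concurrent in flight
            return await fake_llm_call(prompt)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(limited(p) for p in prompts))

results = asyncio.run(batch_calls([f"q{i}" for i in range(100)]))
```

The semaphore is what turns "fire 100 requests" into "keep 10 in flight," which is usually the difference between a fast harness and a rate-limit error loop.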

Rung 09-Streaming and generators

  • What: Generators (yield), async generators (async for), iterating over a stream.
  • Why it earns its place: LLM streaming responses are async generators. Token streaming, server-sent events, partial results-all use this pattern.
  • Resource: Fluent Python (Luciano Ramalho) chapters on iteration and generators. Plus Anthropic SDK streaming examples.
  • Done when: You can consume a streamed LLM response and accumulate tokens into a final string.
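The consumption pattern, with a fake token stream standing in for an SDK's streaming response (real clients expose the same `async for` surface):

```python
import asyncio
from collections.abc import AsyncIterator

async def fake_token_stream(text: str) -> AsyncIterator[str]:
    # Stand-in for a streaming LLM response: yields one token at a time.
    for token in text.split():
        await asyncio.sleep(0)
        yield token + " "

async def consume(stream: AsyncIterator[str]) -> str:
    chunks: list[str] = []
    async for token in stream:       # same loop shape as real streaming clients
        chunks.append(token)         # render partial output here in a real app
    return "".join(chunks).strip()

final = asyncio.run(consume(fake_token_stream("the answer is 42")))
```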

Rung 10-Testing and debugging ML code

  • What: pytest patterns, pytest-snapshot for golden tests, pdb/ipdb for debugging, shape assertions.
  • Why it earns its place: ML bugs are silent. Tests that pin shapes and behavior are the only safety net.
  • Resource: pytest official docs. Plus the "Property-based testing with Hypothesis" intro for advanced cases.
  • Done when: You have a test suite for one of your build projects with at least one shape assertion test and one snapshot test.
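What a shape-assertion test looks like in practice, using a small softmax-attention-scores function as the unit under test (illustrative code, pytest-style test functions):

```python
import numpy as np

def attention_scores(q: np.ndarray, k: np.ndarray) -> np.ndarray:
    """Softmax(q @ k^T / sqrt(d)) over the key axis."""
    d = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # numerically stable
    return e / e.sum(axis=-1, keepdims=True)

def test_attention_shape() -> None:
    q = np.zeros((2, 5, 8))                # (batch, queries, dim)
    k = np.zeros((2, 7, 8))                # (batch, keys, dim)
    assert attention_scores(q, k).shape == (2, 5, 7)

def test_attention_rows_sum_to_one() -> None:
    rng = np.random.default_rng(0)
    out = attention_scores(rng.random((1, 4, 8)), rng.random((1, 6, 8)))
    assert np.allclose(out.sum(axis=-1), 1.0)
```

Tests like these catch the classic silent failure: a transposed or mis-broadcast matmul that still runs but computes the wrong thing.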

Minimum required to leave this sequence

  • Spin up a uv-managed project in 60 seconds.
  • NumPy: implement attention from scratch with only broadcasting + matmul.
  • Pandas: read, group, aggregate, plot.
  • Type-hinted Python passing mypy --strict.
  • Pydantic models for an LLM structured output.
  • Async function that batches 100 LLM API calls.

Going further

  • Fluent Python (Ramalho)-read cover to cover when you have time.
  • Effective Python (Brett Slatkin)-short and high-density tips.
  • Astral's uv and ruff blog-keep up with tooling evolution.

How this sequence connects to the year

  • Months 1–3: NumPy and basic Python keep you unblocked while doing the math sequences.
  • Month 4 onwards: Pydantic, async, types, and testing become daily tools.
  • Month 6: async + streaming + Pydantic are the tooling underneath any LLM app and eval harness you build.
