02 - Math you actually need (and what you don't)

What this session is

The math debate. What's hype, what's required, what's gatekeeping, what's optional. With specific resources for each.

The honest answer

You can be a productive AI engineer with high-school math plus a working understanding - not formal mastery - of four topics: linear algebra, calculus (just gradients), probability (just expectations), and basic statistics (just sampling, distributions).

Anyone who tells you you need to grind Strang, Spivak, and Casella before writing PyTorch is wrong, or preparing you for a different job (research, not engineering).

What you need, ranked by ROI

Tier 1: load-bearing

  • Matrix multiplication, dot products. What shape times what shape gives what shape. You'll do this every day.
  • Gradients (one-variable, multi-variable, intuitively). What backprop is updating. Why learning rates matter. Why gradient explosion/vanishing happens.
  • Probability distributions, basic. Normal, uniform, categorical. Sampling vs argmax.
  • Logarithms, exponents, softmax. Why log-likelihood loss looks weird. Why softmax over logits.
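All four Tier 1 items fit in a few lines of NumPy. This is a hedged sketch (the shapes and class count are made up for illustration): the matmul shape rule, softmax over logits, sampling vs argmax, and why the loss is a negated log.

```python
import numpy as np

# Shape rule for matrix multiplication: (m, k) @ (k, n) -> (m, n).
x = np.random.randn(4, 8)      # batch of 4 vectors, dimension 8
W = np.random.randn(8, 3)      # projects dimension 8 -> 3 classes
logits = x @ W                 # shape (4, 3)

# Softmax turns each row of logits into a probability distribution.
# Subtracting the row max first avoids overflow in exp (standard trick).
def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

probs = softmax(logits)
assert probs.shape == (4, 3)
assert np.allclose(probs.sum(axis=-1), 1.0)  # each row sums to 1

# Argmax always picks the single most likely class; sampling draws
# from the distribution, so lower-probability classes still appear.
greedy = probs.argmax(axis=-1)            # deterministic
sampled = np.random.choice(3, p=probs[0]) # stochastic

# Log-likelihood loss "looks weird" because log of a probability in
# (0, 1] is <= 0, so we negate it: the loss is -log p(correct class).
nll = -np.log(probs[0, greedy[0]])
```

If you can predict every shape in that snippet before running it, you have the Tier 1 linear algebra you need.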

Tier 2: helpful

  • Eigenvalues / SVD, conceptually. Used in PCA, embeddings, attention analysis. You don't need to compute by hand.
  • Information theory basics. Entropy, cross-entropy, KL divergence. Shows up in loss functions and evaluation.
  • Basic statistics. Variance, expectation, central limit theorem. For understanding evaluation noise.
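The information theory items are one identity away from each other. A minimal sketch with two made-up distributions, showing why minimizing cross-entropy loss is the same as minimizing KL divergence to the data:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # "true" distribution (e.g., labels)
q = np.array([0.5, 0.3, 0.2])  # model's predicted distribution

entropy_p = -np.sum(p * np.log(p))      # uncertainty inherent in p
cross_entropy = -np.sum(p * np.log(q))  # cost of encoding p using q
kl = np.sum(p * np.log(p / q))          # extra cost from using q

# The identity H(p, q) = H(p) + KL(p || q): since H(p) is fixed by
# the data, minimizing cross-entropy minimizes the KL divergence.
assert np.isclose(cross_entropy, entropy_p + kl)
assert kl >= 0.0  # KL is never negative; zero only when p == q
```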

Tier 3: nice-to-have

  • Convex optimization. Useful when reading papers. Not blocking.
  • Measure-theoretic probability. Required for research; not for engineering.
  • Tensor calculus / differential geometry. Required for very specific specializations (e.g., diffusion models theory).

What you don't need

  • Rigorous epsilon-delta calculus proofs.
  • Real analysis.
  • PDEs from scratch.
  • Group theory.
  • The full Strang course unless you enjoy it.

What "working understanding" means

You can:

  • Read a paper's equations and parse the shape of what's happening, even if you couldn't reproduce the derivation.
  • Know when a derivative would be near zero (saturating activations, etc).
  • Sanity-check that a probability distribution sums to one.
  • Compute matrix shapes without paper.
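The "near-zero derivative" point above is concrete and checkable. A small sketch with sigmoid (the classic saturating activation) showing the vanishing-gradient behavior you should learn to spot by eye:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# At x = 0 the gradient peaks at 0.25; by x = 10 it has collapsed
# to ~4.5e-5. Neurons stuck in the flat tails barely learn.
print(sigmoid_grad(0.0))   # 0.25
print(sigmoid_grad(10.0))  # ~4.5e-5, effectively dead
```

That level of intuition, knowing where a curve goes flat without deriving anything, is what "working understanding" buys you day to day.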

You can't (and don't need to):

  • Derive backprop on a paper napkin.
  • Prove convergence properties.
  • Write your own optimizer from scratch.

Resources, ranked

In order, by efficiency for the engineering track:

  1. 3Blue1Brown's "Essence of Linear Algebra" + "Essence of Calculus" YouTube series. ~6 hours total. Best ROI on this list. Watch even if you "know" linear algebra.
  2. fast.ai's "Practical Deep Learning" course. Math taught just in time, alongside code. Many people learn the math better here than in standalone math courses, because it's grounded.
  3. MIT 18.06 (Strang) Linear Algebra lectures. If you want depth. Watch at 1.5x. Skip the homework.
  4. MIT 6.041 Probability (Tsitsiklis). Same deal. Watch the first 8 lectures, skip the rest unless interested.
  5. The Deep Learning Book (Goodfellow et al), chapters 2-4. Free online. The "math you need" chapters. Skim, don't grind.

What I'd skip

  • Long-form Coursera ML math specializations. Slow, repetitive, will-sapping.
  • "Mathematics for Machine Learning" book. Fine but encyclopedic; you'll bog down.
  • Khan Academy linear algebra. Too elementary; you'll be bored.

The pragmatic plan

Weeks 1-2: 3Blue1Brown linear algebra + calculus. 6 hours total.

Weeks 3-4: First 4 chapters of the Deep Learning book. Skim, take notes on confusion points, move on.

Ongoing: When a paper or library confuses you, look up the one math concept you need. Wikipedia is fine. Don't preemptively learn things.

That's it. You can always come back. Math-first is a trap that costs people 3-6 months.

What you might wonder

"But I keep hearing AI is 'all math.'" By people who do research. Engineers use frameworks that abstract the math. The math you need to understand is to debug what your model is doing, not to derive new architectures.

"What if I want to do research?" Different career. Different roadmap. Read Picking a specialization. Research roles usually require a PhD or equivalent published work.

"What if I'm bad at math?" "Bad at math" usually means "didn't have a good teacher" or "stopped before things got interesting." 3Blue1Brown will likely change your relationship with the material. Try it before deciding.

Done

  • Know what's required, helpful, optional, and gatekeeping.
  • Have a specific 4-week plan.
  • Are not going to do a 6-month math detour.

Next: The Python + Linux baseline →