# 06 – Classical ML
## Why this matters in the journey
You can do modern AI without classical ML, but you can't do it well. The discipline of train/val/test splits, baseline-first thinking, regularization, the bias-variance tradeoff, ROC curves, and feature engineering shows up again in eval design, fine-tuning data curation, and reading the literature. Skipping classical ML is the most common reason "AI engineers" plateau as prompt tinkerers.
## The rungs
### Rung 01 – Supervised learning framing
- What: Given `(x, y)` pairs, learn `f(x) → y`. Classification vs regression. Train/val/test split.
- Why it earns its place: This is the framing under every model, including LLMs (next-token prediction is supervised learning on shifted text).
- Resource: Andrew Ng's ML Specialization Course 1 (Coursera) or fast.ai lessons 1–2.
- Done when: You can explain the difference between train, val, and test, and why each exists.
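The three-way split can be sketched in a few lines of NumPy (sizes and seed here are illustrative, not prescriptive):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # 100 examples, 3 features
y = rng.integers(0, 2, size=100)     # binary labels

# Shuffle once, then carve out 70% train, 15% val, 15% test.
idx = rng.permutation(len(X))
train, val, test = idx[:70], idx[70:85], idx[85:]

X_train, y_train = X[train], y[train]   # fit parameters here
X_val, y_val = X[val], y[val]           # tune hyperparameters here
X_test, y_test = X[test], y[test]       # touch only once, at the very end
```

The point of the third split is that anything you tune against leaks into your estimate; the test set stays untouched so its score is an honest forecast.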
### Rung 02 – Linear regression
- What: Fit a line (or hyperplane) `y = wᵀx + b` minimizing mean squared error.
- Why it earns its place: The simplest possible model; everything else is a generalization of it. It's also the perfect exercise for tying together calculus, linear algebra, and code.
- Resource: Implement from scratch in NumPy. Reference: Andrew Ng course 1, week 1.
- Done when: You can derive the closed-form solution `w = (XᵀX)⁻¹Xᵀy` AND implement gradient-descent linear regression.
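Both halves of that "done when" can be checked against each other; a minimal NumPy sketch (synthetic data and hyperparameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0]) + 0.5 + 0.01 * rng.normal(size=200)

# Absorb the bias b into w by appending a column of ones.
Xb = np.hstack([X, np.ones((200, 1))])

# Closed form: w = (XᵀX)⁻¹Xᵀy  (solve() is more stable than an explicit inverse)
w_closed = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Gradient descent on MSE: gradient = (2/n) Xᵀ(Xw − y)
w = np.zeros(3)
for _ in range(2000):
    w -= 0.1 * (2 / len(y)) * Xb.T @ (Xb @ w - y)
```

If the two solutions don't agree to several decimal places, one of your derivations is wrong.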
### Rung 03 – Logistic regression and the sigmoid
- What: A linear model squashed through a sigmoid for binary classification, trained with cross-entropy.
- Why it earns its place: First encounter with cross-entropy as MLE. The "neuron" you'll see in NN intros is a logistic regression.
- Resource: Andrew Ng course 1, week 3. Implement from scratch.
- Done when: You can derive the gradient of the binary cross-entropy w.r.t. weights.
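A reliable way to confirm your derivation is a finite-difference check of the analytic gradient; a minimal sketch with illustrative synthetic data:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(w, X, y):
    # Mean binary cross-entropy
    p = sigmoid(X @ w)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(w, X, y):
    # Analytic gradient of mean BCE: Xᵀ(σ(Xw) − y) / n
    return X.T @ (sigmoid(X @ w) - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 2, size=50).astype(float)
w = rng.normal(size=3)

# Central finite differences along each coordinate axis
eps = 1e-6
num = np.array([(bce(w + eps * e, X, y) - bce(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(3)])
```

The same gradient-checking habit pays off later when you hand-derive backprop.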
### Rung 04 – Softmax regression (multiclass)
- What: Logistic regression generalized to K classes. Softmax converts logits to a probability distribution.
- Why it earns its place: Softmax is the output of every transformer LM. Cross-entropy + softmax is the LLM training objective.
- Resource: Andrew Ng + the "softmax + cross-entropy" derivation in Goodfellow chapter 6.
- Done when: You can implement softmax + cross-entropy in NumPy and explain why their combined gradient is `softmax(z) − y`.
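A quick numerical check of that combined gradient, with illustrative values:

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(z, y_onehot):
    # CE of the softmax distribution against a one-hot target
    return -np.sum(y_onehot * np.log(softmax(z)))

z = np.array([2.0, 1.0, -1.0])   # logits
y = np.array([0.0, 1.0, 0.0])    # true class is index 1

analytic = softmax(z) - y        # the combined gradient w.r.t. the logits

# Central finite differences as an independent check
eps = 1e-6
numeric = np.array([(cross_entropy(z + eps * e, y) - cross_entropy(z - eps * e, y)) / (2 * eps)
                    for e in np.eye(3)])
```

The simplicity of `softmax(z) − y` is why the two operations are always fused in practice.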
### Rung 05 – Bias-variance tradeoff and overfitting
- What: A model can fail by being too simple (high bias) or too complex (high variance). Train vs val gap diagnoses which.
- Why it earns its place: Diagnosing training is 80% of ML practice. Eval gap analysis is the same skill applied to LLM evals.
- Resource: Andrew Ng's ML diagnostic lectures. Plus The Hundred-Page Machine Learning Book (Burkov) chapter 5.
- Done when: Given a learning curve, you can diagnose under- vs over-fitting.
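A toy version of the diagnosis, assuming polynomial regression on noisy data (degrees, sample sizes, and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + 0.1 * rng.normal(size=40)
x_tr, y_tr, x_va, y_va = x[:30], y[:30], x[30:], y[30:]

def train_val_mse(deg):
    # Least-squares polynomial fit of the given degree on the training split
    p = np.polynomial.Polynomial.fit(x_tr, y_tr, deg)
    return np.mean((p(x_tr) - y_tr) ** 2), np.mean((p(x_va) - y_va) ** 2)

tr1, va1 = train_val_mse(1)     # underfit: both errors high, small gap
tr15, va15 = train_val_mse(15)  # overfit: tiny train error, larger val error
```

High bias shows up as both errors plateauing high; high variance shows up as the train/val gap opening wide.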
### Rung 06 – Regularization (L1, L2, dropout, early stopping)
- What: Techniques that limit effective model capacity to prevent overfitting: penalties on weights (L1/L2), randomly zeroing activations (dropout), and halting when validation loss stops improving (early stopping).
- Why it earns its place: Every transformer training run uses weight decay (equivalent to L2 under plain SGD, a decoupled cousin in AdamW). Dropout is in many architectures. Early stopping is a default discipline.
- Resource: Deep Learning (Goodfellow) chapter 7.
- Done when: You can explain why L2 regularization is equivalent to a Gaussian prior on weights.
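The Gaussian-prior equivalence in that "done when" can be sketched as a MAP derivation:

```latex
% MAP estimation with a Gaussian prior w ~ N(0, sigma^2 I)
\hat{w}_{\mathrm{MAP}}
  = \arg\max_w \; \log p(\mathcal{D} \mid w) + \log p(w)
  = \arg\min_w \; \mathrm{NLL}(w) + \frac{1}{2\sigma^2} \lVert w \rVert_2^2 + \text{const}
```

So minimizing NLL plus an L2 penalty with λ = 1/(2σ²) is exactly MAP under that prior: stronger regularization corresponds to a tighter prior around zero.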
### Rung 07 – Decision trees, random forests, gradient boosting
- What: Tree-based models, frequently still state of the art on tabular data.
- Why it earns its place: You'll be tempted to use neural nets for everything. Knowing when XGBoost is the right answer is a maturity marker.
- Resource: Andrew Ng course 2 + the XGBoost paper (arxiv.org/abs/1603.02754) skim.
- Done when: You can train an XGBoost model on a tabular dataset and beat a simple neural net on it.
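A minimal sketch of the workflow, using scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (same gradient-boosted-trees family, no extra install; dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for a real dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingClassifier(n_estimators=200, max_depth=3,
                                   learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)
acc = model.score(X_te, y_te)
```

Swapping in `xgboost.XGBClassifier` is a near drop-in change once you're comparing against a neural baseline.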
### Rung 08 – Evaluation metrics
- What: Accuracy, precision, recall, F1, AUC, log loss, calibration. Each measures something different.
- Why it earns its place: Picking the wrong metric is how teams optimize for the wrong thing. LLM evals are metric design problems in disguise.
- Resource: scikit-learn metrics docs. Plus The Hundred-Page ML Book eval chapter.
- Done when: Given a class-imbalanced problem, you can choose an appropriate metric and justify it.
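The classic imbalanced-data trap, sketched with scikit-learn's metric functions (the 95/5 class ratio is illustrative):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# 95 negatives, 5 positives; a degenerate "always predict negative" model
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.zeros(100, dtype=int)

acc = accuracy_score(y_true, y_pred)                 # 0.95: looks great
rec = recall_score(y_true, y_pred, zero_division=0)  # 0.0: finds no positives
f1 = f1_score(y_true, y_pred, zero_division=0)       # 0.0: useless model exposed
```

Accuracy rewards the majority class; recall and F1 immediately expose that the model never catches a positive.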
### Rung 09 – Cross-validation and statistical comparison
- What: k-fold CV for stable estimates. Paired t-tests or bootstrap to compare models meaningfully.
- Why it earns its place: Comparing two LLM prompts on 50 examples and declaring a winner is how teams ship noise. CV discipline transfers directly.
- Resource: scikit-learn cross-validation docs. Plus Sebastian Raschka's "Model Evaluation, Model Selection, and Algorithm Selection" paper (arxiv.org/abs/1811.12808).
- Done when: You can defend a model comparison with proper variance estimates.
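A sketch of the matched-folds comparison, assuming scikit-learn and SciPy (the two models and fold count are illustrative):

```python
from scipy import stats
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # identical folds for both models

scores_a = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
scores_b = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

# Paired t-test over per-fold score differences (folds are matched, hence "paired")
t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
```

Reusing the same folds for both models is what makes the test paired; comparing means from unmatched folds throws away variance information.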
### Rung 10 – Feature engineering and data quality
- What: Cleaning, normalizing, encoding categoricals, dealing with missing values, leakage.
- Why it earns its place: "Better data > better model" applies at every level, including LLM fine-tuning datasets and RAG corpora.
- Resource: Kaggle's "Intermediate Machine Learning" course. Plus the AI Engineering (Huyen) chapter on data.
- Done when: You can identify a data leakage bug in a contrived dataset.
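A contrived leakage bug of exactly the kind that "done when" asks for, sketched on an illustrative synthetic dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 5))
y = (X[:, 0] + rng.normal(scale=2.0, size=n) > 0).astype(int)  # genuinely noisy target

# Leaky feature: computed FROM the label, e.g. a post-outcome column
# that accidentally slipped into the training table.
leak = y + 0.01 * rng.normal(size=n)
X_leaky = np.hstack([X, leak[:, None]])

def holdout_acc(features):
    X_tr, X_te, y_tr, y_te = train_test_split(features, y, random_state=0)
    return LogisticRegression(max_iter=1000).fit(X_tr, y_tr).score(X_te, y_te)

honest = holdout_acc(X)        # modest accuracy: the task is genuinely hard
leaky = holdout_acc(X_leaky)   # near-perfect: a red flag, not a breakthrough
```

The diagnostic instinct to build: a score that looks too good usually means a feature is carrying information that won't exist at prediction time.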
## Minimum required to leave this sequence
- Implement linear, logistic, and softmax regression from scratch.
- Diagnose under- vs overfitting from a learning curve.
- Train an XGBoost model on a tabular dataset.
- Pick a metric appropriate to a problem and defend the choice.
- Set up k-fold CV with a statistical comparison.
## Going further
- The Elements of Statistical Learning (Hastie, Tibshirani, Friedman; free PDF): the canonical reference. Hard but rewarding.
- Pattern Recognition and Machine Learning (Bishop): older but still gold.
- Hands-On Machine Learning (Géron): practical, sklearn + Keras.
## How this sequence connects to the year
- Month 2: rungs 01–06 are the build-from-scratch month.
- Month 6: rungs 08–09 are the foundation for LLM eval rigor.
- Month 8: rung 10 (data quality) is the make-or-break of fine-tuning.