
Deep Dive 14-Future-Proofing the Curriculum: A Durability Audit

"The first principle of any field with a half-life shorter than your career is to invest most of your time in the things that don't have a half-life."

This chapter is the operating manual for the next 3-5 years of this curriculum. The previous thirteen deep dives, and the seventeen sequences they sit underneath, are a snapshot of what an applied AI engineer should know in 2026. Snapshots age. The job is to keep the curriculum-and the career it produces-current without rewriting it from scratch every six months when a new framework or model family ships.

The structure of this chapter:

  1. A durability framework for tagging every learning artifact with an expected half-life.
  2. A per-sequence audit applying that framework to all 17 sequences.
  3. Refresh cadences-daily, weekly, monthly, quarterly, yearly procedures.
  4. Tripwires that signal the curriculum is broken and needs structural change.
  5. Field-velocity sources to track, named with durable selection criteria.
  6. Milestones at 6, 12, and 24 months, plus pivot signals.
  7. Spine investments that survive pivots, and ephemeral investments that don't.
  8. Cross-curriculum integration with the surrounding stack (Rust, Go, Linux, containers, Kubernetes, AI systems).
  9. Multi-year scenarios (2027, 2028, 2029) and how to react to each.
  10. An annual audit checklist, the honest meta-question, anti-patterns, the reciprocal practice of feeding learnings back into the curriculum, and yearly exercises.

This chapter is shorter than the technical deep dives. It is also, in expectation, the highest leverage one. Get the durability instinct right and the rest of the curriculum compounds. Get it wrong and you spend the next three years re-learning LangChain APIs.


1. The Durability Framework: Three Tiers

Every concept, technique, tool, vendor, paper, and link in this curriculum has a half-life. Estimate it. Tag it. Spend study time accordingly.

1.1 The three tiers

| Tier | Half-life | Examples | Refresh cost when stale |
|---|---|---|---|
| Spine | 10+ years | Linear algebra, calculus, probability, statistical evaluation discipline, distributed-systems thinking, transformer fundamentals (self-attention, residual streams), backprop, the bias-variance decomposition | Negligible-once internalized, stays. |
| Stable | 4-7 years | Specific architectures (transformer block, RoPE, GQA, MoE), specific algorithms (BM25, FlashAttention v1/v2/v3, DPO, LoRA), evaluation paradigms (LLM-as-judge, contrastive eval), serving primitives (paged attention, continuous batching), OpenTelemetry semantics | A focused weekend; structures and theorems carry over, only details rotate. |
| Ephemeral | 1-3 years | Framework versions (LangChain v0.3, DSPy v2.x, vLLM 0.6 vs 0.7), vendor pricing, model names (Claude 3.x, GPT-4.x, Llama 3.x), specific tool-use JSON formats, current SOTA benchmark scores, specific dashboards in Datadog/Grafana | Continuous-measured in days/weeks, not months. |

Half-lives are estimates, not commitments. Think of them as decay rates: spine knowledge loses 7% of its relevance per decade; ephemeral knowledge loses 50% in eighteen months. Tag accordingly.

1.2 Why tag at all?

Three reasons.

  • Allocation. Without tags you allocate study time uniformly. With tags you can deliberately push spine to ≥60% of total study time, stable to ~25%, ephemeral to ≤15%. The 60/25/15 split is not magic-it is a heuristic that pushes back against the gravitational pull of the news cycle, which always wants you to spend 90% of your time on ephemeral things.
  • Refresh efficiency. When you sit down for a quarterly audit, you want to re-read the ephemeral sections, not the linear algebra. Tags tell you what to skim, what to refresh, what to leave alone.
  • Honest inventory. When you say "I know transformers," you should be able to say what part you know is spine (the math), what is stable (the specific block design), and what is ephemeral (the variant a particular lab shipped last month). That precision survives interviews.

1.3 How to tag (operational)

In every sequence file, every "Going further" link, every code snippet, every named tool-annotate with [Spine], [Stable], or [Ephemeral]. When in doubt, downgrade-assuming a thing is ephemeral when it might be stable costs you a re-read; assuming it is spine when it is ephemeral costs you a wrong mental model.

Worked examples:

  • "The chain rule"-[Spine]. Was true for Newton, will be true forever.
  • "Residual connections"-[Spine]. As long as we use deep nets in any form, residuals are foundational.
  • "Mixture-of-Experts routing with top-2 gating"-[Stable]. The pattern endures; specific routing schemes will iterate.
  • "DPO loss formulation"-[Stable]. The math is durable; specific implementations and hyperparameters will iterate.
  • "LangChain RunnableSequence"-[Ephemeral]. Could be deprecated this year.
  • "Claude 3.7 Sonnet's tool-use JSON"-[Ephemeral]. Vendor-specific, version-specific.
  • "OpenTelemetry GenAI semantic conventions"-[Ephemeral] while the spec is in draft; promote to [Stable] once it stabilizes.

The tagging itself takes about 10-15 minutes per sequence. Do it once, refresh during audits.
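
A minimal sketch of what "operational" can look like: a script that scans sequence files for the three tags and reports the observed mix against the 60/25/15 target from §1.2. The directory layout, file names, and the idea of using tag counts as a rough proxy for study time are assumptions, not part of the curriculum.

```python
# Minimal sketch (layout and file names are assumptions): count durability tags
# across sequence files and compare the mix to the 60/25/15 target from §1.2.
# Tag counts are a crude proxy for study time; treat the output as a prompt, not a verdict.
import re
from collections import Counter
from pathlib import Path

TAG_RE = re.compile(r"\[(Spine|Stable|Ephemeral)\]")
TARGET = {"Spine": 0.60, "Stable": 0.25, "Ephemeral": 0.15}

counts = Counter()
for path in Path("sequences").rglob("*.md"):          # hypothetical layout
    counts.update(TAG_RE.findall(path.read_text(encoding="utf-8")))

total = sum(counts.values()) or 1
for tier, target in TARGET.items():
    print(f"{tier:9s} {counts[tier] / total:6.1%}  (target {target:.0%})")
```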


2. Per-Sequence Durability Audit

Applying the framework to all 17 sequences. For each: spine content, stable content, ephemeral content, and refresh cadence.

| # | Sequence | Dominant tier | Refresh cadence | Notes |
|---|---|---|---|---|
| 01 | Linear Algebra | Spine | Never (review) | Re-derive, don't re-learn. |
| 02 | Calculus & Optimization | Spine | Never (review) | Same. |
| 03 | Probability & Statistics | Spine | Never (review) | Same. |
| 04 | Python for ML | Stable | Annual | Python evolves slowly; toolchain rotates. |
| 05 | PyTorch | Stable + Ephemeral edges | Annual | Compile, FSDP, dtypes shift. |
| 06 | Classical ML | Spine | Never (review) | Tree boosting, regularization, calibration. |
| 07 | Deep Learning Fundamentals | Spine + Stable | Biennial | Backprop spine; norm/init details stable. |
| 08 | Transformers | Stable + Ephemeral | Annual | Architecture stable; variants rotate. |
| 09 | LLM App Engineering | Ephemeral dominant | Quarterly | Most volatile sequence in the curriculum. |
| 10 | RAG | Stable + Ephemeral | Quarterly | Retrieval theory stable; tools rotate. |
| 11 | Agents | Stable + Ephemeral | Quarterly | Patterns stable; frameworks volatile. |
| 12 | Evaluation Systems | Spine + Ephemeral | Semi-annual | Statistics spine; tools rotate. |
| 13 | LLM Observability | Stable + Ephemeral | Semi-annual | OTel semantics stabilizing; vendors rotate. |
| 14 | Inference & Serving | Stable + Ephemeral | Semi-annual | Algorithms stable; runtimes rotate. |
| 15 | Fine-tuning | Stable + Ephemeral | Semi-annual | LoRA/DPO stable; TRL APIs rotate. |
| 16 | Distributed Training | Stable + Ephemeral | Annual | Math stable; FSDP/DeepSpeed APIs rotate. |
| 17 | Capstone & Career | Mixed | Annual | Job market data + personal artifacts. |

2.1 Sequences 01-03: Mathematical Spine (Linear Algebra, Calculus, Probability)

  • Spine: vector spaces, eigendecomposition, SVD, gradients, chain rule, conditional probability, expectation, hypothesis testing, Bayes' rule, the central limit theorem.
  • Stable: numerical methods (LU, QR), specific optimizer algorithms (Adam, Lion).
  • Ephemeral: practically nothing.
  • Refresh procedure: don't refresh, exercise. Annually solve 5 problems from a graduate-level text (Boyd's Convex Optimization, Bishop's Pattern Recognition and Machine Learning, MacKay's Information Theory, Inference, and Learning Algorithms). If you cannot, you have spine erosion and need a 2-week deep refresh.
  • What can never break: the math. If a paper from 2032 confuses you, the entry point is always re-deriving the relevant chunk on paper.

2.2 Sequence 04: Python for ML

  • Spine: language design instincts (mutability, scope, references), data-structure complexity intuition.
  • Stable: NumPy, pandas, type hints, packaging conventions (pyproject.toml, virtual environments).
  • Ephemeral: specific Polars/PyArrow versions, current uv/poetry/rye state, the specific pre-commit toolchain.
  • Refresh procedure: annually skim the Python release notes for the past year. Replace any deprecated idioms in your sequence examples. Re-evaluate the package manager (uv replaced poetry/pip-tools for many; whatever replaces uv will arrive).
  • Tripwire: if pyproject.toml examples in your sequence break in a clean install, the toolchain has rotated.

2.3 Sequence 05: PyTorch

  • Spine: tensor abstraction, autograd as reverse-mode AD, the computational graph mental model.
  • Stable: nn.Module, dataloaders, distributed primitives, dtype selection (fp32/fp16/bf16/fp8).
  • Ephemeral: torch.compile semantics, FSDP-2 vs FSDP-1, the current SDPA backend selection logic, specific CUDA/ROCm/MPS quirks.
  • Refresh procedure: annual. Read the past year's PyTorch release notes (1-2 hours). Re-run your sequence's notebooks against the latest stable version; fix breakages; tag what changed.
  • Tripwire: torch.compile semantics shift meaningfully roughly every release. If your fine-tuning sequence uses torch.compile and the example diverges, refresh.

2.4 Sequence 06: Classical ML

  • Spine: bias-variance, regularization, cross-validation, calibration, the supervised-unsupervised-RL trichotomy, the fundamental theorem of statistical learning.
  • Stable: gradient boosting (XGBoost, LightGBM, CatBoost), SVMs, random forests, k-means, GMMs, linear/logistic regression closed forms.
  • Ephemeral: specific scikit-learn API tweaks (which are rare and deprecated with ample warning).
  • Refresh procedure: don't. Classical ML is the most stable sequence after the math sequences. The only thing that rotates is whether new tree-boosting libraries appear; check annually.
  • Why this matters for 2026-2030: when LLM systems fail and you fall back on classifiers/embeddings/calibrators, this is the layer you reach for. It is the unglamorous load-bearing layer of applied AI.

2.5 Sequence 07: Deep Learning Fundamentals

  • Spine: backpropagation, gradient descent, the universal approximation theorem (intuition), loss landscapes, overfitting, the vanishing/exploding gradient story.
  • Stable: Adam/AdamW optimizer math, LayerNorm/RMSNorm/BatchNorm, dropout, weight initialization (Kaiming, Xavier), residual connections.
  • Ephemeral: which specific norm-init combinations are en vogue this year.
  • Refresh procedure: biennial. Once every two years, re-read a contemporary deep-learning textbook chapter on optimization to catch incremental improvements (e.g., Lion, Sophia, Muon-style optimizers). Update the sequence's optimizer section if the new method is broadly adopted.
  • Tripwire: if a foundation-model lab paper says "we trained with optimizer X" and you've never heard of X, refresh.

2.6 Sequence 08: Transformers

  • Spine: self-attention as differentiable retrieval, the residual stream, autoregressive language modeling, the Q/K/V abstraction, scaling laws (Chinchilla-style intuition), positional information as a design choice.
  • Stable: GQA/MQA, RoPE, FlashAttention v1/v2/v3, MoE basics, KV-cache mechanics, sliding-window attention.
  • Ephemeral: latest variants (whatever new positional encoding or attention variant a frontier lab ships this quarter), specific architectural deltas in the current Llama/Mistral/Qwen/Claude/GPT family.
  • Refresh procedure: annual. Read 3-5 new architecture papers from the past year; if any pattern is broadly adopted by 2+ frontier labs, add a section. State-space and hybrid models (Mamba, Jamba family, RWKV, hybrid SSM-attention) are worth tracking even if you do not adopt them yet.
  • Tripwire: a single non-attention architecture (Mamba-class, diffusion-LM-class) becomes the dominant choice for one of the major frontier labs. Then you rewrite a third of this sequence.

2.7 Sequence 09: LLM App Engineering

This is the most volatile sequence in the curriculum. Treat it accordingly.

  • Spine: the prompt/context/response abstraction, the retrieval-augmentation principle, separation-of-concerns between prompt, context, and program logic.
  • Stable: structured outputs (JSON schemas, constrained decoding), tool use as a function-calling pattern, prompt caching as a cost lever, the system/user/assistant role model.
  • Ephemeral: LangChain/LlamaIndex/DSPy/Haystack APIs, specific Anthropic/OpenAI/Google API shapes, model-specific prompt idioms ("think step by step" vs reasoning models), pricing.
  • Refresh procedure: quarterly. 90 minutes per quarter. Run the sequence's example projects against current SDKs; fix breakages; replace dead code paths; update model-name placeholders to neutral aliases (e.g., MODEL_FAST, MODEL_BIG) referenced in a single config table updated separately.
  • Defensive design: keep the concepts in the sequence, push vendor SDK code into a small set of adapter files. When the vendor changes, you replace the adapter, not the lesson (see the sketch after this list).
  • Tripwire: more than 30% of the sequence's code examples fail on a clean install. Triggers full rewrite of the example projects.
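
A minimal sketch of the adapter idea, assuming nothing about any real SDK: the aliases, the Completion shape, and the complete() signature are illustrative, and the actual vendor call is left as the single point to rewrite when the provider changes.

```python
# Minimal adapter-pattern sketch; every name here is illustrative and no real
# vendor SDK is referenced. Lesson code imports `complete`, never the SDK.
from dataclasses import dataclass

MODEL_ALIASES = {   # the single config table the lessons reference
    "MODEL_FAST": "provider/cheap-model-of-the-quarter",
    "MODEL_BIG": "provider/frontier-model-of-the-quarter",
}

@dataclass
class Completion:
    text: str
    input_tokens: int
    output_tokens: int

def complete(alias: str, system: str, user: str) -> Completion:
    """The only function that touches the vendor SDK."""
    model = MODEL_ALIASES[alias]
    # The current provider's client call goes here; when the vendor or SDK
    # changes, this body changes and nothing else does.
    raise NotImplementedError(f"wire the current SDK to {model}")
```

When a quarterly audit finds breakage, the blast radius is this one file.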

2.8 Sequence 10: RAG

  • Spine: the retrieval/generation decomposition, recall vs precision tradeoffs, the role of evaluation in retrieval pipelines.
  • Stable: BM25, dense embeddings, hybrid retrieval, reranking (cross-encoders), chunking strategies, query rewriting, eval metrics (NDCG, MRR, recall@k), the failure modes taxonomy (missing retrieval, distracting retrieval, conflicting retrieval).
  • Ephemeral: specific embedding models (today: cohere-embed-v3, voyage-3, openai-text-embedding-3-large; tomorrow: something else), specific vector DBs (pgvector, Qdrant, Weaviate, LanceDB, Chroma-the menu rotates), specific reranker models.
  • Refresh procedure: quarterly. Re-run your eval harness against current embedding models; replace the named-defaults in code with current best-of-class; keep the evaluation methodology untouched.
  • Defensive design: the eval harness is the spine of this sequence. As long as the harness runs, the lesson survives any embedding-model or vector-DB churn (a sketch follows).
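
A minimal harness sketch, assuming a hand-built golden set and an injected retrieve function; both names are illustrative. The metric and the golden set are the durable parts; retrieve() is whatever embedding model and vector store are current this quarter.

```python
# Minimal recall@k harness sketch. GOLDEN_SET and retrieve() are illustrative;
# only the injected retriever changes when embedding models or vector DBs rotate.
from typing import Callable

GOLDEN_SET = [  # query -> ids of the documents that should come back
    {"query": "how do I rotate an API key?", "relevant": {"doc_017", "doc_042"}},
    {"query": "what is our refund window?", "relevant": {"doc_003"}},
]

def recall_at_k(retrieve: Callable[[str, int], list[str]], k: int = 5) -> float:
    hits = total = 0
    for case in GOLDEN_SET:
        retrieved = set(retrieve(case["query"], k))
        hits += len(retrieved & case["relevant"])
        total += len(case["relevant"])
    return hits / total
```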

2.9 Sequence 11: Agents

  • Spine: control flow as a design surface, the loop-with-tools mental model, the planner/executor decomposition, the failure-mode taxonomy (loops, drift, hallucinated tool calls).
  • Stable: ReAct, Reflexion, plan-and-execute, tool-use evaluation, the multi-agent communication patterns (orchestrator-worker, debate, consensus), state-machine-shaped agent design, the cost/latency/reliability tradeoff curve.
  • Ephemeral: specific frameworks (LangGraph, CrewAI, AutoGen, Letta, Pydantic AI, smolagents, OpenAI Agents SDK, Anthropic computer-use), specific protocol versions (MCP versioning), specific prompt idioms for tool use.
  • Refresh procedure: quarterly. The frameworks rotate fast; the patterns do not. Pick one framework as the worked example, but write the concepts framework-agnostically (see the loop sketch after this list).
  • Tripwire: if the dominant frontier model ships native, opinionated agent infrastructure (memory, tool registry, planning) that absorbs 60%+ of what the sequence teaches as DIY, narrow this sequence to integration rather than DIY construction.
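
A framework-agnostic sketch of the loop-with-tools model named in the spine bullet. The action format and call_model are assumptions; the durable content is the bounded loop, the explicit tool registry, and failing closed.

```python
# Framework-agnostic agent loop sketch. The action shape and call_model are
# assumptions; the control flow (hard step cap, explicit registry, fail closed)
# is the durable pattern.
MAX_STEPS = 8

def run_agent(task: str, call_model, tools: dict) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(MAX_STEPS):                       # hard cap guards against loops
        action = call_model(history)                 # {"tool":..., "args":...} or {"final":...}
        if "final" in action:
            return action["final"]
        tool = tools.get(action["tool"])
        if tool is None:                             # hallucinated tool call: surface it
            history.append({"role": "system", "content": f"unknown tool: {action['tool']}"})
            continue
        history.append({"role": "tool", "content": str(tool(**action["args"]))})
    return "stopped: step budget exhausted"          # fail closed, not silently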

2.10 Sequence 12: Evaluation Systems

  • Spine: experiment design (treatment vs control, randomization), statistical significance, confidence intervals, sample-size estimation, the test-set hygiene principles, the data-leakage taxonomy.
  • Stable: LLM-as-judge methodology and its known biases (position, verbosity, self-preference), pairwise vs absolute scoring, golden-set construction, online vs offline eval, regression eval, slice-based eval, cost-aware eval.
  • Ephemeral: specific eval tooling (Inspect, Promptfoo, Braintrust, LangSmith, Phoenix, Ragas, DeepEval, OpenAI Evals), specific public benchmarks (MMLU, MMLU-Pro, GPQA, AIME, SWE-bench, ARC-AGI, etc.)-they get saturated and replaced.
  • Refresh procedure: semi-annually. Update the named tools; refresh the public-benchmark list; do not touch the statistical methodology section. The methodology is the durable IP (a sketch of that layer follows this list).
  • Tripwire: if frontier-lab providers ship managed evaluation infrastructure that covers 80% of the use case, narrow your specialty to bespoke and adversarial eval that providers won't generalize.
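
A sketch of the kind of statistics that belongs in the durable layer: a bootstrap confidence interval on a pairwise win rate, assuming a list of 0/1 judge outcomes. The judge tool that produces the outcomes is ephemeral; this analysis is not.

```python
# Bootstrap CI sketch for a pairwise win rate (1 = candidate beat baseline).
# The list of outcomes is synthetic; the method is the durable part.
import random

def bootstrap_ci(wins: list[int], n_boot: int = 10_000, alpha: float = 0.05):
    rng = random.Random(0)
    n = len(wins)
    stats = sorted(sum(rng.choices(wins, k=n)) / n for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return sum(wins) / n, (lo, hi)

point, (lo, hi) = bootstrap_ci([1] * 57 + [0] * 43)   # 57 wins out of 100 comparisons
print(f"win rate {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
# If the CI straddles 0.50, you cannot yet claim the candidate is better.
```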

2.11 Sequence 13: LLM Observability

  • Spine: the logs/metrics/traces trichotomy, sampling theory, the observability-as-debugging-substrate principle, distributed tracing fundamentals, RED/USE methodology.
  • Stable: OpenTelemetry, the GenAI semantic conventions (once they stabilize beyond draft), the trace/span/event hierarchy, structured logging, exemplars, the cost/cardinality tradeoff for metrics.
  • Ephemeral: specific vendors (Datadog, Honeycomb, Grafana, Phoenix, Langfuse, Helicone, LangSmith), specific dashboards, specific GenAI-OTel attribute names while the spec churns.
  • Refresh procedure: semi-annually. Track the GenAI semantic-conventions spec; update vendor examples; keep the OTel-as-substrate framing (a span sketch follows this list).
  • Why this is your specialty: this is the cleanest bridge between your SRE background and applied AI. The OTel knowledge from your Bamboo + Datadog plugin work is directly load-bearing here, and OTel itself is becoming spine-adjacent.
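
A minimal span sketch using the standard OpenTelemetry Python API. The gen_ai.* attribute names follow the GenAI semantic conventions as drafted at the time of writing; treat them as [Ephemeral] and re-check the spec. call_model is a placeholder for the current SDK adapter.

```python
# Wrap every model call in a span and attach token counts as attributes.
# The tracer API is standard OpenTelemetry; the gen_ai.* names follow the
# draft GenAI semantic conventions and may change before the spec stabilizes.
from opentelemetry import trace

tracer = trace.get_tracer("llm.app")   # no-op unless an SDK/exporter is configured

def traced_completion(model: str, prompt: str, call_model) -> str:
    with tracer.start_as_current_span("llm.completion") as span:
        span.set_attribute("gen_ai.request.model", model)
        response = call_model(model, prompt)        # placeholder for the current SDK adapter
        span.set_attribute("gen_ai.usage.input_tokens", response["input_tokens"])
        span.set_attribute("gen_ai.usage.output_tokens", response["output_tokens"])
        return response["text"]
```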

2.12 Sequence 14: Inference & Serving

  • Spine: the latency/throughput/cost trinity, the request/queue/batch decomposition, the prefill/decode asymmetry, GPU memory hierarchy intuition (a back-of-the-envelope sketch follows this list).
  • Stable: paged attention, continuous batching, speculative decoding, chunked prefill, prefix caching, KV-cache offloading, quantization (INT8, INT4, FP8) tradeoffs, the throughput-vs-TTFT tradeoff curve.
  • Ephemeral: vLLM/SGLang/TensorRT-LLM/TGI versions, specific kernel implementations, specific GPU model nuances (H100 vs B200 vs MI300 quirks).
  • Refresh procedure: semi-annually. Re-benchmark on current hardware/runtime; refresh version-pinned examples; keep the algorithmic explanations untouched.
  • Tripwire: if a fundamentally new serving paradigm appears (e.g., disaggregated prefill/decode becomes the default architecture rather than a research curiosity), update the architecture section.
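
A back-of-the-envelope sketch of the spine-level memory intuition: KV-cache size per token is 2 (K and V) x layers x KV heads x head dimension x bytes per element. The default numbers are illustrative of a Llama-3-8B-class model, not a spec sheet.

```python
# KV-cache memory estimate; the defaults are illustrative assumptions.
def kv_cache_gib(layers=32, kv_heads=8, head_dim=128, dtype_bytes=2,
                 context_len=8192, batch=16):
    per_token = 2 * layers * kv_heads * head_dim * dtype_bytes   # K and V
    return per_token * context_len * batch / 2**30

print(f"{kv_cache_gib():.1f} GiB of KV cache")   # ~16 GiB under these assumptions
```

This is the arithmetic behind paged attention and prefix caching: the cache, not the weights, is what limits batch size at long context.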

2.13 Sequence 15: Fine-tuning

  • Spine: the supervised-finetuning / preference-optimization / RL-from-feedback distinction, the catastrophic-forgetting story, the data-quality-dominates-data-quantity principle.
  • Stable: LoRA/QLoRA math, DPO and its descendants (IPO, KTO, ORPO, SimPO), reward-model design, RLHF/RLAIF pipeline structure, evaluation of fine-tuned models (the DPO loss is sketched after this list).
  • Ephemeral: TRL/Axolotl/Unsloth/torchtune APIs, specific recipes for specific base models, current "best practice" hyperparameters, adapter-merging idioms.
  • Refresh procedure: semi-annually. Update recipes against a current base model (e.g., the open-weight family of the moment); keep the algorithmic descriptions.
  • Tripwire: if frontier-lab fine-tuning APIs (managed SFT/DPO) absorb the majority of practical fine-tuning, the sequence narrows toward "when DIY is justified and how to evaluate it."
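
A minimal sketch of the DPO loss itself, the stable math that outlives TRL and Axolotl API churn. Inputs are summed log-probabilities of the chosen and rejected completions under the policy and the frozen reference model; the tensor values below are placeholders.

```python
# DPO loss sketch: maximize the margin between the implicit rewards of the
# chosen and rejected completions. beta is the usual temperature.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.1]),   # placeholder log-probs
                torch.tensor([-13.0]), torch.tensor([-14.8]))
```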

2.14 Sequence 16: Distributed Training

  • Spine: the data/tensor/pipeline/expert/sequence parallelism taxonomy, the communication-vs-computation tradeoff, the memory-vs-recompute tradeoff (gradient checkpointing), Amdahl's law applied to training.
  • Stable: ZeRO stages 1/2/3, FSDP semantics, NCCL collectives intuition, mixed-precision training, gradient accumulation (sketched after this list), the Chinchilla-scaling intuition, large-batch training stability tricks.
  • Ephemeral: FSDP-1 vs FSDP-2 specifics, DeepSpeed config schema, Megatron-LM config, specific cloud-provider GPU-cluster idioms.
  • Refresh procedure: annual. Most readers will not train from scratch; the spine + stable carries them through reading frontier-lab tech reports. Update the API examples annually.
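
A minimal, self-contained sketch of gradient accumulation, the stable lever behind most "fit a larger effective batch" advice; the toy model and data are placeholders.

```python
# Gradient accumulation sketch: grads sum across micro-batches, the optimizer
# steps once per ACCUM_STEPS. Toy model and data stand in for the real thing.
import torch
from torch import nn

model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(32)]  # toy data
ACCUM_STEPS = 8   # effective batch = ACCUM_STEPS * micro-batch of 4

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = nn.functional.mse_loss(model(x), y) / ACCUM_STEPS  # scale so grads average
    loss.backward()                                           # accumulate across micro-batches
    if (step + 1) % ACCUM_STEPS == 0:
        optimizer.step()
        optimizer.zero_grad()
```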

2.15 Sequence 17: Capstone & Career

  • Mixed. Job-market signals are ephemeral; portfolio principles are stable; build-in-public habits are spine-adjacent.
  • Refresh procedure: annual. Survey 10-20 job postings in your specialty; compare vocabulary to your sequences; note vocabulary drift; update the "tools to be fluent in" list.

3. The Refresh Cadence Playbook

The cadence is the discipline. Without scheduled refreshes the curriculum decays silently. With them, decay is bounded.

| Cadence | Time budget | Activity |
|---|---|---|
| Daily | 15 min | Skim arXiv-sanity / Latent Space backlog / curated Twitter list. |
| Weekly | 90 min | Read one paper deeply; write a 200-word note. |
| Monthly | 60 min | Re-read one sequence's "Going further"; check links; replace dead resources. |
| Quarterly | 4 hours | Full audit of one quarter's sequences. |
| Semi-annually | 1 day | Refresh observability/eval/serving/fine-tuning sequences. |
| Yearly | 2 days | Re-evaluate specialty bet; redo durability audit; rewrite roadmap. |

3.1 Daily-15 min

  • Skim arXiv-sanity (or arXiv cs.LG/cs.CL if sanity is offline; have a backup).
  • Skim r/LocalLLaMA top-of-day.
  • Skim a small Twitter/X list (10-15 accounts max-see §5).
  • Skim the Hacker News front page, filtered for AI/ML.
  • Output: zero. Daily skim is for pattern recognition, not artifact production. If something genuinely interesting appears, drop a one-liner in a daily_log.md.

The discipline: do this before opening Slack or email. The cost of letting it slip is that you become a lagging indicator of your field.

3.2 Weekly-90 min

Pick one paper from the daily-log shortlist. Read it carefully. Write a 200-word note answering:

  1. What is the claim?
  2. What is the evidence?
  3. What is novel vs incremental?
  4. Does this affect any sequence? Which one? How?
  5. Tag: [Spine] / [Stable] / [Ephemeral].

Store the notes in a single file (paper_notes.md) so you can search them. After a year you have ~50 notes-that is a portfolio artifact in itself.

3.3 Monthly-60 min

Pick one sequence by rotation. Open its "Going further" section. For every link:

  • Is it still live?
  • Is it still relevant?
  • Has it been superseded?

Replace dead links with current equivalents or with self-contained content extracted into the chapter itself. Self-contained is preferable when the resource is small enough-it removes external dependencies.

3.4 Quarterly-4 hours

Each quarter audits one cluster of sequences:

| Quarter | Sequences | Focus |
|---|---|---|
| Q1 | 01-04 | Math + Python. Re-derive 5 problems; update Python toolchain section. |
| Q2 | 05-08 | PyTorch + classical ML + DL + transformers. Re-run notebooks; update transformer-variant section. |
| Q3 | 09-12 | LLM apps + RAG + agents + eval. Highest-volatility cluster. Re-run example projects; replace dead vendor APIs; update tool tables. |
| Q4 | 13-17 | Observability + serving + fine-tuning + distributed + capstone. Update vendor lists; refresh recipes; survey job postings. |

Quarterly audit checklist (paste into each audit):

  • All notebooks run on a clean install.
  • All external links resolve.
  • Named tools/models/vendors updated against current state.
  • Durability tags re-evaluated.
  • Any new technique above the "broadly adopted" threshold is added.
  • Any deprecated technique is marked deprecated, not deleted (provenance matters).
  • Tripwires checked (see §4).

3.5 Semi-annually-1 day

One day, twice a year (e.g., end of June and end of December). Refresh sequences 12-15 (eval, observability, serving, fine-tuning)-these are the high-velocity stable+ephemeral hybrids that justify a deeper semi-annual sweep.

3.6 Yearly-2 days

Two consecutive days, scheduled in advance. The full annual ritual:

  1. Re-read this chapter.
  2. Re-tag any sequence whose durability has shifted.
  3. Re-read all sequences' tables of contents (not the body).
  4. Survey 10-20 job postings in your specialty.
  5. Update the year's roadmap.
  6. Make the explicit decision: continue / deepen / pivot.

Skipping the yearly ritual costs far more than skipping any single quarterly audit. These are the highest-leverage 16 hours of the year.


4. "This Curriculum Is Broken When..."-Tripwires

Tripwires are pre-committed signals. When tripped, you act, regardless of how busy you are. Pre-commitment beats discretion-the moment you decide "I will refresh next quarter when I have time," you don't.

4.1 Tooling tripwires

  • Dead links in 3+ sequences in a single monthly audit. Action: tooling/sequence refresh in the next quarterly slot, even if it's not that quarter's cluster.
  • Notebook breakage rate >30% on a clean install. Action: full rewrite of broken notebooks in the next available 4-hour slot.
  • A library you depend on is archived/abandoned. Action: replace within 30 days or document the freeze and migrate examples.

4.2 Field tripwires

  • Dominant model architecture changes. If a non-autoregressive or non-attention architecture (e.g., diffusion-LMs, state-space hybrids) becomes the default for one of the top three frontier labs, the transformers sequence needs a structural rewrite, not a refresh.
  • Tool consolidation in the specialty. If 80% of the specialty's surface area is absorbed by 1-2 vendors, your differentiation as an applied practitioner narrows. Action: re-evaluate specialty (§7).
  • Capability obsolescence. If frontier models develop the specialty's core capability natively (e.g., reliable self-evaluation, native multi-tool orchestration), the specialty's tooling layer thins. Action: pivot toward integration/customization or re-specialize.

4.3 Career tripwires

  • Hiring market shift. Job postings in your specialty drop >30% YoY in your region/remote market. Action: re-evaluate specialty within 90 days.
  • Personal capability erosion. You can no longer answer entry-level interview questions in the specialty without preparation. Action: 2-week capability refresh, then a public artifact to reset.
  • Stopped learning. You haven't learned a new thing in the specialty in 6 months. Action: hard look in the mirror; either deepen aggressively or pivot.

4.4 Personal tripwires

  • You stop doing the daily skim for two weeks. Action: figure out why. Burnout, life event, lost interest? The diagnosis matters more than the resumption.
  • You stop shipping artifacts for two consecutive months. Action: schedule one explicitly. The build-in-public habit is spine; do not let it lapse.
  • You start defending the curriculum in conversations rather than updating it. Action: classic sunk-cost signal. Run the §16 exercises.

5. Field-Velocity Sources

The goal is not to track everything. It is to track a small, durable set that gives you signal without noise. Curate aggressively.

5.1 Paper firehose

  • arXiv-sanity (Karpathy's curated arXiv interface). When down, fall back to arxiv.org/list/cs.LG/recent and cs.CL/recent.
  • Hugging Face Daily Papers-community-curated, lower noise than raw arXiv.
  • alphaXiv-community discussion on papers, sometimes valuable signal for what is being adopted.

5.2 Industry pulse

  • Latent Space podcast / newsletter (swyx + Alessio). Industry-leaning interviews; good for what shipping teams actually do.
  • Interconnects (Nathan Lambert)-RL/post-training/policy; high signal on the fine-tuning sequence.
  • The AI Engineer Summit / World's Fair talks-recorded annually; the talks cluster around what practitioners ship.

5.3 Open-weights pulse

  • r/LocalLLaMA-open-weight model releases, quantization tricks, single-GPU practicality. Filter for top-of-week.
  • Hugging Face trending-what models people actually download.

5.4 Twitter/X-durable list

A small, durable list. The principle: pick people whose timelines are technical and consistent over years, not the loudest of the moment. Examples (real public figures; check their current handles when you set up the list):

  • Andrej Karpathy-pedagogy + frontier intuition.
  • Hamel Husain-applied LLM evaluation; contractor-grade pragmatism.
  • Eugene Yan-applied ML; eval and recsys; consistent thoughtful writing.
  • Chip Huyen-ML systems and platforms.
  • Sasha Rush-research clarity; transformer pedagogy.
  • Lilian Weng-survey-style deep posts; high-density.
  • Tri Dao-FlashAttention author; serving/kernels.
  • Jeremy Howard-fast.ai; opinionated practical research.
  • Frontier-lab researchers-Anthropic / OpenAI / Google DeepMind / Meta / Mistral / Qwen team members, picked individually for technical signal rather than corporate broadcasting.

Limit: 15 accounts. Adding a sixteenth means removing one. Volume control is the discipline.

5.5 Aggregators

  • Hacker News-filtered for AI/ML, top of day. Good for cross-pollination from systems/security/economics.
  • Import AI (Jack Clark)-weekly newsletter; policy + capability landscape.

5.6 Ground truth

  • Provider blogs-Anthropic, OpenAI, Google DeepMind / Google AI, Meta AI, Mistral, Qwen, xAI, AI21, Cohere, Databricks. The ground truth for what specific provider capabilities are.
  • Model cards of any model you deploy. The model card is more durable than a marketing post.

5.7 Academic conferences

  • NeurIPS, ICLR, ICML-ML research front line.
  • MLSys-ML systems specifically.
  • OSDI, SOSP, ASPLOS, EuroSys-for the systems-side of inference/training.
  • ACL, EMNLP, NAACL-NLP-specific.

You do not need to read all proceedings. You need to skim accepted-paper titles annually (1-2 hours per conference) and dive into 3-5 papers per conference. Write the diving notes per §3.2.

5.8 What not to track

  • General AI Twitter discourse not from the names above.
  • "AI influencer" newsletters with no technical claims.
  • Closed Discord servers (high effort/signal ratio for solo learners).
  • VC commentary unless you are deciding where to work.

6. 6/12/24-Month Milestones

The cadence in §3 is maintenance. The milestones are progress.

6.1 Six-month milestones

  • Refresh 3 sequences in your specialty's cluster.
  • Write a "what changed in [specialty] in the last 6 months" post (1500-2500 words).
  • Ship one new artifact in the specialty (eval harness, observability bridge, agent design pattern, fine-tune recipe-pick one and ship).
  • Survey 10 job postings; compare vocabulary to your sequences; note drift.

6.2 Twelve-month milestones

  • Full year retrospective:
      • Artifacts shipped (count, links, what each taught you).
      • Posts published (count, traffic if you track it).
      • OSS PRs merged.
      • Talks given / podcasts / conference participation.
      • People you spoke with in the field.
  • Curriculum update:
      • Apply all yearly-ritual outputs (§3.6).
      • Tag-shift any drifted sequences.
  • Year-2 roadmap: same structure as year-1, sharpened by a year of evidence.

6.3 Twenty-four-month milestones

  • Re-evaluate the specialty itself. Is the bet still good? (Use §7 pivot signals.)
  • Decision: deepen, pivot, or branch.
  • If pivot: 90-day transition plan.
  • If deepen: define what "expert" means at year 3, with verifiable artifacts.
  • If branch: pick the second specialty deliberately, with explicit time allocation between the two.

7. Pivot Signals-When to Change Specialty

The hard part of pivoting is not deciding it; it is knowing when. Pre-committed signals beat post-hoc rationalization.

| Signal | Threshold | Response |
|---|---|---|
| Hiring demand | Postings drop >30% YoY in your region/remote market | 90-day transition plan |
| Tool consolidation | 1-2 vendors absorb 80% of the specialty's surface | Re-specialize toward integration/customization |
| Capability obsolescence | Frontier models do the specialty better than tooling | Move up the stack toward problem framing |
| Personal stagnation | No new learning in the specialty for 6 months | Diagnose: bored, blocked, or done |
| Market saturation | Your specialty's average comp drops 2 quarters in a row | Concerning, not decisive |
| Adjacent opportunity | A neighboring specialty opens with 2x demand | Branch, don't pivot-keep the spine, add the new layer |

Pivoting is expensive. Estimated cost of a clean pivot: 6-9 months of reduced output before you regain velocity in the new specialty. Therefore: pivot when 2+ signals trip simultaneously, or when one signal trips hard. Do not pivot on a single weak signal.

The cheapest pivots are the ones that preserve the spine. Eval-and-observability → AI-platform-engineering keeps OTel, distributed-systems thinking, evaluation discipline, and Python/PyTorch fluency. The spine carries; only the surface changes. Plan pivots along spine-preserving axes when possible.


8. Spine Investments That Survive Pivots

The investments that pay off across any plausible 2026-2030 pivot:

8.1 Math fluency

  • Re-derive backprop on paper in <30 minutes.
  • Compute gradients of a custom loss without looking it up.
  • Reason about a paper's update rule by reading its loss.

This transfers across all of ML, regardless of which architecture wins.

8.2 Distributed-systems instincts

  • Reasoning about queues, batches, retries, idempotency.
  • Reasoning about failure modes and partial failures.
  • Reasoning about latency budgets and tail latency.

This transfers across infra, agents, serving-the entire production-AI stack.

8.3 Evaluation discipline

  • Treating measurement as a first-class artifact.
  • Designing experiments before running them.
  • Refusing to ship without a regression eval.

This transfers to any quality-bound system. It is also one of the rarest skills in applied ML hiring.

8.4 Build-in-public habit

  • Writing about what you build, monthly.
  • Shipping artifacts you can point to.
  • Maintaining a public surface area (GitHub, a blog, talks).

This compounds across careers. The output of the habit is more durable than the content of any specific post.

8.5 Network of practitioners

  • 10-20 people you can ask technical questions of and they answer.
  • 3-5 people who would refer you for a job.
  • 1-2 mentors who are 5+ years ahead of you.

Networks compound. They survive specialty changes (the people you know in agent-eval will cross over to whatever-eval becomes in 2030).

8.6 Writing

  • Long-form technical writing that someone would want to read.
  • The ability to write 1500 words on a technical topic in a sitting without floundering.
  • Editing instincts: knowing when you've over-written.

Writing is a force multiplier for all of the above.

8.7 Reading research papers

  • Reading a paper in 45-60 minutes.
  • Extracting the claim, the evidence, the novelty.
  • Knowing when to skip and when to dive.

This is itself a skill that decays without practice. Weekly cadence (§3.2) protects it.


9. Ephemeral Investments That Decay Fastest

Where to spend ≤15% of total study time.

| Investment | Estimated half-life | Decay reason |
|---|---|---|
| Specific framework expertise (LangChain v0.3, DSPy v2.x) | 12-18 months | API churn; framework competition |
| Specific vendor APIs (current pricing/tool-use formats) | 6-12 months | Provider iteration |
| Specific benchmark scores | 6-12 months | Benchmark saturation |
| Specific model names (Llama 3.x, Claude 3.x, GPT-4.x) | 12-24 months | Version cycles |
| Specific dashboard layouts | 12-24 months | Vendor UI churn |
| Specific cloud-provider GPU SKU quirks | 18-30 months | Hardware cycles |
| Specific quantization recipes | 12-24 months | Kernel/algorithm progress |

Strategy: 60% spine, 25% stable, ≤15% ephemeral. When you find yourself spending more on ephemeral, audit; you are likely on a tool-tasting tour (§14.2).

The exception: the specialty's current ephemeral surface is what you ship in production. You need enough fluency in current ephemeral tools to be hireable. Enough is "I've shipped a non-trivial system with this in the past 6 months." More than that is over-investment.


10. Cross-Curriculum Future-Proofing

This curriculum sits in a stack:

applications  ← /tutoriaal/                (this curriculum)
systems       ← /AI_SYSTEMS_PLAN/
orchestration ← /KUBERNETES_PLAN/
containers    ← /CONTAINER_INTERNALS_PLAN/
OS            ← /LINUX/
languages     ← /RUST_TUTORIAL_PLAN/, /GO_LEARNIN_PLAN/, Python (here)

10.1 The bet

This stack is durable for the 2026-2030 production-AI engineer profile: someone who can take a foundation model and ship it in production with eval, observability, and reliability discipline. The bet rests on three assumptions:

  1. Production-AI engineering remains a distinct discipline from research.
  2. The foundation-model layer continues to be consumed via APIs and open-weights, not absorbed entirely into vertical applications.
  3. The systems substrate (Linux, containers, orchestration) remains relevant to AI deployment, not abstracted away by managed services.

Each assumption has a counter-scenario (§11), but the joint probability of all three failing in the 2026-2030 window is low.

10.2 The hedge

Each curriculum is independently valuable. Linux and containers are spine for any infrastructure career. Rust and Go are spine for any systems-programming career. Kubernetes is stable. AI systems and applications are stable+ephemeral. No single curriculum is load-bearing; pivots within the stack are cheap.

This is the structural future-proofing. If the AI-applications layer were the only investment, a paradigm shift would invalidate years of work. With the stack, a paradigm shift in one layer leaves the others intact, and the spine of each layer compounds with the spine of the others.

10.3 Adjacency leverage

  • AI-apps + Linux/containers + Kubernetes → AI-platform-engineer profile.
  • AI-apps + Rust/Go → high-performance inference profile.
  • AI-apps + Linux/containers → on-prem / edge AI profile.
  • AI-apps + AI-systems → infrastructure-research profile.

Knowing which adjacent profile to pivot toward, given which signal trips, is the value of having the stack.


11. Multi-Year Scenarios

Scenarios are thinking tools, not predictions. Each one is plausible enough to warrant a planned response. None is destined.

11.1 Scenario A-"Spec extends" (2027-ish)

Agents become reliable for narrow domains. Eval discipline becomes a standard expectation in mid-sized companies. Observability for LLM systems becomes commoditized. You have 2-3 years of artifacts in the specialty and are a mid-career senior.

  • Refresh: tools, models. Frameworks have settled into a smaller set of survivors.
  • Spine: intact. Math, OTel, eval discipline all carry.
  • Career: senior IC or staff in the specialty; lead role on a focused team.

11.2 Scenario B-"Foundation models commoditize the layer" (2028-ish)

Frontier labs ship managed eval and managed agent infrastructure that absorbs 60-80% of what teams used to DIY. Specialty narrows to integration, customization, and the long tail of cases where the managed layer is insufficient.

  • Refresh: identity shifts toward "AI platform engineer"-the bridge from SRE strengthens, and Kubernetes/observability/Linux become more load-bearing relative to the application layer.
  • Spine: still holds. The skills are the same; the surface changes.
  • Career: platform-engineer profile; leverage the SRE background heavily.

11.3 Scenario C-"New paradigm" (2029-ish)

A fundamentally different model class displaces autoregressive transformers as the dominant deployed architecture (e.g., world-models, diffusion-LMs at scale, hybrid architectures becoming the default).

  • Refresh: architecture sequence (08) needs rewriting, not refresh. Inference/serving (14) needs rewriting. Fine-tuning (15) needs rewriting. Agents (11) probably survives because the patterns are model-agnostic.
  • Spine: still holds. The math is the same; the residual-stream story is generalizable to most plausible successors.
  • Career: 6-9 month re-tooling; spine carries you through; specialty resets but spine compounds.

11.4 Scenario D-"Hardware shift dominates"

GPUs are partially displaced by specialized inference accelerators (TPUs, NPUs, custom inference silicon, edge accelerators) for production workloads.

  • Refresh: serving (14) and distributed (16) sequences update; CUDA-specific knowledge becomes ephemeral; the abstractions (paged attention, continuous batching) carry.
  • Spine: holds.
  • Career: opportunity if you've maintained Linux/containers depth.

11.5 Scenario E-"Regulatory shift"

Eval, observability, and provenance become legally required for AI systems above a size/risk threshold. The specialty becomes a regulated discipline.

  • Refresh: add a regulatory-compliance section to eval and observability sequences; learn the specific frameworks (e.g., whatever the dominant audit framework is at the time).
  • Spine: holds and appreciates-eval discipline becomes more valuable.
  • Career: tailwind.

11.6 Scenario F-"Demand contraction"

AI investment cycle contracts; hiring drops broadly; specialty demand drops with it.

  • Refresh: tighten ship cadence; emphasize unit economics in artifacts; lean on the cross-curriculum stack to pivot toward systems work.
  • Spine: holds.
  • Career: harder, but the stack-style portfolio is exactly the hedge for this scenario.

11.7 What scenarios A-F have in common

In all six, the spine investments hold, the stable investments need partial refresh, and the ephemeral investments need replacement. This is the case for the durability framework: it is robust across plausible futures.


12. The Annual Audit Checklist

Print this. Run it once a year. Keep the filled-in versions.

  • Re-read the durability audit (this chapter).
  • Re-tag any sequence whose durability shifted in the past year.
  • Replace dead external links with self-contained content where feasible.
  • Refresh the "tools to be fluent in" list against current job postings (10-20 postings).
  • Compare KPIs (artifacts shipped, posts published, OSS PRs, talks) to last year's targets.
  • Survey 5 practitioners in your specialty: what changed for them this year? (Email, DM, or coffee.)
  • Evaluate the 6 pivot signals (§7).
  • Run the 6 yearly exercises (§16).
  • Decide explicitly: continue / deepen / pivot. Document the decision and the reasoning.
  • Rewrite next year's roadmap.
  • Schedule the next four quarterly audits in the calendar with reminders.

The act of writing the decision down (item 9 in the checklist) is the load-bearing one. Decisions made implicitly drift; decisions made explicitly compound.


13. The Honest Meta-Question

Once a year, sit with this question for an hour, no devices:

"If I were starting from scratch today with the same goals, would I follow this curriculum, or would I do it differently?"

Three possible answers and what they mean:

  • "Same plan, sharpened": the plan is healthy. Refresh and continue.
  • "Mostly same plan, but I'd add/remove [X]": the plan needs targeted updates, not structural revision. Make the changes.
  • "I'd do it substantially differently": structural revision needed. Not a refresh-a re-design. Be honest if this is the answer; do not let sunk cost (§14.1) keep you on the old path.

The honest meta-question is a stress-test of identity, not just curriculum. If you find yourself answering "same plan" three years in a row but the specialty has changed shape underneath you, the answer is wrong-you are over-fitting to past commitments. The fix: run §16 exercises 1-2 first; they expose what you would change if you were honest.


14. Anti-Patterns of Curriculum Staleness

Patterns that look like maintenance but aren't.

14.1 Sunk-cost stickiness

Continuing the curriculum because of the time invested, not because it is still right. Symptom: you can list reasons to continue but cannot list signals that would make you stop. Fix: pre-commit the pivot signals (§7); when they trip, act.

14.2 Tool tourism on refresh

Every refresh becomes a tool-tasting tour: "Let me try the new framework, the new vector DB, the new eval tool." No deepening. Symptom: you can name 12 frameworks; you have shipped 0 systems in any of them in the past 6 months. Fix: every refresh produces an artifact, not a survey.

14.3 Spine erosion

Refreshing only the ephemeral; never re-deriving the math; the foundation becomes shaky. Symptom: you cannot derive backprop on paper without help. Fix: yearly spine exercises (§16.5); if it takes >30 minutes, dedicate 2 weeks to spine refresh.

14.4 Pivot paralysis

Market signals say pivot, you don't. Sunk-cost again, plus identity attachment. Symptom: 3+ pivot signals tripped, no action. Fix: pre-commit the 90-day transition plan in advance, so pulling the trigger is a calendar event, not an existential decision.

14.5 Refresh without writing

You read, you skim, you nod. Nothing is written. Six months later you cannot remember what changed. Fix: every refresh produces a written artifact, even if 200 words. The writing is the learning.

14.6 Curriculum-as-museum

The curriculum becomes a thing to preserve, not a thing to use. You stop adding to it; you stop using it as a working document. Fix: §15-feed learnings back. The curriculum is alive or it is dead.

14.7 Scope creep

Every refresh expands the curriculum. Three years in, it is unmaintainable. Symptom: the curriculum is 1.5x the size it was a year ago. Fix: every refresh has a deletion budget-at least one thing must be cut, even if small. Compression is a discipline.

14.8 Public-output collapse

You stop shipping artifacts and stop publishing. The build-in-public spine erodes silently. Symptom: 2+ months without a public artifact. Fix: schedule one explicitly within 14 days.


15. The Reciprocal-Feeding Learnings Back

The curriculum is a living document. Three update rules.

15.1 New techniques

When you encounter a new technique that proves valuable in production:

  • Identify the relevant sequence.
  • Write a short section (200-500 words) explaining the technique, its pre-conditions, its tradeoffs, and a code snippet.
  • Tag it with an initial durability estimate.
  • Mark it with the date added; revisit during the next quarterly audit to confirm it deserves to stay.

15.2 Dead ends

When you encounter a dead end (a tool that didn't work, a pattern that failed, a paper whose claims didn't replicate):

  • Write a one-paragraph "Why this didn't pan out" note.
  • Place it in the relevant sequence, marked clearly as a dead-end.
  • Future-you will save weeks not re-investigating.

The dead-end notes are some of the most valuable content in any mature curriculum-they encode negative knowledge, which is rarely written down anywhere.

15.3 Compression

Once a year, identify content that can be compressed:

  • Two sections covering similar ground → merge.
  • Long-winded explanations → tightened.
  • Out-of-date examples → replaced or deleted.

The curriculum should be flat or shrinking in size after year 1. Growth is a smell unless it is intentional.


16. Yearly Exercises

Six exercises, run once a year, ideally in a single 4-hour sitting.

Exercise 1-Removal list

List 5 things in this curriculum you would remove if you started today. Justify each in 1-2 sentences. Then: actually remove the 2-3 with the strongest justification.

If you cannot find 5, you are likely under-pruning. The field moves; some things become irrelevant; refusing to remove them is a form of staleness.

Exercise 2-Addition list

List 5 things you would add. Justify each. Then: add the 2-3 with the strongest justification.

The additions and removals together form the year's structural delta. Aim for net-zero or slight reduction in size.

Exercise 3-Backup the biggest external dependency

Identify the single biggest external dependency (a paper, a library, a blog post, a video) that, if it disappeared, would invalidate part of the curriculum.

Plan a self-contained backup: archive the paper, mirror the blog post, write your own version of the explanation. The goal is that no single external resource is load-bearing.

Exercise 4-Job-posting vocabulary drift

Survey 10-20 job postings in your specialty. For each, extract the technical vocabulary. Compare to your sequences' vocabulary.

  • New terms in postings, missing from sequences → add (or evaluate if just hype).
  • Terms in sequences, missing from postings → consider removing or downgrading.
  • Terms in both → confirm coverage depth matches market expectation.

This exercise is the single most reliable signal of curriculum-market fit.
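
A minimal sketch of the mechanical half of this exercise, assuming postings have been pasted into a text file and the sequences live in a local directory (both paths are hypothetical). The crude tokenization is intentional; deciding which missing terms are hype remains a manual judgment.

```python
# Vocabulary-drift sketch: term sets from postings vs sequences, with the two
# difference lists this exercise asks for. File paths are hypothetical.
import re
from pathlib import Path

def terms(text: str) -> set[str]:
    return set(re.findall(r"[a-z][a-z0-9\-\+\.]{2,}", text.lower()))

postings = terms(Path("postings_2026.txt").read_text())
sequences = terms(" ".join(p.read_text() for p in Path("sequences").rglob("*.md")))

print("in postings, missing from sequences:", sorted(postings - sequences)[:50])
print("in sequences, missing from postings:", sorted(sequences - postings)[:50])
```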

Exercise 5-Spine re-derivation

Pick one piece of foundational math. Re-derive on paper:

  • Backpropagation through a 2-layer MLP.
  • The closed-form solution to ridge regression.
  • The gradient of softmax cross-entropy.
  • The ELBO in a VAE.
  • The DPO loss derivation.

Time yourself. If it takes >30 minutes, you have spine erosion. Schedule a 2-week deep refresh of the relevant sequence.
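
For the third item on the list, a self-check sketch: derive d(loss)/d(logits) for softmax cross-entropy by hand (softmax of the logits minus the one-hot target), then confirm it against autograd. If the two disagree, the derivation is wrong, not PyTorch.

```python
# Numeric self-check for the softmax cross-entropy gradient derivation.
import torch
import torch.nn.functional as F

z = torch.randn(5, requires_grad=True)          # logits for 5 classes
y = torch.tensor(2)                             # true class index

loss = F.cross_entropy(z.unsqueeze(0), y.unsqueeze(0))
loss.backward()

manual = torch.softmax(z.detach(), dim=0)
manual[y] -= 1.0                                # softmax(z) - one_hot(y)

print(torch.allclose(z.grad, manual, atol=1e-6))   # expect True
```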

Exercise 6-State-of-the-specialty memo

Write a 1-page memo as if briefing a friend joining the field today: what is the specialty, what are the durable concepts, what tools/models are current, what is the likely 12-month direction.

If you cannot write the memo in 90 minutes, you are not as fluent as you think. The memo also doubles as a portfolio artifact and a public post.


17. Putting It All Together-The Operating Cadence

A single year of operating this curriculum, condensed:

| Month | Activity (recurring) | Activity (one-off) |
|---|---|---|
| Jan | Daily skim, weekly paper, monthly audit | Q1 cluster audit (sequences 01-04) |
| Feb | Daily, weekly, monthly | - |
| Mar | Daily, weekly, monthly | - |
| Apr | Daily, weekly, monthly | Q2 cluster audit (sequences 05-08) |
| May | Daily, weekly, monthly | - |
| Jun | Daily, weekly, monthly | Semi-annual eval/observability/serving/fine-tuning refresh; 6-month milestone check |
| Jul | Daily, weekly, monthly | Q3 cluster audit (sequences 09-12) |
| Aug | Daily, weekly, monthly | - |
| Sep | Daily, weekly, monthly | - |
| Oct | Daily, weekly, monthly | Q4 cluster audit (sequences 13-17) |
| Nov | Daily, weekly, monthly | - |
| Dec | Daily, weekly, monthly | Semi-annual refresh; yearly ritual (audit checklist, meta-question, exercises, decision, next-year roadmap) |

Total time per year (estimate):

  • Daily: 365 × 15 min ≈ 91 hours
  • Weekly: 52 × 90 min ≈ 78 hours
  • Monthly: 12 × 60 min ≈ 12 hours
  • Quarterly: 4 × 4 hours = 16 hours
  • Semi-annual: 2 × 8 hours = 16 hours
  • Yearly: 16 hours

Total: ~230 hours/year. About 4.5 hours/week. This is the maintenance budget; new learning, new artifacts, and new shipped systems sit on top of it. The maintenance budget protects the rest from decay.


18. Closing-The Discipline Is the Asset

The curriculum is not the asset. The discipline of maintaining it is.

In three years, every framework named in this curriculum will have changed. Every model will have changed. Half the tools will have been replaced. The papers will be different. The vocabulary will have drifted. The job postings will be different.

What will not have changed: the math, the systems instincts, the eval discipline, the writing habit, the network of practitioners, the build-in-public habit, the durability instinct itself.

The framework in this chapter-three tiers, six cadences, six pivot signals, six exercises, the meta-question-is itself spine. It will work in 2027, 2028, 2029, regardless of which scenario from §11 plays out.

The hard part is the discipline. Schedule the cadences. Run the exercises. Write the decisions down. When the tripwires trip, act. When the meta-question's answer drifts, listen.

Three years from now, when the field looks substantially different and you are still hireable, still shipping, still learning-that is the asset. The curriculum was the scaffolding. The discipline was the building.


Appendix A-Durability tag legend (paste at the top of every sequence)

[Spine]    -10+ year half-life; review, don't refresh.
[Stable]   -4-7 year half-life; refresh annually-to-biennially.
[Ephemeral]-1-3 year half-life; refresh quarterly-to-semi-annually.

Appendix B-Quarterly audit template (paste into each audit)

Quarter: [Q_]
Cluster: [sequences X-Y]
Time spent: [hours]

Per sequence:
  Sequence: ___
    Notebooks pass clean install:    [yes/no, % failing]
    Dead links:                       [count, list]
    Tool/model name updates:          [list]
    Durability tag changes:           [list]
    New techniques added:             [list]
    Deprecated marks added:           [list]
    Tripwires triggered:              [list]
  ...

Cross-sequence observations:
  [free text]

Actions for next quarter:
  [explicit list]

Appendix C-The yearly decision record (template)

Year: [YYYY]
Date of decision: [YYYY-MM-DD]

Specialty status:
  Continue / Deepen / Pivot:        [pick one]
  Reasoning:                         [3-5 sentences]
  Pivot signals tripped this year:   [list with thresholds]

Curriculum delta:
  Removed: [list with reasoning]
  Added:   [list with reasoning]
  Tag changes: [list]

Next-year roadmap:
  Q1: [focus]
  Q2: [focus]
  Q3: [focus]
  Q4: [focus]
  Yearly KPIs:
    Artifacts to ship:        [count]
    Posts to publish:          [count]
    OSS PRs to merge:          [count]
    Talks/podcasts:            [count]
    Practitioners to engage:   [count]

Honest meta-question answer:
  [verbatim, written here, signed and dated]

End of Deep Dive 14. The next chapter you read should be your own, written one year from today, with the yearly audit checklist filled in.
