03 - The Python + Linux baseline
What this session is
The non-AI prerequisites people skip and then can't debug their way out of. Specific skill checklist.
Why this page exists
Almost every blocker I've watched a new AI engineer hit was actually a Python or Linux blocker in disguise. A CUDA out-of-memory error that turned out to be a path issue. "The model won't train" that turned out to be a shell environment problem. "The script hangs" that turned out to be buffered stdout.
If your Python and Linux are strong, AI engineering is just engineering.
Python baseline checklist
You should be able to do all of these without Googling syntax:
Language
- Write a function with default args, keyword args, and `*args`, `**kwargs`.
- Write a class with `__init__`, an instance method, and a `__repr__`.
- Write a generator with `yield`. Know when generators help.
- Use list, dict, set comprehensions.
- Use `with` statements. Know what a context manager does.
- Read a stack trace top-to-bottom and find the actual cause.
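As a rough self-check for the language items above, here is a minimal sketch that touches each one. The names `summarize`, `Run`, `batches`, and `timed` are made up for illustration; nothing here comes from a particular library beyond the standard `contextlib` and `time` modules.

```python
import time
from contextlib import contextmanager


def summarize(name, scale=1.0, *args, **kwargs):
    """name: positional; scale: default arg; *args: extra positionals; **kwargs: extra keywords."""
    total = sum(args) * scale
    tags = {key: value for key, value in kwargs.items()}  # dict comprehension
    return name, total, tags


class Run:
    def __init__(self, name, steps=0):
        self.name = name
        self.steps = steps

    def advance(self, n=1):
        """Instance method: mutates this object and returns it for chaining."""
        self.steps += n
        return self

    def __repr__(self):
        return f"Run(name={self.name!r}, steps={self.steps})"


def batches(items, size):
    """Generator: yields one batch at a time instead of building them all in memory."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


@contextmanager
def timed(label, sink):
    """Context manager: the `finally` part runs even if the body of the `with` raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        sink[label] = time.perf_counter() - start


timings = {}
with timed("demo", timings):
    run = Run("baseline").advance(3)
    chunks = list(batches([0, 1, 2, 3, 4], 2))
print(run, chunks, summarize("loss", 2.0, 1, 2, 3, split="val"))
```

If you can predict what the last line prints before running it, the language basics are in place.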
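Standard library
- `pathlib` over `os.path`.
- `json` for serialization.
- `argparse` (or the third-party `click`) for CLIs.
- `subprocess.run` with `check=True`.
- `logging` (not `print`) for non-trivial scripts.
- `pytest` for tests (third-party, but the de facto standard).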
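The standard-library items above fit together in even a tiny script. The following is a sketch only: the function names, the `--out` and `--lr` flags, and the config contents are invented for the example.

```python
import argparse
import json
import logging
import subprocess
import sys
from pathlib import Path

log = logging.getLogger("baseline")


def save_config(path: Path, config: dict) -> Path:
    """pathlib + json: create parent directories, then write the file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path


def python_version() -> str:
    """subprocess.run with check=True raises CalledProcessError on a non-zero exit code."""
    result = subprocess.run(
        [sys.executable, "--version"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip() or result.stderr.strip()


def main(argv=None) -> int:
    parser = argparse.ArgumentParser(description="Write a small JSON config.")
    parser.add_argument("--out", type=Path, default=Path("config.json"))
    parser.add_argument("--lr", type=float, default=1e-3)
    args = parser.parse_args(argv)

    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    saved = save_config(args.out, {"lr": args.lr, "python": python_version()})
    log.info("wrote %s", saved)  # logging instead of print
    return 0
```

Calling `main(["--out", "runs/config.json", "--lr", "0.01"])` writes the file and logs one line; passing `argv` explicitly also makes the CLI easy to exercise from a `pytest` test.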
Ecosystem
- `pip` and `pip install -e .` for a local package.
- `venv` or `uv` for environments. (`uv` is faster; use it.)
- `pyproject.toml` over `setup.py` for new projects.
- Know the difference between a lockfile (exact pinned versions, such as a frozen `requirements.txt`) and dependency declarations (the version ranges you ask for).
- Read a `pyproject.toml` and know what the `[project]`, `[tool.uv]`, and `[tool.pytest.ini_options]` blocks do.
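To make the three `pyproject.toml` blocks concrete, the sketch below parses a made-up file from Python and pulls each block out. The package name, versions, and settings are hypothetical; `tomllib` ships with Python 3.11+, and the fallback assumes the third-party `tomli` backport is installed on older interpreters.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # assumption: the `tomli` backport is installed

# Hypothetical pyproject.toml for a package called "baseline-demo"
PYPROJECT = """
[project]
name = "baseline-demo"
version = "0.1.0"
requires-python = ">=3.10"
dependencies = ["numpy>=1.26", "pandas>=2.0"]

[tool.uv]
dev-dependencies = ["pytest>=8.0"]

[tool.pytest.ini_options]
testpaths = ["tests"]
addopts = "-q"
"""

config = tomllib.loads(PYPROJECT)
project = config["project"]                                # package metadata and runtime dependencies
uv_settings = config["tool"]["uv"]                         # settings read only by uv
pytest_settings = config["tool"]["pytest"]["ini_options"]  # pytest configuration

print(project["name"], project["dependencies"])
print(uv_settings["dev-dependencies"])
print(pytest_settings["testpaths"])
```

Note that the `dependencies` entries are declarations (ranges); the exact versions that end up installed belong in a lockfile.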
AI-specific
- `numpy`: arrays, shapes, broadcasting, slicing, `np.einsum` for sanity-check matmul.
- `pandas`: load CSV, filter, group, save. Know when not to use pandas (huge data, structured tensors).
- `matplotlib` or `seaborn`: plot a curve, plot a histogram. Save to PNG.
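For the `numpy` item, this is roughly what "shapes, broadcasting, slicing, and an `einsum` sanity check" looks like in practice. The array sizes are arbitrary and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))  # shape (4, 3)
B = rng.standard_normal((3, 5))  # shape (3, 5)

# Broadcasting: (4, 3) minus (3,) subtracts each column's mean from every row
centered = A - A.mean(axis=0)

# Slicing: first two rows, last column -> shape (2,)
corner = A[:2, -1]

# einsum spells out the indices, which makes it a handy cross-check for a matmul
via_einsum = np.einsum("ik,kj->ij", A, B)
assert np.allclose(via_einsum, A @ B)

print(centered.shape, corner.shape, via_einsum.shape)
```

Writing the index string by hand (`ik,kj->ij`) forces you to state which axis is being summed, which is where most shape bugs hide.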
If any of this is shaky, do Python from Scratch before moving on.
Linux baseline checklist
You should be able to:
Filesystem and process
- Navigate with `cd`, `ls`, `tree`. Use `find` and `grep` (or `rg`).
- `chmod`, `chown` for permission issues.
- `ps aux | grep <thing>`, `kill -9 <pid>`.
- `top` / `htop` for "what's eating my CPU/memory."
- `df -h`, `du -sh *`, `du -sh * | sort -h`: disk usage matters for AI.
- `nvidia-smi` if you have a GPU. Watch live with `watch -n 1 nvidia-smi`.
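Disk checks are often worth doing from inside a training script too, for example before writing a large checkpoint. The sketch below is a rough Python counterpart to `df -h` and `du -sh * | sort -h`. The helper names are invented, and it sums apparent file sizes rather than disk blocks, so the numbers can differ slightly from `du`. Unreadable directories may raise `PermissionError`, which this sketch does not handle.

```python
import shutil
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total bytes of regular files under path (symlinks skipped)."""
    return sum(
        p.stat().st_size
        for p in path.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def largest_children(root: Path, top: int = 5):
    """Roughly `du -s * | sort -n | tail`: biggest entries directly under root, largest first."""
    sizes = []
    for child in root.iterdir():
        if child.is_symlink():
            continue
        size = dir_size(child) if child.is_dir() else child.stat().st_size
        sizes.append((size, child.name))
    return sorted(sizes, reverse=True)[:top]


def free_gib(path: Path) -> float:
    """Roughly `df -h`: free space on the filesystem that holds path, in GiB."""
    return shutil.disk_usage(path).free / 2**30
```

Something like `largest_children(Path.home() / ".cache")` is a quick way to find which model cache has filled the disk.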
Shell
- Pipes (`|`), redirects (`>`, `>>`, `2>&1`).
- Environment variables: `export FOO=bar`, `$FOO`, `env | grep FOO`.
- Background jobs (`&`), `nohup`, `tmux` or `screen`.
- Editing in `vim` or `nano`, at least for quick edits.
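The same shell concepts show up whenever Python launches another process. This sketch, with an invented `run_child` helper and a throwaway child program, passes an environment variable to a child and merges its stderr into stdout, which is what `FOO=bar cmd 2>&1` does in a shell.

```python
import os
import subprocess
import sys

# Child program: prints $FOO to stdout and a warning to stderr
CHILD = "import os, sys; print(os.environ.get('FOO')); print('warn', file=sys.stderr)"


def run_child(extra_env=None):
    """Run the child with extra environment variables and stderr merged into stdout."""
    env = {**os.environ, **(extra_env or {})}  # like `FOO=bar cmd` in a shell
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # like `2>&1`
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


# The order of the two lines is not guaranteed: when stdout is a pipe it is
# usually block-buffered, so the stderr line can arrive first.
print(run_child({"FOO": "bar"}))
```

That ordering surprise is the same buffering behavior behind many "the script hangs" reports: output exists, it just has not been flushed yet.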
Networking
- `curl -v <url>` to debug an API.
- `ss -tnlp` or `lsof -i :8000` to find what's on a port.
- `ssh user@host`, `scp`, `rsync -avh` for moving files.
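When a server refuses to start, a quick programmatic check for "is something already listening on this port?" can help. The helper below is a minimal sketch with an invented name. Unlike `ss -tnlp` or `lsof`, it only tells you that a port accepts connections, not which process owns it.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value instead of raising
        return sock.connect_ex((host, port)) == 0
```

For example, `port_in_use(8000)` before launching a dev server on port 8000 tells you whether to go looking for a stale process.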
Python-on-Linux specifics
- What `which python` points to and why it matters.
- Why `pip install` sometimes installs to the wrong env. Use `python -m pip install` instead.
- Reading `journalctl` or systemd logs if a service won't start.
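The "wrong env" problem usually comes down to the running interpreter and the `python` on your `PATH` being two different things. A small report like the one below makes that visible. The function names are illustrative, and the `in_venv` check covers `venv`-style environments (including ones created by `uv`), not Conda.

```python
import shutil
import sys


def interpreter_report() -> dict:
    return {
        "running": sys.executable,                 # the interpreter executing this code
        "on_path": shutil.which("python"),         # what a bare `python` resolves to (may be None)
        "prefix": sys.prefix,                      # root of the environment packages install into
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv-style environment
    }


def pip_command(*packages: str) -> list:
    """`python -m pip` ties pip to this exact interpreter, unlike a bare `pip`."""
    return [sys.executable, "-m", "pip", "install", *packages]


for key, value in interpreter_report().items():
    print(f"{key:8} {value}")
```

If `running` and `on_path` disagree, a bare `pip install` may be putting packages somewhere your script never looks.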
If any of this is shaky, do Linux from Scratch before moving on.
Git baseline
The minimum:
- `clone`, `add`, `commit`, `push`, `pull`.
- `branch`, `checkout -b`, `merge`, `rebase` (use `merge` until you understand `rebase`).
- Resolve merge conflicts at least once with intent.
- `stash`, `stash pop`.
- `log --oneline --graph`, `diff`, `blame`.
- Read a `.gitignore`. Write one for a new repo.
The 30-minute self-test
Do this in your shell:
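```bash
# Scratch directory for the test
mkdir /tmp/baseline && cd /tmp/baseline
# Create a project and a virtual environment, then activate it
uv init && uv venv && source .venv/bin/activate
uv add numpy pandas matplotlib pytest
# Write a small script via a heredoc; quoting 'EOF' stops the shell expanding anything inside
cat > work.py <<'EOF'
import numpy as np, pandas as pd
df = pd.DataFrame({"x": np.random.randn(1000), "y": np.random.randn(1000)})
df["z"] = df.x * 2 + df.y
print(df.describe())
df.to_csv("out.csv", index=False)
EOF
# Run it and inspect the first rows of the output
python work.py
head out.csv
```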
If every step felt natural, you're at baseline. If any step was confusing, that's the gap to close.
What you might wonder
"Conda vs uv vs pip vs poetry?"
uv is the current right answer. Fast, modern, replaces venv/pip/pip-tools/pyenv. Conda is fine if your team uses it; don't fight a team's choice. Avoid switching mid-project.
"Mac, Linux, or WSL?"
Linux native is best. Mac is fine (PyTorch's `mps` backend works for many models). WSL2 is fine for development; production is Linux. Windows native is painful for AI; avoid it.
"Do I need to learn Bash scripting properly?"
Learn to read it. Don't write big scripts in it. Reach for Python when a shell script exceeds 30 lines.
Done
- Self-tested Python baseline.
- Self-tested Linux baseline.
- Know what to revisit.
Next: ML mental model in one page →