03 - The Python + Linux baseline

What this session is

The non-AI prerequisites that people skip and then can't debug their way around, as a specific skill checklist.

Why this page exists

Almost every blocker I've watched a new AI engineer hit was actually a Python or Linux blocker, dressed up. A CUDA out-of-memory error turning out to be a path issue. "The model won't train" turning out to be a shell environment problem. "The script hangs" turning out to be buffered stdout.

If your Python and Linux are strong, AI engineering is just engineering.

Python baseline checklist

You should be able to do all of these without Googling syntax:

Language

Write a function with default args, keyword args, and *args, **kwargs.
Write a class with __init__, an instance method, and a __repr__.
Write a generator with yield. Know when generators help.
Use list, dict, set comprehensions.
Use with statements. Know what a context manager does.
Read a stack trace top-to-bottom and find the actual cause.
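
As a rough yardstick for the list above, here is one small file that touches each item. It is a sketch, not a style guide; every name in it (describe, Run, batches, timer) is made up for illustration.

```python
import time
from contextlib import contextmanager


def describe(name, greeting="hello", *args, shout=False, **kwargs):
    """Default arg, *args, a keyword-only arg, and **kwargs in one signature."""
    text = f"{greeting} {name}"
    if args:
        text += " " + " ".join(str(a) for a in args)
    if kwargs:
        text += " " + " ".join(f"{k}={v}" for k, v in sorted(kwargs.items()))
    return text.upper() if shout else text


class Run:
    """A tiny class with __init__, an instance method, and __repr__."""

    def __init__(self, name, steps=0):
        self.name = name
        self.steps = steps

    def advance(self, n=1):
        self.steps += n
        return self

    def __repr__(self):
        return f"Run(name={self.name!r}, steps={self.steps})"


def batches(items, size):
    """Generator: yields one slice at a time instead of building every batch up front."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


@contextmanager
def timer(results, label):
    """Context manager: setup before the with-body, cleanup after it, even on error."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


squares = [n * n for n in range(5)]            # list comprehension
lengths = {w: len(w) for w in ("cpu", "gpu")}  # dict comprehension
evens = {n for n in range(10) if n % 2 == 0}   # set comprehension
```

If you can predict what repr(Run("demo").advance(3)) and list(batches([1, 2, 3, 4, 5], 2)) return before running them, this part of the checklist is in good shape.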

Standard library

pathlib over os.path.
json for serialization.
argparse for CLIs (or click, which is third-party).
subprocess.run with check=True.
logging (not print) for non-trivial scripts.
pytest for tests (third-party, but the de facto standard).
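
The standard-library items above fit together in a script of roughly this shape. This is a minimal sketch: the --out and --lr flags, the config.json default, and the "baseline" logger name are assumptions made for the example.

```python
import argparse
import json
import logging
import subprocess
import sys
from pathlib import Path

log = logging.getLogger("baseline")


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Write a small JSON config.")
    parser.add_argument("--out", type=Path, default=Path("config.json"))
    parser.add_argument("--lr", type=float, default=1e-3)
    return parser.parse_args(argv)


def save_config(path, lr):
    """Write the config as JSON and read it back to confirm it round-trips."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps({"lr": lr}, indent=2))
    log.info("wrote %s", path)
    return json.loads(path.read_text())


def python_version():
    """Run a child process; check=True raises CalledProcessError on a non-zero exit."""
    result = subprocess.run(
        [sys.executable, "--version"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def main(argv=None):
    logging.basicConfig(level=logging.INFO)
    args = parse_args(argv)
    save_config(args.out, args.lr)
    log.info("interpreter: %s", python_version())
```

In a real script you would call main() under an if __name__ == "__main__": guard. The point of check=True is that a failing command stops the script instead of being silently ignored.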

Ecosystem

pip and pip install -e . for a local package.
venv or uv for environments. (uv is faster; use it.)
pyproject.toml over setup.py for new projects.
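Know the difference between declared dependencies (loose version ranges, e.g. in pyproject.toml) and a lockfile (exact pinned versions, e.g. uv.lock or a fully pinned requirements.txt).
Read a pyproject.toml and know what the [project], [tool.uv], and [tool.pytest.ini_options] tables do.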
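
To make the declared-versus-locked distinction concrete, here is a rough sketch that classifies requirement lines. The is_pinned helper is hypothetical and handles only the common name==version form, not the full requirement syntax; the version numbers are illustrative.

```python
import re

# Exact pin: a name (with optional extras) followed by == and a version with no wildcard.
_PIN = re.compile(r"^[A-Za-z0-9._-]+(\[[^\]]*\])?\s*==\s*[^=*\s;,]+$")


def is_pinned(requirement):
    """Return True if a requirement line pins one exact version."""
    spec = requirement.split(";", 1)[0].strip()  # drop any environment marker
    return bool(_PIN.match(spec))


# Declarations say what you are willing to accept.
declared = ["numpy>=1.26", "pandas", "matplotlib~=3.8"]

# A lockfile records exactly what was resolved, so installs are reproducible.
locked = ["numpy==1.26.4", "pandas==2.2.2", "matplotlib==3.8.4"]
```

Every entry in locked passes is_pinned and none in declared does. A requirements.txt can hold either kind of line, which is why the filename alone tells you nothing about reproducibility.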

AI-specific

numpy: arrays, shapes, broadcasting, slicing, np.einsum for sanity-check matmul.
pandas: load CSV, filter, group, save. Know when not to use pandas (huge data, structured tensors).
matplotlib or seaborn: plot a curve, plot a histogram. Save to PNG.
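
For the numpy item, this is the kind of sanity check I mean. It is a sketch with arbitrary shapes and a fixed seed; it assumes numpy is installed.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 3))   # shape (4, 3)
b = rng.standard_normal((3, 5))   # shape (3, 5)

# Sanity-check a matmul: einsum spells out which axes are multiplied and summed.
via_matmul = a @ b
via_einsum = np.einsum("ij,jk->ik", a, b)

# Broadcasting: a (3,) vector is stretched across the 4 rows of a (4, 3) array.
col_means = a.mean(axis=0)        # shape (3,)
centered = a - col_means          # shape (4, 3)

# Slicing: first two rows, last column.
last_col = a[:2, -1]              # shape (2,)
```

Checking np.allclose(via_matmul, via_einsum) is cheap, and writing the index string forces you to state the shapes you believe you have.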

If any of this is shaky, do Python from Scratch before moving on.

Linux baseline checklist

You should be able to:

Filesystem and process

Navigate with cd, ls, tree. Use find and grep (or rg).
chmod, chown for permission issues.
ps aux | grep <thing>, kill <pid> (kill -9 <pid> only when a process ignores the polite signal).
top / htop for "what's eating my CPU/memory."
df -h, du -sh *, du -sh * | sort -h. Disk usage matters for AI.
nvidia-smi if you have a GPU. Watch live with watch -n 1 nvidia-smi.
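
When you need the same disk checks inside a script, the standard library gets you most of the way. This is a rough sketch: it sums apparent file sizes rather than disk blocks, so the numbers will differ slightly from du, and it does not handle permission errors. The function names are made up.

```python
import shutil
from pathlib import Path


def dir_size(path):
    """Total apparent size in bytes of regular files under path (symlinked files are skipped)."""
    total = 0
    for p in Path(path).rglob("*"):
        if p.is_file() and not p.is_symlink():
            total += p.stat().st_size
    return total


def largest_entries(root=".", top=5):
    """Roughly `du -s * | sort -n`, reversed: the biggest children of root first."""
    sizes = []
    for child in Path(root).iterdir():
        size = dir_size(child) if child.is_dir() else child.lstat().st_size
        sizes.append((size, child.name))
    return sorted(sizes, reverse=True)[:top]


def free_fraction(path="."):
    """Roughly `df`: fraction of the filesystem holding path that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total
```

A check like free_fraction(".") before a long training run is a cheap guard against dying at the first checkpoint write.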

Shell

Pipes (|), redirects (>, >>, 2>&1).
Environment variables: export FOO=bar, $FOO, env | grep FOO.
Background jobs (&), nohup, tmux or screen.
Editing in vim or nano at least for quick edits.
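
The redirect and environment-variable items above have direct counterparts when you launch processes from Python. A minimal sketch, where FOO and the child one-liner are placeholders:

```python
import os
import subprocess
import sys

# A child process sees FOO only if it is in the environment handed to it,
# which is what `export FOO=bar` arranges in a shell.
child_env = {**os.environ, "FOO": "bar"}

child_code = (
    "import os, sys; "
    "print('out:', os.environ['FOO']); "
    "print('err: warning', file=sys.stderr)"
)

# stderr=subprocess.STDOUT is the equivalent of `2>&1`: both streams land in one place.
merged = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
    check=True,
)

# Without the redirect, the two streams are captured separately (like `2> err.log`).
split = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
```

Don't rely on the order of the two lines in merged.stdout. stdout is buffered when it goes to a pipe instead of a terminal, which is the same effect behind many "the script hangs" reports.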

Networking

curl -v <url> to debug an API.
ss -tnlp or lsof -i :8000 to find what's on a port.
ssh user@host, scp, rsync -avh for moving files.
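
A scripted counterpart to the port check above, useful in a startup script that waits for a local server. This is a sketch; port_is_open is a made-up helper, and unlike ss -tnlp or lsof it only tells you that something is listening, not which process.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
```

For example, port_is_open("127.0.0.1", 8000) answers "did my dev server actually start?" without parsing command output.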

Python-on-Linux specifics

What which python points to and why it matters.
Why pip install sometimes installs into the wrong env: the pip on your PATH can belong to a different interpreter than python. Use python -m pip install instead.
Reading logs with journalctl -u <service> if a systemd service won't start.
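
When an install seems to have gone to the wrong place, asking the interpreter itself is usually quicker than guessing. A small sketch; both function names are hypothetical.

```python
import shutil
import sys


def interpreter_report():
    """Where this interpreter lives, and what `python` on PATH resolves to."""
    return {
        "running": sys.executable,                 # the interpreter executing this code
        "on_path": shutil.which("python"),         # what the shell's `python` would run, or None
        "in_venv": sys.prefix != sys.base_prefix,  # True inside a venv-style environment
    }


def pip_command(*args):
    """Build a pip invocation tied to the running interpreter (python -m pip ...)."""
    return [sys.executable, "-m", "pip", *args]
```

If "running" and "on_path" disagree, a bare pip install may be feeding a different environment than the one your script runs in. Going through sys.executable removes that ambiguity.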

If any of this is shaky, do Linux from Scratch before moving on.

Git baseline

The minimum:

clone, add, commit, push, pull.
branch, checkout -b, merge, rebase (use merge until you understand rebase).
Resolve merge conflicts at least once with intent.
stash, stash pop.
log --oneline --graph, diff, blame.
Read a .gitignore. Write one for a new repo.
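
For the last item, a starting point might look like the sketch below. The patterns are common conventions, and the data/ and checkpoints/ directory names are assumptions about your project layout, so adjust them.

```python
from pathlib import Path

STARTER_GITIGNORE = """\
# Python
__pycache__/
*.pyc
*.egg-info/
.pytest_cache/

# Environments and secrets
.venv/
.env

# Large artifacts (hypothetical names; adjust to your project)
data/
checkpoints/
*.pt
"""


def write_gitignore(repo_dir="."):
    """Create .gitignore in repo_dir unless one already exists; return its path."""
    path = Path(repo_dir) / ".gitignore"
    if not path.exists():
        path.write_text(STARTER_GITIGNORE)
    return path
```

The large-artifacts section matters most for AI work: one committed checkpoint can bloat a repo permanently.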

The 30-minute self-test

Do this in your shell:

mkdir /tmp/baseline && cd /tmp/baseline
uv init && uv venv && source .venv/bin/activate
uv add numpy pandas matplotlib pytest

cat > work.py <<'EOF'
import numpy as np, pandas as pd
df = pd.DataFrame({"x": np.random.randn(1000), "y": np.random.randn(1000)})
df["z"] = df.x * 2 + df.y
print(df.describe())
df.to_csv("out.csv", index=False)
EOF

python work.py
head out.csv

If every step felt natural, you're at baseline. If any step was confusing, that's the gap to close.
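
The self-test installs pytest but never runs it. One way to close that loop, as a sketch: put something like the following in a file named test_work.py (a name I'm assuming) and run pytest. build_frame is a hypothetical refactor of work.py with a fixed seed so the test is repeatable.

```python
import numpy as np
import pandas as pd


def build_frame(n=1000, seed=0):
    """Same columns as work.py, but seeded so a test can rely on the values."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({"x": rng.standard_normal(n), "y": rng.standard_normal(n)})
    df["z"] = df.x * 2 + df.y
    return df


def test_z_matches_formula():
    df = build_frame(n=50)
    assert np.allclose(df["z"], 2 * df["x"] + df["y"])


def test_csv_has_expected_header_and_rows():
    df = build_frame(n=10)
    text = df.to_csv(index=False)     # with no path, to_csv returns the CSV as a string
    lines = text.splitlines()
    assert lines[0] == "x,y,z"
    assert len(lines) == 11           # header plus 10 data rows
```

pytest collects any function whose name starts with test_ in files named test_*.py, so no extra wiring is needed.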

What you might wonder

"Conda vs uv vs pip vs poetry?" uv is the current right answer. Fast, modern, replaces venv/pip/pip-tools/pyenv. Conda is fine if your team uses it; don't fight a team's choice. Avoid switching mid-project.

"Mac, Linux, or WSL?" Linux native is best. Mac is fine (mps works for many models). WSL2 is fine for development; production is Linux. Windows native is painful for AI; avoid.

"Do I need to learn Bash scripting properly?" Read it. Don't write big ones. Reach for Python when a shell script exceeds 30 lines.

Done

Self-tested Python baseline.
Self-tested Linux baseline.
Know what to revisit.

Next: ML mental model in one page →