Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop

9.1 Conceptual Core

CPython is a stack-based bytecode interpreter that manages memory with reference counting plus a generational cyclic GC. Every PyObject starts with a header of ob_refcnt and ob_type (16 bytes on a typical 64-bit build), followed by a type-specific tail.
The eval loop (Python/ceval.c::_PyEval_EvalFrameDefault) is one large dispatch over opcodes, using computed gotos where the compiler supports them. Since 3.11, the loop is specializing and adaptive (PEP 659): hot instructions are rewritten in place to type-specialized variants such as LOAD_ATTR_INSTANCE_VALUE and BINARY_OP_ADD_INT.
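As a minimal sketch of the two ideas above, the snippet below lists the instructions CPython compiles for a small expression and uses sys.getrefcount to watch ob_refcnt change. The function name poly is hypothetical, and opcode names vary between Python versions, so no expected output is shown.

```python
import dis
import sys


def poly(a, b, c):
    # Hypothetical example: operands are pushed onto the value stack,
    # then each binary operation pops two values and pushes one result.
    return a + b * c


# Collect the opcode names in execution order for this function.
opnames = [ins.opname for ins in dis.get_instructions(poly)]
print(opnames)

# sys.getrefcount reports ob_refcnt; the value includes a temporary
# reference held by the call itself, so compare differences, not absolutes.
obj = object()
before = sys.getrefcount(obj)
alias = obj  # binding a second name adds one reference
after = sys.getrefcount(obj)
print(before, after)
```

Running dis.dis(poly) prints the same instructions in the familiar tabular form, including their arguments.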

dis.dis(fn): disassemble a function. Memorize the common opcodes: LOAD_FAST, STORE_FAST, LOAD_GLOBAL, LOAD_CONST, CALL, RETURN_VALUE, BINARY_OP, COMPARE_OP, FOR_ITER, POP_JUMP_IF_FALSE, LOAD_ATTR, STORE_SUBSCR. These are the 3.11+ names; older versions use, for example, BINARY_ADD and CALL_FUNCTION.
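Why local lookups are fast and global lookups are slower: locals live in a fixed-size array indexed by integer (fast locals), while a global needs a dict lookup, and a builtin such as len needs a second one in the builtins dict. Hot functions often hoist globals to locals (def f(_len=len): ...). Measure before relying on this: since 3.11 the specializing interpreter caches many global and builtin lookups, so the gain is smaller than it used to be.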
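Frame objects vs. code objects: a code object is the immutable compiled artifact shared by every call of a function; a frame is the per-call execution state (locals, value stack, current instruction). Inspect func.__code__.co_consts, co_names, co_varnames, co_flags.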
The specializing interpreter: read PEP 659 once. No interpreter flag is needed to see it at work: call the function enough times to warm it up, then run dis.dis(fn, adaptive=True) (Python 3.11+) in the same process to see the specialized opcodes.
Free lists and small-int / interned-string caches.
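The points above about fast locals, code-object attributes, and adaptive disassembly can be sketched together. The two functions below are hypothetical examples; the warm-up count of 50 calls is an assumption, not a documented threshold, and the set of specialized opcodes printed depends on the Python version.

```python
import dis
import sys


def total_len_global(items):
    # `len` is looked up with LOAD_GLOBAL on every iteration.
    n = 0
    for x in items:
        n += len(x)
    return n


def total_len_hoisted(items, _len=len):
    # `_len` is bound once at definition time and read as a fast local.
    n = 0
    for x in items:
        n += _len(x)
    return n


def opnames(fn, **kwargs):
    # Set of opcode names used by fn; kwargs are passed to dis.get_instructions.
    return {ins.opname for ins in dis.get_instructions(fn, **kwargs)}


# co_names holds global and attribute names; co_varnames holds fast locals.
print(total_len_global.__code__.co_names)
print(total_len_hoisted.__code__.co_varnames)

# Warm up so the adaptive interpreter has a chance to specialize.
data = ["ab", "cde"] * 1000
for _ in range(50):
    total_len_global(data)

if sys.version_info >= (3, 11):
    # Names that appear only in the adaptive view are specialized variants.
    specialized = opnames(total_len_global, adaptive=True) - opnames(total_len_global)
    print(sorted(specialized))
```

On 3.11 and later, the adaptive view typically shows a specialized form of LOAD_GLOBAL for the builtin lookup, which is why the hoisting trick should be benchmarked rather than assumed.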

Write three implementations of "sum of squares": a for loop, a sum() + genexp, and numpy.dot(a, a). dis.dis each. Benchmark with timeit. Explain the gap.
Take a function with a global lookup in its hot loop. Refactor to a default-argument cache. Re-bench. Quantify the win.
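Use sys.settrace with frame.f_trace_opcodes = True to count opcode-level events on a small program (sys.setprofile reports only call and return events, not opcodes). Opcode counts alone do not show which instructions were specialized, so also compare dis.dis(fn, adaptive=True) output before and after warm-up to observe specialization.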
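A possible starting harness for the sum-of-squares comparison is sketched below. The function names, input size, and repeat counts are illustrative assumptions; the numpy variant runs only if numpy is installed, and no timings are shown because they depend on the machine and Python version.

```python
import timeit


def sum_squares_loop(values):
    # Explicit for loop: every multiply and add runs through the eval loop.
    total = 0
    for v in values:
        total += v * v
    return total


def sum_squares_genexp(values):
    # sum() iterates in C, but each v * v is still evaluated in bytecode.
    return sum(v * v for v in values)


def bench(fn, arg, repeat=5, number=20):
    # Best-of-N per-call time in seconds; the minimum is least affected by noise.
    return min(timeit.repeat(lambda: fn(arg), repeat=repeat, number=number)) / number


values = list(range(10_000))
results = {
    "for_loop": bench(sum_squares_loop, values),
    "sum_genexp": bench(sum_squares_genexp, values),
}

try:
    import numpy as np
except ImportError:
    np = None

if np is not None:
    # int64 avoids overflow for this input size; the whole loop runs in C.
    arr = np.arange(10_000, dtype=np.int64)
    results["numpy_dot"] = bench(lambda a: np.dot(a, a), arr)

for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:12s} {seconds * 1e6:10.1f} us per call")
```

The same bench helper can be reused for the default-argument refactor: time the function before and after hoisting and report the ratio.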

Enable ruff PERF. Read every rule. Identify cases in your codebase where the rule applies but readability suffers.

Add pytest-benchmark to CI as a non-failing job that publishes JSON results. Build a script that flags >10% regressions on PRs.
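One way the regression-flagging script could look is sketched below. It assumes the JSON written by pytest-benchmark's --benchmark-json option has a top-level "benchmarks" list whose entries carry "fullname" and a "stats" dict with a "mean" field; check this against the files your version produces. The function names and the inline sample data are hypothetical.

```python
import json

THRESHOLD = 0.10  # flag slowdowns greater than 10%


def load_means(path):
    # Map benchmark fullname -> mean seconds (assumed JSON layout, see lead-in).
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    return {b["fullname"]: b["stats"]["mean"] for b in report["benchmarks"]}


def find_regressions(baseline, candidate, threshold=THRESHOLD):
    # Return (name, relative_change) pairs whose slowdown exceeds the threshold.
    flagged = []
    for name, base_mean in baseline.items():
        new_mean = candidate.get(name)
        if new_mean is None or base_mean <= 0:
            continue  # benchmark removed, renamed, or unusable baseline
        change = (new_mean - base_mean) / base_mean
        if change > threshold:
            flagged.append((name, change))
    # Worst regression first.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical in-memory data standing in for two loaded reports.
baseline = {"test_parse": 1.00e-3, "test_render": 2.00e-3}
candidate = {"test_parse": 1.20e-3, "test_render": 2.10e-3}

flagged = find_regressions(baseline, candidate)
for name, change in flagged:
    print(f"REGRESSION {name}: +{change:.1%}")
```

In CI, the two dicts would come from load_means called on the main-branch and PR result files. Keeping the script's exit code at zero and posting the flagged lines as a PR comment preserves the non-failing behaviour described above.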