Week 9 - The CPython VM: Objects, Bytecode, the Eval Loop
9.1 Conceptual Core
- CPython is a stack-based bytecode interpreter that manages memory with reference counting plus a generational cyclic GC. Every PyObject starts with a header (ob_refcnt, ob_type), 16 bytes on a standard 64-bit build, followed by a type-specific tail.
- The eval loop (Python/ceval.c::_PyEval_EvalFrameDefault) is a giant opcode dispatch loop that uses computed gotos where the compiler supports them. Since 3.11, the loop is specializing and adaptive (PEP 659): hot opcodes get rewritten in place to type-specialized variants (LOAD_ATTR_INSTANCE_VALUE, BINARY_OP_ADD_INT).
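The two memory-management mechanisms named above can be observed from pure Python. The following is a minimal sketch, not a definitive test: the Node class and both helper functions are hypothetical names introduced for illustration, and the absolute numbers that sys.getrefcount reports vary between CPython versions, so only the difference between two readings is used.

```python
import gc
import sys
import weakref


class Node:
    """Hypothetical plain object that can hold a reference to itself."""


def refcount_delta():
    """Return how much one extra strong reference raises the reported refcount."""
    obj = object()
    before = sys.getrefcount(obj)
    alias = obj  # one additional strong reference to the same object
    after = sys.getrefcount(obj)
    del alias
    return after - before


def cycle_needs_gc():
    """Show that a self-referencing object survives `del` until the cyclic GC runs."""
    was_enabled = gc.isenabled()
    gc.disable()  # keep the automatic collector from running mid-experiment
    try:
        node = Node()
        node.me = node                # reference cycle: node -> node
        probe = weakref.ref(node)     # observes liveness without owning the object
        del node                      # refcount cannot reach zero because of the cycle
        alive_after_del = probe() is not None
        gc.collect()                  # the cycle detector finds and frees the object
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print("refcount delta for one alias:", refcount_delta())
    print("(alive after del, alive after gc.collect()):", cycle_needs_gc())
    # Size of the bare object header on this build; platform and build dependent.
    print("object.__basicsize__:", object.__basicsize__)
```

The weak reference is what makes the second experiment observable: it lets the code ask whether the object still exists without adding a strong reference that would keep it alive.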
9.2 Mechanical Detail
- dis.dis(fn): disassemble a function. Memorize the common opcodes: LOAD_FAST, STORE_FAST, LOAD_GLOBAL, LOAD_CONST, CALL, RETURN_VALUE, BINARY_OP, COMPARE_OP, FOR_ITER, POP_JUMP_IF_FALSE, LOAD_ATTR, STORE_SUBSCR.
- Why local lookups are fast and global lookups are slow: locals are a fixed-size array indexed by integer (fast locals), globals are a dict lookup, with a second lookup in builtins on a miss. Hot functions often hoist globals to locals (def f(_len=len): ...).
- Frame objects, code objects, and the difference: a code object is the immutable compiled body, a frame is one running invocation of it.
- func.__code__.co_consts, co_names, co_varnames, co_flags.
- The specializing interpreter: read PEP 659 once. On 3.11+, call dis.dis(fn, adaptive=True) after warming fn up to see specialized opcodes.
- Free lists and small-int / interned-string caches.
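The points about fast locals, code-object attributes, and adaptive disassembly can be checked with the dis module. The sketch below is illustrative only: uses_global, uses_local, opnames, and specialized_after_warmup are hypothetical names, the warm-up count of 2000 calls is an arbitrary assumption, and the exact opcode names differ between CPython versions, so the script prints what it finds rather than expecting a fixed listing.

```python
import dis
import sys


def uses_global(items):
    total = 0
    for item in items:
        total += len(item)    # `len` is resolved through globals, then builtins
    return total


def uses_local(items, _len=len):
    total = 0
    for item in items:
        total += _len(item)   # `_len` is a slot in the fast-locals array
    return total


def opnames(fn, adaptive=False):
    """Return the set of opcode names found in fn's bytecode."""
    # The `adaptive` keyword exists only on 3.11+, so pass it only when asked for.
    kwargs = {"adaptive": True} if adaptive else {}
    return {ins.opname for ins in dis.get_instructions(fn, **kwargs)}


def specialized_after_warmup(fn, *args, calls=2000):
    """Run fn repeatedly, then return opcode names seen only in the adaptive view."""
    if sys.version_info < (3, 11):
        return set()          # no specializing interpreter before 3.11
    for _ in range(calls):
        fn(*args)
    return opnames(fn, adaptive=True) - opnames(fn)


if __name__ == "__main__":
    for fn in (uses_global, uses_local):
        code = fn.__code__
        print(fn.__name__)
        print("  co_varnames:", code.co_varnames)   # names stored in fast locals
        print("  co_names:   ", code.co_names)      # names looked up by dict
        print("  co_consts:  ", code.co_consts)
        print("  LOAD_GLOBAL present:", "LOAD_GLOBAL" in opnames(fn))
    print("specialized:", sorted(specialized_after_warmup(uses_global, ["ab", "c"])))
```

Comparing co_names with co_varnames for the two functions shows where len moved: out of the dict-resolved names and into the integer-indexed locals.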
9.3 Lab - "Bytecode Forensics"
- Write three implementations of "sum of squares": a for loop, a sum() + genexp, and numpy.dot(a, a). Run dis.dis on each. Benchmark with timeit. Explain the gap.
- Take a function with a global lookup in its hot loop. Refactor to a default-argument cache. Re-bench. Quantify the win.
- Use sys.settrace with frame.f_trace_opcodes = True to count opcode-level events on a small program (sys.setprofile reports only call and return events). To observe specialization, compare dis.dis(fn, adaptive=True) output before and after warm-up.
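A possible starting point for the first two lab items is sketched below. It is a hedged harness rather than a reference solution: every function name is hypothetical, the repeat and number defaults are arbitrary assumptions, and NumPy is treated as optional so the script still runs without it. No timings are quoted here because they depend on the machine and the interpreter version.

```python
import timeit

try:
    import numpy as np  # optional dependency; the NumPy variant is skipped without it
except ImportError:
    np = None


def sumsq_loop(values):
    total = 0
    for v in values:
        total += v * v
    return total


def sumsq_genexp(values):
    return sum(v * v for v in values)


def sumsq_numpy(array):
    return np.dot(array, array)


def total_len_global(items):
    total = 0
    for item in items:
        total += len(item)     # global/builtin lookup on every iteration
    return total


def total_len_cached(items, _len=len):
    total = 0
    for item in items:
        total += _len(item)    # default-argument cache: a fast local
    return total


def bench(fn, arg, repeat=5, number=200):
    """Best-of-`repeat` wall time in seconds for `number` calls of fn(arg)."""
    return min(timeit.repeat(lambda: fn(arg), repeat=repeat, number=number))


def run_lab(n=1_000):
    values = list(range(n))
    words = ["x" * (i % 7) for i in range(n)]
    timings = {
        "for loop": bench(sumsq_loop, values),
        "sum + genexp": bench(sumsq_genexp, values),
        "global len": bench(total_len_global, words),
        "cached len": bench(total_len_cached, words),
    }
    if np is not None:
        timings["numpy.dot"] = bench(sumsq_numpy, np.arange(n, dtype=np.int64))
    return timings


if __name__ == "__main__":
    for label, seconds in run_lab().items():
        print(f"{label:>14}: {seconds:.6f} s")
```

Taking the minimum over several repeats is a deliberate choice: it tends to filter out scheduler noise, which matters when the effect being measured (one dict lookup per iteration) is small.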
9.4 Idiomatic & Linter Drill
- Enable ruff PERF (the Perflint rule set). Read every rule. Identify cases in your codebase where a rule applies but readability suffers.
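As one concrete instance of what the PERF rules look for, the sketch below shows the loop-and-append shape that rule PERF401 (manual-list-comprehension) targets, next to the comprehension it suggests. Both function names are hypothetical, and whether the rewrite reads better in a given codebase is exactly the judgment call this drill asks for.

```python
def even_squares_loop(values):
    # Loop-and-append pattern of the kind PERF401 is meant to flag.
    result = []
    for v in values:
        if v % 2 == 0:
            result.append(v * v)
    return result


def even_squares_comprehension(values):
    # Equivalent list comprehension: same result, no repeated append call.
    return [v * v for v in values if v % 2 == 0]


if __name__ == "__main__":
    sample = list(range(10))
    print(even_squares_loop(sample) == even_squares_comprehension(sample))
```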
9.5 Production Hardening Slice
- Add pytest-benchmark to CI as a non-failing job that publishes JSON results. Build a script that flags >10% regressions on PRs.
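The regression-flagging script could take roughly the following shape. This is a sketch under stated assumptions: it assumes each JSON report has a top-level "benchmarks" list whose entries carry "fullname" (or "name") and a "stats" mapping with a "mean" in seconds, which should be checked against the files your CI job actually produces. The function names, the two-path command line, and the choice of mean as the compared statistic are all hypothetical.

```python
import json
import sys


def load_means(path):
    """Map benchmark name -> mean runtime in seconds from one JSON report."""
    with open(path, encoding="utf-8") as fh:
        report = json.load(fh)
    # Assumed layout: {"benchmarks": [{"fullname": ..., "stats": {"mean": ...}}, ...]}
    return {
        (b.get("fullname") or b["name"]): b["stats"]["mean"]
        for b in report["benchmarks"]
    }


def find_regressions(baseline, candidate, threshold=0.10):
    """Return {name: relative_slowdown} for benchmarks slower by more than threshold."""
    regressions = {}
    for name, base in baseline.items():
        new = candidate.get(name)
        if new is None or base <= 0:
            continue  # benchmark removed on the PR, or baseline unusable
        change = (new - base) / base
        if change > threshold:
            regressions[name] = change
    return regressions


def main(argv):
    baseline_path, candidate_path = argv[1], argv[2]
    regressions = find_regressions(load_means(baseline_path), load_means(candidate_path))
    for name, change in sorted(regressions.items()):
        print(f"REGRESSION {name}: {change:+.1%} mean time")
    return 0  # non-failing job: report the regression, do not block the PR


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv))
```

Returning 0 unconditionally keeps the job advisory, matching the non-failing requirement; shared CI runners are noisy, so a 10% threshold on a single run is better treated as a prompt to re-measure than as proof of a regression.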