
CPython bytecode compilation

CPython compiles source to a stack-based bytecode that runs on an interpreter loop. Bytecode is cached in .pyc files, and since 3.11 the interpreter speeds itself up at runtime by specializing instructions for the types it actually sees.

CPython is a bytecode interpreter. Source code is tokenized, parsed into an abstract syntax tree (AST), and compiled to bytecode. That bytecode then runs on a stack-based virtual machine whose main loop lives in ceval.c. Default CPython builds have no JIT; 3.13 adds an experimental one that has to be enabled when CPython is built.
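The standard library exposes a Python-level view of each stage, which makes the pipeline easy to poke at. Below is a minimal sketch built around a made-up two-line program. tokenize produces a token stream; it is a Python-facing tokenizer and not necessarily the exact code path the interpreter uses internally. ast.parse runs CPython's parser, compile turns the AST into a code object, and exec runs that object's bytecode.

```python
import ast
import io
import tokenize

# A made-up program used only for illustration.
source = "x = 1 + 2\ny = x * 10\n"

# 1. Tokenizer: source string -> token stream
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_names = [tokenize.tok_name[tok.type] for tok in tokens]
print(token_names[:5])  # ['NAME', 'OP', 'NUMBER', 'OP', 'NUMBER']

# 2. Parser: tokens -> AST
tree = ast.parse(source)
print(ast.dump(tree.body[0]))

# 3. Compiler: AST -> code object holding bytecode
code = compile(tree, "<example>", "exec")

# 4. Interpreter: run the bytecode in a fresh namespace
namespace = {}
exec(code, namespace)
print(namespace["y"])  # 30
```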

The pipeline

Tokenizer. Source string to token stream.
Parser. Tokens to Abstract Syntax Tree. (CPython switched to a PEG parser in 3.9, replacing the old LL(1) one.)
Compiler. AST to bytecode. This covers symbol table analysis (resolving the scope of each name), code generation, and optimization passes such as constant folding.
Interpreter. ceval.c runs the bytecode instruction by instruction.
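The standard library's symtable module shows the result of the symbol-table step. The sketch below uses a made-up nested-function program to display the scope decisions the compiler makes before it generates code. Parameters and assigned names are local. A module-level name that is read but never assigned is global. A name captured from an enclosing function is free in the inner function. These decisions determine the load instruction: plain function locals use LOAD_FAST, globals use LOAD_GLOBAL, and variables shared with a closure (like b here) use LOAD_DEREF.

```python
import symtable

# Made-up example program with a parameter, a local, a global and a closure.
source = """
counter = 0

def outer(a):
    b = a + counter
    def inner():
        return b
    return inner
"""

module = symtable.symtable(source, "<example>", "exec")
outer = module.lookup("outer").get_namespace()  # outer's symbol table
inner = outer.lookup("inner").get_namespace()   # inner's symbol table

print(outer.lookup("a").is_parameter())     # True: a is a parameter of outer
print(outer.lookup("b").is_local())         # True: b is assigned in outer
print(outer.lookup("counter").is_global())  # True: read but never assigned in outer
print(inner.lookup("b").is_free())          # True: inner reads b from outer
```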

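import dis

def f(x, y):
    return x + y

dis.dis(f)
# Output on Python 3.10 (offsets and line numbers omitted):
# LOAD_FAST x
# LOAD_FAST y
# BINARY_ADD
# RETURN_VALUE
# 3.11 and 3.12 add a leading RESUME and replace BINARY_ADD with
# the generic BINARY_OP 0 (+); 3.13 also fuses the two loads into
# a single LOAD_FAST_LOAD_FAST.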
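Opcode names change between CPython releases, so tooling that inspects bytecode usually relies on dis.get_instructions rather than printed listings. That function yields dis.Instruction objects instead of printing. The small sketch below matches names loosely so that it runs on several versions.

```python
import dis

def f(x, y):
    return x + y

# Each Instruction has fields such as opname, arg, argval and offset.
instructions = list(dis.get_instructions(f))
opnames = [ins.opname for ins in instructions]
print(opnames)

# Match loosely, because exact names differ across versions
# (for example LOAD_FAST vs LOAD_FAST_LOAD_FAST, BINARY_ADD vs BINARY_OP).
loads_locals = any(name.startswith("LOAD_FAST") for name in opnames)
adds = "BINARY_ADD" in opnames or "BINARY_OP" in opnames

# The raw, unspecialized bytecode is just bytes on the code object.
raw = f.__code__.co_code
```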

The stack machine

Each bytecode instruction operates on a value stack. LOAD_FAST x pushes the value of the local variable x. BINARY_ADD (BINARY_OP in 3.11+) pops two values, adds them, and pushes the result. RETURN_VALUE pops the top value and returns it to the caller.
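To make the push/pop behaviour concrete, here is a toy stack machine that executes the four-instruction listing for f. It is a hypothetical illustration, not CPython's ceval loop. Instructions are (name, argument) tuples, locals live in a dict, and only the three opcodes from the listing are supported.

```python
# Hypothetical toy stack machine, NOT CPython's interpreter loop.
def run(instructions, local_vars):
    stack = []
    for op, arg in instructions:
        if op == "LOAD_FAST":
            stack.append(local_vars[arg])   # push a local's value
        elif op == "BINARY_ADD":
            right = stack.pop()             # top of stack is the right operand
            left = stack.pop()
            stack.append(left + right)      # push the result
        elif op == "RETURN_VALUE":
            return stack.pop()              # pop the top and return it
        else:
            raise ValueError(f"unknown opcode {op}")
    raise RuntimeError("reached the end without RETURN_VALUE")


program = [
    ("LOAD_FAST", "x"),
    ("LOAD_FAST", "y"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
print(run(program, {"x": 2, "y": 3}))  # 5
```

Popping the right operand first matters for non-commutative operations. With string operands, for example, the result is "ab" rather than "ba".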

There's no register allocation. The compiler is simple because the machine is simple. The cost is more memory traffic; every operation touches the stack.

.pyc files

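The first time a module is imported, CPython compiles it and writes the bytecode to __pycache__/<module>.cpython-3X.pyc (for example, __pycache__/utils.cpython-312.pyc). Later imports load the .pyc directly and skip parsing and compilation. The file's header records three things: a magic number tied to the bytecode format of the Python version, the source file's modification time, and the source file's size. If any of these no longer match, the .pyc is regenerated. The script you run directly (as __main__) is compiled on every run and is not cached.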

This is why importing a module is slower the first time (or right after you edit it) than on later runs.
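You can check the .pyc header fields with the standard library. The sketch below assumes a throwaway module named example.py in a temporary directory. py_compile.compile writes the .pyc to the same path an import would use, and importlib.util.cache_from_source computes that path. Since Python 3.7 (PEP 552) the header is 16 bytes: the magic number, a flags word, and then, in the default timestamp mode, the source mtime and size. The timestamp mode is requested explicitly here because the SOURCE_DATE_EPOCH environment variable would otherwise switch py_compile to hash-based .pyc files.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.py")  # hypothetical throwaway module
    with open(src, "w") as fh:
        fh.write("ANSWER = 42\n")

    # Writes __pycache__/example.<cache tag>.pyc next to the source.
    pyc = py_compile.compile(
        src,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    expected = importlib.util.cache_from_source(src)

    with open(pyc, "rb") as fh:
        header = fh.read(16)
    source_stat = os.stat(src)

magic = header[:4]                                       # version-specific magic number
flags, mtime, size = struct.unpack("<3I", header[4:16])  # little-endian uint32 fields
print(magic == importlib.util.MAGIC_NUMBER, flags)       # True 0
```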

Specializing adaptive interpreter (PEP 659)

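Python 3.11+ ships with a specializing adaptive interpreter. The bytecode the compiler emits is unchanged. At runtime, the interpreter rewrites individual instructions in an internal copy of that bytecode into specialized forms, based on the types it observes. For example, BINARY_OP (which replaced BINARY_ADD in 3.11) becomes BINARY_OP_ADD_INT once the code is warm and both operands have been ints, which skips the generic type dispatch. Each specialized instruction keeps a cheap guard. If the guard fails (say, a float shows up), that execution takes the generic path, and after repeated misses the instruction reverts to its adaptive form and may specialize again.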
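You can watch specialization happen with dis.get_instructions(..., adaptive=True), which is available since 3.11. It reads the interpreter's in-memory, possibly specialized instructions rather than the original bytecode. The sketch below assumes a hypothetical add function that is warmed up with int arguments. The specialized names, and whether specialization happens at all, depend on the CPython version and build, so treat the printed list as illustrative.

```python
import dis
import sys

def add(a, b):
    return a + b

# Warm the function up with ints so the adaptive interpreter
# has a chance to specialize its instructions.
for i in range(1000):
    add(i, 1)

if sys.version_info >= (3, 11):
    # adaptive=True shows the specialized in-memory instructions.
    names = [ins.opname for ins in dis.get_instructions(add, adaptive=True)]
else:
    # Older versions have no adaptive interpreter to inspect.
    names = [ins.opname for ins in dis.get_instructions(add)]

print(names)  # on 3.11 and 3.12 this typically includes 'BINARY_OP_ADD_INT'
```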

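The underlying ideas are inline caching and quickening. This is the same family of techniques that JavaScript engines such as V8 rely on, applied here to a bytecode interpreter without a JIT. Along with other 3.11 work such as cheaper frames and zero-cost exceptions, it made 3.11 between 10% and 60% faster than 3.10, with an average speedup of about 1.25x on the standard benchmark suite.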
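The specialize, guard, and de-specialize cycle can be modeled as a small state machine for each instruction site. This is a hypothetical toy. AdaptiveAddSite, WARMUP and MISS_LIMIT are made-up names and thresholds, and both paths compute left + right in plain Python, so the toy shows no speed difference. CPython's real version is written in C and keeps its counters in inline cache entries stored next to each instruction. The point is only the state transitions.

```python
# Hypothetical toy model of one specializing instruction site.
# WARMUP and MISS_LIMIT are made-up thresholds, not CPython's values.
WARMUP = 2
MISS_LIMIT = 3


class AdaptiveAddSite:
    def __init__(self):
        self.state = "adaptive"  # becomes "int" once specialized
        self.counter = 0         # executions seen while adaptive
        self.misses = 0          # guard failures while specialized

    def execute(self, left, right):
        if self.state == "int":
            # Specialized form: a cheap type guard protects the fast path.
            if type(left) is int and type(right) is int:
                return left + right
            # Guard failed: take the generic path for this execution...
            self.misses += 1
            if self.misses >= MISS_LIMIT:
                # ...and after enough misses, revert to the adaptive form.
                self.state, self.counter, self.misses = "adaptive", 0, 0
            return left + right
        # Adaptive form: count executions, then specialize if the types fit.
        self.counter += 1
        if self.counter >= WARMUP and type(left) is int and type(right) is int:
            self.state = "int"
        return left + right


site = AdaptiveAddSite()
site.execute(1, 2)
site.execute(3, 4)
print(site.state)  # int
for _ in range(MISS_LIMIT):
    site.execute(1.5, 2.5)
print(site.state)  # adaptive
```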

[Figure: CPython compilation pipeline]

The 3.13 JIT

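Python 3.13 includes an experimental copy-and-patch JIT. It is not built by default; you have to configure CPython with --enable-experimental-jit. The JIT does not compile bytecode directly. Instead, it translates traces of micro-ops, derived from hot, specialized bytecode, into machine code by copying pre-compiled templates and patching in constants and addresses. It's a baseline JIT, not an optimizing one. Speedups so far are modest, but the infrastructure is there for future work.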
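The copy-and-patch step itself is simple to illustrate. The sketch below is a hypothetical stand-in for a single template. It uses hand-written x86-64 bytes for mov rax, imm64 followed by ret, with an 8-byte hole where the immediate goes. Emitting code means copying the template and writing the constant into the hole. CPython's real templates are generated when CPython is built, by compiling C code with LLVM, and they patch addresses as well as constants. This sketch only builds the bytes and never executes them.

```python
import struct

# Hypothetical template: x86-64 "mov rax, imm64; ret" with an
# 8-byte hole (zeros) for the immediate value.
STENCIL = bytes.fromhex("48b8" + "00" * 8 + "c3")
HOLE_OFFSET = 2  # the imm64 starts right after the 2-byte opcode 48 b8


def emit_return_constant(value: int) -> bytes:
    code = bytearray(STENCIL)                         # copy the template
    struct.pack_into("<q", code, HOLE_OFFSET, value)  # patch the hole (little-endian)
    return bytes(code)


machine_code = emit_return_constant(42)
print(machine_code.hex())  # 48b82a00000000000000c3
```

Because the template is copied rather than modified, the same template can be patched with different constants for every trace that uses it.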

The model: parse once, compile to stack-based bytecode, interpret. Specialization makes the interpreter faster without breaking the bytecode contract. The JIT is the next step but is still early.

Learn more

Python docs: dis module (docs, python.org)
PEP 659: Specializing Adaptive Interpreter (docs, python.org)
Brandt Bucher and Mark Shannon: The Faster CPython project (talk, PyCon)