CPython bytecode compilation
CPython compiles source to a stack-based bytecode that runs on an interpreter loop. Bytecode is cached in .pyc files, and since 3.11 the interpreter speeds it up at runtime through specialization.
CPython is a bytecode interpreter. Source code is tokenized and parsed into an AST, and the AST is handed to a bytecode compiler. The resulting bytecode runs on a stack-based virtual machine implemented in ceval.c. There is no JIT in default CPython yet, though 3.13 adds an experimental one that is off by default.
The pipeline
- Tokenizer. Source string to token stream.
- Parser. Tokens to Abstract Syntax Tree. (CPython switched to a PEG parser in 3.9, replacing the old LL(1) one.)
- Compiler. AST to bytecode. Symbol table analysis, scope resolution, then code generation.
- Interpreter. ceval.c runs the bytecode instruction by instruction.
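The stages above can be walked through by hand with standard-library modules. This is only a sketch: tokenize, ast.parse, compile and exec are convenient Python-level entry points to each stage, not the internal C functions CPython calls, and the file name "<example>" is a placeholder chosen for illustration.

```python
import ast
import io
import tokenize

source = "def f(x, y):\n    return x + y\n"

# 1. Tokenizer: source string -> token stream.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
print([tok.string for tok in tokens[:4]])  # ['def', 'f', '(', 'x']

# 2. Parser: source -> AST (ast.parse runs the tokenizer internally).
tree = ast.parse(source)
print(type(tree.body[0]).__name__)  # FunctionDef

# 3. Compiler: AST -> code object holding the bytecode.
module_code = compile(tree, filename="<example>", mode="exec")

# 4. Interpreter: running the module code defines f, which can then be called.
namespace = {}
exec(module_code, namespace)
print(namespace["f"](2, 3))  # 5
```

In normal use all four steps happen implicitly on import; splitting them out like this is mainly useful for inspecting what each stage produces.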
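    import dis

    def f(x, y):
        return x + y

    dis.dis(f)
    # Abridged output on CPython 3.10 and earlier:
    #   LOAD_FAST     x
    #   LOAD_FAST     y
    #   BINARY_ADD
    #   RETURN_VALUE
    # On 3.11+ the addition shows up as BINARY_OP with a "+" argument,
    # and newer releases add or fuse a few other instructions.
The stack machine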
Each bytecode instruction operates on a value stack. LOAD_FAST x pushes the value of x. BINARY_ADD pops two values, adds them, pushes the result. RETURN_VALUE pops the top and returns it.
There's no register allocation. The compiler is simple because the machine is simple. The cost is more memory traffic; every operation touches the stack.
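To make the push/pop behaviour concrete, here is a toy interpreter for just the three instructions described above. It is an illustration only: the run function and the (opname, argument) tuple format are invented for this sketch and do not mirror how ceval.c represents or dispatches bytecode.

```python
def run(instructions, local_vars):
    """Execute a list of (opname, argument) pairs on a value stack."""
    stack = []
    for opname, arg in instructions:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])  # push the value of a local variable
        elif opname == "BINARY_ADD":
            right = stack.pop()            # the right operand was pushed last
            left = stack.pop()
            stack.append(left + right)     # push the result
        elif opname == "RETURN_VALUE":
            return stack.pop()             # pop the top of the stack and return it
        else:
            raise ValueError(f"unknown instruction: {opname}")
    raise RuntimeError("program ended without RETURN_VALUE")


program = [
    ("LOAD_FAST", "x"),
    ("LOAD_FAST", "y"),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]
result = run(program, {"x": 2, "y": 3})
print(result)  # 5
```

Every instruction reads or writes the stack, which is the memory traffic mentioned above; in exchange, no instruction needs to name where its operands live.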
.pyc files
The first time you import a module, CPython compiles it and writes the bytecode to __pycache__/<module>.cpython-3X.pyc. Subsequent imports load the .pyc directly, skipping parse and compile. By default the file header records a magic number tied to the bytecode format of that Python version, plus the source file's modification time and size; if any of these no longer match, the .pyc is regenerated.
This is why the first import of a module is slower than later ones.
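The header described above can be inspected directly. The sketch below compiles a throwaway module (demo.py and its contents are made up for the example) with py_compile, explicitly requesting the timestamp-based layout, then reads the 16-byte header and unmarshals the code object that follows it. It assumes the header layout used since Python 3.7: magic, flags, source mtime, source size.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")  # hypothetical module for the demo
    with open(src, "w", encoding="utf-8") as fh:
        fh.write("ANSWER = 6 * 7\n")

    # Compile explicitly, asking for the timestamp-based header layout.
    pyc_path = py_compile.compile(
        src,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    print(os.path.relpath(pyc_path, tmp))  # path inside __pycache__

    with open(pyc_path, "rb") as fh:
        data = fh.read()
    source_size = os.path.getsize(src)

# 16-byte header: magic, then flags, source mtime, source size (little-endian).
magic = data[:4]
flags, mtime, size = struct.unpack("<III", data[4:16])
print(magic == importlib.util.MAGIC_NUMBER, flags, size == source_size)

# Everything after the header is a marshalled code object.
code = marshal.loads(data[16:])
namespace = {}
exec(code, namespace)
print(namespace["ANSWER"])  # 42
```

Because the magic number changes whenever the bytecode format does, a .pyc written by one Python version is generally rejected by another rather than misinterpreted.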
Specializing adaptive interpreter (PEP 659)
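Python 3.11+ ships with a specializing adaptive interpreter. The compiled bytecode is unchanged, but at runtime hot instructions are rewritten in place into variants specialized for the types they observe. For example, the generic BINARY_OP emitted for x + y (3.11 folded BINARY_ADD and its siblings into BINARY_OP) becomes BINARY_OP_ADD_INT once it has repeatedly seen two int operands, replacing generic operator dispatch with a cheap type guard and a direct integer add. If the guard starts failing, the instruction falls back to the generic form.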
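This is the same idea as V8's inline caches, ported to a bytecode interpreter. Together with other interpreter work in the same release, it made 3.11 10-60% faster than 3.10 depending on the workload, about 25% on average on the pyperformance suite.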
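Specialization can be observed from Python with the dis module, which gained an adaptive flag in 3.11. The sketch below lists a function's opcode names before and after a warm-up loop; the helper opnames and the loop count of 1000 are choices made for this example, and the exact specialized names printed depend on the interpreter version. On older versions it falls back to the generic listing.

```python
import dis
import sys


def add(x, y):
    return x + y


def opnames(func, adaptive=False):
    """Return the opcode names for func, optionally in specialized form."""
    if adaptive and sys.version_info >= (3, 11):
        # The adaptive flag was added to the dis module in Python 3.11.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


generic = opnames(add)

# Warm up with int operands so the interpreter has a chance to specialize.
for i in range(1000):
    add(i, 1)

specialized = opnames(add, adaptive=True)

print("generic:    ", generic)
print("specialized:", specialized)
```

On a 3.11+ build the second list would typically show a specialized form of the addition in place of the generic one; calling add with floats or strings afterwards and printing again is a quick way to watch the instruction change.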
The 3.13 JIT
Python 3.13 ships an experimental copy-and-patch JIT (off by default). It generates machine code from specialized bytecode by copying pre-compiled templates and patching constants. It's a baseline JIT, not an optimizing one. Real speedups are modest (5-15%) but the infrastructure is there for future work.
The model: parse once, compile to stack-based bytecode, interpret. Specialization makes the interpreter faster without breaking the bytecode contract. The JIT is the next step but is still early.
Learn more
- Python docs: dis module (python.org)