Python memory model and reference counting
How CPython allocates and frees objects, the refcount mechanism, the generational cycle GC, object interning, arenas and the small-object allocator, and how PEP 703 changes the picture.
The two-layer system
CPython manages memory with two coordinated systems:
- Reference counting handles deterministic deallocation. Every object knows how many references point to it. When the count drops to zero, the object is freed immediately.
- Cycle GC is a backup for the case refcounting can't handle: reference cycles, such as a doubly-linked list, a parent-child tree where children point back to parents, or callbacks that close over their owner. The GC periodically walks the tracked container objects to find unreachable cycles and collect them.
The split is a tradeoff. Refcounting is fast (no scans), deterministic (objects die when you expect), and incremental (no pauses). But it can't break cycles. Pure mark-and-sweep can break cycles but pauses to scan. CPython does both: refcounting in the hot path, GC as a periodic sweep.
The PyObject layout
Every Python object starts with a header:
typedef struct _object {
    Py_ssize_t ob_refcnt;   // Reference count
    PyTypeObject *ob_type;  // Pointer to type info
} PyObject;
So every object pays at least 16 bytes on 64-bit systems for the header, regardless of what it stores. An int such as 0 takes 28 bytes on recent versions (header plus a size field plus the value). A bare object() takes 16 bytes. There's no such thing as a "free" object in Python.
This overhead is one reason Python is memory-heavy compared to C or Rust. A list of a million ints takes tens of megabytes: roughly 28 bytes per boxed int plus an 8-byte pointer per list slot, so about 36 MB. A NumPy array of a million int32 values takes 4 MB (raw memory, no boxing). The boxing cost is real.
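The boxing cost can be measured with the standard library alone. This is a rough sketch: it uses array.array as a stand-in for a NumPy array (an assumption made to avoid a third-party dependency), and sys.getsizeof reports only each object's own size, so the totals are approximate.

```python
import sys
from array import array

n = 1_000_000
boxed = list(range(n))            # a list of pointers to separate int objects
packed = array("i", range(n))     # raw C ints stored inline, typically 4 bytes each

# The list's own pointer array plus every int object it refers to.
# Small cached ints are counted too, so this slightly overestimates.
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(i) for i in boxed)
packed_bytes = sys.getsizeof(packed)

print(f"list of ints: {boxed_bytes / 1e6:.1f} MB")
print(f"array('i'):   {packed_bytes / 1e6:.1f} MB")
```

The exact figures depend on the CPython version and platform, but the boxed list should come out several times larger.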
Reference counting in detail
The basic operations:
- Py_INCREF(obj): ob_refcnt += 1.
- Py_DECREF(obj): ob_refcnt -= 1; if zero, call obj's tp_dealloc to free it.
These macros are inlined everywhere in the CPython source. Every Python operation that returns a reference does Py_INCREF on it. Every operation that no longer needs a reference does Py_DECREF. The bookkeeping is automatic.
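From Python you can watch this bookkeeping with sys.getrefcount. The sketch below compares counts rather than printing absolute values, because the baseline differs between CPython versions (the call itself holds a temporary reference).

```python
import sys

obj = object()
base = sys.getrefcount(obj)    # baseline, including the call's own temporary reference

alias = obj                    # a second name: one more reference
holder = [obj, obj]            # stored twice in a list: two more references
after_add = sys.getrefcount(obj)

del alias                      # drop the extra name
holder.clear()                 # drop both list slots
after_drop = sys.getrefcount(obj)

print(after_add - base, after_drop - base)
```

Each new name or container slot adds one; each removal takes it away again.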
When Py_DECREF brings the count to zero, the object's tp_dealloc is called. This function:
- Calls any __del__ method (the finalizer).
- Decrements refcounts on all objects this one references (this can cascade).
- Returns the object's memory to the allocator.
The cascade is important. Freeing a list also decrements the refcount of every element. If those elements drop to zero too, they're freed, decrementing their referents, and so on. A single del can trigger the deallocation of a large object graph, all without invoking the GC.
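The cascade is observable if each object records its own finalization. In this sketch, Node and the freed list are hypothetical names for illustration. The cycle GC is switched off to show that refcounting alone does the work.

```python
import gc

freed = []                         # names of nodes, in the order they were finalized

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def __del__(self):
        freed.append(self.name)

gc.disable()                       # prove the cycle GC is not involved
try:
    root = Node("root", [Node("a", [Node("a1")]), Node("b")])
    del root                       # one del, four objects finalized
finally:
    gc.enable()

print(freed)
```

The root is finalized first, and its children follow as their refcounts reach zero. The order among siblings is an implementation detail.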
import sys
x = "hello"
print(sys.getrefcount(x))  # varies by version: a small number, or a very large one where interned strings are immortal
class Box:
    def __init__(self):
        self.data = list(range(1000))
b = Box()
sys.getrefcount(b)  # 2 (b itself + getrefcount arg)
del b  # refcount drops to 0, Box deallocated, which triggers list dealloc, which decrements each element and frees the ints that are not cached
Reference counting overhead
Refcount manipulation is constant background work: every assignment, every function call, every iteration step touches refcounts. In single-threaded CPython, this cost is folded into general interpreter overhead and isn't usually a bottleneck.
But:
- It poisons CPU caches when many threads touch the same object (cache line bouncing).
- It makes lock-free data structures hard.
- It interacts with the GIL (atomic refcounts without the GIL would slow single-threaded code).
PEP 703 introduces "biased reference counting" to mitigate this for the free-threaded build. The thread that created an object can use non-atomic refcount updates; other threads use atomic updates. This optimizes the common case (objects are usually accessed mostly by their creating thread).
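A toy model can make the owner/non-owner split concrete. This is only an illustration of the idea, not CPython's implementation, which lives in C and handles cases this sketch ignores. BiasedRefcount, local, and shared are hypothetical names, and a lock stands in for an atomic instruction.

```python
import threading

class BiasedRefcount:
    """Toy model: one counter for the owning thread, one for everyone else."""

    def __init__(self):
        self.owner = threading.get_ident()   # the creating thread owns the object
        self.local = 1                        # touched only by the owner, no locking
        self.shared = 0                       # touched by other threads
        self._lock = threading.Lock()         # stand-in for an atomic update

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1                   # fast path: plain increment
        else:
            with self._lock:
                self.shared += 1              # slow path: synchronized increment

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._lock:
                self.shared -= 1

    def total(self):
        with self._lock:
            return self.local + self.shared

rc = BiasedRefcount()                         # created on the main thread
for _ in range(3):
    rc.incref()                               # owner: fast path

def worker():
    rc.incref()
    rc.incref()
    rc.decref()                               # non-owner: slow path

t = threading.Thread(target=worker)
t.start()
t.join()
print(rc.local, rc.shared, rc.total())
```

The owner never pays for synchronization. Only cross-thread traffic does, which is the case the design bets is rare.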
The cycle collector
Refcounting cannot collect cycles. Consider:
a = {}
b = {}
a['b'] = b
b['a'] = a
del a, b  # Both dicts have refcount 1 from each other; neither is freed
The cycle GC finds these. It works by tracking container objects (anything that can hold a reference to another Python object) in a generational structure.
Generations: 0, 1, 2. New container objects go into generation 0. When the number of tracked allocations minus deallocations since the last collection exceeds a threshold (default 700), generation 0 is collected. Surviving objects move to generation 1. When generation 1 reaches its threshold (default 10 generation-0 collections), it's collected and survivors move to generation 2. Generation 2 is collected the least often. The defaults and some details vary across CPython versions.
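A weak reference makes it possible to see the cycle survive del and then disappear once the collector runs. In this sketch Node and probe are illustrative names, and automatic collection is disabled so the timing is under our control.

```python
import gc
import weakref

class Node:
    pass

gc.disable()                       # keep automatic collection from interfering
try:
    a, b = Node(), Node()
    a.other, b.other = b, a        # a <-> b reference cycle
    probe = weakref.ref(a)         # observes a without keeping it alive

    del a, b
    alive_after_del = probe() is not None   # the cycle still holds both objects

    found = gc.collect()           # run a full collection by hand
    alive_after_gc = probe() is not None
finally:
    gc.enable()

print(alive_after_del, alive_after_gc, found)
print(gc.get_threshold())          # per-generation thresholds; values vary by version
print(gc.get_count())              # current per-generation counters
```

gc.collect() returns the number of unreachable objects it found, which here includes at least the two nodes.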
The algorithm for one generation:
- For each tracked object in the generation, compute its "GC refcount" - the actual refcount minus references coming from objects within this generation.
- Any object with a GC refcount > 0 has an external reference and is reachable. Mark it and its descendants.
- Any unmarked object after this pass is part of an unreachable cycle. Run its finalizer (if any), break the cycle by clearing references, and free.
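The three steps above can be sketched on a tiny hand-built object graph. This is a simplified model, not CPython's C implementation: objects are just names, refcounts and edges are supplied as dictionaries, and find_unreachable is a hypothetical helper.

```python
def find_unreachable(refcounts, edges):
    """refcounts: name -> total refcount; edges: name -> names it references.
    Only names present in refcounts belong to the generation being scanned."""
    tracked = set(refcounts)

    # Step 1: subtract every reference that originates inside the tracked set.
    gc_refs = dict(refcounts)
    for src in tracked:
        for dst in edges.get(src, ()):
            if dst in tracked:
                gc_refs[dst] -= 1

    # Step 2: a count still above zero means an outside reference exists.
    # Those objects, and everything they can reach, are alive.
    reachable = set()
    stack = [name for name in tracked if gc_refs[name] > 0]
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(d for d in edges.get(name, ()) if d in tracked)

    # Step 3: whatever is left is held only by internal references.
    return tracked - reachable

# a <-> b is an orphaned cycle; c <-> d is a cycle, but c also has one outside reference.
refcounts = {"a": 1, "b": 1, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a"], "c": ["d"], "d": ["c"]}
garbage = find_unreachable(refcounts, edges)
print(sorted(garbage))
```

Only the a/b pair is reported: after subtracting internal references, c keeps a count of 1, which rescues d along with it.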
The GC pauses Python while it runs. For most workloads, the pause is short (milliseconds) because the live set is small. For huge heaps with many container objects, GC pauses can be noticeable.
You can tune or disable the GC:
import gc
gc.set_threshold(1000, 20, 20) # gen 0, 1, 2 thresholds
gc.disable() # Stop automatic collection
gc.collect() # Force a full collection
gc.collect(0)  # Collect only generation 0
The small-object allocator (pymalloc)
CPython doesn't call malloc for every object. Small objects (up to 512 bytes by default) use a custom allocator called pymalloc. It allocates large "arenas" from the OS (256 KiB historically, 1 MiB on 64-bit builds since CPython 3.10) and partitions them into pools, which are divided into fixed-size blocks.
This is a slab allocator. Each pool serves one size class. Allocating a 56-byte object pulls a block from the 64-byte pool (sizes are rounded up to a multiple of 8 or 16). Freeing returns the block to the pool's free list. Allocation is O(1) and there's no fragmentation within a size class.
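The rounding rule is simple enough to model in a few lines. This is an illustrative sketch, not pymalloc's code: size_class, alignment, and small_limit are hypothetical names, with defaults taken from the figures quoted above.

```python
def size_class(nbytes, alignment=16, small_limit=512):
    """Return the block size a request is rounded up to, or None when the
    request is outside the small-object range and goes to the system allocator."""
    if nbytes <= 0 or nbytes > small_limit:
        return None
    return (nbytes + alignment - 1) // alignment * alignment

for request in (1, 56, 64, 65, 512, 513):
    print(request, "->", size_class(request))
```

With 16-byte alignment a 56-byte request lands in the 64-byte class. With alignment=8 it would stay at 56.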
The benefits:
- Faster than calling malloc repeatedly (no syscall, no metadata overhead).
- No external fragmentation (each pool is one size).
- Cache-friendly (objects of the same type are near each other).
Large objects (> 512 bytes) fall through to the system allocator (malloc/free).
Object interning
Some objects are cached and shared. Small integers (-5 to 256) are pre-allocated singletons - any reference to 42 is the same object. String constants in source code that look like Python identifiers (ASCII letters, digits, and underscores) are interned automatically at compile time; strings built at runtime generally are not. You can force interning with sys.intern(string).
import sys
a = 256
b = 256
a is b  # True - same cached integer
a = 257
b = 257
a is b  # False at the interactive prompt (CPython implementation detail) - 257 is outside the cache
        # Inside one compiled file, equal constants may be merged, so this can be True
a = "hello"
b = "hello"
a is b  # True - identifier-like, interned
a = sys.intern("hello world")
b = sys.intern("hello world")
a is b  # True - explicitly interned
Interning saves memory for repeated strings and speeds up comparisons (identity check is faster than character-by-character).
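Literals in source code can be merged by the compiler, which muddies identity checks. Values built at runtime show the caches more cleanly. The sketch below is CPython-specific, so treat the identity results as implementation details rather than language guarantees.

```python
import sys

# Integers parsed at runtime: the small-int cache is the only thing that can make them identical.
small_same = int("256") is int("256")

# Strings built at runtime are equal but are not interned automatically.
parts = ["user", "id"]
s1 = "_".join(parts)
s2 = "_".join(parts)
equal = s1 == s2

# sys.intern returns the one canonical copy for a given string value.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
interned_same = i1 is i2

print(small_same, equal, interned_same)
```

After interning, comparing i1 and i2 can short-circuit on identity instead of walking the characters.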
__slots__: skipping the dict
Instances of ordinary classes store their attributes in a per-instance dictionary by default. That dictionary adds memory overhead to every instance, even one with only two attributes; the exact cost depends on the CPython version and has shrunk over time. __slots__ tells Python to use a fixed array instead of a dict:
class Point:
    __slots__ = ('x', 'y')
    def __init__(self, x, y):
        self.x = x
        self.y = y
Memory savings: often severalfold for small classes, though the ratio depends on the CPython version. You also can't add attributes not in __slots__, which catches typos at runtime.
__slots__ interacts with inheritance: subclasses without __slots__ regain the dict. Multiple inheritance with __slots__ is tricky. Use them in leaf classes mostly.
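Both behaviors, the typo check and the subclass regaining a dict, are easy to confirm. The class names in this sketch are illustrative.

```python
class PlainPoint:
    def __init__(self, x, y):
        self.x, self.y = x, y

class SlotPoint:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x, self.y = x, y

class SlotPoint3D(SlotPoint):      # no __slots__ here, so instances get a __dict__ again
    pass

p = SlotPoint(1, 2)
try:
    p.z = 3                        # not declared in __slots__
    typo_caught = False
except AttributeError:
    typo_caught = True

has_dict = [hasattr(cls(1, 2), "__dict__") for cls in (PlainPoint, SlotPoint, SlotPoint3D)]
print(typo_caught, has_dict)
```

To keep the savings in a subclass, it needs its own __slots__ declaration, even an empty one.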
weakref: references that don't pin
A weak reference points to an object without increasing its refcount. The object can be collected even while the weakref exists; the weakref then returns None.
import weakref
class Big: pass
b = Big()
r = weakref.ref(b)
r() # <Big object> - the weakref dereferences to b
del b
r()  # None - b was collected
Use cases:
- Caches that shouldn't prevent collection (WeakValueDictionary).
- Parent backreferences in trees (avoid cycles).
- Observer patterns where the observer shouldn't keep the subject alive.
WeakValueDictionary and WeakKeyDictionary are the common patterns. Entries vanish when their key or value is collected elsewhere.
del and finalization
__del__ is called when the object is about to be destroyed: when its refcount hits zero, or when the cycle GC collects it. It's not a destructor in the C++ sense - you can do cleanup, but you should be quick, and you can't rely on exactly when it runs.
Pre-Python 3.4, cycles containing objects with __del__ were uncollectable. The GC didn't know in what order to call the finalizers and chose to skip them, leaking the cycle. PEP 442 fixed this in 3.4: cycles with __del__ are collected, finalizers run in unspecified order.
The general advice: don't use __del__ for resource cleanup. Use context managers (with blocks) or explicit close() methods. Finalization timing is unreliable; the file might not close when you expect.
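A sketch of the recommended pattern follows: an explicit close() method plus the context-manager protocol, so cleanup runs at a known point even when an exception is raised. Resource is a hypothetical stand-in for a file, socket, or lock.

```python
class Resource:
    """Hypothetical wrapper around something that must be released promptly."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()               # always runs when the with block exits
        return False               # do not swallow exceptions

with Resource() as r:
    open_inside = not r.closed
closed_after = r.closed

try:
    with Resource() as r2:
        raise ValueError("boom")
except ValueError:
    pass
closed_after_error = r2.closed

print(open_inside, closed_after, closed_after_error)
```

Unlike __del__, the release point here does not depend on refcounts, cycles, or interpreter shutdown order.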
Memory profiling
Tools:
- sys.getsizeof(obj): size of one object in bytes (doesn't include referenced objects).
- tracemalloc: built-in, tracks allocations with their stack trace.
- pympler: third-party, more detailed analysis.
- memray: Bloomberg-built, low-overhead, great for production-like workloads.
- objgraph: visualizes object reference graphs to find leaks.
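Of these, tracemalloc needs no installation, so it makes a convenient first stop. A minimal sketch: take a snapshot, run the suspect code, take another, and compare. The retained list here is a deliberate stand-in for a leak.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

retained = [bytes(1000) for _ in range(1000)]   # roughly 1 MB kept alive on purpose

after = tracemalloc.take_snapshot()
tracemalloc.stop()

diffs = after.compare_to(before, "lineno")      # per-source-line differences
for stat in diffs[:3]:
    print(stat)                                 # file:line, size change, count change

grown = sum(stat.size_diff for stat in diffs)
print(f"net growth: {grown / 1e6:.2f} MB")
```

Grouping by "traceback" instead of "lineno" gives full call stacks, at the cost of more overhead.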
For finding cycles:
import gc
gc.set_debug(gc.DEBUG_LEAK)  # Report unreachable objects and keep them in gc.garbage for inspection
gc.collect()
PEP 703 changes
In the free-threaded build:
- Refcount operations are atomic (with biased counting optimization).
- Immortal objects (None, True, False, small ints, interned strings, type objects) have refcount updates skipped entirely. The high bit of ob_refcnt marks them as immortal.
- Container types have per-object locks for thread-safe mutation.
- The cycle GC has additional synchronization to coordinate with running threads.
The single-threaded performance cost is roughly 10%, though the figure varies by release and workload. The multi-threaded scalability gain is the whole point.
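To check which kind of interpreter you are running, the following sketch may help. It assumes a recent CPython: sys._is_gil_enabled() exists only on newer versions (3.13 and later), so the code falls back to assuming the GIL is on when the function is missing.

```python
import sys
import sysconfig

# Py_GIL_DISABLED is set to 1 in the build configuration of free-threaded builds.
free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# A free-threaded build can still run with the GIL re-enabled at runtime,
# so the build flag and the runtime state are reported separately.
check = getattr(sys, "_is_gil_enabled", None)
gil_enabled = check() if check is not None else True

print(f"free-threaded build: {free_threaded_build}, GIL currently enabled: {gil_enabled}")
```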
Mental model
Reference counts are bookkeeping: every operation that takes a reference adds 1; every operation that drops it subtracts 1. When the count hits zero, the object dies, fast and deterministically.
Cycles can't be resolved by refcounts alone. The cycle GC scans periodically to find and break them.
The allocator (pymalloc for small, libc malloc for large) hands out memory in chunks; the refcount system decides when to return it.
You almost never call any of this directly. The interpreter handles it. Your job is to avoid creating leaks (unbounded caches, persistent cycles you didn't intend, closures that pin large state) and to use weakrefs where you want a reference that doesn't extend lifetime.
Learn more
- Python docs: Memory Management C API (python.org)
- Python docs: gc module (python.org)
- PEP 442: Safe object finalization (python.org)