V8 engine internals
Pipeline from source to optimized machine code, the shape and inline cache system that makes dynamic property access fast, and the deoptimization machinery that handles broken assumptions.
Why a tiered system
A single compiler has to make a tradeoff: compile slowly and produce fast code, or compile quickly and produce slow code. Browsers cannot afford either extreme. A user clicks a link and expects the page to be interactive in under a second, so you cannot spend 500ms compiling. But once the page is interactive, the same code might run for hours, and you cannot afford to run it at interpreter speed.
V8's answer is to do both, and switch between them based on how hot the code is. Cold code runs in the interpreter. Warm code gets a fast baseline compile. Hot code gets a mid-tier optimization. Burning-hot code gets the top-tier optimizer. Each tier has a different compile time and runtime cost tradeoff.
The current pipeline (as of 2024):
- Parser: source to AST, with lazy parsing for functions not yet called.
- Ignition: AST to bytecode, then interprets the bytecode. Collects type feedback.
- Sparkplug: bytecode to unoptimized machine code with no IR pass. Nearly free to compile, ~5-15% faster than Ignition.
- Maglev: bytecode plus type feedback to optimized machine code via a single SSA-based pass. Slower compile, ~2x faster than Sparkplug.
- TurboFan: bytecode plus feedback to highly optimized code with a graph-based optimizer (sea of nodes). Slow compile, ~10x faster than Ignition.
Tier-up is triggered by interrupt checks on loop backedges and function entries. Every iteration of a loop increments a counter; when it crosses a threshold, the engine schedules the function for the next tier.
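The counter mechanism can be made concrete with a toy model in JavaScript. This is a sketch, not V8's implementation. The tier names come from the list above, but ToyFunction, tick(), and the TIER_THRESHOLDS values are hypothetical. Real V8 uses budgets tuned by function size and compiles the optimizing tiers on background threads, not instantly as here.

```javascript
// Toy model of counter-driven tier-up. Thresholds are hypothetical;
// V8's real heuristics differ and change between versions.
const TIERS = ["Ignition", "Sparkplug", "Maglev", "TurboFan"];
const TIER_THRESHOLDS = [10, 100, 1000]; // hypothetical ticks needed to leave each tier

class ToyFunction {
  constructor(name) {
    this.name = name;
    this.tier = 0;    // index into TIERS; everything starts in the interpreter
    this.counter = 0; // bumped at each simulated backedge or call
  }

  // Called at each interrupt check point (loop backedge or function entry).
  tick() {
    this.counter++;
    const threshold = TIER_THRESHOLDS[this.tier];
    if (threshold !== undefined && this.counter >= threshold) {
      this.tier++;      // move to the next tier (instantly, in this model)
      this.counter = 0; // start counting again for the new tier
    }
  }

  get tierName() {
    return TIERS[this.tier];
  }
}

const hot = new ToyFunction("hotLoop");
for (let i = 0; i < 2000; i++) hot.tick();
console.log(hot.tierName); // prints "TurboFan"
```

Resetting the counter at each tier means a function has to keep proving itself hot before it earns the next, more expensive compile.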
Hidden classes (Shapes, Maps)
JavaScript objects are syntactically dynamic: you can add a property at any time. But V8 wants to access properties at a known memory offset, like a C struct. The bridge between these is the hidden class system.
Every object has a hidden class. When you write {} you get an empty hidden class. When you add obj.x = 1, V8 transitions the object to a new hidden class that has x at offset 0. Add obj.y = 2 and you transition again to a class with x at offset 0 and y at offset 1.
Multiple objects that go through the same transition chain share the same hidden class. This is the whole point: if a million objects all have {x, y, z} in that order, they share one hidden class, and the engine can compile code that accesses them at fixed offsets.
```javascript
function Point(x, y) {
  this.x = x;
  this.y = y;
}
const a = new Point(1, 2); // Hidden class A
const b = new Point(3, 4); // Hidden class A (shared)
b.z = 5;                   // b transitions to hidden class B

function bad(x, y) {
  if (x > 0) { this.x = x; this.y = y; }
  else { this.y = y; this.x = x; }
}
// Conditional order = two different hidden classes for the same conceptual shape
const c = new bad(1, 2);  // adds x, then y
const d = new bad(-1, 2); // adds y, then x
```

Hidden class transitions are tracked in a transition tree. The root is the empty class. Each property addition, keyed by property name and attributes, is an edge. Following a transition is cheap (a few instructions), but having many similar-but-distinct hidden classes is expensive because inline caches degrade.
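The transition tree can be sketched as a small data structure. This is a toy model with stated assumptions: ToyShape and shapeFor are hypothetical names, each edge is keyed only by property name, and slot offsets are assigned in insertion order. Real V8 maps also track property attributes, field representations, and in-object slack, and none of that appears here.

```javascript
// Toy hidden-class transition tree. Each ToyShape maps property names to
// slot offsets; adding a property follows an existing edge or creates one.
class ToyShape {
  constructor(parent, name) {
    this.offsets = new Map(parent ? parent.offsets : []);
    if (name !== undefined) this.offsets.set(name, this.offsets.size);
    this.transitions = new Map(); // property name -> child ToyShape
  }

  // Reusing edges is what lets objects built the same way share a shape.
  addProperty(name) {
    let child = this.transitions.get(name);
    if (!child) {
      child = new ToyShape(this, name);
      this.transitions.set(name, child);
    }
    return child;
  }
}

const ROOT = new ToyShape(null); // the empty shape, like {}

// Hypothetical helper: the shape reached by adding `names` in order.
function shapeFor(names) {
  return names.reduce((shape, name) => shape.addProperty(name), ROOT);
}

const xy1 = shapeFor(["x", "y"]);
const xy2 = shapeFor(["x", "y"]);
const yx = shapeFor(["y", "x"]);
console.log(xy1 === xy2); // true: same transition chain, shared shape
console.log(xy1 === yx);  // false: same properties, different order
console.log(xy1.offsets.get("y"), yx.offsets.get("y")); // 1 0
```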
Inline caches (ICs)
An inline cache is a small piece of state attached to a property access site. The first time obj.x runs, the cache is empty (uninitialized). V8 does a slow lookup, finds x at offset N for shape S, and records "(S, N)" in the cache. The cache is now monomorphic.
Next call: V8 checks whether obj has shape S. If it does, V8 loads the value at offset N. That is one comparison plus one load, which no hash-table lookup can match.
If a second shape T shows up, the cache becomes polymorphic: it stores up to four (shape, offset) pairs and does a small linear scan. Still fast, but slower than monomorphic. If a fifth shape arrives, the cache goes megamorphic and falls back to a generic property lookup through a global hash table shared by all megamorphic sites (V8's megamorphic stub cache). This is significantly slower.
State machine: uninitialized -> monomorphic -> polymorphic (2-4) -> megamorphic (5+). Once megamorphic, the cache rarely recovers.
Why this matters: a function that operates on one shape is fast. A function that operates on many shapes is slow, where "many" means more than four. Library code that handles every possible input often goes megamorphic and stays slow. Application code that handles one or two shapes stays monomorphic or lightly polymorphic and gets the full JIT benefit.
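The state machine fits in a few lines. In this hypothetical ToyInlineCache, shapes are compared by identity and the slow lookup is a function supplied by the caller. The four-entry limit mirrors the text. Everything else (names, data layout, what a megamorphic site does internally) simplifies what V8 actually does.

```javascript
// Toy model of one inline cache site. MAX_POLYMORPHIC mirrors the
// four-entry limit described above; everything else is simplified.
const MAX_POLYMORPHIC = 4;

class ToyInlineCache {
  constructor() {
    this.state = "uninitialized";
    this.entries = []; // cached { shape, offset } pairs
  }

  // `slowLookup(shape)` stands in for the generic property lookup.
  load(shape, slowLookup) {
    if (this.state === "megamorphic") return slowLookup(shape);
    const hit = this.entries.find((e) => e.shape === shape);
    if (hit) return hit.offset; // fast path: compare shape, use cached offset
    const offset = slowLookup(shape); // miss: do the slow lookup once
    if (this.entries.length < MAX_POLYMORPHIC) {
      this.entries.push({ shape, offset });
      this.state = this.entries.length === 1 ? "monomorphic" : "polymorphic";
    } else {
      this.entries = []; // fifth shape: stop caching at this site
      this.state = "megamorphic";
    }
    return offset;
  }
}

// Shapes are modeled as arrays of property names; the offset of "x" is its index.
const lookupX = (shape) => shape.indexOf("x");
const ic = new ToyInlineCache();
const shapes = [["x"], ["y", "x"], ["z", "x"], ["w", "x"], ["v", "x"]];
const states = shapes.map((shape) => {
  ic.load(shape, lookupX);
  return ic.state;
});
console.log(states.join(" -> "));
// monomorphic -> polymorphic -> polymorphic -> polymorphic -> megamorphic
```

The megamorphic branch never re-enters the cached states. That one-way step is the model's version of "once megamorphic, the cache rarely recovers."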
Speculative optimization
TurboFan does not just optimize the bytecode. It optimizes based on assumptions gathered from runtime feedback. If Ignition saw that add(a, b) was always called with two small integers, TurboFan generates code that assumes that, with a guard at the entry. The guard checks the assumption. If it holds, the optimized code runs. If it fails, the code deoptimizes: control jumps back to the interpreter, the optimized version is discarded, and execution continues in Ignition.
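The guard-and-fallback pattern can be sketched in plain JavaScript. This is a toy under stated assumptions: makeSpeculativeAdd is a hypothetical name, Smis are assumed to be 31 bits, and a single flag stands in for "the optimized code". Real TurboFan emits a check per speculative operation, rebuilds interpreter frames on deopt, and may re-optimize later, and this model covers none of that.

```javascript
// Toy model of speculation: a fast path guarded by type checks, and a
// one-way "deopt" to the generic path when a guard fails.
// Assumes 31-bit Smis; makeSpeculativeAdd is a hypothetical name.
const SMI_MIN = -(2 ** 30);
const SMI_MAX = 2 ** 30 - 1;
const isSmi = (v) =>
  Number.isInteger(v) && !Object.is(v, -0) && v >= SMI_MIN && v <= SMI_MAX;

function makeSpeculativeAdd() {
  const state = { optimized: true, deopts: 0 };

  function add(a, b) {
    if (state.optimized) {
      if (isSmi(a) && isSmi(b)) {
        const sum = a + b;
        if (isSmi(sum)) return sum; // guards held: the "optimized" result
      }
      // A guard failed (wrong type or overflow): drop the fast path.
      state.optimized = false;
      state.deopts++;
    }
    return a + b; // generic path: full + semantics (strings, doubles, ...)
  }

  return { add, state };
}

const { add, state } = makeSpeculativeAdd();
add(1, 2);     // 3 via the fast path
add("a", "b"); // "ab", but the guard fails and the function deopts
console.log(state.optimized, state.deopts); // false 1
```

Overflow is a guard failure too. Adding 1 to the largest Smi leaves the Smi range, so even all-integer code can deopt.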
Common assumptions:
- Argument types (always int32, always string, always a specific shape)
- Array element types (PACKED_SMI_ELEMENTS, PACKED_DOUBLE_ELEMENTS, PACKED_ELEMENTS, HOLEY_*)
- Global property values (assumed constant if not written since startup)
- Function call targets (devirtualized via the call site IC)
Deopts are not just slow. They can cause functions to oscillate between tiers, never settling on optimized code. V8 has a "deopt limit" per function: too many deopts and the function is marked "do not optimize." From then on it runs at Sparkplug speed forever.
You can see this with --trace-deopt and --trace-opt flags on Node. In production code, the symptom is "this function used to be fast and now it is slow after we added a new code path." The new path triggered a deopt and the function never recovered.
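The script below gives those flags something to report. Whether and when V8 optimizes add, and whether the later string call deopts it, depends on the engine version and its heuristics. For example, add may simply be inlined into the loop. Treat the trace as something to read, not a guaranteed result. The file name deopt-demo.js is only an example.

```javascript
// Run with: node --trace-opt --trace-deopt deopt-demo.js
// Warm up `add` with small integers so V8 can speculate on them, then
// pass a string to break that assumption. Trace output varies by version.
function add(a, b) {
  return a + b;
}

let total = 0;
for (let i = 0; i < 1e6; i++) {
  total = add(total, 1) % 1000; // hot path: small integers only
}

// A "new code path": a string argument violates the integer speculation.
const label = add("count: ", total);
console.log(label); // prints "count: 0"
```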
Hidden class pitfalls
Order-sensitive property addition. The classic bug:
```javascript
function makeUser(data) {
  const u = {};
  if (data.email) u.email = data.email;
  if (data.name) u.name = data.name;
  if (data.age) u.age = data.age;
  return u;
}
```

Eight possible hidden classes (one for each subset of the three fields) from one factory. Code that consumes the result goes polymorphic immediately, and megamorphic once more than four of those shapes show up.
Fix: always assign all fields, using sentinels for missing ones, or use a class declaration. Classes give V8 a strong hint about the final shape and let it predict transitions.
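Both fixes can be sketched against the makeUser factory. Using null as the "missing" sentinel is an assumption, and any consistent value works. The || checks mirror the original's truthiness tests, so falsy inputs such as age 0 also become null. Object.keys only shows property order and does not prove V8 assigned one hidden class. A single key order across all results is, however, what makes sharing one possible.

```javascript
// Fix 1: assign every field in the same order, with null as the
// (assumed) sentinel for missing values. The || mirrors the original
// truthiness checks, so falsy inputs also become null.
function makeUser(data) {
  return {
    email: data.email || null,
    name: data.name || null,
    age: data.age || null,
  };
}

// Fix 2: class fields declare the full shape before the constructor runs.
class User {
  email = null;
  name = null;
  age = null;

  constructor(data) {
    if (data.email) this.email = data.email;
    if (data.name) this.name = data.name;
    if (data.age) this.age = data.age;
  }
}

const a = makeUser({ name: "Ada" });
const b = makeUser({ email: "b@example.com", age: 36 });
const c = new User({ age: 5 });
console.log(Object.keys(a).join(","), Object.keys(b).join(","), Object.keys(c).join(","));
// email,name,age email,name,age email,name,age
```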
Delete operator. delete obj.x is the kiss of death for ICs. The object transitions to "dictionary mode" (a real hash map) and loses all the fast property access. Set to undefined if you want to clear a value. Use a separate "deleted keys" set if you need true deletion semantics.
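Here are the alternatives in plain JavaScript. The config and sessions objects and the hasLive helper are hypothetical. Clearing with undefined keeps the property, and therefore the shape, in place. A Map, or a side set of deleted keys, gives real deletion semantics without reshaping objects.

```javascript
// Clearing instead of deleting: the property stays, so the object's
// shape does not change. `config` is a hypothetical example object.
const config = { host: "localhost", port: 8080, debug: true };
config.debug = undefined;
console.log("debug" in config, config.debug); // true undefined

// True deletion semantics without reshaping objects: a Map.
const sessions = new Map([["alice", 1], ["bob", 2]]);
sessions.delete("bob");
console.log(sessions.has("bob"), sessions.size); // false 1

// Or a separate set of "deleted" keys next to a fixed-shape object.
// hasLive is a hypothetical helper, not a built-in.
const deletedKeys = new Set(["port"]);
const hasLive = (obj, key) => key in obj && !deletedKeys.has(key);
console.log(hasLive(config, "port"), hasLive(config, "host")); // false true
```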
Array element kinds. V8 has multiple internal representations for arrays:
- PACKED_SMI_ELEMENTS: dense array of small integers (the fastest)
- PACKED_DOUBLE_ELEMENTS: dense array of floats
- PACKED_ELEMENTS: dense array of arbitrary values
- HOLEY_*: same, but with holes (skipped indices)
Transitions go from packed to holey and from smi to double to elements. They are one-way. Once an array is HOLEY_ELEMENTS, it stays that way. Adding arr[1000] = x to a 10-element array makes it holey. Mixing integers and floats makes it doubles.
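The one-way transitions can be modeled as a tiny lattice. This sketch covers only the six kinds listed above. Real V8 has more, such as dictionary and frozen kinds. kindOfValue, transition, and kindName are hypothetical helpers, and the Smi range assumes 31-bit Smis. Note that -0 counts as a double, since a Smi cannot represent it.

```javascript
// Toy model of the one-way elements-kind lattice described above.
// Only the six kinds in the list are modeled; helper names are hypothetical.
const GENERALITY = { SMI: 0, DOUBLE: 1, ELEMENTS: 2 };

// Classify a stored value. Assumes 31-bit Smis; -0 is a double because
// a Smi cannot represent negative zero.
function kindOfValue(v) {
  if (Number.isInteger(v) && !Object.is(v, -0) && v >= -(2 ** 30) && v < 2 ** 30) {
    return "SMI";
  }
  return typeof v === "number" ? "DOUBLE" : "ELEMENTS";
}

// Apply one store. `makesHole` is true when the store skips indices.
// Both dimensions only move toward more general, never back.
function transition(kind, value, makesHole) {
  const valueBase = kindOfValue(value);
  const base = GENERALITY[valueBase] > GENERALITY[kind.base] ? valueBase : kind.base;
  return { holey: kind.holey || makesHole, base };
}

function kindName({ holey, base }) {
  const prefix = holey ? "HOLEY" : "PACKED";
  return base === "ELEMENTS" ? `${prefix}_ELEMENTS` : `${prefix}_${base}_ELEMENTS`;
}

let kind = { holey: false, base: "SMI" }; // like [1, 2, 3]
const history = [kindName(kind)];
for (const [value, makesHole] of [
  [4.5, false], // mixing in a float: SMI -> DOUBLE
  ["x", false], // a non-number: DOUBLE -> ELEMENTS
  [7, true],    // like arr[1000] = 7 on a short array: PACKED -> HOLEY
  [1, false],   // a plain integer again: no way back
]) {
  kind = transition(kind, value, makesHole);
  history.push(kindName(kind));
}
console.log(history.join(" -> "));
// PACKED_SMI_ELEMENTS -> PACKED_DOUBLE_ELEMENTS -> PACKED_ELEMENTS -> HOLEY_ELEMENTS -> HOLEY_ELEMENTS
```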
Memory layout
V8 stores values on the heap as "tagged" words. Heap objects are aligned to at least 4 bytes, so the lowest two bits of a real pointer are always zero, and V8 reuses them as tags. A tagged value can be:
- A small integer (Smi) stored inline, with a low tag bit of 0. With pointer compression a Smi has a 31-bit payload; on 64-bit builds without it, the payload is 32 bits.
- A pointer to a heap object (string, array, plain object, function), with a low tag bit of 1.
The "pointer compression" feature (on by default in Chrome since 2020, though standard Node.js builds leave it off) stores tagged fields as 32-bit offsets into a 4GB heap region, halving the size of every pointer field. The full pointer is recovered at load time by adding the region's base address.
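The tagging scheme and compression can be sketched with BigInt values standing in for machine words. The model assumes 31-bit Smis, a low tag bit of 0 for Smis and 1 for heap-object pointers, and a hypothetical 4GB-aligned HEAP_BASE, and each assumption is labeled in the code. Compression here truncates to the low 32 bits, and decompression adds the base back. The real engine handles further details this model ignores.

```javascript
// Toy model of value tagging and pointer compression, with BigInt
// "machine words". Assumptions: 31-bit Smis, Smi tag = low bit 0,
// heap-object tag = low bit 1, and a hypothetical 4GB-aligned HEAP_BASE.
const SMI_BITS = 31;

function encodeSmi(n) {
  const v = BigInt(n);
  const min = -(1n << BigInt(SMI_BITS - 1));
  const max = (1n << BigInt(SMI_BITS - 1)) - 1n;
  if (v < min || v > max) throw new RangeError(`${n} does not fit in a Smi`);
  return v << 1n; // shift left: the low bit becomes the 0 Smi tag
}

const isSmi = (word) => (word & 1n) === 0n;
const decodeSmi = (word) => Number(word >> 1n); // arithmetic shift keeps the sign

// Heap addresses are aligned, so bit 0 is free to carry the tag.
const tagHeapObject = (address) => address | 1n;

const HEAP_BASE = 0x7f0000000000n; // hypothetical; a multiple of 2^32

// Compression keeps only the low 32 bits; decompression adds the base back.
const compress = (word) => BigInt.asUintN(32, word);
const decompress = (compressed) => HEAP_BASE + compressed;

const obj = tagHeapObject(HEAP_BASE + 0x1230n);
console.log(isSmi(encodeSmi(-7)), decodeSmi(encodeSmi(-7))); // true -7
console.log(isSmi(obj), compress(obj).toString(16), decompress(compress(obj)) === obj);
// false 1231 true
```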
Strings have several representations: SeqString (contiguous characters), ConsString (lazy concatenation of two strings), SlicedString (lazy substring of another string), and ExternalString (one-byte or UTF-16 data owned outside V8). The ConsString optimization is why a + b + c + d is faster than you might expect: no intermediate copies, just a tree of references. The tree is flattened into one contiguous string the first time an operation needs direct character access, such as reading a character.
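The ConsString idea can be sketched as a rope. ToyConsString is a hypothetical class, not V8's data structure. Its concat links two pieces in constant time, and the first charAt flattens the whole tree once and caches the result. V8 additionally keeps very short concatenations flat, which this model skips.

```javascript
// Toy rope modeled on the ConsString idea: concatenation builds a tree of
// references, and the first character read flattens it into one string.
class ToyConsString {
  constructor(left, right) {
    this.left = left; // a string or a ToyConsString
    this.right = right;
    this.length = left.length + right.length;
    this.flat = null; // cached flattened string
  }

  concat(other) {
    return new ToyConsString(this, other); // O(1): no characters copied
  }

  flatten() {
    if (this.flat === null) {
      // Iterative walk so deep trees do not overflow the call stack.
      const parts = [];
      const stack = [this];
      while (stack.length > 0) {
        const node = stack.pop();
        if (typeof node === "string") parts.push(node);
        else if (node.flat !== null) parts.push(node.flat);
        else stack.push(node.right, node.left); // left is popped first
      }
      this.flat = parts.join("");
      this.left = this.right = null; // drop the tree once flattened
    }
    return this.flat;
  }

  charAt(i) {
    return this.flatten().charAt(i); // reading a character forces flattening
  }
}

const s = new ToyConsString("Hello", ", ").concat("world").concat("!");
console.log(s.length, s.flat === null); // 13 true
console.log(s.charAt(7), s.flat);       // w Hello, world!
```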
Garbage collection coupling
V8's GC (Orinoco) is generational and concurrent. Young generation uses a scavenger (semi-space copying). Old generation uses mark-compact with concurrent marking. We covered GC in the memory chapter, but the V8-specific detail is that GC interacts with the JIT through write barriers. Compiled code must tell the GC about pointer stores: a store that creates an old-to-new reference has to be recorded in the remembered set, and stores made while concurrent marking is running have to be reported to the marker. TurboFan inserts these barriers automatically. Sparkplug, which emits machine code from per-bytecode templates rather than through an optimizing IR, has them baked into those templates.
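A generational write barrier can be sketched in a few lines. Everything here is hypothetical and simplified. ToyObject carries an explicit generation label, and writeField is the only way to store a field. The remembered set records whole holder objects, whereas V8 records individual slots and also has a marking barrier for concurrent marking, which this model omits.

```javascript
// Toy generational write barrier. Every field store goes through
// writeField, which records old-to-new references so a young-generation
// collection can find them without scanning the whole old generation.
class ToyObject {
  constructor(generation) {
    this.generation = generation; // "young" or "old"
    this.fields = new Map();
  }
}

const rememberedSet = new Set(); // old objects that may point into young space

function writeField(target, name, value) {
  target.fields.set(name, value);
  // The barrier: only old -> young pointer stores need recording.
  if (
    target.generation === "old" &&
    value instanceof ToyObject &&
    value.generation === "young"
  ) {
    rememberedSet.add(target);
  }
}

const oldCache = new ToyObject("old");
const youngEntry = new ToyObject("young");
const youngHolder = new ToyObject("young");

writeField(youngHolder, "next", youngEntry); // young -> young: not recorded
writeField(oldCache, "latest", youngEntry);  // old -> young: recorded
writeField(oldCache, "count", 42);           // primitive value: not recorded
console.log(rememberedSet.size, rememberedSet.has(oldCache)); // 1 true
```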
What to do as a developer
The list is short and boring:
- Write code with stable types. A variable that holds a string should always hold a string. A function parameter should have a consistent type.
- Initialize objects with all their fields up front, in the same order. Use classes when the shape is fixed.
- Avoid delete. Set to undefined or use a Map.
- Avoid arguments. Use rest parameters.
- Avoid eval and with. They poison scope analysis and prevent inlining.
- Avoid try/catch in hot loops only on older V8. (Modern V8 handles it fine, but the assumption sometimes lingers in old advice.) The real cost today is when the try body inhibits an optimization the engine could otherwise do.
- Profile with --prof, and run with --trace-opt and --trace-deopt if a function is suspiciously slow. The trace output tells you when functions get optimized and when they get deopted.
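This is the arguments advice in practice. Both functions below sum their inputs, but the rest-parameter version receives a real Array, while arguments is an array-like object. The sketch shows the style change only and makes no claim about how much faster either version runs on a given V8.

```javascript
// Before: the legacy `arguments` object (array-like, not a real Array).
function sumLegacy() {
  let total = 0;
  for (let i = 0; i < arguments.length; i++) total += arguments[i];
  return total;
}

// After: rest parameters give a real Array with array methods.
function sum(...values) {
  return values.reduce((acc, v) => acc + v, 0);
}

console.log(sumLegacy(1, 2, 3), sum(1, 2, 3)); // 6 6
```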
Why this design wins
JavaScriptCore (Safari) and SpiderMonkey (Firefox) made the same choice: tiered JITs with speculative optimization. Every modern JS engine looks like V8 because the problem constraints (dynamic typing, fast startup, hot loops) force a similar shape of solution. The differences are in the tuning: how many tiers, what the IRs look like, which assumptions to speculate on.
What V8 does better than most is the startup story. Ignition's bytecode is compact and easy to cache. Sparkplug compiles fast enough that V8 can afford to baseline-compile code after very little warm-up. Lazy parsing skips function bodies until first call. Code caching saves bytecode between page loads. The result is that V8 starts fast and ramps up smoothly. That is a large part of why Chrome ate the world and Node ate the backend.