WebSocket protocol
Full protocol walkthrough: handshake, framing, masking, control frames, extensions, and scaling considerations.
Why WebSocket exists
HTTP is half-duplex request-response. Before WebSocket, real-time apps used long polling, Comet, or Adobe Flash sockets. All were workarounds. WebSocket (RFC 6455, 2011) standardized full-duplex framing over a single TCP connection.
The key constraint: it had to tunnel through existing HTTP infrastructure. Most corporate proxies only allow HTTP and HTTPS. So WebSocket starts as HTTP and upgrades, allowing it to traverse the same port (80 or 443) and the same proxy rules.
The opening handshake
The client initiates with an HTTP/1.1 GET that includes specific headers signaling the upgrade.
GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Protocol: chat.v1
Sec-WebSocket-Extensions: permessage-deflate
Origin: https://example.com
Headers in detail:
- Upgrade: websocket: triggers the protocol switch.
- Connection: Upgrade: tells proxies this is an upgrade, not a normal request.
- Sec-WebSocket-Key: 16-byte random nonce, base64.
- Sec-WebSocket-Version: 13: only valid version in current spec.
- Sec-WebSocket-Protocol: optional subprotocol list, server picks one.
- Sec-WebSocket-Extensions: optional, advertises permessage-deflate etc.
- Origin: browsers send this; servers should validate.
Server replies:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat.v1
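Sec-WebSocket-Accept is base64(SHA1(Sec-WebSocket-Key + "258EAFA5-E914-47DA-95CA-C5AB0DC85B11")). The magic string is the GUID from RFC 6455. A matching value proves the server actually processed this WebSocket handshake rather than returning a cached or unrelated HTTP response; it is not a security mechanism.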
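The accept computation above can be sketched in a few lines of Python. The function name compute_accept is illustrative only; the key is the sample key from the request shown earlier.

```python
import base64
import hashlib

# GUID fixed by RFC 6455; it is appended to the client's key before hashing.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key from the handshake request above.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

A client would run the same computation on the key it sent and compare the result with the header in the 101 response.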
After the 101 response, the TCP connection is no longer carrying HTTP. Both sides begin sending WebSocket frames.
Frame format
  0                   1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-------+-+-------------+-------------------------------+
 |F|R|R|R| opcode|M| Payload len |    Extended payload length    |
 |I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
 |N|V|V|V|       |S|             |   (if payload len==126/127)   |
 | |1|2|3|       |K|             |                               |
 +-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
 |     Extended payload length continued, if payload len == 127  |
 + - - - - - - - - - - - - - - - +-------------------------------+
 |                               |Masking-key, if MASK set to 1  |
 +-------------------------------+-------------------------------+
 | Masking-key (continued)       |          Payload Data         |
 +-------------------------------- - - - - - - - - - - - - - - - +
 :                     Payload Data continued ...                :
 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
 |                     Payload Data continued ...                |
 +---------------------------------------------------------------+
- FIN: 1 if this is the last frame of a logical message.
- RSV1-3: extension bits (permessage-deflate uses RSV1).
- Opcode: 0x0 continuation, 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong.
- MASK: 1 if payload is masked. Always 1 for client-to-server, always 0 for server-to-client.
- Payload length: 7-bit, or 16-bit if 126, or 64-bit if 127.
- Masking key: 4 bytes, used to XOR the payload.
Minimum header: 2 bytes, for an unmasked (server-to-client) frame with a payload of 125 bytes or less. Maximum payload: 2^63 - 1 bytes, since the top bit of the 64-bit length must be 0 (effectively unlimited, but applications usually fragment).
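As a rough illustration of the length encoding described above, the sketch below builds and parses a frame header. The names Header, build_header, and parse_header are made up for this example, and RSV bits are always left at 0 (no extensions).

```python
import struct
from typing import NamedTuple, Optional

class Header(NamedTuple):
    fin: bool
    opcode: int
    masked: bool
    payload_len: int
    header_len: int  # bytes used by the header, including any masking key

def build_header(opcode: int, payload_len: int, fin: bool = True,
                 mask_key: Optional[bytes] = None) -> bytes:
    first = (0x80 if fin else 0) | (opcode & 0x0F)  # FIN bit + opcode, RSV1-3 = 0
    mask_bit = 0x80 if mask_key else 0
    if payload_len <= 125:
        head = struct.pack("!BB", first, mask_bit | payload_len)
    elif payload_len <= 0xFFFF:
        head = struct.pack("!BBH", first, mask_bit | 126, payload_len)  # 16-bit length
    else:
        head = struct.pack("!BBQ", first, mask_bit | 127, payload_len)  # 64-bit length
    return head + (mask_key or b"")

def parse_header(data: bytes) -> Header:
    first, second = data[0], data[1]
    length = second & 0x7F
    offset = 2
    if length == 126:
        (length,) = struct.unpack_from("!H", data, offset)
        offset += 2
    elif length == 127:
        (length,) = struct.unpack_from("!Q", data, offset)
        offset += 8
    masked = bool(second & 0x80)
    if masked:
        offset += 4  # the 4-byte masking key follows the length field
    return Header(bool(first & 0x80), first & 0x0F, masked, length, offset)

print(build_header(0x1, 5).hex())                            # smallest case: 2 bytes
print(len(build_header(0x2, 70000, mask_key=b"\x01\x02\x03\x04")))  # largest case
```

The two printed cases correspond to the 2-byte minimum and the 14-byte maximum header (2 + 8 bytes of extended length + 4 bytes of masking key).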
Why masking
Client-to-server frames are masked. This is a defense against cache poisoning attacks where a malicious script could craft WebSocket frames that, if interpreted by a misbehaving HTTP proxy, would inject malicious content into the proxy cache. Because the client picks a fresh, unpredictable key for every frame, the script cannot control the byte pattern that appears on the wire.
Masking is not encryption; the key is sent in cleartext. It is a structural defense, not a confidentiality mechanism. For confidentiality, use wss (WebSocket over TLS).
Server-to-client frames are not masked. They go server-to-browser only, and the attack surface does not exist in that direction.
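The masking step itself is a byte-wise XOR with the 4-byte key, repeated over the payload. A minimal sketch follows; apply_mask is an illustrative name, and the fixed key is used only so the output is reproducible. A real client should draw each key from a strong random source such as os.urandom(4).

```python
def apply_mask(payload: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4]; the same call masks and unmasks."""
    if len(key) != 4:
        raise ValueError("masking key must be exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(payload))

key = bytes.fromhex("37fa213d")       # fixed key for demonstration only
masked = apply_mask(b"Hello", key)
print(masked.hex())
print(apply_mask(masked, key))        # applying the mask again restores the payload
```

Because XOR is its own inverse, the server unmasks with exactly the same function.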
Control frames
Three control opcodes:
- 0x8 Close: optional 2-byte status code + reason. The peer replies with its own close frame, then the TCP connection is closed.
- 0x9 Ping: optional payload, expects pong with same payload.
- 0xA Pong: response to ping.
Control frames must be 125 bytes or smaller and cannot be fragmented. They can interleave between fragments of a data message.
Pings are useful for keepalive (NAT bindings, idle proxy timeouts) and for measuring RTT.
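One way to measure RTT with pings, sketched under the assumption that the application controls the ping payload: pack the send time into the ping and read it back from the pong, which must echo the same payload. The helper names are hypothetical.

```python
import struct
import time

MAX_CONTROL_PAYLOAD = 125  # control frame payload limit

def make_ping_payload(now: float) -> bytes:
    """Pack the send time (8 bytes) so the matching pong carries it back."""
    payload = struct.pack("!d", now)
    if len(payload) > MAX_CONTROL_PAYLOAD:
        raise ValueError("control frame payload too large")
    return payload

def rtt_from_pong(pong_payload: bytes, now: float) -> float:
    """Return the elapsed time since the ping that this pong echoes."""
    (sent_at,) = struct.unpack("!d", pong_payload)
    return now - sent_at

payload = make_ping_payload(time.monotonic())
# ... later, when the pong with the same payload arrives:
rtt = rtt_from_pong(payload, time.monotonic())
print(f"rtt: {rtt:.6f}s")
```

Using a monotonic clock avoids negative or inflated readings when the wall clock is adjusted.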
Close codes
Standard codes from RFC 6455:
- 1000: normal closure.
- 1001: going away (server shutting down).
- 1002: protocol error.
- 1003: unsupported data type.
- 1005: no status (reserved, do not send).
- 1006: abnormal closure (reserved, do not send, indicates connection dropped without close frame).
- 1007: invalid UTF-8 in text frame.
- 1008: policy violation.
- 1009: message too big.
- 1011: server error.
- 1012-1014: service restart, try again later, bad gateway (registered with IANA after RFC 6455 was published).
- 3000-3999: registered by libraries.
- 4000-4999: application use.
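A close frame payload is a 2-byte big-endian status code followed by an optional UTF-8 reason. The sketch below builds and parses that payload; the function names are illustrative, and the set of never-sent codes contains only the two reserved codes listed above.

```python
import struct
from typing import Tuple

NEVER_SENT = {1005, 1006}  # reserved codes from the list above; local use only

def build_close_payload(code: int, reason: str = "") -> bytes:
    if code in NEVER_SENT:
        raise ValueError(f"close code {code} is reserved and must not be sent")
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("close payload exceeds the 125-byte control frame limit")
    return body

def parse_close_payload(payload: bytes) -> Tuple[int, str]:
    if not payload:
        return 1005, ""  # no status code was present in the frame
    if len(payload) < 2:
        raise ValueError("close payload must be empty or at least 2 bytes")
    (code,) = struct.unpack_from("!H", payload)
    return code, payload[2:].decode("utf-8")

body = build_close_payload(1001, "going away")
print(body)
print(parse_close_payload(body))
```

Returning 1005 for an empty payload mirrors how that code is meant to be used: as a local marker, never as a value on the wire.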
Fragmentation
A logical message can be split across multiple frames. First frame has opcode (text or binary) and FIN=0. Continuation frames have opcode 0x0 and FIN=0 until the last, which has FIN=1.
Use case: streaming large messages where you do not know the total size upfront, or to interleave control frames between large data chunks.
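The FIN and opcode rules can be sketched with frames modeled as plain (fin, opcode, payload) tuples. This is a simplified model rather than a wire-level implementation, and the function names are made up for the example.

```python
from typing import Iterable, List, Tuple

Frame = Tuple[bool, int, bytes]  # (fin, opcode, payload)
OP_CONT, OP_TEXT, OP_BINARY = 0x0, 0x1, 0x2

def fragment(opcode: int, message: bytes, chunk_size: int) -> List[Frame]:
    """Split a message: first frame keeps the opcode, the rest are continuations."""
    chunks = [message[i:i + chunk_size]
              for i in range(0, len(message), chunk_size)] or [b""]
    frames = []
    for i, chunk in enumerate(chunks):
        fin = i == len(chunks) - 1          # only the last frame sets FIN
        frames.append((fin, opcode if i == 0 else OP_CONT, chunk))
    return frames

def reassemble(frames: Iterable[Frame]) -> Tuple[int, bytes]:
    """Rebuild one message, skipping control frames interleaved between fragments."""
    opcode = None
    parts = []
    for fin, op, payload in frames:
        if op >= 0x8:
            continue  # control frame; a real endpoint would handle it here
        if opcode is None:
            if op == OP_CONT:
                raise ValueError("continuation frame without a preceding data frame")
            opcode = op
        elif op != OP_CONT:
            raise ValueError("new data frame started before the previous message finished")
        parts.append(payload)
        if fin:
            return opcode, b"".join(parts)
    raise ValueError("message ended without a FIN frame")

frames = fragment(OP_TEXT, b"abcdefgh", 3)
print(frames)
print(reassemble(frames))
```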
permessage-deflate
RFC 7692 adds per-message compression. Negotiated in the handshake. Each message is compressed with DEFLATE before framing. Saves bandwidth on repetitive payloads (JSON, text) at CPU cost.
Watch out: context takeover between messages can amplify CRIME-style attacks if you mix attacker-controlled and secret data. Many servers disable context takeover by default.
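A minimal sketch of the per-message compression step using Python's zlib, assuming no context takeover (a fresh compressor for every message). RFC 7692 has the sender strip the 4-byte 00 00 ff ff tail left by a sync flush and the receiver append it back before inflating. The function names are illustrative.

```python
import zlib

TAIL = b"\x00\x00\xff\xff"  # trailer produced by a sync flush

def deflate_message(data: bytes) -> bytes:
    # New compressor per message: no compression context is shared across messages.
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)  # raw DEFLATE
    out = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
    return out[:-4] if out.endswith(TAIL) else out

def inflate_message(data: bytes) -> bytes:
    decomp = zlib.decompressobj(-15)
    return decomp.decompress(data + TAIL)  # restore the stripped tail first

message = b'{"type":"tick","symbol":"ABC","price":1.0}' * 20
compressed = deflate_message(message)
print(len(message), "->", len(compressed))
assert inflate_message(compressed) == message
```

On the wire, a frame carrying a compressed message also sets RSV1, which is why that bit is reserved for this extension.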
WebSocket over HTTP/2 and HTTP/3
RFC 8441 defines how to bootstrap WebSocket over HTTP/2 using the :protocol pseudo-header and the CONNECT method. This lets WebSocket share a connection with other HTTP/2 streams.
RFC 9220 extends this to HTTP/3.
Browser support is mixed. Most production WebSocket still runs over HTTP/1.1 upgrade.
Subprotocols
A subprotocol is a contract negotiated in the handshake: Sec-WebSocket-Protocol: wamp.2.json for WAMP, mqtt for MQTT over WebSocket, etc. Server picks one from the client's list and echoes it back.
Subprotocols define message semantics: framing within frames, RPC patterns, pub-sub channels.
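Server-side selection can be sketched as follows. The function name is hypothetical; it takes the raw Sec-WebSocket-Protocol header value and treats the client's order as its preference order.

```python
from typing import Iterable, Optional

def select_subprotocol(header_value: str, supported: Iterable[str]) -> Optional[str]:
    """Pick the first client-offered subprotocol that the server also supports."""
    offered = [p.strip() for p in header_value.split(",") if p.strip()]
    supported_set = set(supported)
    for proto in offered:  # the client lists subprotocols in order of preference
        if proto in supported_set:
            return proto
    return None  # no overlap: omit the header from the response

print(select_subprotocol("wamp.2.json, mqtt", ["mqtt"]))
print(select_subprotocol("chat.v1", ["chat.v2"]))
```

When nothing matches, the server leaves the header out of its response and the client decides whether to continue without a subprotocol.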
Authentication
WebSocket has no built-in auth. Options:
- Cookie: browser sends cookies in the upgrade GET if same-origin.
- Authorization header: works in the upgrade GET for non-browser clients; the browser WebSocket constructor cannot set custom headers, only the URL and the subprotocol list.
- Token in URL: wss://api.example.com/ws?token=.... URL logs are a leak risk.
- Token in subprotocol: hacky but works. new WebSocket(url, ['v1.json', 'token.eyJ...']).
- First-message auth: connect, then client sends a token frame, server validates before processing further.
Production pattern: short-lived JWT issued by REST endpoint, passed in upgrade cookie or first message.
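The first-message variant of this pattern might look like the sketch below. To stay within the standard library it uses an HMAC-signed token as a stand-in for a JWT; the secret, the message shape ({"type": "auth", "token": ...}), and all function names are assumptions made for illustration.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret-change-me"  # hypothetical server-side secret

def issue_token(user: str, ttl: float, now: float) -> str:
    """What the REST endpoint would hand out: signed claims with an expiry."""
    body = base64.urlsafe_b64encode(json.dumps({"sub": user, "exp": now + ttl}).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode("ascii") + "." + sig

def verify_token(token: str, now: float) -> Optional[str]:
    body, _, sig = token.partition(".")
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return None  # signature mismatch
    claims = json.loads(base64.urlsafe_b64decode(body))
    return claims["sub"] if claims["exp"] > now else None

def handle_first_message(raw: str, now: float) -> Optional[str]:
    """Return the authenticated user, or None if the server should close (e.g. 1008)."""
    try:
        msg = json.loads(raw)
    except ValueError:
        return None
    if not isinstance(msg, dict) or msg.get("type") != "auth":
        return None
    return verify_token(str(msg.get("token", "")), now)

token = issue_token("alice", ttl=60, now=1000.0)
print(handle_first_message(json.dumps({"type": "auth", "token": token}), now=1030.0))
```

The server should process no other message types until this check has succeeded, and should apply a short deadline for the auth message to arrive.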
Reconnection
WebSocket has no built-in reconnect. Connections drop on network changes, idle timeouts, and server restarts. The client must implement reconnect with exponential backoff and a way to resume state.
Patterns:
- Sequence numbers on every message; on reconnect, client says "I last saw N, give me N+1 onward."
- Resumable sessions: server holds queued messages for a short TTL after disconnect.
- Stateless reconnect: client fetches full state via REST, then opens new WebSocket.
Libraries like Socket.IO, Phoenix Channels, and AblyJS handle this for you.
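For a hand-rolled version, the two core pieces are a backoff schedule on the client and a replay buffer on the server. The sketch below uses full-jitter backoff (one common variant) and sequence numbers; the class name, method names, and default values are illustrative choices.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

class ReplayBuffer:
    """Server side: keep the last `capacity` messages so a client can resume."""

    def __init__(self, capacity: int = 1000) -> None:
        self._messages: Deque[Tuple[int, str]] = deque(maxlen=capacity)
        self._next_seq = 1

    def publish(self, payload: str) -> int:
        seq = self._next_seq
        self._next_seq += 1
        self._messages.append((seq, payload))  # oldest entry is evicted when full
        return seq

    def since(self, last_seen: int) -> Optional[List[Tuple[int, str]]]:
        """Messages after last_seen, or None if part of the gap was already evicted."""
        oldest = self._messages[0][0] if self._messages else self._next_seq
        if last_seen + 1 < oldest:
            return None  # client must fall back to a full state fetch
        return [(s, p) for s, p in self._messages if s > last_seen]

buf = ReplayBuffer(capacity=3)
for text in ["a", "b", "c", "d"]:
    buf.publish(text)
print(buf.since(2))   # client saw up to 2: replay 3 and 4
print(buf.since(0))   # message 1 was evicted: None, do a stateless reconnect
```

The None result is where the sequence-number pattern hands over to the stateless pattern: fetch full state over REST, then resume from the newest sequence number.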
Backpressure
If the server sends faster than the client can consume, frames pile up. In Node.js the send buffer is unbounded by default, so it grows until the process runs out of memory.
Backpressure strategies:
- Check ws.bufferedAmount (browser) or socket.send return value (server) and pause your producer.
- Use a queue with bounded size; drop or block when full.
- Use a library that exposes drain events (like the ws Node library).
This is the most common production failure for naive WebSocket apps.
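The bounded-queue strategy can be sketched independently of any particular WebSocket library. The class below is a hypothetical per-connection outbound queue with a byte budget; its buffered counter plays the role that bufferedAmount plays in the browser.

```python
from collections import deque
from typing import Deque, Optional

class BoundedSendQueue:
    """Per-connection outbound queue with a byte budget."""

    def __init__(self, max_bytes: int, drop_oldest: bool = False) -> None:
        self.max_bytes = max_bytes
        self.drop_oldest = drop_oldest
        self.buffered = 0   # bytes currently queued
        self.dropped = 0    # messages discarded under the drop-oldest policy
        self._queue: Deque[bytes] = deque()

    def offer(self, message: bytes) -> bool:
        """Queue a message. False means it was rejected and the producer should pause."""
        if self.drop_oldest:
            while self._queue and self.buffered + len(message) > self.max_bytes:
                self.buffered -= len(self._queue.popleft())
                self.dropped += 1
        if self.buffered + len(message) > self.max_bytes:
            return False
        self._queue.append(message)
        self.buffered += len(message)
        return True

    def pop(self) -> Optional[bytes]:
        """Call when the socket can accept more data (the drain moment)."""
        if not self._queue:
            return None
        message = self._queue.popleft()
        self.buffered -= len(message)
        return message

q = BoundedSendQueue(max_bytes=10)
print(q.offer(b"12345"), q.offer(b"12345"), q.offer(b"1"))  # third offer is rejected
```

Rejecting suits data that must not be lost (the producer pauses), while drop-oldest suits state that is superseded by newer messages, such as cursor positions or price ticks.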
Scaling and fan-out
A WebSocket connection is sticky to one server (state lives in that process). Horizontal scaling needs a pub-sub layer.
When a client on Server 1 sends a chat message that needs to reach a client on Server 2, the message goes Server 1 to Redis to Server 2 to client. Redis pub-sub, NATS, Kafka, or a managed service like Pusher handle the fan-out.
Load balancer must support WebSocket (HTTP/1.1 upgrade) and sticky sessions (consistent hashing or cookie-based). AWS ALB supports both.
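The message path described above (Server 1 to broker to Server 2 to client) can be simulated in-process. Broker here is a toy stand-in for Redis pub-sub or NATS, and each client is modeled as a list of delivered messages instead of a socket; all names are illustrative.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

class Broker:
    """Toy pub-sub layer: delivers each published message to every subscriber."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subs[channel].append(callback)

    def publish(self, channel: str, message: str) -> None:
        for callback in list(self._subs[channel]):
            callback(message)

class Server:
    """One WebSocket server process holding its own set of connections."""

    def __init__(self, name: str, broker: Broker, channel: str = "chat") -> None:
        self.name = name
        self.broker = broker
        self.channel = channel
        self.clients: Dict[str, List[str]] = {}  # client id -> delivered messages
        broker.subscribe(channel, self._on_broker_message)

    def connect(self, client_id: str) -> None:
        self.clients[client_id] = []

    def on_client_message(self, message: str) -> None:
        # No local delivery here: the broker echoes the message back to this server too.
        self.broker.publish(self.channel, message)

    def _on_broker_message(self, message: str) -> None:
        for inbox in self.clients.values():
            inbox.append(message)

broker = Broker()
s1, s2 = Server("server-1", broker), Server("server-2", broker)
s1.connect("alice")
s2.connect("bob")
s1.on_client_message("hi from alice")
print(s2.clients["bob"])
```

Routing every message through the broker, including those for local clients, keeps a single delivery path and avoids duplicate sends.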
Common production issues
- Idle timeouts: proxies and load balancers kill idle connections after 30-300 seconds. Send pings more often than the shortest timeout on the path; every 30 seconds is a common default.
- Mobile network changes: phone switches Wi-Fi to LTE, WebSocket drops. Need reconnect with state resumption.
- Server restarts: drain connections politely with close code 1001. Clients reconnect to new instance.
- Memory leaks from unbounded send buffers under backpressure.
- Connection limits: each WebSocket is one file descriptor and one TCP connection. ulimit and ephemeral port exhaustion matter at scale.
WebSocket vs SSE vs HTTP/3
- WebSocket: full-duplex, binary or text, low overhead, mature ecosystem. Best for chat, games, collaborative apps.
- SSE: server-to-client only, text only, runs over plain HTTP, automatic reconnect built into EventSource. Best for server notifications.
- HTTP/3 streams: bidirectional but request-response semantics. Use gRPC bidi for full-duplex over HTTP/3.
Numbers to memorize
- Handshake: 1 RTT after TCP+TLS, so 3 RTTs total on cold start with TLS 1.3 (TCP + TLS + upgrade); TLS 1.2 adds one more.
- Frame header: 2-14 bytes.
- Default max message in browsers: usually no hard limit, but practical 1 MB before chunking matters.
- Recommended ping interval: 30 seconds.
- Typical proxy idle timeout: 60 seconds.