TCP vs UDP

Deep dive into TCP's reliability machinery, UDP's minimalism, and when to use which in modern systems.

The default is TCP

If you cannot articulate why your protocol needs UDP, use TCP. TCP solves a deceptively long list of problems for you, and the few microseconds it costs are dwarfed by the milliseconds you save in not having to handle every failure mode yourself.

TCP's reliability machinery

TCP turns a lossy, reordering, congested IP network into a reliable, ordered byte stream. It does this with five mechanisms working together.

Sequence numbers and acknowledgments

Every byte in a TCP stream has a sequence number. The receiver acknowledges cumulatively: the ACK number names the next byte it expects, which is one past the highest contiguous byte received. If bytes 1-100 arrive but 101-200 are lost and 201-300 arrive, the receiver sends ACK 101 (and possibly a SACK noting 201-300 are also held).
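
The cumulative-ACK rule can be sketched in a few lines. This is a toy model, not a real TCP API: `ack_state` is a hypothetical helper that takes the byte ranges received so far and returns the highest contiguous byte, the next expected byte (the value carried in the ACK field on the wire), and any ranges that could only be reported via SACK.

```python
def ack_state(received, first_byte=1):
    """Return (highest_contiguous, next_expected, sack_blocks).

    `received` is a list of inclusive (start, end) byte ranges, in any order.
    Hypothetical helper for illustration only.
    """
    next_expected = first_byte
    sack_blocks = []
    for start, end in sorted(received):
        if start <= next_expected:
            # Range touches the contiguous prefix: extend the prefix.
            next_expected = max(next_expected, end + 1)
        elif sack_blocks and start <= sack_blocks[-1][1] + 1:
            # Range is adjacent to the previous out-of-order block: merge.
            prev_start, prev_end = sack_blocks[-1]
            sack_blocks[-1] = (prev_start, max(prev_end, end))
        else:
            # A gap precedes this range, so it is only reportable via SACK.
            sack_blocks.append((start, end))
    return next_expected - 1, next_expected, sack_blocks


# Bytes 1-100 and 201-300 arrived, 101-200 were lost.
highest, next_expected, sack = ack_state([(1, 100), (201, 300)])
```

Once the missing range is retransmitted and arrives, the same function reports the whole stream as contiguous and the SACK list empties.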

The sender keeps a retransmission timer. If no ACK arrives within the retransmission timeout (RTO), it resends. RTO is computed from a smoothed round-trip time estimate.
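
As a rough sketch of how the RTO follows from RTT samples, the class below applies the smoothing formulas from RFC 6298 (gains of 1/8 and 1/4, RTO = SRTT + 4 x RTTVAR). The class name and the `min_rto`/`max_rto` parameters are illustrative assumptions, and clock granularity is ignored for simplicity.

```python
class RtoEstimator:
    """Toy RTO estimator using the RFC 6298 smoothing formulas."""

    ALPHA = 1 / 8  # gain for the smoothed RTT (SRTT)
    BETA = 1 / 4   # gain for the RTT variance estimate (RTTVAR)

    def __init__(self, min_rto=1.0, max_rto=60.0):
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto  # RFC 6298 recommends 1 s; real stacks often go lower
        self.max_rto = max_rto
        self.rto = min_rto

    def on_rtt_sample(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        raw = self.srtt + 4 * self.rttvar
        self.rto = min(self.max_rto, max(self.min_rto, raw))
        return self.rto
```

With a steady 100 ms RTT the variance term decays, so the unclamped RTO drifts down toward the RTT itself; a sudden jitter spike pushes it back up.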

Three-way handshake

Before any data flows, both sides exchange initial sequence numbers via SYN, SYN-ACK, ACK. This costs 1 RTT. The handshake also negotiates options: window scale, SACK permitted, MSS, timestamps.
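
The sequence-number bookkeeping in the handshake is easy to get wrong by one, so here is a small model of the three segments. `three_way_handshake` is a hypothetical function and the dictionaries are a simplification of real TCP headers; the only rule it illustrates is that a SYN consumes one sequence number.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYN-ACK and ACK segments as simple dicts."""
    mod = 2 ** 32  # sequence numbers are 32-bit and wrap around
    syn = {"flags": "SYN", "seq": client_isn, "ack": None}
    syn_ack = {
        "flags": "SYN-ACK",
        "seq": server_isn,
        "ack": (client_isn + 1) % mod,  # acknowledges the client's SYN
    }
    ack = {
        "flags": "ACK",
        "seq": (client_isn + 1) % mod,
        "ack": (server_isn + 1) % mod,  # acknowledges the server's SYN
    }
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```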

Flow control via receive window

The receiver advertises a window: "I can buffer this many more bytes." The sender will not transmit beyond the window. This prevents a fast sender from overwhelming a slow receiver.

Window scaling (RFC 7323) multiplies the 16-bit window field to support modern bandwidth-delay products. Without it, max window is 64 KB, which caps a single TCP flow to about 5 Mbps on a 100 ms RTT link.
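
The 5 Mbps figure is just window size divided by round-trip time. The arithmetic, with hypothetical helper names, looks like this:

```python
def max_throughput_mbps(window_bytes, rtt_seconds):
    """Upper bound on throughput when at most one window is in flight per RTT."""
    return window_bytes * 8 / rtt_seconds / 1e6


def scaled_window(window_field, shift):
    """Effective window for a 16-bit field and a window-scale shift (0-14)."""
    if not 0 <= shift <= 14:
        raise ValueError("RFC 7323 limits the shift count to 14")
    return window_field << shift


unscaled = max_throughput_mbps(65535, 0.100)                 # no window scaling
scaled = max_throughput_mbps(scaled_window(65535, 7), 0.100)  # shift of 7
```

At a shift of 7 the same 100 ms link is bounded at roughly 670 Mbps instead of roughly 5 Mbps, which is why window scaling matters on high bandwidth-delay paths.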

Congestion control

This is the part that keeps the internet alive. TCP estimates how much data is in flight (the congestion window, cwnd) and grows or shrinks it based on signals.

Slow start: exponential growth from a small initial cwnd (typically 10 segments today) until loss or threshold.
Congestion avoidance: linear growth after slow start.
Fast retransmit: 3 duplicate ACKs trigger immediate retransmit without waiting for RTO.
Fast recovery: halve cwnd on loss instead of resetting to 1.
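
The four rules above can be strung together into a toy simulation. This is a deliberately simplified, Reno-style model at whole-RTT granularity, not how any real stack is implemented; `simulate_cwnd` and its parameters are illustrative names.

```python
def simulate_cwnd(rounds, ssthresh, loss_rounds=(), initial_cwnd=10):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = []
    for rtt in range(rounds):
        history.append(cwnd)
        if rtt in loss_rounds:
            # Fast recovery: halve cwnd instead of collapsing to 1 segment.
            ssthresh = max(cwnd // 2, 2)
            cwnd = ssthresh
        elif cwnd < ssthresh:
            # Slow start: double every round trip, capped at ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: one extra segment per round trip.
            cwnd += 1
    return history


# One loss detected during round 5.
trace = simulate_cwnd(rounds=8, ssthresh=64, loss_rounds={5})
```

The trace shows the familiar sawtooth: fast exponential growth, slow linear growth, then a halving on loss.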

Classic algorithms: Reno, NewReno, Cubic (Linux default since 2006). Newer: BBR (Google, 2016), which models bottleneck bandwidth and RTT directly instead of using loss as a signal. BBR can dramatically improve throughput on lossy long-fat links.

Ordered delivery

TCP reassembles segments by sequence number before delivering to the application. If segment 2 arrives before segment 1, the kernel buffers it and waits. The application never sees out-of-order data.
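
A minimal sketch of that buffering, assuming segments never partially overlap (real TCP has to handle that too). `Reassembler` is a hypothetical class, and `seq` is the byte offset of the first byte in the payload.

```python
class Reassembler:
    """Hold out-of-order segments and release bytes only in order."""

    def __init__(self, first_seq=0):
        self.next_seq = first_seq  # next byte the application is waiting for
        self.pending = {}          # seq -> payload for segments held back

    def on_segment(self, seq, payload):
        """Accept one segment; return the bytes that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return b""  # duplicate of data already delivered or buffered
        self.pending[seq] = payload
        out = bytearray()
        # Drain every buffered segment that is now contiguous.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            out += chunk
            self.next_seq += len(chunk)
        return bytes(out)
```

Feeding it the second segment first returns nothing; feeding the first segment afterwards returns both payloads at once, in order.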

UDP: the absence of all that

UDP (RFC 768) is 8 bytes of header on top of an IP packet: source port, dest port, length, checksum. That is the entire protocol.
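
To make the size concrete, the whole header can be packed with a single `struct` call. The function names are hypothetical, and the checksum is left at zero for brevity (IPv4 allows a zero checksum to mean "not computed"; IPv6 requires a real one).

```python
import struct


def build_udp_header(src_port, dst_port, payload, checksum=0):
    """Pack the four 16-bit UDP header fields in network byte order."""
    length = 8 + len(payload)  # the length field covers header plus payload
    return struct.pack("!HHHH", src_port, dst_port, length, checksum)


def parse_udp_header(header):
    """Return (src_port, dst_port, length, checksum)."""
    return struct.unpack("!HHHH", header[:8])


header = build_udp_header(5353, 53, b"hi")
```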

There is no connection. There is no ACK. There is no retransmit. There is no ordering. There is no flow control. There is no congestion control. If your packet is lost, your packet is lost. If two packets are reordered, the receiver sees them in the wrong order. If you spray faster than the link can handle, you melt the network.
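
From the application's side, "no connection" means there is no listen, accept, or connect step at all. The sketch below sends one datagram over loopback using Python's standard `socket` module; `udp_send_and_receive` is a hypothetical helper, and the timeout is there because nothing guarantees the datagram arrives.

```python
import socket


def udp_send_and_receive(message, timeout=1.0):
    """Send one datagram to a loopback socket; return it, or None if lost."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        # No handshake: the first packet on the wire is the data itself.
        sender.sendto(message, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(2048)
            return data
        except socket.timeout:
            return None  # UDP gives no delivery guarantee


received = udp_send_and_receive(b"ping")
```

On loopback the datagram will almost always arrive, but the `None` branch is the one a real UDP application has to design around.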

This sounds bad, and for most applications it is. But for some workloads the lack of TCP machinery is exactly what you want.

When UDP is the right call

DNS

A DNS query is one packet out, one packet back. Setting up TCP would cost 1 extra RTT for a payload that classically fits in 512 bytes. UDP is the obvious choice. If the response is too big, the server marks it as truncated and the client retries over TCP.
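
To show how small a query really is, here is a sketch that builds the wire format of a single-question DNS query by hand. `build_dns_query` and its default `query_id` are illustrative; the function only constructs bytes and does not send anything.

```python
import struct


def build_dns_query(name, query_id=0x1234, qtype=1):
    """Build a DNS query for `name` (qtype 1 = A record, class 1 = IN)."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)
    return header + question


query = build_dns_query("example.com")
```

The query for `example.com` comes to 29 bytes, far below the 512-byte limit, so a single UDP datagram is plenty.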

Real-time media

Voice and video over RTP. If a frame is lost, you do not want to wait 200 ms for retransmit. You want to drop it and move on. TCP's head-of-line blocking would make calls unusable.

Game state

Multiplayer games send position updates 30-60 times per second. If update N is lost, update N+1 supersedes it. Retransmit is worse than useless.
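
The receiving side of such a protocol usually boils down to "keep the newest update, ignore anything older". A minimal sketch, with a hypothetical `LatestState` class and a per-update sequence number the sender is assumed to attach:

```python
class LatestState:
    """Keep only the most recent position update; never wait for old ones."""

    def __init__(self):
        self.seq = -1
        self.position = None

    def on_update(self, seq, position):
        """Apply an update if it is newer than what we hold; report whether it was used."""
        if seq <= self.seq:
            return False  # stale or duplicate: a newer update already superseded it
        self.seq = seq
        self.position = position
        return True
```

A lost update simply never shows up, and a late one is discarded on arrival, so no retransmission logic is needed.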

NTP

A timestamp exchange. Tiny payload, no reliability needed.
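
The exchange involves four timestamps, and the standard NTP arithmetic on them is short enough to show. The function name is hypothetical; t0 and t3 are read from the client's clock (send and receive), t1 and t2 from the server's clock (receive and send).

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Return (clock offset, round-trip delay) from one NTP exchange."""
    # Offset: how far the server clock is ahead of the client clock,
    # assuming the path is symmetric.
    offset = ((t1 - t0) + (t2 - t3)) / 2
    # Delay: total time on the wire, excluding server processing time.
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Milliseconds: server clock is 5000 ms ahead, 50 ms each way, 10 ms processing.
offset, delay = ntp_offset_and_delay(t0=1000, t1=6050, t2=6060, t3=1110)
```

If a request or reply is lost, the client just sends another query later; there is no state worth recovering.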

QUIC

Here is the modern twist. QUIC needs reliability, ordering, and encryption. But TCP cannot evolve fast enough (it lives in the kernel; you cannot deploy a new TCP option globally), and middleboxes drop unknown L4 protocols. So Google built QUIC over UDP, putting reliability and TLS in userspace. HTTP/3 runs over QUIC.

QUIC fixes TCP's biggest weakness: head-of-line blocking across streams. In HTTP/2 over TCP, one lost packet stalls every concurrent stream. In HTTP/3 over QUIC, only the affected stream stalls.

The head-of-line blocking trap

TCP delivers bytes in order. If you multiplex 10 streams over one TCP connection (as HTTP/2 does), a single lost packet stalls all 10 until the retransmit arrives. This is HoL blocking at the transport layer.

QUIC solves this by tracking byte offsets per stream, so ordering is enforced within each stream rather than across the whole connection. A loss on stream 3 does not stall streams 1 and 2.
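
The difference can be shown with a toy model. `deliverable` is a hypothetical function that takes packets in send order, a set of lost packet numbers, and a flag choosing between connection-wide ordering (TCP-like) and per-stream ordering (QUIC-like). It returns the packets the application can read before any retransmission arrives.

```python
def deliverable(packets, lost, per_stream):
    """packets: list of (packet_number, stream_id) in send order."""
    blocked_all = False      # connection-wide gap (single ordered byte stream)
    blocked_streams = set()  # streams with a gap (per-stream ordering)
    out = []
    for number, stream in packets:
        if number in lost:
            if per_stream:
                blocked_streams.add(stream)
            else:
                blocked_all = True
            continue
        if blocked_all or stream in blocked_streams:
            continue  # arrived, but is held in a buffer behind the gap
        out.append(number)
    return out


# Three streams interleaved on one connection; packet 3 (stream 3) is lost.
packets = [(1, 1), (2, 2), (3, 3), (4, 1), (5, 2), (6, 3)]
tcp_like = deliverable(packets, lost={3}, per_stream=False)
quic_like = deliverable(packets, lost={3}, per_stream=True)
```

With connection-wide ordering, packets 4, 5 and 6 all wait behind the gap. With per-stream ordering, only packet 6 waits.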

This is the single biggest reason HTTP/3 was worth building.

TCP performance pitfalls

TCP congestion window dynamics

Slow start hurts short-lived connections. A 10 KB response often finishes before TCP fully ramps up. Mitigations: HTTP/2 multiplexing, connection reuse, TCP Fast Open, larger initial cwnd.
Nagle's algorithm coalesces small writes. Combined with delayed ACKs it can stall a small write for a full delayed-ACK timer interval, commonly 40-200 ms. Disable Nagle with TCP_NODELAY for interactive protocols.
Bufferbloat: deep buffers in routers cause RTT to balloon under congestion, confusing loss-based congestion control. BBR handles this better.
Connection establishment cost: 1 RTT for TCP, plus 1-2 RTTs for TLS. Use HTTP/2 connection coalescing or HTTP/3 0-RTT.
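
Of the mitigations above, disabling Nagle is a one-line socket option. A minimal sketch using Python's standard `socket` module (the function name is illustrative):

```python
import socket


def make_low_latency_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY: send small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The trade-off is more small packets on the wire, so this suits interactive request-response traffic better than bulk transfer.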

UDP performance pitfalls

No congestion control means your app must implement it or accept network damage. WebRTC uses Google Congestion Control. QUIC uses TCP-style algorithms.
No NAT keepalive. UDP NAT bindings expire in 30-60 seconds. Long-running UDP apps need keepalive packets.
MTU discovery is harder. TCP gets a safe segment size from MSS negotiation and the kernel's path MTU discovery; UDP apps must implement PLPMTUD (RFC 8899) or send small datagrams.
Receiver may drop packets if the receive buffer fills. Tune SO_RCVBUF.
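
For the receive-buffer pitfall, the tuning itself is a socket option. The sketch below requests a larger buffer and reads back what the kernel actually granted. The function name and the 4 MiB default are assumptions, and the granted size is OS-dependent (Linux, for example, caps the request at `net.core.rmem_max` and reports a doubled value).

```python
import socket


def make_udp_receiver(rcvbuf_bytes=4 * 1024 * 1024):
    """Create a UDP socket and ask for a larger receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    # Always read the value back: the kernel may silently grant less.
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted
```

Logging the granted size at startup is a cheap way to notice that a system-wide limit is overriding the application's request.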

The kernel boundary

TCP lives in the kernel. Every byte you send copies from userspace to kernel space, gets segmented, has headers added, and goes out the NIC. Modern optimizations: sendfile, splice, zero-copy via io_uring, kernel TLS (kTLS).

UDP is also in the kernel but the protocol is so thin that userspace can do most of the work. QUIC implementations like quiche or ngtcp2 run entirely in userspace and use UDP sockets as a transport. This means QUIC can deploy a new congestion controller or feature with an app update; no kernel upgrade required.

Numbers to memorize

TCP handshake: 1 RTT.
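TLS 1.3 over TCP: 1 RTT for TLS on top of the 1 RTT TCP handshake, so 2 RTTs on a cold start (0-RTT resumption removes the TLS round trip, not the TCP one).
QUIC with TLS 1.3 (HTTP/3): 1 RTT on cold start, 0 RTT on resume, because the transport and TLS handshakes are combined.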
Typical residential RTT: 20-50 ms.
Typical mobile RTT: 50-200 ms.
Initial TCP cwnd: 10 segments, about 14 KB.
Default MTU: 1500 bytes (Ethernet), 1280 bytes (guaranteed for IPv6).
TCP header: 20 bytes minimum, 60 max.
UDP header: 8 bytes.
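
These numbers combine into a quick estimate of how long a client waits before it can send its first request. The helper below is a back-of-the-envelope sketch with hypothetical names; it counts the TCP and TLS handshakes together and ignores TCP Fast Open, packet loss, and server processing time.

```python
def time_to_first_request_ms(rtt_ms, transport, resumed=False):
    """Round-trip time spent on handshakes before the first request is sent."""
    if transport == "tcp+tls1.3":
        # 1 RTT for TCP, plus 1 RTT for TLS 1.3 unless 0-RTT resumption is used.
        round_trips = 1 + (0 if resumed else 1)
    elif transport == "quic":
        # Transport and TLS handshakes are combined into one exchange.
        round_trips = 0 if resumed else 1
    else:
        raise ValueError(f"unknown transport: {transport}")
    return round_trips * rtt_ms


cold_tcp = time_to_first_request_ms(50, "tcp+tls1.3")  # 50 ms RTT
cold_quic = time_to_first_request_ms(50, "quic")
```

At a 50 ms RTT that is 100 ms versus 50 ms on a cold start, and the gap widens proportionally on a 200 ms mobile link.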

Decision framework

Use TCP if: you need ordering, reliability, or congestion control, and you are not willing to implement them yourself.

Use UDP if: you have very small request-response, real-time media, or you are implementing your own L4 (QUIC, custom transport).

Use QUIC if: you want the best of both, you control both endpoints, and you are okay paying the userspace CPU cost.

Learn more

Paper: RFC 793, Transmission Control Protocol (IETF)
Paper: RFC 5681, TCP Congestion Control (IETF)
Paper: RFC 9000, QUIC: A UDP-Based Multiplexed and Secure Transport (IETF)
Docs: High Performance Browser Networking (Ilya Grigorik)
Paper: BBR Congestion Control (Google Research)
