CDNs and anycast routing
How CDNs actually work: anycast vs unicast, cache hierarchies, edge compute, BGP routing, and the failure modes that bite you in production.
Why CDNs exist
Light travels at about 200,000 km/s in fiber. New York to Sydney is roughly 16,000 km one way, so the round trip costs about 160 ms from physics alone. Add the TCP and TLS handshakes, each at least one more round trip, and a cold start takes about half a second before the first application byte arrives.
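The arithmetic can be checked in a few lines. This is a rough sketch: the fiber speed and distance are the figures from the text, while the count of three round trips (TCP handshake, TLS 1.3 handshake, first request) is an assumption; TLS 1.2 would add one more.

```python
FIBER_KM_PER_S = 200_000  # approximate speed of light in fiber, from the text


def rtt_ms(one_way_km: float) -> float:
    """Round-trip propagation delay in ms, ignoring routing detours and queuing."""
    return one_way_km * 2 * 1000 / FIBER_KM_PER_S


def cold_start_ms(one_way_km: float, round_trips: int = 3) -> float:
    """Time until the first response byte, assuming `round_trips` full RTTs.

    The default of 3 (TCP + TLS 1.3 + request) is an assumption, not a measurement.
    """
    return round_trips * rtt_ms(one_way_km)


if __name__ == "__main__":
    print(f"RTT: {rtt_ms(16_000):.0f} ms")
    print(f"Cold start: {cold_start_ms(16_000):.0f} ms")
```

With the New York to Sydney figures this gives a 160 ms RTT and a cold start of roughly 480 ms, which matches the "half a second" estimate.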
CDNs put servers near users so the long-distance hop happens once (origin fetch) and then never again until the cache expires. The user gets a local round trip.
Beyond latency, CDNs absorb traffic spikes and DDoS attacks that would crush an origin.
Anycast: one IP, many places
- Unicast: one IP, one location. Most servers.
- Multicast: one IP, multiple receivers, used inside networks. Rare on the internet.
- Anycast: one IP, many locations. The network routes you to the nearest one.
BGP is the magic. The CDN announces the same IP prefix (say 1.1.1.0/24) from every PoP via BGP. Each upstream provider sees a route through whichever PoP they peer with. BGP path selection prefers shorter AS paths (after policy attributes such as local preference), which tends to favor topologically nearer PoPs.
When a PoP goes down, BGP withdraws the route from there. Within seconds (sometimes minutes), traffic shifts to the next-nearest PoP. No DNS change, no TTL wait.
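A toy model of the selection and failover steps, assuming every route has equal local preference so that AS-path length alone decides. Real BGP compares several attributes before and after path length. The PoP names and AS numbers here are made up (the ASNs come from the documentation range).

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    pop: str                   # hypothetical PoP label
    as_path: Tuple[int, ...]   # ASNs the announcement traversed


def best_route(routes: List[Route]) -> Optional[Route]:
    """Pick the shortest AS path; ties are broken by PoP name for determinism."""
    if not routes:
        return None
    return min(routes, key=lambda r: (len(r.as_path), r.pop))


def withdraw(routes: List[Route], pop: str) -> List[Route]:
    """Model a PoP failure: its announcement disappears from the table."""
    return [r for r in routes if r.pop != pop]


# Routes as seen by one hypothetical upstream network.
routes = [
    Route("1.1.1.0/24", "tokyo", (64500, 64496)),
    Route("1.1.1.0/24", "singapore", (64501, 64502, 64496)),
    Route("1.1.1.0/24", "frankfurt", (64503, 64504, 64505, 64496)),
]

if __name__ == "__main__":
    print(best_route(routes).pop)
    print(best_route(withdraw(routes, "tokyo")).pop)
```

In this table the upstream picks Tokyo because its path is shortest, regardless of geography, and falls back to Singapore once Tokyo's route is withdrawn. No DNS record changes in either case.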
Anycast trade-offs
Pros:
- Fast failover when a PoP dies: typically seconds, with no DNS TTL to wait out.
- Same hostname/IP everywhere; no geo-DNS complexity.
- DDoS absorption: attack from many sources hits many PoPs, no single bottleneck.
Cons:
- Anycast is stateless: a connection might shift PoPs mid-flow if BGP reconverges. Most CDNs design around this for HTTP; long-lived stateful protocols (WebSocket, gRPC streaming) need extra care.
- BGP path selection is not geographic, it is topological. Sometimes a Singapore user hits a Tokyo PoP because that is the shorter AS path.
- Hard to do canary deployments in one region only when the IP is global.
CDN architecture: PoPs and hierarchies
A modern CDN has hundreds of PoPs (Points of Presence). Cloudflare claims 300+ cities. Fastly has 100+ PoPs (fewer sites, which it describes as larger and more powerful). AWS CloudFront has 600+ edge locations.
Inside each PoP: many cache servers, load balancers, TLS terminators. The PoP is itself a small data center.
Many CDNs have a hierarchical cache: edge PoPs (closest to user) hold hot content; shield PoPs or tier-2 caches sit between edge and origin to absorb cache misses. A miss at an edge fetches from shield; only a shield miss touches origin.
This compounds your effective cache hit rate.
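To make the compounding concrete: if the edge serves a fraction of requests from cache and the shield serves a fraction of the edge's misses, only the product of the two miss rates reaches origin. The ratios below are illustrative assumptions, not measurements.

```python
def origin_fraction(edge_hit: float, shield_hit: float) -> float:
    """Fraction of all requests that reach origin.

    edge_hit:   hit ratio at the edge PoP (0..1)
    shield_hit: hit ratio at the shield, measured over edge misses (0..1)
    """
    return (1 - edge_hit) * (1 - shield_hit)


def effective_hit_rate(edge_hit: float, shield_hit: float) -> float:
    """Hit ratio of the hierarchy as a whole, as seen from origin."""
    return 1 - origin_fraction(edge_hit, shield_hit)


if __name__ == "__main__":
    # Assumed ratios: 90% at the edge, 80% of the remaining misses at the shield.
    print(f"{effective_hit_rate(0.90, 0.80):.2%} of requests never touch origin")
```

With a 90% edge hit ratio and an 80% shield hit ratio, about 2% of requests reach origin, so the effective hit rate is roughly 98%.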
The lifecycle of a CDN-cached request
- User types www.example.com.
- DNS resolves to the CDN's anycast IP (often via a CNAME from your DNS to the CDN's CNAME).
- User's packet hits the nearest PoP.
- PoP terminates TCP and TLS using your cert (uploaded to the CDN, or issued by the CDN on your behalf, as with Cloudflare's Universal SSL).
- PoP parses the HTTP request and computes a cache key (typically host + path + maybe query + Vary headers).
- PoP checks local cache. Hit means serve from SSD in 1-5 ms.
- Miss: PoP fetches from shield or origin. Latency depends on distance to origin and origin response time.
- Response is cached per Cache-Control and Surrogate-Control headers, then served to user.
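The cache-facing steps above can be sketched as a toy in-memory cache. This is an illustration under stated assumptions, not how any particular CDN is built: `EdgeCache` and `fetch_origin` are hypothetical names, the key is simply host + path + query, and the origin callback returns the TTL directly instead of the PoP parsing Cache-Control.

```python
import time
from typing import Callable, Dict, Tuple

CacheKey = Tuple[str, str, str]            # (host, path, query)
OriginFetch = Callable[[str, str, str], Tuple[bytes, int]]  # returns (body, ttl_seconds)


class EdgeCache:
    """Toy PoP cache: compute key, check local store, fetch on miss, store with TTL."""

    def __init__(self, fetch_origin: OriginFetch, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_origin
        self._clock = clock
        self._store: Dict[CacheKey, Tuple[bytes, float]] = {}  # key -> (body, expires_at)

    def handle(self, host: str, path: str, query: str = "") -> Tuple[bytes, str]:
        key = (host.lower(), path, query)      # hostnames are case-insensitive
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        body, ttl = self._fetch(host, path, query)   # miss: go to shield or origin
        if ttl > 0:
            self._store[key] = (body, now + ttl)
        return body, "MISS"


if __name__ == "__main__":
    def origin(host: str, path: str, query: str) -> Tuple[bytes, int]:
        return b"<html>hello</html>", 60       # pretend origin said max-age=60

    cache = EdgeCache(origin)
    print(cache.handle("www.example.com", "/")[1])
    print(cache.handle("www.example.com", "/")[1])
```

The first request is a miss that populates the cache; the second is served locally until the TTL runs out. The injectable `clock` exists only to make expiry easy to test.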
Cache-Control and the rules
The HTTP cache spec (RFC 9111) governs caching. Key directives:
- public: cacheable by shared caches (CDN, proxies).
- private: only end-user cache (browser), not shared.
- no-store: do not cache anywhere.
- no-cache: cache but revalidate every time.
- max-age=N: cache for N seconds in any cache.
- s-maxage=N: cache for N seconds in shared caches only (overrides max-age there).
- must-revalidate: do not serve stale.
- stale-while-revalidate=N: serve stale up to N seconds while async refreshing.
- stale-if-error=N: serve stale up to N seconds if origin returns error.
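Surrogate-Control is a CDN-specific header (supported by Fastly, among others) that overrides Cache-Control for the CDN only, so you can have a short browser cache and a long CDN cache. Cloudflare supports the standardized CDN-Cache-Control header (RFC 9213) for the same purpose.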
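A minimal sketch of how a shared cache might turn a Cache-Control header into a TTL, following the directive rules listed above. It is deliberately simplified: quoted values containing commas are not handled, heuristic freshness is ignored, and no-cache is treated as "do not serve without revalidation" by returning 0. The function names are hypothetical.

```python
from typing import Dict, Optional


def parse_cache_control(header: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into {directive: value-or-None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def shared_cache_ttl(header: str) -> int:
    """Seconds a shared cache (CDN) may serve the response without revalidating."""
    d = parse_cache_control(header)
    if "no-store" in d or "private" in d or "no-cache" in d:
        return 0
    # s-maxage takes precedence over max-age in shared caches.
    for name in ("s-maxage", "max-age"):
        value = d.get(name)
        if value is not None and value.isdigit():
            return int(value)
    return 0


if __name__ == "__main__":
    print(shared_cache_ttl("public, max-age=60, s-maxage=3600"))
    print(shared_cache_ttl("private, max-age=60"))
```

The first header yields a one-hour CDN TTL while browsers keep the response for a minute; the second yields 0 because private responses must not be stored by shared caches.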
Cache keys and Vary
The cache key determines what counts as "the same response." Default: scheme + host + path + query.
Vary: Accept-Encoding adds the Accept-Encoding header value to the key. Now gzip, brotli, and identity get separate entries.
Vary: Cookie is usually a disaster: every distinct cookie makes a separate cache entry, hit rate collapses. Use cookie stripping or only-cache-when-no-cookie rules instead.
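A sketch of how Vary widens the cache key, assuming the default key of scheme + host + path + query described above. `cache_key` is a hypothetical helper; real CDNs usually also normalize headers such as Accept-Encoding before keying on them.

```python
from typing import Mapping, Tuple


def cache_key(
    scheme: str,
    host: str,
    path: str,
    query: str,
    request_headers: Mapping[str, str],
    vary: str = "",
) -> Tuple:
    """Build a cache key; each header named in `vary` contributes its request value."""
    headers = {k.lower(): v for k, v in request_headers.items()}
    vary_names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    varied = tuple((name, headers.get(name, "")) for name in vary_names)
    return (scheme, host.lower(), path, query, varied)


if __name__ == "__main__":
    gzip_req = {"Accept-Encoding": "gzip"}
    br_req = {"Accept-Encoding": "br"}
    args = ("https", "www.example.com", "/app.js", "")

    # Without Vary both requests share one entry; with it they split.
    print(cache_key(*args, gzip_req) == cache_key(*args, br_req))
    print(cache_key(*args, gzip_req, "Accept-Encoding") == cache_key(*args, br_req, "Accept-Encoding"))
```

The same mechanism explains the Vary: Cookie problem: every distinct Cookie value produces a distinct key, so almost no two users share an entry.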
Invalidation
Cache invalidation is famously one of the hard problems in computer science. Three approaches:
- Time-based: set max-age, wait for expiry. Simple, predictable, slow to update.
- Versioned URLs: /assets/app.abc123.js. Change the URL when content changes. Old version stays cached forever (or until evicted by LRU). Perfect for assets.
- Purge: CDN API call to invalidate a URL or tag. Cloudflare purge, Fastly surrogate keys, CloudFront invalidations. Fast (seconds) but costs more or has limits.
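For the versioned-URL approach, one common way to derive the version is a hash of the file contents, so the URL changes exactly when the bytes change. The sketch below assumes SHA-256 truncated to eight hex characters; `versioned_path` is a hypothetical helper, and build tools each have their own naming scheme.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_path(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the file extension: /a/app.js -> /a/app.<hash>.js"""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


if __name__ == "__main__":
    v1 = versioned_path("/assets/app.js", b"console.log('v1');")
    v2 = versioned_path("/assets/app.js", b"console.log('v2');")
    print(v1)
    print(v2)
```

Because the two builds produce different URLs, the new one is fetched as a fresh object and the old one can be served with a very long max-age without risk of staleness.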
Cloudflare and Fastly support tag-based invalidation: tag a response with a header such as Surrogate-Key: post-42 user-7 (Fastly's name; Cloudflare's equivalent header is Cache-Tag), then later purge every response tagged post-42. Powerful for CMS-style invalidation.
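Tag-based purging boils down to a reverse index from tag to cached URLs. The sketch below is a single-node toy with hypothetical names (`TaggedCache`, `purge_tag`); it says nothing about how a real CDN propagates the purge across PoPs.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class TaggedCache:
    """Toy cache that remembers which URLs carry which surrogate keys."""

    def __init__(self) -> None:
        self._bodies: Dict[str, bytes] = {}
        self._urls_by_tag: Dict[str, Set[str]] = defaultdict(set)

    def put(self, url: str, body: bytes, surrogate_keys: str) -> None:
        """Store a response; `surrogate_keys` is the space-separated header value."""
        self._bodies[url] = body
        for tag in surrogate_keys.split():
            self._urls_by_tag[tag].add(url)

    def get(self, url: str) -> Optional[bytes]:
        return self._bodies.get(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every cached response carrying `tag`; return how many were removed."""
        purged = 0
        for url in self._urls_by_tag.pop(tag, set()):
            # The URL may already be gone if another of its tags was purged first.
            if self._bodies.pop(url, None) is not None:
                purged += 1
        return purged


if __name__ == "__main__":
    cache = TaggedCache()
    cache.put("/posts/42", b"post page", "post-42 user-7")
    cache.put("/users/7", b"profile page", "user-7")
    cache.put("/", b"home page", "post-42")
    print(cache.purge_tag("post-42"))
    print(cache.get("/users/7"))
```

Purging post-42 removes the post page and the home page in one call while the profile page, tagged only user-7, stays cached.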
Edge compute
Modern CDNs run your code at every PoP:
- Cloudflare Workers: V8 isolates running JS or Wasm. Sub-millisecond cold start.
- Fastly Compute (formerly Compute@Edge): Wasm, originally via Lucet and now via Wasmtime. Strong isolation.
- AWS Lambda@Edge / CloudFront Functions: more limited but integrated.
- Vercel Edge Functions: built on Cloudflare Workers underneath.
Use cases: A/B testing, auth checks, request rewriting, geolocation, header manipulation, dynamic redirects, even full APIs and SSR.
The constraint: short execution time (50 ms typical), small memory (128 MB typical), no long-lived state. Stateless functions over a request/response model.
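As an illustration of the stateless request/response shape, here is a sketch of A/B bucketing plus a request rewrite. It is written in Python only for consistency with the other examples; actual edge runtimes expose their own JS or Wasm APIs, and every name here (`ab_bucket`, `rewrite_path`, the `new-home` experiment) is hypothetical.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str, treatment_percent: int = 50) -> str:
    """Deterministically assign a user to a bucket without storing any state.

    Hashing (experiment, user) means every PoP computes the same answer.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    slot = int.from_bytes(digest[:4], "big") % 100   # 0..99
    return "treatment" if slot < treatment_percent else "control"


def rewrite_path(path: str, user_id: str) -> str:
    """Send treatment users to the new home page; leave everything else untouched."""
    if path == "/" and ab_bucket(user_id, "new-home") == "treatment":
        return "/home-v2"
    return path


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, rewrite_path("/", uid))
```

Because the bucket is a pure function of its inputs, the handler needs no shared storage and fits within tight time and memory limits.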
TLS at the edge
The PoP terminates TLS. By default this means the PoP holds your private key.
Two security models:
- Trust the CDN: upload your cert and key. Easiest. Cloudflare's default.
- Keyless SSL: keep the key on your side; CDN proxies the signing operation to you over a secure channel. Cloudflare offers this for sensitive customers.
Either way, the PoP sees plaintext HTTP after decryption.
Cache miss penalties
A miss is not free. The PoP makes a request to your origin, which:
- Adds latency for the fetch.
- Loads your origin server.
- Can stampede: if 1000 users miss the same URL simultaneously, your origin gets 1000 simultaneous requests. CDNs offer "request collapsing" or "tiered caching" to deduplicate these.
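Request collapsing can be sketched as a "single flight" pattern: the first request for a URL becomes the leader and fetches from origin, and concurrent requests for the same URL wait for its result. This is a thread-based toy with hypothetical names, not a description of any vendor's implementation, and it does not cache the result afterwards.

```python
import threading
from typing import Callable, Dict, Optional


class _Flight:
    """State shared between the leader and the waiters for one in-flight URL."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Optional[bytes] = None
        self.error: Optional[BaseException] = None


class RequestCollapser:
    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Flight] = {}

    def get(self, url: str) -> bytes:
        with self._lock:
            flight = self._in_flight.get(url)
            leader = flight is None
            if leader:
                flight = _Flight()
                self._in_flight[url] = flight
        if leader:
            try:
                flight.result = self._fetch(url)   # the only origin request
            except Exception as exc:
                flight.error = exc                 # waiters see the same failure
            finally:
                with self._lock:
                    del self._in_flight[url]
                flight.done.set()
        else:
            flight.done.wait()                     # piggyback on the leader
        if flight.error is not None:
            raise flight.error
        return flight.result


if __name__ == "__main__":
    import time

    origin_calls = []

    def slow_origin(url: str) -> bytes:
        origin_calls.append(url)
        time.sleep(0.2)                            # simulate a slow origin
        return b"body"

    collapser = RequestCollapser(slow_origin)
    threads = [threading.Thread(target=collapser.get, args=("/hot",)) for _ in range(50)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(f"{len(threads)} requests, {len(origin_calls)} origin fetch(es)")
```

A production version would also store the fetched response in the cache so that requests arriving after the flight completes are hits rather than new leaders.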
Cloudflare's "Argo Smart Routing" and similar features route origin fetches over optimized paths.
DDoS absorption
Anycast distributes attack traffic across all PoPs. A 1 Tbps attack on a 100-PoP network is 10 Gbps per PoP if it spreads evenly, often within capacity. The CDN's WAF and rate limiting drop bad traffic at the edge.
This is why Cloudflare can absorb attacks that would melt any single datacenter.
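The per-PoP arithmetic, with one caveat made explicit: anycast splits traffic according to where the attack sources sit in the network, so the split is rarely perfectly even. The sketch below takes relative shares as input; the share values and the 40 Gbps capacity figure are assumptions for illustration.

```python
from typing import List


def per_pop_load_gbps(attack_gbps: float, shares: List[float]) -> List[float]:
    """Split attack volume across PoPs in proportion to `shares` (any positive weights)."""
    total = sum(shares)
    return [attack_gbps * share / total for share in shares]


def overloaded(loads: List[float], capacity_gbps: float) -> List[int]:
    """Indexes of PoPs whose share of the attack exceeds their capacity."""
    return [i for i, load in enumerate(loads) if load > capacity_gbps]


if __name__ == "__main__":
    even = per_pop_load_gbps(1000, [1] * 100)       # 1 Tbps over 100 equal PoPs
    print(even[0])
    skewed = per_pop_load_gbps(1000, [10] + [1] * 99)
    print(overloaded(skewed, capacity_gbps=40))     # 40 Gbps capacity is assumed
```

In the even case each PoP sees 10 Gbps. In the skewed case the one PoP attracting ten times the traffic takes roughly 92 Gbps and exceeds the assumed capacity, which is why per-PoP headroom matters as much as the network total.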
Bot management and WAF
WAF (Web Application Firewall) inspects HTTP requests and blocks known attack patterns: SQL injection, XSS, path traversal. Modern WAFs use ML-based detection.
Bot management distinguishes humans from bots using fingerprinting, challenge-response, and behavioral analysis. Examples include Cloudflare's Bot Management, Akamai Bot Manager, and AWS WAF Bot Control.
Geographic routing nuances
EDNS Client Subnet (ECS) lets a recursive resolver send the client's IP prefix to the authoritative server, so the CDN can return geographically-relevant IPs even when the client uses 8.8.8.8.
Without ECS, a user in Sydney using a US resolver might get routed to a US PoP, killing CDN benefit. ECS fixes this.
Some privacy-focused resolvers (Cloudflare's 1.1.1.1) do not send ECS, arguing that privacy beats marginal latency. The CDN then routes by the resolver's IP instead; because such resolvers are themselves globally distributed, the resolver's location is usually still close to the user's.
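What the resolver actually forwards is a truncated prefix, not the full client address. A sketch using Python's standard ipaddress module, assuming the commonly used truncation lengths of /24 for IPv4 and /56 for IPv6; `ecs_prefix` is a hypothetical helper and resolvers may be configured differently.

```python
import ipaddress


def ecs_prefix(client_ip: str, v4_len: int = 24, v6_len: int = 56) -> str:
    """Return the client subnet a resolver would attach to its upstream query."""
    addr = ipaddress.ip_address(client_ip)
    length = v4_len if addr.version == 4 else v6_len
    # strict=False zeroes the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{client_ip}/{length}", strict=False))


if __name__ == "__main__":
    print(ecs_prefix("203.0.113.77"))
    print(ecs_prefix("2001:db8:1234:5678::1"))
```

The authoritative server sees only the prefix (203.0.113.0/24 for the first example), which is enough to pick a nearby PoP without learning the exact client address.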
Common production failures
- Cache fragmentation: Vary: User-Agent blew up your cache because there are thousands of distinct UA strings.
- Cookie leak: a logged-in user got served another user's cached HTML because Cache-Control: public was set on a personalized page.
- Origin overload after purge: purging "/products/*" causes thundering herd of fetches. Use staggered or tag-based purge.
- DNS-CDN mismatch: your DNS still points to your origin during a switchover, traffic skips the CDN.
- Cert mismatch: CDN cert says cloudflareprovisionedcert.com because you forgot to upload yours.
Vendor comparison (2026 snapshot)
- Cloudflare: huge anycast network, free tier, Workers for edge compute, integrated DNS, WAF, DDoS, R2 storage, D1 SQLite. Strong on developer ergonomics.
- Fastly: enterprise focus, instant purge, VCL for fine-grained config, Compute@Edge in Wasm. Strong on real-time editorial use cases (NYT, Stripe).
- AWS CloudFront: tightly integrated with S3, Lambda@Edge. Less aggressive PoP expansion. Better when already in AWS.
- Akamai: oldest, biggest enterprise footprint. Complex pricing, comprehensive features. Less developer-friendly.
When you do not need a CDN
- Single-region internal API with all users on the same VLAN.
- Highly personalized HTML where cache hit rate would be near zero.
- Compliance constraints that require data to stay in one jurisdiction (though most CDNs offer region-locked options now).
Even then, CDNs often help with DDoS protection and TLS offload.
Numbers to memorize
- Local PoP RTT: 5-30 ms.
- Cross-continent direct: 100-300 ms.
- Typical CDN cache hit rate: 60-95% depending on content.
- TLS handshake at PoP: 50-100 ms first time, 0-RTT on resume.
- CDN purge propagation: 1-30 seconds depending on vendor.
- BGP convergence on PoP failure: 5-60 seconds.
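These numbers combine into a back-of-the-envelope expected latency. The model below is a simplifying assumption: every request pays one local PoP round trip, and a miss additionally pays the full origin fetch time. The sample values are picked from the ranges above.

```python
def expected_latency_ms(hit_rate: float, pop_rtt_ms: float, origin_fetch_ms: float) -> float:
    """Average latency per request under the simple hit/miss model described above."""
    return pop_rtt_ms + (1 - hit_rate) * origin_fetch_ms


if __name__ == "__main__":
    for hit_rate in (0.60, 0.90, 0.95):
        latency = expected_latency_ms(hit_rate, pop_rtt_ms=20, origin_fetch_ms=200)
        print(f"hit rate {hit_rate:.0%}: about {latency:.0f} ms on average")
```

With a 20 ms PoP round trip and a 200 ms origin fetch, moving the hit rate from 60% to 95% cuts the average from about 100 ms to about 30 ms. Averages hide the tail, though: a miss still costs the full 220 ms.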
Learn more
- Article: How anycast works (Cloudflare)
- Docs: High Performance Browser Networking (Ilya Grigorik)
- Article: Cloudflare blog: BGP and anycast (Cloudflare)