DNS resolution end to end
Every step of DNS resolution: stub resolver, recursive resolver, authoritative chain, caching, DNSSEC, DoH, and how CDNs use DNS for routing.
What DNS solves
Humans want names. Machines need IP addresses. DNS is the hierarchical, distributed, eventually-consistent database that bridges them. It scales to billions of names because nobody has to know everything; each zone is authoritative for its own slice.
It is older than the modern web (RFC 1034, 1987) and predates HTTPS by a decade. Most of its design choices were made for a network that does not exist anymore.
The actors
- Stub resolver: a tiny piece of code in your OS (getaddrinfo) or app. Knows one or more recursive resolver addresses (/etc/resolv.conf).
- Recursive resolver: 1.1.1.1, 8.8.8.8, your ISP, your router. Does the actual work of walking the hierarchy.
- Root servers: 13 named identities (a.root-servers.net through m.root-servers.net), served in practice by a large fleet of anycast nodes around the world. Know which TLD servers to ask.
- TLD servers: .com, .org, .io, etc. Know which authoritative servers handle each second-level domain.
- Authoritative servers: the source of truth for a specific zone (example.com). Run by Cloudflare, Route 53, NS1, or self-hosted.
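As a small illustration of how little the stub resolver needs to know, here is a minimal sketch that pulls the recursive resolver addresses out of resolv.conf-style text. The function name and the sample file contents are hypothetical, and comment handling is simplified compared with a real libc parser.

```python
from typing import List


def parse_nameservers(resolv_conf_text: str) -> List[str]:
    """Return the recursive resolver addresses listed in resolv.conf-style text."""
    servers = []
    for raw_line in resolv_conf_text.splitlines():
        # Simplification: treat '#' and ';' as comment markers anywhere in the line.
        line = raw_line.split("#", 1)[0].split(";", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Hypothetical file contents, used only for illustration.
SAMPLE = """\
# example resolver configuration
nameserver 1.1.1.1
nameserver 8.8.8.8
search example.com
"""

print(parse_nameservers(SAMPLE))
```

Everything else (walking the hierarchy, caching, retries against other servers) is the recursive resolver's job.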
A cold resolution, step by step
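In a fully cold lookup, the stub asks the recursive resolver, which asks a root server (and gets a referral to the TLD servers), then a TLD server (and gets a referral to the zone's authoritative servers), then an authoritative server (which returns the answer). That is 3 round trips to external servers (root, TLD, authoritative) plus the round trip from the stub. On a cold cache anywhere in the world, expect 100-300 ms total.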
In practice, the recursive resolver caches everything. The root and TLD servers rarely receive queries from any one resolver because their responses live in cache for days.
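The walk and the effect of caching can be sketched with a toy, in-memory model. Everything here is hypothetical (the server names, the SERVERS table, the 192.0.2.x addresses), and TTL expiry is ignored; the point is only to count upstream queries.

```python
from typing import Dict, Tuple

# Hypothetical "internet": server -> {name or zone: (kind, value)}.
# kind is "referral" (value = next server) or "answer" (value = IP address).
SERVERS: Dict[str, Dict[str, Tuple[str, str]]] = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"example.com": ("referral", "ns-example")},
    "ns-example": {
        "www.example.com": ("answer", "192.0.2.10"),
        "api.example.com": ("answer", "192.0.2.20"),
    },
}


def query(server: str, name: str) -> Tuple[str, str, str]:
    """Ask one server; it responds for the longest suffix of the name it knows."""
    table = SERVERS[server]
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in table:
            kind, value = table[suffix]
            return kind, value, suffix
    raise LookupError(f"{server} knows nothing about {name}")


class RecursiveResolver:
    def __init__(self) -> None:
        self.answers: Dict[str, str] = {}      # name -> IP
        self.delegations: Dict[str, str] = {}  # zone -> server for that zone
        self.upstream_queries = 0

    def _closest_server(self, name: str) -> str:
        # Start from the deepest cached delegation covering the name.
        labels = name.split(".")
        for i in range(len(labels)):
            zone = ".".join(labels[i:])
            if zone in self.delegations:
                return self.delegations[zone]
        return "root"

    def resolve(self, name: str) -> str:
        if name in self.answers:
            return self.answers[name]
        server = self._closest_server(name)
        while True:
            self.upstream_queries += 1
            kind, value, zone = query(server, name)
            if kind == "answer":
                self.answers[name] = value
                return value
            self.delegations[zone] = value  # cache the referral
            server = value


resolver = RecursiveResolver()
resolver.resolve("www.example.com")   # cold: root, TLD, authoritative
resolver.resolve("www.example.com")   # answered from cache
resolver.resolve("api.example.com")   # only the authoritative server is asked
print(resolver.upstream_queries)
```

In this model the first lookup costs three upstream queries, the repeat costs none, and a sibling name costs one because the delegation for example.com is already cached.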
Caches everywhere
DNS has caching at every level.
- Browser cache: Chrome and Firefox typically cache for about 60 seconds, regardless of TTL.
- OS cache: nscd, systemd-resolved, Windows DNS client.
- Router cache: many home routers cache.
- Recursive resolver cache: respects TTL.
- Authoritative cache: not really a cache, the source of truth.
When you publish a new record, every cache between you and the user holds the old value until its copy expires. This is why TTL planning matters.
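To make the expiry behavior concrete, here is a minimal sketch of a TTL-respecting cache. The class name and the injectable clock are illustrative choices, not taken from any particular resolver.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TtlCache:
    """A tiny cache that forgets each entry once its TTL has elapsed."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (value, expiry)

    def put(self, name: str, value: str, ttl_seconds: float) -> None:
        self._entries[name] = (value, self._clock() + ttl_seconds)

    def get(self, name: str) -> Optional[str]:
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must re-resolve
            return None
        return value


# A fake clock makes the behavior easy to observe without sleeping.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.put("www.example.com", "192.0.2.10", ttl_seconds=300)

now[0] = 299.0
print(cache.get("www.example.com"))  # still the cached (possibly stale) value
now[0] = 300.0
print(cache.get("www.example.com"))  # None: the entry has expired
```

Until the expiry time passes, the cache keeps returning the old value no matter what the authoritative server now says.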
TTL strategy
- Long TTL (24 hours): stable records, mail servers, root domains. Reduces query load.
- Medium TTL (1 hour): typical web records. Balances change agility with cache hit rate.
- Short TTL (5 minutes): records that might change, like CDN endpoints or failover targets.
- Very short TTL (60 seconds): active failover, blue-green deploys. Costs more queries.
Before a migration, lower the TTL hours or days in advance, wait for the old TTL to drain everywhere, then make the change. After the change, raise TTL back up.
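The timing can be worked out with simple arithmetic. This sketch assumes every cache honors TTLs exactly, which is optimistic; the function name and the dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def migration_plan(
    lowered_ttl_at: datetime, old_ttl_s: int, new_ttl_s: int
) -> Tuple[datetime, datetime]:
    """Return (earliest safe change time, time by which caches have converged)."""
    # A cache that fetched the record just before the TTL was lowered
    # keeps the old TTL for its full duration.
    earliest_change = lowered_ttl_at + timedelta(seconds=old_ttl_s)
    # After the change, the worst-case cache holds the old value for one new TTL.
    converged_by = earliest_change + timedelta(seconds=new_ttl_s)
    return earliest_change, converged_by


change_at, converged_by = migration_plan(
    lowered_ttl_at=datetime(2024, 1, 1, 0, 0),
    old_ttl_s=86400,  # 24 hours
    new_ttl_s=300,    # 5 minutes
)
print(change_at, converged_by)
```

With a 24-hour old TTL and a 5-minute new TTL, the change can go out a day after the TTL was lowered, and TTL-respecting caches have the new value five minutes after that.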
Record types in depth
A and AAAA
A is IPv4. AAAA is IPv6 (the four As come from being a 128-bit address, four times the size of A's 32-bit). Both are direct name-to-address.
CNAME
Canonical name: alias one name to another. www.example.com CNAME example.com. The resolver follows the chain.
Rules: CNAME cannot coexist with other records at the same name (except DNSSEC records). This is why you cannot CNAME the apex domain (example.com) on most providers; the apex needs SOA and NS records. Workarounds: ALIAS, ANAME, or HTTPS records (RFC 9460).
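Following a CNAME chain can be sketched over a plain dictionary of records. The record table, the hop limit, and the function name are assumptions made for illustration; real resolvers apply their own limits.

```python
from typing import Dict, List, Tuple

Records = Dict[str, Tuple[str, str]]  # name -> (record type, value)


def resolve_with_cnames(
    records: Records, name: str, max_hops: int = 8
) -> Tuple[List[str], str]:
    """Follow CNAMEs from name until a non-CNAME record is reached."""
    chain = [name]
    current = name
    for _ in range(max_hops):
        rtype, value = records[current]  # KeyError if the name has no record
        if rtype != "CNAME":
            return chain, value
        if value in chain:
            raise ValueError(f"CNAME loop: {' -> '.join(chain + [value])}")
        chain.append(value)
        current = value
    raise ValueError(f"CNAME chain longer than {max_hops} hops")


# Hypothetical zone data.
RECORDS: Records = {
    "www.example.com": ("CNAME", "example.com"),
    "example.com": ("A", "192.0.2.10"),
}

print(resolve_with_cnames(RECORDS, "www.example.com"))
```

The loop check and hop limit matter because nothing in the data model stops two names from pointing at each other.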
MX
Mail exchanger: where to deliver email for this domain. Has a priority (lower wins).
TXT
Free-form text. Used for SPF (anti-spoofing), DKIM (mail signing), domain verification (google-site-verification=...).
NS
Name servers for the zone. The TLD returns these in the referral. The recursive resolver caches and uses them to query the authoritative servers directly.
SOA
Start of Authority. One per zone. Contains the primary name server, contact email, serial number, refresh timer, retry, expire, and a minimum field that today sets the negative-caching TTL (RFC 2308).
SRV
Service record. Specifies a priority, weight, port, and target for a named service. For example, _sip._tcp.example.com SRV 0 5 5060 sip.example.com. means priority 0, weight 5, port 5060, target sip.example.com.
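The four SRV fields can be pulled apart with a few lines of code. This is a minimal sketch over the presentation-format string; the type and function names are hypothetical, and weighted selection within a priority level is left out.

```python
from typing import List, NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def parse_srv_rdata(rdata: str) -> Srv:
    """Parse 'priority weight port target' as written in a zone file."""
    priority, weight, port, target = rdata.split()
    return Srv(int(priority), int(weight), int(port), target)


def lowest_priority(records: List[Srv]) -> List[Srv]:
    """Clients try the lowest priority value first (RFC 2782)."""
    best = min(r.priority for r in records)
    return [r for r in records if r.priority == best]


record = parse_srv_rdata("0 5 5060 sip.example.com.")
print(record.port, record.target)
```

Weight only matters among records that share a priority, where it biases which target a client picks.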
HTTPS and SVCB
RFC 9460. Modern record that advertises protocol preferences and connection hints in DNS. A browser can learn from the HTTPS record that the server speaks HTTP/3, what ALPN to negotiate, what Encrypted Client Hello config to use, before making any connection.
This is how HTTP/3 discovery is becoming reliable: instead of waiting for an Alt-Svc HTTP header on the first response, the browser learns from DNS.
EDNS0 and large responses
Original DNS limited responses to 512 bytes over UDP. Modern responses (DNSSEC signatures, big TXT records, HTTPS records) exceed this.
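EDNS0 (RFC 6891) lets the client advertise a larger UDP buffer size. 4096 bytes was the common default for years; since DNS Flag Day 2020, many implementations advertise 1232 bytes to avoid IP fragmentation. If the response still does not fit, the server sets TC=1 (truncated) and the resolver retries over TCP.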
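On the wire, the advertised buffer size travels in an OPT pseudo-record appended to the query. The sketch below builds such a query by hand and checks the TC bit of a response header. The function names and the transaction ID are hypothetical, and udp_size is a parameter: 1232 is a commonly recommended value, while 4096 was the long-standing default.

```python
import struct

TYPE_A = 1
TYPE_OPT = 41
CLASS_IN = 1


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name: str, qtype: int = TYPE_A, txid: int = 0x1234,
                udp_size: int = 1232) -> bytes:
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1 (OPT).
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 1)
    question = encode_name(name) + struct.pack("!HH", qtype, CLASS_IN)
    # OPT pseudo-record: root name, TYPE=41, CLASS field carries the UDP size,
    # TTL field carries extended RCODE/version/flags (zero here), RDLENGTH=0.
    opt = b"\x00" + struct.pack("!HHIH", TYPE_OPT, udp_size, 0, 0)
    return header + question + opt


def is_truncated(response: bytes) -> bool:
    """True if the TC bit is set, meaning the client should retry over TCP."""
    flags = struct.unpack("!H", response[2:4])[0]
    return bool(flags & 0x0200)


query = build_query("example.com")
print(len(query), query.hex())
```

A real client would send these bytes over UDP to port 53 and fall back to TCP whenever is_truncated returns True for the reply.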
EDNS0 client subnet (ECS) lets the recursive resolver tell the authoritative server the client's network prefix. CDNs use ECS to return geographically-close IPs even when the client uses a distant resolver like 8.8.8.8.
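The prefix a resolver forwards is a truncated client address, not the full IP. This sketch shows the truncation and a toy authoritative-side lookup. The /24 and /56 lengths follow common practice (RFC 7871 recommends not exceeding them); the POPS table, DEFAULT_POP, and all addresses are hypothetical.

```python
import ipaddress


def ecs_prefix(client_ip: str, v4_bits: int = 24, v6_bits: int = 56) -> str:
    """Truncate a client address to the prefix a resolver might send via ECS."""
    addr = ipaddress.ip_address(client_ip)
    bits = v4_bits if addr.version == 4 else v6_bits
    # strict=False zeroes the host bits instead of raising an error.
    return str(ipaddress.ip_network(f"{addr}/{bits}", strict=False))


# Hypothetical mapping from client networks to the nearest edge address.
POPS = {
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.1",
    ipaddress.ip_network("2001:db8:1234::/48"): "198.51.100.2",
}
DEFAULT_POP = "198.51.100.254"


def answer_for(ecs: str) -> str:
    """Pick an edge address for the client network carried in the ECS option."""
    net = ipaddress.ip_network(ecs)
    for known, pop in POPS.items():
        if net.version == known.version and net.subnet_of(known):
            return pop
    return DEFAULT_POP


prefix = ecs_prefix("203.0.113.77")
print(prefix, answer_for(prefix))
```

Truncation is the privacy compromise: the authoritative server learns roughly where the client is without seeing its exact address.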
DNSSEC
DNS responses can be forged. A cache-poisoning attacker who guesses the transaction ID and source port can inject a fake A record into a resolver and redirect every user.
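DNSSEC signs records with public-key cryptography. The chain of trust runs from the root (signed) down through the TLD (signed) to your zone (you sign). Validating resolvers check the chain and reject responses from signed zones whose signatures are missing or invalid. Responses from zones that are not signed at all are still accepted, just without any authenticity guarantee.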
Adoption is partial. Many zones are not signed. Many resolvers do not validate. Browsers do not enforce. DNSSEC has been "almost ready" for 20 years.
DNS over HTTPS and DNS over TLS
Plain DNS is unencrypted. Anyone on path (your ISP, hotel Wi-Fi) can see and modify queries.
- DoT (RFC 7858): DNS over TLS, port 853. Encrypts the query.
- DoH (RFC 8484): DNS over HTTPS, port 443. Encrypts and tunnels through HTTPS, so it is hard to tell apart from ordinary web traffic. Cloudflare and Google run public DoH resolvers; Firefox and Chrome support it as clients.
Firefox enables DoH by default in some regions. Chrome enables it when the system resolver matches a known DoH provider. ISPs hate this because it removes their ability to inspect or block queries.
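A DoH GET request is just an ordinary DNS message, base64url-encoded without padding and placed in a dns query parameter. The sketch below builds that URL; the endpoint doh.example.net and the function names are hypothetical, and no network request is made.

```python
import base64
import struct


def build_query(name: str, qtype: int = 1) -> bytes:
    """Build a minimal DNS query. RFC 8484 suggests ID 0 so HTTP caches can hit."""
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)  # RD=1, one question
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)  # QTYPE, QCLASS=IN


def doh_get_url(endpoint: str, name: str, qtype: int = 1) -> str:
    wire = build_query(name, qtype)
    # base64url with the trailing '=' padding removed, as RFC 8484 requires.
    encoded = base64.urlsafe_b64encode(wire).rstrip(b"=").decode("ascii")
    return f"{endpoint}?dns={encoded}"


print(doh_get_url("https://doh.example.net/dns-query", "www.example.com"))
```

A client would fetch this URL with an Accept: application/dns-message header and parse the response body as a normal DNS message.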
DNS as load balancing
The simplest load balancer in the world: return multiple A records and let the client pick.
- Round-robin DNS: authoritative returns a list, rotates order. Crude but cheap.
- Weighted: return some IPs more often than others.
- Geo-DNS: return different IPs based on client geography (often using ECS).
- Latency-based: route to the lowest-latency edge.
- Anycast: same IP advertised from many locations; BGP routes the client to the nearest. CDNs love this.
Failure: client may cache a dead IP. Health checks need to update DNS faster than TTL expires.
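Round-robin and weighted answers are both a few lines each. This is a minimal sketch of the authoritative side only, with hypothetical class and function names and documentation-range addresses; health checking is not modeled.

```python
import random
from collections import deque
from typing import Dict, List


class RoundRobin:
    """Return the full address list, rotated by one on every query."""

    def __init__(self, addresses: List[str]) -> None:
        self._ring = deque(addresses)

    def answer(self) -> List[str]:
        current = list(self._ring)
        self._ring.rotate(-1)  # the next query sees a different first address
        return current


def weighted_pick(weights: Dict[str, int], rng: random.Random) -> str:
    """Return one address, chosen in proportion to its weight."""
    addresses = list(weights)
    return rng.choices(addresses, weights=[weights[a] for a in addresses], k=1)[0]


rr = RoundRobin(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
print(rr.answer())
print(rr.answer())

rng = random.Random(0)  # seeded only to make the demo repeatable
picks = [weighted_pick({"192.0.2.1": 9, "192.0.2.2": 1}, rng) for _ in range(1000)]
print(picks.count("192.0.2.1"), picks.count("192.0.2.2"))
```

Neither scheme knows whether an address is alive, which is why health checks have to remove dead addresses from the list before clients' cached answers expire.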
CDN routing via DNS
When you dig www.cloudflare-customer.com, you get a CNAME to a Cloudflare hostname, which resolves to an anycast IP; BGP then routes you to the nearest Cloudflare PoP. CDNs that steer with DNS rather than anycast instead have their authoritative DNS pick the best PoP per region.
This is how DNS doubles as global traffic steering. The DNS server effectively decides which datacenter you hit.
Privacy leaks
DNS is the most leak-prone part of HTTPS browsing. The DNS query reveals the domain you visit, even when the connection is HTTPS-encrypted afterward.
Mitigations:
- DoH or DoT for query confidentiality.
- Encrypted Client Hello (ECH) to hide the SNI in TLS.
- Oblivious DoH (RFC 9230) to decouple "who asked" from "what they asked."
Debugging DNS
- dig +trace example.com: walk the chain from root yourself.
- dig @1.1.1.1 example.com: query a specific resolver.
- dig example.com TXT: query specific record types.
- dig -x 8.8.8.8: reverse lookup IP to PTR.
- nslookup: older tool, less verbose.
- kdig: from Knot, supports DoH and DoT.
Common pitfalls:
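- Resolver returns NXDOMAIN: the name does not exist. If the name exists but has no record of the requested type, you get NOERROR with an empty answer section instead.
- SERVFAIL: authoritative is down or DNSSEC validation failed.
- REFUSED: the server declines to answer for policy reasons, for example because it is not authoritative for the zone or does not offer recursion to your address.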
- TTL=0 record stuck in cache: probably your stub resolver or browser, not the recursive.
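When a tool is not available, the response code can be read straight from the 12-byte DNS header: it is the low 4 bits of the flags field. A minimal sketch, with a hypothetical function name and a table limited to the most common codes:

```python
import struct

# Subset of response codes; values are from the DNS specifications.
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN",
          4: "NOTIMP", 5: "REFUSED"}


def describe_response(message: bytes) -> str:
    """Summarize the outcome of a DNS response from its header alone."""
    if len(message) < 12:
        raise ValueError("a DNS message has a 12-byte header")
    _txid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", message[:12])
    rcode = flags & 0x000F
    name = RCODES.get(rcode, f"RCODE {rcode}")
    if name == "NOERROR" and ancount == 0:
        # The name may exist without this record type, or this is a referral.
        return "NOERROR with no answers (NODATA or a referral)"
    return name


# Hand-built headers for illustration: QR=1, RD=1, RA=1 plus the rcode.
print(describe_response(struct.pack("!HHHHHH", 1, 0x8183, 1, 0, 0, 0)))
print(describe_response(struct.pack("!HHHHHH", 1, 0x8180, 1, 0, 0, 0)))
```

Distinguishing NXDOMAIN from an empty NOERROR answer is often the fastest way to tell "wrong name" from "wrong record type".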
When DNS becomes the outage
History is full of DNS-caused outages. AWS Route 53 misconfiguration takes down a third of the internet. Slack DNS migration knocks out collaboration globally. Facebook BGP withdrawal removes their DNS from the internet, locking out their own engineers.
DNS is the single point of failure for many architectures. Use multiple authoritative providers (anycast multi-cloud DNS), keep TTLs sane, monitor authoritative response time and SERVFAIL rate.
Performance optimization
- Preconnect: <link rel="preconnect" href="https://api.example.com"> triggers DNS + TCP + TLS before the request.
- DNS prefetch: <link rel="dns-prefetch"> does just DNS.
- HTTPS DNS records (RFC 9460) bundle connection hints.
- Connection coalescing: HTTP/2 reuses one connection for multiple hostnames that resolve to the same IP and are covered by the same certificate.
Numbers to memorize
- Cold cross-continent lookup: 50-200 ms.
- Cached lookup: under 1 ms.
- UDP DNS max without EDNS0: 512 bytes.
- EDNS0 buffer size: 4096 bytes historically; 1232 bytes is the common recommendation since DNS Flag Day 2020.
- Typical TTL: 300-3600 seconds.
- Root server count: 13 logical, hundreds of physical via anycast.
Learn more
- Docs: Cloudflare DNS learning center (Cloudflare)
- Docs: Julia Evans: DNS internals