Module 6 · Lesson 5

DNS and CDNs

25 min

How CDNs use DNS for traffic steering, geo-DNS, latency-based routing, CNAME flattening, and CDN failover patterns.

Content delivery networks live or die by their ability to route users to the nearest, fastest, healthiest server. The primary tool they use for this is DNS. Understanding how CDN DNS works explains why apex CNAME support matters, why your TTLs affect CDN performance, and why DNS-based health checks look different from application-layer health checks.

How CDNs use DNS for traffic steering

When you put example.com behind a CDN, you typically change your DNS records to point to the CDN's infrastructure. The CDN then gives every user who queries for your domain a different answer — based on where they are, which of the CDN's servers are healthy, and what the current load looks like.

This is geographic routing at the DNS layer. The CDN's authoritative servers know the source IP of the recursive resolver making the query (not the end user's IP, but the resolver's IP, which is usually geographically close). They respond with the IP of the CDN point-of-presence (PoP) closest to that resolver.

A user in Paris querying cdn-example.com gets back 185.31.17.1 (a Paris PoP). A user in Singapore gets back 103.21.244.1 (a Singapore PoP). Same domain name, different answers, based on the query source.
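This answer-selection step can be sketched as a toy lookup (the PoP table, networks, and IPs below are illustrative, not real CDN allocations; a real implementation would use a longest-prefix-match structure and live geo data):

```python
import ipaddress

# Hypothetical routing table: resolver network -> (PoP name, answer IP).
POP_TABLE = [
    (ipaddress.ip_network("80.0.0.0/8"), ("paris", "185.31.17.1")),
    (ipaddress.ip_network("103.0.0.0/8"), ("singapore", "103.21.244.1")),
]
DEFAULT = ("fallback", "198.51.100.1")

def answer_for(resolver_ip: str) -> tuple[str, str]:
    """Pick the (pop, ip) answer for a querying resolver by
    longest-prefix match against the routing table."""
    addr = ipaddress.ip_address(resolver_ip)
    matches = [(net.prefixlen, pop) for net, pop in POP_TABLE if addr in net]
    return max(matches)[1] if matches else DEFAULT

print(answer_for("80.12.34.56"))  # a "Paris" resolver
print(answer_for("103.5.6.7"))    # a "Singapore" resolver
```

Same function, different answers depending only on the query source: exactly the behavior described above.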

EDNS Client Subnet (ECS, RFC 7871) improves this. With ECS enabled, the recursive resolver passes a truncated version of the end user's IP (typically a /24) along with the query, so the CDN's authoritative server can route on the user's actual location rather than the resolver's. This matters most when users are served by resolvers far from their physical location. Google's 8.8.8.8 sends ECS and is itself globally distributed; Cloudflare's 1.1.1.1 deliberately omits ECS for privacy reasons, relying on its global anycast footprint to stay close to users. Corporate or ISP resolvers, by contrast, may be centralized in a way that produces suboptimal CDN routing without ECS.
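The truncation a resolver applies before forwarding is simple to illustrate with Python's ipaddress module (the /24 prefix length is the common IPv4 default, not a fixed requirement):

```python
import ipaddress

def ecs_prefix(client_ip: str, source_prefix_len: int = 24) -> str:
    """Truncate a client IP to the subnet a resolver would place in the
    EDNS Client Subnet option (RFC 7871); /24 is typical for IPv4."""
    net = ipaddress.ip_network(f"{client_ip}/{source_prefix_len}", strict=False)
    return str(net)

print(ecs_prefix("203.0.113.77"))  # -> 203.0.113.0/24
```

The authoritative server sees `203.0.113.0/24`, enough to locate the user's network without revealing the full address.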

Latency-based routing

Beyond geography, some CDNs do active latency measurement to route to the fastest PoP rather than just the closest one. Geography and latency are usually correlated but not identical — a geographically close PoP might be having network problems that make a slightly farther one faster in practice.

AWS Route 53's latency-based routing records work this way: Route 53 maintains a database of measured latencies between AWS regions and various networks, and returns the record pointing to the region with the lowest measured latency for the querying network.

This is more dynamic than pure geo-routing but also more complex to debug. When a user gets routed to what seems like the wrong region, the answer is usually in the latency database rather than a misconfigured record.
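The selection itself is trivial once a latency table exists; the operational complexity lives in maintaining that table. A sketch with made-up measurements, showing the geographically closest region losing to a farther but currently faster one:

```python
# Hypothetical latencies (ms) measured from each region to one querying
# network. Suppose eu-west-1 is geographically closest but degraded by a
# network incident; latency-based routing picks eu-central-1 instead.
LATENCY_MS = {"eu-west-1": 95.0, "eu-central-1": 18.0, "us-east-1": 85.0}

def pick_region(latencies: dict[str, float]) -> str:
    """Latency-based routing: return the region with the lowest
    measured latency for this querying network."""
    return min(latencies, key=latencies.get)

print(pick_region(LATENCY_MS))  # -> eu-central-1
```

Debugging "wrong region" complaints means inspecting the measured values feeding this choice, not the records themselves.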

The apex CNAME problem

Here's the DNS constraint that causes the most operational friction with CDNs:

A CNAME record cannot coexist with other record types for the same name. The DNS specification (RFC 1034) is explicit about this. A CNAME says "this name is an alias for another name." Nothing else can be at that name.

For subdomains, this is fine. www.example.com can be a CNAME pointing to example.cdn.net. But what about example.com itself — the apex or root of the zone?

A zone apex always has NS and SOA records, by definition. Adding a CNAME at the apex therefore violates the spec: the CNAME would have to coexist with the NS and SOA records that must live there. No compliant DNS implementation will allow it.

But users want example.com (without the www) to work, and CDNs want you to use their load-balanced hostnames, not hardcoded IPs. The traditional workaround was an A record at the apex pointing directly to a CDN IP, but CDN IPs change, and the record silently breaks when they do.

The solution, implemented by DNS providers, is called CNAME flattening (also called ALIAS records or ANAME records, depending on the provider).

How CNAME flattening works: The DNS provider's authoritative server resolves the target CNAME at query time and returns an A/AAAA record in the response. From the client's perspective, they get an A record for the apex. Under the hood, the authoritative server is doing the CNAME lookup internally and synthesizing the response.

# What you configure in the DNS provider:
example.com   ALIAS   example.cdn.cloudflare.net

# What clients receive in DNS responses:
example.com   A       104.21.45.89
example.com   A       172.67.182.166

Cloudflare calls this CNAME flattening and lets you configure a CNAME directly at the apex (possible because Cloudflare controls the authoritative server). Route 53 calls its equivalent alias records. Most modern DNS providers support this in some form.

The caveat: CNAME flattening is not standardized. Each provider implements it differently. It works well within a single provider's ecosystem but can behave unexpectedly when transferring zones or using secondary servers from a different provider.
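The synthesis step can be sketched as a toy authoritative lookup (zone contents are illustrative; real flattening also has to manage TTLs and re-resolve as the target's records change):

```python
# Toy zone data: the apex carries an ALIAS, the CDN hostname carries A records.
ZONE = {
    "example.cdn.cloudflare.net": {"A": ["104.21.45.89", "172.67.182.166"]},
    "example.com": {"ALIAS": "example.cdn.cloudflare.net"},
}

def answer(name: str, rtype: str = "A") -> list[str]:
    """Resolve a name, flattening any ALIAS: chase the target internally
    and return its records under the queried name, so the client never
    sees a CNAME at the apex."""
    records = ZONE.get(name, {})
    if rtype in records:
        return records[rtype]
    if "ALIAS" in records:
        return answer(records["ALIAS"], rtype)
    return []

print(answer("example.com"))  # A records synthesized from the ALIAS target
```

The client's view matches the earlier snippet: plain A records for example.com, with the alias chase hidden inside the authoritative server.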

How the major CDNs route traffic

Cloudflare uses anycast routing alongside DNS-based steering. Many Cloudflare IP addresses are anycast — the same IP announced from many locations. BGP routing steers packets to the nearest PoP automatically. DNS returns anycast IPs, so the CDN benefits from both BGP anycast routing and DNS-level geo-routing.

Fastly uses DNS heavily for traffic steering. Fastly's model returns different IPs for different regions and uses short TTLs (30 seconds is common) to allow rapid failover.

Akamai has a particularly sophisticated DNS layer. Akamai operates its own DNS infrastructure (GSLB — Global Server Load Balancing) that returns different IPs based on the querying resolver's location, measured latency, and real-time health of Akamai's servers. Akamai's authoritative DNS is also anycast.

The common thread: CDN authoritative DNS servers are doing active work at query time, not just returning static records. They're consulting databases of resolver locations, health check results, current server load, and latency measurements to make routing decisions.

CDN failover patterns

CDNs also use DNS for failover: if a PoP or origin server is unhealthy, route traffic elsewhere.

The mechanism: the CDN runs health checks against its servers and origins. When a health check fails, the DNS layer stops returning that server's IP (or routes traffic to a different PoP).

This is faster than application-level failover but slower than connection-level failover. DNS failover is limited by TTLs: if a record has a 5-minute TTL and its server goes down, resolvers keep returning the dead server's IP until their cached answers expire, up to 5 minutes later. CDNs manage this by using short TTLs (often 30-60 seconds) for their own load balancing records.
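The arithmetic behind that limit is worth making explicit. A sketch, assuming a CDN that requires several consecutive failed health checks before marking a server down (the intervals and thresholds are illustrative):

```python
def worst_case_failover_s(check_interval_s: int,
                          failures_to_trip: int,
                          ttl_s: int) -> int:
    """Worst-case seconds before clients stop hitting a dead server:
    detection time (check interval * consecutive failures required to
    mark it down) plus one full TTL for cached answers to expire."""
    return check_interval_s * failures_to_trip + ttl_s

# 10 s checks, 3 consecutive failures to trip, 30 s TTL:
print(worst_case_failover_s(10, 3, 30))   # -> 60
# Same health checking, but a 5-minute TTL:
print(worst_case_failover_s(10, 3, 300))  # -> 330
```

Same detection speed, five times the total outage window: the TTL, not the health check, dominates.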

For multi-CDN setups, some organizations use a DNS-based approach where a primary CDN handles traffic, and DNS failover points to a secondary CDN if the primary's health checks fail. This requires careful coordination of health check thresholds and TTL management to avoid failover thrashing.
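One common way to dampen thrashing is hysteresis: fail over only after several consecutive failed checks, and fail back only after a longer run of healthy ones. A minimal sketch (class name and thresholds are illustrative, not a real product's API):

```python
class MultiCdnSelector:
    """Primary/secondary CDN choice with hysteresis so a flapping
    health check doesn't bounce traffic back and forth."""

    def __init__(self, fail_after: int = 3, recover_after: int = 5):
        self.fail_after = fail_after        # failed checks before failover
        self.recover_after = recover_after  # healthy checks before failback
        self.on_secondary = False
        self.streak = 0  # consecutive observations pushing toward a switch

    def observe(self, primary_healthy: bool) -> str:
        """Feed one health-check result; return which CDN to use."""
        if self.on_secondary:
            self.streak = self.streak + 1 if primary_healthy else 0
            if self.streak >= self.recover_after:
                self.on_secondary, self.streak = False, 0
        else:
            self.streak = self.streak + 1 if not primary_healthy else 0
            if self.streak >= self.fail_after:
                self.on_secondary, self.streak = True, 0
        return "secondary" if self.on_secondary else "primary"

sel = MultiCdnSelector()
checks = [False, False, False, True, True, True, True, True]
print([sel.observe(h) for h in checks])
```

Note the asymmetry: failing over is quick (3 checks), failing back is deliberately slower (5 checks), since a premature failback to a still-shaky primary is the usual cause of thrashing.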

Key takeaways

  • CDNs use DNS geo-routing to direct users to the nearest PoP based on the query source IP
  • EDNS Client Subnet gives CDNs more accurate user location when resolvers are centralized
  • CNAME at the apex violates DNS spec; providers solve this with CNAME flattening (non-standard)
  • CDN authoritative DNS is dynamic — it's making routing decisions at query time
  • Short TTLs (30-60s) enable CDN failover; the tradeoff is increased query volume

Up next

Lesson 06: The Future of DNS — DNS over QUIC, Oblivious DNS, SVCB records, and what's actually getting deployed.