Module 4 · Lesson 2
Anycast DNS: Improving Resilience and Performance
⏱ 50 minutes
How Cloudflare, NS1, and Route 53 route billions of queries to the nearest healthy node, and how you can do the same if you're running your own authoritative infrastructure.
The IP address 1.1.1.1 exists in more than 300 locations simultaneously. When you query it from Paris, you hit a datacenter in Paris. From São Paulo, São Paulo. There's no load balancer in the middle, no geolocation database lookup, no application-layer routing. The network itself figures it out.
That's anycast. It's been the standard architecture for authoritative DNS at any meaningful scale for over a decade. If you're not using it, you're either very small or paying for the privilege of suboptimal latency.
How Anycast Works
In normal unicast routing, each IP address belongs to exactly one network interface in one physical location. When a packet is sent to that IP, BGP routes it to that specific location.
In anycast, the same IP prefix is announced from multiple locations simultaneously via BGP. Each location advertises the same route. Routers follow standard BGP path selection — shortest AS path, lowest MED, local preference — and packets are delivered to whichever location is "nearest" in BGP terms.
This means:
- A query from a client in Tokyo hits your Tokyo PoP
- A query from Frankfurt hits your Frankfurt PoP
- Both are using the same IP address
- Neither client does anything special
When a node goes offline and withdraws its BGP announcement, traffic automatically reroutes to the next-best node. No TTL to wait for. No client reconfiguration. BGP convergence typically happens in 30–90 seconds.
Why This Is the Standard for Authoritative DNS
Authoritative DNS has two performance requirements: low latency (answer quickly) and high availability (always answer). Anycast addresses both.
Latency: Instead of sending a query from Tokyo to a server in Virginia, you answer it locally. The difference between 2ms and 200ms RTT is measurable in page load time. For recursive resolvers that need to chain multiple authoritative queries, this compounds.
Availability: If your Frankfurt node dies, traffic shifts to Amsterdam or Paris. The client's resolver retries within its standard timeout and hits a different node that happens to have the same IP. From the client's perspective, there was a brief timeout. Not an outage.
Scale: You can add capacity by adding nodes. Each node announces the same prefix. Traffic distributes naturally to whoever is closest.
Who Uses It
Every major DNS provider uses anycast for authoritative service:
- Cloudflare: 300+ PoPs, all serving 1.1.1.1 (resolver) and authoritative via anycast
- Route 53: AWS edge nodes globally, same NS IP answering from nearest available region
- NS1: Anycast-native from the start, one of their core selling points
- Google Public DNS: 8.8.8.8 is anycast across Google's global network
For authoritative DNS, the nameserver IPs themselves are anycast. When your zone has NS ns1.exampledns.com, that hostname resolves to an anycast IP that routes to the nearest PoP.
The Trade-offs
Anycast is not free. There are real operational complications.
Zone Transfers Don't Work the Way You'd Expect
AXFR (full zone transfer) and IXFR (incremental) rely on TCP connections from a secondary to a specific primary. With anycast, "the primary" isn't a single location — the same IP might route to different nodes depending on where the secondary is.
The standard pattern is a hidden primary: a unicast (non-anycast) server that holds the authoritative copy of the zone, which is not publicly advertised. All anycast nodes receive zone transfers from this hidden primary. The hidden primary is never in your public NS records.
Zone delegation:
example.com NS ns1.exampledns.net <-- anycast, public
example.com NS ns2.exampledns.net <-- anycast, public
Hidden primary:
primary.internal.exampledns.net <-- unicast, not in NS records
All anycast nodes AXFR/IXFR from here
This is the topology used by every serious authoritative operator. The hidden primary is protected, monitored, and only reachable from the anycast nodes.
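As a concrete sketch of this topology, here is roughly what the zone configuration looks like with BIND 9 (the addresses, file paths, and the choice of BIND itself are illustrative; any authoritative server with zone-transfer support works the same way):

```
// On the hidden primary (unicast, never in NS records)
zone "example.com" {
    type primary;
    file "zones/example.com.db";
    allow-transfer { 192.0.2.10; 192.0.2.11; };  // anycast nodes' unicast addresses only
    also-notify    { 192.0.2.10; 192.0.2.11; };  // push NOTIFY when the zone changes
};

// On each anycast node
zone "example.com" {
    type secondary;
    primaries { 198.51.100.53; };  // hidden primary's unicast address
    file "zones/example.com.db";
};
```

The key property: the hidden primary's address appears only in node configuration, never in the public delegation, so it can be firewalled down to exactly the anycast fleet.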
Debugging Is Harder
When you query an anycast IP, you don't know which node answered. If there's an inconsistency — one node has stale data, one node has a misconfigured zone — you might see intermittent failures that you can't reproduce.
To debug anycast:
- Use dig with +short +identify to see the IP that answered (with anycast this usually just echoes the anycast address, so it rarely narrows things down)
- Send a CHAOS-class TXT query for id.server (or the older hostname.bind), or use dig +nsid; many operators, including Cloudflare, return an identifier for the responding node
- Use your DNS provider's API or dashboard to check which nodes are serving which data
- If you control the infrastructure, use node-specific unicast IPs (out-of-band management addresses) to query individual nodes directly
Inconsistency between anycast nodes is usually caused by failed zone transfers. Compare each node's SOA serial (dig +short SOA example.com @specific-node.internal) against the hidden primary's.
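That serial comparison is easy to script. A minimal sketch, assuming each node is reachable on a unicast management name (all host names here are placeholders):

```shell
#!/bin/bash
# check-serials.sh — flag anycast nodes whose zone copy is behind the
# hidden primary. Host names passed in are illustrative placeholders.

# Extract the serial (third field) from a `dig +short SOA` answer line:
#   mname rname serial refresh retry expire minimum
soa_serial() {
    printf '%s\n' "$1" | awk '{print $3}'
}

# Usage: check-serials.sh <zone> <hidden-primary> <node> [<node>...]
if [ "$#" -ge 3 ]; then
    zone=$1; primary=$2; shift 2
    ref=$(soa_serial "$(dig +short SOA "$zone" @"$primary")")
    for node in "$@"; do
        serial=$(soa_serial "$(dig +short SOA "$zone" @"$node")")
        [ "$serial" = "$ref" ] || echo "STALE: $node serial=$serial (primary: $ref)"
    done
fi
```

Run it from somewhere that can reach the management addresses; silence means every node matches the primary.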
BGP Hijacking Risk
Since anycast routing is based on BGP, a misconfigured or malicious AS announcing the same prefix can attract traffic. This is not theoretical — it happens. RPKI (Resource Public Key Infrastructure) validation mitigates this, but not all providers and networks implement it.
For your own infrastructure: make sure your IP announcements are covered by a valid ROA (Route Origin Authorization) in RPKI. For DNS providers you're relying on, check whether they publish ROAs.
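One way to check coverage from the command line is RIPEstat's public data API, which has an RPKI validation endpoint. A sketch (the endpoint path, query parameters, and JSON field names are assumptions based on RIPEstat's documented API; verify against their docs before depending on it):

```shell
#!/bin/bash
# roa-check.sh — ask RIPEstat whether a prefix/origin-AS pair is covered
# by a valid ROA. Endpoint and JSON shape per RIPEstat's data API.

# Pull the validation status ("valid" / "invalid" / "unknown") out of the
# RIPEstat JSON response read from stdin.
roa_status() {
    python3 -c 'import json,sys; print(json.load(sys.stdin)["data"]["status"])'
}

# Usage: roa-check.sh <asn> <prefix>, e.g. roa-check.sh AS13335 1.1.1.0/24
if [ "$#" -eq 2 ]; then
    curl -s "https://stat.ripe.net/data/rpki-validation/data.json?resource=$1&prefix=$2" \
        | roa_status
fi
```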
Implementing Anycast for Your Own Infrastructure
If you're running your own authoritative DNS infrastructure (common for large enterprises, hosting providers, and anyone who doesn't want to be fully dependent on a SaaS DNS provider), here's the architecture:
Requirements
- Your own ASN — you need to be able to originate BGP routes
- Your own IP block — at least a /24, since most providers won't accept more-specific announcements
- BGP sessions with transit or IXP at each location
- A BGP daemon on your DNS servers — Bird2 or FRRouting are the standard choices
Basic Node Setup (Bird2)
Each anycast node runs a BGP daemon that announces the anycast prefix when the DNS service is healthy:
# /etc/bird/bird.conf (simplified; router id and device protocol omitted)

# Originate the anycast prefix locally so BGP has a route to export.
# The DNS daemon binds an address from this prefix on the loopback interface.
protocol static anycast {
  ipv4;
  route 203.0.113.0/24 blackhole;
}

protocol bgp upstream1 {
  local as 65001;
  neighbor 198.51.100.1 as 64496;
  ipv4 {
    export filter {
      # Announce only the anycast prefix to the upstream
      if net = 203.0.113.0/24 then accept;
      reject;
    };
  };
}
Pair this with a health check that withdraws the route if the DNS daemon is unresponsive:
#!/bin/bash
# healthcheck.sh — run every 10 seconds (e.g. from cron or a systemd timer)
# Query the local DNS daemon; if it fails to answer within 2 seconds,
# shut down the BGP session so the route is withdrawn and traffic
# shifts to other nodes.
if ! dig +time=2 +tries=1 @127.0.0.1 health.internal.example.com > /dev/null 2>&1; then
    birdc disable upstream1
    logger "DNS healthcheck failed, withdrawing BGP route"
fi
When the service recovers, re-enable the session:
birdc enable upstream1
This is the core of anycast resilience: automatic failover based on service health, not just node availability.
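To run the check on a schedule, cron works, or a systemd timer pair along these lines (unit names and the script path are illustrative):

```
# /etc/systemd/system/dns-healthcheck.service
[Unit]
Description=DNS anycast health check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/healthcheck.sh

# /etc/systemd/system/dns-healthcheck.timer
[Unit]
Description=Run DNS health check every 10 seconds

[Timer]
OnBootSec=10
OnUnitActiveSec=10

[Install]
WantedBy=timers.target
```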
Anycast with a DNS Provider
If you're using a managed DNS provider (Cloudflare, NS1, Route 53), anycast is just included. You don't configure it. What you do configure is:
- Multiple providers (secondary DNS) so one provider's anycast network going down doesn't take you offline
- Consistent TTLs across providers
- Zone transfer or API sync between primary and secondary providers
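A quick consistency check across providers: query every advertised nameserver for the zone's SOA and compare TTL and serial. A sketch (substitute your own zone name):

```shell
#!/bin/bash
# ns-consistency.sh — print the SOA TTL and serial each nameserver returns,
# so drift between providers is easy to spot.

# Summarize a full `dig +noall +answer` SOA line:
#   name ttl class SOA mname rname serial refresh retry expire minimum
soa_summary() {
    awk '{print "ttl=" $2, "serial=" $7}'
}

# Usage: ns-consistency.sh <zone>
if [ "$#" -eq 1 ]; then
    for ns in $(dig +short NS "$1"); do
        printf '%s: ' "$ns"
        dig +noall +answer SOA "$1" @"$ns" | soa_summary
    done
fi
```

If one provider reports an older serial or a different TTL than the others, your sync pipeline between providers is the first place to look.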
Key Takeaways
- Anycast routes traffic to the nearest BGP-adjacent node with the same IP. It's geography-aware routing at the network layer.
- Zone transfers require a hidden primary. AXFR to an anycast address is unreliable.
- Debugging anycast requires node-specific access or provider tooling — you can't tell which node answered from a standard dig
- For your own anycast: Bird2 or FRRouting plus BGP health-check withdrawal is the standard pattern
- The trade-offs are real but well-understood. Every major DNS provider has solved them. You benefit from their solutions when you use managed DNS.
Further Reading
- RFC 4786 — Operation of Anycast Services
- Bird2 documentation
- Cloudflare's anycast network overview
- RIPE NCC RPKI documentation
Up Next
DNS Monitoring and Logging Best Practices — what metrics to collect, what dashboards to build, and what thresholds to alert on.