Module 4 · Lesson 7
DNS Performance Metrics and Benchmarking
⏱ 45 minutes
P99 resolution latency, TTFB impact, and how to run dnsperf before production runs it for you.
There's a version of DNS performance work where you run a benchmark, look at median latency, declare it "fast enough," and move on. That version misses the point. Median latency tells you what a typical lookup looks like on a good day. P99 latency tells you how slow the worst 1% of lookups are. At any meaningful scale that is a lot of lookups, and because a browser makes many of them for every page, a large share of your users will hit that tail on some request.
This lesson is about measuring DNS performance correctly and understanding what the numbers mean.
The Metrics That Matter
Resolution Latency Percentiles
P50 (median): Half of queries resolve faster than this. For a well-warmed local resolver serving from cache, P50 should be under 2ms. For a public resolver (8.8.8.8, 1.1.1.1), P50 for cached responses is typically 5–15ms depending on distance.
P95: 95% of queries resolve within this time. A reasonable target for a local resolver: under 20ms. For cache misses requiring recursive resolution, P95 under 200ms.
P99: 1% of queries take this long or longer. This is your "worst typical" latency. Cloudflare 1.1.1.1 publishes P99 under 10ms. For a corporate recursive resolver, P99 under 100ms for cached queries is achievable. Above 500ms suggests a configuration or infrastructure problem.
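To make the definitions concrete, here is a minimal sketch of computing these percentiles from raw latency samples using the nearest-rank method. The sample data is made up for illustration (mostly cache hits plus a few slow cache misses), and the function name is an assumption, not part of any tool covered in this lesson.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: 95 fast cache hits,
# 3 slightly slower answers, and 2 expensive cache misses.
latencies_ms = [1.0] * 95 + [12.0] * 3 + [180.0, 950.0]

for p in (50, 95, 99):
    print(f"P{p}: {percentile(latencies_ms, p)} ms")
print(f"mean: {sum(latencies_ms) / len(latencies_ms):.2f} ms")
```

With this sample, P50 and P95 are both 1.0 ms while P99 is 180.0 ms, which is exactly the tail that a median-only report hides.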
Why P99 matters more than P50 for DNS:
A browser loading a modern webpage makes 20–100 DNS lookups (HTML, CSS, JS, images, analytics, fonts — each domain). If your P99 is 2 seconds, some users hit that on multiple sequential lookups. This compounds directly into TTFB (time to first byte) and page load time.
Google's research showed that a 400ms delay in search results reduced queries by 0.44%. For a service your users interact with repeatedly, P99 DNS latency is a measurable business metric.
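The compounding effect can be estimated with a short calculation. This sketch assumes each lookup independently has a 1% chance of landing at or beyond P99, which is a simplification: real lookups share caches and are often issued in parallel, so treat the numbers as a rough upper-level intuition rather than a prediction.

```python
def prob_any_slow(lookups, tail_fraction=0.01):
    """Probability that at least one of `lookups` independent queries
    falls in the slowest `tail_fraction` of the latency distribution."""
    return 1 - (1 - tail_fraction) ** lookups


for n in (1, 20, 50, 100):
    print(f"{n:>3} lookups: {prob_any_slow(n):.1%} chance of at least one tail lookup")
```

Under that independence assumption, a page with 20 lookups has roughly an 18% chance of hitting at least one P99-or-worse lookup, and a page with 100 lookups roughly 63%.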
Authoritative vs Recursive Latency
Measure both separately — they have different bottlenecks.
Authoritative latency: Time for an authoritative nameserver to answer a query from a resolver. Typically 1–10ms for in-memory zones. Rises with DNSSEC signing overhead (add 2–5ms) and database-backed zones (add backend query latency).
Recursive latency: Time for a recursive resolver to answer a client query. For cached responses: 1–5ms. For cache misses: sum of all authoritative queries needed in the resolution chain, plus network RTTs. A cache miss for a deeply delegated zone might chain 3–4 authoritative queries, each adding RTT.
Measuring them separately tells you whether your resolver is the problem or the authoritative infrastructure is.
TTFB Impact
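Time to First Byte is commonly broken down into TCP handshake + TLS handshake + request transit + server processing. DNS resolution has to finish before any of that can start, so when TTFB is measured from the start of navigation (as browser tooling does), DNS time is added on top.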
For a page load:
- DNS lookup: X ms
- TCP connect: ~RTT ms
- TLS handshake: ~1 RTT with TLS 1.3, ~2 RTT with TLS 1.2
- HTTP request: ~RTT ms
If your DNS P99 is 500ms and your server P99 TTFB is 200ms, the DNS is the bottleneck. On subsequent requests to the same domain, DNS is cached and TTFB dominates. But the first request per domain — the first time a user visits, the first load of a third-party script — is DNS-dominated.
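The breakdown above can be turned into a rough latency budget for the first request to a domain. This is a sketch under stated assumptions: the function name, the 40 ms RTT, the 80 ms server time, and the default of one TLS round trip are all illustrative values, not measurements.

```python
def first_request_ms(dns_ms, rtt_ms, server_ms, tls_round_trips=1):
    """Estimate time to first byte for the first request to a domain.

    Returns (total_ms, dns_share), where dns_share is the fraction of the
    total spent waiting on DNS.
    """
    connect = rtt_ms                   # TCP handshake
    tls = tls_round_trips * rtt_ms     # TLS handshake
    request = rtt_ms                   # request out, first byte back
    total = dns_ms + connect + tls + request + server_ms
    return total, dns_ms / total


# Same network and server, two different DNS outcomes (hypothetical values).
for label, dns_ms in (("cached DNS", 5), ("P99 DNS", 500)):
    total, share = first_request_ms(dns_ms, rtt_ms=40, server_ms=80)
    print(f"{label}: {total} ms total, DNS is {share:.0%} of it")
```

With these numbers the cached case totals 205 ms with DNS at about 2% of the budget, while the slow case totals 700 ms with DNS at about 71%.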
Benchmarking Tools
dnsperf
dnsperf, originally developed by Nominum and now maintained by DNS-OARC, is the standard tool for DNS performance testing. It sends queries from a file, measures response times, and reports statistics.
Install:
apt install dnsperf # Ubuntu/Debian
brew install dnsperf # macOS
Create a query file (one query per line):
example.com A
www.example.com A
mail.example.com MX
example.com AAAA
Run a benchmark:
dnsperf -s 127.0.0.1 -d queries.txt -c 10 -l 30
Options:
- -s target server
- -d query data file
- -c number of clients to simulate
- -l test duration in seconds (-t is the per-query timeout, not the run length)
- -Q maximum QPS rate (useful for load testing without hammering)
Sample output:
DNS Performance Testing Tool
Version 2.11.2
[Status] Command line: dnsperf -s 127.0.0.1 -d queries.txt -c 10 -l 30
[Status] Sending queries (to 127.0.0.1:53)
Statistics:
Queries sent: 450123
Queries completed: 450119 (100.00%)
Queries lost: 4 (0.00%)
Response codes: NOERROR 449834 (99.94%), NXDOMAIN 285 (0.06%)
Average QPS: 15004.3 qps
Average latency: 0.664 ms
Latency std deviation: 1.234 ms
Min latency: 0.044 ms
Max latency: 45.234 ms
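This summary doesn't give you P99 directly. To get percentiles, run with the -v flag, which reports each query individually along with its latency, and post-process that output. Recent dnsperf releases also offer a latency histogram through the -O latency-histogram extended option; check the man page for your installed version.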
resperf
resperf tests recursive resolver performance under increasing load, measuring latency at each load level. It ramps up QPS and shows you where latency starts to degrade.
resperf -s 127.0.0.1 -d queries.txt -r 60 -m 50000
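-r is the ramp duration in seconds and -m is the maximum QPS. resperf writes a whitespace-separated plot data file (the -P option sets its name) that the bundled resperf-report script can turn into graphs. Plotting latency against QPS shows you the inflection point where your resolver starts struggling.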
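Once you have latency at each load level, finding the inflection point can be automated. The sketch below assumes you have already parsed the measurements into (qps, latency_ms) pairs; the function name, the sample data, and the "twice the low-load baseline" threshold are illustrative choices, not something resperf provides.

```python
def find_knee(points, factor=2.0, baseline_points=3):
    """Return the first QPS level whose latency exceeds `factor` times the
    average latency of the first `baseline_points` (low-load) measurements.

    `points` is a list of (qps, latency_ms) tuples sorted by qps.
    Returns None if latency never crosses the threshold.
    """
    if len(points) <= baseline_points:
        raise ValueError("need more points than baseline_points")
    baseline = sum(lat for _, lat in points[:baseline_points]) / baseline_points
    for qps, lat in points[baseline_points:]:
        if lat > factor * baseline:
            return qps
    return None


# Hypothetical load curve: flat until the resolver starts to saturate.
curve = [
    (5000, 0.8), (10000, 0.9), (15000, 1.0), (20000, 1.1),
    (25000, 1.6), (30000, 4.2), (35000, 19.0),
]
print("latency degrades at about", find_knee(curve), "QPS")
```

A fixed multiple of the baseline is a blunt heuristic; it is mainly useful for comparing two runs of the same test before and after a configuration change.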
flamethrower
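flamethrower is a newer DNS benchmarking tool with better multithread support and more output formats:
# Install: build from source with CMake (see the project README); it is a C++ tool, so there is no cargo package
# Run: cap at 10,000 QPS for 30 seconds, reading "name type" records from queries.txt
flame -Q 10000 -l 30 -g file -f queries.txt 127.0.0.1
Its console output reports per-second throughput and min/avg/max response latency, and -o writes the metrics to a JSON file. Verify what your build reports before relying on it for P50/P95/P99.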
Measuring from Multiple Locations
A benchmark from localhost measures the server's raw processing speed, not the latency clients experience. For a more realistic measurement, run dnsperf from a client machine that matches your user geography.
For external DNS (public authoritative servers), use synthetic monitoring tools:
- DNSperf.com — global measurement from 200+ locations, publishes P50/P95/P99 for major providers
- DNS Benchmark (Windows, GRC) — tests local resolver and public resolvers
- dig from multiple locations (use public jump hosts or a cloud instance per region)
What Good Looks Like
Reference benchmarks from published data:
| Provider | P50 (ms) | P99 (ms) | Source |
|---|---|---|---|
| Cloudflare 1.1.1.1 | 2–5 | < 10 | Cloudflare published |
| Google 8.8.8.8 | 5–15 | < 30 | DNSperf.com |
| OpenDNS 208.67.222.222 | 10–25 | < 50 | DNSperf.com |
| Self-hosted (warm cache, local) | 1–3 | < 15 | Typical |
| Self-hosted (cache miss) | 50–200 | < 500 | Typical |
What bad looks like and why:
P99 > 1 second: Resolver is saturated, backend queries are timing out, or network path has packet loss. Investigate SERVFAIL rate and upstream resolver health.
P50 > 50ms for cached queries: Cache is too small and evicting frequently, or resolver is under heavy CPU load from DNSSEC validation.
High variance (P99 more than 10x P50): Suggests cache misses are very expensive. Either the upstream authoritative servers are slow, or queries are hitting NXDOMAIN and the negative answers are not being cached.
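These rules of thumb are easy to encode as an automated check on benchmark results. The following is a minimal sketch using the thresholds from this section; the function name and message wording are assumptions, and the thresholds should be tuned to your own baseline.

```python
def diagnose(p50_ms, p99_ms, cached=True):
    """Apply the three warning signs above to a pair of measured percentiles.

    Returns a list of human-readable findings (empty if nothing is flagged).
    """
    findings = []
    if p99_ms > 1000:
        findings.append("P99 > 1 s: check saturation, upstream timeouts, packet loss")
    if cached and p50_ms > 50:
        findings.append("P50 > 50 ms on cached queries: check cache size and CPU load")
    if p50_ms > 0 and p99_ms > 10 * p50_ms:
        findings.append("P99 > 10x P50: check cache-miss cost and negative caching")
    return findings


print(diagnose(p50_ms=2, p99_ms=15))     # healthy warm cache
print(diagnose(p50_ms=60, p99_ms=1500))  # several problems at once
```

Running a check like this after every benchmark makes regressions visible without anyone having to read the raw numbers.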
Key Takeaways
- Measure P99, not just P50. Median latency hides the tail behavior that affects user experience.
- Cloudflare P99 under 10ms is the current benchmark for public resolver performance. Match it for your internal resolvers on cached queries.
- dnsperf for raw throughput, resperf for load curve analysis, flamethrower for multi-threaded load generation with JSON metrics output.
- DNS latency directly adds to TTFB on the first request to each domain. At 30 domains per page, P99 DNS latency compounds.
- Always benchmark before deploying infrastructure changes. Production should not be your benchmark environment.
Further Reading
- dnsperf documentation
- DNSperf global provider benchmarks
- Cloudflare 1.1.1.1 performance blog
- Google's web performance research
Up Next
Case Studies: DNS Failures and Lessons Learned — four real incidents, what went wrong, and what changed afterward.