Module 3 · Lesson 7
Performance Optimization Techniques for DNS Queries
⏱ 45 min
Measuring the DNS latency hidden in your application, TTL tuning strategies, in-process caching, dns-prefetch and preconnect for browsers, EDNS Client Subnet, and benchmarking with dnsperf and flamethrower. With real numbers.
DNS adds latency to every connection your application makes. Most of the time that latency is 1-2ms (cached) or 20-100ms (uncached). Sometimes it's 500ms or more: a cold recursive lookup across the DNS hierarchy under load.
The problem is you usually don't see it. DNS latency hides inside your HTTP client's "connect time." It shows up as P99 spikes that don't correlate with server load. It causes startup delays when your application launches and makes 20 connections to 20 different services at once.
This lesson is about making it visible, then making it smaller.
Measuring DNS Latency in Your Application
Start by measuring before optimizing. In Python:
```python
import time
import socket
import statistics
from concurrent.futures import ThreadPoolExecutor

def measure_dns_latency(hostname: str, samples: int = 20) -> dict:
    """Measure DNS resolution time for a hostname."""
    latencies = []
    for _ in range(samples):
        # Note: getaddrinfo() hits the OS resolver cache, so after the first
        # sample you are mostly measuring cached latency. Restart the process
        # (or flush the OS cache) between runs to measure cold lookups.
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
            elapsed = (time.perf_counter() - start) * 1000  # ms
            latencies.append(elapsed)
        except socket.gaierror:
            pass
    if not latencies:
        return {}
    ordered = sorted(latencies)
    return {
        'hostname': hostname,
        'samples': len(latencies),
        'p50': statistics.median(latencies),
        'p95': ordered[int(len(ordered) * 0.95)],
        'p99': ordered[int(len(ordered) * 0.99)] if len(ordered) >= 100 else max(latencies),
        'min': min(latencies),
        'max': max(latencies),
        'mean': statistics.mean(latencies),
    }

# Check multiple services at once
hostnames = [
    'api.stripe.com',
    'api.sendgrid.com',
    'sqs.us-east-1.amazonaws.com',
    's3.amazonaws.com',
]
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(measure_dns_latency, hostnames))
for r in results:
    if r:  # skip hostnames that never resolved
        print(f"{r['hostname']:40s} p50={r['p50']:.1f}ms p95={r['p95']:.1f}ms max={r['max']:.1f}ms")
```
Sample output from a production EC2 instance in us-east-1:
```text
api.stripe.com                   p50=1.2ms p95=3.1ms max=47ms
api.sendgrid.com                 p50=1.1ms p95=2.8ms max=31ms
sqs.us-east-1.amazonaws.com      p50=0.8ms p95=1.4ms max=8ms
s3.amazonaws.com                 p50=0.9ms p95=1.5ms max=12ms
```
Cached lookups run in under 2ms. But watch that max column: 47ms for a single DNS lookup is a real problem if you have a 100ms SLA on your API endpoints.
In Go, you can hook into the HTTP client's trace to see DNS time directly:
```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptrace"
	"time"
)

func measureHTTPWithDNS(url string) {
	var (
		dnsStart    time.Time
		dnsDone     time.Time
		connectDone time.Time
		ttfb        time.Time
	)
	trace := &httptrace.ClientTrace{
		DNSStart: func(info httptrace.DNSStartInfo) {
			dnsStart = time.Now()
		},
		DNSDone: func(info httptrace.DNSDoneInfo) {
			dnsDone = time.Now()
		},
		ConnectDone: func(network, addr string, err error) {
			connectDone = time.Now()
		},
		GotFirstResponseByte: func() {
			ttfb = time.Now()
		},
	}
	req, err := http.NewRequestWithContext(
		httptrace.WithClientTrace(context.Background(), trace),
		"GET", url, nil,
	)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		fmt.Printf("Error: %v\n", err)
		return
	}
	defer resp.Body.Close()
	fmt.Printf("URL:     %s\n", url)
	fmt.Printf("DNS:     %v\n", dnsDone.Sub(dnsStart))
	fmt.Printf("Connect: %v\n", connectDone.Sub(dnsDone))
	// TTFB here includes the TLS handshake plus request and server time
	fmt.Printf("TTFB:    %v\n", ttfb.Sub(connectDone))
}

func main() {
	measureHTTPWithDNS("https://api.stripe.com/v1/balance")
}
```
Pre-resolving at Startup
If your application connects to a fixed set of services, resolve them all at startup before serving traffic. This warms the OS resolver cache and (if using a TTL-aware in-process cache) populates your cache before the first real request arrives.
```python
import asyncio
import dns.asyncresolver
from typing import NamedTuple

class ResolvedEndpoint(NamedTuple):
    hostname: str
    addresses: list[str]
    ttl: int

async def pre_resolve(hostnames: list[str]) -> dict[str, ResolvedEndpoint]:
    """Resolve all hostnames at startup, in parallel."""
    resolver = dns.asyncresolver.Resolver()

    async def resolve_one(hostname: str) -> ResolvedEndpoint:
        answer = await resolver.resolve(hostname, 'A')
        addresses = [rdata.address for rdata in answer]
        return ResolvedEndpoint(hostname=hostname, addresses=addresses, ttl=answer.rrset.ttl)

    tasks = [resolve_one(h) for h in hostnames]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    resolved = {}
    for hostname, result in zip(hostnames, results):
        if isinstance(result, Exception):
            print(f"Warning: failed to pre-resolve {hostname}: {result}")
        else:
            resolved[hostname] = result
            print(f"Pre-resolved {hostname} → {result.addresses} (TTL {result.ttl}s)")
    return resolved

# In your application startup:
SERVICE_HOSTS = [
    'api.stripe.com',
    'api.sendgrid.com',
    'sqs.us-east-1.amazonaws.com',
]

async def startup():
    resolved = await pre_resolve(SERVICE_HOSTS)
    # Store in your app's cache, warmed and ready
```
TTL Tuning Strategies
Different record types deserve different TTLs:
| Record Type | Recommended TTL | Reasoning |
|---|---|---|
| Root/apex A records | 300-3600s | Rarely change; longer TTL reduces DNS traffic |
| Service A records (internal) | 30-60s | Need fast failover |
| CNAME to CDN/LB | 60-300s | CDNs prefer longer; failover needs shorter |
| MX records | 3600s | Mail routing changes rarely; long TTL is fine |
| TXT (SPF, DKIM) | 3600s | Almost never changes |
| NS records | 86400s | Only change when migrating nameservers |
| SOA | 3600s | Standard |
For your own services with Route 53 health checks, the math is: TTL + health_check_interval + DNS_propagation ≈ time_to_failover. With a 30s TTL, a 10s check interval, and ~5s propagation, failover completes in under 45 seconds.
One commonly ignored optimization: pre-lowering TTLs before deployments. If your current TTL is 300s, change it to 30s at least 5 minutes before your planned DNS change. After the change is complete and validated, raise it back to 300s.
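If your zone lives in Route 53, the pre-lower/raise dance can be scripted with boto3's `change_resource_record_sets`, which performs an atomic UPSERT. A minimal sketch; the zone ID, record name, and IP below are placeholders:

```python
def ttl_change_batch(name: str, rtype: str, ttl: int, values: list[str]) -> dict:
    """Build a Route 53 ChangeBatch that UPSERTs a record with a new TTL."""
    return {
        'Comment': f'TTL change to {ttl}s ahead of planned DNS change',
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': name,
                'Type': rtype,
                'TTL': ttl,
                'ResourceRecords': [{'Value': v} for v in values],
            },
        }],
    }

def apply_ttl_change(zone_id: str, batch: dict) -> None:
    # Requires AWS credentials with route53:ChangeResourceRecordSets
    import boto3
    route53 = boto3.client('route53')
    route53.change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=batch)

# At least 5 minutes (one old TTL) before the deployment window:
batch = ttl_change_batch('api.example.com.', 'A', 30, ['203.0.113.10'])
# apply_ttl_change('Z123EXAMPLE', batch)   # hypothetical zone ID
# ...make and validate the DNS change, then UPSERT again with TTL=300
```

Run the same UPSERT with the old TTL once the change is validated; nothing else in the record set needs to change.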
In-Process DNS Caching
The OS resolver caches, but if you want control over eviction and monitoring, build a lightweight in-process cache in Go:
```go
package dnscache

import (
	"net"
	"sync"
	"time"
)

type entry struct {
	addrs     []string
	expiresAt time.Time
}

type Cache struct {
	mu      sync.RWMutex
	entries map[string]*entry
	refresh time.Duration
}

func New(refreshBeforeExpiry time.Duration) *Cache {
	c := &Cache{
		entries: make(map[string]*entry),
		refresh: refreshBeforeExpiry,
	}
	go c.evictLoop()
	return c
}

func (c *Cache) Resolve(hostname string) ([]string, error) {
	c.mu.RLock()
	e, ok := c.entries[hostname]
	c.mu.RUnlock()
	if ok && time.Now().Before(e.expiresAt.Add(-c.refresh)) {
		return e.addrs, nil // Cache hit with margin
	}
	// Cache miss or near-expiry: refresh
	addrs, err := net.LookupHost(hostname)
	if err != nil {
		if ok {
			return e.addrs, nil // Return stale rather than fail
		}
		return nil, err
	}
	// Store with a fixed 60-second TTL; net.LookupHost doesn't expose the
	// record's real TTL, so use a DNS library such as github.com/miekg/dns
	// if you need to honor it
	c.mu.Lock()
	c.entries[hostname] = &entry{
		addrs:     addrs,
		expiresAt: time.Now().Add(60 * time.Second),
	}
	c.mu.Unlock()
	return addrs, nil
}

func (c *Cache) evictLoop() {
	ticker := time.NewTicker(30 * time.Second)
	for range ticker.C {
		now := time.Now()
		c.mu.Lock()
		for k, e := range c.entries {
			if now.After(e.expiresAt) {
				delete(c.entries, k)
			}
		}
		c.mu.Unlock()
	}
}
```
EDNS Client Subnet and CDN Performance
When your users query DNS through a public resolver (8.8.8.8, 1.1.1.1), the authoritative server sees the resolver's IP, not the user's IP. For geo-routing, this means a user in Paris using Google DNS might get routed to a US CDN PoP.
EDNS Client Subnet (ECS, RFC 7871) fixes this: the resolver includes a truncated version of the client IP in the DNS query, so the authoritative server can make accurate geo-routing decisions.
```bash
# Check if a resolver passes ECS
dig +subnet=91.200.100.0/24 @8.8.8.8 cdn.example.com A
# If the answer changes based on the subnet, ECS is working

# Check what IP the authoritative server sees when using 8.8.8.8
dig @8.8.8.8 o-o.myaddr.l.google.com TXT
# Returns the client subnet Google received in the query
```
Cloudflare's 1.1.1.1 does NOT send ECS by default (privacy policy). Google's 8.8.8.8 does. AWS Route 53 Resolver does. If CDN performance matters for your users who use privacy-focused resolvers, you need a different approach (GeoDNS providers that use Anycast to approximate location from resolver IP).
Browser DNS Hints: dns-prefetch and preconnect
If your application serves web pages that load assets from third-party origins, you can warm the browser's DNS cache before those assets are needed. Two HTML hints do this.
`dns-prefetch` tells the browser to resolve a hostname in the background while the page loads:

```html
<link rel="dns-prefetch" href="//fonts.googleapis.com">
<link rel="dns-prefetch" href="//cdn.stripe.com">
<link rel="dns-prefetch" href="//api.analytics-provider.com">
```
The browser issues a DNS lookup as soon as it parses the tag, before the request that actually needs the hostname. When the real request fires, the DNS result is already cached. Savings: typically 20-120ms per uncached hostname.
`preconnect` goes further: it resolves the DNS, opens the TCP connection, and completes TLS negotiation, all in the background:

```html
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://api.stripe.com" crossorigin>
```

Savings from preconnect include DNS lookup time + TCP round trip + TLS handshake, often 100-400ms on a cold connection to a new origin.
The tradeoff: `preconnect` holds an open connection. Browsers limit the number of simultaneous connections and will drop preconnected connections that sit idle. Use `preconnect` only for origins you're confident will be used within a few seconds. Use `dns-prefetch` for origins that might be used.

```html
<!-- Good: preconnect to your primary API and CDN -->
<link rel="preconnect" href="https://api.example.com">
<link rel="preconnect" href="https://cdn.example.com" crossorigin>

<!-- Good: dns-prefetch for maybe-used third-party scripts -->
<link rel="dns-prefetch" href="//www.google-analytics.com">
<link rel="dns-prefetch" href="//connect.facebook.net">
```

The `crossorigin` attribute is needed on `preconnect` when the resource will be fetched with CORS (fonts, API calls from scripts). Without it, the browser opens a separate connection for the CORS request anyway, wasting the preconnect.
Measuring the impact: In Chrome DevTools, the Network tab shows DNS lookup time as the first segment of the connection timing bar. With dns-prefetch active for an origin, that segment should drop to near-zero. Use Lighthouse or WebPageTest to confirm you're not losing time to DNS on your critical rendering path.
Benchmarking with dnsperf and flamethrower
```bash
# Install dnsperf (maintained by DNS-OARC, originally from Nominum)
apt-get install dnsperf   # or build from source

# Create a query input file
cat > /tmp/queries.txt << 'EOF'
api.example.com A
db.internal.example.com A
redis.internal.example.com A
EOF

# Benchmark your DNS resolver
dnsperf -s 8.8.8.8 -d /tmp/queries.txt -l 30
# Runs for 30 seconds, measures QPS and latency
#
# Sample output:
#   Queries sent:      150000
#   Queries completed: 149987
#   Average latency:   2.1ms
#   Maximum latency:   47ms
#   QPS:               4999

# flamethrower: more modern, with detailed latency histograms
# https://github.com/DNS-OARC/flamethrower (the binary is named "flame")
flame -q 100 -d 100 -c 1 -l 30 -r api.example.com 8.8.8.8
# -q 100 -d 100 -c 1 ≈ 1000 QPS
#   (100 queries per batch, one batch every 100ms, one concurrent sender)
# -l 30 = stop after 30 seconds
# -r api.example.com = record to query
```
Real numbers from a Route 53 resolver in us-east-1 under load:
| QPS | p50 | p95 | p99 | p999 |
|---|---|---|---|---|
| 500 | 0.8ms | 2.1ms | 5.4ms | 31ms |
| 5000 | 1.1ms | 4.2ms | 18ms | 89ms |
At 5000 QPS, the p99 jumps to 18ms and the p999 hits nearly 100ms. If your application makes DNS queries under load (because it's not caching or the TTLs are very short), those tail latencies compound across every service call.
The Five-Minute DNS Performance Checklist
- Are you resolving on every request? Check your HTTP client's DNS caching and connection reuse. Most clients pool connections (in Go, `http.Transport`'s `IdleConnTimeout` controls how long idle connections stay open), but not all.
- What's your ndots setting? In Kubernetes, set `ndots:2` unless you have a reason for the default 5.
- Are your TTLs appropriate? Internal services: 30-60s. External dependencies you don't control: use whatever the provider publishes.
- Do you pre-resolve at startup? If your app is latency-sensitive, warm your DNS cache before accepting traffic.
- Have you measured? If you don't have DNS latency in your traces, add it: `httptrace` in Go, `dnspython` timer wrappers in Python.
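The ndots item maps to a pod-level `dnsConfig` in Kubernetes; a sketch (pod and image names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: api-server              # hypothetical pod name
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "2"              # resolve qualified names directly; default is 5
  containers:
    - name: app
      image: example/app:1.0    # hypothetical image
```

With `ndots:2`, a name like `api.stripe.com` (two dots or more) is tried as an absolute name first instead of walking the cluster search domains, cutting several wasted lookups per resolution.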
Key Takeaways
- `dns-prefetch` and `preconnect` are low-effort browser wins for pages loading third-party origins; preconnect saves DNS + TCP + TLS, dns-prefetch saves DNS only
- DNS latency is invisible until you measure it; use `httptrace` in Go or timed `getaddrinfo()` calls in Python to see actual numbers
- Pre-resolve all known external dependencies at application startup to warm the cache before traffic arrives
- TTL tuning: shorter for failover-sensitive records (30-60s), longer for stable records (300-3600s); lower TTLs before planned changes
- In-process caching with TTL-based eviction reduces DNS queries by orders of magnitude for stable, frequently accessed names
- ECS matters for geo-routing accuracy; 1.1.1.1 doesn't send ECS, 8.8.8.8 does
- Benchmark with dnsperf or flamethrower before claiming your DNS infrastructure handles load
Further Reading
- MDN: dns-prefetch
- MDN: preconnect
- RFC 7871 — EDNS Client Subnet
- Go net/http/httptrace package
- dnsperf tool
- flamethrower DNS load tester
- Cloudflare blog: The road to QUIC (includes DNS latency measurements)
Up Next
Lesson 08 is the hands-on capstone: building a working DNS-aware service discovery system and a DNS TTL-based failover detector, from scratch, in Python and Go. Full code, ready to run.