Module 3 · Lesson 7

Performance Optimization Techniques for DNS Queries

45 min

Measuring the DNS latency hidden in your application, TTL tuning strategies, in-process caching, dns-prefetch and preconnect for browsers, EDNS Client Subnet, and benchmarking with dnsperf and flamethrower. With real numbers.

dns · performance · latency · caching · edns · benchmarking · ttl


DNS adds latency to every connection your application makes. Most of the time that latency is 1-2ms (cached) or 20-100ms (uncached). Sometimes it's 500ms or more: a cold recursive lookup walking the DNS hierarchy, or a resolver under load.

The problem is you usually don't see it. DNS latency hides inside your HTTP client's "connect time." It shows up as P99 spikes that don't correlate with server load. It causes startup delays when your application launches and makes 20 connections to 20 different services at once.

This lesson is about making it visible, then making it smaller.

Measuring DNS Latency in Your Application

Start by measuring before optimizing. In Python:

import time
import socket
import statistics
from concurrent.futures import ThreadPoolExecutor

def measure_dns_latency(hostname: str, samples: int = 20) -> dict:
    """Measure DNS resolution time for a hostname."""
    latencies = []

    for _ in range(samples):
        # Note: the OS resolver cache is NOT flushed between iterations,
        # so samples after the first mostly measure cached latency.
        start = time.perf_counter()
        try:
            socket.getaddrinfo(hostname, None)
            elapsed = (time.perf_counter() - start) * 1000  # ms
            latencies.append(elapsed)
        except socket.gaierror:
            pass

    if not latencies:
        return {}

    return {
        'hostname': hostname,
        'samples': len(latencies),
        'p50': statistics.median(latencies),
        'p95': sorted(latencies)[int(len(latencies) * 0.95)],
        'p99': sorted(latencies)[int(len(latencies) * 0.99)] if len(latencies) >= 100 else max(latencies),
        'min': min(latencies),
        'max': max(latencies),
        'mean': statistics.mean(latencies),
    }

# Check multiple services at once
hostnames = [
    'api.stripe.com',
    'api.sendgrid.com',
    'sqs.us-east-1.amazonaws.com',
    's3.amazonaws.com',
]

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(measure_dns_latency, hostnames))

for r in results:
    if not r:
        continue  # all samples failed for this hostname
    print(f"{r['hostname']:40s}  p50={r['p50']:.1f}ms  p95={r['p95']:.1f}ms  max={r['max']:.1f}ms")

Sample output from a production EC2 instance in us-east-1:

api.stripe.com                            p50=1.2ms   p95=3.1ms   max=47ms
api.sendgrid.com                          p50=1.1ms   p95=2.8ms   max=31ms
sqs.us-east-1.amazonaws.com               p50=0.8ms   p95=1.4ms   max=8ms
s3.amazonaws.com                          p50=0.9ms   p95=1.5ms   max=12ms

Cached lookups run in under 2ms. But watch that max column: 47ms for a single DNS lookup is a real problem if you have a 100ms SLA on your API endpoints.

In Go, you can hook into the HTTP client's trace to see DNS time directly:

package main

import (
    "context"
    "crypto/tls"
    "fmt"
    "net/http"
    "net/http/httptrace"
    "time"
)

func measureHTTPWithDNS(url string) {
    var (
        dnsStart    time.Time
        dnsDone     time.Time
        connectDone time.Time
        ttfb        time.Time
    )

    trace := &httptrace.ClientTrace{
        DNSStart: func(info httptrace.DNSStartInfo) {
            dnsStart = time.Now()
        },
        DNSDone: func(info httptrace.DNSDoneInfo) {
            dnsDone = time.Now()
        },
        ConnectDone: func(network, addr string, err error) {
            connectDone = time.Now()
        },
        GotFirstResponseByte: func() {
            ttfb = time.Now()
        },
    }

    req, err := http.NewRequestWithContext(
        httptrace.WithClientTrace(context.Background(), trace),
        "GET", url, nil,
    )
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }

    client := &http.Client{
        Transport: &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: false},
        },
    }

    resp, err := client.Do(req)
    if err != nil {
        fmt.Printf("Error: %v\n", err)
        return
    }
    defer resp.Body.Close()

    fmt.Printf("URL: %s\n", url)
    fmt.Printf("DNS:     %v\n", dnsDone.Sub(dnsStart))
    fmt.Printf("Connect: %v\n", connectDone.Sub(dnsDone))
    fmt.Printf("TTFB:    %v\n", ttfb.Sub(connectDone))
}

func main() {
    measureHTTPWithDNS("https://api.stripe.com/v1/balance")
}

Pre-resolving at Startup

If your application connects to a fixed set of services, resolve them all at startup before serving traffic. This warms the OS resolver cache and (if using a TTL-aware in-process cache) populates your cache before the first real request arrives.

import asyncio
import dns.asyncresolver
from typing import NamedTuple

class ResolvedEndpoint(NamedTuple):
    hostname: str
    addresses: list[str]
    ttl: int

async def pre_resolve(hostnames: list[str]) -> dict[str, ResolvedEndpoint]:
    """Resolve all hostnames at startup, in parallel."""
    resolver = dns.asyncresolver.Resolver()
    
    async def resolve_one(hostname: str) -> ResolvedEndpoint:
        answer = await resolver.resolve(hostname, 'A')
        addresses = [rdata.address for rdata in answer]
        return ResolvedEndpoint(hostname=hostname, addresses=addresses, ttl=answer.rrset.ttl)
    
    tasks = [resolve_one(h) for h in hostnames]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    
    resolved = {}
    for hostname, result in zip(hostnames, results):
        if isinstance(result, Exception):
            print(f"Warning: failed to pre-resolve {hostname}: {result}")
        else:
            resolved[hostname] = result
            print(f"Pre-resolved {hostname} → {result.addresses} (TTL {result.ttl}s)")
    
    return resolved

# In your application startup:
SERVICE_HOSTS = [
    'api.stripe.com',
    'api.sendgrid.com',
    'sqs.us-east-1.amazonaws.com',
]

async def startup():
    resolved = await pre_resolve(SERVICE_HOSTS)
    # Store in your app's cache, warmed and ready

TTL Tuning Strategies

Different record types deserve different TTLs:

Record Type                    Recommended TTL   Reasoning
Root/apex A records            300-3600s         Rarely change; longer TTL reduces DNS traffic
Service A records (internal)   30-60s            Need fast failover
CNAME to CDN/LB                60-300s           CDNs prefer longer; failover needs shorter
MX records                     3600s             Mail routing changes rarely; long TTL is fine
TXT (SPF, DKIM)                3600s             Almost never changes
NS records                     86400s            Only change when migrating nameservers
SOA                            3600s             Standard

For your own services with Route 53 health checks, the arithmetic is: TTL + health_check_interval + DNS_propagation ≈ time_to_failover. With a 30s TTL, a 10s check interval, and 5s of propagation, failover completes in roughly 45 seconds, worst case.

One commonly ignored practice: pre-lowering TTLs before deployments. If your current TTL is 300s, lower it to 30s at least one full old-TTL period (here, 5 minutes) before the planned DNS change, so every cached copy of the 300s record has expired by the time you make it. After the change is complete and validated, raise the TTL back to 300s.
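The two timing rules (wait at least the old TTL before the change; wait at least the new TTL after it before raising) can be made explicit with a small scheduling helper. This is illustrative only; the function name and return shape are made up for the example:

```python
def ttl_lowering_schedule(old_ttl: int, new_ttl: int, change_time: int) -> dict:
    """Given a planned DNS change at `change_time` (unix seconds),
    compute when to lower the TTL and when it is safe to raise it back.

    - Lower the TTL at least `old_ttl` seconds before the change, so
      every cached copy of the old record has expired by change time.
    - Raise it back only after the change has propagated and been
      validated: wait at least one `new_ttl` after the change.
    """
    return {
        'lower_ttl_by': change_time - old_ttl,
        'make_change_at': change_time,
        'raise_ttl_after': change_time + new_ttl,
    }

sched = ttl_lowering_schedule(old_ttl=300, new_ttl=30, change_time=1_700_000_000)
print(sched['make_change_at'] - sched['lower_ttl_by'])  # 300 — lower 5 minutes ahead
```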

In-Process DNS Caching

The OS resolver caches, but if you want control over eviction and monitoring, build a lightweight in-process cache in Go:

package dnscache

import (
    "net"
    "sync"
    "time"
)

type entry struct {
    addrs     []string
    expiresAt time.Time
}

type Cache struct {
    mu      sync.RWMutex
    entries map[string]*entry
    refresh time.Duration
}

func New(refreshBeforeExpiry time.Duration) *Cache {
    c := &Cache{
        entries: make(map[string]*entry),
        refresh: refreshBeforeExpiry,
    }
    go c.evictLoop()
    return c
}

func (c *Cache) Resolve(hostname string) ([]string, error) {
    c.mu.RLock()
    e, ok := c.entries[hostname]
    c.mu.RUnlock()

    if ok && time.Now().Before(e.expiresAt.Add(-c.refresh)) {
        return e.addrs, nil  // Cache hit with margin
    }

    // Cache miss or near-expiry — refresh
    addrs, err := net.LookupHost(hostname)
    if err != nil {
        if ok {
            return e.addrs, nil  // Return stale rather than fail
        }
        return nil, err
    }

    // Store with a fixed 60-second TTL (a DNS library such as
    // miekg/dns can expose the record's real TTL instead)
    c.mu.Lock()
    c.entries[hostname] = &entry{
        addrs:     addrs,
        expiresAt: time.Now().Add(60 * time.Second),
    }
    c.mu.Unlock()

    return addrs, nil
}

func (c *Cache) evictLoop() {
    ticker := time.NewTicker(30 * time.Second)
    for range ticker.C {
        now := time.Now()
        c.mu.Lock()
        for k, e := range c.entries {
            if now.After(e.expiresAt) {
                delete(c.entries, k)
            }
        }
        c.mu.Unlock()
    }
}

EDNS Client Subnet and CDN Performance

When your users query DNS through a public resolver (8.8.8.8, 1.1.1.1), the authoritative server sees the resolver's IP, not the user's IP. For geo-routing, this means a user in Paris using Google DNS might get routed to a US CDN PoP.

EDNS Client Subnet (ECS, RFC 7871) fixes this: the resolver includes a truncated version of the client IP in the DNS query, so the authoritative server can make accurate geo-routing decisions.

# Check if a resolver passes ECS
dig +subnet=91.200.100.0/24 @8.8.8.8 cdn.example.com A
# If the answer changes based on the subnet, ECS is working

# Check what IP the authoritative server sees when using 8.8.8.8
dig @8.8.8.8 o-o.myaddr.l.google.com TXT
# Returns the client subnet Google received in the query

Cloudflare's 1.1.1.1 does NOT send ECS (a deliberate privacy decision). Google's 8.8.8.8 does. AWS Route 53 Resolver does. If CDN performance matters for users on privacy-focused resolvers, you need a different approach: GeoDNS providers that rely on anycast and the resolver's own IP to approximate client location.

Browser DNS Hints: dns-prefetch and preconnect

If your application serves web pages that load assets from third-party origins, you can warm the browser's DNS cache before those assets are needed. Two HTML hints do this.

dns-prefetch tells the browser to resolve a hostname in the background while the page loads:

<link rel="dns-prefetch" href="//fonts.googleapis.com">
<link rel="dns-prefetch" href="//cdn.stripe.com">
<link rel="dns-prefetch" href="//api.analytics-provider.com">

The browser issues a DNS lookup as soon as it parses the tag, before the request that actually needs the hostname. When the real request fires, the DNS result is already cached. Savings: typically 20-120ms per uncached hostname.

preconnect goes further: it resolves the DNS, opens the TCP connection, and completes TLS negotiation, all in the background:

<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://api.stripe.com" crossorigin>

Savings from preconnect include DNS lookup time + TCP round trip + TLS handshake, often 100-400ms on a cold connection to a new origin.

The tradeoff: preconnect holds an open connection. Browsers limit the number of simultaneous connections and will drop preconnected connections that sit idle. Use preconnect only for origins you're confident will be used within a few seconds. Use dns-prefetch for origins that might be used.

<!-- Good: preconnect to your primary API and CDN -->
<link rel="preconnect" href="https://api.example.com">
<link rel="preconnect" href="https://cdn.example.com" crossorigin>

<!-- Good: dns-prefetch for maybe-used third-party scripts -->
<link rel="dns-prefetch" href="//www.google-analytics.com">
<link rel="dns-prefetch" href="//connect.facebook.net">

The crossorigin attribute is needed for preconnect when the resource will use CORS (fonts, API calls from scripts). Without it, the browser opens a separate connection for CORS requests anyway, wasting the preconnect.

Measuring the impact: In Chrome DevTools, the Network tab shows DNS lookup time as the first segment of the connection timing bar. With dns-prefetch active for an origin, that segment should drop to near-zero. Use Lighthouse or WebPageTest to confirm you're not losing time to DNS on your critical rendering path.

Benchmarking with dnsperf and flamethrower

# Install dnsperf (originally from Nominum, now maintained by DNS-OARC)
apt-get install dnsperf  # or build from source

# Create a query input file
cat > /tmp/queries.txt << 'EOF'
api.example.com A
db.internal.example.com A
redis.internal.example.com A
EOF

# Benchmark your DNS resolver
dnsperf -s 8.8.8.8 -d /tmp/queries.txt -l 30
# Runs for 30 seconds, measures QPS and latency

# Sample output:
# Queries sent:         150000
# Queries completed:    149987
# Average latency:      2.1ms
# Maximum latency:      47ms
# QPS:                  4999
# flamethrower — more modern, with detailed latency histograms
# https://github.com/DNS-OARC/flamethrower
# (the installed binary is named flame)

flame -Q 1000 -l 30 -r api.example.com 8.8.8.8
# -Q 1000 = rate-limit to 1000 QPS
# -l 30   = run for 30 seconds
# -r      = base record name to query

Real numbers from a Route 53 resolver in us-east-1 under load:

QPS: 500
p50: 0.8ms
p95: 2.1ms
p99: 5.4ms
p999: 31ms

QPS: 5000
p50: 1.1ms
p95: 4.2ms
p99: 18ms
p999: 89ms

At 5000 QPS, the p99 jumps to 18ms and the p999 hits nearly 100ms. If your application makes DNS queries under load (because it's not caching or the TTLs are very short), those tail latencies compound across every service call.
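That compounding is easy to put a number on: if a request fans out into n independent DNS lookups, the probability that at least one of them lands in the tail above a given quantile is 1 - quantile^n:

```python
def tail_hit_probability(n_lookups: int, quantile: float = 0.99) -> float:
    """Probability that at least one of n independent lookups
    exceeds the given latency quantile (e.g. the p99)."""
    return 1 - quantile ** n_lookups

# A request touching 20 hostnames has an ~18% chance that at least
# one resolution lands above the resolver's p99 latency.
print(round(tail_hit_probability(20), 3))  # 0.182
```

This is why per-lookup tail latency matters far more than the median once fan-out is involved.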

The Five-Minute DNS Performance Checklist

  1. Are you resolving on every request? Check your HTTP client's connection reuse: pooled keep-alive connections skip DNS entirely (in Go, the Transport's IdleConnTimeout controls how long they linger), but not every client pools by default.
  2. What's your ndots setting? In Kubernetes, set ndots:2 unless you have a reason for the default 5.
  3. Are your TTLs appropriate? Internal services: 30-60s. External dependencies you don't control: use whatever the provider publishes.
  4. Do you pre-resolve at startup? If your app is latency-sensitive, warm your DNS cache before accepting traffic.
  5. Have you measured? If you don't have DNS latency in your traces, add it. httptrace in Go, dnspython timer wrappers in Python.

Key Takeaways

  • dns-prefetch and preconnect are low-effort browser wins for pages loading third-party origins; preconnect saves DNS + TCP + TLS, dns-prefetch saves DNS only
  • DNS latency is invisible until you measure it; use httptrace in Go or timed getaddrinfo() calls in Python to see actual numbers
  • Pre-resolve all known external dependencies at application startup to warm the cache before traffic arrives
  • TTL tuning: shorter for failover-sensitive records (30-60s), longer for stable records (300-3600s); lower TTLs before planned changes
  • In-process caching with TTL-based eviction reduces DNS queries by orders of magnitude for stable, frequently-accessed names
  • ECS matters for geo-routing accuracy; 1.1.1.1 doesn't send ECS, 8.8.8.8 does
  • Benchmark with dnsperf or flamethrower before claiming your DNS infrastructure handles load

Up Next

Lesson 08 is the hands-on capstone: building a working DNS-aware service discovery system and a DNS TTL-based failover detector, from scratch, in Python and Go. Full code, ready to run.