Module 3 · Lesson 5

DNS in Microservices and Container Environments

50 min

Docker's 127.0.0.11 resolver, Kubernetes CoreDNS, the ndots:5 latency problem, service mesh DNS, and real kubectl commands to debug DNS inside your pods.

dns · kubernetes · docker · coredns · microservices · ndots · debugging

DNS in Microservices and Container Environments

There's a DNS resolver running inside every Docker container on your machine right now. It lives at 127.0.0.11. Most developers don't know it's there.

And there's a configuration in every Kubernetes pod — options ndots:5 plus a list of cluster search domains — that can fire off four or more DNS queries for every hostname you try to resolve. Most engineers have never seen it, but it's adding latency to every service call.

This lesson is about understanding both, and what to do about them.

Docker's Embedded DNS Resolver

When you start a container on a user-defined Docker network, Docker runs an embedded DNS resolver inside the container's network namespace, listening at 127.0.0.11:53. This resolver handles:

  • Service discovery between containers: curl http://my-api-service/ resolves via the embedded DNS
  • Forwarding unknown names to the host's resolvers
# Check the DNS configuration inside a running container
docker exec my-container cat /etc/resolv.conf
# nameserver 127.0.0.11
# options ndots:0

# Test name resolution inside a container
docker exec my-container nslookup other-service
# Server:  127.0.0.11
# Address: 127.0.0.11#53
# Name: other-service
# Address: 172.18.0.3

# Test from within the default bridge network (no Docker DNS)
docker run --rm alpine nslookup other-service
# This will FAIL — default bridge network doesn't get Docker DNS

The key point: user-defined networks get Docker DNS; the default bridge network does not. If your containers can't find each other by name, they're probably on the default bridge.

# docker-compose.yml — services on user-defined networks get DNS automatically
version: '3.8'
services:
  api:
    image: my-api:latest
    networks:
      - backend

  db:
    image: postgres:15
    networks:
      - backend

networks:
  backend:
    driver: bridge
    # Docker creates a user-defined network here
    # api can reach db via hostname "db"
    # db can reach api via hostname "api"
# Debug Docker DNS resolution
docker exec api-container nslookup db
docker exec api-container dig db A

# Check what resolver the container is using
docker exec api-container cat /etc/resolv.conf

# Run a container with DNS debugging tools
docker run --rm --network my-network nicolaka/netshoot nslookup my-service

Kubernetes CoreDNS

Kubernetes uses CoreDNS as its cluster DNS server; it replaced the original kube-dns implementation as the default in v1.13, though the Service in front of it is still named kube-dns for compatibility. CoreDNS runs as a Deployment in kube-system, and every pod in the cluster is configured to use it for DNS resolution.

The internal DNS structure is:

# Service DNS names
<service-name>.<namespace>.svc.<cluster-domain>

# Examples (cluster domain defaults to cluster.local):
my-api.production.svc.cluster.local.
postgres.databases.svc.cluster.local.
redis.caching.svc.cluster.local.

# Pod DNS names (less commonly used)
<pod-ip-dashes>.<namespace>.pod.<cluster-domain>
10-244-1-5.production.pod.cluster.local.
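
The naming scheme above is mechanical enough to capture in a couple of helpers (a sketch — the function names are ours, and cluster.local is only the default cluster domain):

```python
def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """In-cluster DNS name for a Service; the trailing dot marks it as an FQDN."""
    return f"{service}.{namespace}.svc.{cluster_domain}."

def pod_fqdn(pod_ip: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Pod DNS name: the pod's IP with dots replaced by dashes."""
    return f"{pod_ip.replace('.', '-')}.{namespace}.pod.{cluster_domain}."

print(service_fqdn("my-api", "production"))  # my-api.production.svc.cluster.local.
print(pod_fqdn("10.244.1.5", "production"))  # 10-244-1-5.production.pod.cluster.local.
```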

The CoreDNS configuration lives in a ConfigMap:

kubectl get configmap coredns -n kube-system -o yaml
# Typical CoreDNS Corefile
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf {
           max_concurrent 1000
        }
        cache 30
        loop
        reload
        loadbalance
    }
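
A common customization is adding a second server block so that a private zone is forwarded to a different resolver. Sketched below with a hypothetical zone (corp.internal) and resolver IP — adapt both to your environment:

```
# Additional server block in the Corefile: forward corp.internal to a corporate resolver
corp.internal:53 {
    errors
    cache 30
    forward . 10.0.0.53
}
```

CoreDNS routes each query to the most specific matching server block, so lookups under corp.internal skip the default forward . /etc/resolv.conf path.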

The ndots:5 Problem

Every pod in Kubernetes gets a /etc/resolv.conf that looks like this:

nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

ndots:5 means: if the hostname you're querying has fewer than 5 dots, try appending the search domains first before treating it as a fully-qualified domain name.

Here's what happens when your application resolves api.stripe.com (2 dots, less than 5):

1. Query: api.stripe.com.production.svc.cluster.local  → NXDOMAIN
2. Query: api.stripe.com.svc.cluster.local              → NXDOMAIN
3. Query: api.stripe.com.cluster.local                  → NXDOMAIN
4. Query: api.stripe.com.                               → ANSWER (finally!)

Four queries instead of one. For every single external hostname you resolve that has fewer than 5 dots (which is most of them), you're generating 3 unnecessary DNS queries first.
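
The resolver's search-list logic can be modeled in a few lines (a sketch of glibc's behavior — the function name is ours, and real resolvers also issue separate A and AAAA lookups for each name):

```python
def queries_for(hostname, search_domains, ndots=5):
    """Model of glibc search-list expansion: return the query names tried, in order."""
    if hostname.endswith("."):
        return [hostname]                # explicit FQDN: no expansion at all
    if hostname.count(".") >= ndots:
        # enough dots: tried as-is first, search list only as a fallback
        return [hostname] + [f"{hostname}.{d}" for d in search_domains]
    # fewer dots than ndots: search list first, bare name last
    return [f"{hostname}.{d}" for d in search_domains] + [hostname]

search = ["production.svc.cluster.local", "svc.cluster.local", "cluster.local"]
print(len(queries_for("api.stripe.com", search)))   # 4 — three expansions, then the bare name
print(len(queries_for("api.stripe.com.", search)))  # 1 — the trailing dot bypasses expansion
```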

At scale, this is visible latency. A service performing 10,000 resolutions per second — say, 1,000 requests per second to each of 10 external hosts, with no address caching — generates 30,000 unnecessary DNS queries per second.

The fix: use trailing dots for external names. A fully-qualified domain name ending with a dot bypasses the search domain expansion entirely.

import dns.resolver

# Without trailing dot — triggers ndots search expansion
resolver = dns.resolver.Resolver()
answer = resolver.resolve('api.stripe.com', 'A')   # 4 queries in Kubernetes

# With trailing dot — goes direct, bypasses search expansion
answer = resolver.resolve('api.stripe.com.', 'A')  # 1 query

In HTTP clients you can't always add the trailing dot (it ends up in the Host header and can break virtual hosting or TLS certificate validation), but you can tune the pod's DNS configuration:

# Pod spec with reduced ndots
spec:
  dnsConfig:
    options:
      - name: ndots
        value: "1"   # Expand only hostnames with no dots (single-label names)
    # A value of "2" also expands one-dot names; anything with 2+ dots goes out as an FQDN

Or tune per-pod in your deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
spec:
  template:
    spec:
      dnsPolicy: ClusterFirst
      dnsConfig:
        options:
          - name: ndots
            value: "2"
          - name: timeout
            value: "2"
          - name: attempts
            value: "2"

ndots: 2 means hostnames with fewer than 2 dots get search expansion (so single-label names like redis become redis.production.svc.cluster.local). Hostnames with 2+ dots (like api.stripe.com) go directly as FQDNs. This preserves internal service resolution while eliminating the external hostname overhead.
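
To confirm the setting landed, dump the pod's resolv.conf (kubectl exec <pod-name> -- cat /etc/resolv.conf). If you're checking it programmatically — say, in a smoke test — a small parser for the options line is enough (a hypothetical helper, not part of any library):

```python
def parse_resolv_options(resolv_conf: str) -> dict:
    """Pull the options line out of resolv.conf text into a dict (flag -> value)."""
    opts = {}
    for line in resolv_conf.splitlines():
        if line.startswith("options"):
            for token in line.split()[1:]:
                name, sep, value = token.partition(":")
                opts[name] = int(value) if sep else True
    return opts

sample = """\
nameserver 10.96.0.10
search production.svc.cluster.local svc.cluster.local cluster.local
options ndots:2 timeout:2 attempts:2
"""
print(parse_resolv_options(sample))  # {'ndots': 2, 'timeout': 2, 'attempts': 2}
```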

Debugging DNS Inside Pods

This is the workflow you need when DNS is broken in Kubernetes:

# Step 1: Check if DNS is working at all inside the pod
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default
# Should resolve to the kubernetes service IP

# Step 2: Check what DNS config the pod has
kubectl exec -it <pod-name> -- cat /etc/resolv.conf

# Step 3: Test internal service resolution
kubectl exec -it <pod-name> -- nslookup my-service.my-namespace.svc.cluster.local

# Step 4: Test external resolution
kubectl exec -it <pod-name> -- nslookup google.com

# Step 5: Check CoreDNS is running
kubectl get pods -n kube-system -l k8s-app=kube-dns
kubectl logs -n kube-system -l k8s-app=kube-dns --tail=50

# Step 6: Check CoreDNS metrics
kubectl port-forward -n kube-system svc/kube-dns 9153:9153
curl http://localhost:9153/metrics | grep coredns_dns_request

# Step 7: Run a debugging pod with full network tools
kubectl run -it --rm debug --image=nicolaka/netshoot --restart=Never -- bash
# Inside:
dig @10.96.0.10 my-service.my-namespace.svc.cluster.local
dig +trace api.stripe.com
# Check if a specific service has DNS entries
kubectl exec -it debug-pod -- dig my-service.my-namespace.svc.cluster.local SRV

# Watch DNS queries in real time with CoreDNS log plugin enabled
# First, enable logging in the CoreDNS ConfigMap:
kubectl edit configmap coredns -n kube-system
# Add: log
# before: errors

# Then watch logs
kubectl logs -n kube-system -l k8s-app=kube-dns -f

Service Mesh DNS

When you add a service mesh (Istio, Linkerd), DNS resolution still goes through CoreDNS, but the traffic is intercepted by sidecars before it reaches the network. The distinction matters for debugging:

  • DNS lookup: CoreDNS still resolves service names to ClusterIP
  • Connection: iptables rules redirect to the Envoy/linkerd2-proxy sidecar
  • Load balancing: the sidecar handles it, ignoring the single ClusterIP
# In Istio, check if a service is visible to the mesh
istioctl proxy-config cluster <pod-name>.<namespace> | grep my-service

# Check sidecar DNS interception
istioctl analyze -n my-namespace

# DNS debugging with Istio
kubectl exec -it <pod> -c istio-proxy -- pilot-agent request GET /config_dump | \
  jq '.configs[] | select(.["@type"] | contains("Cluster")) | .dynamic_active_clusters[]
      | select(.cluster.name | contains("my-service"))'

Key Takeaways

  • Docker's embedded DNS resolver runs at 127.0.0.11 and is only available in user-defined networks, not the default bridge
  • Kubernetes CoreDNS provides DNS at <service>.<namespace>.svc.cluster.local and handles forwarding to external resolvers
  • ndots:5 (the Kubernetes default) causes up to 4 DNS queries for every external hostname with fewer than 5 dots; set ndots:2 in your pod DNS config to fix it
  • Use trailing dots in DNS queries (e.g., api.stripe.com.) to bypass search domain expansion entirely
  • kubectl exec + nslookup/dig + nicolaka/netshoot is the debugging toolkit; CoreDNS logs are the last resort
  • Service meshes intercept traffic after DNS resolution; the sidecar handles load balancing, not DNS

Up Next

Lesson 06 covers DNS in cloud environments: Route 53 private zones, GCP Cloud DNS, Consul's DNS interface, and the split-horizon pattern that lets you serve different answers to internal vs external clients.