Module 3 · Lesson 6

DNS and Service Discovery in Cloud Environments

45 min

Route 53 Private Hosted Zones, GCP Cloud DNS, Azure Private DNS, Consul's DNS interface, and the split-horizon pattern that lets your infrastructure speak two different truths.

dns · aws · route53 · gcp · azure · consul · service-discovery · split-horizon

DNS and Service Discovery in Cloud Environments

Your application running in AWS knows its database as postgres.internal:5432. Your developers know it as postgres.dev.company.com. Your monitoring knows it as 10.0.1.45. Same server, three different identities depending on who's asking.

That's split-horizon DNS — one of the most useful patterns in cloud infrastructure — and it's the thread running through this entire lesson.

Split-Horizon DNS

Split-horizon (or split-brain) DNS serves different answers for the same name depending on the source of the query. Internal clients get private IPs; external clients get public IPs or see nothing at all.

# From inside AWS VPC:
$ nslookup api.example.com
Server: 169.254.169.253  (Route 53 Resolver)
Name: api.example.com
Address: 10.0.1.100  (private IP)

# From the public internet:
$ nslookup api.example.com
Server: 8.8.8.8
Name: api.example.com
Address: 203.0.113.50  (public IP / load balancer)

The same hostname routes to completely different infrastructure depending on the network context. This is how you keep private services unreachable from the internet while still using consistent names in your code.

AWS Route 53 Private Hosted Zones

Private Hosted Zones (PHZs) are DNS zones that only respond to queries from associated VPCs.

# Create a private hosted zone
aws route53 create-hosted-zone \
  --name internal.example.com \
  --caller-reference $(date +%s) \
  --hosted-zone-config Comment="Internal services",PrivateZone=true \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0abc123def456

# The create call returns a zone ID like /hostedzone/Z1234567890ABC
ZONE_ID="Z1234567890ABC"

# Add an A record pointing to a private IP
aws route53 change-resource-record-sets \
  --hosted-zone-id $ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "postgres.internal.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "10.0.1.100"}]
      }
    }]
  }'

# Associate additional VPCs with this zone
aws route53 associate-vpc-with-hosted-zone \
  --hosted-zone-id $ZONE_ID \
  --vpc VPCRegion=us-east-1,VPCId=vpc-0xyz789

# List all private zones
aws route53 list-hosted-zones --query 'HostedZones[?Config.PrivateZone==`true`]'

Now any EC2 instance, Lambda function, or ECS task in the associated VPCs can resolve postgres.internal.example.com to 10.0.1.100. Nothing outside those VPCs sees this record.
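From application code, the private name behaves like any other hostname. A minimal stdlib sketch of how resolution looks from the client's side (the record created above only exists inside the associated VPCs; `resolve_service` is an illustrative helper, not an AWS API):

```python
import socket

def resolve_service(name: str, port: int) -> list[str]:
    """Return the IPs a hostname resolves to, or [] if the name doesn't exist."""
    try:
        infos = socket.getaddrinfo(name, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        # NXDOMAIN / resolution failure — the private zone is invisible here
        return []
    return sorted({info[4][0] for info in infos})

# Inside an associated VPC this returns the private IP (10.0.1.100 above);
# from anywhere else, the lookup simply fails and you get []
print(resolve_service("postgres.internal.example.com", 5432))
```

The failure mode is the point: outside the VPC there is no "access denied", the record just doesn't exist.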

Route 53 Resolver Rules extend this further for hybrid networks:

# Forward specific zones to on-premises DNS (via Direct Connect or VPN)
aws route53resolver create-resolver-rule \
  --rule-type FORWARD \
  --domain-name corp.example.com \
  --resolver-endpoint-id rslvr-out-1234567890 \
  --target-ips '[{"Ip":"10.100.0.10","Port":53},{"Ip":"10.100.0.11","Port":53}]' \
  --name "Forward corp.example.com to on-prem"

This makes corp.example.com queries from within your VPC forward to your on-premises DNS servers, enabling seamless hybrid cloud service discovery.
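One step the create call doesn't cover: a resolver rule has no effect until it is associated with a VPC. A sketch of the association (the rule ID is a placeholder for the ID returned by the create call above):

```shell
# Associate the forwarding rule with each VPC that should use it
aws route53resolver associate-resolver-rule \
  --resolver-rule-id rslvr-rr-0123456789abcdef \
  --vpc-id vpc-0abc123def456 \
  --name "corp-forwarding-for-main-vpc"
```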

The AWS Architecture: Kubernetes + Route 53

A common pattern for EKS clusters:

Internet
    |
Route 53 Public Zone (api.example.com → ALB public IP)
    |
Application Load Balancer
    |
EKS Cluster (VPC)
    |
CoreDNS (cluster.local → ClusterIP)
    |
Route 53 Private Zone (*.internal.example.com → VPC IPs)
    |
RDS / ElastiCache / other VPC services

Pods in EKS resolve in this order:

  1. cluster.local names → CoreDNS handles internally
  2. External names → CoreDNS forwards to Route 53 Resolver (169.254.169.253)
  3. Route 53 checks private zones first → then public zones
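A quick way to verify the chain end-to-end is to run lookups from a throwaway pod (assumes the postgres record from earlier exists in the private zone):

```shell
# Private-zone name: CoreDNS forwards to Route 53 Resolver, private zone wins
kubectl run dnscheck --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup postgres.internal.example.com

# Cluster-local name: answered directly by CoreDNS
kubectl run dnscheck2 --rm -it --image=busybox:1.36 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local
```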

To configure CoreDNS to forward a specific internal zone to Route 53 (the forward plugin may appear only once per server block, so the zone gets its own dedicated block):

# CoreDNS ConfigMap update
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    # Dedicated server block: forward internal.example.com to Route 53 Resolver
    internal.example.com:53 {
        errors
        cache 30
        forward . 169.254.169.253
    }
    .:53 {
        errors
        health
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        # Everything else also goes to the Route 53 Resolver
        forward . 169.254.169.253
        cache 30
        loop
        reload
        loadbalance
    }

GCP Cloud DNS

GCP Cloud DNS supports private zones tied to VPC networks, similar to Route 53 PHZs.

# Create a private zone
gcloud dns managed-zones create internal-zone \
  --dns-name="internal.example.com." \
  --description="Internal services" \
  --visibility=private \
  --networks=default,production-vpc

# Add a record
gcloud dns record-sets create postgres.internal.example.com. \
  --zone=internal-zone \
  --type=A \
  --ttl=60 \
  --rrdatas=10.0.1.100

# DNS peering: resolve this namespace through another project's VPC network
gcloud dns managed-zones create peered-zone \
  --dns-name="shared.internal.example.com." \
  --description="Peered to shared-project" \
  --visibility=private \
  --networks=my-network \
  --target-project=shared-project \
  --target-network=producer-network

GCP Cloud DNS also supports DNS Policy for hybrid setups:

# Enable inbound DNS forwarding (allow on-prem to query GCP's resolver)
gcloud dns policies create inbound-policy \
  --enable-inbound-forwarding \
  --networks=production-vpc \
  --description="Allow inbound DNS from on-prem"

# Inbound forwarding reserves entry-point IPs in your VPC subnets;
# list them to point your on-prem resolvers at GCP
gcloud compute addresses list --filter="purpose=DNS_RESOLVER"

Consul DNS Interface

HashiCorp Consul provides service discovery with a built-in DNS interface. Services register themselves (or are registered via health checks), and Consul answers DNS queries for them.

# Consul DNS naming convention
<service>.service.<datacenter>.consul
web.service.dc1.consul → IPs of healthy "web" service instances

# With tags
<tag>.<service>.service.<datacenter>.consul
v2.web.service.dc1.consul → only "web" instances tagged "v2"

# SRV records (port included)
dig @127.0.0.1 -p 8600 web.service.dc1.consul SRV

Integrating Consul DNS into a Kubernetes cluster:

# Forward .consul domains to Consul DNS
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    # Dedicated server block: forward .consul to a Consul agent's DNS port
    # (the forward plugin may appear only once per server block)
    consul:53 {
        errors
        cache 30
        forward . 10.0.1.200:8600
    }
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        forward . 169.254.169.253
        cache 30
        reload
    }

Now pods in Kubernetes can discover Consul-registered services via DNS:

import dns.resolver

def discover_consul_service(service_name: str, datacenter: str = "dc1") -> list[tuple[str, int]]:
    """Resolve a Consul service to host:port pairs via DNS."""
    resolver = dns.resolver.Resolver()
    query = f"{service_name}.service.{datacenter}.consul"

    answer = resolver.resolve(query, 'SRV')
    endpoints = []
    for rdata in answer:
        host = str(rdata.target).rstrip('.')
        # Resolve the host to an IP
        a_answer = resolver.resolve(host, 'A')
        for a_rdata in a_answer:
            endpoints.append((a_rdata.address, rdata.port))
    return endpoints

# Only healthy instances are returned — Consul filters unhealthy ones
endpoints = discover_consul_service("payment-api")

Service Discovery Without a Service Mesh

Here's a pattern for service discovery in cloud environments using only DNS and Route 53, no service mesh required:

Architecture:
  Services register themselves by writing DNS records via Route 53 API
  Services discover each other via DNS SRV queries
  Health state drives TTL (healthy: 30s, unhealthy: 0 → record removed)

Code flow:
  1. Service starts → registers _service._tcp.internal. SRV record
  2. Other services discover via SRV lookup
  3. Health check fails → Lambda removes the SRV record
  4. Clients re-query within 30s and stop routing to the dead instance

import boto3
import socket

def register_service(
    service_name: str,
    port: int,
    hosted_zone_id: str,
    priority: int = 10,
    weight: int = 50,
    ttl: int = 30,
):
    """Register this instance as a service endpoint in Route 53."""
    hostname = socket.getfqdn()
    client = boto3.client('route53')

    srv_name = f"_{service_name}._tcp.internal.example.com"
    srv_value = f"{priority} {weight} {port} {hostname}."

    client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={
            'Changes': [{
                'Action': 'UPSERT',
                'ResourceRecordSet': {
                    'Name': srv_name,
                    'Type': 'SRV',
                    'TTL': ttl,
                    'ResourceRecords': [{'Value': srv_value}],
                }
            }]
        }
    )
    print(f"Registered: {srv_value} → {srv_name}")

def deregister_service(
    service_name: str,
    port: int,
    hosted_zone_id: str,
    priority: int = 10,
    weight: int = 50,
    ttl: int = 30,
):
    """Remove this instance's SRV record on shutdown."""
    hostname = socket.getfqdn()
    client = boto3.client('route53')

    srv_name = f"_{service_name}._tcp.internal.example.com"
    srv_value = f"{priority} {weight} {port} {hostname}."

    # DELETE must match the existing record set exactly (type, TTL, values).
    # Note: this removes the whole record set; with multiple instances per
    # service you would read the current values and UPSERT without this one.
    client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id,
        ChangeBatch={
            'Changes': [{
                'Action': 'DELETE',
                'ResourceRecordSet': {
                    'Name': srv_name,
                    'Type': 'SRV',
                    'TTL': ttl,
                    'ResourceRecords': [{'Value': srv_value}],
                }
            }]
        }
    )

Azure Private DNS Zones

Azure's implementation follows the same pattern:

# Create private DNS zone
az network private-dns zone create \
  --resource-group my-rg \
  --name internal.example.com

# Link to a VNet
az network private-dns link vnet create \
  --resource-group my-rg \
  --zone-name internal.example.com \
  --name my-vnet-link \
  --virtual-network my-vnet \
  --registration-enabled false  # true = auto-register VMs

# Add a record
az network private-dns record-set a add-record \
  --resource-group my-rg \
  --zone-name internal.example.com \
  --record-set-name postgres \
  --ipv4-address 10.0.1.100
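Two quick checks to close the loop (names match the commands above): list the zone's record sets, then resolve the name from a VM inside the linked VNet:

```shell
# List the zone's record sets
az network private-dns record-set list \
  --resource-group my-rg \
  --zone-name internal.example.com \
  --output table

# From any VM in the linked VNet, the private record resolves
nslookup postgres.internal.example.com
```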

Key Takeaways

  • Split-horizon DNS serves different answers to internal vs external clients — use it to keep private services private while using consistent names in your code
  • Route 53 Private Hosted Zones are the AWS native tool; associate them with VPCs and use Resolver Rules for hybrid networks
  • GCP Cloud DNS private zones and cross-project DNS peering implement the same concept as AWS PHZs
  • Consul's DNS interface serves SRV records for healthy service instances; forward .consul queries from CoreDNS to Consul agents
  • The Route 53 + SRV pattern can implement lightweight service discovery without a full service mesh
  • All three clouds (AWS, GCP, Azure) support private DNS zones with VNet/VPC association — the APIs differ but the concept is identical

Up Next

Lesson 07 gets practical: measuring DNS latency in your applications, techniques to reduce it, TTL tuning strategies, and how to benchmark your DNS infrastructure with real tools.