Module 6 · Lesson 4

AI and ML in DNS Management

30 min

What machine learning actually does in DNS security today: DGA detection, tunneling detection, NXDomain analysis. Plus the false positive problem nobody talks about enough.

The security industry has been putting "AI-powered" on every product for several years now. DNS security is no exception. So let me separate what machine learning actually does well in DNS from what's mostly marketing.

There are three areas where ML has genuine, deployed utility in DNS: detecting domain generation algorithm (DGA) activity, identifying DNS tunneling, and analyzing anomalous NXDOMAIN patterns. These are real problems, the techniques work, and they're in production at scale in major products. The limitations are also real: false positive rates in production are higher than vendor benchmarks suggest, and building your own requires training data that most organizations don't have.

DGA detection

Malware needs to communicate with its operators. Hardcoding a command-and-control IP address is fragile: block the IP and the malware goes silent. Hardcoding a domain is slightly better but still easy to block once discovered.

Domain Generation Algorithms solve this problem by generating many pseudorandom domain names based on a seed (often the current date). The malware and the operator independently run the same DGA and arrive at the same set of domains. On any given day, the malware knows which domains to query. The operator registers just one or two of them. Defenders face tens of thousands of generated names to block.
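
To make this concrete, here is a minimal, purely illustrative DGA sketch (not any real malware family's algorithm): hash the date plus a counter, then map the digest onto lowercase letters. Both sides running the same code get the same list.

```python
import hashlib
from datetime import date

def generate_domains(seed_date: date, count: int = 10) -> list[str]:
    """Generate deterministic pseudorandom domains from a date seed.

    Malware and operator run this independently and arrive at the
    same list; the operator registers only one or two entries.
    """
    domains = []
    seed = seed_date.isoformat().encode()
    for i in range(count):
        # Hash the date plus a counter to get pseudorandom bytes.
        digest = hashlib.sha256(seed + str(i).encode()).hexdigest()
        # Map the first 12 hex characters onto lowercase letters.
        label = "".join(chr(ord("a") + int(c, 16) % 26) for c in digest[:12])
        domains.append(label + ".net")
    return domains

print(generate_domains(date(2024, 1, 15), count=3))
```

Note the defender's problem: with `count=10000` this yields ten thousand candidate domains per day, and the seed changes tomorrow.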

DGA domains look statistically different from human-registered domains. Human domains tend to have pronounceable, semantically meaningful names: cloudflare.com, amazon.com, stripe.com. DGA domains look like xvfzqmrp.net or tklbznqw2019.org: high entropy, low pronounceability, no meaningful n-grams.

ML models can learn this distinction. The basic approach trains on known DGA samples and legitimate domain corpora, extracting features like:

  • Character entropy
  • Ratio of consonants to vowels
  • Presence of dictionary words or substrings
  • N-gram frequency (how common are adjacent character pairs)
  • Domain length distribution
  • TLD distribution

Simple models (random forests, logistic regression on these features) can achieve 95%+ accuracy on test sets. In production, you'll see more variation, as attackers tune their DGAs specifically to defeat ML detectors.

Cisco Umbrella's DNS security uses DGA detection as one signal in a broader reputation system. Akamai's Secure Internet Access does similar work. At EBRAND, the X-RAY platform I built processes millions of domain registrations daily, looking for patterns relevant to brand protection. That's not DGA detection exactly, but it relies on similar statistical analysis of domain naming patterns at scale.

The training data problem. Your model is only as good as your training data. Public DGA datasets exist (DGArchive has a large collection), but attackers know these datasets too. A DGA specifically designed to look like legitimate domains by incorporating dictionary words will fool models trained only on obvious random strings. Staying ahead requires continuous retraining on fresh samples, which requires a feed of new malware families, something individual organizations rarely have.

DNS tunneling detection

DNS tunneling uses DNS queries and responses as a covert channel to exfiltrate data or maintain C2 communication through environments where other protocols are blocked.

The technique: encode data in DNS queries. A subdomain like aGVsbG8gd29ybGQ.exfil.attacker.com contains base64-encoded data. The authoritative server for attacker.com decodes the subdomains and assembles the exfiltrated payload. Responses come back as TXT or CNAME records with encoded data.
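
A toy encoder/decoder makes the mechanics clear. This sketch uses base32 rather than base64 because DNS names are case-insensitive (a resolver may change letter case in transit); the `exfil.attacker.example` parent domain is a placeholder.

```python
import base64

MAX_LABEL = 63  # DNS limits each label to 63 bytes

def encode_payload(data: bytes, parent: str) -> list[str]:
    """Split a payload into base32 chunks carried as subdomain labels."""
    # Base32 survives case folding; strip padding since '=' is not label-safe.
    encoded = base64.b32encode(data).decode().rstrip("=").lower()
    chunks = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
    # Prefix each chunk with a sequence number so the server can reorder.
    return [f"{i}.{chunk}.{parent}" for i, chunk in enumerate(chunks)]

def decode_labels(queries: list[str], parent: str) -> bytes:
    """Reassemble the payload from observed query names (server side)."""
    suffix = "." + parent
    parts = {}
    for q in queries:
        idx, chunk = q[: -len(suffix)].split(".", 1)
        parts[int(idx)] = chunk
    joined = "".join(parts[i] for i in sorted(parts)).upper()
    joined += "=" * (-len(joined) % 8)  # restore base32 padding
    return base64.b32decode(joined)
```

Each query looks like an ordinary lookup to the resolver, which dutifully forwards it to the attacker's authoritative server.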

DNS tunneling tools like iodine and dns2tcp have been public for years. More sophisticated malware implements custom tunneling.

Detection signals:

Query volume and rate. A host sending 500 queries per minute to subdomains of a single parent domain is not normal browsing behavior.

Subdomain length. Legitimate subdomains are short and human-readable. Tunneled subdomains are long and high-entropy.

Response size patterns. Exfiltration over DNS tends to produce unusually large TXT records or consistently sized responses.

NXDOMAIN rates. Early-stage tunneling often generates NXDOMAIN responses as the malware tests connectivity.

Query uniqueness. A subdomain that no other client has ever queried is suspicious. Legitimate CDN and tracking domains get queried by millions of clients.

ML approaches typically use these features to build anomaly detectors or classifiers. The challenge is that some legitimate software (backup tools, configuration management systems, some VPN implementations) uses DNS in ways that look superficially similar to tunneling.
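
A simple per-parent-domain scorer over the signals above might look like this. It aggregates query count, label uniqueness, length, and entropy; the thresholds you'd compare these against are deployment-specific and not shown here.

```python
import math
from collections import Counter

def label_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of one DNS label."""
    counts = Counter(label)
    n = len(label)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def score_parent_domain(queries: list[str]) -> dict:
    """Aggregate tunneling signals for all queries under one parent domain.

    Assumes each query name's leftmost label carries the payload.
    """
    labels = [q.split(".")[0] for q in queries]
    return {
        "query_count": len(queries),
        # Tunneling produces a unique label per query; browsing repeats names.
        "unique_ratio": len(set(labels)) / len(labels),
        "avg_label_length": sum(len(l) for l in labels) / len(labels),
        "avg_label_entropy": sum(label_entropy(l) for l in labels) / len(labels),
    }
```

Backup tools and configuration management systems can score high on some of these features too, which is exactly where the false positive problem below comes from.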

NXDOMAIN spike analysis

NXDOMAIN responses ("domain does not exist") are normal background noise. Every network generates them. But spikes in NXDOMAIN rates often indicate something abnormal:

  • DGA malware querying its generated domain list (most won't be registered)
  • Fast-flux infrastructure that's rotating through domains faster than DNS caches
  • Misconfigured software querying for domains with typos
  • Reconnaissance scanning

Baseline NXDOMAIN rate analysis is relatively simple and can be done without ML: just track the ratio of NXDOMAIN to successful resolutions per source IP over time and alert on deviations. ML adds value when you want to distinguish between types of NXDOMAIN patterns (is this DGA activity, or a misconfigured backup job?).
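
The non-ML baseline approach fits in a few lines. This sketch keeps an exponentially weighted moving average of each source's NXDOMAIN ratio and alerts on sudden deviations; the smoothing factor and threshold are illustrative, not recommendations.

```python
from collections import defaultdict

class NxdomainBaseline:
    """Track per-source NXDOMAIN ratios and flag sudden deviations."""

    def __init__(self, alpha: float = 0.1, threshold: float = 0.3):
        self.alpha = alpha          # EWMA smoothing factor (illustrative)
        self.threshold = threshold  # alert if ratio exceeds baseline by this much
        self.baseline = defaultdict(float)

    def observe(self, source_ip: str, nxdomain: int, total: int) -> bool:
        """Feed one interval's counts; return True if the source should alert."""
        ratio = nxdomain / total if total else 0.0
        prev = self.baseline[source_ip]
        alert = ratio - prev > self.threshold
        # Update the baseline after the comparison so a spike
        # doesn't immediately absorb itself.
        self.baseline[source_ip] = (1 - self.alpha) * prev + self.alpha * ratio
        return alert
```

Example: a host that normally sees ~5% NXDOMAIN responses suddenly jumping to 80% (a DGA walking its daily domain list) trips the alert; a host that is always noisy gradually raises its own baseline instead.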

What you can realistically build

If you're at a large organization with a mature security team:

  • Deploy a commercial DNS security product (Cisco Umbrella, Akamai SIA, Infoblox BloxOne)
  • Feed your DNS query logs into your SIEM and write detection rules for the patterns above
  • Subscribe to threat intelligence feeds that include malicious domain lists

If you want to build your own ML-based detection:

  • You can implement DGA detection with reasonable accuracy using standard classifiers and public datasets
  • You'll need ongoing maintenance to keep up with new DGA families
  • DNS tunneling detection at decent precision/recall requires a large corpus of labeled examples, which are hard to get without scale

What requires massive infrastructure:

  • Novel DGA family detection before samples are public
  • Behavioral analysis across millions of clients (what Cisco and Akamai do)
  • Zero-day C2 domain detection based on registration patterns and early query behavior

The gap between "I can build a proof of concept that works on test data" and "I have a production system with acceptable false positive rates" is significant. Vendors have real advantages here: they have the query volume to build meaningful behavioral baselines and the team to tune models continuously.

The false positive problem

This gets undersold in vendor materials. In production, DNS security ML generates false positives. A detector that scores 95% accuracy on a test set will still flag millions of benign queries when you're processing billions of queries per day.
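
The arithmetic is worth doing once. These are illustrative back-of-envelope numbers, not vendor figures:

```python
# Illustrative back-of-envelope numbers, not vendor figures.
queries_per_day = 2_000_000_000   # total DNS queries processed
benign_fraction = 0.999           # almost all traffic is benign
false_positive_rate = 0.01        # a detector with 99% specificity

benign_queries = queries_per_day * benign_fraction
false_positives_per_day = benign_queries * false_positive_rate
print(f"{false_positives_per_day:,.0f} benign queries flagged per day")
```

Even a 1% false positive rate on two billion daily queries means roughly twenty million benign lookups flagged per day. Because malicious queries are such a tiny fraction of traffic, a "99% accurate" model's alerts are dominated by false positives: the base rate matters more than the headline accuracy number.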

False positives in DNS security mean blocking legitimate domains. The consequence can range from "this website loads slowly" to "this critical API call fails" to "this medical device can't reach its cloud service." The blast radius depends on what's being blocked.

Every production deployment needs a process for investigating and clearing false positives quickly. The products that work well in enterprise environments have good exception management workflows. The ones that don't end up turned off after the first major false positive incident.

Key takeaways

  • ML works well for DGA detection, DNS tunneling detection, and NXDOMAIN anomaly analysis
  • Commercial products (Cisco Umbrella, Akamai SIA) have genuine ML capabilities at scale
  • Building your own is feasible for DGA detection; it's hard for novel threat detection
  • False positive rates in production are higher than benchmarks suggest; plan for exception management
  • Training data quality is the main constraint on model effectiveness

Up next

Lesson 05: DNS and CDNs: How content delivery networks use DNS for traffic steering, and why CNAME at the apex is such an annoying problem.