DNS Caching and TTL

How DNS caching works at every layer, TTL mechanics, negative caching, cache poisoning risks, and migration strategies.

DNS caching is what makes the system fast. Without it, every website visit would require multiple round trips to servers around the world. With it, most queries resolve instantly from nearby caches.

But caching introduces complexity: stale data, propagation delays, and security risks. Understanding how caching works — and how TTL controls it — is essential for managing DNS effectively.

Caching at Every Layer

DNS queries pass through multiple caching layers, each with its own behavior.

Browser Cache

Modern browsers maintain their own DNS cache, independent of the operating system. When you type a URL, the browser checks its cache first.

  • Chrome: chrome://net-internals/#dns shows the cache
  • Firefox: about:networking#dns shows cached entries
  • Safari: No built-in viewer, but it caches DNS results

Browser caches typically respect TTL values but may have their own minimum TTL (often 60 seconds) to prevent excessive queries.

Operating System Cache

If the browser cache misses, the OS resolver cache is checked next.

On macOS, the mDNSResponder service handles caching:

# Flush macOS DNS cache
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder

On Linux with systemd-resolved:

# View cache statistics
resolvectl statistics

# Flush cache
sudo resolvectl flush-caches

On Windows:

# View cache
ipconfig /displaydns

# Flush cache
ipconfig /flushdns

Recursive Resolver Cache

This is the big one. Recursive resolvers (like 8.8.8.8 or 1.1.1.1) cache aggressively because they serve millions of clients. A popular domain might be queried thousands of times per second — the cache absorbs nearly all of that traffic.

You have no direct control over external resolver caches. When you change a DNS record, you wait for the TTL to expire.

CDN and Edge Caches

Content delivery networks often run their own DNS infrastructure. Cloudflare, Akamai, and Fastly all cache DNS at their edge nodes. This adds another layer between your authoritative servers and end users.

TTL Mechanics

TTL (Time To Live) is a value in seconds that tells caches how long to keep a record before considering it stale. It’s set by the authoritative server for each record.

www.example.com.    300   IN  A    93.184.216.34
                    ^^^
                    TTL = 300 seconds (5 minutes)

How TTL Decrements

When a recursive resolver caches a record, it stores the original TTL and the time it was cached. On subsequent queries, it calculates the remaining TTL:

remaining_ttl = original_ttl - (current_time - cache_time)

When it returns the record to clients, it sends the remaining TTL, not the original. This ensures downstream caches don’t hold records longer than intended.
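The decrement logic can be sketched as a toy cache entry (hypothetical class and method names, not a real resolver API):

```python
import time

class CacheEntry:
    """Toy resolver cache entry: stores the original TTL and insertion time."""
    def __init__(self, record, original_ttl, cache_time=None):
        self.record = record
        self.original_ttl = original_ttl
        self.cache_time = cache_time if cache_time is not None else time.time()

    def remaining_ttl(self, now=None):
        """TTL handed downstream: original TTL minus the record's age, floored at 0."""
        now = now if now is not None else time.time()
        return max(0, self.original_ttl - int(now - self.cache_time))

    def is_expired(self, now=None):
        return self.remaining_ttl(now) == 0

# A record cached at t=1000 with TTL 300 has 180 seconds left at t=1120:
entry = CacheEntry("93.184.216.34", original_ttl=300, cache_time=1000)
```

Because the remaining TTL is what gets forwarded, a chain of caches can never extend a record's lifetime beyond what the authoritative server set.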

Common TTL Values

TTL      Duration    Use Case
60       1 minute    Rapid failover, dynamic IPs
300      5 minutes   Balanced freshness/performance
3600     1 hour      Stable services
86400    24 hours    Rarely-changed records
172800   48 hours    TLD delegation (NS records)

The tradeoff: Lower TTL = faster propagation of changes, but more queries hitting your authoritative servers. Higher TTL = better caching efficiency, but slower propagation.
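The load side of that tradeoff can be put into a back-of-the-envelope model (an assumption for illustration, not a measured result): if each recursive resolver re-fetches a record roughly once per TTL window, authoritative load scales with the number of resolvers divided by the TTL.

```python
def authoritative_qps(num_resolvers, ttl_seconds):
    """Rough steady-state load: each resolver refreshes once per TTL window.
    Ignores evictions, prefetching, and resolver-enforced minimum TTLs."""
    return num_resolvers / ttl_seconds

# 100,000 resolvers caching one record:
load_high_ttl = authoritative_qps(100_000, 86_400)  # ≈1.16 queries/sec
load_low_ttl = authoritative_qps(100_000, 60)       # ≈1667 queries/sec
```

The same population of resolvers generates over a thousand times more authoritative traffic at a 60-second TTL than at 24 hours, which is why "just set everything to 60" is rarely the right answer.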

TTL Best Practices

  1. Use shorter TTLs for records that might change: Load balancer IPs, failover targets
  2. Use longer TTLs for stable records: MX records, TXT records for email authentication
  3. Lower TTL before planned changes: Drop to 300 seconds 24-48 hours before migrations
  4. Don’t go below 60 seconds: Many resolvers enforce a minimum; shorter TTLs waste bandwidth

Negative Caching

What happens when a domain doesn’t exist? Without caching, every typo would generate a full query chain. Negative caching prevents this.

When an authoritative server returns NXDOMAIN (name doesn’t exist) or NODATA (name exists but not for the requested type), resolvers cache that negative result. RFC 2308 defines the rules.

The SOA Minimum Field

The TTL for negative caching comes from the SOA record's minimum field (strictly, RFC 2308 uses the lesser of the SOA record's own TTL and its minimum field):

example.com. 3600 IN SOA ns1.example.com. admin.example.com. (
    2024012801  ; Serial
    3600        ; Refresh
    600         ; Retry
    604800      ; Expire
    300         ; Minimum ← Negative cache TTL
)

This 300 means “cache NXDOMAIN responses for 5 minutes.”

Why Negative Caching Matters

  • Performance: Typos, probes, and scans don’t hammer authoritative servers
  • Timing: If you create a new subdomain, cached NXDOMAIN responses must expire before it’s visible everywhere
  • Attacks: Attackers can’t overwhelm servers by querying random non-existent names

For new records, consider that negative caching might delay visibility. If someone queried new.example.com and got NXDOMAIN cached, they won’t see the new record until that cache expires.
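A minimal sketch of this behavior (`cache_nxdomain` and friends are hypothetical names, not a real resolver API; the TTL rule follows RFC 2308):

```python
def negative_cache_ttl(soa_ttl, soa_minimum):
    """RFC 2308: cache NXDOMAIN/NODATA for min(SOA TTL, SOA minimum)."""
    return min(soa_ttl, soa_minimum)

negative_cache = {}  # name -> expiry timestamp

def cache_nxdomain(name, now, soa_ttl=3600, soa_minimum=300):
    """Record a negative answer using the zone's SOA-derived TTL."""
    negative_cache[name] = now + negative_cache_ttl(soa_ttl, soa_minimum)

def is_cached_nxdomain(name, now):
    """True while a cached negative answer still masks a newly created record."""
    return name in negative_cache and now < negative_cache[name]

# Using the SOA values from the example zone above (TTL 3600, minimum 300):
cache_nxdomain("new.example.com", now=0)
```

Until the 300 seconds elapse, a resolver holding this entry answers NXDOMAIN for `new.example.com` even after the record has been published at the authoritative server.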

Cache Poisoning

DNS caching creates an attack surface: if an attacker can inject false data into a cache, all clients using that cache will receive malicious responses.

The Kaminsky Attack

In 2008, Dan Kaminsky demonstrated a devastating cache poisoning technique. The attack exploits several DNS weaknesses:

  1. Predictable transaction IDs: DNS uses 16-bit transaction IDs (0-65535)
  2. UDP’s lack of authentication: Responses aren’t verified beyond matching the transaction ID
  3. Additional section injection: Attackers can include extra records in forged responses

The attack flow:

  1. Attacker sends many queries for random subdomains: random1.target.com, random2.target.com, etc.
  2. For each query, attacker floods the resolver with forged responses claiming to be the authoritative server
  3. Forged responses include a poisoned NS record in the authority section: “the nameserver for target.com is evil.attacker.com”
  4. Thanks to birthday-paradox effects, an attacker racing many in-flight queries needs only on the order of ~256 forged responses for a likely transaction ID match — and with fresh random subdomains, failed rounds cost nothing and can be retried indefinitely
  5. Once poisoned, the resolver directs ALL queries for target.com to the attacker
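The spoofing arithmetic can be sketched with a simplified model (an illustrative assumption: each forged packet independently guesses a uniform random ID, one query window at a time):

```python
def spoof_success_probability(forged_packets, id_bits=16, port_bits=0):
    """Chance that at least one forged response matches the resolver's
    transaction ID (and source port, if randomized) in a single query window."""
    guess_space = 2 ** (id_bits + port_bits)
    miss = (1 - 1 / guess_space) ** forged_packets
    return 1 - miss

# With only a 16-bit transaction ID, each window gives roughly a 0.4% chance
# at 256 packets -- and the attacker gets unlimited windows via random names:
p_id_only = spoof_success_probability(256)
# Source port randomization pushes the guess space to ~2^32:
p_with_ports = spoof_success_probability(256, port_bits=16)
```

The per-window probability looks small, but a few thousand retry windows make success near-certain against a 16-bit space; adding ~16 bits of port entropy multiplies the required effort by roughly 65,000.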

Mitigations

Source port randomization: Instead of using a fixed port, resolvers randomize the source port. Combined with the 16-bit transaction ID, this creates ~32 bits of entropy — making blind spoofing much harder.

0x20 encoding: Some resolvers randomize the case of query names (e.g., WwW.ExAmPlE.cOm). Since DNS is case-insensitive but case-preserving, forged responses with wrong case are rejected.
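A sketch of the 0x20 idea (hypothetical helper names): the resolver randomizes the case of the outgoing query name, and since legitimate servers echo the question verbatim, any response whose name doesn't match bit-for-bit is discarded.

```python
import random

def encode_0x20(name, rng=None):
    """Randomize letter case in a query name; servers echo it back verbatim."""
    rng = rng or random.Random()
    return "".join(c.upper() if rng.random() < 0.5 else c.lower() for c in name)

def response_acceptable(sent_name, echoed_name):
    """A forged response that guesses the case wrong is rejected."""
    return sent_name == echoed_name  # case-SENSITIVE comparison

sent = encode_0x20("www.example.com")
# A spoofer only knows the case-insensitive name and must also guess the
# case of every letter -- roughly one extra bit of entropy per letter.
```

For `www.example.com` (13 letters), that is about 13 additional bits on top of the transaction ID and source port.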

DNSSEC: The real fix. DNSSEC cryptographically signs DNS records, making spoofed responses verifiably false. But it requires deployment by zone owners and validation by resolvers.

TTL Strategies for Migrations

Changing DNS records during a migration requires planning. Here’s the standard playbook:

Phase 1: Lower TTL (24-48 hours before)

Before the migration, reduce TTL on records you’ll change:

; Before: stable TTL
www.example.com.  86400  IN  A    192.0.2.10

; Temporary: short TTL
www.example.com.  300    IN  A    192.0.2.10

Wait for the old TTL to expire everywhere. If your original TTL was 86400 (24 hours), wait at least that long.
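The waiting rule can be sketched as a small helper (hypothetical name): a cache that stored the record one instant before you lowered the TTL still holds it for the full old TTL, so that bounds the wait.

```python
def safe_change_time(ttl_lowered_at, old_ttl):
    """Earliest moment every conforming cache has expired the old TTL.
    A record cached just before the lowering lives old_ttl more seconds.
    (Assumes all resolvers honor TTLs; some clamp or ignore them.)"""
    return ttl_lowered_at + old_ttl

# Lowered the TTL at t=0, old TTL was 86400 (24 hours):
# don't make the actual record change before t=86400.
earliest = safe_change_time(0, 86_400)
```

After that point, every TTL-respecting cache is serving the new, short TTL, and the actual change in Phase 2 propagates within minutes.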

Phase 2: Make the Change

Update the record to point to the new destination:

www.example.com.  300    IN  A    192.0.2.20

With the short TTL, clients will pick up the change within 5 minutes.

Phase 3: Verify

Monitor traffic to both old and new destinations. Watch for:

  • Error rates on the new service
  • Lingering traffic on the old service
  • Geographic or network-specific issues

Phase 4: Raise TTL (after stability)

Once everything is working, increase TTL for efficiency:

www.example.com.  86400  IN  A    192.0.2.20

Common Mistakes

Forgetting to lower TTL first: If you change a record with a 24-hour TTL, some users won’t see the change for 24 hours. During a migration, that’s 24 hours of potential downtime.

Not waiting long enough: After lowering TTL, you must wait for the OLD TTL to expire before making changes. Lowering from 86400 to 300 doesn’t instantly update caches.

Lowering TTL too much: Going from 86400 to 30 seconds increases authoritative server load dramatically. 300 seconds is usually sufficient.

Raising TTL too soon: Wait until you’re confident the migration is stable. Quick rollbacks are only possible with short TTLs.

Debugging Cache Issues

When DNS changes aren’t propagating as expected:

# Check what authoritative servers actually serve
dig @ns1.example.com www.example.com +short

# Check what a public resolver sees (with TTL)
dig @8.8.8.8 www.example.com

# Compare multiple resolvers
dig @1.1.1.1 www.example.com +short
dig @8.8.8.8 www.example.com +short
dig @9.9.9.9 www.example.com +short

# Trace the full resolution path
dig +trace www.example.com

If authoritative servers show the correct record but resolvers don’t, the old record is still cached. Check the TTL in resolver responses to estimate when it will expire.

Key Takeaways

  • DNS caches at multiple layers: browser, OS, recursive resolver, CDN
  • TTL controls cache duration: lower = fresher data, higher = better performance
  • Negative responses are also cached: determined by SOA minimum field
  • Cache poisoning is a real threat: mitigated by port randomization, 0x20 encoding, and DNSSEC
  • Migration playbook: lower TTL → wait → change → verify → raise TTL
  • Always wait for the old TTL to expire before expecting changes to propagate

Understanding caching is what separates DNS operators from DNS debuggers. When things break, knowing where data is cached — and for how long — is often the key to diagnosis.

Next, we’ll explore the difference between authoritative and recursive resolvers in depth.