How does DNS updating work?

Recently, I came across an article about DNS by a great expert and I found it very good, so I decided to translate it myself.
If there are any mistakes, please feel free to correct them.

I have seen many people confused about updating DNS records for their websites to change the IP address. Why is it so slow? Do we really have to wait 2 days for all the changes to take effect? Why do some people see the new IP while others see the old one? What is happening?

Here, I will explain what happens behind the scenes when updating DNS.

Types of DNS servers: recursive vs. authoritative#

First, some background on DNS. There are two types of DNS servers: authoritative servers and recursive servers.

Authoritative DNS servers (also known as name servers) have a database of IP addresses for each domain name they are responsible for. For example, one of the authoritative DNS servers for github.com is ns-421.awsdns-52.com. You can ask it for the IP address of github.com with dig:

    $ dig @ns-421.awsdns-52.com github.com

Recursive DNS servers themselves do not know who owns what IP address. They determine the IP address of a domain name by querying the correct authoritative DNS server and then cache that IP address in case it is asked again. 8.8.8.8 is a recursive DNS server.

When people visit your website, they are probably making their DNS queries to a recursive DNS server. So, how do recursive DNS servers work? Let's take a look!

How does a recursive DNS server query github.com?#

Let's walk through what a recursive DNS server (like 8.8.8.8) does when you ask it for the IP address (A record) of github.com. First, if it already has an answer cached, it gives you the cached answer. But what if all the caches have expired? Here's what happens:

Step 1: The recursive server has the IP addresses of the root DNS servers hardcoded in its source code. You can see this in the source code of unbound, for example. Let's say it picks 198.41.0.4 to start with. The official list of these hardcoded IP addresses is known as the "root hints file".

Step 2: Query the root name server for github.com.

We can roughly reproduce what happens with dig. This gives us a new authoritative name server to ask, the one for .com, with the IP address 192.5.6.30.

    $ dig @198.41.0.4 github.com
    ...
    com.			172800	IN	NS	a.gtld-servers.net.
    ...
    a.gtld-servers.net.	172800	IN	A	192.5.6.30
    ...

The full DNS response is more complex than this. In this case, there is an authority section with some NS records and an additional section with A records, so you don't have to do an extra lookup to get the IP addresses of these name servers.
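As a rough illustration of how those two sections fit together, here is a minimal Python sketch that parses the record lines shown above and pairs each NS record with the A record for the same name server. The `Record` type and `parse_record` helper are hypothetical names made up for this example, and the parser only handles the simple whitespace-separated lines that dig prints, not the DNS wire format.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    ttl: int      # seconds the record may be cached
    rclass: str   # "IN" in all the examples here
    rtype: str    # e.g. "NS" or "A"
    data: str     # a name server name for NS, an IP address for A


def parse_record(line: str) -> Record:
    """Parse one whitespace-separated record line as printed by dig."""
    name, ttl, rclass, rtype, data = line.split(None, 4)
    return Record(name, int(ttl), rclass, rtype, data.strip())


# Lines copied from the dig output above.
response = [
    "com.\t\t\t172800\tIN\tNS\ta.gtld-servers.net.",
    "a.gtld-servers.net.\t172800\tIN\tA\t192.5.6.30",
]
records = [parse_record(line) for line in response]

# Pair each NS record with the A record for that name server, if present.
addresses = {r.name: r.data for r in records if r.rtype == "A"}
next_servers = [(r.data, addresses.get(r.data)) for r in records if r.rtype == "NS"]
print(next_servers)  # [('a.gtld-servers.net.', '192.5.6.30')]
```

Because the A record arrives in the same response, the resolver already knows where to send its next query.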

(In practice, it would already have the address of the .com name server cached 99.99% of the time, but let's pretend it's starting from scratch.)

Step 3: Query the .com name server for github.com.

    $ dig @192.5.6.30 github.com
    ...
    github.com.		172800	IN	NS	ns-421.awsdns-52.com.
    ns-421.awsdns-52.com.	172800	IN	A	205.251.193.165
    ...

We have a new IP address to ask! This one is the name server for github.com.

Step 4: Query the github.com name server for github.com.

    $ dig @205.251.193.165 github.com
    
    github.com.		60	IN	A	140.82.112.4

OK! We now have an A record for github.com! Now the recursive name server has the IP address of github.com and can return it to you. And it was able to do all of this starting from only a few hardcoded IP addresses: the addresses of the root name servers.
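The four steps above can be sketched as a small loop that follows referrals from the root. This is a simplified, offline model rather than how any real resolver is implemented: the `SERVERS` table and the `resolve` function are hypothetical, and the data is copied from the dig output in steps 2 to 4 so nothing touches the network.

```python
ROOT_HINT = "198.41.0.4"  # the hardcoded root server address from step 1

# Each "server" is keyed by its IP address and holds what it would return.
SERVERS = {
    "198.41.0.4": {       # root name server: refers us to .com
        "referral": ("com.", "a.gtld-servers.net.", "192.5.6.30"),
    },
    "192.5.6.30": {       # .com name server: refers us to github.com's server
        "referral": ("github.com.", "ns-421.awsdns-52.com.", "205.251.193.165"),
    },
    "205.251.193.165": {  # github.com's authoritative server: has the answer
        "answers": {"github.com.": "140.82.112.4"},
    },
}


def resolve(name, server_ip=ROOT_HINT, max_steps=10):
    """Follow referrals from the root until a server returns an A record."""
    path = []
    for _ in range(max_steps):
        path.append(server_ip)
        response = SERVERS[server_ip]
        answer = response.get("answers", {}).get(name)
        if answer is not None:
            return answer, path
        referral = response.get("referral")
        if referral is None or not name.endswith(referral[0]):
            raise LookupError(f"{server_ip} cannot help with {name}")
        server_ip = referral[2]  # ask the referred name server next
    raise LookupError("too many referrals")


ip, path = resolve("github.com.")
print(ip)                 # 140.82.112.4
print(" -> ".join(path))  # 198.41.0.4 -> 192.5.6.30 -> 205.251.193.165
```

A real recursive server would also cache every record it sees along the way, which is why it rarely has to start from the root.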

How to see all of a recursive DNS server's steps: `dig +trace`#

When I want to see what operations a recursive DNS server performs when resolving a domain name, I run the following command:

    $ dig @8.8.8.8 +trace github.com

This shows all the DNS records that are requested, starting from the root DNS servers: the same 4 steps we just went through.

How to update DNS records#

Now that we understand how DNS works, let's update some DNS records and see what happens.

When updating DNS records, there are two main options:

Keep the same name servers
Change the name servers

About TTL#

Here, let's explain the concept of TTL. As mentioned earlier, recursive DNS servers cache records until they expire, and the way they determine if a record should expire is by looking at its TTL (Time To Live).
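To make the caching rule concrete, here is a minimal sketch of a TTL-based cache in Python. The `TtlCache` class is a hypothetical name for this example and is not taken from any real resolver; the clock is injectable only so that the example can fake the passage of time.

```python
import time


class TtlCache:
    """Cache answers until their TTL runs out, as a recursive server would."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the example can fake time
        self._entries = {}    # name -> (value, expiry timestamp)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired: the caller must re-query
            return None
        return value


now = [0.0]  # fake clock, in seconds
cache = TtlCache(clock=lambda: now[0])
cache.put("github.com.", "140.82.112.4", ttl=60)

now[0] = 59
print(cache.get("github.com."))  # 140.82.112.4 (still cached)
now[0] = 60
print(cache.get("github.com."))  # None (expired, must ask again)
```

Once `get` returns `None`, the resolver has to go back to the authoritative server, which is the moment it would pick up a changed record.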

In the example below, the A record returned by github's name server has a TTL of 60, which means 60 seconds:

    $ dig @205.251.193.165 github.com
    github.com.		60	IN	A	140.82.112.4

This is a very short TTL. In theory, if every DNS implementation followed the DNS standard, everyone would get the new IP address for github.com within 60 seconds of GitHub changing it. Let's see how it actually works.

Option 1: Updating DNS records on the same name servers#

First, I added a new DNS record on my name servers (Cloudflare): an A record mapping test.jvns.ca to 1.2.3.4.

    $ dig @8.8.8.8 test.jvns.ca
    test.jvns.ca.		299	IN	A	1.2.3.4

This takes effect immediately! No waiting is needed because there was no DNS record for test.jvns.ca to be cached before. But it looks like the new record is cached for about 5 minutes (299 seconds).

So, what happens if we try to change the IP? I changed it to 5.6.7.8 and then ran the same DNS query.

    $ dig @8.8.8.8 test.jvns.ca
    test.jvns.ca.		144	IN	A	1.2.3.4

Hmm, it seems like that DNS server still has the 1.2.3.4 record cached for another 144 seconds. Interestingly, if I query 8.8.8.8 multiple times, I get inconsistent results: sometimes it gives me the new IP and sometimes the old one. That is probably because 8.8.8.8 load balances across a bunch of different backends, each with its own cache.

After waiting for 5 minutes, all the caches of 8.8.8.8 have been updated and consistently return the new 5.6.7.8 record. I must say, that's pretty fast!
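The inconsistent answers described above can be reproduced with a toy model. This sketch assumes, as guessed in the text, that several backends sit behind one address and each keeps its own cache; the `Backend` class, the number of backends, and the timestamps are all invented for illustration and say nothing about how 8.8.8.8 is really built.

```python
TTL = 300  # seconds, roughly the 5-minute TTL seen above


class Backend:
    """One hypothetical cache behind a shared address such as 8.8.8.8."""

    def __init__(self):
        self.value = None
        self.expires_at = 0

    def lookup(self, now, authoritative_value):
        if self.value is None or now >= self.expires_at:
            # Cache miss or expired entry: fetch the current record.
            self.value = authoritative_value
            self.expires_at = now + TTL
        return self.value


backends = [Backend() for _ in range(3)]

# The backends first cache the old record at different times.
for backend, cached_at in zip(backends, (0, 100, 200)):
    backend.lookup(cached_at, "1.2.3.4")

# At t=250 the record changes to 5.6.7.8. At t=320 only the first backend's
# entry has expired, so the answer depends on which backend we happen to hit.
answers_320 = [b.lookup(320, "5.6.7.8") for b in backends]
print(answers_320)  # ['5.6.7.8', '1.2.3.4', '1.2.3.4']

# At t=550, one full TTL after the change, every old entry has expired.
answers_550 = [b.lookup(550, "5.6.7.8") for b in backends]
print(answers_550)  # ['5.6.7.8', '5.6.7.8', '5.6.7.8']
```

In this model the old answer can never outlive the change by more than one TTL, which matches the 5 minutes observed above.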

You can't always rely on TTL#

Like most things on the Internet, not all implementations follow the DNS standard. Some ISP DNS servers will cache records for longer than the specified TTL, for example, 2 days instead of 5 minutes. And people can always hardcode the old IP address in their /etc/hosts file.

When updating DNS records with a 5-minute TTL, what I would expect to happen in practice is that a large proportion of clients would quickly (within, say, 15 minutes) move to the new IP, and then there would be some slow clients that slowly update over the next few days.

Option 2: Updating the name servers#

We have seen that when you update the IP address without changing the name servers, many DNS servers will quickly get the new IP. But what if you change the name servers?

I don't want to update the name servers for my blog, so instead I used a different domain that I own: examplecat.com, which is used in the examples in the HTTP zine.

Previously, my name servers were set to dns1.p01.nsone.net. I decided to switch them to Google's name servers - ns-cloud-b1.googledomains.com, etc.

After making the change, my domain registrar popped up a message saying "Changes to examplecat.com have been saved. They will take effect within the next 48 hours." Then, I set a new A record for that domain pointing to 1.2.3.4.

OK, let's see if it has any effect:

    $ dig @8.8.8.8 examplecat.com
    examplecat.com.		17	IN	A	104.248.50.87

No change. But if I ask a different DNS server, it returns the new IP address:

    $ dig @1.1.1.1 examplecat.com
    examplecat.com.		299	IN	A	1.2.3.4

But 8.8.8.8 still returns the old record. The reason 1.1.1.1 sees the new IP, even though I made the change only 5 minutes ago, is probably that no one had asked 1.1.1.1 about examplecat.com before, so it had nothing cached for it.

Name server TTL is longer#

The reason my domain registrar said "it will take 48 hours" is because the TTL on the NS records (how recursive name servers know which name servers to ask) is longer!

The new name servers definitely return the new IP for examplecat.com.

    $ dig @ns-cloud-b1.googledomains.com examplecat.com
    examplecat.com.		300	IN	A	1.2.3.4

But remember what happened when we queried the github.com name servers?

    $ dig @192.5.6.30 github.com
    ...
    github.com.		172800	IN	NS	ns-421.awsdns-52.com.
    ns-421.awsdns-52.com.	172800	IN	A	205.251.193.165
    ...

172800 seconds is 48 hours! So, compared to updating just the IP address without changing the name servers, name server updates usually take much longer to expire from caches and propagate.
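The arithmetic is easy to check. This short Python snippet uses only the two TTL values shown in the dig output above; the variable names are made up for the example.

```python
from datetime import timedelta

NS_TTL = 172800  # TTL on the NS records returned by the .com name server
A_TTL = 300      # TTL on the A record returned by the new name servers

print(timedelta(seconds=NS_TTL))  # 2 days, 0:00:00
print(NS_TTL // 3600)             # 48 (hours)
print(NS_TTL // A_TTL)            # 576 (times longer than the A record TTL)
```

So a resolver that cached the old NS records just before the change may keep asking the old name servers for up to two days.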

How are name servers updated?#

When I update the name servers for examplecat.com, what happens is that the .com name server gets a new NS record for the domain, pointing at the new name servers. Like this:

    $ dig ns @j.gtld-servers.net examplecat.com
    
    examplecat.com.		172800	IN	NS	ns-cloud-b1.googledomains.com

But how does the new NS record get there? I tell my domain registrar which name servers I want for the domain by updating it on their website, and then the registrar tells the .com name servers to make the update.

For .com, these updates happen very quickly (within a few minutes), but I think for some other TLDs, the TLD name servers might not apply updates as quickly.

DNS resolver libraries in programs may also cache DNS records#

Another reason why TTL might not be followed in practice is that many programs need to resolve DNS names, and some programs will cache DNS records in memory indefinitely (until the program is restarted).

For example, AWS has an article about setting the JVM TTL for DNS name lookups. I haven't written much JVM code that does DNS lookups myself, but from some reading about the JVM and DNS, it seems that the JVM can be configured to cache every DNS lookup indefinitely. (For example, see this Elasticsearch issue.)
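The same effect is easy to produce in any language by memoizing lookups. Here is a minimal Python sketch of the failure mode, not of the JVM's behavior: `lookup` is a hypothetical stand-in for a real DNS query so that the example runs offline, and `functools.lru_cache` plays the part of an in-process cache that never looks at the TTL.

```python
from functools import lru_cache

# Stand-in for real DNS data; the values are the ones used earlier.
current_records = {"test.jvns.ca": "1.2.3.4"}


def lookup(hostname):
    """Pretend DNS query that always returns the current record."""
    return current_records[hostname]


@lru_cache(maxsize=None)
def cached_lookup(hostname):
    """Memoize lookups for the life of the process, ignoring any TTL."""
    return lookup(hostname)


print(cached_lookup("test.jvns.ca"))         # 1.2.3.4 (first call, real lookup)
current_records["test.jvns.ca"] = "5.6.7.8"  # the DNS record changes
print(cached_lookup("test.jvns.ca"))         # 1.2.3.4 (stale, memoized answer)
cached_lookup.cache_clear()                  # comparable to restarting the program
print(cached_lookup("test.jvns.ca"))         # 5.6.7.8
```

No TTL setting on the DNS side can fix this kind of cache; only the program itself can expire the entry.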

P.S. TTLs don't tell the whole story about how DNS updates propagate. Major DNS servers like 8.8.8.8 do respect TTLs, but some recursive DNS servers definitely do not. So, even if you update an A record with a short TTL, it's still quite possible to get some requests for the old IP for a day or two.

Reference:
https://jvns.ca/blog/how-updating-dns-works

Originally posted on my personal blog: 方寸之间