DNS - the Catch-22

Alex talked earlier this morning about how little one thinks about DNS until it is gone or has failed you in some way.

This post will explain the problems we encountered, and cover the basics of DNS. Feel free to skip the tutorial if you are already familiar.

Basics of DNS

Everything on the internet today has an IP address. This is simply a street address that tells you where you can find some resource, be it a server (like the one hosting this blog) or a service endpoint (like email or a web site). When you sign onto the internet you want to get to these places, but you don’t want to remember addresses like 65.90.218.228. You want to use easy names like feedlounge.com and alexking.org. DNS (Domain Name System) helps you do that.

When you type feedlounge.com into the browser, the browser asks your DNS server, “Who knows where feedlounge.com lives?” Your DNS server finds the server that does (the authoritative server) and queries it for feedlounge.com’s address. Once your browser has that address, it initiates a conversation with the feedlounge.com web server, and you finally get that pretty page you’ve always wanted.
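The lookup above can be sketched as a toy model in Python. This is only an illustration with no real network traffic: the name server name `ns.example.net` and the address `192.0.2.10` are made-up placeholders, not FeedLounge's actual setup.

```python
# Toy model of a DNS lookup; nothing here touches the network.
# The name server name and the address are illustrative assumptions.

# Which name server is authoritative for each domain (hypothetical entry).
DELEGATIONS = {"feedlounge.com": "ns.example.net"}

# The records each authoritative server holds (placeholder address).
AUTHORITATIVE_RECORDS = {
    "ns.example.net": {"feedlounge.com": "192.0.2.10"},
}


def resolve(name):
    """Return the IP address for name, or None if nobody knows it."""
    # Step 1: find out who is authoritative for this name.
    nameserver = DELEGATIONS.get(name)
    if nameserver is None:
        return None
    # Step 2: ask that authoritative server for the address.
    records = AUTHORITATIVE_RECORDS.get(nameserver, {})
    return records.get(name)


print(resolve("feedlounge.com"))
```

In a real program you would not write this yourself; Python's `socket.gethostbyname("feedlounge.com")` hands the same question to the operating system's resolver.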

What went wrong

What happened in our case is that the authoritative server went away (burned up), and the cached information on your DNS server expired. At this point, even though feedlounge.com is in the exact same place it was, your DNS server doesn’t know where feedlounge.com is, and furthermore, when it checks to find out, it gets no response.
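Here is a minimal sketch of that failure, again as a simulation and not real DNS code. The class names, the 300-second TTL, and the address are all assumptions chosen for illustration.

```python
class AuthoritativeServer:
    """Stand-in for the authoritative server (hypothetical, in-memory)."""

    def __init__(self, records, ttl):
        self.records = records  # name -> address
        self.ttl = ttl          # seconds a resolver may cache an answer
        self.alive = True

    def query(self, name):
        if not self.alive:
            raise TimeoutError("no response from authoritative server")
        return self.records[name], self.ttl


class CachingResolver:
    """Stand-in for "your DNS server": caches answers until the TTL runs out."""

    def __init__(self, upstream):
        self.upstream = upstream
        self.cache = {}  # name -> (address, expires_at)

    def resolve(self, name, now):
        cached = self.cache.get(name)
        if cached and now < cached[1]:
            return cached[0]  # still fresh, no need to ask upstream
        try:
            address, ttl = self.upstream.query(name)
        except TimeoutError:
            return None  # cache expired and nobody answers
        self.cache[name] = (address, now + ttl)
        return address


server = AuthoritativeServer({"feedlounge.com": "192.0.2.10"}, ttl=300)
resolver = CachingResolver(server)

before_failure = resolver.resolve("feedlounge.com", now=0)
server.alive = False  # the authoritative server burns up
while_cached = resolver.resolve("feedlounge.com", now=200)
after_expiry = resolver.resolve("feedlounge.com", now=301)

print(before_failure, while_cached, after_expiry)
```

The web server never moved in this sketch. Only the answer to "where is it?" disappeared once the cached copy expired.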

The Catch-22, or why we were stuck

When we started FeedLounge, things were going fast and furious. We were in a constant change mode, and at the time we couldn’t live with the timing limits of DNS changes provided by our registrar, so we took over DNS responsibilities on our shared server, austin.kingdesign.net (RIP). At the time, it was just to make sure that changes could be made as quickly as possible, to keep everyone happy. And DNS hummed along[1] without ever having a problem… until it disappeared along with all our other websites, email, etc. this weekend.

Generally a DNS record has a TTL (time-to-live, the expiration date on the carton of a DNS record) of 24 hours. We had turned ours down much lower so that changes took effect quickly.

The problem is that when the TTL on your DNS records is turned down that low, bringing up another DNS server to serve new records can take what seems like FOREVER (24-48 hours) by comparison. 24-48 hours isn’t bad when your TTL is in the same range and your current DNS server is alive. You just make the transition, and everyone eventually forgets about the old server and moves on to the new one.

However, when your old server dies you are stuck.

So you turned down the TTL to be responsive, but when the DNS server went away, everyone forgot where you were! All for something that we had previously taken for granted because it “Just Worked”.
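To put rough numbers on the catch-22, here is a back-of-the-envelope sketch. It is a simplification: it assumes a visitor's DNS server refreshed its cache just before the failure (the best case), and the 1-hour TTL is a hypothetical "turned down" value, since the actual setting isn't given here. The 24 and 48 hour figures are the ones mentioned above.

```python
def unreachable_hours(ttl_hours, propagation_hours):
    """Best-case hours a visitor cannot find the site after the old DNS server dies.

    Assumes the visitor's DNS server refreshed its cache just before the
    failure, so the cached record survives for the full TTL.
    """
    return max(0, propagation_hours - ttl_hours)


# A 1-hour TTL is a hypothetical low value; 24 and 48 come from the post.
low_ttl_gap = unreachable_hours(ttl_hours=1, propagation_hours=48)
default_ttl_gap = unreachable_hours(ttl_hours=24, propagation_hours=48)
matched_gap = unreachable_hours(ttl_hours=24, propagation_hours=24)

print(low_ttl_gap, default_ttl_gap, matched_gap)
```

The longer the TTL, the longer caches keep covering for a dead server while the new one becomes visible.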

Why couldn’t we just set up a redirect?

Redirection is something that happens once you have found a resource to connect to. If you could have gotten to feedlounge.com to receive a redirect, you would have been able to get to the real feedlounge.com. The DNS is the redirect, and if that fails, you’ve got bupkis. The fastest solution for this scenario is to register new authoritative name servers, and that is exactly what we did. It just takes a long time to fully transition.

Conclusion

As Alex said, don’t forget about the little things, like DNS.

If you are changing server addresses often, turn down your TTL to make sure the transition happens as fast as possible. If you are changing DNS servers, turn it up first, so that cached records expire as slowly as possible. If your server melts down, that gives you some breathing room to bring it back.

And ALWAYS have a backup.

  1. Like it had on that server since late 2003.

Posted May 22nd, 2006 @ 9:14 PM in General by Scott

3 Replies to “DNS - the Catch-22”

  1. Deconstructing the FeedLounge Downtime

    If this were Slashdot, I’d file this under the so-meta-it-hurts department. It’s not, though. It’s IJSM.org, which means that the audience is smaller, “FR1ST P50T!” is a rarity, and Natalie Portman isn’t pouring ho…

    May 23rd, 2006 at 7:01 pm
  2. So….is there any plans to compensate paying users for the downtime?

    May 24th, 2006 at 10:51 am
  3. Guess not

    May 29th, 2006 at 1:08 pm
