For Want Of DNS, The Network Goes Amiss
I thought I’d wrap up this week with a personal story from three-plus weeks ago. Maybe it’ll remind you of how vulnerable seemingly robust systems can actually be. At minimum, I hope it’ll reassure you that after almost 12 years as an EDN editor, I’ve still ‘got game’ in the debug arena.
On November 12, Mozilla released v3.0.4 of its flagship browser, Firefox. The next morning, after waiting what I felt was an appropriate amount of time to monitor for early-adopter negative feedback (and hearing none), I tackled the upgrade myself, beginning with my primary system, the Apple MacBook Air. The upgrade downloaded and seemed to install fine, although Firefox did crash during shutdown, prior to the auto-update step.
I relaunched Firefox, the update successfully finished, and for a few minutes all seemed to be well. Then I noticed that although I could navigate just fine within sites that I already had open within Firefox, attempts to access new sites (by, for example, clicking on URLs embedded within emails) failed with ‘destination unreachable’ errors. That’s strange…
Then those sites that had previously still been working started failing. I thought at this point that I had a munged Firefox install by virtue of the earlier crash. To confirm my hunch I launched Apple’s Safari browser…which also failed to pull up any sites I tried accessing. Was my system’s OS X TCP/IP stack messed up?
Here’s the odd thing, though…my VMware Fusion-virtualized Windows XP build running on the same machine was still able to download email. Since the virtual machine leverages the foundation operating system’s networking hardware and software, I was pretty sure my networking stack was intact. Pretty sure…until the virtual machine failed a few minutes later, too.
My next debug step was to launch both Firefox (which I hadn’t yet updated from v3.0.3 to v3.0.4) and Internet Explorer on the Dell laptop sitting on the shelf behind me. Neither browser on that machine could pull up websites, either, so I was pretty sure I had a LAN-wide problem. It was at that point I remembered that since my new router (unlike its predecessor) allowed for manual entry of DNS servers versus relying solely on those provided by AT&T DSL’s PPPoE setup, I’d configured the settings to point at OpenDNS.
I punched into both systems’ browsers the IP address of a friend’s WAN-accessible router whose static IP assignment I remembered, and the router’s configuration screens pulled up fine on both systems. That confirmed that name-to-IP-address translation (i.e., DNS) was the root cause of the problems I was having. After deleting the OpenDNS server override I’d earlier configured, I rebooted my router, refreshed the Dell laptop’s DHCP assignment, and this particular system was subsequently working fine again…
…but post-DHCP refresh the MacBook Air was still hosed when I fed it URLs. This one baffled me for a few minutes, I admit, until I remembered that since this was my travel system, I’d also manually entered OpenDNS’s servers in its TCP/IP settings so that I’d be sure I was using them no matter what Internet connection I was employing at the time. I deleted the OpenDNS settings here, too, and all was well.
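The decisive test in that sequence (a site reachable by raw IP address but not by name) is easy to script. Here is a minimal Python sketch of the idea, not a polished tool: the function name `diagnose`, its return strings, and the port and timeout defaults are all my own illustrative choices, and the `resolve` and `connect` parameters exist only so the logic can be exercised without a live network.

```python
import socket


def diagnose(hostname, known_ip, port=80, timeout=3.0,
             resolve=socket.gethostbyname,
             connect=socket.create_connection):
    """Classify a connectivity problem as 'ok', 'dns' or 'network'.

    hostname: a name that should resolve (hypothetical example input).
    known_ip: an address you remember that should accept TCP
              connections on `port`, such as a router's admin page.
    """
    try:
        # Note: the OS may answer from its own cache, so a success
        # here does not prove the DNS server is healthy right now.
        resolve(hostname)
        return "ok"
    except OSError:
        pass  # name lookup failed; find out whether the link is up at all

    try:
        conn = connect((known_ip, port), timeout=timeout)
        conn.close()
        return "dns"      # raw IP works, names do not: suspect DNS
    except OSError:
        return "network"  # even a raw IP fails: suspect the connection
```

A result of `"dns"` corresponds to what I saw on both laptops: the link itself was fine, but name lookups were not.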
So what did I learn?
- Operating systems cache already-requested DNS results for a short period of time
- Don’t assume that a DNS server is operating properly just because it promptly responds to ‘ping’ requests (yes, this was another step in my debug process)
- Don’t assume that a piece of software you’ve just installed or upgraded is the root cause of a problem you’re having; coincidences do sometimes occur, and
- Disregard my earlier recommendations; for now, at least, don’t rely on OpenDNS. I’m not necessarily bugged that the service’s servers hiccupped…I’d never encountered a problem like this before through over a year’s worth of use, and after all, ‘six sigma’ uptime still isn’t 100%. What bugs me is that OpenDNS’s status page (which, for perhaps obvious reasons, relies on a hard-coded IP address instead of a URL) never reported the problem.
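On the ‘ping’ lesson in particular: a more telling check is to send the server an actual DNS question and see whether a usable answer comes back. The sketch below hand-builds a minimal A-record query in the standard DNS wire format and sends it over UDP port 53. The function names, the fixed query ID, and the three-second timeout are illustrative assumptions on my part, and the response handling is deliberately crude: it inspects only the 12-byte header.

```python
import socket
import struct

QUERY_ID = 0x1234  # arbitrary fixed ID, chosen for illustration


def build_query(hostname, query_id=QUERY_ID):
    """Build a minimal DNS query for the A record of `hostname`."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    # QTYPE 1 (A record), QCLASS 1 (Internet).
    return header + qname + struct.pack("!HH", 1, 1)


def parse_header(response):
    """Return (query_id, rcode, answer_count) from a DNS response."""
    query_id, flags, _qd, ancount, _ns, _ar = struct.unpack(
        "!HHHHHH", response[:12])
    return query_id, flags & 0x000F, ancount


def server_answers(server_ip, hostname, timeout=3.0):
    """True only if the server returns an error-free reply with answers."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(hostname), (server_ip, 53))
            response, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts
            return False
    if len(response) < 12:
        return False
    query_id, rcode, ancount = parse_header(response)
    return query_id == QUERY_ID and rcode == 0 and ancount > 0
```

Calling something like `server_answers("208.67.222.222", "example.com")` (one of OpenDNS’s published resolver addresses, with a placeholder hostname) asks the server to do its real job. A server can answer pings all day long and still time out or return an error code here.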
I’ll forward this writeup to my OpenDNS contacts and, if they have anything notable to share in response, I’ll pass it along here at Brian’s Brain. Happy weekend, all.