Imagine this. You wake up in the morning, go about your daily chores, and finally sit down to surf the web, read some news, check your mail, etc. At some point, you decide to log in to your bank to check your accounts. You get there, log in, and you’re greeted with a page explaining that the site is down for maintenance. Oh well, you’ll come back later. In the meantime, someone drains your account using the username and password that you just graciously handed them; you never realized that the site you went to was not where you intended to go.
Sound familiar? Yeah, I guess it sounds a bit like a phishing attack, though a tad more sophisticated. I mean, you did type in the address for the bank yourself, didn’t you? It’s not like you clicked on a link in an email or something. But in the end, you arrived at the wrong site, cleverly designed to look like the real one, and gave them your information.
So how the hell did this happen? How could you end up at the wrong site when you typed in the address yourself and your computer has all the latest virus scanning, firewalls, and so on? You spelled it right, too! It’s almost as if someone took over the bank’s computer!
Well, they did. Sort of. But they did it without touching the bank’s computers at all. They used the DNS system to inject a false address for the bank website, effectively re-directing you to their site. How is this possible? Well, it’s a flaw in the DNS protocol itself that allows this. The Matasano Security blog posted about this on Monday, though the post was quickly removed. You may still be able to see the copy that Google has cached.
Let me start from the beginning. On July 8th, Dan Kaminsky announced that he had discovered a flaw in the DNS protocol and had been working, in secret, with vendors to release patches to fix it. This was a huge coordinated effort, one of the first of its kind and scale. In the end, patches were released for BIND, Microsoft DNS, and others.
The flaw itself is interesting, to say the least. When a user requests an address for a domain, the request usually goes to a local DNS cache for resolution. If the cache doesn’t know the answer, it follows a set of rules that eventually allow it to ask a server that is authoritative for that domain. When the cache asks the authoritative server, the packet contains a Query ID (QID). Since caches usually have multiple requests pending at any given time, the QID helps distinguish which response matches which request. Years ago, there was a way to spoof DNS by guessing the QID. This was pretty simple to do because the QID was sequential. So, the attacker could guess the QID and, if they could get their forged response to the cache before the real authoritative server’s response arrived, they would effectively hijack the domain.
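If you’ve never looked at a DNS packet, here’s a rough sketch in Python of where that QID lives (the field layout follows the standard DNS header; the values and names are made up for illustration):

```python
import struct

def build_dns_header(qid, flags=0x0100, qdcount=1, ancount=0, nscount=0, arcount=0):
    # DNS header: six 16-bit fields, the first of which is the Query ID.
    return struct.pack("!6H", qid, flags, qdcount, ancount, nscount, arcount)

def reply_matches(request_qid, reply_packet):
    # A cache pairs a reply with its pending request by comparing QIDs
    # (along with the question name and the address/port the reply came from).
    (reply_qid,) = struct.unpack("!H", reply_packet[:2])
    return reply_qid == request_qid

# With sequential QIDs, an attacker who sees or guesses one QID knows the next,
# so a forged reply that arrives before the real one gets accepted.
header = build_dns_header(qid=0x1A2B)
print(reply_matches(0x1A2B, header))  # True
```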
So, vendors patched this flaw by randomizing the QID. Of course, if you have enough computing power, it’s still possible to guess the QID by cracking the random number generator. Difficult, but possible. However, the computing power to do this in a timely manner wasn’t readily available back then. So, 16-bit random QIDs were considered secure enough.
Fast forward to 2008. We have the power, and almost everyone with a computer has it. It is now possible to crack something like this in just a few seconds. So, this little flaw rears its ugly head once again. But there’s a saving grace here. When you request resolution for a domain name, you also receive additional data such as a TTL. The TTL, or Time To Live, defines how long an answer should be kept in the cache before asking for resolution again. This mechanism greatly reduces the amount of DNS traffic on the network because domain names tend to keep the same IP address for weeks, months, or even years. So, if the attacker is unsuccessful in his initial attack, he has to wait for the TTL to expire before he can try again.
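To see why the TTL is such a saving grace, here’s a toy cache sketch (entirely hypothetical, not any real server’s code). The upstream query, which is the only thing an attacker can race, happens only after an entry expires:

```python
import time

class ToyDnsCache:
    """Toy cache: answers stay pinned until their TTL expires."""

    def __init__(self):
        self._entries = {}  # name -> (address, expires_at)

    def lookup(self, name, resolve_upstream):
        entry = self._entries.get(name)
        if entry and entry[1] > time.time():
            return entry[0]  # still fresh: no upstream query, nothing to race
        # Only here does the cache send a query an attacker could try to beat.
        address, ttl = resolve_upstream(name)
        self._entries[name] = (address, time.time() + ttl)
        return address

# Example: a one-day TTL means a failed spoofing attempt leaves the attacker
# waiting up to a day for the next window.
cache = ToyDnsCache()
addr = cache.lookup("www.bank.example", lambda name: ("192.0.2.1", 86400))
```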
There was another attack, back in the day, that allowed an attacker to overwrite entries in the cache, regardless of the TTL. As I mentioned before, when a DNS server responds, the response can contain additional information. Some of this information comes in the form of “glue” records. These are additional answers, included in the original response, that help out the requester.
Let’s say, for instance, that you’re looking for the address of google.com. You ask your local cache, which doesn’t currently know the answer. Acting on your behalf (a process known as recursion), it works its way down from the root servers to the servers responsible for the .com zone. The .com server responds with a referral to the nameserver responsible for google.com, such as ns1.google.com. The cache now needs to contact ns1.google.com, but it doesn’t know the address of that server, so it would have to make additional requests to figure that out. However, the referral already includes a glue record that gives the cache this information, without the cache asking for it. In a perfect world, this is wonderful because it makes the resolution process faster and reduces the amount of DNS traffic required. Unfortunately, this isn’t a perfect world. Attackers could exploit this by including glue records for domains they were not authoritative for, effectively injecting records into the cache.
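Here’s a rough sketch of what that naive, pre-bailiwick behavior looks like (the record structures and addresses are made up for illustration):

```python
# A simplified referral as a cache might see it (structures are illustrative only;
# the addresses come from the documentation ranges, not real servers).
referral = {
    "question": "google.com",
    "authority": [("google.com", "NS", "ns1.google.com")],
    "additional": [
        ("ns1.google.com", "A", "192.0.2.10"),   # legitimate glue
        ("www.yahoo.com", "A", "198.51.100.66"),  # attacker-supplied "glue"
    ],
}

cache = {}

def cache_response_naively(response):
    # Pre-bailiwick behavior: trust every additional record, no questions asked.
    for name, rtype, value in response["additional"]:
        cache[(name, rtype)] = value

cache_response_naively(referral)
print(cache[("www.yahoo.com", "A")])  # poisoned entry the cache never asked for
```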
Again, vendors to the rescue! The concept of a bailiwick was introduced. In short, if a cache was looking for the address of google.com and the response included an address for yahoo.com, it would ignore the yahoo.com information, because yahoo.com falls outside the answering server’s bailiwick. This was known as a bailiwick check.
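At its core, a bailiwick check boils down to a suffix test; here’s a minimal sketch of the idea (my own simplification, not any vendor’s actual code):

```python
def in_bailiwick(record_name, zone):
    # A record is in-bailiwick if it is the zone itself or a name underneath it.
    record_name = record_name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return record_name == zone or record_name.endswith("." + zone)

def cache_response_with_check(response, zone, cache):
    for name, rtype, value in response["additional"]:
        if in_bailiwick(name, zone):
            cache[(name, rtype)] = value  # glue for the zone we actually asked about
        # anything else (e.g. www.yahoo.com in a google.com referral) is dropped

print(in_bailiwick("ns1.google.com", "google.com"))  # True
print(in_bailiwick("www.yahoo.com", "google.com"))   # False
```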
Ok, we’re safe now, right? Yeah, no. If we were safe, there wouldn’t be much for me to write about. No, times have changed… We now have the power to predict 16-bit random numbers, overcoming the QID problem. But TTLs save us, right? Well, yes, sort of. But what happens if we combine these two attacks? Well, interesting things happen, actually.
What happens if we look up a nonexistent domain? Well, you get a response of NXDOMAIN, of course. But what happens in the background? The cache goes through the exact same procedure it would go through for a valid domain. Remember, the cache has no idea that the domain doesn’t exist until it asks. Once it receives that NXDOMAIN, it will cache that negative response for a period of time, usually defined by the owner of the zone itself (via the SOA record). However, since the cache still goes through the same resolution process, there exists an attack vector that can be exploited.
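This is the piece that defeats the TTL: every made-up name forces a brand-new query. A tiny sketch of the idea (the domain name is hypothetical):

```python
import random
import string

def random_subdomain(domain):
    # A nonexistent name like 'k3j9x2q7w1.bank.example' has never been seen before,
    # so the cache must ask the authoritative servers about it right now.
    label = "".join(random.choice(string.ascii_lowercase + string.digits) for _ in range(10))
    return f"{label}.{domain}"

# The attacker can trigger as many fresh lookups as they like, one race each:
for _ in range(3):
    print(random_subdomain("bank.example"))
```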
So let’s combine the attacks. We know that we can guess the QID given enough attempts. And we know that we can inject glue records, provided they fall within the same domain the response is for. So, if we trigger lookups for nonexistent subdomains of the target (sidestepping the TTL entirely, since each one forces a fresh query), guess the QID, and include in-bailiwick glue records for the real hostname in our forged responses, we can poison the cache and hijack the domain.
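Conceptually, the forged reply answers the throwaway name but smuggles in a record for the real hostname. Here’s a sketch of what such a response might carry (names, addresses, and structures are all made up, and nothing here builds or sends actual packets):

```python
def forged_reply(guessed_qid, throwaway_name, target_name, attacker_ip):
    # The reply is for the nonexistent name the cache just asked about, but the
    # authority/additional records point the *real* hostname at the attacker.
    # Because target_name sits inside bank.example, it passes the bailiwick check.
    return {
        "qid": guessed_qid,               # must match the cache's QID to be accepted
        "question": throwaway_name,       # e.g. 'k3j9x2q7w1.bank.example'
        "answer": [],
        "authority": [("bank.example", "NS", target_name)],
        "additional": [(target_name, "A", attacker_ip)],
    }

reply = forged_reply(0x4F21, "k3j9x2q7w1.bank.example",
                     "www.bank.example", "198.51.100.66")
```

If a guess misses, the attacker simply moves on to the next random name; there is no TTL to wait out.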
So now what? We already patched these two problems! Well, the short-term answer is another patch. The new patch adds additional randomness to the equation in the form of the source port. So, when a DNS server makes a request, it randomizes the QID and the source port. Now the attacker needs to guess both in order to be successful. This basically makes it a 32-bit number that needs to be guessed, rather than a 16-bit number. So, it takes a lot more effort on the part of the attacker. This helps, but, and this is important, it is still possible to perform this attack given enough time. This is NOT a permanent fix.
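Some back-of-the-envelope math shows why the extra bits help but don’t end the story (the attacker’s send rate below is an assumption, purely for illustration):

```python
# On average, a hit arrives after guessing about half the search space.
qid_only = 2**16                # ~65,536 combinations
qid_plus_port = 2**16 * 2**16   # ~4.3 billion combinations

packets_per_second = 50_000     # assumed attacker send rate, for illustration only

print(qid_only / 2 / packets_per_second, "seconds on average (QID only)")
print(qid_plus_port / 2 / packets_per_second / 3600, "hours on average (QID + source port)")
```

A fraction of a second versus a matter of hours is a big improvement, but it’s a speed bump, not a wall.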
That’s the new attack in a nutshell. There may be additional details I’m not aware of, and Dan will be presenting them at the Blackhat conference in August. In the meantime, the message is to patch your server! Not every server is vulnerable to this; some, such as DJBDNS, have been randomizing source ports for a long time, but many others are. If in doubt, check with your vendor.
This is pretty big news, and it’s pretty important. Seriously, this is not a joke. Check your servers and patch. Proof of concept code is in the wild already.