Posts Tagged ‘dns’

NANOG 46 – Final Thoughts

Wednesday, June 17th, 2009

NANOG 46 is wrapping up today and it has been an incredible experience. This particular NANOG seemed to have an underlying IPv6 current to it, and if you believe the reports, IPv6 is going to have to become the standard in the next couple of years. We’ll be running dual-stack configurations for some time to come, but an IPv6 rollout is necessary.

To date, I haven’t had a lot to do with IPv6. A few years ago I set up one of the many IPv6 shims, just to check out connectivity, but never really went anywhere with it. It was nothing more than a tech demo at the time, with no real content out there to bother with. Content exists today, however, and will continue to grow as time moves on.

IPv6 connectivity is still spotty and problematic for some, though, and there doesn’t seem to be a definitive, workable solution. For instance, if your IPv6 connectivity is not properly configured, you may lose access to some sites as you receive DNS responses pointing you at IPv6 content, but that you cannot reach. This results in either a major delay in falling back to IPv4 connectivity, or complete breakage. So one of the primary problems right now is whether or not to send AAAA record responses to DNS requests when the IPv6 connectivity status of the receiver is unknown. Google, from what I understand, is using a whitelist system. When a provider has sufficient IPv6 connectivity, Google adds them to their whitelist and the provider is then able to receive AAAA records.
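The whitelist idea, as I understand it, can be sketched roughly like this (hypothetical names and addresses; the real system is surely more involved):

```python
# Hypothetical sketch of a resolver whitelist for AAAA responses, modeled
# on the approach described above. All names and addresses are made up.
AAAA_WHITELIST = {"192.0.2.53", "198.51.100.53"}  # resolvers with known-good IPv6

def records_for(resolver_ip, a_records, aaaa_records):
    """Hand AAAA records only to whitelisted resolvers; everyone else
    gets A records alone, avoiding broken v6 fallback delays."""
    if resolver_ip in AAAA_WHITELIST:
        return a_records + aaaa_records
    return a_records

print(records_for("192.0.2.53", ["204.10.167.1"], ["2001:db8::1"]))
# ['204.10.167.1', '2001:db8::1']
```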

Those problems aside, I think rolling out IPv6 will be pretty straightforward. My general take is to run dual-stack to start, and probably for the foreseeable future, and get the network handing out IPv6 addresses. Once that’s in place, we can start offering AAAA records for services. I’m still unsure at this point how to handle DNS responses to users with possibly poor v6 connectivity.

Another area of great interest this time around is DNSSEC. I’m still quite skeptical about DNSSEC as a technology, partly due to ignorance, partly due to seeing problems with what I do understand. Rest assured, once I have a better handle on this, I’ll finish up my How DNS Works series.

I’m all for securing the DNS infrastructure and doing something to ensure that DNS cannot be poisoned the same way it can today. DNSSEC aims to add security to DNS such that you can trust the responses you receive. However, I have major concerns with what I’ve seen of DNSSEC so far. One of the bigger problems I see is that each and every domain (zone) needs to be signed. Sure, this makes sense, but my concern is the cost involved to do so. SSL Certificates are not cheap and are a recurring cost. Smaller providers may run into major issues with funding such security. As a result, they will be unable to sign their domains and participate in the secure infrastructure.

Another issue I find extremely problematic is the fallback to TCP. Cryptographic signatures are large, and they grow with the size of the key used. As a result, DNS responses are exceeding the maximum size of a UDP packet and falling back to TCP. One reason DNS works so well today is that the DNS server doesn’t have to worry about retransmissions, connection state, etc. There is no handshake required, and the UDP packets just fly; it’s up to the client to retransmit if necessary. When you move to TCP, the nature of the protocol means that both the client and server need to keep state information and perform any necessary retransmissions. This takes up socket space on the server, takes time, and burns many more CPU cycles. Based on a lightning talk during today’s session, when the .ORG domain was signed, they saw a 100-fold increase in TCP connections, moving from less than 1 query per second to almost 100. This concerns me greatly, as the majority of the Internet has not enabled DNSSEC at this point. I can see this climbing even more, eventually overwhelming the system and bringing DNS to its knees.
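The size cutoff driving that fallback can be sketched in a few lines (a simplified model, ignoring EDNS0, which raises the UDP limit):

```python
# Sketch of the classic DNS transport decision (ignoring EDNS0): a
# response over 512 bytes is truncated, the TC bit is set, and the
# client retries the same query over TCP.
UDP_LIMIT = 512  # bytes, per RFC 1035

def pick_transport(response_size):
    """Return the transport the client ends up using for a response."""
    return "udp" if response_size <= UDP_LIMIT else "tcp"

print(pick_transport(120))   # udp: a typical unsigned answer fits easily
print(pick_transport(1800))  # tcp: large DNSSEC signatures can overflow UDP
```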

I also believe that moving in this direction will allow the “bad guys” to DoS attack servers in much easier ways as they can easily trigger TCP transactions, perform various TCP-based attacks, and generally muck up the system further.

So what’s the alternative? Well, there is DNSCurve, though I know even less about that as it’s very much a fringe technology at this point. In fact, the first workable patch against djbdns was only released in the past few weeks. It’s going to take some time to absorb what’s out there, but based on the current move to DNSSEC, my general feeling is that no matter how much better DNSCurve may or may not be, it doesn’t have much of a chance. Even so, there’s a lot more to learn in this arena.

I also participated in a Security BOF. BOFs are, essentially, less structured talks on a given subject. There is a bit more audience participation and the audience tends to be a bit smaller. The Security BOF was excellent as there were conversations about abuse, spam, and methods of dealing with each. The spam problem is, of course, widespread and it’s comforting to know that you’re not the only one without a definitive answer. Of course, the flip side of that is that it’s somewhat discouraging to know that even the big guys such as Google are still facing major problems with spam. The conversation as a whole, though, was quite enlightening and I learned a lot.

One of the more exciting parts of NANOG for me, though, was meeting some of the Internet greats. I’ve talked to some of these folks via email and on various mailing lists, but to meet them in person is a rare honor. I was able to meet and speak with both Randy Bush and Paul Vixie, both giants in their fields. I was able to rub elbows with folks from Google, Yahoo, and more. I’ve exchanged PGP keys with several people throughout the conference, the geek’s version of an autograph. I have met some incredible people and I look forward to talking with them in the future.

If you’re a network operator, or your interests lie in that direction, I strongly encourage you to make a trip to at least one NANOG in your lifetime. I’m hooked at this point and I’m looking forward to being able to attend more meetings in the future.

 

Hacking the Infrastructure – How DNS works – Part 2

Monday, November 10th, 2008

Welcome back. In part 1, I discussed the technical details of how DNS works. In this part, I’ll introduce you to some of the more common DNS server packages. In a future post I will cover some of the common problems with DNS as well as proposed solutions. So let’s dive right in.

The most popular DNS server, by far, is BIND, the Berkeley Internet Name Domain. BIND has a long and storied past. On the one hand, it’s one of the oldest packages for serving DNS, dating back to the early 1980s; on the other, it has a reputation for being one of the most insecure. BIND started out as a graduate student project at the University of California at Berkeley, and was maintained by the Computer Systems Research Group. In the late 1980s, Digital Equipment Corporation helped with development. Shortly after that, Paul Vixie became the primary developer and eventually formed the Internet Systems Consortium, which maintains BIND to this day.

Being the most popular DNS software out there, BIND suffers from the same malady that affects Microsoft Windows. It’s the most popular, most widely installed, and, as a result, hackers can gain the most by breaking it. In short, it’s the most targeted of DNS server software. Unlike Windows, however, BIND is open source and should benefit from the extra scrutiny that usually entails, but, alas, it appears that BIND is pretty tightly controlled by the ISC. On the ISC site, I do not see any publicly accessible software repository, no open discussion of code changes, and nothing else that really marks a truly open-source project. The only open-source bits I see are a users mailing list and source code downloads. Beyond that, it appears that you either need to be a member of the “BIND Forum,” or wait for new releases with little or no input.

Not being an active user of BIND, I cannot comment too much on the current state of BIND other than what I can find publicly available. I do know that BIND supports just about every DNS convention there is out there. That includes standard DNS, DNSSEC, TSIG, and IPv6. The latter three of these are relatively new. In fact, the current major version of BIND, version 9, was written from the ground up specifically for DNSSEC support.

In late 1999, Daniel J. Bernstein, a professor at the University of Illinois at Chicago, wrote a suite of DNS tools known as djbdns. Bernstein is a mathematician, cryptographer, and security expert. He used all of these skills to produce a complete DNS server that he claimed had no security holes in it. He went as far as offering a security guarantee, promising to pay $1000 to the first person to identify a verifiable security hole in djbdns. To date, no one has been able to claim that money. As recently as 2004, djbdns was the second most popular DNS server software.

The primary reason for the existence of djbdns is Bernstein’s dissatisfaction with BIND and the numerous security problems therein. Having both security and simplicity in mind, Bernstein was able to make djbdns extremely stable and secure. In fact, djbdns was unaffected by the recent Kaminsky vulnerability, which affected both BIND and Microsoft DNS. Additionally, configuration and maintenance are both simple, straightforward processes.

On the other hand, the simplicity of djbdns may become its eventual downfall. Bernstein is critical of both DNSSEC and IPv6 and has offered no support for either. While some semblance of IPv6 support was added via a patch provided by a third party, I am unaware of any third-party DNSSEC support. Let me be clear, however: while the IPv6 patch adds support for IPv6 transport, djbdns itself can already serve the AAAA records required for IPv6. The difference is that stock djbdns only talks over IPv4 transport.

Currently, it is unclear as to whether Bernstein will ever release a new version of djbdns with support for any type of “secure” DNS.

The Microsoft DNS server has existed since Windows NT 3.51 was shipped back in 1995. It was included as part of the Microsoft BackOffice, a collection of software intended for use by small businesses. As of 2004, it was the third most popular DNS server software. According to Wikipedia, Microsoft DNS is based on BIND 4.3 with, of course, lots of Microsoft extensions. Microsoft DNS has become more and more important with new releases of Windows Server. Microsoft’s Active Directory relies heavily on Microsoft DNS and the dynamic DNS capabilities included. Active Directory uses a number of special DNS entries to identify services and allow machines to locate them. It’s an acceptable use of DNS, to be sure, but really makes things quite messy and somewhat difficult to understand.

I used Microsoft DNS for a period of time after Windows 2000 was released. At the time, I was managing a small dial-up network and we used Active Directory and Steel-Belted RADIUS for authentication. Active Directory integration allowed us to easily synchronize data between the two sites we had, or so I thought. Because we were using Active Directory, the easiest thing to do was to use Microsoft DNS for our domain data and as a cache for customers. As we found out, however, Microsoft DNS suffered from some sort of cache problem that caused it to stop answering DNS queries after a while. We suffered with that problem for a short period of time and eventually switched over to djbdns.

There are a number of other DNS servers out there, both good and bad. I have no experience with any of them other than knowing some by reputation. Depending on what happens in the future with the security of DNS, however, I predict that a lot of the smaller DNS packages will fall by the wayside. And while I have no practical experience with BIND beyond using it as a simple caching nameserver, I can only wonder why a package that claims to be open source, yet is so guarded, maintains its dominance. Perhaps I’m mistaken, but thus far I have found nothing that contradicts my current beliefs.

Next time we’ll discuss some of the more prevalent problems with DNS and DNS security. This will lead into a discussion of DNSSEC and how it works (or, perhaps, doesn’t work) and possible alternatives to DNSSEC. If you have questions and/or comments, please feel free to leave them in the comment section.

Hacking the Infrastructure – How DNS works – Part 1

Wednesday, October 29th, 2008

Education time… I want to learn a bit more about DNS and DNSSEC in particular, so I’m going to write a series of articles about DNS and how it all works. So, let’s start at the beginning. What is DNS, and why do we need it?

DNS, the Domain Name System, is a hierarchical naming system used primarily on the Internet. In simple terms, DNS is a mechanism by which the numeric addresses assigned to the various computers, routers, etc. are mapped to alphanumeric names, known as domain names. As it turns out, humans tend to be able to remember words a bit easier than numbers. So, for instance, it is easier to remember blog.godshell.com as opposed to 204.10.167.1.

But, I think I’m getting a bit ahead of myself. Let’s start back closer to the beginning. Back when ARPANet was first developed, the developers decided that it would be easier to name the various computers connected to ARPANet, rather than identifying them by number. So, they created a very simplistic mapping system that consisted of name and address pairs written to a text file. Each line of the text file identified a different system. This file became known as the hosts file.

Initially, each system on the network was responsible for its own hosts file, which naturally resulted in a lot of systems either unaware of others, or unable to contact them easily. To remedy this, it was decided to make an “official” version of the hosts file and store it in a central location. Each node on ARPANet then downloaded the hosts file at a fairly regular interval, keeping the entire network mostly in sync with new additions. As ARPANet began to grow and expand, the hosts file grew larger. Eventually, the rapid growth of ARPANet made updating and distributing the hosts file a difficult endeavor. A new system was needed.
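For the curious, the hosts-file format is simple enough that a parser fits in a few lines; this sketch handles the basic address/name mappings described above:

```python
# Minimal parser for the classic hosts-file format: one
# "address name [aliases...]" mapping per line; '#' begins a comment.
def parse_hosts(text):
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        addr, *names = line.split()
        for name in names:
            table[name] = addr
    return table

sample = """
# sample entries
204.10.167.1  blog.godshell.com blog
127.0.0.1     localhost
"""
print(parse_hosts(sample)["blog.godshell.com"])  # 204.10.167.1
```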

In 1983, Paul Mockapetris, one of the early ARPANet pioneers, worked to develop the first implementation of DNS, called Jeeves. Paul wrote RFC 882 and RFC 883, the original RFCs describing DNS and how it should work. RFC 882 describes DNS itself and what it aims to achieve. It describes the hierarchical structure of DNS as well as the various identifiers used. RFC 883 describes the initial implementation details of DNS. These details include items such as message formats, field formats, and timeout values. Jeeves was based on these two initial RFCs.

So now that we know what DNS is and why it was developed, let’s learn a bit about how it works.

DNS is a hierarchical system. This means that the names are assigned in an ordered, logical manner. As you are likely aware, domain names are generally strings of words, known as labels, separated by periods, such as blog.godshell.com. The rightmost label is known as the top-level domain. Each label to the left is a sub-domain of the label to the right. For the domain name blog.godshell.com, com is the top-level domain, godshell is a sub-domain of com, and blog is a sub-domain of godshell.com. Information about domain names is stored in the name server in a structure called a resource record.
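The label hierarchy can be illustrated with a short snippet that walks a domain name from the top-level domain down (an illustrative sketch, not part of any DNS software):

```python
# Walking the hierarchy described above: the rightmost label is the
# top-level domain, and each label is a sub-domain of what follows it.
def zones(domain):
    """Return each enclosing zone of a domain, most specific last."""
    labels = domain.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

print(zones("blog.godshell.com"))
# ['com', 'godshell.com', 'blog.godshell.com']
```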

Each domain, be it a top level domain, or a sub-domain, is controlled by a name server. Some name servers control a series of domains, while others control a single domain. These various areas of control are called zones. A name server that is ultimately responsible for a given zone is known as an authoritative name server. Note, multiple zones can be handled by a single name server, and multiple name servers can be authoritative for the same zone, though they should be in primary and backup roles.

Using our blog.godshell.com example, the com top-level domain is in one zone, while godshell.com and blog.godshell.com are in another. There is another zone as well, though you likely don’t see it. That zone is the root-zone, usually represented by a single period after the full domain name, though almost all modern internet programs automatically append the period at the end, making it unnecessary to specify it explicitly. The root-zone is pretty important, too, as it essentially ties together all of the various domains. You’ll see what I mean in a moment.

Ok, so we have domains and zones. We know that zones are handled individually by different name servers, so we can infer that the name servers talk to each other somehow. If we infer further, we can guess that a single name resolution probably involves more than two name servers. So how exactly does all of this work? Well, that process depends on the type of query being used to perform the name resolution.

There are two types of queries, recursive and non-recursive (also called iterative). The query type is requested by the resolver, the software responsible for performing the name resolution. The simpler of the two is the non-recursive query. Simply put, the resolver asks the name server for non-recursive resolution and gets an immediate answer back. That answer is generally the best answer the name server can give. If, for instance, the name server queried is a caching name server, it is possible that the domain you requested was resolved before. If so, the correct answer can be given. If not, you will get the best information the name server can provide, which is usually a pointer to a name server that knows more about that domain. I’ll cover caching more a little later.

Recursive queries are probably the most common type of query. A recursive query aims to completely resolve a given domain name. It does this by following a few simple steps. Resolution begins with the rightmost label and moves left.

  1. The resolver asks one of the root name servers (which handle the root-zone) to resolve the rightmost label. The root server responds with the address of a server that can provide more information about that label.
  2. The resolver queries that server about the next label to the left. Again, the server responds with the address of a server that knows more about that label or, possibly, an authoritative answer for the domain.
  3. Repeat step 2 until the final answer is given.
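The steps above can be sketched in a few lines of Python, using a toy, made-up delegation table in place of the real network:

```python
# The resolution loop above, against a toy delegation table instead of
# the real network. Each server either refers the resolver to a more
# specific server or returns the final answer. All data is made up.
DELEGATIONS = {
    ".":           {"com.": "gtld-server"},            # root refers to .com
    "gtld-server": {"godshell.com.": "ns1.godshell.com"},
}
ANSWERS = {"ns1.godshell.com": {"blog.godshell.com.": "204.10.167.1"}}

def resolve(name):
    server = "."                          # step 1: start at a root server
    while True:
        answer = ANSWERS.get(server, {}).get(name)
        if answer is not None:            # step 3: final answer reached
            return answer
        for suffix, next_server in DELEGATIONS[server].items():
            if name.endswith(suffix):     # step 2: follow the referral
                server = next_server
                break
        else:
            raise KeyError(name)          # no referral available

print(resolve("blog.godshell.com."))  # 204.10.167.1
```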

These steps are rather simplistic, but give a general idea of how DNS works. Let’s look at an example. For this example, I will be using the dig command, a standard Unix/Linux tool commonly used to debug DNS. To simplify things, I’m going to use the +trace option, which performs a complete recursive lookup, printing the responses along the way.

$ dig +trace blog.godshell.com

; <<>> DiG 9.4.2-P2 <<>> +trace blog.godshell.com
;; global options: printcmd
. 82502 IN NS i.root-servers.net.
. 82502 IN NS e.root-servers.net.
. 82502 IN NS h.root-servers.net.
. 82502 IN NS g.root-servers.net.
. 82502 IN NS m.root-servers.net.
. 82502 IN NS a.root-servers.net.
. 82502 IN NS k.root-servers.net.
. 82502 IN NS c.root-servers.net.
. 82502 IN NS j.root-servers.net.
. 82502 IN NS d.root-servers.net.
. 82502 IN NS f.root-servers.net.
. 82502 IN NS l.root-servers.net.
. 82502 IN NS b.root-servers.net.
;; Received 401 bytes from 192.168.1.1#53(192.168.1.1) in 5 ms

This first snippet shows the very first query sent to the local name server (192.168.1.1) which is defined on the system I’m querying from. This is often configured automatically via DHCP, or hand-entered when setting up the computer for the first time. This output has a number of fields, so let’s take a quick look at them. First, any line preceded by a semicolon is a comment. Comments generally contain useful information on what was queried, what options were used, and even what type of information is being returned.

The rest of the lines above are responses from the name server. As can be seen from the output, the name server responded with numerous results, 13 in all. Multiple results are common and mean the same information is duplicated on multiple servers, commonly for load balancing and redundancy. The fields, from left to right, are as follows: domain, TTL, class, record type, answer. The domain field is the current domain being looked up. In the example above, we’re starting at the far right of our domain with the root domain (defined by a single period).

TTL stands for Time To Live. This field defines the number of seconds this data is good for. This information is mostly intended for caching name servers. It lets the cache know how much time has to pass before the cache must look up the answer again. This greatly reduces DNS load on the Internet as a whole, as well as decreasing the time it takes to obtain name resolution.
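A caching resolver’s TTL handling boils down to something like this sketch (simplified; the clock is passed in explicitly to keep the example deterministic):

```python
# Sketch of TTL-driven caching as described above: an answer is served
# from cache until its TTL expires, then a fresh lookup is required.
class Cache:
    def __init__(self):
        self.entries = {}  # name -> (answer, expiry time)

    def put(self, name, answer, ttl, now):
        self.entries[name] = (answer, now + ttl)

    def get(self, name, now):
        entry = self.entries.get(name)
        if entry and now < entry[1]:
            return entry[0]
        return None  # missing or expired: must resolve again

cache = Cache()
cache.put("blog.godshell.com.", "204.10.167.1", ttl=3600, now=0)
print(cache.get("blog.godshell.com.", now=1800))  # 204.10.167.1 (fresh)
print(cache.get("blog.godshell.com.", now=4000))  # None (TTL expired)
```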

The class field defines the query class used. Query classes can be IN (Internet), CH (Chaos), HS (Hesiod), or a few others. Generally speaking, most queries are of the Internet class. Other classes are used for other purposes such as databases.

Record type defines the type of record you’re looking at. There are a number of these, the most common being A, PTR, CNAME, MX, and NS. An A record is ultimately what most name resolution is after. It defines a mapping from a domain name to an IP address. A PTR record is the opposite of an A record. It defines the mapping of an IP Address to a domain name. CNAME is a Canonical name record, essentially an alias for another record. MX is a mail exchanger record which defines the name of a server responsible for mail for the domain being queried. And finally, an NS record is a name server record. These records generally define the name server responsible for a given domain.

com. 172800 IN NS a.gtld-servers.net.
com. 172800 IN NS b.gtld-servers.net.
com. 172800 IN NS c.gtld-servers.net.
com. 172800 IN NS d.gtld-servers.net.
com. 172800 IN NS e.gtld-servers.net.
com. 172800 IN NS f.gtld-servers.net.
com. 172800 IN NS g.gtld-servers.net.
com. 172800 IN NS h.gtld-servers.net.
com. 172800 IN NS i.gtld-servers.net.
com. 172800 IN NS j.gtld-servers.net.
com. 172800 IN NS k.gtld-servers.net.
com. 172800 IN NS l.gtld-servers.net.
com. 172800 IN NS m.gtld-servers.net.
;; Received 495 bytes from 199.7.83.42#53(l.root-servers.net) in 45 ms

Our local resolver has randomly chosen an answer from the previous response and queried that name server (l.root-servers.net) for the com domain. Again, we received 13 responses. This time, we are pointed to the gtld servers, operated by VeriSign. The gtld servers are responsible for the .com and .net top-level domains, two of the most popular TLDs available.

godshell.com. 172800 IN NS ns1.emcyber.com.
godshell.com. 172800 IN NS ns2.incyberspace.com.
;; Received 124 bytes from 192.55.83.30#53(m.gtld-servers.net) in 149 ms

Again, our local resolver has chosen a random answer (m.gtld-servers.net) and queried for the next part of the domain, godshell.com. This time, we are told that there are only two servers responsible for that domain.

blog.godshell.com. 3600 IN A 204.10.167.1
godshell.com. 3600 IN NS ns1.godshell.com.
godshell.com. 3600 IN NS ns2.godshell.com.
;; Received 119 bytes from 204.10.167.61#53(ns2.incyberspace.com) in 23 ms

Finally, we randomly choose a response from before and query again. This time we receive three records in response, an A record and two NS records. The A record is the answer we were ultimately looking for. The two NS records are authority records, I believe. Authority records define which name servers are authoritative for a given domain. They are ultimately responsible for giving the “right” answer.

That’s really DNS in a nutshell. There’s a lot more, of course, and we’ll cover more in the future. Next time, I’ll cover the major flavors of name server software and delve into some of the problems with DNS today. So, thanks for stickin’ around! Hopefully you found this informative and useful. If you have questions and/or comments, please feel free to leave them in the comment section.

Detecting DNS cache poisoning

Sunday, October 19th, 2008

I spoke with a good friend of mine last week about his recent trip to NANOG. While he was there, he listened to a talk about detecting DNS cache poisoning. However, this was detection at the authoritative server, not at the cache itself. This is a bit different from detection at a cache because most cache poisoning happens outside of your domain.

I initially wrote about the Kaminsky DNS bug a while back, and this builds somewhat on that discussion. When a cache poisoning attack is underway, the attacker must spoof the source IP of the DNS response. From what I can tell, this is because the resolver is told by the root servers who the authoritative server is for the domain. Thus, if a response comes back from a non-authoritative IP, it won’t be accepted.

So let’s look at the attack briefly. The attacker starts requesting a large number of addresses, something to the tune of a.example.com, b.example.com, etc. While those packets are being sent, the attacker sends out forged responses with spoofed headers. Since the attacker now has to guess both the QID *and* the source port, most of the forged responses miss because the port is incorrect.

When the server receives a packet on a port that is not expecting data, it responds with an ICMP message, “Destination Port Unreachable.” That ICMP message is sent to the source IP of the packet, which is the spoofed authoritative IP. This is known as ICMP backscatter.

Administrators of authoritative name servers can monitor for ICMP backscatter and identify possible cache poisoning attacks. In most cases, there is nothing that can be done directly to mitigate these attacks, but it is possible to identify the cache being attacked and notify the admin. Cooperation between administrators can lead to a complete mitigation of the attack and protection of clients who may be harmed.
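The detection idea reduces to counting backscatter per source; here is a rough sketch (the threshold and addresses are made up):

```python
# Sketch of the monitoring idea above: count ICMP "port unreachable"
# messages arriving at the authoritative server, grouped by the resolver
# that sent them. A sudden burst from one resolver suggests an attacker
# is spoofing our address at that cache. The threshold is arbitrary.
from collections import Counter

def flag_backscatter(icmp_sources, threshold=100):
    """icmp_sources: resolver IPs that sent us port-unreachable ICMP.
    Returns the set of resolvers at or above the threshold."""
    counts = Counter(icmp_sources)
    return {ip for ip, n in counts.items() if n >= threshold}

observed = ["198.51.100.7"] * 150 + ["203.0.113.2"] * 3
print(flag_backscatter(observed))  # {'198.51.100.7'}
```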

This is an excellent example of the type of data you can identify simply through passive monitoring of your local network.

Steal the Net’s Identity

Thursday, July 24th, 2008

Imagine this. You wake up in the morning, go about your daily chores, and finally sit down to surf the web, read some news, check your mail, etc. At some point, you decide to log in to your bank to check your accounts. You get there, log in, and you’re greeted with a page explaining that the site is down for maintenance. Oh well, you’ll come back later. In the meantime, someone drains your account using the username and password that you just graciously handed them, not realizing that the site you went to was not where you intended to go.

Sound familiar? Yeah, I guess it sounds a bit like a phishing attack, though a tad more sophisticated. I mean, you did type in the address for the bank yourself, didn’t you? It’s not like you clicked on a link in an email or something. But in the end, you arrived at the wrong site, cleverly designed, and gave them your information.

So how the hell did this happen? How could you end up at the wrong site when you personally typed in the address, your computer has all the latest virus scanning and firewalling, and you spelled it right, too? It’s almost as if someone took over the bank’s computer!

Well, they did. Sort of. But they did it without touching the bank’s computers at all. They used the DNS system to inject a false address for the bank website, effectively re-directing you to their site. How is this possible? Well, it’s a flaw in the DNS protocol itself that allows this. The Matasano Security blog posted about this on Monday, though the post was quickly removed. You may still be able to see the copy that Google has cached.

Let me start from the beginning. On July 8th, Dan Kaminsky announced that he had discovered a flaw in the DNS protocol and had been working, in secret, with vendors to release patches to fix the problem. This was a huge effort, one of the first coordinated multi-vendor security releases the world has ever seen. In the end, patches were released for BIND, Microsoft DNS, and others.

The flaw itself is interesting, to say the least. When a user requests an address for a domain, it usually goes to a local DNS cache for resolution. If the cache doesn’t know the answer, it follows a set of rules that eventually allow it to ask a server that is authoritative for that domain. When the cache asks the authoritative server, the packet contains a Query ID (QID). Since caches usually have multiple requests pending at any given time, the QID helps distinguish which response matches which request. Years ago, there was a way to spoof DNS by guessing the QID. This was pretty simple to do because the QID was sequential. So, the attacker could guess the QID and, if they could get their response back to the server faster than the authoritative server could, they would effectively hijack the domain.

So, vendors patched this flaw by randomizing the QID. Of course, if you have enough computing power, it’s still possible to guess the QID by cracking the random number generator. Difficult, but possible. However, the computing power to do this in a timely manner wasn’t readily available back then. So, 16-bit random QIDs were considered secure enough.

Fast forward to 2008. We have the power, and almost everyone with a computer has it. It is now possible to crack something like this in just a few seconds. So, this little flaw rears its ugly head once again. But there’s a saving grace here. When you request resolution for a domain name, you also receive additional data such as a TTL. The TTL, or Time To Live, defines how long an answer should be kept in the cache before asking for resolution again. This mechanism greatly reduces the amount of DNS traffic on the network because, in many cases, domain names tend to use the same IP address for weeks, months, and, in many cases, years. So, if the attacker is unsuccessful in his initial attack, he has to wait for the TTL to expire before he can try again.

There was another attack, back in the day, that allowed an attacker to overwrite entries in the cache, regardless of the TTL. As I mentioned before, when a DNS server responds, it can contain additional information. Some of this information is in the form of “glue” records. These are additional responses, included in the original response, that helps out the requester.

Let’s say, for instance, that you’re looking for the address for google.com. You ask your local cache, which doesn’t currently know the answer. Using a process known as recursion, it works its way from the root servers down to the servers responsible for the .com domain. When that server responds, the response will be the nameserver responsible for google.com, such as ns1.google.com. The cache now needs to contact ns1.google.com, but it does not know the address for that server, so it would have to make additional requests to determine it. However, the response already includes a glue record that gives the cache this information, without the cache asking for it. In a perfect world, this is wonderful because it makes the resolution process faster and reduces the amount of DNS traffic required. Unfortunately, this isn’t a perfect world. Attackers could exploit this by including glue records for domains they were not authoritative for, effectively injecting records into the cache.

Again, vendors to the rescue! The concept of a bailiwick was introduced. In short, if a cache was looking for the address of google.com, and the response included the address for yahoo.com, it would ignore the yahoo.com information. This was known as a bailiwick check.
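A bailiwick check boils down to a label-aligned suffix comparison; here is a rough sketch:

```python
# Sketch of the bailiwick check described above: glue is accepted only
# if it falls at or below the zone the queried server is responsible for.
def in_bailiwick(record_name, zone):
    """True if record_name is zone itself or a label-aligned sub-domain."""
    record = record_name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    return record == zone or record.endswith("." + zone)

print(in_bailiwick("ns1.google.com", "google.com"))  # True: accepted
print(in_bailiwick("www.yahoo.com", "google.com"))   # False: ignored
print(in_bailiwick("evilgoogle.com", "google.com"))  # False: label-aligned
```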

Ok, we’re safe now, right? Yeah, no. If we were safe, there wouldn’t be much for me to write about. No, times have changed… We now have the power to predict 16-bit random numbers, overcoming the QID problem. But TTLs save us, right? Well, yes, sort of. But what happens if we combine these two attacks? Well, interesting things happen, actually.

What happens if we look up a nonexistent domain? Well, you get a response of NXDOMAIN, of course. Well yeah, but what happens in the background? The cache goes through the exact same procedure it would for a valid domain. Remember, the cache has no idea that the domain doesn’t exist until it asks. Once it receives that NXDOMAIN, though, it will cache that response for a period of time, usually defined by the zone itself in its SOA record. However, since it does go through the same process of resolving, there exists an attack vector that can be exploited.

So let’s combine the attacks. We know that we can guess the QID given enough guessing. And, we know that we can inject glue records for domains, provided they are within the same domain the response is for. So, if we can guess the QID, respond to a non-existent domain, and include glue records for a real domain, we can poison the cache and hijack the domain.

So now what? We already patched these two problems! Well, the short-term answer is another patch. The new patch adds additional randomness to the equation in the form of the source port. So, when a DNS server makes a request, it randomizes the QID and the source port. Now the attacker needs to guess both in order to be successful. This basically makes it a 32-bit number that needs to be guessed, rather than a 16-bit number. So, it takes a lot more effort on the part of the attacker. This helps, but, and this is important, it is still possible to perform this attack given enough time. This is NOT a permanent fix.
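The arithmetic behind that improvement is simple:

```python
# Back-of-the-envelope for the patch described above: a random source
# port multiplies the 16-bit QID space by a roughly 16-bit port space
# (slightly less in practice, since low ports are excluded).
qid_only = 2 ** 16
qid_and_port = 2 ** 16 * 2 ** 16

print(qid_only)      # 65536 combinations before the patch
print(qid_and_port)  # 4294967296 combinations after the patch
```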

That’s the new attack in a nutshell. There may be additional details I’m not aware of, and Dan will be presenting them at the Black Hat conference in August. In the meantime, the message is to patch your server! Not every server is vulnerable to this (some, such as djbdns, have been randomizing source ports for a long time), but many are. If in doubt, check with your vendor.

This is pretty big news, and it’s pretty important. Seriously, this is not a joke. Check your servers and patch. Proof of concept code is in the wild already.