Friday, June 26. 2009
I have a few servers with hardware RAID directly on the motherboard. They're not the best boards in the world, but they process my data and serve up the information I want. Recently, I noticed that one of the servers was running on the /dev/sdb* devices, which was extremely odd. Digging some more, it seemed that /dev/sda* existed and seemed to be OK, but wasn't being used.
After some searching, I was able to determine that the server, when built, actually booted up on /dev/mapper/via_* devices, which were actually the hardware RAID. At some point these devices disappeared. To make matters worse, it seems that kernel updates weren't being applied correctly. My guess is that either the grub update was failing, or it updated a boot loader somewhere that wasn't actually being used to boot. As a result, an older kernel was loading, with no way to get to the newer kernel.
I spent some time tonight digging around with Google, posting messages on the CentOS forums, and digging around on the system itself. With guidance from a user via the forums, I discovered that my system should be using dmraid, which is a program that discovers and runs RAID devices such as the one I have. Digging around a bit more with dmraid, I found this:
    [user@dev ~]$ sudo /sbin/dmraid -ay -v
    Password:
    INFO: via: version 2; format handler specified for version 0+1 only
    INFO: via: version 2; format handler specified for version 0+1 only
    RAID set "via_bfjibfadia" was not activated
    [user@dev ~]$
Apparently my RAID is running version 2 and dmraid only supports versions 0 and 1. Since this was initially working, I'm at a loss as to why my RAID is suddenly not supported. I suppose I can rebuild the machine, again, and check, but the machine is more than 60 miles from me and I'd rather not have to migrate data anyway.
So how does one go about fixing such a problem? Is my RAID truly not supported? Why did it work when I built the system? What changed? If you know what I'm doing wrong, I'd love to hear from you... This one has me stumped. But fear not, when I have an answer, I'll post a full writeup!
Thursday, June 25. 2009
Slashdot posted a news item late last evening about some rather stunning photos from the International Space Station. On June 12th, the Sarychev Peak volcano erupted. At the same time, the ISS happened to be right overhead. What resulted was some incredible imagery, provided to the public by NASA. Check out the images below: 

You can find more images and information here. Isn't nature awesome?
Wednesday, June 17. 2009
NANOG 46 is wrapping up today and it has been an incredible experience. This particular NANOG seemed to have an underlying IPv6 current to it, and, if you believe the reports, IPv6 is going to have to become the standard in the next couple of years. We'll be running dual-stack configurations for some time to come, but IPv6 rollout is necessary.
To date, I haven't had a lot to do with IPv6. A few years ago I set up one of the many IPv6 shims, just to check out connectivity, but never really went anywhere with it. It was nothing more than a tech demo at the time, with no real content out there to bother with. Content exists today, however, and will continue to grow as time moves on.
IPv6 connectivity is still spotty and problematic for some, though, and there doesn't seem to be a definitive, workable solution. For instance, if your IPv6 connectivity is not properly configured, you may lose access to some sites because you receive DNS responses pointing you at IPv6 content that you cannot reach. This results in either a major delay in falling back to IPv4 connectivity, or complete breakage. So one of the primary problems right now is whether or not to send AAAA record responses to DNS requests when the IPv6 connectivity status of the receiver is unknown. Google, from what I understand, is using a whitelist system. When a provider has sufficient IPv6 connectivity, Google adds them to their whitelist and the provider is then able to receive AAAA records.
Those problems aside, I think rolling out IPv6 will be pretty straightforward. My general take on this is to run dual-stack to start, and probably for the foreseeable future, and to get the network handing out IPv6 addresses. Once that's in place, then we can start offering AAAA records for services. I'm still unsure at this point how to handle DNS responses to users with possibly poor v6 connectivity.
Another area of great interest this time around is DNSSEC.
I'm still quite skeptical about DNSSEC as a technology, partly due to ignorance, partly due to seeing problems with what I do understand. Rest assured, once I have a better handle on this, I'll finish up my How DNS Works series.
I'm all for securing the DNS infrastructure and doing something to ensure that DNS cannot be poisoned the same way it can today. DNSSEC aims to add security to DNS such that you can trust the responses you receive. However, I have major concerns with what I've seen of DNSSEC so far.
One of the bigger problems I see is that each and every domain (zone) needs to be signed. Sure, this makes sense, but my concern is the cost involved to do so. SSL Certificates are not cheap and are a recurring cost. Smaller providers may run into major issues with funding such security. As a result, they will be unable to sign their domains and participate in the secure infrastructure.
Another issue I find extremely problematic is the fallback to TCP. Cryptographic signatures are big, and the larger the key you use, the bigger they get. As a result, DNS responses are exceeding what fits in a single UDP response and falling back to TCP. One reason DNS works so well today is that the DNS server doesn't have to worry about retransmissions, state of connections, etc. There is no handshake required, and the UDP packets just fly. It's up to the client to retransmit if necessary. When you move to TCP, the nature of the protocol means that both the client and server need to keep state information and perform any necessary retransmissions. This takes up socket space on the server, takes time, and uses up many more CPU cycles.
Based on a lightning talk during today's session, when the .ORG domain was signed, they saw a 100-fold increase in TCP connections, moving from less than 1 query per second to almost 100. This concerns me greatly as the majority of the Internet has not enabled DNSSEC at this point.
I can see this climbing even more, eventually overwhelming the system and bringing DNS to its knees. I also believe that moving in this direction will make it much easier for the "bad guys" to DoS servers, as they can easily trigger TCP transactions, perform various TCP-based attacks, and generally muck up the system further.
So what's the alternative? Well, there is DNSCurve, though I know even less about that as it's very much a fringe technology at this point. In fact, the first workable patch against djbdns was only released in the past few weeks. It's going to take some time to absorb what's out there, but based on the current move to DNSSEC, my general feeling is that no matter how much better DNSCurve may or may not be, it doesn't have much of a chance. Even so, there's a lot more to learn in this arena.
I also participated in a Security BOF. BOFs are, essentially, less structured talks on a given subject. There is a bit more audience participation and the audience tends to be a bit smaller. The Security BOF was excellent as there were conversations about abuse, spam, and methods of dealing with each. The spam problem is, of course, widespread and it's comforting to know that you're not the only one without a definitive answer. Of course, the flip side of that is that it's somewhat discouraging to know that even the big guys such as Google are still facing major problems with spam. The conversation as a whole, though, was quite enlightening and I learned a lot.
One of the more exciting parts of NANOG for me, though, was to meet some of the Internet greats. I've talked to some of these folks via email and on various mailing lists, but to meet them in person is a rare honor. I was able to meet and speak with both Randy Bush and Paul Vixie, both giants in their fields. I was able to rub elbows with folks from Google, Yahoo, and more. I've exchanged PGP keys with several people throughout the conference, serving as a geek's autograph.
I have met some incredible people and I look forward to talking with them in the future. If you're a network operator, or your interests lie in that direction, I strongly encourage you to make a trip to at least one NANOG in your lifetime. I'm hooked at this point and I'm looking forward to being able to attend more meetings in the future.
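A footnote on the DNSSEC section above, for anyone curious what the UDP-to-TCP fallback looks like on the wire. This is a minimal Python sketch, not a resolver: it only inspects the 12-byte DNS header and checks the TC (truncated) bit, which is what tells a client to retry the same query over TCP. The helper names (is_truncated, pick_transport) are my own and purely illustrative, and the sample headers are hand-built rather than captured from a real server.

```python
import struct

# Flag masks from the DNS header layout (RFC 1035, section 4.1.1).
QR_RESPONSE = 0x8000   # message is a response
TC_TRUNCATED = 0x0200  # response did not fit and was truncated


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit is set in a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS message starts with a 12-byte header")
    # Header fields: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (16 bits each).
    _msg_id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", message[:12])
    return bool(flags & TC_TRUNCATED)


def pick_transport(udp_response: bytes) -> str:
    """Decide what a client does next after receiving a UDP response."""
    return "retry-over-tcp" if is_truncated(udp_response) else "done"


# Hand-built example headers (hypothetical ID, one question, no answers).
small_reply = struct.pack("!6H", 0x1234, QR_RESPONSE, 1, 0, 0, 0)
big_reply = struct.pack("!6H", 0x1234, QR_RESPONSE | TC_TRUNCATED, 1, 0, 0, 0)

print(pick_transport(small_reply))
print(pick_transport(big_reply))
```

Every "retry-over-tcp" means a handshake and connection state on the server, which is exactly the overhead I'm worried about.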
I'm here in sunny Philadelphia, attending NANOG46, a conference for network operators. The conference, thus far, has been excellent, with some great information being disseminated. One of the talks was by a long-time Internet pioneer, Paul Vixie. Vixie has had his hands in a lot of different projects, ranging from being the primary author of BIND for many years, to starting MAPS way back in 1996, to, more recently, his involvement with the Conficker Working Group.
Vixie's talk was titled "Internet Superbugs and The Art of War," and was about the struggle between Internet operators and the "criminal" element that uses the Internet for spam, DDoS attacks, etc. The crux of the talk was that it costs the bad guys next to nothing to continually evolve their attacks and use the network for their nefarious activities. On the flip side, however, it costs the network operators a good deal of time and money to try to stop these attacks.
Years ago, attacks were generally sourced from a single location and it was relatively easy to mitigate them. In addition, tracking down the source of the attack was simple enough, so legal action could be taken. At the very least, the network provider upstream from the attacker could disable the account and stop the attack.
Fast forward to today and we have botnets that are used for sending spam, performing DDoS attacks, and causing other sorts of havoc. It becomes next to impossible to mitigate a DDoS attack because the attack can be sourced from hundreds or thousands of machines simultaneously. This costs the bad guys nothing to deploy because users are largely ignorant and don't understand the importance of patching and securing their networks. This results in millions of machines on the Internet that are exploitable. The bad guys write viruses, worms, trojans, etc. that infect these machines and turn them into zombie machines for their botnet.
Fighting these attacks becomes an exercise in futility. We use blacklists to block traffic from places we know are sending spam, we use anti-virus software to prevent infection of our machines, and more. When Conficker was detected and analyzed, researchers realized that this infection was a new evolution of attack. Conficker used cryptographic signatures to verify updates, pseudo-random lists of websites for updates, and more. The website lists are an excellent example of the costs paid by the good guys vs the bad guys.
The first generation of Conficker used a generated list of websites for updates. This list was 250 sites per day, making it difficult, but not impossible to mitigate. So, the people fighting this outbreak started buying up these domains in an attempt to prevent Conficker from updating. The authors of Conficker responded by upping this list to 50,000 per day, making it nearly impossible to buy them up. Fortunately, the people working to prevent the outbreak were able to work with ICANN and the various ccTLD companies to monitor and block purchases of these sites. Sites that already existed were thoroughly checked to ensure they weren't hosting the new version of Conficker.
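To make the idea of a generated list concrete, here's a toy Python sketch of a date-seeded domain list. To be clear, this is not Conficker's actual algorithm: the hashing scheme, the TLD list, and the daily_domains name are all assumptions I made up for illustration. The only point is that anyone who knows the algorithm, attacker or defender, can compute the same list for a given day.

```python
import hashlib
from datetime import date

# Hypothetical TLD pool, chosen only for this illustration.
TLDS = ("com", "net", "org", "info", "biz")


def daily_domains(day: date, count: int = 250) -> list:
    """Return a deterministic list of pseudo-random domain names for a day."""
    domains = []
    for index in range(count):
        seed = f"{day.isoformat()}:{index}".encode()
        digest = hashlib.sha256(seed).hexdigest()
        # Map the first ten hex digits to the letters a..p to form a label.
        label = "".join(chr(ord("a") + int(ch, 16)) for ch in digest[:10])
        # Use the next two hex digits to pick a TLD from the pool.
        tld = TLDS[int(digest[10:12], 16) % len(TLDS)]
        domains.append(f"{label}.{tld}")
    return domains


today = date(2009, 6, 17)
print(len(daily_domains(today)))               # size of the default daily list
print(len(daily_domains(today, count=50000)))  # same code, one argument changed
print(daily_domains(today)[:3])                # first few names for that day
```

In this sketch, going from 250 names to 50,000 is a one-argument change, while every one of those names still has to be registered, blocked, or checked by somebody.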
Vixie brought up an interesting point about all of this activity, though. The authors of Conficker made a relatively simple change to Conficker to make it use 50,000 domains. The people fighting Conficker spent many hours and days, not to mention a significant amount of money, to mitigate this. Smaller ccTLD companies that don't have 24x7 abuse staff are unable to cope. They don't have the budget to be able to do all of this work for free. As the workload climbs, they're more likely to turn a blind eye.
All of this, in turn, means that our current mode of reacting to these attacks and mitigating them does not scale. It merely results in lost revenue and frustration. Additionally, creating lists of places to avoid, generating lists of bad content, etc. will never be able to scale over time. There is a breaking point, somewhere, and at that point we have no recourse unless we change our way of thinking.
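For context, those "lists of places to avoid" are commonly published as DNS-based blacklists: a mail server reverses the octets of the connecting IPv4 address, appends the list's zone, and looks that name up. Here's a minimal Python sketch of building the query name. The zone dnsbl.example.org is a placeholder I made up, not a real list, and the sketch stops short of actually sending the query.

```python
import ipaddress


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name a server would look up to check an IPv4 sender."""
    # IPv4Address validates the input and raises ValueError on bad addresses.
    addr = ipaddress.IPv4Address(ip)
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.1 is a documentation address; the zone is a made-up placeholder.
print(dnsbl_query_name("192.0.2.1", "dnsbl.example.org"))
```

Every sender checked means another lookup against a list that somebody has to keep current, which is the scaling problem in a nutshell.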
Along the same line of thought, I came across a pretty decent quote today, originally posted by Don Franke from (ISC)²:
"PC security is no longer about a virus that trashes your hard drive. It's about botnets made up of millions of unpatched computers that attack banks, infrastructures, governments. Bandwidth caps will contribute to this unless the thinking of Internet providers and OS vendors change. Because we are all inter-connected now."
If you read the original post, it explains how moving to bandwidth caps will only exacerbate the security problem because users will no longer be interested in wasting time downloading updates, but rather saving that bandwidth for things they're interested in.
Overall, it was a very interesting talk and a very different way of thinking. There is no definitive answer as to what direction we need to go in to resolve this, but it's definitely something that needs to be investigated.
Tuesday, June 16. 2009
So yeah, the background of the site is green. Why? Simply put, it's a show of support for those in Iran fighting for their freedom. Check out the main media outlets (CNN, BBC, etc.) for coverage, and you can follow more on my other blog if you are so inclined. I'm not going to post about Iran-related stuff here, since this is a tech blog, but I'll show my support nonetheless.