Hard drive failure reports

FAST ’07, the USENIX Conference on File and Storage Technologies, was held February 13th through the 16th. A number of interesting papers were presented there, two of which I want to highlight. I learned of these papers through posts on Slashdot rather than by attending the conference. Honestly, I’m not a storage expert, but I find these studies fascinating.

The first study, “Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?” was written by a Carnegie Mellon University professor, Garth Gibson, and a recent PhD graduate, Bianca Schroeder.

This study compared the manufacturer-specified MTTF (mean time to failure) and AFR (annual failure rate) against real-world hard drive replacement rates. The paper is dense with statistical analysis, making it a rough read for some, but if you can wade through the statistics, there is some good information here.

Manufacturers generally list MTTF ratings of 1,000,000 to 1,500,000 hours. AFR is calculated by dividing the number of hours in a year (8,760) by the MTTF, so those ratings translate to AFRs of roughly 0.58% to 0.88%. In a nutshell, the specifications claim you have about a 0.6 to 0.9% chance of a given drive failing in any year.
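
If you want to check the math, it’s a quick back-of-the-envelope calculation. The two MTTF values below are just the endpoints quoted above:

# AFR (%) = 100 * hours-per-year / MTTF
echo "scale=4; 100 * 8760 / 1500000" | bc   # prints 0.5840, i.e. ~0.58% per year
echo "scale=4; 100 * 8760 / 1000000" | bc   # prints 0.8760, i.e. ~0.88% per year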

As the study explains, determining whether a hard drive has actually failed is problematic at best. Manufacturers report that up to 40% of drives returned as bad are found to have no defect at all.

The study concludes that real-world replacement rates are much higher than the published MTTF values suggest. It also finds that failure rates across drive types such as SCSI, SATA, and FC are similar. The authors go on to recommend changes to the specification standards based on their findings.

The second study, “Failure Trends in a Large Disk Drive Population,” was presented by Google researchers Eduardo Pinheiro, Wolf-Dietrich Weber, and Luiz André Barroso. This paper looks for trends in drive failures, with the goal of building a reliable model that can predict a failure so the drive can be replaced before essential data is lost.

The researchers used an extensive database of statistics gathered from the 100,000+ hard drives deployed throughout Google’s infrastructure. Statistics such as utilization, temperature, and a variety of SMART (Self-Monitoring, Analysis and Reporting Technology) signals were collected over a five-year period.

This study is well written and easily understood by non-academics and those without training in statistical analysis. The data is laid out clearly, and each parameter studied is explained in plain terms.

Traditionally, temperature and utilization have been pinpointed as the root cause of most failures. However, this study shows only a very small correlation between failure rates and either parameter. In fact, failure rates attributed to high utilization seemed to be highest for drives under one year old, and otherwise stayed within 1% of the rates for low-utilization drives; only at the end of a drive’s expected lifetime did the high-utilization failure rate jump up again. Temperature was even more of a surprise: low-temperature drives failed more often than high-temperature drives until about the third year of life.

The report concludes that a reliable model for predicting failure is not feasible at this time, because no single parameter reliably indicates imminent failure. Some SMART signals were useful indicators of impending failure, and most drives failed within 60 days of the first reported errors. However, 36% of the failed drives reported no SMART errors at all, making SMART a poor overall predictor.

Unfortunately, neither study identifies the manufacturers or models of the drives used, likely out of professional courtesy and a lack of interest in being sued for defamation. While these studies will doubtless be useful to those designing large-scale storage networks, manufacturer-specific information would be a great help.

As for me, I mostly rely on Seagate hard drives. I’ve had very good luck with them, with only a handful failing on me over the past few years. Maxtor used to be my second choice, but Seagate announced its acquisition of them at the end of 2005. I tend to stay away from Western Digital, having had several bad experiences with their drives in the past. In fact, my brother had one of their drives literally catch fire and destroy his computer. IBM has had issues as well, especially with its Deskstar line, which many people nicknamed the “Death Star” drives.

With the amount of information stored on hard drives today, and the even larger amounts to come, hard drive reliability is a concern for many vendors. It should be a concern for end users as well, although end users are unlikely to take it seriously. Overall, these two reports are excellent overviews of the current state of drive reliability and the trends seen today. Hopefully drive manufacturers can use these reports to design changes that increase reliability and facilitate earlier detection of impending failures.

Book Review: 19 Deadly Sins of Software Security

Security is a pretty hot topic these days. Problems range from computers turned into zombies by viral attacks to targeted intrusions of high-visibility targets. In many cases, insecure software is to blame for the breach. With the increasing complexity of today’s software and the growing presence of criminals online, security is of the utmost importance.

19 Deadly Sins of Software Security was written by a trio of security researchers and software developers. The original list of 19 sins was developed by John Viega at the behest of Amit Yoran, then the Director of the Department of Homeland Security’s National Cyber Security Division. The list details 19 of the most common security flaws found in computer software.

The book covers each flaw and the potential security risks it poses when present in your code. Examples of flawed software provide insight into the seriousness of these problems. The authors also detail ways to find each flaw in your code, along with steps to prevent it in the future.

Overall, the book covers most of the commonly known security flaws, including SQL Injection, Cross-Site Scripting, and Buffer Overruns. There are also a few lesser-known flaws, such as Integer Overflows and Format String problems.

The authors recognize that software flaws can also be conceptual and usability errors. For instance, one of the sins covered is failure to protect network traffic. While the book goes into greater detail, this flaw generally means the designer did not account for an open network and failed to encrypt important data.

The last chapter covers usability. The authors detail how many applications leave too many options open to the user while making dialogs cryptic in nature. Default settings are either too loose for proper security, or the fallback mechanisms used in the event of a failure cause more harm than good. As the Microsoft Security Response Center put it, “Security only works if the secure way also happens to be the easy way.”

This book is great for both novice and seasoned developers. As with most security books, it covers much of the same material as its peers, but presents it in new ways. Continual reminders about security can only help developers produce more secure code.

[Other References]

10 Immutable Laws of Security Administration

10 Immutable Laws of Security

Michael Howard’s Weblog

John Viega’s HomePage

Linux Software RAID

I recently had to replace a bad hard drive in a Linux box, and I thought I’d detail the procedure I used. This particular box uses software RAID, so there are a few extra steps to getting the new drive up and running.

Normally when a hard drive fails, you lose any data on it. This is, of course, why we back things up. In my case, I have two drives in a RAID level 1 configuration. There are a number of RAID levels offering various degrees of redundancy (or a lack thereof, in the case of level 0). The standard levels are as follows (copied from Wikipedia):

  • RAID 0: Striped Set
  • RAID 1: Mirrored Set
  • RAID 3/4: Striped with Dedicated Parity
  • RAID 5: Striped Set with Distributed Parity
  • RAID 6: Striped Set with Dual Distributed Parity

There are additional RAID levels for nested RAID, as well as some non-standard levels. For more information on those, see the Wikipedia article referenced above.

The drive in my case failed in kind of a weird way: only one of its partitions was malfunctioning. Upon booting the server, however, the BIOS complained about the drive being bad. So, better safe than sorry, I replaced it.

RAID level 1 is a mirrored set. As with most RAID levels, the drives in the array should be identical. It is possible to mix models and sizes, but there are drawbacks, such as reduced speed, possibly increased failure rates, and wasted space. Replacing a drive in a mirror is pretty straightforward: after identifying the problem drive, I physically removed the faulty drive and replaced it with a new one.

The secondary drive was the failed one, so this replacement was pretty easy. In the case of a primary drive failure, it’s easiest to move the good secondary drive into the primary slot and install the new drive as the secondary.

Once the new drive has been installed, power the system up and your favorite Linux distro should boot normally, aside from a few errors regarding the degraded RAID state.

After the system has booted, log in and use fdisk to partition the new drive, making sure to set the partition types back to Linux raid autodetect. When finished, the partition table will look something like this:

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          26      208813+  fd  Linux raid autodetect
/dev/hdb2              27        3850    30716280   fd  Linux raid autodetect
/dev/hdb3            3851        5125    10241437+  fd  Linux raid autodetect
/dev/hdb4            5126       19457   115121790    f  W95 Ext'd (LBA)
/dev/hdb5            5126        6400    10241406   fd  Linux raid autodetect
/dev/hdb6            6401        7037     5116671   fd  Linux raid autodetect
/dev/hdb7            7038        7164     1020096   82  Linux swap
/dev/hdb8            7165       19457    98743491   fd  Linux raid autodetect
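
As an aside, since both drives in a mirror should be partitioned identically, you can skip the manual fdisk session by copying the partition table from the surviving drive with sfdisk. This assumes the good drive is /dev/hda and the new one is /dev/hdb, so adjust accordingly:

# dump hda's partition table and write it to hdb
sfdisk -d /dev/hda | sfdisk /dev/hdb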

Once the partitions have been set up, you need to format them with a filesystem. This is a pretty painless process, depending on your filesystem of choice. I happen to be using ext3, so I use the mke2fs program, whose -j flag creates the ext3 journal. To format a partition as ext3, use the following command (this command, as well as those that follow, needs to be run as root, so be sure to use sudo):

mke2fs -j /dev/hdb1

Once all of the partitions have been formatted, you can move on to the swap partition. This is created using the mkswap program as follows:

mkswap /dev/hdb7

Once the swap partition has been formatted, activate it so the system can use it. The swapon command achieves this goal:

swapon /dev/hdb7

And finally, you can add the new partitions to the RAID arrays using mdadm. mdadm is a single command with a plethora of uses: it builds, monitors, and alters RAID arrays. To add a partition to an array, use the following:

mdadm /dev/md1 -a /dev/hdb1
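
Repeat this for each RAID partition on the new drive. The md-number-to-partition mapping below is just an example of how my arrays happen to line up; check /proc/mdstat or /etc/mdadm.conf for the actual layout on your system:

mdadm /dev/md2 -a /dev/hdb2
mdadm /dev/md3 -a /dev/hdb3
mdadm /dev/md5 -a /dev/hdb5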

And that’s all there is to it. If you’d like to watch the array rebuild itself, which is about as much fun as watching paint dry, you can do the following:

watch cat /proc/mdstat
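
While an array is rebuilding, /proc/mdstat shows a progress bar along these lines (the output below is illustrative, not copied from my actual box):

md1 : active raid1 hdb1[2] hda1[0]
      208768 blocks [2/1] [U_]
      [==>..................]  recovery = 12.5% (26112/208768) finish=1.2min speed=2176K/sec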

Software RAID has come a long way, and it’s quite stable these days. I’ve been happily running it on my Linux machines for several years now. It works well when hardware RAID is not available, or simply as a cheaper solution. I’m quite happy with the performance and reliability of software RAID, and I definitely recommend it.

Carmack on the PS3 and 360

John Carmack, the 3D game engine guru from id Software and a game developer I hold in very high regard, was recently interviewed by Game Informer along with id Software CEO Todd Hollenshead. Carmack recently received a Technology Emmy for his work and innovation on 3D engines, a well-deserved award.

I was a bit surprised while reading the interview. Carmack seems to be a pretty big believer in DirectX these days and thinks highly of the Xbox 360. On the flip side, he’s not a fan of the PS3’s asymmetric CPU and thinks Sony has dropped the ball when it comes to tools. I never realized Carmack was such a fan of DirectX, given how highly he used to tout OpenGL.

Todd and Carmack also talked about episodic gaming. Their consensus seems to be that episodic gaming just isn’t there yet: by the time you get the first episode out the door, you’ve essentially completed all of the development, so shipping in episodes gains you little when the capital to make the game has already been spent.

Episodic games seem like a great idea from the outside, but perhaps they’re right. Traditionally, initial games have sold well while expansion packs haven’t, and episodic games may follow the same sales pattern. If the content is right, however, perhaps episodes will work. But then there’s the issue of release timing: if you release a 5-10 hour episode, when is the optimal time to release the next one? You’ll have gamers who finish the entire episode the day it’s released and then get bored waiting for more, and gamers who take their time and finish in a week or two. Release too early and you upset the people who don’t want to pay for content constantly; wait too long and the bored customers may lose interest.

The interview covered a few more areas such as DirectX, Quakecon, and Hollywood. I encourage you to check it out, it makes for good reading!

Nerdcore Rising

I can’t remember when exactly I was introduced to MC Frontalot, but I do know it was a few years ago. It probably had something to do with Penny Arcade at the time.

Regardless, MC Frontalot is a rapper. I’m not really a rap type of person, but this particular rapper grabbed my attention. He raps about technology, gaming, and other topics that so-called Nerds are into. If you’re interested, he has a bunch of MP3s available on his site.

The interesting part of all of this is that he has a movie coming out called Nerdcore Rising. Well, that is, a movie is coming out that has him in it. Well, it’s more of a documentary, but you get the idea.

I’m actually finding myself pretty excited about seeing it and I thought I’d pass on the info. There are, to my knowledge, no confirmed bookings at this time, but you can request a booking via their homepage. And if you don’t get to see it in a theater, then perhaps you can pick it up on DVD when it comes out.

Check out the site, and check out some of the other Nerdcore rappers:

And if you’re interested in video game music in general, check these out:

SpamAssassin and Bayes

I’ve been messing around with SpamAssassin a lot lately, and the topic of database optimization came up. I’m using Bayesian filtering to improve spam scores, and to increase speed and manageability, I have SpamAssassin set to use MySQL as the Bayes storage engine. Bayes is fairly resource-intensive on both I/O and CPU, depending on the current action being performed, and since I decided to use MySQL as the storage engine, most of the I/O is handled there.
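
For reference, pointing Bayes at MySQL only takes a few lines in local.cf. The database name and credentials below are placeholders, and this assumes you’ve already loaded the MySQL schema that ships with SpamAssassin:

# in /etc/mail/spamassassin/local.cf
bayes_store_module Mail::SpamAssassin::BayesStore::SQL
bayes_sql_dsn      DBI:mysql:sa_bayes:localhost
bayes_sql_username sa_user
bayes_sql_password secret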

I started looking into performance issues with Bayes recently and noticed a few “issues” that I’ve been trying to work out. The biggest is performance on the MySQL side: the Bayes database is enormous, and queries against it take a while. So my initial thought was to look into reducing the size of the database.

There are a few different tables used by Bayes. The one that grows largest is bayes_token, where all of the core statistical data is stored. It simply takes up a lot of room, and there’s not much to be done about it. Or so I thought. Apparently, if you have SpamAssassin set up to train Bayes automatically, it doesn’t always train mail for the correct user. For instance, if you receive mail that was BCCed to you, it may be learned for the user listed in the To: field instead. This means the Bayes database can contain a ton of “junk” that you’ll never use. So my first order of business is to trim out the tokens belonging to non-existent users.
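
As a rough sketch of that cleanup, assuming the stock SpamAssassin SQL schema (bayes_vars maps usernames to ids, and bayes_token joins to it on id) and the placeholder sa_bayes database from above, something like this removes the tokens for users who shouldn’t be there; similar statements would clean out bayes_seen and the bayes_vars rows themselves:

# the usernames listed are placeholders for your real local users
mysql sa_bayes -e "
  DELETE bayes_token FROM bayes_token
  JOIN bayes_vars USING (id)
  WHERE bayes_vars.username NOT IN ('user1', 'user2');"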

The bayes_seen table tracks the message IDs of messages that have already been parsed and learned by Bayes. It’s useful for preventing unnecessary CPU utilization, but there is no automatic trimming function, so the table grows indefinitely. The awl (auto-whitelist) table is similar in that it can also grow indefinitely and has no auto-trim mechanism. For both of these tables I’ve added a timestamp field to track additions and updates. With that in place, I can write some simple Perl code to automatically trim entries old enough to be irrelevant. For bayes_seen I plan on using a default lifetime of one month. For the awl, I’m looking at dropping any single-hit entries over three months old, and any entries over one month old with fewer than five hits. Since MySQL automatically updates the timestamp field whenever a row changes, this should be sufficient to keep relevant entries from being deleted.
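
For the curious, the new column and the trimming queries look roughly like this. The column name and exact intervals are my own choices, and the DEFAULT/ON UPDATE clause is what makes MySQL maintain the timestamp automatically:

# add an auto-updating timestamp to bayes_seen (repeat for awl)
mysql sa_bayes -e "
  ALTER TABLE bayes_seen ADD COLUMN lastupdate TIMESTAMP
  DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP;"

# trim bayes_seen entries older than a month
mysql sa_bayes -e "
  DELETE FROM bayes_seen WHERE lastupdate < NOW() - INTERVAL 1 MONTH;"

# trim the awl per the policy described above
mysql sa_bayes -e "
  DELETE FROM awl
  WHERE (count = 1 AND lastupdate < NOW() - INTERVAL 3 MONTH)
     OR (count > 1 AND count < 5 AND lastupdate < NOW() - INTERVAL 1 MONTH);"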

While researching all of this, I was directed to the MySQL Performance Blog, a site about MySQL optimization run by Peter Zaitsev and Vadim Tkachenko, both former MySQL employees. The entry I was pointed to covers general MySQL optimization and is a great starting point for anyone using MySQL. I hate to admit it, but I was completely unaware that this much performance could be coaxed out of MySQL with a few simple settings. I knew tuning was possible; I had just never dealt with a database large enough to warrant it.

I discovered, through the above blog and further research, that the default settings in MySQL are extremely conservative. By default, most of the memory-allocation variables are capped at a mere 8 megabytes. I guess the general idea is to ship with settings that are almost guaranteed to work anywhere and let the admin tune the system from there.
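
To give you an idea of the knobs involved, here are the kinds of settings we’re talking about. The values below are purely illustrative, not recommendations; size them against the RAM you can actually spare:

# excerpt from the [mysqld] section of /etc/my.cnf
key_buffer_size         = 256M   # MyISAM index cache (the default is a paltry 8M)
sort_buffer_size        = 4M     # per-connection sort buffer
query_cache_size        = 32M    # caches results of repeated identical queries
table_cache             = 512    # number of table handles kept open
innodb_buffer_pool_size = 512M   # InnoDB data/index cache, if you use InnoDB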

I’m still tuning and playing with the parameters, but it looks like I’ve easily increased the speed of this beast by a factor of five. It’s to the point now where a simple ‘show processlist’ hardly lists any processes anymore because they complete so quickly. I’ve been a fan of MySQL for a while now, and I’ve been pretty impressed with its performance. With these changes and further tuning, I’m sure I’ll be even more impressed.

So today’s entry comes with a lesson: research is key when deploying services like this, even if they’re only for yourself. Definitely look into performance tuning for your systems. You’ll thank me later.

Linux Upgrades – Installation from a Software RAID

I recently had to upgrade a few machines to a newer version of Linux. Unfortunately, the CD-ROM drives in these machines were not functioning, so I decided to upgrade via a hard drive install. Hard drive installs are pretty simple and are a standard option in the Red Hat installer. However, the machines I had to upgrade were all set up with software RAID 1.

My initial thought was simply to point the installer at the RAID device where the install media was located. However, the installation program does not allow this; it presents a list of acceptable locations instead.

So from there, I decided to choose one of the two underlying drives in the mirror and use that instead. The installer graciously accepted this and launched the GUI installer to complete the process. All went smoothly until I reached the final step of the wizard, the actual install, where the installer crashed with a Python error. The error suggested that the drive I was using for the installation media was not available. Closer inspection revealed the truth: mdadm had started up and activated all of the RAID partitions on the system, making the partition I needed unavailable.

So what to do now? I re-ran the installation and deleted the RAID partition, being careful to leave the physical partitions intact. Again, the installer crashed at the same step. It seems the installer scans the entire hard drive for RAID partitions, whether they have valid mount points or not.

I finally solved the problem with a crude hack during the setup phase of the installation. I once again deleted the RAID partition, leaving the physical partitions intact, and stepped through the entire process, stopping just before the final install step. At that point, I switched to the CLI console via Ctrl-Alt-F2 and created a small bash script that looked something like this:

#!/bin/bash
# keep md3 stopped so the installer can read the underlying partition;
# the script simply re-executes itself (saved here as myscript.bash) in a loop
mdadm --stop /dev/md3
exec /bin/bash ./myscript.bash

I ran the script, switched back to the installer via Ctrl-Alt-F6, and proceeded with the installation; this time the installer happily put the OS onto the drives. Once it completed, I switched back to the CLI console, edited the new /etc/fstab to add a mount point for the RAID drive I had used, and rebooted. The system came up without any issues and ran normally.

Just thought I’d share this with the rest of you, should you run into the same situation. It took me a while to figure out how to make the system do what I wanted. Just be sure that your install media is on a drive that the installer will NOT need to write files to.

Voting in an electronic world

Well, I did my civic duty and voted this morning. I have my misgivings about the entire election process and the corruption that abounds in government, but if I refuse to vote, then I really can’t complain, can I?

So, after waking up and getting ready for work, I headed to the local polling place to check out the Diebold AccuVote-TSX system they wanted me to vote on. It’s a neat-looking machine from afar, but once I got up close, I was sorely disappointed.

I can’t put my finger on it exactly, but these seemed to be flimsy, rushed systems. The touchscreen didn’t feel right, though it was presumably accurate, lighting up my choices as I made them. There was an annoying slight delay after each touch, however. The first time I tried to vote, the machine rejected the card I was given and flashed an error about being cleared. Well, I hope that’s what it said. Thinking back on it now, I’m upset that I didn’t take more time to read the screen; I’m honestly not sure whether the error said the card was cleared or the machine was cleared. And when I returned the card for one that worked, the lady I gave it to mentioned that there were a bunch of cards she was having problems with. Not good.

On a positive note, the mechanism that held and ejected the voting card seemed to be well built, and it worked well. I think that’s about the only piece I thought was decent, though. Kinda pathetic, actually.

Speaking of the Diebold machines, I urge you to check out the HBO special, “Hacking Democracy.” The entire show is up on Google Video for your viewing pleasure. You can access the video here.

Firefox 2.0 Released!

Firefox 2.0 was released earlier today. I wrote about this release previously, while it was still in beta, and recommended then that you check it out. With the final release here, I’ll say it again: this is one of the best browsers out there. Give it a try; it’s easy to uninstall if you find you don’t like it!