I had to replace a bad hard drive in a Linux box recently and I thought perhaps I’d detail the procedure I used. This particular box uses software raid, so there are a few extra steps to getting the drive up and running.
Normally when a hard drive fails, you lose any data on it. This is, of course, why we back things up. In my case, I have two drives in a raid level 1 configuration. There are a number of raid levels that dictate various degrees of redundancy (or the lack thereof, in the case of level 0). The raid levels are as follows (copied from Wikipedia):
- RAID 0: Striped Set
- RAID 1: Mirrored Set
- RAID 3/4: Striped with Dedicated Parity
- RAID 5: Striped Set with Distributed Parity
- RAID 6: Striped Set with Dual Distributed Parity
There are additional raid levels for nested raid as well as some non-standard raid levels. For more information on those, see the Wikipedia article referenced above.
The hard drive in my case failed in kind of a weird way. Only one of the partitions on the drive was malfunctioning. Upon booting the server, however, the BIOS complained about the drive being bad. So, better safe than sorry, I replaced the drive.
Raid level 1 is a mirrored raid. As with most raid levels, the hard drives being raided should be identical. It is possible to use different models and sizes in the same raid, but there are drawbacks such as a reduction in speed, possible increased failure rates, wasted space, etc. Replacing a drive in a mirrored raid is pretty straightforward. After identifying the problem drive, I physically removed the faulty drive and replaced it with a new one.
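One step worth spelling out: before physically pulling a drive, it should be released from the arrays so the kernel stops using it. This is a sketch, assuming the failing drive is /dev/hdb and its first raided partition belongs to /dev/md1 (as in my setup); repeat for each raided partition and its corresponding array:

```shell
# Mark the partition as failed (skip if mdadm already shows it as failed)
mdadm /dev/md1 --fail /dev/hdb1

# Remove the failed partition from the array
mdadm /dev/md1 --remove /dev/hdb1
```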
The secondary drive was the failed drive, so this replacement was pretty easy. In the case of a primary drive failure, it’s easiest to move the secondary drive into the primary slot before replacing the failed drive.
Once the new drive has been installed, power the system up and it should load your favorite Linux distro. The system should boot normally, aside from a few errors regarding the degraded raid state.
After the system has booted, log in and use fdisk to partition the new drive, mirroring the layout of the surviving drive. Make sure you set the partition type IDs back to Linux raid autodetect (fd). When finished, the partition table will look something like this:
Device Boot      Start       End      Blocks   Id  System
/dev/hdb1   *        1        26      208813+  fd  Linux raid autodetect
/dev/hdb2           27      3850    30716280   fd  Linux raid autodetect
/dev/hdb3         3851      5125    10241437+  fd  Linux raid autodetect
/dev/hdb4         5126     19457   115121790    f  W95 Ext'd (LBA)
/dev/hdb5         5126      6400    10241406   fd  Linux raid autodetect
/dev/hdb6         6401      7037     5116671   fd  Linux raid autodetect
/dev/hdb7         7038      7164     1020096   82  Linux swap
/dev/hdb8         7165     19457    98743491   fd  Linux raid autodetect
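As an alternative to entering the layout by hand, sfdisk (part of util-linux) can dump the surviving drive's partition table and replay it onto the new drive. A sketch, assuming /dev/hda is the healthy drive and /dev/hdb is the replacement:

```shell
# Dump hda's partition table and write the same layout to hdb
sfdisk -d /dev/hda | sfdisk /dev/hdb
```

Double-check the target device name before running this; writing the table to the wrong drive will destroy its partitions.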
Once the partitions have been set up, you need to format each data partition with a filesystem. This is a pretty painless process, depending on your filesystem of choice. I happen to be using ext3, so I use the mke2fs program with the -j (journal) flag. To format an ext3 partition, use the following command (this command, as well as the commands that follow, needs to be run as root, so be sure to use sudo):
mke2fs -j /dev/hdb1
Once all of the data partitions have been formatted, you can move on to the swap partition. This is done using the mkswap program as follows:
mkswap /dev/hdb7
Once the swap partition has been initialized, activate it so the system can use it. The swapon command achieves this goal:
swapon /dev/hdb7
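To confirm the swap space is actually in use, the kernel lists active swap areas in /proc/swaps:

```shell
# List active swap areas; the new partition should appear here
cat /proc/swaps
```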
And finally, you can add the partitions back to their raid arrays using mdadm. mdadm is a single command with a plethora of uses: it builds, monitors, and alters raid arrays. To add a partition to an array, use the following (repeating for each raided partition and its corresponding array):
mdadm /dev/md1 --add /dev/hdb1
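To check that the partition was accepted and see how far the resync has progressed, mdadm can also query the array directly:

```shell
# Show array state, member devices, and rebuild progress
mdadm --detail /dev/md1
```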
If you’d like to watch the array rebuild itself, which is about as much fun as watching paint dry, you can do the following:
watch cat /proc/mdstat
And that’s all there is to it. Software raid has come a long way and it’s quite stable these days. I’ve been happily running it on my Linux machines for several years now. It works well when hardware raid is not available or as a cheaper solution. I’m quite happy with the performance and reliability of software raid and I definitely recommend it.