The Hard Drive Shuffle

I recently decided to pick up some new hard drives to replace the ones in my server that have been dutifully running for far too many years (somewhere in the 5-10 years range) without any failures. Because the process of swapping things out was a tad more involved that simply replacing the drives, I thought I’d write it up.

My server has nine drives in it across several SATA controllers. One of the drives is an SSD where I put the OS, two drives are mirrored on a slower SATA controller, and the remaining six are in a RAID6 array. The RAID6 array is the target of this upgrade. I picked up some shiny new Seagate Exos X16 16TB drives to replace them.

Before I could start replacing the drives, I needed to figure out which slots corresponded to which drives. Unfortunately, there are no lights I can blink on the case to figure this out, so a quick shutdown of the system so I can pull all the drives and write down the serial number and what slot they were in. Once complete, I was able to use udevadm to figure out which device corresponded to which serial number.

[root@server root]# udevadm info --query=all --name=/dev/sda | grep ID_SERIAL
E: ID_SERIAL=ST16000NM001G-SERIALNUM

The RAID array is set up using Linux Software Raid, so mdadm is the tool I’ll be using to fail and replace the drives, one by one. After some reading, the process is as follows :

First, fail and then remove the device to be replaced from the array :

[root@server root]# mdadm --manage /dev/md0 --fail /dev/sda1
[root@server root]# mdadm --manage /dev/md0 --remove /dev/sda1

The drive can now be replaced with the new drive. I took an additional step to print out labels for each drive with the serial number on them and labelled the slots. Once the drive has been re-added, you can watch /var/log/messages to see when the new drive is detected by the OS. It can take a minute or two for this to happen.

May 5 12:06:00 server kernel: ata11: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
May 5 12:06:00 server kernel: ata11.00: ATA-11: ST16000NM001G-2KK103, SNA3, max UDMA/133
May 5 12:06:00 server kernel: ata11.00: 31251759104 sectors, multi 16: LBA48 NCQ (depth 32), AA
May 5 12:06:00 server kernel: ata11.00: Features: NCQ-sndrcv
May 5 12:06:00 server kernel: ata11.00: configured for UDMA/133
May 5 12:06:00 server kernel: scsi 10:0:0:0: Direct-Access ATA ST16000NM001G-2K SNA3 PQ: 0 ANSI: 5
May 5 12:06:00 server kernel: sd 10:0:0:0: [sda] 31251759104 512-byte logical blocks: (16.0 TB/14.6 TiB)
May 5 12:06:00 server kernel: sd 10:0:0:0: [sda] 4096-byte physical blocks
May 5 12:06:00 server kernel: sd 10:0:0:0: [sda] Write Protect is off
May 5 12:06:00 server kernel: sd 10:0:0:0: Attached scsi generic sg0 type 0
May 5 12:06:00 server kernel: sd 10:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 5 12:06:00 server kernel: sd 10:0:0:0: [sda] Preferred minimum I/O size 4096 bytes
May 5 12:06:00 server kernel: sd 10:0:0:0: [sda] Attached SCSI disk

Now that the drive has been detected, you can partition it and then re-add it to the array. These are pretty huge disks, so I used gdisk to partition them. The process is quite easy. Run gdisk on the drive you want to partition, choose “o” to write a new GPT, choose “n” to create a new partition. I simply hit enter to choose the partition number, first, and last sector. Then enter “fd00” to set the partition to Linux RAID. Finally, you can print out what you’ve done with “p” or jump right to entering “x” to write the partition table and exit.

[root@server root]# gdisk /dev/sda
GPT fdisk (gdisk) version 1.0.7

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.

Command (? for help): o
This option deletes all partitions and creates a new protective MBR.
Proceed? (Y/N): y

Command (? for help): n
Partition number (1-128, default 1):
First sector (34-31251759070, default = 2048) or {+-}size{KMGTP}:
Last sector (2048-31251759070, default = 31251759070) or {+-}size{KMGTP}:
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): fd00
Changed type of partition to 'Linux RAID'

Command (? for help): p
Disk /dev/sdf: 31251759104 sectors, 14.6 TiB
Model: ST16000NM001G-2K
Sector size (logical/physical): 512/4096 bytes
Disk identifier (GUID): XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 31251759070
Partitions will be aligned on 2048-sector boundaries
Total free space is 2014 sectors (1007.0 KiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048     31251759070   14.6 TiB    FD00  Linux RAID

Command (? for help): w

Final checks complete. About to write GPT data. THIS WILL OVERWRITE EXISTING
PARTITIONS!!

Do you want to proceed? (Y/N): y
OK; writing new GUID partition table (GPT) to /dev/sda.
mdThe operation has completed successfully.

Now that the drive is paritioned, you can add it into the array.

[root@server root]# mdadm --manage /dev/md0 --add /dev/sda1

And now you have to wait as the array resyncs. With a RAID6 array, you can technically replace two drives at the same time, but you increase your risk because if an additional drive fails, you can kiss goodbye to the data. In fact, when I started this process on my server, the first drive I pulled caused some sort of restart on the SATA controller and I ended up with two failed drives instead of just the one. Since the array already saw the drives as failed, I just went ahead and replaced both at the same time. I have backups, though, so I wouldn’t recommend this to everyone.

You can see the current state of the resync by watching the /proc filesystem. Specifically, the /proc/mdstat file.

[root@server root]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid6 sdf1[11] sdg1[10] sdh1[9] sdi1[8] sdb1[7] sda1[6]
      7813525504 blocks super 1.2 level 6, 512k chunk, algorithm 2 [6/5] [UU_UUU]
      [>....................]  recovery =  0.0% (148008/1953381376) finish=659.8min speed=49336K/sec
      bitmap: 3/15 pages [12KB], 65536KB chunk

unused devices: <none>

It can take a while. Each drive took about 12-14 hours to resync.

Once you’ve replace the last drive and the array has resynced (YAY!), you have a few more steps to follow. First, we need to tell the system that the array size has increased. Again, a simple process.

[root@server root]# mdadm --grow /dev/md0 --size=max

And once again, you’re resyncing, but your array has now grown to it’s new size.

I wasn’t quite complete, however. On my system, I have an LVM on top of the RAID array, so I needed to tell LVM that the array was larger so I could then allocate that space where needed. This was easily accomplished with one simple command.

[root@server root]# pvresize /dev/md0
  Physical volume "/dev/md0" changed
  1 physical volume(s) resized or updated / 0 physical volume(s) not resized

And poof, the PV was resized as was the associated VG. Lots more space available to allocate where necessary.