Instant Kernel-ification

 

Server downtime is the scourge of all administrators, sometimes to the extent of bypassing necessary security upgrades, all in the name of keeping machines online.  Thanks to an MIT graduate student, Jeffery Brian Arnold, keeping a machine online, and up to date with security patches, may be easier than ever.

Ksplice, as the project is called, is a small executable that allows an administrator the ability to patch security holes in the Linux kernel, without rebooting the system.  According to the Ksplice website :

“Ksplice allows system administrators to apply security patches to the Linux kernel without having to reboot. Ksplice takes as input a source code change in unified diff format and the kernel source code to be patched, and it applies the patch to the corresponding running kernel. The running kernel does not need to have been prepared in advance in any way.”

Of course, Ksplice is not a perfect silver bullet, some patches cannot be applied using Ksplice.  Specifically, any patch that require “semantic changes to data structures” cannot be applied to the running kernel.  A semantic change is a change “that would require existing instances of kernel data structures to be transformed.”

But that doesn’t mean that Ksplice isn’t useful.  Jeffery looked at 32 months of kernel security patches and found that 84% of them could be applied using Ksplice.  That’s sure to increase the uptime.

I have to wonder, though, what is so important that you need that much uptime.  Sure, it’s nice to have the system run all the time, but if you have something that is absolutely mission critical, that must run 24×7, regardless, won’t you have a backup or two?  Besides which, you generally want to test patches before applying them to such sensitive systems.

There are, of course, other uses for this technology.  As noted on the Ksplice website, you can also use Ksplice to “add debugging code to the kernel or to make any other code changes that do not modify data structure semantics.”  Jeffery has posted a paper detailing how the technology works.

Pretty neat technology.  I wonder if this will lead to zero downtime kernel updates direct from Linux vendors.  As it is now, you’ll need to locate and manually apply kernel patches using this tool.

 

Backups? Where?

It’s been a bit hectic, sorry for the long time between posting.

 

So, backups.  Backups are important, we all know that.  So how many people actually follow their own advice and back their data up?  Yeah, it’s a sad situation for desktops.  The server world is a little different, though, with literally tens, possibly hundreds of different backup utilities available.

 

My preferred backup tool of choice is the Advanced Maryland Automatic Network Disk Archiver, or AMANDA for short.  AMANDA has been around since before 1997 and has evolved into a pretty decent backup system.  Initially intended for single tape-based backups, options have been added recently to allow for tape spanning and disk-based backups as well.

Getting started with AMANDA can be a bit of a chore.  The hardest part, at least for me, was getting the tape backup machine running.  Once that was out of the way, the rest of it was pretty easy.  The config can be a little overwhelming if you don’t understand the options, but there are a lot of guides on the Internet to explain it.  In fact, the “tutorial” I originally used is located here.

Once it’s up and running, you’ll receive a daily email from Amanda letting you know how the previous nights backup went.  All of the various AMANDA utilities are command-line based.  There is no official GUI at all.  Of course, this causes a lot of people to shy away from the system.  But overall, once you get the hang of it, it’s pretty easy to use.

Recovery from backup is a pretty simple process.  On the machine you’re recovering, run the amrecover program.  You then use regular filesystem commands to locate the files you want to restore and add them to the restore list.  When you’ve added all the files, issue the extract command and it will restore all of the files you’ve chosen.  It’s works quite well, I’ve had to use it once or twice…  Lemme tell ya, the first time I had to restore from backups I was sweatin bullets..  After the first one worked flawlessly, subsequent restores were completed with a much lower stress level.  It’s great to know that there are backups available in the case of an emergency.

AMANDA is a great tool for backing up servers, but what about clients?  There is a Windows client as well that runs using Cygwin, a free open-source Linux-like environment for Windows.  Instructions for setting something like this up are located in the AMANDA documentation.  I haven’t tried this, but it doesn’t look too hard.  Other client backup options include remote NFS and SAMBA shares.

Overall, AMANDA is a great backup tool that has saved me a few times.  I definitely recommend checking it out.

Linux Software Raid

I had to replace a bad hard drive in a Linux box recently and I thought perhaps I’d detail the procedure I used.  This particular box uses software raid, so there are a few extra steps to getting the drive up and running.

Normally when a hard drive fails, you lose any data on it.  This is, of course, why we back things up.  In my case, I have two drives in a raid level 1 configuration.  There are a number of raid levels that dictate various states of redundancy (or lack thereof in the instance of level 0).  The raid levels are as follows (Copied from Wikipedia):

  • RAID 0: Striped Set
  • RAID 1: Mirrored Set
  • RAID 3/4: Striped with Dedicated Parity
  • RAID 5: Striped Set with Distributed Parity
  • RAID 6: Striped Set with Dual Distributed Parity

There are additional raid levels for nested raid as well as some non-standard raid levels.  For more information on those, see the Wikipedia article referenced above.

 

The hard drive in my case failed in kind of a weird way.  Only one of the partitions on the drive was malfunctioning.  Upon booting the server, however, the bios complained about the drive being bad.  So, better safe than sorry, I replaced the drive.

Raid level 1 is a mirrored raid.  As with most raid levels, the hard drives being raided should be identical.  It is possible to use different models and sizes in the same raid, but there are drawbacks such as a reduction in speed, possible increased failure rates, wasted space, etc.  Replacing a drive in a mirrored raid is pretty straightforward.  After identifying the problem drive, I physically removed the faulty drive and replaced it with a new one.

The secondary drive was the failed drive, so this replacement was pretty easy.  In the case of a primary drive failure, it’s easiest to move the secondary drive into the primary slot before replacing the failed drive.

Once the new drive has been installed, boot the system up and it should load up your favorite Linux distro.  The system should boot normally with a few errors regarding the degraded raid state.

After the system has booted, login to the system and use fdisk to partition the new drive.  Make sure you set the drive IDs back to Linux raid.  When finished, the partition table will look something like this :

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          26      208813+  fd  Linux raid autodetect
/dev/hdb2              27        3850    30716280   fd  Linux raid autodetect
/dev/hdb3            3851        5125    10241437+  fd  Linux raid autodetect
/dev/hdb4            5126       19457   115121790    f  W95 Ext'd (LBA)
/dev/hdb5            5126        6400    10241406   fd  Linux raid autodetect
/dev/hdb6            6401        7037     5116671   fd  Linux raid autodetect
/dev/hdb7            7038        7164     1020096   82  Linux swap
/dev/hdb8            7165       19457    98743491   fd  Linux raid autodetect

Once the partitions have been set up, you need to format the drive with a filesystem.  This is a pretty painless process depending on your filesystem of choice.  I happen to be using ext3 as my filesystem, so I use the mke2fs program to format the drive.  To format an ext3 partition use the following command (This command, as well as the commands that follow, need to be run as root, so be sure to use sudo.) :

mke2fs -j /dev/hdb1

Once all of the drives have been formatted you can move on to creating the swap partition.  This is done using the mkswap program as follows :

mkswap /dev/hdb7

Once the swap drive has been formatted, activate it so the system can use it.  The swapon command achieves this goal :

swapon -a /dev/hdb7

And finally you can add the drives to the raid using mdadm.  mdadm is a single command with a plethora of uses.  It builds, monitors, and alters raid arrays.  To add a drive to the array use the following :

mdadm -a /dev/md1 /dev/hdb1

And that’s all there is to it.  If you’d like to watch the array rebuild itself, about as much fun as watching paint dry, you can do the following :

watch cat /proc/mdstat

And that’s all there is to it.  Software raid has come a long way and it’s quite stable these days.  I’ve been happily running it on my Linux machines for several years now.  It works well when hardware raid is not available or as a cheaper solution.  I’m quite happy with the performance and reliability of software raid and I definitely recommend it.

Linux Upgrades – Installation from a software raid

I recently had to upgrade a few machines with a newer version of Linux. Unfortunately, the CD-ROM drives in these machines were not functioning, so I decided to upgrade the system via a hard drive install. Hard drive installs are pretty simple and are part of the standard distro for Redhat. However, the machines I had to upgrade were all set up with software raid 1.

My initial thought was to merely put in the raid location where the install media was located. However, the installation program does not allow this and actually presents a list of acceptable locations.

So from here, I decided to choose one of the 2 raid drives and use that instead. The system graciously accepted this and launched the GUI installer to complete the process. All went smoothly until I reached the final step in the wizard, the actual install. At this point the installer crashed with a Python error. Upon inspection of the error, it appeared that the drive I was trying to use for the installation media was not available. Closer inspection revealed the truth, mdadm started up and activated all of the raid partitions on the system, making the partition I needed unavailable.

So what do I do now? I re-ran the installation and deleted the raid partition, being careful to leave the physical partitions in-tact. Again, the installer crashed at the same step. It seems that the installer scanned the entire hard drive for raid partitions whether they had valid mount points or not.

I finally solved the problem through a crude hack during the setup phase of the installation. I made sure to delete the raid partition once again, leaving the physical drives in-tact. I stepped through the entire process but stopped just before the final install step. At this point, I switched to the CLI console via Ctrl-Shift-F2. I created a small bash script that looked something like this :

#!/bin/bash

mdadm –stop /dev/md3

exec /bin/bash ./myscript.bash

I ran this script and switched back to the installer via Ctrl-F6. I proceeded with the installation and the installer happily installed the OS onto the drives. Once completed, I switched back to the CLI console, edited the new /etc/fstab file and added a mount point for the raid drive I used, and rebooted. The system came up without any issues and ran normally.

Just thought I’d share this with the rest of you, should you run into the same situation. It took me a while to figure out how to make the system do what I wanted. Just be sure that your install media is on a drive that the installer will NOT need to write files to.