Headless Linux Testing Clients

As part of my day to day job, I’ve been working on a headless Linux client that can be transported from site to site to automate some network testing.  I can’t really go into detail on what’s being tested, or why, but I did want to write up a (hopefully) useful entry about headless client and some of the changes I made to the basic CentOS install to get everything to work.

First up was the issue of headless operation.  We’re using Cappuccino SlimPRO SP-625 units with the triple Gigabit Ethernet option.  They’re not bad little machines, though I do have a gripe with the back cover on them.  It doesn’t properly cover all of the ports on the back, leaving rather large holes where dust and dirt can get in.  What’s worse is that the power plug is not surrounded and held in place by the case, so I can foresee the board cracking at some point from the stress of the power cord…  But, for a sub-$800 machine, it’s not all that bad.

Anyway, on to the fun.  These machines will be transported to various locations where testing is to be performed.  On-site, there will be no keyboard, no mouse, and no monitor for them.  However, sometimes things go wrong and subtle adjustments may need to be made.  This means we need a way to get into the machine, locally, just in case there’s a problem with the network connection.  Luckily, there’s a pretty simple means of accessing a headless Linux machine without the need to lug around an extra monitor, keyboard, and mouse.  If you’ve ever worked on a switch or router, you’ll know where I’m going with this.

Most technician have access to a laptop, especially if they have to configure routers or switches.  Why not access a Linux box the same way?  Using the agetty command, you can.  A getty is a program that manages terminals within Unix.  Those terminals can be physical, like the local keyboard, or virtual, like a telnet or ssh session.  The agetty program is an alternative getty program that has some non-standard features such as baud rate detection, adaptive tty, and more.  In short, it’s perfect for direct serial, or even dial-in connections.

Setting this all up is a snap, too.  By default, CentOS (and most Linux distros) set up six gettys for virtual terminals.  These virtual terminals use yet another getty, mingetty, which is a minimalized getty program with only enough features for virtual terminals.  In order to provide serial access, we need to add a few lines to enable agettys on the serial ports.

But wait, what serial ports do we have?  Well, assuming they are enabled in the BIOS, we can see them using the dmesg and setserial commands.  The dmesg command prints out the current kernel message buffer to the screen.  This is usually the output from the boot sequence, but if your system has been up a while, it may contain more recent messages.  We can use dmesg to determine the serial interfaces like this :

[friz@test ~]$ dmesg | grep serial
serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A

As you can see from the output above, we have both a ttyS0 and a ttyS1 port available on this particular machine.  Now, we use setserial to make sure the system recognizes the ports:

[friz@test ~]$ sudo setserial -g /dev/ttyS[0-1]
/dev/ttyS0, UART: 16550A, Port: 0x03f8, IRQ: 4
/dev/ttyS1, UART: 16550A, Port: 0x02f8, IRQ: 3

The output is similar to dmesg, but setserial actually polls the port to get the necessary information, thereby ensuring that it’s active.  Also note, you will likely need to run this command as root to make it work.

Now that we know what serial ports we have, we just need to add them to the inittab and reload the init daemon.  Adding these to the inittab is pretty simple.  Your inittab will look something like this:

#
# inittab       This file describes how the INIT process should set up
#               the system in a certain run-level.
#
# Author:       Miquel van Smoorenburg, <miquels@drinkel.nl.mugnet.org>
#               Modified for RHS Linux by Marc Ewing and Donnie Barnes
#

# Default runlevel. The runlevels used by RHS are:
#   0 – halt (Do NOT set initdefault to this)
#   1 – Single user mode
#   2 – Multiuser, without NFS (The same as 3, if you do not have networking)
#   3 – Full multiuser mode
#   4 – unused
#   5 – X11
#   6 – reboot (Do NOT set initdefault to this)
#
id:3:initdefault:

# System initialization.
si::sysinit:/etc/rc.d/rc.sysinit

l0:0:wait:/etc/rc.d/rc 0
l1:1:wait:/etc/rc.d/rc 1
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6

# Trap CTRL-ALT-DELETE
ca::ctrlaltdel:/sbin/shutdown -t3 -r now

# When our UPS tells us power has failed, assume we have a few minutes
# of power left.  Schedule a shutdown for 2 minutes from now.
# This does, of course, assume you have powerd installed and your
# UPS connected and working correctly.
pf::powerfail:/sbin/shutdown -f -h +2 “Power Failure; System Shutting Down”

# If power was restored before the shutdown kicked in, cancel it.
pr:12345:powerokwait:/sbin/shutdown -c “Power Restored; Shutdown Cancelled”

# Run gettys in standard runlevels
1:2345:respawn:/sbin/mingetty tty1
2:2345:respawn:/sbin/mingetty tty2
3:2345:respawn:/sbin/mingetty tty3
4:2345:respawn:/sbin/mingetty tty4
5:2345:respawn:/sbin/mingetty tty5
6:2345:respawn:/sbin/mingetty tty6

# Run xdm in runlevel 5
x:5:respawn:/etc/X11/prefdm -nodaemon

Just add the following after the original gettys lines:

# Run agettys in standard runlevels
s0:2345:respawn:/sbin/agetty -L -f /etc/issue.serial 9600 ttyS0 vt100
s1:2345:respawn:/sbin/agetty -L -f /etc/issue.serial 9600 ttyS1 vt100

Let me explain quickly what the above means.  Each line is broken into multiple fields, separated by a colon.  At the very beginning of the line is an identifier, s0 and s1 in our case.  Next comes a list of the runlevels for which this program should be spawned.  Finally, the command to run is last.

The agetty command takes a number of arguments:

    • The -L switch disables carrier detect for the getty.
    • The next switch, -f, tells agetty to display the contents of a file before the login prompt, /etc/issue.serial in our case.
    • Next is the baud rate to use.  9600 bps is a good default value to use.  You can specify speeds up to 152,200 bps, but they may not work with all terminal programs.
    • Next up is the serial port, ttyS0 and ttyS1 in our example.
    • Finally, the terminal emulation to use.  VT100 is probably the most common, but you can use others.

Now that you’ve added the necessary lines, reload the init daemon via this command:

[friz@test ~]$ sudo /sbin/init q

At this point, you should be able to connect a serial cable to your Linux machine and access it via a program such as minicom, PuTTY, or hyperterminal.  And that’s all there is to it.

You can also redirect the kernel to output all console messages to the serial port as well.  This is accomplished by adding a switch to the kernel line in your /etc/grub.conf file like this:

# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/hdc3
#          initrd /initrd-version.img
#boot=/dev/hdc
default=0
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title CentOS (2.6.18-53.1.21.el5)
root (hd0,0)
kernel /vmlinuz-2.6.18-53.1.21.el5 ro root=LABEL=/ console=ttyS0,9600
initrd /initrd-2.6.18-53.1.21.el5.img

The necessary change is highlighted above.  The console switch tells the kernel that you want to re-direct the console output.  The first option is the serial port to re-direct to, and the second is the baud rate to use.

And now you have a headless Linux system!  These come in handy when you need a Linux machine for remote access, but you don’t want to deal with having a mouse, keyboard, and monitor handy to access the machine locally.

Instant Kernel-ification

 

Server downtime is the scourge of all administrators, sometimes to the extent of bypassing necessary security upgrades, all in the name of keeping machines online.  Thanks to an MIT graduate student, Jeffery Brian Arnold, keeping a machine online, and up to date with security patches, may be easier than ever.

Ksplice, as the project is called, is a small executable that allows an administrator the ability to patch security holes in the Linux kernel, without rebooting the system.  According to the Ksplice website :

“Ksplice allows system administrators to apply security patches to the Linux kernel without having to reboot. Ksplice takes as input a source code change in unified diff format and the kernel source code to be patched, and it applies the patch to the corresponding running kernel. The running kernel does not need to have been prepared in advance in any way.”

Of course, Ksplice is not a perfect silver bullet, some patches cannot be applied using Ksplice.  Specifically, any patch that require “semantic changes to data structures” cannot be applied to the running kernel.  A semantic change is a change “that would require existing instances of kernel data structures to be transformed.”

But that doesn’t mean that Ksplice isn’t useful.  Jeffery looked at 32 months of kernel security patches and found that 84% of them could be applied using Ksplice.  That’s sure to increase the uptime.

I have to wonder, though, what is so important that you need that much uptime.  Sure, it’s nice to have the system run all the time, but if you have something that is absolutely mission critical, that must run 24×7, regardless, won’t you have a backup or two?  Besides which, you generally want to test patches before applying them to such sensitive systems.

There are, of course, other uses for this technology.  As noted on the Ksplice website, you can also use Ksplice to “add debugging code to the kernel or to make any other code changes that do not modify data structure semantics.”  Jeffery has posted a paper detailing how the technology works.

Pretty neat technology.  I wonder if this will lead to zero downtime kernel updates direct from Linux vendors.  As it is now, you’ll need to locate and manually apply kernel patches using this tool.

 

Backups? Where?

It’s been a bit hectic, sorry for the long time between posting.

 

So, backups.  Backups are important, we all know that.  So how many people actually follow their own advice and back their data up?  Yeah, it’s a sad situation for desktops.  The server world is a little different, though, with literally tens, possibly hundreds of different backup utilities available.

 

My preferred backup tool of choice is the Advanced Maryland Automatic Network Disk Archiver, or AMANDA for short.  AMANDA has been around since before 1997 and has evolved into a pretty decent backup system.  Initially intended for single tape-based backups, options have been added recently to allow for tape spanning and disk-based backups as well.

Getting started with AMANDA can be a bit of a chore.  The hardest part, at least for me, was getting the tape backup machine running.  Once that was out of the way, the rest of it was pretty easy.  The config can be a little overwhelming if you don’t understand the options, but there are a lot of guides on the Internet to explain it.  In fact, the “tutorial” I originally used is located here.

Once it’s up and running, you’ll receive a daily email from Amanda letting you know how the previous nights backup went.  All of the various AMANDA utilities are command-line based.  There is no official GUI at all.  Of course, this causes a lot of people to shy away from the system.  But overall, once you get the hang of it, it’s pretty easy to use.

Recovery from backup is a pretty simple process.  On the machine you’re recovering, run the amrecover program.  You then use regular filesystem commands to locate the files you want to restore and add them to the restore list.  When you’ve added all the files, issue the extract command and it will restore all of the files you’ve chosen.  It’s works quite well, I’ve had to use it once or twice…  Lemme tell ya, the first time I had to restore from backups I was sweatin bullets..  After the first one worked flawlessly, subsequent restores were completed with a much lower stress level.  It’s great to know that there are backups available in the case of an emergency.

AMANDA is a great tool for backing up servers, but what about clients?  There is a Windows client as well that runs using Cygwin, a free open-source Linux-like environment for Windows.  Instructions for setting something like this up are located in the AMANDA documentation.  I haven’t tried this, but it doesn’t look too hard.  Other client backup options include remote NFS and SAMBA shares.

Overall, AMANDA is a great backup tool that has saved me a few times.  I definitely recommend checking it out.

Linux Software Raid

I had to replace a bad hard drive in a Linux box recently and I thought perhaps I’d detail the procedure I used.  This particular box uses software raid, so there are a few extra steps to getting the drive up and running.

Normally when a hard drive fails, you lose any data on it.  This is, of course, why we back things up.  In my case, I have two drives in a raid level 1 configuration.  There are a number of raid levels that dictate various states of redundancy (or lack thereof in the instance of level 0).  The raid levels are as follows (Copied from Wikipedia):

  • RAID 0: Striped Set
  • RAID 1: Mirrored Set
  • RAID 3/4: Striped with Dedicated Parity
  • RAID 5: Striped Set with Distributed Parity
  • RAID 6: Striped Set with Dual Distributed Parity

There are additional raid levels for nested raid as well as some non-standard raid levels.  For more information on those, see the Wikipedia article referenced above.

 

The hard drive in my case failed in kind of a weird way.  Only one of the partitions on the drive was malfunctioning.  Upon booting the server, however, the bios complained about the drive being bad.  So, better safe than sorry, I replaced the drive.

Raid level 1 is a mirrored raid.  As with most raid levels, the hard drives being raided should be identical.  It is possible to use different models and sizes in the same raid, but there are drawbacks such as a reduction in speed, possible increased failure rates, wasted space, etc.  Replacing a drive in a mirrored raid is pretty straightforward.  After identifying the problem drive, I physically removed the faulty drive and replaced it with a new one.

The secondary drive was the failed drive, so this replacement was pretty easy.  In the case of a primary drive failure, it’s easiest to move the secondary drive into the primary slot before replacing the failed drive.

Once the new drive has been installed, boot the system up and it should load up your favorite Linux distro.  The system should boot normally with a few errors regarding the degraded raid state.

After the system has booted, login to the system and use fdisk to partition the new drive.  Make sure you set the drive IDs back to Linux raid.  When finished, the partition table will look something like this :

   Device Boot      Start         End      Blocks   Id  System
/dev/hdb1   *           1          26      208813+  fd  Linux raid autodetect
/dev/hdb2              27        3850    30716280   fd  Linux raid autodetect
/dev/hdb3            3851        5125    10241437+  fd  Linux raid autodetect
/dev/hdb4            5126       19457   115121790    f  W95 Ext'd (LBA)
/dev/hdb5            5126        6400    10241406   fd  Linux raid autodetect
/dev/hdb6            6401        7037     5116671   fd  Linux raid autodetect
/dev/hdb7            7038        7164     1020096   82  Linux swap
/dev/hdb8            7165       19457    98743491   fd  Linux raid autodetect

Once the partitions have been set up, you need to format the drive with a filesystem.  This is a pretty painless process depending on your filesystem of choice.  I happen to be using ext3 as my filesystem, so I use the mke2fs program to format the drive.  To format an ext3 partition use the following command (This command, as well as the commands that follow, need to be run as root, so be sure to use sudo.) :

mke2fs -j /dev/hdb1

Once all of the drives have been formatted you can move on to creating the swap partition.  This is done using the mkswap program as follows :

mkswap /dev/hdb7

Once the swap drive has been formatted, activate it so the system can use it.  The swapon command achieves this goal :

swapon -a /dev/hdb7

And finally you can add the drives to the raid using mdadm.  mdadm is a single command with a plethora of uses.  It builds, monitors, and alters raid arrays.  To add a drive to the array use the following :

mdadm -a /dev/md1 /dev/hdb1

And that’s all there is to it.  If you’d like to watch the array rebuild itself, about as much fun as watching paint dry, you can do the following :

watch cat /proc/mdstat

And that’s all there is to it.  Software raid has come a long way and it’s quite stable these days.  I’ve been happily running it on my Linux machines for several years now.  It works well when hardware raid is not available or as a cheaper solution.  I’m quite happy with the performance and reliability of software raid and I definitely recommend it.

Linux Upgrades – Installation from a software raid

I recently had to upgrade a few machines with a newer version of Linux. Unfortunately, the CD-ROM drives in these machines were not functioning, so I decided to upgrade the system via a hard drive install. Hard drive installs are pretty simple and are part of the standard distro for Redhat. However, the machines I had to upgrade were all set up with software raid 1.

My initial thought was to merely put in the raid location where the install media was located. However, the installation program does not allow this and actually presents a list of acceptable locations.

So from here, I decided to choose one of the 2 raid drives and use that instead. The system graciously accepted this and launched the GUI installer to complete the process. All went smoothly until I reached the final step in the wizard, the actual install. At this point the installer crashed with a Python error. Upon inspection of the error, it appeared that the drive I was trying to use for the installation media was not available. Closer inspection revealed the truth, mdadm started up and activated all of the raid partitions on the system, making the partition I needed unavailable.

So what do I do now? I re-ran the installation and deleted the raid partition, being careful to leave the physical partitions in-tact. Again, the installer crashed at the same step. It seems that the installer scanned the entire hard drive for raid partitions whether they had valid mount points or not.

I finally solved the problem through a crude hack during the setup phase of the installation. I made sure to delete the raid partition once again, leaving the physical drives in-tact. I stepped through the entire process but stopped just before the final install step. At this point, I switched to the CLI console via Ctrl-Shift-F2. I created a small bash script that looked something like this :

#!/bin/bash

mdadm –stop /dev/md3

exec /bin/bash ./myscript.bash

I ran this script and switched back to the installer via Ctrl-F6. I proceeded with the installation and the installer happily installed the OS onto the drives. Once completed, I switched back to the CLI console, edited the new /etc/fstab file and added a mount point for the raid drive I used, and rebooted. The system came up without any issues and ran normally.

Just thought I’d share this with the rest of you, should you run into the same situation. It took me a while to figure out how to make the system do what I wanted. Just be sure that your install media is on a drive that the installer will NOT need to write files to.