Data Reliance

As we become a more technologically evolved society, our reliance on data increases.  Email, web access, electronic documents, bank accounts, you name it.  The loss of any one of these can have devastating consequences, from lost productivity to the loss of one's home, health, or even, in extreme cases, life.

Unfortunately, I got to experience this firsthand.  At the beginning of the week, the shared system I access at work failed.  Initially it seemed to be merely a permissions issue; we had simply lost access to the files for a short time.  However, as time passed, we learned that the reality of the situation was much worse.

Like most companies, we rely heavily on shared drives for collaboration and storage.  Of course, this means that the majority of our daily work lives on those shared drives, making them pretty important.  Someone noticed this at some point and decided it was a really good idea to back them up on a regular basis.  Awesome, so we're covered, right?  Well, yeah… sort of, but not really.

Backups are a wonderful invention.  They ensure that you don't lose any data in the event of a critical failure.  Or, at the very least, they minimize the amount of data you lose.  Backups don't run continuously, so anything changed since the last run is still at risk.  Regardless, they do keep fairly up-to-date records of what was on the drive.

To make matters even better, we have a backup procedure that includes keeping the backups off-site.  Off-site storage ensures that we still have backups in the event of something like a fire or a flood.  This usually means there's a bit of time between a failure and a restore because someone has to go get those backups, but that's OK; it's all in the name of disaster recovery.

So here we are with a physical drive failure on our shared drive.  Well, that's not so bad, you'd think; it's a RAID array, right?  Well, no.  Apparently not.  Why don't we use RAID arrays?  Not a clue, but it doesn't much matter right now: all my work from the past year is inaccessible.  What am I supposed to do today?

No big deal.  I'll work on some little projects that don't need shared drive access while they fix the drive and restore our files.  It should only take a few hours; it'll be finished by tomorrow.  Boy, was I wrong…

Tomorrow comes and goes, as does the next day, and the next.  Little details leak out as time goes on.  First we have a snafu with the wrong backup tapes being retrieved.  Easily fixed; they go get the correct ones.  Next, we receive reports of intermittent file corruption, but it's nothing to worry about, just a few files here and there.  Of course, we still have no access to anything, so we can't verify any of these reports.  Finally, they determine that the access permissions were corrupted and need to be fixed.  Once that's done, we regain access to our files.

A full work week passes before we finally have drive access back.  Things should go back to normal now; we'll just get on with our day-to-day business.  *click*  Hrm…  Can't open the file, it's corrupt.  Oh well, I'll just have to rewrite that one…  It's OK though, the corruption was limited.  *click*  That's interesting…  all the files in this directory are missing…  Maybe they forgot to restore that directory…  I'll have to let them know…  *click*  Another corrupt file…  Man, my work is piling up…

Dozens of clicks later, the full reality hits me…  I have lost hundreds of hours of work.  Poof, gone.  Maybe, just maybe, they can do something to restore it, but I don't hold much hope…  How could something like this happen?  How could I just lose all of that work?  We had backups!  We stored them off-site!

So, let this be a lesson to you: backups are not a perfect solution.  I don't know all the details, but I can guess what happened.  Tape backup is pretty reliable; I've used it myself for years.  I've since graduated to hard drive backup, but I still use tapes as a secondary backup.  There are problems with tape, though.  Tapes tend to stretch over time, which degrades them and makes them unreliable.  Granted, they do last a while, but it can be difficult to tell when a tape has gone bad.  Couple that with a lack of RAID on the server and you have a recipe for disaster.
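The missing RAID matters because RAID lets a server ride out a single drive failure without anyone touching the backups at all.  The post doesn't say which RAID level would have fit; as one illustration, here is a minimal Python sketch of the XOR parity idea behind RAID 5, assuming a single stripe of equal-sized blocks on three hypothetical data drives plus one parity block.  Real controllers spread parity across all the disks and work on far larger blocks, and `make_parity` and `rebuild_missing` are illustrative names only.

```python
import operator
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte blocks together, one byte position at a time."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("every block in a stripe must be the same size")
    # zip(*blocks) yields one tuple of ints per byte position across all blocks
    return bytes(reduce(operator.xor, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """The parity block is simply the XOR of every data block in the stripe."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks: list[bytes], parity: bytes) -> bytes:
    """XOR-ing the survivors with the parity cancels them out,
    leaving exactly the one block that was lost."""
    return xor_blocks(surviving_blocks + [parity])


# One stripe spread across three hypothetical data drives (9 bytes each).
drives = [b"report-Q1", b"budget-v2", b"notes-wk3"]
parity = make_parity(drives)

# Simulate drive 1 failing and rebuild its block from the other two plus parity.
lost_index = 1
survivors = [blk for i, blk in enumerate(drives) if i != lost_index]
recovered = rebuild_missing(survivors, parity)
print(recovered)  # prints b'budget-v2'
```

Lose a second drive before the first one is replaced and the XOR no longer has enough information to rebuild either block, which is why RAID complements backups rather than replacing them.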

In addition to all of this, I'd be willing to bet that they did not test the backups on a regular basis.  Random checks of data from backups are an integral part of the backup process.  Sure, they seem pointless now, but imagine how pointless it'll feel when, after hours of restoring files, you find that they're all corrupt.  Random checks aren't so bad when you think of it that way…
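Here's roughly what a random check could look like.  This is a minimal Python sketch, not anything our IT department (or AMANDA) actually runs: it assumes a backup has already been test-restored into a scratch directory that mirrors the original's layout, and `spot_check` and `sha256_of` are hypothetical names.  It picks a random sample of files and compares the SHA-256 hash of each original against its restored copy.

```python
import hashlib
import random
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def spot_check(source_dir: Path, restored_dir: Path, sample_size: int = 20,
               seed: Optional[int] = None) -> dict[str, list[str]]:
    """Compare a random sample of original files with their restored copies.

    Assumes the backup was restored into restored_dir using the same
    relative layout as source_dir.
    """
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    rng = random.Random(seed)
    # Never ask for more samples than there are files.
    sample = rng.sample(files, min(sample_size, len(files)))

    report: dict[str, list[str]] = {"ok": [], "corrupt": [], "missing": []}
    for src in sample:
        rel = src.relative_to(source_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            report["missing"].append(str(rel))
        elif sha256_of(src) != sha256_of(restored):
            report["corrupt"].append(str(rel))
        else:
            report["ok"].append(str(rel))
    return report
```

Something like `spot_check(Path("/srv/shared"), Path("/tmp/restore-test"), sample_size=25)` (both paths made up) would flag files that came back missing or altered.  One caveat: a file edited after the backup ran will also show up as "corrupt," so a sturdier check compares against checksums recorded when the backup was made.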

So I’ve lost a ton of data, and a ton of time.  Sometimes, life just sucks.  Moving forward, I’ll make my own personal backup of files I deem important, and I’ll check them on a regular basis too…
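For a personal backup like that, one simple approach is to record a checksum of each file at the moment it's copied, so later checks compare against what was actually backed up rather than against a file that may have changed since.  A minimal Python sketch under those assumptions; `back_up`, `verify`, and the `MANIFEST.json` file name are hypothetical choices for this sketch, not part of any tool mentioned here.

```python
import hashlib
import json
import shutil
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(files: list[Path], base: Path, dest: Path) -> Path:
    """Copy the chosen files under dest (keeping paths relative to base)
    and write a manifest of their SHA-256 digests taken at copy time."""
    manifest = {}
    for src in files:
        rel = src.relative_to(base)
        target = dest / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        expected = file_sha256(src)
        shutil.copy2(src, target)  # copy2 also preserves modification times
        # Check the copy right away instead of trusting it blindly.
        if file_sha256(target) != expected:
            raise OSError(f"copy of {rel} does not match the original")
        manifest[rel.as_posix()] = expected
    manifest_path = dest / "MANIFEST.json"
    manifest_path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest_path


def verify(dest: Path) -> list[str]:
    """Return the relative paths of backed-up files that are missing or
    no longer match the digest recorded in the manifest."""
    manifest = json.loads((dest / "MANIFEST.json").read_text())
    bad = []
    for rel, expected in manifest.items():
        copy = dest / rel
        if not copy.is_file() or file_sha256(copy) != expected:
            bad.append(rel)
    return sorted(bad)
```

Running `verify()` on a schedule turns "I'll check them on a regular basis" into something that actually happens; an empty list means every backed-up file still matches what was copied.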

Backups? Where?

It's been a bit hectic; sorry for the long gap between posts.

So, backups.  Backups are important; we all know that.  So how many people actually follow their own advice and back up their data?  Yeah, it's a sad situation for desktops.  The server world is a little different, though, with dozens, possibly hundreds, of different backup utilities available.

My backup tool of choice is the Advanced Maryland Automatic Network Disk Archiver, or AMANDA for short.  AMANDA has been around since before 1997 and has evolved into a pretty decent backup system.  Although it was originally designed for backups that fit on a single tape, options have been added recently to allow for tape spanning and disk-based backups as well.

Getting started with AMANDA can be a bit of a chore.  The hardest part, at least for me, was getting the tape backup machine running.  Once that was out of the way, the rest of it was pretty easy.  The config can be a little overwhelming if you don’t understand the options, but there are a lot of guides on the Internet to explain it.  In fact, the “tutorial” I originally used is located here.

Once it's up and running, you'll receive a daily email from AMANDA letting you know how the previous night's backup went.  All of the various AMANDA utilities are command-line based; there is no official GUI at all.  Of course, this causes a lot of people to shy away from the system.  But overall, once you get the hang of it, it's pretty easy to use.

Recovery from backup is a pretty simple process.  On the machine you're recovering, run the amrecover program.  You then use familiar filesystem-style commands such as cd and ls to locate the files you want to restore, and add them to the restore list with the add command.  When you've added all the files, issue the extract command and it will restore all of the files you've chosen.  It works quite well; I've had to use it once or twice…  Lemme tell ya, the first time I had to restore from backups I was sweatin' bullets…  After the first one worked flawlessly, subsequent restores were completed with a much lower stress level.  It's great to know that there are backups available in case of an emergency.

AMANDA is a great tool for backing up servers, but what about clients?  There is also a Windows client that runs under Cygwin, a free, open-source Linux-like environment for Windows.  Instructions for setting this up are in the AMANDA documentation.  I haven't tried it myself, but it doesn't look too hard.  Other options for backing up clients include remote NFS and Samba shares.

Overall, AMANDA is a great backup tool that has saved me a few times.  I definitely recommend checking it out.