Journey of a DevOps Engineer

“The advantage of a bad memory is that one enjoys several times the same good things for the first time.”

Friedrich Nietzsche

This post first appeared on Redhat’s Enable Sysadmin community. You can find the post here.

Memory is a funny construct. What we remember is often not what happened, or even the order it happened in. One of the earliest memories I have about technology was a trip my father and I took. We drove to Manhattan, on a mission to buy a Christmas gift for my mother. This crazy new device, The Itty Bitty Book Light, had come out recently and, being an avid reader, my mother had to have one.

We drove to Manhattan, on a mission to buy a Christmas gift for my mother. This crazy new device, The Itty Bitty Book Light, had come out recently and, being an avid reader, my mother had to have one.

After navigating the streets of Manhattan and finding a parking spot, we walked down the block to what turned out to be a large bookstore. You’ve seen bookstores like this on TV and in the movies. It looks small from the outside, but once you walk in, the store is endless. Walls of books, sliding ladders, tables with books piled high. It was pretty incredible, especially for someone like me who loves reading.

But in this particular store, there was something curious going on. One of the tables was surrounded by adults, awed and whispering between one another. Unsure of what was going on, we approached. After pushing through the crowd, what I saw drew me in immediately. On the table, surrounded by books, was a small grey box, the Apple Macintosh. It was on, but no one dared approach it. No one, that is, except me. I was drawn like a magnet, immediately grokking that the small puck like device moved the pointer on the screen. Adults gasped and murmured, but I ignored them all and delved into the unknown. The year was, I believe, 1984.

Somewhere around the same time, though likely a couple years before, my father brought home a TI-99/4A computer. From what I remember, the TI had just been released, so this has to be somewhere around 1982. This machine served as the catalyst for my love of computer technology and was one of the first machines I ever cut code on.

My father tells a story about when I first started programming. He had been working on an inventory database, written from scratch, that he built for his job. I would spend hours looking over his shoulder, absorbing everything I saw. One time, he finished coding, saved the code, and started typing the command to run his code (“RUN”). Accordingly to him, I stopped him with a comment that his code was going to fail. Ignoring me as I was only 5 or 6 at the time (according to his recollection), he ran the code and, as predicted, it failed. He looked at me with awe and I merely looked back and replied “GOSUB but no RETURN”.

At this point I was off and running. Over the years I got my hands on a few other systems including the Timex Sinclair, Commodore 64 and 128, TRS-80, IBM PS/2, and finally, my very own custom built PC from Gateway 2000. Since them I’ve built hundreds of machines and spend most of my time on laptops now.

The Commodore 64 helped introduce me to the online world of Bulletin Board Systems. I spent many hours, much to the chagrin of my father and the phone bill, calling into various BBS systems in many different states. Initially I was just there to play the various door games that were available, but eventually discovered the troves of software available to download. Trading software become a big draw until I eventually stumbled upon the Usenet groups that some boards had available. I spent a lot of time reading, replying, and even ended up in more than one flame war.

My first introduction to Unix based operating systems was in college when I encountered an IBM AIX mainframe as well as a VAX. I also had access to a shell account at the local telephone company turned internet service provider. The ISP I was using helpfully sent out an email to all subscribers about the S.A.T.A.N. toolkit and how accounts found with the software in their home directories would be immediately banned. Being curious, I immediately responded looking for more information. And while my account was never banned, I did learn a lot.

At the same time, my father and I started our own BBS which grew into a local Internet Service Provider offering dial-up services. That company still exists today, though the dial-up service died off several years ago. This introduced me to networks and all the engineering that comes along with it.

From here my journey takes a bit of a turn. Since I was a kid, I’ve wanted to build video games. I’ve read a lot of books on the subject, talked to various developers (including my idol, John Carmack) and written a lot of code. I actually wrote a pacman clone, of sorts, on the Commodore 64 when I was younger. But, alas, I think I was born on the wrong coast. All of the game companies, at the time, were located on the west coast and I couldn’t find a way to get there. So, I redirected my efforts a bit and focused on the technology I could get my hands on.

Running a BBS, and later an ISP, was a great introduction into the world of networking. After working a few standard school-age jobs (fast food, restaurants, etc), I finally found a job doing tech support for an ISP. I paid attention, asked questions, and learned everything I could. I took initiative where I could, even writing a cli-based ticketing system with an mSQL database backing it.

This initiative paid of as I was moved later to the NOC and finally to Network Engineering. I lead the way, learning everything I could about ATM and helping to design and build the standard ATM-based node design used by the company for over a decade. I took over development of the in-house monitoring system, eventually rewriting it in KSH and, later, Perl. I also had a hand in development of a more modern ticketing system written in Perl that is still in use today.

I spent 20+ years as a network engineer, taking time along the way to ensure that we had Linux systems available for the various scripting and monitoring needed to ensure the network performed as it should. I’ve spent many hours writing code in shell, expect, perl, and other languages to automate updates and monitoring. Eventually, I found myself looking for a new role and a host of skills ranging from network and systems administration to coding and security.

About this time, DevOps was quickly becoming the next new thing. Initially I rejected the idea, solely responding to the “Development” and “Operations” tags that make up the name. Over time, however, I came to realize that DevOps is yet another fancy name for what I’ve been doing for decades, albeit with a few twists here and there. And so, I took a role as a DevOps Engineer.

DevOps is a fun discipline, mixing in technologies from across the spectrum in an effort to automate away everything you can. Let the machine do the work for you and spent your time on more interesting projects like building more automation. And with the advent of containers and orchestration, there’s more to do than ever.

These days I find myself leading a team of DevOps engineers, both guiding the path we take as we implement new technology and automate existing processes. I also spend a lot of time learning about new technologies both on my own and for my day job. My trajectory seems to be changing slightly as I head towards the world of the SRE. Sort of like DevOps, but a bit heavier on the development side of things.

Life throws curves and sometimes you’re not able to move in the direction you want. But if you keep at it, there’s something out there for everyone. I still love learning and playing with technology. And who knows, maybe I’ll end up writing games at some point. There’s plenty of time for another new hobby.

Does it do the thing?

Back in 2012 I gave a talk at Derbycon 2.0. This was my first infosec talk and I was a little nervous, to say the least. Anyway, I described a system I wanted to write that handled distributed baseline scanning.

After a lot of starts and stops, I finished a basic 1.0 version in 2014. It’s still quite rough and I’ve since been working, intermittently, on making the system more robust and solid. I’ve been working on a python replacement for the GUI as well, instead of the current PHP one. The repository is located here, if you’re interested in taking a look.

Why am I telling you all of this? Well, as part of the updates I’m making, I wanted to do things the “right way” and make sure I have unit testing in place before I start making additional changes to the code. Problem is, while I learned about unit testing, I’ve never really implemented it in any meaningful way, so this is a bit new to me.

So why unit testing? Well, the hypothesis is that by creating tests that check every line of code, you ensure that the code is working as expected. Thus, if the tests pass, then the code should be solid and bug free. In reality, this is rarely the case. Tests can be just as flawed as any other code. Additionally, you may miss testing certain corner cases and miss potential bugs. In the end, the general consensus is that unit testing is a complicated religious argument.

Let’s assume that we want to unit test anyway and move on to the actual testing bits, shall we? We’ll start with a contrived example to make things easier. Assume we have the following code in a file called


def add(value1, value2):
    return value1 + value2

Simple enough, just a simple function to return the value of two numbers added together. Let’s create some test cases, shall we?


from mytestcode import add

class TestAdd(object):
    def test_add(self):
        assert add(1,1) == 2

    def test_add_fail(self):
        assert add(1,1) != 3

What we have here are two simple test cases. First, we test to make sure that if we call the add function with two values, 1 and 1, we get a 2 as a return value. Second, we test that providing the same values as input does not return a 3. Simple, right? But have we really tested all of the corner cases? What happens if we feed the function a negative? How about a non-numeric value? Are there cases where we can cause an exception?

To be fair, the original function is poorly written and is merely being used as a simple example. This is the problem with contrived examples, of course. They miss important details, often simply things too much, and can lead to beginners making big mistakes when using them as teaching tools. So please, be aware, the above code really isn’t very good code. It’s intended to be simple to understand.

Let’s take a look at some “real” code directly from my distributed scanner project. This particular code is something I found on Stack Overflow when I was looking for a way to identify whether a process was still running or not.


import errno
import os
import sys

def pid_exists(pid):
    """Check whether pid exists in the current process table.
    UNIX only.
    if pid < 0:
        return False
    if pid == 0:
        # According to "man 2 kill" PID 0 refers to every process
        # in the process group of the calling process.
        # On certain systems 0 is a valid PID but we have no way
        # to know that in a portable fashion.
        raise ValueError('invalid PID 0')
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # ESRCH == No such process
            return False
        elif err.errno == errno.EPERM:
            # EPERM clearly means there's a process to deny access to
            return True
            # According to "man 2 kill" possible error values are
            # (EINVAL, EPERM, ESRCH)
        return True

Testing this code should be relatively straightforward, with the exception of the os.kill call. For that, we’ll need to delve into mock objects. Let’s tackle the simple cases first:

#!/usr/bin/env python

import pytest

from libs.funcs import pid_exists

class TestFuncs(object):
    def test_pid_negative(self):
        assert pid_exists(-1) == False

    def test_pid_zero(self):
        with pytest.raises(ValueError) as e_info:

    def test_pid_typeerror(self):
        with pytest.raises(TypeError):
        with pytest.raises(TypeError):
        with pytest.raises(TypeError):

That’s relatively simple. We verify that False is returned for a negative PID and a ValueError is returned for a PID of zero. We also test that a TypeError is returned if we don’t provide an integer value. What’s left is handling a valid PID and testing that it returns True for a running process and False otherwise. In order to test the rest, we could go through a lot of elaborate setup to start a process, get the PID, and then test our code, but there’s a lot that can go wrong there. Additionally, we’re looking to test our logic and not the entirety of another module. So, what we really want is a way to provide an arbitrary return value for a given call. Enter the mock module.

The mock module is part of the unittest framework in python. Essentially, the mock module allows you to identify a call or an object that you want to create a fake version of, and then provide the behavior you’re expecting that mocked version to have. So, for instance, you can mock a function call and simply provide the return value you’re looking for instead of having to call the function directly. This functionality allows you to precisely test your logic versus doing a deeper integration test.

To finish up our testing code for the pid_exists() function, we want to mock the os.kill() function and have it return specific values so we can check the various branches of code we have.

    def test_pid_exists(self, oskillobj):
        oskillobj.return_value = None
        assert pid_exists(100) == True

    def test_pid_does_not_exist(self, oskillobj):
        oskillobj.side_effect = OSError(errno.ESRCH, 'No such process')
        assert pid_exists(1234) == False

    def test_pid_no_permissions(self, oskillobj):
        oskillobj.side_effect = OSError(errno.EPERM, 'Operation not permitted')
        assert pid_exists(1234) == True

    def test_pid_invalid(self, oskillobj):
        oskillobj.side_effect = OSError(errno.EINVAL, 'Invalid argument')
        with pytest.raises(OSError):

    def test_pid_os_typeerror(self, oskillobj):
        oskillobj.side_effect = TypeError('an integer is required (got type str)')
        with pytest.raises(TypeError):

The above code tests all of the branching available in the rest of the code, verifying the logic we’ve written. The code should be pretty straightforward. The return_value attribute of a mock object directly defines what we want the mocked function to recall while the side_effect attribute allows us to throw an exception in response to the function call. With those two features of a mocked object, we’re able to successfully test the rest of the cases we need.

This little journey to learn how to write unit tests has been fun and informative. I just need to finish up the rest of the code, striving to hit as close to 100% coverage as I can while keeping the test cases reasonable. It’s taken a while to get going, but the more code I’ve been writing, the faster and more accurate I’m getting. As they say, “practice makes perfect,” though I’d settle with functionally complete and relatively bug-free.

One final word of caution. I’m a sole developer working on this code, so I’m the only one around to write test cases. In a larger shop, the originator of the logic should not be the one writing the test cases. The reason for this is that the original coder typically knows their code quite well and has expectations regarding how the code will be used. For instance, I’m expecting that anyone calling the add() function I wrote above to only supply numbers and I haven’t added any sort of type checking or input validation. As a result, I avoided adding test cases that supply invalid inputs, knowing that would fail. Someone else writing the test cases would likely have provided a number of different inputs and found that input validation was missing. So if you’re in a larger shop, do yourself a favor and have someone else write your test cases. And to ensure they provide robust test cases, only provide the function prototypes and not the full function definitions.

Fixing the Serendipity XMLRPC plugin

A while ago I purchased a copy of BlogPress for my iDevices.. It’s pretty full-featured, and seems to work pretty well. Problem was, I couldn’t get it to work with my Serendipity-based blog. Oh well, a wasted purchase.

But not so fast! Every once in a while I go back and search for a possible solution. This past week I finally hit paydirt. I came across this post on the s9y forums.

This explained why BlogPress was crashing when I used it. In short, it was expecting to see a categoryName tag in the resulting XML from the Serendipity XMLRPC plugin. Serendipity, however, used description instead, likely because Serendipity has better support for the MetaWeblog API.

Fortunately, fixing this problem is very straightforward. All you really need to do is implement both APIs and return all of the necessary data for both APIs at the same time. To fix this particular problem, it’s a single line addition to the file located in $S9YHOME/plugins/serendipity_event_xmlrpc. That addition is as follows :

if ($cat['categoryid']) $xml_entries_vals[] = new XML_RPC_Value(
      'description'   => new XML_RPC_Value($cat['category_name'], 'string'),
      // XenoPhage: Add 'categoryName' to support mobile publishing (Thanks PigsLipstick)
      'categoryName'  => new XML_RPC_Value($cat['category_name'], 'string'),
      'htmlUrl'       => new XML_RPC_Value(serendipity_categoryURL($cat, 'serendipityHTTPPath'), 'string'),
      'rssUrl'        => new XML_RPC_Value(serendipity_feedCategoryURL($cat, 'serendipityHTTPPath'), 'string')

And poof, you now have the proper category support for Movable Type.

SQL Query Conundrum…

I have a brain teaser for ya.. I’m looking for a way to solve a SQL problem efficiently, specifically using MySQL. The goal is to get a count of the number of unique rows returned for a complex query. It’s actually for a pagination system so I can determine the limits necessary to efficiently query the database for the right amount of data rather than return everything and try to brute force it.

Let’s say I have three tables as follows :

mysql> describe person;
| Field | Type             | Null | Key | Default | Extra |
| id    | int(10) unsigned | NO   | PRI | NULL    |       |
| first | char(15)         | YES  |     | NULL    |       |
| last  | char(15)         | YES  |     | NULL    |       |
3 rows in set (0.02 sec)

mysql> describe interests;
| Field    | Type             | Null | Key | Default | Extra |
| id       | int(10) unsigned | NO   | PRI | NULL    |       |
| interest | char(15)         | YES  |     | NULL    |       |
2 rows in set (0.00 sec)

mysql> describe interest_link;
| Field       | Type             | Null | Key | Default | Extra |
| person_id   | int(10) unsigned | NO   |     | NULL    |       |
| interest_id | int(10) unsigned | NO   |     | NULL    |       |
2 rows in set (0.00 sec)

Simple enough. I’m mapping interests to people. I’ve entered data into these tables as follows :

mysql> select * from person;
| id | first | last     |
|  1 | John  | Doe      |
|  2 | Bob   | Jones    |
|  3 | Joe   | Smith    |
3 rows in set (0.00 sec)

mysql> select * from interests;
| id | interest  |
|  1 | Computers |
|  2 | Music     |
|  3 | Food      |
|  4 | Beer      |
|  5 | Gaming    |
5 rows in set (0.00 sec)

mysql> select * from interest_link;
| person_id | interest_id |
|         1 |           1 |
|         1 |           2 |
|         1 |           4 |
|         2 |           1 |
|         2 |           5 |
|         2 |           4 |
|         3 |           3 |
|         3 |           2 |
|         3 |           4 |
9 rows in set (0.00 sec)

So far, so good. Now, I want to do a search to find which users are interested in music. Simple enough search, I’d do this with a simple select statement as follows :

mysql> select * from person as p left join interest_link as il on il.person_id = where interest_id = 2;
| id | first | last   | person_id | interest_id |
|  1 | John  | Doe    |         1 |           2 |
|  3 | Joe   | Smith  |         3 |           2 |
2 rows in set (0.00 sec)

But what if I want to find out who’s interested in music *and* beer?

mysql> select * from person as p left join interest_link as il on il.person_id = where interest_id in (2,4);
| id | first | last     | person_id | interest_id |
|  1 | John  | Doe      |         1 |           2 |
|  1 | John  | Doe      |         1 |           4 |
|  2 | Bob   | Jones    |         2 |           4 |
|  3 | Joe   | Smith    |         3 |           2 |
|  3 | Joe   | Smith    |         3 |           4 |
5 rows in set (0.00 sec)

That’s a problem, now I have 5 rows.. How do I make this a unique list? Well, I’m merely interested in names and ids, so I can do this :

mysql> select, p.first, p.last from person as p left join interest_link as il on il.person_id = where interest_id in (2,4);
| id | first | last     |
|  1 | John  | Doe      |
|  1 | John  | Doe      |
|  2 | Bob   | Jones    |
|  3 | Joe   | Smith    |
|  3 | Joe   | Smith    |
5 rows in set (0.00 sec)

but that’s still 5 rows.. so what now?

mysql> select distinct, p.first, p.last from person as p left join interest_link as il on il.person_id = where interest_id in (2,4);
| id | first | last     |
|  1 | John  | Doe      |
|  2 | Bob   | Jones    |
|  3 | Joe   | Smith    |
3 rows in set (0.00 sec)

Aha! perfect. That’s what I need.. almost. For this particular application, I want to paginate, so I need a total number of matching rows so I can properly identify the limits as well as the upper bound on page numbers. So, I’ll just replace the specific field names with a count(*) :

mysql> select distinct count(*) from person as p left join interest_link as il on il.person_id = where interest_id in (2,4);
| count(*) |
|        5 |
1 row in set (0.00 sec)

And here is where I’m stuck. I need the total count of DISTINCT names, not the total number of rows returned. I tried a GROUP BY, but that didn’t help much :

mysql> select count(*) from person as p left join interest_link as il on il.person_id = where interest_id in (2,4) group by;
| count(*) |
|        2 |
|        1 |
|        2 |
3 rows in set (0.00 sec)

Sure, I get 3 rows, but what I’m looking for here is a single row with the total number of items. … So, what if I count the number of returned rows! :

mysql> select count(*) from (select count(*) from person as p left join interest_link as il on il.person_id = where interest_id in (2,4) group by as foo;
| count(*) |
|        3 |
1 row in set (0.00 sec)

BUT… at what cost? This seems like a rather complex query that might break down, significantly, when there’s a lot of data.. And the examples above are rather simplistic. In reality, we’re talking about more fields and more tables, so the simpler query gets a little complex to begin with. I’m open to ideas on how to do this properly via SQL. Yes, I am aware of indexing and how that speeds things up. I use indexing, I just eliminated it from the above example to simplify things. I’m open to ideas on how to do this properly via SQL.

I can simply return all the rows with the distinct clause, count them programmatically, and then proceed with the rest of the program, but depending on the selections made by the user, there could be a significant amount of data returned. I’m worried about both memory exhaustion on the part of the scripting language, as well as the processing and transmission time required to pass all of that data back to the program from the SQL database. Besides, this is the sort of problem that SQL was designed to solve.

I don’t think this is a unique problem, so someone out there has a solution. Perhaps the subselect *is* the better solution, but I don’t think so. I’m open to ideas. You can leave a comment here, or hit me up on twitter.


phpTodo … In Fedora!!

Apparently I’m always the last to know… But.. I found out today that phpTodo, the todo list manager I wrote (and continue to write) has been included in Fedora. In fact, it seems it’s been in there since Fedora 7. It’s not in the main distribution, nor should it be, but apparently it’s a maintained package. Thanks, Marc!

Honestly, I’m truly honored. I wrote this on a whim and it has served me well. I use it every day! And since writing it, I’ve had a handful of people make suggestions and offer patches. I think it’s been pretty successful for a small project.

So, how about an update? Well, I’ve been working on phpTodo in my spare time, which, unfortunately, has been relatively lacking as of late. I have been able to add in a number of fixes and new features, however. The biggest change in the next release will be the removal of categories in favor of having tags. In using phpTodo over the years, I’ve noticed a number of times where I’d like to be able to put an item in multiple categories, and display multiple categories at once. While this may have been possible with categories as they were implemented, I think tags works a bit better. I’ve borrowed an idea from the Serendipity blogging platform to implement tags in a user-friendly manner, so I think the implementation works pretty well. I still have some more work to tie it all together, but it is coming along.

Another change is the addition of the Prototype and Scriptaculous javascript frameworks. There are a few reasons I decided to go this route. First and foremost, it significantly reduces the amount of work necessary to perform cross-platform javascript operations. To date, I’ve used relatively simple javascript functions, mostly for front-line input validation, but with the addition of tags, I wanted to move into some more advanced techniques. I’m striving to keep it simple and not overdo it, so don’t worry.

And, of course, there are the various bug fixes that need to be added. Overall, I’m excited about the next release of phpTodo. I don’t have a timetable as of now, but I’m hopeful that my free time will increase shortly, giving me more time to work on it. If so, then I’m optimistic about a new release sometime in the next 3-4 months. We’ll see what happens.

If you’re using phpTodo, I’d like to hear from you. I’m interested in what you like and what you dislike about the program, the interface, the workflow, etc. What features would you like to see? What features would you hate to see?



CVS to Subversion…

I’ve been using CVS for a number of years and it has served me well. I have looked at Subversion a number of times, but never really had the time to deal with it. That has changed somewhat and I have had the chance to use SVN a bit more recently. SVN feels a bit more elegant, and, in most cases, faster than CVS. But, I’m also having a bit of trouble as well. Perhaps someone out there can provide me with some insight into my problems.

Most, if not all, of my recent coding has been in languages such as Perl and PHP. Additionally, I mainly code alone, so my use of a revisioning system is purely for historical data rather than proper merging. I also use CVS to handle updates of deployed code. This alone has proven to be the strongest reason to continue using a revisioning system.

With CVS, I develop code until I’m ready to deploy it. At that point, I tag the current revision, usually with a tag of RELEASE. Code is then deployed by checking out the code currently tagged as RELEASE. From here, when I update the code for a new release, I use the -F flag to force the RELEASE tag onto the new code. A simple cvs update handles updating the deployed code to the latest release. If the deployed code was changed for some reason, as sometimes happens, CVS handles merging and I can make and necessary adjustments. Overall, this has worked quite well for some time. There are hiccups here and there, but overall it has been pretty flawless.

Recently, I used cvs2svn to convert my existing CVS repositories over to SVN. After some false starts, some research, and a few minor headaches, I have all of my code converted over to SVN. I was able to get websvn running as well, which is a nice change as I can browse the repositories freely. I started playing around a bit and noticed that all of the imports have three additional directories, trunk, tags, and branches. More research and I discovered that SVN doesn’t handle tags the same way that CVS does… This concerned me as I used tags pretty heavily for deployment.

So now we come to my problem. I have identified how to create new tags using svn copy. This works great for the first copy of a given tag, but it breaks down when updating a tag. A second copy fails because the files already exist. I can use svn delete to remove the files before copying the new ones, but that’s an additional step I have no desire to do. After all, the purpose of moving to SVN is to make life easier, not harder.

After some more reading, I find that I can merge releases. Presumably, I can check out the tagged version and then merge changes from the trunk version. However, this is still more complicated as I have to merge the code and then commit it back to the repository. So, again, we have more steps than I want to deal with.

I think I understand the reason behind not being able to copy twice. I’m also aware that the way I was using CVS was fairly non-standard, but it worked for me. The code base I normally worked on could have multiple features I’m implementing at any given time, and deployment of one feature may get prioritized. So, merely copying the base to a new tag doesn’t quite work as not everything in that code may be complete at a given time.

So what are my options here? SVN has some advantages that I really like, including the web view and better handling of authentication and permissions. However, being unable to re-tag is kind of a pain. One way or another, I think I’ll be using SVN anyway, but I was kind of hoping to find a decent way to handle everything… Anyone out there have any suggestions?

Programming *is* Rocket Science!

John Carmack is something of an idol to me. He’s an incredible programmer that has created some of the most advanced graphical games ever seen on the PC. He also dabbles in amateur rocketry with his rocketry company, Armadillo Aerospace, whom I’ve written about before.

I joined the Amateur Rocketry mailing list a couple years ago. The aRocket list is a great place to read about what’s going on in the amateur rocketry scene. The various rocket scientists on the list openly discuss designs, fuel mixtures, and a host of other topics. There’s also a lot of advice for both those getting into the game, as well as those who have been in a while.

Recently, John posted a note about the Rocket Racing League and some advice about the programming controlling vital components of the jets. Unfortunately, the mailing list archives require you to be a member of the list to view, but I’ll include some snippets of his post here.

The test pilot for the rocket racing league project made the suggestion that we should not allow the computer to shutdown the engine during the critical five to fifteen second period where the plane is at takeoff speed, but too close to the ground to make the turn needed to get backdown on a runway. We currently shut the engine down whenever a sensor goes out of expected range, and there are indeed plausible conditions where this can happen even when the engine is operating acceptably, such as a pressure transducer line cracking from vibration. On the other hand, there are plausible conditions where you really do want the computer to shut the valves immediately, such as a manifold explosion blowing the chamber off.

Disregarding the question of whether it was a good idea or not, this seems a really straightforward thing to implement. However, I cautioned everyone that this type of modification has all the earmarks of something that will almost certainly cause some problems when implemented.

Shutting off the engines on a regular plane is bad enough, but we’re talking about a full-blown rocket with wings here. I can imagine that a sudden loss of engines is enough to cause a good deal of stress for any pilot, but losing the engines just as the plane is taking off could be devastating. Of course, the engine exploding could be pretty devastating too.

We did implement it, and guess what? It caused a problem. We caught it in a static test and fixed it, and haven’t seen another problem with it since, but it still fell into the category of “risky to implement”. If we weren’t operating at a high testing tempo, I wouldn’t have done it. I certainly wouldn’t have done it if we only got one testing opportunity a year (of course, I wouldn’t undertake a project that only got one testing opportunity a year…).

Our flight control code really isn’t all that complicated, the change was only a few lines of code, and I’m a pretty good programmer. The exact nature of why I considered it a bit risky deal with internal details, but the point is that even fairly trivial sounding changes aren’t risk free to make. There are certainly some classes of changes that I make to our codebase regularly that I don’t bat an eyelash at, but you can’t usually tell the difference without intimate knowledge of the code.

I’ve found similar situations in my own programs. There are areas of the code that I’ll change, knowing it will have no real effect on anything else, and then there are those areas where changes are trivial, but they cause odd problems that come back to bite you later. Testing is, of course, the best way to find these problems, but testing isn’t always possible. But then, I’m not writing code that could mean the difference between life and death for a pilot. Not *that* has to be some serious stress.

Many software guys do not have a reasonable gut check feel for the assessment of software changes in an aerospace project. My normal coding practice has over an order of magnitude more test cycles than Armadillo does physical tests, and Armadillo does over an order of magnitude more tests than most other aerospace companies. Things are fundamentally different at different orders of magnitude.

John’s team probably runs tests more than any other team out there. He has successfully married the typical programming cycle with aerospace engineering. They constantly make incremental improvements and then run out to test them. And as surprising as it sounds, it seems to cost them less to do this. By making incremental improvements, they can control, to some degree, the impact on the system. What this means in the end is that they don’t spend an inordinate amount of time building this huge, complex system, only to have it explode on the first test. Not that they haven’t had their share of failures, but they’ve been a bit less spectacular than some.

John also presented some additional info from his other job.

As another cautionary tale, I recently had the entire codebase for our current Id Software project analyzed by a high end static analysis company. I was very pleased when they reported that our discovered defect rate was well under half the average that they find on codebases of comparable size. However, there were still over a hundred things that we could look at and say, “yes, that is broken”. Sure, most of them wouldn’t really be a problem, but it illustrates the inherent danger of large software solutions. The best defense, by far, is to be as small and simple as possible.

Small and simple is definitely the best. The more complexity you add, the more bugs and odd behavior pop up. Use the KISS principle!

Instant Kernel-ification


Server downtime is the scourge of all administrators, sometimes to the extent of bypassing necessary security upgrades, all in the name of keeping machines online.  Thanks to an MIT graduate student, Jeffery Brian Arnold, keeping a machine online, and up to date with security patches, may be easier than ever.

Ksplice, as the project is called, is a small executable that allows an administrator the ability to patch security holes in the Linux kernel, without rebooting the system.  According to the Ksplice website :

“Ksplice allows system administrators to apply security patches to the Linux kernel without having to reboot. Ksplice takes as input a source code change in unified diff format and the kernel source code to be patched, and it applies the patch to the corresponding running kernel. The running kernel does not need to have been prepared in advance in any way.”

Of course, Ksplice is not a perfect silver bullet, some patches cannot be applied using Ksplice.  Specifically, any patch that require “semantic changes to data structures” cannot be applied to the running kernel.  A semantic change is a change “that would require existing instances of kernel data structures to be transformed.”

But that doesn’t mean that Ksplice isn’t useful.  Jeffery looked at 32 months of kernel security patches and found that 84% of them could be applied using Ksplice.  That’s sure to increase the uptime.

I have to wonder, though, what is so important that you need that much uptime.  Sure, it’s nice to have the system run all the time, but if you have something that is absolutely mission critical, that must run 24×7, regardless, won’t you have a backup or two?  Besides which, you generally want to test patches before applying them to such sensitive systems.

There are, of course, other uses for this technology.  As noted on the Ksplice website, you can also use Ksplice to “add debugging code to the kernel or to make any other code changes that do not modify data structure semantics.”  Jeffery has posted a paper detailing how the technology works.

Pretty neat technology.  I wonder if this will lead to zero downtime kernel updates direct from Linux vendors.  As it is now, you’ll need to locate and manually apply kernel patches using this tool.


To optimize…

… or not to optimize, that is the programmer’s dilemma.  Optimization is a powerful tool in the programmer’s arsenal, but one that should be used sparingly and with care.  The key to optimization is identifying what exactly to optimize.


Shawn Hargreaves, creator of the Allegro game programming library recently posted a link to the blog of Thomas Aylesworth, a.k.a. SwampThingTom.  According to his blog, Thomas is a software engineer for the aerospace and defense industry.  He’s also a hobbyist game developer.

Tom’s first foray into the world of blogging is a series of posts about optimization, specifically targeting XNA based games.  In his first post, he explains the importance of design and what Big O notation is.  His second post delves into prototyping and benchmarking, complete with examples.  In part three, he introduces us to the NProf execution profiler as well as pointing out a few other potential bottlenecks.

Optimization in general is something you shouldn’t really need to worry about until the very end of the development cycle, if ever.  Optimization is a great tool for squeezing just a few more cycles out of your code when you really need it.  What you normally don’t see, however is a significant increase in speed.  If you’re looking for a significant increase in speed, take a look at your underlying algorithm.

Generally speaking, unless you’re writing extremely specialized code, optimization should be used very sparingly.  You’re better off looking for the elegant way to solve your programming dilemma.  Look into alternative algorithms or possibly re-design the data flow.  If you’re not sure where the bottleneck is, look into using a code profiler, or simply add debugging statement that surround suspected “slow” code.  You’ll usually find that a poor design decision is causing the bottleneck and that a simple re-design can result in huge speed increases while keeping your code readable and maintainable.


If you’re interested in optimization, or even just curious, take a look at Tom’s articles, they’re a great read.

phpTodo 0.8 Beta Released

A new version of phpTodo, version 0.8 Beta, was released today.  It’s been almost six months since the last release, mostly due to lack of time.  My primary goal for this release was to add ATOM support and get all the bugs fixed.  I feel I was able to accomplish both of these goals.

I think an official 1.0 release is imminent, assuming I have time to work on the program.  I have a few features I’d like to add before 1.0 if I can.  If they do get added, a 0.9 gamma version will be released before 1.0 becomes official.

After the 1.0 release, I’d like to get group support added.  In addition, I’m thinking about switching from single category based tasks to tags.  This would allow a single todo item to be placed into several categories at the same time.  Feed support will be updated as well, keeping in-line with the current feature set.


Overall, I’m quite happy with this project.  It’s helped me out in numerous ways, organizing my personal todo lists as well as giving me the opportunity to work on an open-source project.  I’d love to hear some feedback concerning this project, especially if you’re using it on a daily basis.  I’m definitely open to suggestions for improvements and I’d like to get some additional CSS layouts to include with the distribution.  You can leave any comments you may have right here on this blog entry.

Thanks to everyone who has already sent me suggestions and bug reports.  I hope to hear from more of you soon!  If you’re interested in trying out phpTodo, check out the demo site.