Posts Tagged ‘programming’

Contemplating the Future

Thursday, January 26th, 2012

In 2005 I obtained a job at a regional ILEC as a Data Operations Technician. As part of this job, I took over development of one of the tools we used to diagnose customer DSL connections. Problem was, this tool was written in PHP, a programming language I was, as yet, unfamiliar with.

At the same time, I was also looking for a web-based tool I could use to keep track of various tasks. While there were a few open-source tools I could use, none had the features I was looking for. So I decided to write one myself, and to write it in PHP so I could learn the language better. In the end, I’m glad I did as PHP has become indispensable for writing web-based tools.

The tool I wrote was a web-based todo manager called phpTodo. Since the alpha release in 2005, I have released 7 more versions. Work on phpTodo has ebbed and flowed with time, often interrupted by work and life in general. In fact, the last formal release was made almost 5 years ago, bringing the current version up to 0.8.1. In 2009, I found out that phpTodo was being packaged and released with Fedora as well.

After releasing 0.8.1, I decided to switch from using categories to using tags, similar to how the blogging system I use, Serendipity, uses them. This required rewriting a good deal of the back end of the system, as well as making extensive changes to the front end. I also started using the Prototype and Scriptaculous JavaScript frameworks, and then later switched to jQuery. In all, a great deal of code has been rewritten.

I’m quite happy with the general feel of the new version I’ve been working on. While there is a good deal more code to be written, I’m confident there will be a code release soon enough.

I’ve been thinking a lot about the future of phpTodo and where I want to take it. When I originally started, I wrote the system such that I could see my todo list items via an RSS feed. At the time, I had a Blackberry phone and this worked brilliantly. Of course, this was purely a one-way feed with no way to update any todo items on the go. Since that time, I started working on a mobile view for the system, but stopped quickly after I realized how horrible working with WAP was. Fortunately, technology has progressed quickly since that time and WAP is no longer necessary. So, I’m considering working on a mobile version again.

A mobile version brings new challenges, however. It should be trivial to develop a mobile view that can be used while online, but my hope was to have an offline version as well that can be synchronized with the online version. One possibility is to develop an app that can be loaded onto a phone. That, of course, severely limits the platforms it can be run on. Another possibility is an HTML5 version, though that brings challenges of its own.

Another thought was to build a web service into phpTodo. The basic premise is an XML generator that, given a set of parameters, can supply an XML feed for external systems to use as input, paired with an XML parser that can receive data from external systems in order to update phpTodo data. I believe this can be used as the interface for the mobile view.
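To make the generator half a bit more concrete, here’s a rough sketch of the sort of thing I have in mind. To be clear, this is illustrative only, not phpTodo’s actual code; the table and column names (todo_items, user_id, title, due) are made up for the example:

<?php
// Illustrative sketch of an XML feed generator; not phpTodo's actual
// code or schema. Table and column names below are hypothetical.
header('Content-Type: text/xml; charset=utf-8');

$db = new PDO('mysql:host=localhost;dbname=phptodo', 'user', 'pass');

// The "set of parameters" narrows the feed; here, open items for one user.
$stmt = $db->prepare(
    'SELECT id, title, due FROM todo_items
      WHERE user_id = :uid AND complete = 0');
$stmt->execute(array(':uid' => (int) $_GET['uid']));

// Build the feed for external consumers (the mobile view, a CLI tool, etc.).
$xml = new SimpleXMLElement('<todos/>');
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $item = $xml->addChild('item');
    $item->addChild('id', $row['id']);
    $item->addChild('title', htmlspecialchars($row['title']));
    $item->addChild('due', $row['due']);
}

echo $xml->asXML();

The parser half would be the mirror image: accept a similar XML document over HTTP POST and translate it into inserts and updates against the same tables.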

A web service can also be used to power another idea I had. I stumbled across the website of Brett Terpstra a while back and found a treasure trove of interesting ideas and useful code snippets. Among these is an obsession with recording notes to keep track of projects, interesting ideas, and helpful code snippets. Brett uses a number of custom scripts and software packages, most of which are exclusive to his platform of choice, OS X. To be honest, I find this incredibly intriguing, and potentially useful. So, I’ve been thinking about developing a command-line tool I can use to interact with phpTodo. A web service could make this a great deal easier.

I have no plans to stop working on the project, and, in fact, I’m eager to keep moving forward. Since I rely on phpTodo itself for my daily work, I benefit directly from every improvement I make to the system. So overall, the future of phpTodo is bright.

Blacklisted!

Thursday, January 12th, 2012

Back in October of 2011, a bill was introduced in the House of Representatives called H.R. 3261, or the “Stop Online Piracy Act (SOPA).” Go take a look, I’ll wait. It’s a relatively straightforward bill, especially compared to others I’ve looked at. Hell, it’s only 15 pages long! And it’s going to kill the Internet.

Ok, ok.. It won’t *KILL* the Internet, but it has the potential to ruin what we consider to be the Internet. Personally, I believe that if this passes, it has the potential to turn the Internet into nothing more than a collection of business websites, at least in the US.

So how does this thing work? Well, it’s actually pretty straightforward. If your website is suspected of infringing on copyrighted material, your website is taken down, any advertising you have on your site is cut, and you are removed from search engines. But so what, you deserve it! You were breaking copyright law!

Not so fast. This applies to *any* content on your website. So if someone comments on a blog entry, or you innocently link to a website that infringes copyright, or other situations out of your control, you’re responsible. Basically, you have to police every single comment, link, etc. that appears on your website.

It’s even worse for service providers since they have to do the blocking. Every infringing site is blocked via DNS. And since the US doesn’t have control of all of DNS, and some infringing sites are not located in the US, this means we move into the realm of having DNS blacklist files. The ISP becomes the responsible party if they fail to block these sites, which in turn means more overhead for the ISP. Think you pay a lot for Internet access now?

So what can you do? Well, for one, you can contact your representative and tell them how insane this whole idea is. And you can protest SOPA itself by putting up a protest overlay on your site. There’s a GitHub project with all of the source code you need to add an overlay to your website. Or, if you have a Serendipity web blog, you can download the Stop SOPA plugin I’ve written.

Get out there and protest!

Fixing the Serendipity XMLRPC plugin

Sunday, June 26th, 2011

A while ago I purchased a copy of BlogPress for my iDevices.. It’s pretty full-featured, and seems to work pretty well. Problem was, I couldn’t get it to work with my Serendipity-based blog. Oh well, a wasted purchase.

But not so fast! Every once in a while I go back and search for a possible solution. This past week I finally hit paydirt. I came across this post on the s9y forums.

This explained why BlogPress was crashing when I used it. In short, it was expecting to see a categoryName tag in the resulting XML from the Serendipity XMLRPC plugin. Serendipity, however, used description instead, likely because Serendipity has better support for the MetaWeblog API.

Fortunately, fixing this problem is very straightforward. All you really need to do is implement both APIs, returning all of the necessary data for each at the same time. To fix this particular problem, it’s a single-line addition to the serendipity_xmlrpc.inc.php file located in $S9YHOME/plugins/serendipity_event_xmlrpc. That addition is as follows:


if ($cat['categoryid']) $xml_entries_vals[] = new XML_RPC_Value(
    array(
      'description'   => new XML_RPC_Value($cat['category_name'], 'string'),
      // XenoPhage: Add 'categoryName' to support mobile publishing (Thanks PigsLipstick)
      'categoryName'  => new XML_RPC_Value($cat['category_name'], 'string'),
      'htmlUrl'       => new XML_RPC_Value(serendipity_categoryURL($cat, 'serendipityHTTPPath'), 'string'),
      'rssUrl'        => new XML_RPC_Value(serendipity_feedCategoryURL($cat, 'serendipityHTTPPath'), 'string')
    ),
    'struct'
);

And poof, you now have the proper category support for Movable Type.

Sort By Sound?

Monday, September 20th, 2010

I ran across this a few weeks ago and I thought it was simply brilliant. Sorting algorithms are, for better or worse, among the most used algorithms in a programmer’s toolbox. For many, sorting is just something you need to learn to pass a computer science course. Others devote their lives to researching them.

The following two videos show an interesting view of sorting. An enterprising programmer decided to add a bit of sound to the sorting. There are endless ways the initial data can be arranged, so these sounds don’t represent how every sort of that type will sound. But the sound coupled with the visual representation of the sort make these videos worth a glance.

SQL Query Conundrum…

Thursday, August 19th, 2010

I have a brain teaser for ya.. I’m looking for a way to solve a SQL problem efficiently, specifically using MySQL. The goal is to get a count of the number of unique rows returned for a complex query. It’s actually for a pagination system so I can determine the limits necessary to efficiently query the database for the right amount of data rather than return everything and try to brute force it.

Let’s say I have three tables as follows:

mysql> describe person;
+-------+------------------+------+-----+---------+-------+
| Field | Type             | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+-------+
| id    | int(10) unsigned | NO   | PRI | NULL    |       |
| first | char(15)         | YES  |     | NULL    |       |
| last  | char(15)         | YES  |     | NULL    |       |
+-------+------------------+------+-----+---------+-------+
3 rows in set (0.02 sec)

mysql> describe interests;
+----------+------------------+------+-----+---------+-------+
| Field    | Type             | Null | Key | Default | Extra |
+----------+------------------+------+-----+---------+-------+
| id       | int(10) unsigned | NO   | PRI | NULL    |       |
| interest | char(15)         | YES  |     | NULL    |       |
+----------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> describe interest_link;
+-------------+------------------+------+-----+---------+-------+
| Field       | Type             | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+-------+
| person_id   | int(10) unsigned | NO   |     | NULL    |       |
| interest_id | int(10) unsigned | NO   |     | NULL    |       |
+-------------+------------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

Simple enough. I’m mapping interests to people. I’ve entered data into these tables as follows:

mysql> select * from person;
+----+-------+--------+
| id | first | last   |
+----+-------+--------+
|  1 | John  | Doe    |
|  2 | Bob   | Jones  |
|  3 | Joe   | Smith  |
+----+-------+--------+
3 rows in set (0.00 sec)

mysql> select * from interests;
+----+-----------+
| id | interest  |
+----+-----------+
|  1 | Computers |
|  2 | Music     |
|  3 | Food      |
|  4 | Beer      |
|  5 | Gaming    |
+----+-----------+
5 rows in set (0.00 sec)

mysql> select * from interest_link;
+-----------+-------------+
| person_id | interest_id |
+-----------+-------------+
|         1 |           1 |
|         1 |           2 |
|         1 |           4 |
|         2 |           1 |
|         2 |           5 |
|         2 |           4 |
|         3 |           3 |
|         3 |           2 |
|         3 |           4 |
+-----------+-------------+
9 rows in set (0.00 sec)

So far, so good. Now, I want to do a search to find which users are interested in music. Simple enough search, I’d do this with a simple select statement as follows:

mysql> select * from person as p left join interest_link as il on il.person_id = p.id where interest_id = 2;
+----+-------+--------+-----------+-------------+
| id | first | last   | person_id | interest_id |
+----+-------+--------+-----------+-------------+
|  1 | John  | Doe    |         1 |           2 |
|  3 | Joe   | Smith  |         3 |           2 |
+----+-------+--------+-----------+-------------+
2 rows in set (0.00 sec)

But what if I want to find out who’s interested in music *and* beer?

mysql> select * from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4);
+----+-------+--------+-----------+-------------+
| id | first | last   | person_id | interest_id |
+----+-------+--------+-----------+-------------+
|  1 | John  | Doe    |         1 |           2 |
|  1 | John  | Doe    |         1 |           4 |
|  2 | Bob   | Jones  |         2 |           4 |
|  3 | Joe   | Smith  |         3 |           2 |
|  3 | Joe   | Smith  |         3 |           4 |
+----+-------+--------+-----------+-------------+
5 rows in set (0.00 sec)

That’s a problem, now I have 5 rows.. How do I make this a unique list? Well, I’m merely interested in names and ids, so I can do this:

mysql> select p.id, p.first, p.last from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4);
+----+-------+--------+
| id | first | last   |
+----+-------+--------+
|  1 | John  | Doe    |
|  1 | John  | Doe    |
|  2 | Bob   | Jones  |
|  3 | Joe   | Smith  |
|  3 | Joe   | Smith  |
+----+-------+--------+
5 rows in set (0.00 sec)

but that’s still 5 rows.. so what now?

mysql> select distinct p.id, p.first, p.last from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4);
+----+-------+--------+
| id | first | last   |
+----+-------+--------+
|  1 | John  | Doe    |
|  2 | Bob   | Jones  |
|  3 | Joe   | Smith  |
+----+-------+--------+
3 rows in set (0.00 sec)

Aha! Perfect. That’s what I need.. almost. For this particular application, I want to paginate, so I need the total number of matching rows so I can properly identify the limits as well as the upper bound on page numbers. So, I’ll just replace the specific field names with a count(*):

mysql> select distinct count(*) from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4);
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

And here is where I’m stuck. I need the total count of DISTINCT names, not the total number of rows returned. I tried a GROUP BY, but that didn’t help much:

mysql> select count(*) from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4) group by p.id;
+----------+
| count(*) |
+----------+
|        2 |
|        1 |
|        2 |
+----------+
3 rows in set (0.00 sec)

Sure, I get 3 rows, but what I’m looking for here is a single row with the total number of items. So, what if I count the number of returned rows?

mysql> select count(*) from (select count(*) from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4) group by p.id) as foo;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.00 sec)

BUT… at what cost? This seems like a rather complex query that might break down significantly when there’s a lot of data. And the examples above are rather simplistic; in reality, we’re talking about more fields and more tables, so even the simpler query gets a little complex to begin with. Yes, I am aware of indexing and how it speeds things up. I use indexing, I just eliminated it from the above example to simplify things. I’m open to ideas on how to do this properly via SQL.
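One variation I should benchmark as well: MySQL’s COUNT(DISTINCT expr) aggregate, which does away with the subselect entirely. On the toy data above it returns the expected count:

mysql> select count(distinct p.id) from person as p left join interest_link as il on il.person_id = p.id where interest_id in (2,4);
+----------------------+
| count(distinct p.id) |
+----------------------+
|                    3 |
+----------------------+
1 row in set (0.00 sec)

Whether that holds up better than the subselect against the real schema and data volume, I can’t say without testing.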

I can simply return all the rows with the distinct clause, count them programmatically, and then proceed with the rest of the program, but depending on the selections made by the user, there could be a significant amount of data returned. I’m worried about both memory exhaustion on the part of the scripting language, as well as the processing and transmission time required to pass all of that data back to the program from the SQL database. Besides, this is the sort of problem that SQL was designed to solve.

I don’t think this is a unique problem, so someone out there has a solution. Perhaps the subselect *is* the better solution, but I don’t think so. You can leave a comment here, or hit me up on Twitter.

 

The Authentication Problem

Friday, March 5th, 2010

Authentication is a tricky problem. The goal of authentication is to verify the identity of the person, device, machine, etc. that is attempting to gain access to the protected system. There are many factors to consider when designing an authentication system. Here is a brief sampling:

  • How much security is necessary?
  • Do we require a username?
  • How strong should the password be?
  • Do we need multi-factor authentication?

The need for authentication typically means that the data being accessed is sensitive in some way. This can be something as simple as a todo list or a user’s email, or as important as banking or top secret information. It can also mean that the data being accessed is valuable in some way such as a site that requires a subscription. So, the security necessary is dependent on the data being protected.

Usually, authentication systems require a username and some form of a password. For more secure systems, multi-factor authentication is used. Multi-factor authentication means that multiple pieces of information are used to authenticate the user. These vary depending on the security required. In the United States, federal regulators recognize the following factors:

  • Something the user knows (e.g., password, PIN)
  • Something the user has (e.g., ATM card, smart card)
  • Something the user is (e.g., biometric characteristic such as a fingerprint)

A username and a password is an example of a single-factor authentication mechanism. When you use an ATM, you supply it with an ATM card and then use a PIN. This is an example of two-factor authentication.

The U.S. Federal Financial Institutions Examination Council (FFIEC) recommends the use of multi-factor authentication for financial institutions. Unfortunately, most of the authentication systems currently in place are still single-factor authentication systems, despite asking for several pieces of information. For example, if you log into your bank system you use a username and password. Once the username and password pass, you are often asked for additional information such as answers to challenge questions. These are all examples of things the user knows, thus only a single factor.

Some institutions have begun using additional factors to identify the user such as a one-time password sent to an email address or cell phone. This can be cumbersome, however, as it can often take additional time to receive this information. To combat this, browser cookies are used after the first successful authentication. After the user logs in for the first time, they are offered a chance to have the system place a “secure token” on their system. Subsequent logins use this secure token in addition to the username and password to authenticate the user. This is arguably a second factor as it’s something the user has, as opposed to something they know. On the other hand, it is extremely easy to duplicate or steal cookies.

There are other ways that two-factor authentication can be circumvented as well. Since most institutions only use a single communication mechanism, hijacking that communication medium can result in a security breach.

Man-in-the-middle attacks use fake websites to lure users in and steal their authentication information. This can happen transparently to the user by forwarding the information to the actual institution and letting the user continue to access the system. More sophisticated attacks have the user “fail” authentication the first time and let them in on subsequent tries. The attacker can then use the first authentication attempt to gain access themselves.

Another method is the use of Trojans. If a user can be tricked into installing malicious software onto their system, an attacker can ride on the user’s session, injecting their own transactions into the communications channel.

Defending against these attacks is not easy and may be impossible in many situations. For instance, requiring a second method of communication for authentication may help to authenticate the user, but if an attacker can hijack the main communication path, they can still obtain access to the user’s current session. Use of encryption and proper training of users can help mitigate these types of attacks, but ultimately, any system using a public communication mechanism is susceptible to hijacking.

Session Security

Once authentication is complete, session security comes into play. Why go through all the trouble of authenticating the user if you’re not protecting the data they’re accessing? Assuming that the data itself is protected, we need to focus on protecting the data being transferred to and from the user. Additionally, we need to protect the user’s session itself.

Session hijacking is the term used to identify the stealing of a user’s session information to gain access to the information the user is accessing. There are four primary methods of session hijacking.

  • Session Fixation
  • Session Sidejacking
  • Physical Access
  • Cross-site Scripting

Physical access is pretty straightforward. This involves an attacker directly accessing the user’s computer terminal and copying the session data. Session data can be something as simple as an alphanumeric token displayed right in the URL of the site being accessed. Or, it can be a piece of data on the machine such as a browser cookie.

Session fixation refers to a method by which an attacker can trick a user into using a pre-determined session ID. Once the user authenticates, the attacker gains access by using the same session ID. The system recognizes the session ID as an authenticated session and lets the attacker in without verification.

Session Sidejacking involves an attacker intercepting the traffic between a user and the system. If a session is not encrypted, the attacker can obtain the session ID or cookie used to identify the user’s session. Once this information is obtained, the attacker can use the same information to gain access to the user’s session.

Finally, cross-site scripting is when an attacker tricks the user’s computer into sending session information to the attacker. This can happen when a user accesses a website that contains malicious code. For instance, an attacker can create a website with a special link to a well-known site such as a bank. The link contains additional code that, when run, sends the user’s authentication or session information to the attacker.

Encryption of the communications channel can mitigate some of these attack scenarios, but not all of them. Programmers should ensure that additional information is used to verify a user’s session. For instance, something as simple as verifying the user’s source IP address in addition to a session cookie is often enough to mitigate both physical access and session sidejacking. Not allowing a pre-defined session ID can prevent session fixation. And finally, proper coding can prevent cross-site scripting.
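In PHP, the first two of those checks amount to just a few lines. Here’s a minimal sketch, assuming standard PHP sessions; the $justLoggedIn flag is a hypothetical stand-in for whatever your own login code sets:

<?php
// Minimal sketch of the session checks described above.
// $justLoggedIn is a hypothetical flag set by your own login code.
session_start();

if ($justLoggedIn) {
    // Prevent session fixation: never carry a pre-login session ID forward.
    session_regenerate_id(true);
    $_SESSION['ip'] = $_SERVER['REMOTE_ADDR'];
}

// Mitigate sidejacking and stolen cookies: bind the session to a source IP.
if (isset($_SESSION['ip']) && $_SESSION['ip'] !== $_SERVER['REMOTE_ADDR']) {
    session_destroy();
    die('Session terminated. Please log in again.');
}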

Additionally, any session information stored on the remote system being accessed should be properly secured as well. Merely securing the data accessed isn’t enough if an attacker can access the remote system and steal session information.

Unauthentication

Finally, how and when should a user be unauthenticated? Unauthentication is often overlooked when designing a secure system. If the user fails to log out, then attacks such as session hijacking become easier. Unauthentication can be tricky, however. There are a number of factors to consider, such as:

  • How and when should a user’s session be closed?
  • Should a user’s session time out?
  • How long should the timer be?

Most unauthentication currently consists of a user’s session timing out. After a pre-determined period of inactivity, the system will log a user out, deleting their session. Depending on the situation, this can be incredibly disruptive. For example, if a user’s email system has a short time out, they run the risk of losing a long email they’ve been working on. Some systems can mitigate this by recording the user’s data prior to logging them out, making it available again upon login so the user doesn’t lose it. Regardless, the longer the time out, the less secure a session can be.
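In code, an idle timeout is little more than a timestamp check on every request. A quick sketch, again assuming standard PHP sessions (the 15-minute window is an arbitrary choice):

<?php
// Per-request inactivity check; logs the user out after 15 idle minutes.
session_start();

$timeout = 15 * 60; // allowed idle time, in seconds
if (isset($_SESSION['last_seen']) &&
        (time() - $_SESSION['last_seen']) > $timeout) {
    session_unset();
    session_destroy();
    die('You have been logged out due to inactivity.');
}
$_SESSION['last_seen'] = time();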

Other unauthentication mechanisms have been discussed as well. When a physical token such as a USB key is used, the user can be unauthenticated if the key is removed from the system. Or, a device with some sort of radio in it, such as Bluetooth, can unauthenticate the user if it is removed from the proximity of the system. Unfortunately, users will likely end up leaving these devices behind, significantly reducing their effectiveness.

As with authentication, unauthentication methods can depend on the sensitivity of the data being protected. Ultimately, though, every system should have some form of automatic unauthentication.

Data security in general can be a difficult nut to crack. System designers are typically either very lax in their security design, often overlooking session security and unauthentication, or they can be very draconian, opting to make the system very secure at the expense of the user. Designing a user-friendly, but secure, system is difficult, at best.

 

phpTodo … In Fedora!!

Thursday, April 30th, 2009

Apparently I’m always the last to know… But.. I found out today that phpTodo, the todo list manager I wrote (and continue to write), has been included in Fedora. In fact, it seems it’s been in there since Fedora 7. It’s not in the main distribution, nor should it be, but apparently it’s a maintained package. Thanks, Marc!

Honestly, I’m truly honored. I wrote this on a whim and it has served me well. I use it every day! And since writing it, I’ve had a handful of people make suggestions and offer patches. I think it’s been pretty successful for a small project.

So, how about an update? Well, I’ve been working on phpTodo in my spare time, which, unfortunately, has been relatively lacking as of late. I have been able to add in a number of fixes and new features, however. The biggest change in the next release will be the removal of categories in favor of tags. In using phpTodo over the years, I’ve noticed a number of times where I’d like to be able to put an item in multiple categories, and display multiple categories at once. While this may have been possible with categories as they were implemented, I think tags work a bit better. I’ve borrowed an idea from the Serendipity blogging platform to implement tags in a user-friendly manner, so I think the implementation works pretty well. I still have some more work to tie it all together, but it is coming along.

Another change is the addition of the Prototype and Scriptaculous JavaScript frameworks. There are a few reasons I decided to go this route. First and foremost, it significantly reduces the amount of work necessary to perform cross-platform JavaScript operations. To date, I’ve used relatively simple JavaScript functions, mostly for front-line input validation, but with the addition of tags, I wanted to move into some more advanced techniques. I’m striving to keep it simple and not overdo it, so don’t worry.

And, of course, there are the various bug fixes that need to be added. Overall, I’m excited about the next release of phpTodo. I don’t have a timetable as of now, but I’m hopeful that my free time will increase shortly, giving me more time to work on it. If so, then I’m optimistic about a new release sometime in the next 3-4 months. We’ll see what happens.

If you’re using phpTodo, I’d like to hear from you. I’m interested in what you like and what you dislike about the program, the interface, the workflow, etc. What features would you like to see? What features would you hate to see?

Thanks!

 

CVS to Subversion…

Sunday, January 25th, 2009

I’ve been using CVS for a number of years and it has served me well. I have looked at Subversion a number of times, but never really had the time to deal with it. That has changed somewhat and I have had the chance to use SVN a bit more recently. SVN feels a bit more elegant, and, in most cases, faster than CVS. But, I’m also having a bit of trouble as well. Perhaps someone out there can provide me with some insight into my problems.

Most, if not all, of my recent coding has been in languages such as Perl and PHP. Additionally, I mainly code alone, so my use of a revisioning system is purely for historical data rather than proper merging. I also use CVS to handle updates of deployed code. This alone has proven to be the strongest reason to continue using a revisioning system.

With CVS, I develop code until I’m ready to deploy it. At that point, I tag the current revision, usually with a tag of RELEASE. Code is then deployed by checking out the code currently tagged as RELEASE. From here, when I update the code for a new release, I use the -F flag to force the RELEASE tag onto the new code. A simple cvs update handles updating the deployed code to the latest release. If the deployed code was changed for some reason, as sometimes happens, CVS handles merging and I can make any necessary adjustments. Overall, this has worked quite well for some time. There are hiccups here and there, but overall it has been pretty flawless.
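For the curious, the whole cycle is only a handful of commands (module name invented for the example):

# Tag the code that's ready to deploy
cvs tag RELEASE

# Deploy by checking out the tagged revisions (the tag becomes sticky)
cvs checkout -r RELEASE mymodule

# When a new release is ready, force the tag onto the new revisions...
cvs tag -F RELEASE

# ...and a plain update in the deployed checkout pulls the new RELEASE
cvs update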

Recently, I used cvs2svn to convert my existing CVS repositories over to SVN. After some false starts, some research, and a few minor headaches, I have all of my code converted over to SVN. I was able to get websvn running as well, which is a nice change as I can browse the repositories freely. I started playing around a bit and noticed that all of the imports have three additional directories: trunk, tags, and branches. More research and I discovered that SVN doesn’t handle tags the same way that CVS does… This concerned me as I used tags pretty heavily for deployment.

So now we come to my problem. I have identified how to create new tags using svn copy. This works great for the first copy of a given tag, but it breaks down when updating a tag. A second copy fails because the files already exist. I can use svn delete to remove the files before copying the new ones, but that’s an additional step I have no desire to do. After all, the purpose of moving to SVN is to make life easier, not harder.
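For reference, the two-step dance looks something like this (repository URL invented):

# Remove the existing tag...
svn delete -m "Remove old RELEASE tag" http://svn.example.com/repo/tags/RELEASE

# ...then copy the new code into its place
svn copy -m "Re-tag RELEASE" http://svn.example.com/repo/trunk http://svn.example.com/repo/tags/RELEASE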

After some more reading, I find that I can merge releases. Presumably, I can check out the tagged version and then merge changes from the trunk version. However, this is still more complicated as I have to merge the code and then commit it back to the repository. So, again, we have more steps than I want to deal with.

I think I understand the reason behind not being able to copy twice. I’m also aware that the way I was using CVS was fairly non-standard, but it worked for me. The code base I normally worked on could have multiple features I’m implementing at any given time, and deployment of one feature may get prioritized. So, merely copying the base to a new tag doesn’t quite work as not everything in that code may be complete at a given time.

So what are my options here? SVN has some advantages that I really like, including the web view and better handling of authentication and permissions. However, being unable to re-tag is kind of a pain. One way or another, I think I’ll be using SVN anyway, but I was kind of hoping to find a decent way to handle everything… Anyone out there have any suggestions?

Programming *is* Rocket Science!

Wednesday, August 13th, 2008

John Carmack is something of an idol to me. He’s an incredible programmer who has created some of the most advanced graphical games ever seen on the PC. He also dabbles in amateur rocketry with his rocketry company, Armadillo Aerospace, which I’ve written about before.

I joined the Amateur Rocketry mailing list a couple years ago. The aRocket list is a great place to read about what’s going on in the amateur rocketry scene. The various rocket scientists on the list openly discuss designs, fuel mixtures, and a host of other topics. There’s also a lot of advice for both those getting into the game, as well as those who have been in a while.

Recently, John posted a note about the Rocket Racing League and some advice about the software controlling vital components of the jets. Unfortunately, the mailing list archives require you to be a member of the list to view them, but I’ll include some snippets of his post here.

The test pilot for the rocket racing league project made the suggestion that we should not allow the computer to shutdown the engine during the critical five to fifteen second period where the plane is at takeoff speed, but too close to the ground to make the turn needed to get backdown on a runway. We currently shut the engine down whenever a sensor goes out of expected range, and there are indeed plausible conditions where this can happen even when the engine is operating acceptably, such as a pressure transducer line cracking from vibration. On the other hand, there are plausible conditions where you really do want the computer to shut the valves immediately, such as a manifold explosion blowing the chamber off.

Disregarding the question of whether it was a good idea or not, this seems a really straightforward thing to implement. However, I cautioned everyone that this type of modification has all the earmarks of something that will almost certainly cause some problems when implemented.

Shutting off the engines on a regular plane is bad enough, but we’re talking about a full-blown rocket with wings here. I can imagine that a sudden loss of engines is enough to cause a good deal of stress for any pilot, but losing the engines just as the plane is taking off could be devastating. Of course, the engine exploding could be pretty devastating too.

We did implement it, and guess what? It caused a problem. We caught it in a static test and fixed it, and haven’t seen another problem with it since, but it still fell into the category of “risky to implement”. If we weren’t operating at a high testing tempo, I wouldn’t have done it. I certainly wouldn’t have done it if we only got one testing opportunity a year (of course, I wouldn’t undertake a project that only got one testing opportunity a year…).

Our flight control code really isn’t all that complicated, the change was only a few lines of code, and I’m a pretty good programmer. The exact nature of why I considered it a bit risky deal with internal details, but the point is that even fairly trivial sounding changes aren’t risk free to make. There are certainly some classes of changes that I make to our codebase regularly that I don’t bat an eyelash at, but you can’t usually tell the difference without intimate knowledge of the code.

I’ve found similar situations in my own programs. There are areas of the code that I’ll change, knowing it will have no real effect on anything else, and then there are those areas where changes are trivial, but they cause odd problems that come back to bite you later. Testing is, of course, the best way to find these problems, but testing isn’t always possible. But then, I’m not writing code that could mean the difference between life and death for a pilot. Now *that* has to be some serious stress.

Many software guys do not have a reasonable gut check feel for the assessment of software changes in an aerospace project. My normal coding practice has over an order of magnitude more test cycles than Armadillo does physical tests, and Armadillo does over an order of magnitude more tests than most other aerospace companies. Things are fundamentally different at different orders of magnitude.

John’s team probably runs tests more than any other team out there. He has successfully married the typical programming cycle with aerospace engineering. They constantly make incremental improvements and then run out to test them. And as surprising as it sounds, it seems to cost them less to do this. By making incremental improvements, they can control, to some degree, the impact on the system. What this means in the end is that they don’t spend an inordinate amount of time building this huge, complex system, only to have it explode on the first test. Not that they haven’t had their share of failures, but they’ve been a bit less spectacular than some.

John also presented some additional info from his other job.

As another cautionary tale, I recently had the entire codebase for our current Id Software project analyzed by a high end static analysis company. I was very pleased when they reported that our discovered defect rate was well under half the average that they find on codebases of comparable size. However, there were still over a hundred things that we could look at and say, “yes, that is broken”. Sure, most of them wouldn’t really be a problem, but it illustrates the inherent danger of large software solutions. The best defense, by far, is to be as small and simple as possible.

Small and simple is definitely the best. The more complexity you add, the more bugs and odd behavior pop up. Use the KISS principle!

Switching Gears…

Friday, August 8th, 2008

Ok, so I did it. I made the switch. I bought a Mac. Or, more specifically, I bought a Macbook Pro.

Why? Well, I had a few reasons. Windows is the standard for most office applications, and it’s great for gaming, but I find it to be a real pain to code in. I’m not talking code for Windows applications, I’m talking code for web applications. Most of my code is Perl and PHP and I really have no interest in fighting with Windows to get a stable development platform for these. Sure, I can remotely access the files I need, but then I’m tethered to an Internet connection. I had gotten around this (somewhat) by installing Linux on my Windows machine via VirtualBox. It worked wonderfully, but it’s slower that way, and there are still minor problems with accessibility, things not working, etc.

OSX seemed to fit the bill, though. By default, it comes with Apache and PHP, you can install MySQL easily, and it’s built on top of BSD. I can drop to a terminal prompt and interact with it the same way I interact with a Linux machine. In fact, almost every standard command I use on my Linux servers is already on my Macbook.

Installing Apple’s XCode developer tools gives me just about everything else I could need, including a free IDE! Though, this particular IDE is more suited for C++, Java, Ruby, Python, and Cocoa. Still, it’s free and that’s nothing to scoff at. I have been using a trial of Komodo, though, and I’m leaning towards buying myself a copy. $295 is steep, though.

What really sold me on a Mac is the move to Intel processors and their Bootcamp software. I play games, and Mac doesn’t have the widest library of games, so having a Windows machine available is a must. Thanks to Bootcamp, I can continue to play games while keeping my development platform as well. Now I have OSX as my primary OS and a smaller Bootcamp partition for playing games. With the nVidia GeForce card in this beast, as well as a fast processor and 2GB of RAM, I’m set for a while..

There are times, though, when I’d like to have Windows apps at my fingertips, while I’m in OSX. For that, I’ve tried both Parallels and VMWare Fusion. Parallels is nice, and it’s been around for a while. It seems to work really well, and I had no real problems trying it out. VMWare Fusion 2 is currently in beta, and I installed that as well. I’m definitely leaning towards VMWare, though, because I’ve used them in the past, and they really know virtual machines. Both programs have a nifty feature that lets you run Windows apps in such a way as to make it seem like they’re running in OSX. In Parallels it’s called Coherence, and in VMWare it’s called Unity. Neat features!

So far I’ve been quite pleased with my purchase. The machine is sleek, runs fast, and allows me more flexibility than I’ve ever had in a laptop. It does run a bit hot at times, but that’s what lapdesks are for.. :)

So now I’m an Apple fan… I’m sure you’ll be seeing posts about OSX applications as I learn more about my Mac. I definitely recommend checking them out if you’ve never used one. And, if you have used one in the past, pre-OSX days, check them out now. I hated the old Mac OS, but OSX is something completely different, definitely worth a second look.