Intrusion Prevention Systems

This post first appeared on Redhat’s Enable Sysadmin community. You can find the post here.

“Intruder alert!  Strace to PID 45555, log and store.”
“Roger that, overseer. Target acquired.”
“Status report!”
“Suspicious activity verified, permission to terminate?”
“Permission granted, deploy Sigkill.”
“Process terminated at 2146 hours.”

Intrusion Prevention Systems, or IPS, are tools designed to detect and stop intrusions in their tracks. They come two basic flavors, network-based and host-based. As you may suspect, a network-based IPS is meant to be deployed to monitor the network and a host-based IPS is deployed on a host with the intention of monitoring just a single host.

How these tools work varies from vendor to vendor, but the basics are the same. The network-based tool monitors traffic on the network and matches it to a long list of known signatures. These signatures describe a variety of attacks ranging from simple corrupt packets to more specific attacks such as SQL injection.

Host-based tools tend to have more capabilities as they have access to the entire host. A host-based IPS can look at network traffic as well as monitor files and logs. One of the more popular tools, OSSEC-HIDS, monitors traffic, logs, file integrity, and even has signatures for common rootkits.

More advanced tools have additional detection capabilities such as statistical anomaly detection or stateful protocol inspection. Both of these capabilities use algorithms to detect intrusions. This allows detection of intrusions that don’t yet have signatures created for them.


Unlike it’s predecessor, the Intrusion Detection System, or IDS, when an IPS detects an intrusion it moves to block the traffic and prevent it from getting to its target. As you can imagine, ensuring that the system blocks only bad traffic is of utmost importance. Deploying a tool that blocks legitimate traffic is a quick way to frustrate users and get yourself in trouble. An IDS merely detects the traffic and sends an alert. Given the volume of traffic on a typical network or host today, an IDS is of limited use.

The Redhat Enterprise Packages for Enterprise Linux repository includes two great HIDS apps that every admin should look into. HIDS apps are a bit of a hybrid solution in that they’re both an IDS and an IPS. That is, they can be configured to simply alert when they see issues or they can often be configured to run scripts to react to scenarios that trigger alerts.


First up from EPEL is Tripwire, a file integrity monitoring tool, which Seth Kenlon wrote about for Enable Sysadmin back in April. Tripwire’s job in life is to monitor files on the host and send an alert if they change in ways they’re not supposed to. For instance, you can monitor a binary such as /bin/bash and send an alert if the file changes in some way such as permissions or other attributes of the file. Attackers often modify common binaries and add a payload intended to keep their access to the server active. Tripwire can detect that.


The second EPEL package is fail2ban. Fail2ban is more of an IPS style tool in that it monitors and acts when it detects something awry. One common implementation of fail2ban is monitoring the openssh logs. By building a signature that identifies a failed login, fail2ban can detect multiple attempts to login from a single source address and block that source address. Typically, fail2ban does this by adding rules to the host’s firewall, but in reality, it can run any script you can come up with. So, for instance, you can write a script to block the IP on the local firewall and then transmit that IP to some central system that will distribute the firewall block to other systems. Just be careful, however, as globally blocking yourself from every system on the network can be rather embarrassing.


OSSEC-HIDS, mentioned previously, is a personal favorite of mine. It’s much more of a swiss army knife of tools. It combines tools like tripwire and fail2ban together into a single tool. It can be centrally managed and uses encrypted tunnels to communicate with clients. The community is very active and new signatures are created all the time. Definitely worth checking out.


Snort is a network-based IDS/IPS (NIDS/NIPS). Where HIDS are installed on servers with the intention of monitoring processes on the server itself, NIDS are deployed to monitor network traffic. Snort was first introduced in 1998 and has more recently been acquired by Cisco. Cisco has continued to support Snort as open source while simultaneously incorporating it into their product line.

Snort, like most NIDS/NIPS, works by inspecting each packet as it passes through the system. Snort can be deployed as an IDS by mirroring traffic to the system, or it can be deployed as an IPS by putting it in-line with traffic. In either case, Snort inspects the traffic and compares it to a set of signatures and heuristic patterns, looking for bad traffic. Snort’s signatures are updated on a regular basis and uses libpcap, the same library used by many popular network inspection tools such as tcpdump and wireshark.

Snort can also be configured to capture traffic for later inspection. Be aware, however, that this can eat up disk space pretty rapidly.


Suricata is a relatively new IDS/IPS, released in 2009. Suricata is designed to be multi-threaded, making it much faster than competing products. Like Snort, it uses signatures and heuristic detection. In fact, it can use most Snort rules without any changes. It also has it’s own ruleset that allows it to use additional features such as file detection and extraction.

Zeek (Formerly Bro)

Bro was first release in 1994, making it one of the oldest IDS applications mentioned here. Originally named in reference to George Orwell’s book, 1984, more recent times have seen a re-branding to the arguably less offensive name, Zeek.

Zeek is often used as a network analysis tool but can also be deployed as an IDS. Like Snort, Zeek uses libpcap for packet capture. Once packets enter the system, however, any number of frameworks or scripts can be applied. This allows Zeek to be very flexible. Scripts can be written to be very specific, targeting specific types of traffic or scenarios, or Zeek can be deployed as an IDS, using various signatures to identify and report on suspect traffic.


One final word of caution. IPS tools are powerful and extremely useful in automating the prevention of intrusions. But like all tools, they need to be maintained properly. Ensure your signatures are up to date and monitor the reports coming from the system. It’s still possible for a skilled attacker to slip by these tools and gain access to your systems. Remember, there is no such thing as a security silver bullet.

Ping, traceroute, and netstat are the trifecta network troubleshooting tools for a reason

This post first appeared on Redhat’s Enable Sysadmin community. You can find the post here.

I’ve spent a career building networks and servers, deploying, troubleshooting, and caring for applications. When there’s a network problem, be it outages or failed deployments, or you’re just plain curious about how things work, three simple tools come to mind: ping, traceroute, and netstat.


Ping is quite possibly one of the most well known tools available. Simply put, ping sends an “are you there?” message to a remote host. If the host is, in fact, there, it returns a “yup, I’m here” message. It does this using a protocol known as ICMP, or Internet Control Message Protocol. ICMP was designed to be an error reporting protocol and has a wide variety of uses that we won’t go into here.

Ping uses two message types of ICMP, type 8 or Echo Request and type 0 or Echo Reply. When you issue a ping command, the source sends an ICMP Echo Request to the destination. If the destination is available, and allowed to respond, then it replies with an ICMP Echo Reply. Once the message returns to the source, the ping command displays a success message as well as the RTT or Round Trip Time. RTT can be an indicator of the latency between the source and destination.

Note: ICMP is typically a low priority protocol meaning that the RTT is not guaranteed to match what the RTT is to a higher priority protocol such as TCP.

Ping diagram (via GeeksforGeeks)

When the ping command completes, it displays a summary of the ping session. This summary tells you how many packets were sent and received, how much packet loss there was, and statistics on the RTT of the traffic. Ping is an excellent first step to identifying whether or not a destination is “alive” or not. Keep in mind, however, that some networks block ICMP traffic, so a failure to respond is not a guarantee that the destination is offline.

 $ ping
PING ( 56 data bytes
64 bytes from icmp_seq=0 ttl=56 time=15.740 ms
64 bytes from icmp_seq=1 ttl=56 time=14.648 ms
64 bytes from icmp_seq=2 ttl=56 time=11.153 ms
64 bytes from icmp_seq=3 ttl=56 time=12.577 ms
64 bytes from icmp_seq=4 ttl=56 time=22.400 ms
64 bytes from icmp_seq=5 ttl=56 time=12.620 ms
--- ping statistics ---
6 packets transmitted, 6 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 11.153/14.856/22.400/3.689 ms

The example above shows a ping session to From the output you can see the IP address being contacted, the sequence number of each packet sent, and the round trip time. 6 packets were sent with an average RTT of 14ms.

One thing to note about the output above and the ping utility in general. Ping is strictly an IPv4 tool. If you’re testing in an IPv6 network you’ll need to use the ping6 utility. The ping6 utility works roughly identical to the ping utility with the exception that it uses IPv6.


Traceroute is a finicky beast. The premise is that you can use this tool to identify the path between a source and destination point. That’s mostly true, with a couple of caveats. Let’s start by explaining how traceroute works.

Traceroute diagram (via StackPath)

Think of traceroute as a string of ping commands. At each step along the path, traceroute identifies the IP of the hop as well as the latency to that hop. But how is it finding each hop? Turns out, it’s using a bit of trickery.

Traceroute uses UDP or ICMP, depending on the OS. On a typical *nix system it uses UDP by default, sending traffic to port 33434 by default. On a Windows system it uses ICMP. As with ping, traceroute can be blocked by not responding to the protocol/port being used.

When you invoke traceroute you identify the destination you’re trying to reach. The command begins by sending a packet to the destination, but it sets the TTL of the packet to 1. This is significant because the TTL value determines how many hops a packet is allowed to pass through before an ICMP Time Exceeded message is returned to the source. The trick here is to start the TTL at 1 and increment it by 1 after the ICMP message is received.

$ traceroute
traceroute to (, 64 hops max, 52 byte packets
 1 (  1747.782 ms  1.812 ms  4.232 ms
 2 (  10.838 ms  12.883 ms  8.510 ms
 3  xx.xx.xx.xx (xx.xx.xx.xx)  10.588 ms  10.141 ms  10.652 ms
 4  xx.xx.xx.xx (xx.xx.xx.xx)  14.965 ms  16.702 ms  18.275 ms
 5  xx.xx.xx.xx (xx.xx.xx.xx)  15.092 ms  16.910 ms  17.127 ms
 6 (  13.711 ms  14.363 ms  11.698 ms
 7 (  12.802 ms (  12.647 ms  12.963 ms
 8 (  11.901 ms  13.666 ms  11.813 ms

Traceroute displays the source address of the ICMP message as the name of the hop and moves on to the next hop. When the source address matches the destination address, traceroute has reached the destination and the output represents the route from the source to the destination with the RTT to each hop. As with ping, the RTT values shown are not necessarily representative of the real RTT to a service such as HTTP or SSH. Traceroute, like ping, is considered to be lower priority so RTT values aren’t guaranteed.

There is a second caveat with traceroute you should be aware of. Traceroute shows you the path from the source to the destination. This does not mean that the reverse is true. In fact, there is no current way to identify the path from the destination to the source without running a second traceroute from the destination. Keep this in mind when troubleshooting path issues.


Netstat is an indispensable tool that shows you all of the network connections on an endpoint. That is, by invoking netstat on your local machine, all of the open ports and connections are shown. This includes connections that are not completely established as well as connections that are being torn down.

$ sudo netstat -anptu
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0    *               LISTEN      4417/master
tcp        0      0   *               LISTEN      2625/java
tcp        0      0*               LISTEN      559/slapd
tcp        0      0    *               LISTEN      1180/sshd
tcp        0      0        ESTABLISHED 2625/java
tcp        0      0      ESTABLISHED 559/slapd

The output above shows several different ports in a listening state as well as a few established connections. For listening ports, if the source address is, it is listening on all available interfaces. If there is an IP address instead, then the port is open only on that specific interface.

The established connections show the source and destination IPs as well as the source and destination ports. The Recv-Q and Send-Q fields show the number of bytes pending acknowledgement in either direction. Finally, the PID/Program name field shows the process ID and the name of the process responsible for the listening port or connection.

Netstat also has a number of switches that can be used to view other information such as the routing table or interface statistics. Both IPv4 and IPv6 are supported. There are switches to limit to either version, but both are displayed by default.

In recent years, netstat has been superseded by the ss command. You can find more information on the ss command in this post by Ken Hess.


As you can see, these tools are invaluable when troubleshooting network issues. As a network or systems administrator, I highly recommend becoming intimately familiar with these tools. Having these available may save you a lot of time troubleshooting.

Deployment Quest

So you want to deploy an application… You’ve put in the time and developed the latest and greatest widget and now you want to expose it to the world. It’s something truly ground breaking and people will be rushing to check it out. But where do you start? What’s involved with deploying an application, anyway?

The answer, it turns out, varies quite a bit, depending on what you’re trying to accomplish. So where to begin? First you need to determine what architecture you want to use. Are you deploying a simple website that can be stood up on a LAMP stack? Or perhaps this is something a bit more involved that requires additional back end services.

Are you deploying to a cloud provider such as AWS, Azure, or GCP? Will this be deployed on a virtual machine, or are you using something “newer” such as containers or serverless?

Do you have data to be stored? If so, how are you planning on storing this data? Are you using a SQL or NoSQL database? Do you need to store files?

How much traffic are you expecting? Do you need to be able to scale to larger workloads? Will you scale vertically or horizontally? Do you have or need a caching layer?

What about security? How will you protect the application? What about the data the application stores? How “secure” do you need to make your application and it’s deployment?

As you can see, there are a lot of questions to answer, despite this being a relatively simple application. Over the next few blog posts I’ll be digging into various deployment strategies, specifics on various choices you can make, and the pros and cons of each. Links to each post can be found below:

Does it do the thing?

Back in 2012 I gave a talk at Derbycon 2.0. This was my first infosec talk and I was a little nervous, to say the least. Anyway, I described a system I wanted to write that handled distributed baseline scanning.

After a lot of starts and stops, I finished a basic 1.0 version in 2014. It’s still quite rough and I’ve since been working, intermittently, on making the system more robust and solid. I’ve been working on a python replacement for the GUI as well, instead of the current PHP one. The repository is located here, if you’re interested in taking a look.

Why am I telling you all of this? Well, as part of the updates I’m making, I wanted to do things the “right way” and make sure I have unit testing in place before I start making additional changes to the code. Problem is, while I learned about unit testing, I’ve never really implemented it in any meaningful way, so this is a bit new to me.

So why unit testing? Well, the hypothesis is that by creating tests that check every line of code, you ensure that the code is working as expected. Thus, if the tests pass, then the code should be solid and bug free. In reality, this is rarely the case. Tests can be just as flawed as any other code. Additionally, you may miss testing certain corner cases and miss potential bugs. In the end, the general consensus is that unit testing is a complicated religious argument.

Let’s assume that we want to unit test anyway and move on to the actual testing bits, shall we? We’ll start with a contrived example to make things easier. Assume we have the following code in a file called


def add(value1, value2):
    return value1 + value2

Simple enough, just a simple function to return the value of two numbers added together. Let’s create some test cases, shall we?


from mytestcode import add

class TestAdd(object):
    def test_add(self):
        assert add(1,1) == 2

    def test_add_fail(self):
        assert add(1,1) != 3

What we have here are two simple test cases. First, we test to make sure that if we call the add function with two values, 1 and 1, we get a 2 as a return value. Second, we test that providing the same values as input does not return a 3. Simple, right? But have we really tested all of the corner cases? What happens if we feed the function a negative? How about a non-numeric value? Are there cases where we can cause an exception?

To be fair, the original function is poorly written and is merely being used as a simple example. This is the problem with contrived examples, of course. They miss important details, often simply things too much, and can lead to beginners making big mistakes when using them as teaching tools. So please, be aware, the above code really isn’t very good code. It’s intended to be simple to understand.

Let’s take a look at some “real” code directly from my distributed scanner project. This particular code is something I found on Stack Overflow when I was looking for a way to identify whether a process was still running or not.


import errno
import os
import sys

def pid_exists(pid):
    """Check whether pid exists in the current process table.
    UNIX only.
    if pid < 0:
        return False
    if pid == 0:
        # According to "man 2 kill" PID 0 refers to every process
        # in the process group of the calling process.
        # On certain systems 0 is a valid PID but we have no way
        # to know that in a portable fashion.
        raise ValueError('invalid PID 0')
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # ESRCH == No such process
            return False
        elif err.errno == errno.EPERM:
            # EPERM clearly means there's a process to deny access to
            return True
            # According to "man 2 kill" possible error values are
            # (EINVAL, EPERM, ESRCH)
        return True

Testing this code should be relatively straightforward, with the exception of the os.kill call. For that, we’ll need to delve into mock objects. Let’s tackle the simple cases first:

#!/usr/bin/env python

import pytest

from libs.funcs import pid_exists

class TestFuncs(object):
    def test_pid_negative(self):
        assert pid_exists(-1) == False

    def test_pid_zero(self):
        with pytest.raises(ValueError) as e_info:

    def test_pid_typeerror(self):
        with pytest.raises(TypeError):
        with pytest.raises(TypeError):
        with pytest.raises(TypeError):

That’s relatively simple. We verify that False is returned for a negative PID and a ValueError is returned for a PID of zero. We also test that a TypeError is returned if we don’t provide an integer value. What’s left is handling a valid PID and testing that it returns True for a running process and False otherwise. In order to test the rest, we could go through a lot of elaborate setup to start a process, get the PID, and then test our code, but there’s a lot that can go wrong there. Additionally, we’re looking to test our logic and not the entirety of another module. So, what we really want is a way to provide an arbitrary return value for a given call. Enter the mock module.

The mock module is part of the unittest framework in python. Essentially, the mock module allows you to identify a call or an object that you want to create a fake version of, and then provide the behavior you’re expecting that mocked version to have. So, for instance, you can mock a function call and simply provide the return value you’re looking for instead of having to call the function directly. This functionality allows you to precisely test your logic versus doing a deeper integration test.

To finish up our testing code for the pid_exists() function, we want to mock the os.kill() function and have it return specific values so we can check the various branches of code we have.

    def test_pid_exists(self, oskillobj):
        oskillobj.return_value = None
        assert pid_exists(100) == True

    def test_pid_does_not_exist(self, oskillobj):
        oskillobj.side_effect = OSError(errno.ESRCH, 'No such process')
        assert pid_exists(1234) == False

    def test_pid_no_permissions(self, oskillobj):
        oskillobj.side_effect = OSError(errno.EPERM, 'Operation not permitted')
        assert pid_exists(1234) == True

    def test_pid_invalid(self, oskillobj):
        oskillobj.side_effect = OSError(errno.EINVAL, 'Invalid argument')
        with pytest.raises(OSError):

    def test_pid_os_typeerror(self, oskillobj):
        oskillobj.side_effect = TypeError('an integer is required (got type str)')
        with pytest.raises(TypeError):

The above code tests all of the branching available in the rest of the code, verifying the logic we’ve written. The code should be pretty straightforward. The return_value attribute of a mock object directly defines what we want the mocked function to recall while the side_effect attribute allows us to throw an exception in response to the function call. With those two features of a mocked object, we’re able to successfully test the rest of the cases we need.

This little journey to learn how to write unit tests has been fun and informative. I just need to finish up the rest of the code, striving to hit as close to 100% coverage as I can while keeping the test cases reasonable. It’s taken a while to get going, but the more code I’ve been writing, the faster and more accurate I’m getting. As they say, “practice makes perfect,” though I’d settle with functionally complete and relatively bug-free.

One final word of caution. I’m a sole developer working on this code, so I’m the only one around to write test cases. In a larger shop, the originator of the logic should not be the one writing the test cases. The reason for this is that the original coder typically knows their code quite well and has expectations regarding how the code will be used. For instance, I’m expecting that anyone calling the add() function I wrote above to only supply numbers and I haven’t added any sort of type checking or input validation. As a result, I avoided adding test cases that supply invalid inputs, knowing that would fail. Someone else writing the test cases would likely have provided a number of different inputs and found that input validation was missing. So if you’re in a larger shop, do yourself a favor and have someone else write your test cases. And to ensure they provide robust test cases, only provide the function prototypes and not the full function definitions.

So, new digs?

It looks a bit different around here lately. Sure, it’s roughly the same as what it was, but something is off.. A little bit here and there, so what changed?

Well, to tell the truth, I’ve switched blogging platforms. Don’t get me wrong, I love Serendipity. I’ve used it for years, love the features, love the simplicity. Unfortunately, Serendipity doesn’t have the greatest support for offline blogging, updates are relatively sparse, and it’s limited to just blogging. So I decided it’s time for a change.

Ok. Deep breath. I’ve switched to WordPress. Yes, yes, I know. I’ve decried WordPress as an insecure platform for a long time, but I’ve somewhat changed my thinking. The team at WordPress has done a great job ensuring the core platform is secure and they’re actively working to help older installations upgrade to newer releases. Plugins are where the majority of the security issues exist these days, and many of the more popular plugins are being actively scanned for security issues. So, overall, the platform has moved forward with respect to security and is more than viable.

I’ve also been leveraging Docker in recent years. We’ll definitely be talking about Docker in the coming days/weeks, so I won’t go into it here. Suffice it to say, Docker helps enhance the overall security of the system while simultaneously making it a breeze to deploy new software and keep it up to date.

So, enjoy the new digs, and hopefully more changes will be coming in the near future. WordPress is capable of doing more than just blogging and I’m planning on exploring some of those capabilities a bit more. This is very much a continuing transition, so if you see something that’s off, please leave a comment and I’ll take a look.

Network Enhanced Telepathy

I’ve recently been reading Wired for War by P.W. Singer and one of the concepts he mentions in the book is Network Enhanced Telepathy. This struck me as not only something that sounds incredibly interesting, but something that we’ll probably see hit mainstream in the next 5-10 years.

According to Wikipedia, telepathy is “the purported transmission of information from one person to another without using any of our known sensory channels or physical interaction.“ In other words, you can think *at* someone and communicate. The concept that Singer talks about in the book isn’t quite as “mystical” since it uses technology to perform the heavy lifting. In this case, technology brings fantasy into reality.

Scientists have already developed methods to “read” thoughts from the human mind. These methods are by no means perfect, but they are a start. As we’ve seen with technology across the board from computers to robotics, electric cars to rockets, technological jumps may ramp up slowly, but then they rocket forward at a deafening pace. What seems like a trivial breakthrough at the moment may well lead to the next step in human evolution.

What Singer describes in the book is one step further. If we can read the human mind, and presumably write back to it, then adding a network in-between, allowing communication between minds, is obvious. Thus we have Network Enhanced Telepathy. And, of course, with that comes all of the baggage we associate with networks today. Everything from connectivity issues and lag to security problems.

The security issues associated with something like this range from inconvenient to downright horrifying. If you thought social engineering was bad, wait until we have a direct line straight into someone’s brain. Today, security issues can result in stolen data, denial of service issues, and, in some rare instances, destruction of property. These same issues may exist with this new technology as well.

Stolen data is pretty straightforward. Could an exploit allow an attacker to arbitrarily read data from someone’s mind? How would this work? Could they pinpoint the exact data they want, or would they only have access to the current “thoughts” being transmitted? While access to current thoughts might not be as bad as exact data, it’s still possible this could be used to steal important data such as passwords, secret information, etc. Pinpointing exact data could be absolutely devastating. Imagine, for a moment, what would happen if an attacker was able to pluck your innermost secrets straight out of your mind. Everyone has something to hide, whether that’s a deep dark secret, or maybe just the image of themselves in the bathroom mirror.

I’ve seen social engineering talks wherein the presenter talks about a technique to interrupt a person, mid-thought, and effectively create a buffer overflow of sorts, allowing the social engineer to insert their own directions. Taken to the next level, could an attacker perform a similar attack via a direct link to a person’s mind? If so, what access would the attacker then attain? Could we be looking at the next big thing in brainwashing? Merely insert the new programming, directly into the user.

How about Denial of Service attacks or physical destruction? Could an attacker cause physical damage in their target? Is a connection to the mind enough access to directly modify the cognitive functions of the target? Could an attacker induce something like Locked-In syndrome in a user? What about blocking specific functions, preventing the user from being able to move limbs, or speak? Since the brain performs regulatory control over the body, could an attacker modify the temperature, heart rate, or even induce sensations in their target? These are truly scary scenarios and warrant serious thought and discussion.

Technology is racing ahead at breakneck speeds and the future is an exciting one. These technologies could allow humans to take that next evolutionary step. But as with all technology, we should be looking at it with a critical eye. As technology and biology become more and more intertwined, it is essential that we tread carefully and be sure to address potential problems long before they become a reality.

Suspended Visible Masses of Small Frozen Water Crystals

The Cloud, hailed as a panacea for all your IT related problems. Need storage? Put it in the Cloud. Email? Cloud. Voice? Wireless? Logging? Security? The Cloud is your answer. The Cloud can do it all.

But what does that mean? How is it that all of these problems can be solved by merely signing up for various cloud services? What is the cloud, anyway?

Unfortunately, defining what the cloud actually is remains problematic. It means many things to many people. The cloud can be something “simple” like extra storage space or email. Google, Dropbox, and others offer a service that allows you to store files on their servers, making them available to you from “anywhere” in the world. Anywhere, of course, if the local government and laws allow you to access the services there. These services are often free for a small amount of space.

Google, Microsoft, Yahoo, and many, many others offer email services, many of them “free” for personal use. In this instance, though, free can be tricky. Google, for instance, has algorithms that “read” your email and display advertisements based on the results. So while you may not exchange money for this service, you do exchange a level of privacy.

Cloud can also be pure computing power. Virtual machines running a variety of operating systems, available for the end-user to access and run whatever software they need. Companies like Amazon have turned this into big business, offering a full range of back-end services for cloud-based servers. Databases, storage, raw computing power, it’s all there. In fact, they have developed APIs allowing additional services to be spun up on-demand, augmenting existing services.

As time goes on, more and more services are being added to the cloud model. The temptation to drop self-hosted services and move to the cloud is constantly increasing. The incentives are definitely there. Cloud services are affordable, and there’s no need for additional staff for support. All the benefits with very little of the expense. End-users have access to services they may not have had access to previously, and companies can save money and time by moving services they use to the cloud.

But as with any service, self-hosted or not, there are questions you should be asking. The answers, however, are sometimes a bit hard to get. But even without direct answers, there are some inferences you can make based on what the service is and what data is being transferred.

Data being accessible virtually anywhere, at any time, is one of major draws of cloud services. But there are downsides. What happens when the service is inaccessible? For a self-hosted service, you have control and can spend the necessary time to bring the service back up. In some cases, you may have the ability to access some or all of the data, even without the service being fully restored. When you surrender your data to the cloud, you are at the mercy of the service provider. Not all providers are created equal and you cannot expect uniform performance and availability across all providers. This means that in the event of an outage, you are essentially helpless. Keeping local backups is definitely an option, but oftentimes you’re using the cloud so that you don’t need those local backups.

Speaking of backups, is the cloud service you’re using responsible for backups? Will they guarantee that your data will remain safe? What happens if you accidentally delete a needed file or email? These are important issues that come up quite often for a typical office. What about the other side of the question? If the service is keeping backups, are those backups secure? Is there a way to delete data, permanently, from the service? Accidents happen, so if you’ve uploaded a file containing sensitive information, or sent/received an email with sensitive information, what recourse do you have? Dropbox keeps snapshots of all uploaded data for 30 days, but there doesn’t seem to be an official way to permanently delete a file. There are a number of articles out there claiming that this is possible, just follow the steps they provide, but can you be completely certain that the data is gone?

What about data security? Well, let’s think about the data you’re sending. For an email service, this is a fairly simple answer. Every email goes through that service. In fact, your email is stored on the remote server, and even deleted messages may hang around for a while. So if you’re using email for anything sensitive, the security of that information is mostly out of your control. There’s always the option of using some sort of encryption, but web-based services rarely support that. So data security is definitely an issue, and not necessarily an issue you have any control over. And remember, even the “big guys” make mistakes. Fishnet Security has an excellent list of questions you can ask cloud providers about their security stance.

Liability is an issue as well, though you may not initially realize it. Where, exactly, is your data stored? Do you know? Can you find out? This can be an important issue depending on what your industry is, or what you’re storing. If your data is being stored outside of your home country, it may be subject to the laws and regulations of the country it’s stored in.

There are a lot of aspects to deal with when thinking about cloud services. Before jumping into the fray, do your homework and make sure you’re comfortable with giving up control to a third party. Once you give up control, it may not be that easy to reign it back in.

Looking into the SociaVirtualistic Future

Let’s get this out of the way. One of the primary reasons I’m writing this is in response to a request by John Carmack for coherent commentary about the recent acquisition of Oculus VR by Facebook. My hope is that he does, in fact, read this and maybe drop a comment in response. <fanboy>Hi John!</fanboy> I’ve been a huge Carmack fan since the early ID days, so please excuse the fanboyism.

And I *just* saw the news that Michael Abrash has joined Oculus as well, which is also incredibly exciting. Abrash is an Assembly GOD. <Insert more fanboyism here />

Ok, on to the topic a hand. The Oculus Rift is a VR headset that got its public start with a Kickstarter campaign in September of 2012. It blew away it’s meager goal of $250,000 and raked in almost $2.5 Million. For a mere $275 and some patience, contributors would receive an unassembled prototype of the Oculus Rift. Toss in another $25 and you received an assembled version.

But what is the Oculus Rift? According to the Kickstarter campaign :

Oculus Rift is a new virtual reality (VR) headset designed specifically for video games that will change the way you think about gaming forever. With an incredibly wide field of view, high resolution display, and ultra-low latency head tracking, the Rift provides a truly immersive experience that allows you to step inside your favorite game and explore new worlds like never before.

In short, the Rift is the culmination of every VR lover’s dreams. Put a pair of these puppies on and magic appears before your eyes.

For myself, Rift was interesting, but probably not something I could ever use. Unfortunately, I suffer from Amblyopia, or Lazy Eye as it’s commonly called. I’m told I don’t see 3D. Going to 3D movies pretty much confirms this for me since nothing ever jumps out of the screen. So as cool as VR sounds to me, I would miss out on the 3D aspect. Though it might be possible to “tweak” the headset and adjust the angles a bit to force my eyes to see 3D. I’m not sure if that’s good for my eyes, though.

At any rate, the Rift sounds like an amazing piece of technology. In the past year I’ve watched a number of videos demonstrating the capabilities of the Rift. From the Hak5 crew to Ben Heck, the reviews have all been positive.

And then I learned that John Carmack joined Oculus. I think that was about the time I realized that Oculus was the real deal. John is a visionary in so many different ways. One can argue that modern 3D gaming is largely in part to the work he did in the field. In more recent years, his visions have aimed a bit higher with his rocket company, Armadillo Aerospace. Armadillo started winding down last year, right about the time that John joined Oculus, leaving him plenty of time to deep dive into a new venture.

For anyone paying attention, Oculus was recently acquired by Facebook for a mere $2 Billion. Since the announcement, I’ve seen a lot of hatred being tossed around on Twitter. Some of this hatred seems to be Kickstarter backers who are under some sort of delusion that makes them believe they have a say in anything they back. I see this a lot, especially when a project is taking longer than they believe it should.

I can easily write several blog posts on my personal views about this, but to sum it up quickly, if you back a project, you’re contributing to make something a reality. Sometimes that works, sometimes it doesn’t. But Kickstarter clearly states that you’re merely contributing financial backing, not gaining a stake in a potential product and/or company. Nor are you guaranteed to receive the perks you’ve contributed towards. So suck it up and get over it. You never had control to begin with.

I think Notch, of Minecraft fame, wrote a really good post about his feeling on the subject. I think he has his head right. He contributed, did his part, and though it’s not working out the way he wanted, he’s still willing to wish the venture luck. He may not want to play in that particular sandbox, but that’s his choice.

VR in a social setting is fairly interesting. In his first Oculus blog post, Michael Abrash mentioned reading Neal Stephenson’s incredible novel, Snow Crash. Snow Crash provided me with a view of what virtual reality might bring to daily life. Around the same time, the movie Lawnmower Man was released. Again, VR was brought into the forefront of my mind. But despite the promises of books and movies, VR remained elusive.

More recently, I read a novel by Ernest Cline, Ready Player One. Without giving too much away, the novel centers around a technology called the OASIS. Funnily enough, the OASIS is, effectively, a massive social network that users interact with via VR rigs. OASIS was the first thing I thought about when I heard about the Facebook / Oculus acquisition.

For myself, my concern is Facebook. Despite being a massively popular platform, I think users still distrust Facebook quite a bit. I lasted about 2 weeks on Facebook before having my account deleted. I understand their business model and I have no interest in taking part. Unfortunately, I’m starting to miss out on some aspects of Internet life since some sites are requiring Facebook accounts for access. Ah well, I guess they miss out on me as well.

I have a lot of distrust in Facebook at the moment. They wield an incredible amount of information about users and, to be honest, they’re nowhere near transparent enough for me to believe what they say. Google is slightly better, but there’s some distrust there as well. But more than just the distrust, I’m afraid that Facebook is going to take something amazing and destroy it in a backwards attempt to monetize it. I’m afraid that Facebook is the IOI of this story. (It’s a Ready Player One reference. Go read it, you can thank me later)

Ultimately, I have no stake in this particular game. At least, not yet, anyway. Maybe I’m wrong and Facebook makes all the right moves. Maybe they become a power for good and are able to bring VR to the masses. Maybe people like Carmack and Abrash can protect Oculus and fend off any fumbling attempts Facebook may make at clumsy monetization. I’m not sure how this will play out, only time will tell.

How will we know how things are going? Well, for one, watching his Facebook interacts with this new property will be pretty telling. I think if Facebook is able to sit in the shadows and watch rather than kicking in the front door and taking over, maybe Oculus will have a chance to thrive. Watching what products are ultimately released by Oculus will be another telling aspect. While I fully expect that Oculus will add some sort of Facebook integration into the SDK over time, I’m also hoping that they continue to provide an SDK for standalone applications.

I sincerely wish Carmack, Abrash, and the rest of the Oculus team the best. I think they’re in a position where they can make amazing things happen, and I’m eager to see what comes next.

Keepin’ TCP Alive

I was debugging an odd network issue lately that turned out to have a pretty simple explanation. A client on the network was intermittently experiencing significant delays in accessing the network. Upon closer inspection, it turned out that prior to the delay, the client was being left idle for long periods of time. With this additional information it was pretty easy to identify that there was likely a connection between the client and server that was being torn down for being idle.

So in the end, the cause of the problem itself was pretty simple to identify. The fix, however, is more of a conundrum. The obvious answer is to adjust the timers and prevent the connection from being torn down. But what timers should be adjusted? There are the keepalive timers on the client, the keepalive timers on the server, and the idle teardown timers on the firewall in the middle.

TCP keepalive handling varies between operating systems. If we look at the three major operating systems, Linux, Windows, and OS X, then we can make the blanket statement that, by default, keepalives are sent after two hours of idle time. But, most firewalls seem to have a default TCP teardown timer of one hour. These defaults are not conducive to keeping idle connections alive.

The optimal scenario for timeouts is for the clients to have a keepalive timer that fires at an interval lower than that of the idle tcp timeout on the firewall. The actual values to use, as well as which devices should be changed, is up for debate. The firewall is clearly the easier point at which to make such a change. Typically there are very few firewall devices that would need to be updated as compared to the larger number of client devices. Additionally, there will likely be fewer firewalls added to the network over time, so ensuring that timers are properly set is much easier. On the other hand, the defaults that firewalls are generally configured with have been chosen specifically by the vendor for legitimate reasons. So perhaps the clients should conform to the setting on the firewall? What is the optimal solution?

And why would we want to allow idle connections anyway? After all, if a connection is idle, it’s not being used. Clearly, any application that needed a connection to remain open would send some sort of keepalive, right? Is there a valid reason to allow these sorts of connections for an extended period of time?

As it turns out, there are valid reasons for connections to remain active, but idle. For instance, database connections are often kept for longer periods of time for performance purposes. The TCP handshake can take a considerable amount of time to perform as opposed to the simple matter of retrieving data from a database. So if the database connection remains established, additional data can be retrieved without the overhead of TCP setup. But in these instances, shouldn’t the application ensure that keepalives are sent so that the connection is not prematurely terminated by an idle timer somewhere along the data path? Well, yes. Sort of. Allow me to explain.

When I first discovered the source of the network problem we were seeing, I chalked it up to lazy programming. While it shouldn’t take much to add a simple keepalive system to a networked application, it is extra work. As it turns out, however, the answer isn’t quite that simple. All three major operating systems, Windows, Linux, and OS X, all have kernel level mechanisms for TCP keepalives. Each OS has a slightly different take on how keepalive timers should work.

Linux has three parameters related to tcp keepalives :

The interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further
The interval between subsequential keepalive probes, regardless of what the connection has exchanged in the meantime
The number of unacknowledged probes to send before considering the connection dead and notifying the application layer

OS X works quite similar to Linux, which makes sense since they’re both *nix variants. OS X has four parameters that can be set.

Amount of time, in milliseconds, that the connection must be idle before keepalive probes (if enabled) are sent. The default is 7200000 msec (2 hours).
The interval, in milliseconds, between keepalive probes sent to remote machines, when no response is received on a keepidle probe. The default is 75000 msec.
Number of probes sent, with no response, before a connection is dropped. The default is 8 packets.
Assume that SO_KEEPALIVE is set on all TCP connections, the kernel will periodically send a packet to the remote host to verify the connection is still up.

Windows acts very differently from Linux and OS X. Again, there are three parameters, but they perform entirely different tasks. All three parameters are registry entries.

This parameter determines the interval between TCP keep-alive retransmissions until a response is received. Once a response is received, the delay until the next keep-alive transmission is again controlled by the value of KeepAliveTime. The connection is aborted after the number of retransmissions specified by TcpMaxDataRetransmissions have gone unanswered.
The parameter controls how often TCP attempts to verify that an idle connection is still intact by sending a keep-alive packet. If the remote system is still reachable and functioning, it acknowledges the keep-alive transmission. Keep-alive packets are not sent by default. This feature may be enabled on a connection by an application.
This parameter controls the number of times that TCP retransmits an individual data segment (not connection request segments) before aborting the connection. The retransmission time-out is doubled with each successive retransmission on a connection. It is reset when responses resume. The Retransmission Timeout (RTO) value is dynamically adjusted, using the historical measured round-trip time (Smoothed Round Trip Time) on each connection. The starting RTO on a new connection is controlled by the TcpInitialRtt registry value.

There’s a pretty good reference page with information on how to set these parameters that can be found here.

We still haven’t answered the question of optimal settings. Unfortunately, there doesn’t seem to be a correct answer. The defaults provided by most firewall vendors seem to have been chosen to ensure that the firewall does not run out of resources. Each connection through the firewall must be tracked. As a result, each connection uses up a portion of memory and CPU. Since both memory and CPU are finite resources, administrators must be careful not to exceed the limits of the firewall platform.

There is some good news. Firewalls have had a one hour tcp timeout timer for quite a while. As time has passed and new revisions of firewall hardware are released, the CPU has become more powerful and the amount of memory in each system has grown. The default one hour timer, however, has remained in place. This means that modern firewall platforms are much better prepared to handle an increase in the number of connections tracked. Ultimately, the firewall platform must be monitored and appropriate action taken if resource usage becomes excessive.

My recommendation would be to start by setting the firewall tcp teardown timer to a value slightly higher than that of the clients. For most networks, this would be slightly over two hours. The firewall administrator should monitor the number of connections tracked on the firewall as well as the resources used by the firewall. Adjustments should be made as necessary.

If longer lasting idle connections are unacceptable, then a slightly different tactic can be used. The firewall teardown timer can be set to a level comfortable to the administrator of the network. Problematic clients can be updated to send keepalive packets at a shorter interval. These changes will likely only be necessary on servers. Desktop systems don’t have the same need as servers for long-term establishment of idle connections.

SSL “Security”

SSL, a cryptographically secure protocol, was created by Netscape in the mid-1990’s. Today, SSL, and it’s replacement, TLS, are used by web browsers and other programs to create secure connections between devices across the Internet.

SSL provides the means to cryptographically secure a tunnel between endpoints, but there is another aspect of security that is missing. Trust. While a user may be confident that the data received from the other end of the SSL tunnel was sent by the remote system, the user can not be confident that the remote system is the system it claims to be. This problem was partially solved through the use of a Public Key Infrastructure, or PKI.

PKI, in a nutshell, provides the trust structure needed to make SSL secure. Certificates are issued by a certificate authority or CA. The CA cryptographically signs the certificate, enabling anyone to verify that the certificate was issued by the CA. Other PKI constructs offer validation of the registrant, indexing of the public keys, and a key revocation system. It is within these other constructs that the problems begin.

When SSL certificates were first offered for sale, the CAs spent a great deal of time and energy verifying the identity of the registrant. Often, paper copies of the proof had to be sent to the CA before a certificate would be issued. The process could take several days. More recently, the bar for entry has been lowered significantly. Certificates are now issued on an automated process requiring only that the registrant click on a link sent to one of the email addresses listed in the Whois information. This lack of thorough verification has significantly eroded the trust a user can place in the authenticity of a certificate.

CAs have responded to this problem by offering different levels of SSL certificates. Entry level certificates are verified automatically via the click of a link. Higher level SSL certificates have additional identity verification steps. And at the highest level, the Extended Validation, or EV certificate requires a thorough verification of the registrants identity. Often, these different levels of SSL certificates are marketed as stronger levels of encryption. The reality, however, is that the level of encryption for each of these certificates is exactly the same. The only difference is the amount of verification performed by the CA.

Despite the extra level of verification, these certificates are almost indistinguishable from one another. With the exception of EV certificates, the only noticeable difference between differing levels of SSL certificates are the identity details obtained before the certificate is issued. An EV certificate, on the other hand, can only be obtained from certain vendors, and shows up in a web browser with a special green overlay. The intent here seems to be that websites with EV certificates can be trusted more because the identity of the organization running the website was more thoroughly validated.

In the end, though, trust is the ultimate issue. Users have been trained to just trust a website with an SSL certificate. And trust sites with EV certificates even more. In fact, there have been a number of marketing campaigns targeted at convincing users that the “Green Address Bar” means that the website is completely trustworthy. And they’ve been pretty effective. But, as with most marketing, they didn’t quite tell the truth. sure, the EV certificate may mean that the site is more trustworthy, but it’s still possible that the certificate is fake.

There have been a number of well known CAs that have been compromised in recent years. Diginotar and Comodo being two of the more high profile ones. In both cases, it became possible for rogue certificates to be created for any website the attacker wanted to hijack. That certificate plus some creative DNS poisoning and the attacker suddenly looks like your bank, or google, or whatever site the attacker wants to be. And, they’ll have a nice shiny green EV certificate.

So how do we fix this? Well, one way would be to use the certificate revocation system that already exists within the PKI infrastructure. If a certificate is stolen, or a false certificate is created, the CA has the ability to put the signature for that certificate into the revocation system. When a user tries to load a site with a bad certificate, a warning is displayed telling the user that the certificate is not to be trusted.

Checking revocation of a certificate takes time, and what happens if the revocation server is down? Should the browser let the user go to the site anyway? Or should it block by default? The more secure option is to block, of course, but most users won’t understand what’s going on. So most browser manufacturers have either disabled revocation checking completely, or they default to allowing a user to access the site when the revocation site is slow or unavailable.

Without the ability to verify if a certificate is valid or not, there can be no real trust in the security of the connection, and that’s a problem. Perhaps one way to fix this problem is to disconnect the revocation process from the process of loading the webpage. If the revocation check happened in parallel to the page loading, it shouldn’t interfere with the speed of the page load. Additional controls can be put into place to prevent any data from being sent to the remote site without a warning until the revocation check completes. In this manner, the revocation check can take a few seconds to complete without impeding the use of the site. And after the first page load, the revocation information is cached anyway, so subsequent page loads are unaffected.

Another option, floated by the browser builders themselves, is to have the browser vendors host the revocation information. This information is then passed on to the browsers when they’re loaded. This way the revocation process can be handled outside of the CAs, handling situations such as those caused by a CA being compromised. Another idea would be to use short term certificates that expire quickly, dropping the need for revocation checks entirely.

It’s unclear as to what direction the market will move with this issue. It has been over two years since the attacks on Diginotar and Comodo and the immediacy of this problem seems to have passed. At the moment, the only real fix for this is user education. But with the marketing departments for SSL vendors working to convince users of the security of SSL, this seems unlikely.