Bandwidth in the 21st Century

As the Internet has evolved, the one constant has been the typical Internet user.  Typical users browse websites, a relatively low-bandwidth activity.  Even as the capabilities of the average website evolved, bandwidth usage remained relatively low, increasing at a slow rate.

In my own experience, a typical Internet user, accessing the Internet via DSL or cable, uses only a very small portion of the available bandwidth.  Bandwidth is consumed only for the few moments it takes to load a web page, and then usage falls to zero.  The only real exception is the online gamer.  Online gamers use a consistent amount of bandwidth for long periods of time, but the total bandwidth used at any given moment is still relatively low, much lower than the available bandwidth.

Times are changing, however.  In the past few years, peer-to-peer applications such as Napster, BitTorrent, Kazaa, and others have become more mainstream, seeing widespread usage across the Internet.  Peer-to-peer applications are used to distribute files, both legal and illegal, amongst users across the Internet.  Files range in size from small music files to large video files.  Modern applications such as video games and even operating systems have incorporated peer-to-peer technology to facilitate rapid deployment of software patches and updates.

Voice and video applications are also becoming more mainstream.  Services such as Joost, Veoh, and YouTube allow video streaming over the Internet to the user’s PC.  Skype allows users to make phone calls via their computers for little or no cost.  Each of these applications uses bandwidth at a constant rate, a pattern vastly different from that of web browsing.

Hardware devices such as the Xbox 360, Apple TV, and others are helping to bring streaming Internet video to regular televisions within the home.  The average user is starting to take advantage of these capabilities, consuming larger amounts of bandwidth for extended periods of time.

The end result of all of this is increased bandwidth demand within the provider network.  Unfortunately, most providers have based their current network architectures on outdated over-subscription models, expecting users to continue their web-browsing patterns.  As a result, many providers are scrambling to keep up with the increased demand.  At the same time, they continue releasing new access packages claiming faster and faster speeds.

Some providers are using questionable practices to ensure the health of their networks.  For instance, Comcast is allegedly using packet-sniffing techniques to identify BitTorrent traffic.  Once the traffic is identified, Comcast sends a reset packet to the local BitTorrent client, effectively severing the connection and canceling any file transfers.  This has caught the attention of the FCC, which has stated that it will step in if necessary.

Other providers, such as Time Warner, are looking into tiered pricing for Internet access.  Such plans would allow the provider to charge extra for users who exceed a pre-set limit.  In other words, Internet access becomes more than the typical 3/6/9 Mbps speeds advertised today; the high-speed access is offset by a total transfer limit.  Hopefully these limits will be both reasonable and clearly defined.  Ultimately, though, it becomes the responsibility of the user to avoid exceeding the limit, much like staying within the minutes on a cell phone plan.
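To put such a limit in perspective, here’s a rough calculation.  The figures are entirely hypothetical (a 40 GB monthly cap on a 6 Mbps line), but the arithmetic is the point:

```shell
# hypothetical figures: 40 GB monthly cap on a 6 Mbps connection
# how many hours of full-speed transfer does it take to exhaust the cap?
awk 'BEGIN {
    cap_bytes = 40 * 10^9        # 40 GB cap, in bytes
    rate_Bps  = 6 * 10^6 / 8     # 6 Mbps = 750,000 bytes per second
    printf "%.1f hours\n", cap_bytes / rate_Bps / 3600
}'
```

Under those assumptions, about fifteen hours of sustained full-speed transfer exhausts the cap, which a few evenings of streaming video could plausibly reach.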

Pre-set limits have problems as well, though.  For instance, Windows checks for updates at a regular interval, using Internet bandwidth to do so.  Granted, this is generally a small amount, but it adds up over time.  Another example is PPPoE and DHCP traffic.  Most DSL customers are configured to use PPPoE for authentication, and PPPoE sends keep-alive packets to the BRAS to ensure that the connection stays up.  Depending on how the ISP calculates bandwidth usage, these packets will likely be included in the calculation, resulting in “lost” bandwidth.  Likewise, DHCP, used mostly by cable subscribers, involves periodic renewal requests to the DHCP server.  Again, this traffic will likely be included in any bandwidth calculations.
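As a sketch of how that overhead accumulates, assume one keep-alive of roughly 60 bytes every 10 seconds (both numbers are illustrative; actual sizes and intervals vary by ISP and configuration):

```shell
# illustrative figures: one ~60-byte keep-alive packet every 10 seconds
awk 'BEGIN {
    pkt_bytes = 60
    per_day   = 86400 / 10       # keep-alives per day at a 10-second interval
    printf "%.1f MB/month\n", pkt_bytes * per_day * 30 / 10^6
}'
```

A trivial amount on its own, but it is traffic the subscriber never asked for that may still count against a transfer limit.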

In the end, it seems that substantial changes to the ISP structure are coming, but it is unclear what those changes may be.  Tiered bandwidth usage may be making a comeback, though I suspect that consumers will fight against it.  Advances in transport technology make increasing bandwidth a simple matter of replacing aging hardware.  Of course, replacements cost money.  So, in the end, the cost may fall back on the consumer, whether they like it or not.

Troubleshooting 101

There seems to be a severe lack of understanding and technique when it comes to troubleshooting these days. A large amount of troubleshooting effort is wasted on wild ideas and theories while the simplest and most direct solutions are ignored.

Occam’s Razor states: “entities should not be multiplied beyond necessity.” Simply put, the simplest explanation is usually the best one. This is the perfect mindset for anyone who does troubleshooting. There is no need to delve right into the most obscure reasons for a failure; start with the simple stuff.

For instance, questions like “Is the unit plugged in?”, or “Is the power on?” are perfect questions to start with. While it would be wonderful to believe that everyone you encounter has the common sense to check out these simple solutions, you’ll find that, unfortunately, the majority of the population isn’t that bright.

So, how about a real-world example? It’s 2 a.m. and you get paged because a router has gone unreachable. After notifying the proper people, you delve into the problem. Using the Occam’s Razor principle, what’s the first thing you should check? Well, for starters, let’s make sure the router really is unreachable. A simple ping should accomplish that. And just for good measure, ping something close to that router to make sure you’re actually connected to the network.

OK, so the router isn’t pingable. Now what? Well, let’s look at the next easiest step: power. Since the router is in a remote location, this isn’t easy to check directly. However, you can check the uplink to the router. You should be able to get to the router just before the one that’s unreachable. Once there, check the interface that feeds your troubled router. Is it up or down? While you’re there, check for traffic and errors as well, but don’t focus on these yet; store them for later.

If the interface is down, then it’s quite possibly a physical line issue, or possibly power. Just for good measure, I would suggest bouncing the interface to see if the problem is something temporary. Sometimes the interface will come back up and start running errors, indicating a physical line issue. What will often happen is that the interface comes back up and starts running errors but allows limited traffic to get through; once the error threshold is passed, the line goes back down. At this point, I’d call a technician to look at the physical line itself.

If the interface is up, try pinging the troubled router from the directly connected router. This process can help identify a routing issue in the network. Directly connected routes are considered the most specific unless explicitly overridden, which isn’t likely. If the ping is successful, take note of the ping time. If it seems overly high, you may be looking at a traffic issue. Depending on the type of router, traffic may be processor-switched, causing high CPU usage. This can be identified by a sluggish interface and high ping times. Note, though, that high ping times don’t always indicate this. Most routers set a very low priority for ICMP traffic destined for the CPU, deeming throughput more important.

Remember the traffic and error counts you looked at previously? Those come into play now. If the traffic on the interface is very high, notably higher than usual, then this is likely the cause of the problem. Or, rather, an effect of the actual cause, which may be a DoS attack or virus outbreak. DoS, or Denial of Service, attacks are targeted attacks against a specific IP or range of IPs. A side effect of these attacks is that interfaces between the attacker and victim are often overloaded.

There are a number of different DoS attacks out there, but when traffic is the symptom, you’ll often notice that small packets are being used. One way to quickly identify this is to take the current bps on the interface, divide it by the packets per second, and then divide by 8 to get bytes per packet. Generally speaking, a normal interface averages between 1,000 and 1,500 bytes per packet. NOTE: This refers to traffic received from a remote source such as a web site. Outgoing traffic, to the web site, has a much lower average packet size because those packets generally contain control information such as acknowledgements, ICMP, etc.
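That calculation looks like this in practice. The counter values below are made-up examples, not output from a real router:

```shell
# example interface counters (made up): 450 Mbps and 150,000 packets/sec
awk 'BEGIN {
    bps = 450 * 10^6             # bits per second on the interface
    pps = 150 * 10^3             # packets per second
    printf "%d bytes/packet\n", bps / pps / 8
}'
```

An average of 375 bytes per packet is well under the 1,000–1,500 byte range of normal traffic, which would support the small-packet DoS theory.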

Once you’ve identified that there is a traffic issue, the next step is to identify where the traffic is sourced from, or destined to. Remember, the end goal here is to repair the problem so that normal operations can continue. Since you’re already aware of the overloaded interface, it’s easiest to concentrate your efforts there. Identifying the traffic source and destination is usually pretty easy, provided it’s not a distributed attack. On a Cisco router, you can try the “ip accounting” interface command. This command records the source and destination of all output packets on an interface, along with a count of the packets and bytes for each pair. Simply look for rapidly increasing source and destination pairs and you’ll likely find your culprit.
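On a Cisco router, the sequence looks roughly like this (the interface name is a placeholder, and exact output varies by IOS version):

```
router(config)# interface GigabitEthernet0/1
router(config-if)# ip accounting output-packets
router(config-if)# end
router# show ip accounting
```

The “show ip accounting” output lists source/destination pairs with packet and byte counts; run it a few times and watch which pairs climb fastest.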

Another option is to use an access list. If the router can handle it, place an access list on the interface that permits all traffic but logs each packet. Then you can watch the log and try to identify large sources of traffic. Refine the access list to block that traffic until you’ve halted the attack. Be careful, however, as many routers will processor-switch the traffic when an access list is applied. This may cause a spike in CPU usage, sometimes causing a loss of connectivity to the router. If IP accounting is available, use that instead.
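A sketch of such an access list on a Cisco router (the ACL number and interface are placeholders):

```
router(config)# access-list 150 permit ip any any log
router(config)# interface GigabitEthernet0/1
router(config-if)# ip access-group 150 in
router(config-if)# end
router# show logging
```

Once a large traffic source stands out in the log, add a deny entry for it ahead of the permit, keeping in mind the CPU caveat above.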

Once you identify the source and/or target of the attack, craft an appropriate access list to block the traffic as far upstream as you can. If the DoS attack is distributed, then the most effective means to stop it is probably to remove the targeted routes from the routing table and allow the traffic to be dropped at the edges. This will likely result in an outage for that specific customer, but with a distributed attack, that’s often the only solution. From there you can work with your upstream providers to track down the perpetrator of the attack and take them offline permanently.

The preceding seems a bit long when written down, but in reality this is a 15-30 minute process. Experienced troubleshooters can identify and resolve these problems even more quickly. The point, of course, is to identify the most likely causes in the quickest manner possible. Oftentimes, the simplest solution is the correct solution. Take the extra few seconds to check out the obvious before moving on to the more advanced. Often, you’ll resolve the problem more quickly and sometimes wind up with a funny story as a bonus!

Please, troubleshoot responsibly.

Network Graphing

Visual representations of data can provide additional insight into the inner workings of your network. Merely knowing that one of your main feeds is peaking at 80% utilization isn’t very helpful if you don’t know how long the peak lasts, when it occurs, and when it started.

There are a number of graphing solutions available. Some of these are extremely simplistic and don’t do much, while others are extremely powerful and provide almost too much. I prefer using Cacti for my graphing needs.

Cacti is a web-based graphing solution built on top of RRDtool. RRDtool is a round-robin data logging and graphing tool developed by Tobias Oetiker of MRTG fame, MRTG being one of the original graphing systems.

Chock full of features, Cacti allows data collection from almost anywhere. It supports SNMP and script-based collection by default, but additional methods can easily be added. Graphs are fully configurable and can display just about any information you want. You can combine multiple sources on a single graph, or create multiple graphs for better resolution. Devices, once added, can be arranged into a variety of hierarchies allowing multiple views for various users. Security features allow the administrator to tailor the data shown to each user.
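As an example of script-based collection, a data source can be any script that prints a number for Cacti to poll and graph. This is a hypothetical data source of my own devising, not one that ships with Cacti; it reports the percentage used on the root filesystem:

```shell
#!/bin/sh
# hypothetical Cacti script data source: Cacti runs the script at each
# polling interval and graphs the single value it prints.
# Here, the value is the percentage used on the root filesystem.
df -P / | awk 'NR==2 { print $5 }' | tr -d '%'
```

Point a Cacti data input method at the script, and you have a disk-usage graph with no SNMP involved.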

Cacti is a wonderful tool to have and is invaluable when it comes to tracking down problems with the network. The ability to graph anything that spits out data makes it incredibly useful. For instance, you can create graphs to show you the temperature of equipment, utilization of CPUs, even the number of emails being sent per minute! The possibilities are seemingly endless.

There is a slight learning curve, however. Initial setup is pretty simple, and adding devices is straightforward. The tough part is understanding how Cacti gathers data and relates it all together. There are some really good tutorials on their documentation site that can help you through this part.

Overall, I think Cacti is one of the best graphing tools out there. The graphs come out very professional looking, and the feature set is amazing. Definitely worth looking into.

Host Intrusion Detection

Monitoring your network includes trying to keep the bad guys out. Unfortunately, unless you disconnect your computer and keep it in a locked vault, there’s no real way to ensure that your system is 100% hack-proof. So, in addition to securing your network, you need to monitor for intrusions as well. It’s better to catch an intruder early than to find out after they’ve done a huge amount of damage.

Intrusion detection systems (IDS) are designed to detect possible intrusion attempts. There are a number of different IDS types, but this post concentrates on the Host Intrusion Detection System (HIDS).

My HIDS of choice is Osiris. Osiris uses a client/server architecture, making it one of the more unique HIDS out there. The server stores all of the configurations and databases and triggers the scanning process. SSL is used between the client and server to ensure communication integrity.

Once a new client is added, the server performs an initial scan: a configuration file is pushed to the client, which scans the computer accordingly and reports the results back to the server. This first scan is then used as the baseline database for future comparisons.

The server periodically polls the clients and requests scans. The results of those scans are compared to the baseline database, and an alert is sent if there are differences. An administrator can then determine whether the changes were authorized and take appropriate action. If the changes are OK, Osiris is updated to use the new results as the baseline database. If the changes are suspect, the administrator can look into them further.

Osiris is very configurable. Scanning intervals can be set, allowing you fine-grained control over the time between scans. Multiple administrators can be set up to monitor and accept changes. Emails can be sent for each and every scan, regardless of changes.

The configuration file allows you to pick and choose which files on the client system are monitored. Fine-grained control allows the administrator to specify whole directories or individual files. A filtering system can prevent erroneous results from being sent. For instance, some backup systems change the ctime to reflect when a file was last backed up. Without a filter, Osiris would report changes to all of the files each time a backup runs. Setting up a simple filter to ignore ctime on a file allows the administrator to ignore the backup process.

Overall, Osiris is a great tool for monitoring your server. Be prepared, though, monitoring HIDS can get cumbersome, especially with a large number of servers. Every update, change, or new program installed can trigger a HIDS alert.

There are other HIDS packages as well. I have not tested most of these, but they are included for completeness:

  • OSSEC – an actively maintained HIDS that supports log analysis, integrity checking, rootkit detection, and more.
  • AFICK – another actively maintained HIDS that offers both CLI- and GUI-based operation.
  • Samhain – one of the more popular HIDS, offering a centralized monitoring system similar to that of Osiris.
  • Tripwire – a commercial HIDS that monitors configurations, files, databases, and more; it is quite sophisticated and mostly intended for large enterprises.
  • AIDE – an open-source HIDS that models itself after Tripwire.

Network Monitoring

I’ve been working a lot with network monitoring lately.  While mostly dealing with utilization monitoring, I do dabble with general network health systems as well.

There are several ways to monitor a network and determine the “health” of a given element.  The simple, classic example is the ICMP echo request.  Simply ping the device and if it responds, it’s alive and well.

This doesn’t always work out, however.  Take, for instance, a server.  Pinging the server simply indicates that the TCP/IP stack on the server is functioning properly.  But what about the processes running on the server?  How do you make sure those are running properly?
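One simple step beyond ping is to probe the service’s TCP port directly; if the port answers, the process is at least accepting connections. A minimal sketch, assuming bash (which provides the /dev/tcp pseudo-device; the host and port below are placeholders):

```shell
# bash-only: /dev/tcp lets a redirection attempt a TCP connection.
# Reports "up" if the port accepts a connection, "down" otherwise.
check_service() {
    if (exec 3<> "/dev/tcp/$1/$2") 2>/dev/null; then
        echo "$1:$2 up"
    else
        echo "$1:$2 down"
    fi
}
check_service 127.0.0.1 25    # e.g., is the local mail daemon answering?
```

This only proves the daemon is listening, not that it’s behaving; protocol-aware checks (like the HTTP and DNS monitors discussed below) go further.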

Other “health”-related items are utilization, system integrity, and environment.  When designing and/or implementing a network health system, you need to take all of these items into account.

I have used several different tools to monitor the health of the networks I’ve dealt with.  These range from custom-written tools to off-the-shelf products.  Perhaps at some point in the future I can release the custom tools, but for now I’ll focus on the freely available ones.

For general network monitoring I use a tool called Argus.  Argus is a robust monitoring system written in Perl.  It’s simple to set up, and the config file is largely self-explanatory.  Monitoring capabilities include ping (using fping), SNMP, HTTP, and DNS.  You can monitor specific ports on a device, allowing you to determine the health of a particular service.
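To give a feel for the config file, here is a rough sketch of its general shape, a group containing a host with a couple of service checks.  The names are placeholders and the syntax is from memory, so treat it as illustrative and consult the Argus documentation for the exact directives:

```
Group "Routers" {
        Host "core1.example.net" {
                Service Ping
                Service TCP/SMTP
        }
}
```

Groups nest, so you can mirror your network hierarchy and apply settings at whatever level makes sense.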

Argus also has some unique capabilities that I haven’t seen in many other monitoring systems.  For instance, you can monitor a web page and detect when specific strings within that webpage change.  This is perfect for monitoring software revisions and being alerted to new releases.  Other options include monitoring of databases via the Perl DBI module.

The program can alert you in a number of different manners such as email or paging (using qpage).  Additional notification methods are certainly possible with custom code.

The program provides a web interface similar to that of older versions of What’s Up Gold.  There is a fairly robust access-control system that allows the administrator to lock users into specific sections of the interface with custom lists of available elements.

Elements can be configured with dependencies, allowing alerts to be suppressed for child elements.  Each element can also be independently configured with a variety of options to allow or suppress alerts, modify monitoring cycle times, send custom alert messages, and more.  Check out the documentation for more information.  There’s also an active mailing list to help you out if you have additional questions.

In future posts I’ll touch on some of the other tools I have in my personal toolkit such as host intrusion detection systems, graphing systems, and more.  Stay tuned!

Whois Query Fun


I ran across a really neat way to use the whois tool in Linux the other day. There is apparently a lot more information available than I knew about!

Basically, in addition to the normal owner and technical-contact data you can get from the standard whois servers, and the IP block assignment information you can get from ARIN, there’s also some additional IP information available from Team Cymru. Specifically, you can run queries against ‘whois.cymru.com’ to determine which ISP hosts or owns a netblock. Check it out:

[user@localhost ~]$ whois -h whois.cymru.com 204.10.167.1

[Querying whois.cymru.com]
[whois.cymru.com]
AS | IP | AS Name

33241 | 204.10.167.1 | EMCS-AS – Endless Mountain Cyb

In addition to that, you can query another server, ‘v4-peer.whois.cymru.com’, to check for upstream peers. This is extremely useful for determining how “connected” a provider is when you’re shopping for new service, or for determining which providers you need to talk to for help in blocking possible attacks. Check it out:

[user@localhost ~]$ whois -h v4-peer.whois.cymru.com 204.10.167.1


[Querying v4-peer.whois.cymru.com]
[v4-peer.whois.cymru.com]
PEER_AS | IP | AS Name
3593 | 204.10.167.1 | EPIX – EPIX
3737 | 204.10.167.1 | PTD-AS – PenTeleData Inc.

Overall, I find this to be quite useful and I’ll definitely be using it! I hope you find it just as useful…