This post first appeared on Redhat’s Enable Sysadmin community. You can find the post here.
Firewalls have been around in one form or another since the beginning of networking. The first firewalls weren’t even identified as firewalls. They were nothing more than physical barriers between networks. It wasn’t until the 1980’s that the first device specifically designed to be, and named, a firewall was developed by DEC. Since then, firewalls have evolved into a myriad of forms.
But what is a firewall? At its core, a firewall is a devices designed to allow or deny traffic based on a set of rules. Those rules can be as simple as “allow http and block everything else” or can be infinitely more complex including protocols, ports, addresses, and even application fingerprinting. Some modern firewalls have even incorporated machine learning into the mix.
Like other technologies, as firewalls have evolved, some niche uses have been identified. Web Application Firewalls (WAFs) are one of those niche uses. A WAF is a firewall specifically designed to handle “web” traffic. That is, traffic using the HTTP protocol. Generally speaking, the role of a WAF is to inspect all HTTP traffic destined for a web server, discard “bad” requests, and pass “good” traffic on. The details of how this works are, as you might suspect, a bit more complicated.
Much like “normal” firewalls, a WAF is expected to block certain types of traffic. To do this, you have to provide the WAF with a list of what to block. As a result, early WAF products are very similar to other products such as anti-virus software, IDS/IPS products, and others. This is what is known as signature-based detection. Signatures typically identify a specific characteristic of an HTTP packet that you want to allow or deny.
For instance, WAFs are often used to block SQL Injection attacks. A very simplistic signature may just look for key identifying elements of a typical SQL Injection attack. For instance, it may look for something like ' AND 1=1
included as part of the GET or POST request. If this matches an incoming packet, the WAF marks this as bad and discards it.
Signatures work pretty well but require a lot of maintenance to ensure that false positives are kept to a minimum. Additionally, writing signatures is often more of an art form rather than a straight-forward programming task. And signature writing can be quite complicated as well. You’re often trying to match a general attack pattern without also matching legitimate traffic. To be blunt, this can be pretty nerveracking.
To illustrate this a bit more, let’s look at ModSecurity. The ModSecurity project is an open source WAF project. It started out as a module for the Apache web server but has since evolved into a modular package that works with IIS, nginx, and others. ModSecurity is a signature-based WAF and often ships with a default set of signatures known as the OWASP ModSecurity Core Rule Set.
The Core Rule Set (CRS) is an excellent starting point for deploying a signature-based WAF. It includes signatures for all of the OWASP Top Ten web application security risks as well as a wide variety of other attacks. The developers have done their best to ensure that the CRS has few false alerts but, inevitably, anyone deploying the CRS will need to tweak the rules. This involves learning the rules language and having a deep understanding of the HTTP protocol.
Technology evolves, however, and newer WAF providers are using other approaches to block bad traffic. There has been a pretty widespread move from static configuration approaches such as allow and block lists to more dynamic methods involving APIs and machine learning. This move has been across multiple technologies including traditional firewalls, anti-virus software, and, you guessed it, WAFs.
In the brave new world of dynamic rulesets, WAFs use more intelligent approaches to identifying good and bad traffic. One of the “easier” methods employed is to put the WAF in “learning” mode so it can monitor the traffic flowing to and from the protected web server. The objective here is to “train” the WAF to identify what good traffic looks like. This may include traffic that matches patterns labeled as bad when signatures were used. Once the WAF has been trained, it’s moved to enforcement mode.
Training a WAF like this is similar to what happens when you train an email system to identify spam. Email systems often use a Bayesian filtering algorithm to identify spam. These algorithms work relatively well but can be poisoned to allow spam. Similar issues exist with algorithms used by WAF providers, especially when the WAF is in the learning mode.
More advanced WAF providers are using proprietary techniques to allow and block traffic. These techniques include algorithms that can identify whether certain attacks will work against the target system and only blocking those that would be harmful. Advanced techniques like this, however, are typically only found in WAF SaaS providers and not in self-contained WAF appliances.
WAFs, and firewalls in general, have evolved a lot over the years, moving from static to dynamic methods for identifying and blocking traffic. These techniques will only get better in the future. There are a variety of solutions available from open-source to commercial providers. No matter what your needs, there’s a WAF out there for you.