Set-based Pattern Matching Example

ModSecurity 2.5 introduces two new operators (@pm and @pmFromFile) that implement set-based pattern matching using the Aho-Corasick algorithm. Set-based matching is much quicker than using regular expressions. For users who are concerned with performance (meaning limiting latency from a legitimate client's perspective), set-based pattern matching is a great enhancement. If rules are written properly, you can achieve the same level of security with these new operators while decreasing the time it takes to complete the check.
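
To make the idea of set-based matching concrete, here is a minimal Python sketch of the Aho-Corasick technique. It is only an illustration, not ModSecurity's implementation; the function names build_automaton and find_all are made up for this example. The point it shows is that the input is scanned once, no matter how many phrases are in the set.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links for a set of plain-text patterns."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(pattern)

    # Phase 2: breadth-first pass to compute the failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit patterns that end at the fallback state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Scan text once and return (start_index, pattern) for every hit."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in out[state]:
            hits.append((i - len(pattern) + 1, pattern))
    return hits


# A few phrases borrowed from the kind of list @pm would be given.
automaton = build_automaton(["select", "mysql.user", "charindex", "instr"])
print(find_all("select * from mysql.user", automaton))
print(find_all("aaa", automaton))
```

A benign value such as "aaa" never leaves the root state, which is why a pre-qualifier of this kind is cheap for normal traffic.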

The key is to make sure that the set-based patterns (plain text strings) are critical to the success of the attack. So, when performing technical vulnerability research, you must first search for all of the necessary conditions for an attack to succeed. You start by sending an attack that triggers the vulnerability remotely. You then vary all the “interesting-looking” parts of that attack. Changes are made one at a time, in steps, keeping careful notes. (Strings, flags, length values, banners, version numbers, character encoding, white space… the list goes on. All are good things to vary.) If the attack succeeds even when a particular variable is set to a random value, that variable is not important for the signature or rule creation. Eventually you can identify the complete set of variables that are important to the attack’s success, and arrive at a set of criteria that must be collectively satisfied for any attack to succeed. If there are multiple distinct attack vectors, you must perform this analysis on each one separately.
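
The one-change-at-a-time procedure can be sketched roughly as follows. This is a toy illustration in Python, not a real research tool: necessary_fields, random_value, toy_oracle and the field names in baseline are all hypothetical, and in practice the "oracle" is you replaying the modified attack against a test system and observing whether it still works.

```python
import random
import string


def random_value(length=8):
    """Return a random lowercase string to substitute for one field."""
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))


def necessary_fields(baseline, attack_succeeds, trials=5):
    """Return the fields whose original value the attack depends on.

    Each field is randomized on its own, one at a time, while every other
    field keeps its baseline value. If the attack still works for every
    random value tried, the field is treated as unimportant.
    """
    required = []
    for field in baseline:
        survived_every_trial = True
        for _ in range(trials):
            mutated = dict(baseline)
            mutated[field] = random_value()
            if not attack_succeeds(mutated):
                survived_every_trial = False
                break
        if not survived_every_trial:
            required.append(field)
    return required


def toy_oracle(request):
    """Hypothetical stand-in for 'did the attack work against the target?'"""
    return request["param"] == "id" and "waitfor delay" in request["payload"]


baseline = {
    "param": "id",
    "payload": "1; waitfor delay '0:0:5'",
    "user_agent": "test-client",
    "padding": "xxxx",
}
print(necessary_fields(baseline, toy_oracle))
```

This only tells you which fields matter. Finding the exact strings inside a field that the attack needs takes the same kind of stepwise variation at a finer grain.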

Given a set of criteria that must be satisfied for an attack to succeed, it is possible to describe rule logic that has virtually zero false negatives. That is, an attack simply cannot succeed unless the HTTP request has exactly the characteristics that the rule is looking for. Once you have identified these necessary components, they can then be used as the input strings to the set-based matching operators.

While set-based matching is very fast, on its own it lacks some of the logic needed to validate the attack. For this reason, a good approach is to combine set-based matching with regular expression rules by chaining the individual rules together. Essentially, the first part of the chained rule uses the set-based matching operator as a pre-qualifier to check very quickly whether the transaction data has a high likelihood of matching. If the set-based matching portion matches, then the second part of the chained rule (which uses the standard regular expression strings) is executed. The end result of this configuration is that for normal, non-malicious users, the latency for running all of the ModSecurity inspection rules will be decreased.
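
The pre-qualifier logic can be sketched outside ModSecurity like this. It is a simplified Python illustration under stated assumptions: KEYWORDS and CONFIRM are small made-up stand-ins for a real phrase list and a real regular expression, normalize stands in for the rule's transformations, and the plain substring test stands in for the set-based operator.

```python
import re

# Illustrative subset of phrases; a stand-in for the @pm phrase list.
KEYWORDS = ("select", "charindex", "mysql.user", "@@spid")

# Illustrative, simplified pattern; a stand-in for the full regular expression.
CONFIRM = re.compile(
    r"\bselect\b.{0,40}\b(?:substring|ascii|user)\b"
    r"|\b(?:charindex|mysql\.user)\b"
    r"|@@spid\b"
)


def normalize(value):
    """Stand-in for the rule's transformations (only lowercasing here)."""
    return value.lower()


def inspect(value):
    """Return the matched text if both stages match, otherwise None."""
    data = normalize(value)

    # Stage 1: cheap pre-qualifier. Most benign values stop here.
    if not any(keyword in data for keyword in KEYWORDS):
        return None

    # Stage 2: the more expensive regular expression runs only when needed.
    match = CONFIRM.search(data)
    return match.group(0) if match else None


print(inspect("aaa"))                      # stops at stage 1
print(inspect("Please select a colour"))   # passes stage 1, fails stage 2
print(inspect("x' or (SELECT ascii(1))"))  # matches both stages
```

The second call shows why the regular expression is still needed: the keyword alone is present in harmless input, and only the second stage supplies the context that separates it from an attack.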

Let's take a look at this Blind SQL Injection rule from the Core Rules -

SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer \
"(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints\
|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|\
relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|waitfor\b\W*?\bdelay|\
attnotnull)\b|(?:locate|instr)\W+\()|\@\@spid\b)" \
"capture,t:htmlEntityDecode,t:lowercase,t:replaceComments,ctl:auditLogParts=+E,log,auditlog,\
msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"

We can now update this rule to become a chained rule and use the @pm operator to run some pre-qualifier checks -

SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer \
"@pm sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_views sys.tab \
charindex sys.user_catalog constraint_type locate select msysobjects attnotnull sys.user_tables \
sys.user_tab_columns sys.user_constraints mysql.user sys.all_tables msysrelationships \
msyscolumns msysqueries" \
"chain,t:htmlEntityDecode,t:lowercase,t:replaceComments,ctl:auditLogParts=+E,log,auditlog,\
msg:'Blind SQL Injection Attack. Matched signature <%{TX.0}>',id:'950007',severity:'2'"
SecRule REQUEST_FILENAME|ARGS|ARGS_NAMES|REQUEST_HEADERS|XML:/*|!REQUEST_HEADERS:Referer \
"(?:\b(?:(?:s(?:ys\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|object|view)s|c(?:onstraints\
|atalog))|all_tables|tab)|elect\b.{0,40}\b(?:substring|ascii|user))|m(?:sys(?:(?:queri|ac)e|\
relationship|column|object)s|ysql\.user)|c(?:onstraint_type|harindex)|attnotnull)\b|(?:locate|\
instr)\W+\()|\@\@spid\b)" "capture,t:htmlEntityDecode,t:lowercase,t:replaceComments"

Now, let's test the new rules to see what the processing time is for each of them when the request is normal. First, let's look at the time for the original Core Rule -

Executing operator "rx" with param "(?:\\b(?:(?:s(?:ys\\.(?:user_(?:(?:t(?:ab(?:_column|le)|rigger)|obj
ect|view)s|c(?:onstraints|atalog))|all_tables|tab)|elect\\b.{0,40}\\b(?:substring|ascii|user))|m(?:sys(
?:(?:queri|ac)e|relationship|column|object)s|ysql\\.user)|c(?:onstraint_type|harindex)|waitfor\\b\\W*?\
\bdelay|attnotnull)\\b|(?:locate|instr)\\W+\\()|\\@\\@spid\\b)" against ARGS:LoginEmail.
Target value: "aaa"
Operator completed in 14 usec.

Notice that it took approximately 14 usec for this optimized regular expression rule to run. Now, let's contrast this with the same rule running with the @pm operator -

Executing operator "pm" with param "sys.user_triggers sys.user_objects @@spid msysaces instr sys.user_v
iews sys.tab charindex sys.user_catalog constraint_type locate select msysobjects attnotnull sys.user_t
ables sys.user_tab_columns sys.user_constraints mysql.user sys.all_tables msysrelationships msyscolumns
 msysqueries" against ARGS:LoginEmail.
Target value: "aaa"
Operator completed in 9 usec.

As you can see, the processing time dropped to just 9 usec! This may not seem like much, but keep in mind that this is just for one rule. The overall effect of using the set-based pattern matching operators will become apparent when you are using a larger number of rules. Keep an eye out for updates to the Core Rules, as they will be changing in the future to better leverage these new operators.
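
To get a feel for how a small per-check saving adds up, here is a back-of-the-envelope calculation in Python. The 14 usec and 9 usec figures are the two measurements above; the rule count and the number of inspected values per rule are made-up numbers, and the calculation assumes (optimistically) that every check saves the same amount as this one.

```python
REGEX_USEC = 14  # measured above for the regular expression operator
PM_USEC = 9      # measured above for the @pm operator


def saved_usec(rules, targets_per_rule):
    """Estimate usec saved per request if every check saves the same amount."""
    return (REGEX_USEC - PM_USEC) * rules * targets_per_rule


relative = (REGEX_USEC - PM_USEC) / REGEX_USEC
print(f"per-check saving: {REGEX_USEC - PM_USEC} usec ({relative:.0%})")

# Hypothetical workload: 100 rules, each inspecting 10 values per request.
print(f"100 rules x 10 values: {saved_usec(100, 10)} usec saved per request")
```

Real savings will vary from rule to rule and should be measured the same way as above, from the debug log.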

Posted by rcbarnett at January 2, 2008 09:41 PM