Detecting Credit Card Numbers in Network Traffic
Posted by ofer on January 01, 2008.
1. Introduction
The Payment Card Industry Data Security Standard (PCI DSS for short) requires that credit card numbers not be transmitted in the clear and not be presented to users unmasked. A network monitoring system such as an IDS or an IPS therefore seems like a natural way to ensure that such information is not sent over a network in violation of the standard, but a closer examination shows that a correct implementation is far from trivial. This write-up discusses several aspects of implementing a network monitoring system to detect leakage of credit card numbers:
- Matching a credit card number sequence
- Handling false positives using exceptions
- Additional considerations, including evasion, logging, performance and other sensitive patterns.
2. Matching a Credit Card Number
2.1 Matching a Credit Card Number Sequence
A credit card number consists of 13 to 16 digits. In addition, real-world presentations of credit card numbers often include delimiters such as dashes or spaces, usually in specific positions. The following regular expression can be used to match credit card number sequences:
\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}
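As a rough illustration, the pattern can be tried with Python's `re` module. The helper name `find_sequences` and the sample strings are made up for this sketch; `4111 1111 1111 1111` is a widely used test number, not a real account.

```python
import re

# The pattern above: 13 to 16 digits, with an optional dash or space
# allowed after the 4th, 8th, 10th and 12th digit.
CC_PATTERN = re.compile(r"\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}")


def find_sequences(text: str) -> list[str]:
    """Return every non-overlapping substring of text matching the pattern."""
    return CC_PATTERN.findall(text)


print(find_sequences("card: 4111 1111 1111 1111, phone: 555-1234"))
# → ['4111 1111 1111 1111']
print(find_sequences("id=12345678901234567890"))
# → ['1234567890123456']
```

The second call shows why boundaries matter: a 20-digit identifier still yields a 16-digit match carved out of the longer number.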
2.2 Boundaries
For long sequences of digits, which are common in network traffic, the above regular expression would match one or more substrings of the desired length inside the longer sequence. To avoid that, we need to define the sequence delimiters. What can or cannot be a valid delimiter might vary according to the application: not requiring any delimiter would generate many false positives, while requiring specific delimiters might lead to false negatives. For example, should we allow a leading "0"?
A reasonable choice for a delimiter would be any non-digit character. The resulting regular expression is:
(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)
or if a regular expression engine does not support look-ahead and look-behind searches:
(?:^|[^\d])(\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4})(?:[^\d]|$)
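Both forms can be compared with a short Python sketch; Python's `re` supports fixed-width look-behind, so it can run either variant. The constant names are illustrative only.

```python
import re

DIGITS = r"\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}"

# Variant 1: look-behind / look-ahead, so no digit may touch the match.
BOUNDED = re.compile(r"(?<!\d)" + DIGITS + r"(?!\d)")

# Variant 2: for engines without look-around. The boundary characters are
# consumed, so the card number itself is taken from capture group 1
# (findall returns the group when the pattern has exactly one).
BOUNDED_NO_LOOKAROUND = re.compile(r"(?:^|[^\d])(" + DIGITS + r")(?:[^\d]|$)")

for text in ["id=12345678901234567890", "card: 4111-1111-1111-1111."]:
    print(BOUNDED.findall(text), BOUNDED_NO_LOOKAROUND.findall(text))
```

The 20-digit identifier no longer produces a match, while the delimited card number is still found by both forms. One side effect of the second form: because the trailing boundary character is consumed, two numbers separated by a single character (for example `4111111111111111,5500000000000004`) yield only the first one.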
2.3 Validating the Number Against the Luhn Checksum Algorithm
However, sequences of 13 to 16 digits are not always credit card numbers. Typical network traffic contains many other long numbers; for example, identification numbers such as the product IDs used in online stores are often 13 to 16 digits long. Luckily, a credit card number has to conform to the Luhn checksum algorithm. A monitoring system can implement this algorithm and check that each detected sequence of digits is a valid credit card number.
Is this enough to avoid false positives? The Luhn algorithm computes an additional check digit for each number, so exactly 1 out of every 10 consecutive numbers passes the check. Applications that use numbers of this length as identification numbers probably assign many consecutive numbers, and therefore about 1 out of 10 of the numbers they use would be valid credit card numbers. Validating sequences using the Luhn formula thus reduces false positives by about 90% but does not eliminate them.
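A minimal Luhn check, written from the standard description of the algorithm, also makes the 1-in-10 figure easy to verify: neighbouring numbers differ only in their last digit, which is never doubled, so for any fixed prefix exactly one final digit makes the sum divisible by 10. The function name `luhn_valid` is illustrative, and it expects digits only, with any dashes or spaces already stripped.

```python
def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) check on a string of ASCII digits."""
    total = 0
    # Walk from the rightmost (check) digit and double every second digit.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


# Among 100 consecutive 16-digit numbers, exactly one in ten passes.
start = 4000000000000000
passing = [n for n in range(start, start + 100) if luhn_valid(str(n))]
print(len(passing))  # → 10
```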
2.4 Checking Prefixes
To further reduce the number of false positives, a monitoring system can check that the credit card number is not just valid but was also actually assigned. Naturally, the monitoring system cannot include a list of all assigned numbers, but it can check for the prefixes that were assigned to different financial institutions. A pretty good table of assigned prefixes can be found on Wikipedia.
Prefix checks further reduce false positives and can be implemented using a regular expression. Numbers with assigned prefixes account for 1% to 17% of the Luhn-valid numbers, depending on the sequence length. Prefixes are especially useful for eliminating the less frequently used 14- and 15-digit sequences (1.2% and 2.5% prefix coverage, respectively), leaving mostly 13- and 16-digit sequences. Thirteen-digit sequences are a mystery: it is not clear whether Visa still issues them.
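A sketch of such a prefix regular expression follows. The table in it is an assumption made for illustration: a small, well-known subset of issuer prefixes and lengths (Visa, MasterCard, American Express, Diners Club, Discover), not a complete or current list. A real deployment would build and maintain the full table.

```python
import re

# Illustrative subset of issuer prefixes and card lengths, NOT a complete
# or current table; a real system needs a maintained list.
ASSIGNED = re.compile(
    r"(?:"
    r"4\d{12}(?:\d{3})?"   # Visa: prefix 4, 13 or 16 digits
    r"|5[1-5]\d{14}"       # MasterCard: prefixes 51-55, 16 digits
    r"|3[47]\d{13}"        # American Express: 34 or 37, 15 digits
    r"|30[0-5]\d{11}"      # Diners Club Carte Blanche: 300-305, 14 digits
    r"|36\d{12}"           # Diners Club International: 36, 14 digits
    r"|6011\d{12}"         # Discover: 6011, 16 digits
    r")"
)


def has_assigned_prefix(digits: str) -> bool:
    """True if the whole digit string fits one of the prefix/length rules."""
    return ASSIGNED.fullmatch(digits) is not None
```

The check expects the delimiters to be stripped already, and it is independent of the Luhn check, so the two can be applied in either order.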
On the downside, using prefixes can lead to false negatives and requires keeping the prefix table of the monitoring system up to date. For example, the Australian Bankcard range is marked as not in use in the Wikipedia table, yet we recently saw such a number in actual traffic.
Using both the Luhn formula and prefix validation, the false positive rate can be reduced to approximately 1% of the rate achieved using pattern matching alone.
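Putting the three steps together, a detector might look like the following sketch: the bounded pattern finds candidates, delimiters are stripped, and only numbers passing both the Luhn check and the prefix check are reported. The function names and the abbreviated prefix table are illustrative assumptions, not a reference implementation.

```python
import re

# Bounded pattern from section 2.2.
CANDIDATE = re.compile(
    r"(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)"
)

# Illustrative, incomplete prefix table (see section 2.4).
PREFIXES = re.compile(
    r"(?:4\d{12}(?:\d{3})?|5[1-5]\d{14}|3[47]\d{13}|30[0-5]\d{11}|36\d{12}|6011\d{12})"
)


def luhn_valid(number: str) -> bool:
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def detect_card_numbers(text: str) -> list[str]:
    """Pattern match, strip delimiters, then apply Luhn and prefix checks."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[\- ]", "", match.group())
        if luhn_valid(digits) and PREFIXES.fullmatch(digits):
            found.append(digits)
    return found


print(detect_card_numbers(
    "order 1234567812345678 paid with 4111-1111-1111-1111, ref 1111111111111117"
))  # → ['4111111111111111']
```

In the example, `1234567812345678` fails the Luhn check, while `1111111111111117` passes it but is rejected because its prefix is not in the table.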
3. Handling False Positives Using Exceptions
In real-world systems 1% is still a high number, especially as sequences of digits are quite common in network traffic. If a human has to examine even hundreds of alerts a day, the monitoring system is no longer useful. How can we improve the accuracy of the detection system?
One way to do that is to create exceptions for traffic known to generate such false positives. Exceptions can be defined both for sequences that are not credit card numbers and for intentional, legitimate transmission of credit card numbers.
Such exceptions are as much a curse as a blessing: overusing them or defining them too broadly will open big security holes. Take for example a 16-digit sequence used as a product ID on a web site. An exception using firewall-like rules, which support only IP addresses and ports, would have to ignore the entire web site to take care of such an issue. An application-aware monitoring system such as a Web Application Firewall (WAF), on the other hand, can define a much more fine-grained exception. In the case above, a WAF rule can be defined to exclude credit card number detection for the specific field on the specific page used for the product ID.
Let's assume, for example, that ModSecurity rule number 955555 detects credit card numbers in application output, but the page /support/manual_payment.php, available only to store personnel, must display a credit card number. The following is a simple ModSecurity exception that disables this rule for that single page:
<LocationMatch "^/support/manual_payment\.php$">
# Disable only rule 955555, and only for this exact page
SecRuleRemoveById 955555
</LocationMatch>
The exception can further check that only a single credit card number is displayed on this page, and only in a certain part of the page.
Furthermore, the product ID may have unique attributes, such as its own prefix or surrounding text, that can help make the exception narrower. A good example is Google AdSense. A site running Google ads needs to add the following piece of code to each page displaying ads:
<script type="text/javascript"><!-- google_ad_client = "pub-0000000000000000"; google_alternate_color = "ffffff"; ...
Quite often, the 16-digit ID in the google_ad_client parameter happens to be a valid credit card number. The following modified regular expression compensates for that:
(?<!google_ad_client = \"pub-)(?<!\d)(\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4})(?!\d)
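The effect of the extra look-behind can be checked with a short Python sketch that combines it with the bounded pattern of section 2.2. The page fragment is made up for the test. Note that the look-behind only excuses the exact text `google_ad_client = "pub-`; a page using different spacing around `=` would not be excused, which keeps the exception narrow.

```python
import re

# Bounded card-number pattern plus a negative look-behind that skips a
# number directly following the AdSense publisher prefix.
PATTERN = re.compile(
    r'(?<!google_ad_client = "pub-)(?<!\d)'
    r"(\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4})(?!\d)"
)

page = 'google_ad_client = "pub-4111111111111111"; card: 4111 1111 1111 1111'
print(PATTERN.findall(page))  # → ['4111 1111 1111 1111']
```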
4. Other Considerations
4.1 Evasion
Evasion techniques are a serious problem for intrusion detection systems in general, and even more so for detecting credit card numbers. Even the simplest transformation applied to a sequence enables it to bypass detection. For example, an attacker performing an SQL injection attack in order to smuggle card numbers out could craft an SQL statement in such a way that each credit card number is multiplied by 2. As a result, the monitoring system would no longer recognize the output as a valid credit card number. Once the information is out, the attacker can simply divide the number by 2 to recover the original credit card number. Because they are so easy to evade, network monitoring systems, or indeed any other egress inspection systems, are not suitable for detecting malicious theft of credit card numbers. To prevent such theft, one must focus on inbound protection.
But even unintentional leakage might be subject to unintentional evasion. A good example is encryption: to provide better security, many applications encrypt information sent over the network. Such encryption hides the traffic from the monitoring system. While most network-layer IDS solutions fail to decrypt SSL, web-layer security solutions typically do. You can read more about the SSL blind spot in this thread.
Another problem is the encoding built into network protocols, such as Unicode encoding or compression of HTTP traffic and base64 encoding of e-mail messages. Again, network-only monitoring systems would not detect the encoded data, while an application-aware monitoring system would decode it prior to inspection and therefore detect the leakage.
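The difference between scanning raw and decoded data can be sketched in Python. The `encoding` label passed to `decode_body` is a simplified, hypothetical stand-in for what a real system would derive from protocol headers such as `Content-Encoding` or `Content-Transfer-Encoding`, and only gzip and base64 are handled; URL or Unicode decoding would be further steps.

```python
import base64
import gzip
import re

CARD = re.compile(r"(?<!\d)\d{4}[\- ]?\d{4}[\- ]?\d{2}[\- ]?\d{2}[\- ]?\d{1,4}(?!\d)")


def decode_body(body: bytes, encoding: str) -> str:
    """Undo a (hypothetically labelled) encoding before inspection."""
    if encoding == "gzip":
        body = gzip.decompress(body)
    elif encoding == "base64":
        body = base64.b64decode(body)
    return body.decode("utf-8", errors="replace")


def scan(body: bytes, encoding: str = "identity") -> list[str]:
    return CARD.findall(decode_body(body, encoding))


plain = b"Card: 4111111111111111"
encoded = base64.b64encode(plain)
print(CARD.findall(encoded.decode("ascii")))  # raw base64: no match
print(scan(encoded, "base64"))                # decoded: number found
print(scan(gzip.compress(plain), "gzip"))     # decompressed: number found
```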
4.2 Logging
Logging is just as important as detection for a monitoring system, and even more so for credit card number detection: in many cases a security breach can be mitigated better if the organization knows what information actually leaked. For example, state disclosure laws such as California SB-1386 require an organization to notify all affected clients in case of a breach. If the organization does not know who the affected clients are, it must notify everyone, raising both the cost of the breach and the media exposure.
Unfortunately, logging credit card leakage incidents is not trivial. PCI DSS does not allow the credit card number itself to be logged, yet the log record must include enough information to be useful. A useful implementation must therefore keep two levels of logs:
- Alert logs that can be used to analyze what happened but do not include the actual credit card number, or include at most a masked version of it.
- An encrypted store for the credit card data itself.
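For the alert log, a masked form can be produced with a small helper like the sketch below, which keeps at most the first six and last four digits, the display limit commonly cited from PCI DSS. The name `mask_pan` is illustrative, and the encrypted store for the full number is not shown.

```python
import re


def mask_pan(number: str) -> str:
    """Mask a card number for an alert log: keep at most the first six and
    last four digits, replacing the rest with '*'."""
    digits = re.sub(r"\D", "", number)  # drop dashes, spaces, etc.
    if len(digits) < 13:
        raise ValueError("not a card-length number")
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]


print(mask_pan("4111-1111-1111-1111"))  # → 411111******1111
```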
4.3 Performance
Regular expressions are not very efficient, and therefore most IDSs try to avoid testing the payload against a large number of regular expressions. To achieve that, an IDS first uses an efficient parallel matching algorithm such as Aho-Corasick, which is very fast and checks for all signatures in a single pass through the payload. On the other hand, parallel matching can only match simple strings. Only if a certain simple string matches is the follow-up regular expression tested. To reduce the number of regular expression tests required, the parallel matching algorithm searches for the longest constant string extracted from each regular expression.
Unfortunately, the regular expressions presented so far in this write-up do not contain any fixed string, as they look for a sequence of digits. Parallel matching algorithms can be adapted to search efficiently for a string of character classes (digits in this case) rather than a string of characters, but the usual implementations found in most IDSs do not support it.
Additionally, the performance cost of running a checksum algorithm over every matching sequence must be taken into account.
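The two-stage idea can be sketched as follows. Plain substring tests stand in for Aho-Corasick here; a real IDS would search for all anchor strings in one pass with a multi-pattern automaton. The rules, their names and the `inspect` helper are hypothetical examples.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    anchor: str         # constant (lowercase) string extracted from the regex
    regex: re.Pattern   # expensive follow-up test


RULES = [
    Rule("passwd-read", "/etc/passwd", re.compile(r"/etc/passwd\b", re.I)),
    Rule("union-select", "union", re.compile(r"union\s+(?:all\s+)?select", re.I)),
]


def inspect(payload: str) -> tuple[list[str], int]:
    """Return matched rule names and how many regexes actually ran."""
    lowered = payload.lower()
    matched, regex_runs = [], 0
    for rule in RULES:
        if rule.anchor not in lowered:  # cheap literal test first
            continue
        regex_runs += 1                 # only now pay for the regex
        if rule.regex.search(payload):
            matched.append(rule.name)
    return matched, regex_runs
```

For a payload containing none of the anchors, no regular expression runs at all; this is exactly the shortcut that digit-only patterns cannot use.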
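As a rough stand-in for such a character-class search, the sketch below makes one linear pass over the payload and reports only runs of 13 to 16 digits (allowing a single dash or space between digits); only those spans would then be handed to the regular expression and checksum. The function name and the exact delimiter rule are assumptions of this sketch, not a feature of any particular IDS.

```python
def digit_run_candidates(payload: str, min_digits: int = 13, max_digits: int = 16):
    """Return (start, end) spans of digit runs with min..max digits,
    tolerating one '-' or ' ' directly after a digit."""
    spans = []
    run_start = None   # index of the first digit in the current run
    last_digit = None  # index of the last digit in the current run
    count = 0          # number of digits in the current run
    for i, ch in enumerate(payload):
        if "0" <= ch <= "9":
            if run_start is None:
                run_start, count = i, 0
            count += 1
            last_digit = i
        elif ch in "- " and run_start is not None and last_digit == i - 1:
            continue  # a single separator does not end the run
        else:
            if run_start is not None and min_digits <= count <= max_digits:
                spans.append((run_start, last_digit + 1))
            run_start = None
    if run_start is not None and min_digits <= count <= max_digits:
        spans.append((run_start, last_digit + 1))
    return spans


text = "order 4111-1111-1111-1111 ok, id=12345678901234567890"
print([text[s:e] for s, e in digit_run_candidates(text)])
# → ['4111-1111-1111-1111']
```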
4.4 Other Sensitive Identifiers
While credit card numbers are the best-known sensitive identifier for which PCI DSS requires special attention, they are neither the only one nor the most sensitive. The Card Verification Code (CVV) is a 3- or 4-digit code printed on the card (on the back for most brands, on the front for American Express) that is often used as additional verification in online transactions. CVV is even more sensitive than a credit card number, but much harder to detect, as it is so short and has no check digit.
One way to detect the use of CVV numbers is to look for a 3- or 4-digit value in a form field when a credit card number was found in the same form. This method is far from immune to false positives, but in paranoid environments it might do the trick.
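The heuristic can be sketched for URL-encoded form submissions using `urllib.parse.parse_qs`. The function name is hypothetical, and the rule "any other field holding exactly 3 or 4 digits" is deliberately crude: a ZIP code or PIN field would be flagged just as readily.

```python
import re
from urllib.parse import parse_qs


def luhn_valid(number: str) -> bool:
    total = 0
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:  # double every second digit from the right
            digit = digit * 2 - 9 if digit > 4 else digit * 2
        total += digit
    return total % 10 == 0


def is_card(value: str) -> bool:
    digits = re.sub(r"[\- ]", "", value)
    return re.fullmatch(r"[0-9]{13,16}", digits) is not None and luhn_valid(digits)


def possible_cvv_fields(form_body: str) -> list[str]:
    """Names of 3-4 digit fields in a form that also carries a card number."""
    fields = parse_qs(form_body)  # {'name': ['value', ...], ...}
    if not any(is_card(v) for values in fields.values() for v in values):
        return []
    return [name for name, values in fields.items()
            if any(re.fullmatch(r"[0-9]{3,4}", v) for v in values)]


print(possible_cvv_fields("card=4111111111111111&cvv=123&qty=2"))  # → ['cvv']
```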
5. Conclusion
Detecting theft of credit card numbers by monitoring network traffic is very difficult, but such monitoring can be useful for detecting unintentional leakage of credit card numbers. To do so, the monitoring system has to be application and protocol aware, so that it can both compensate for encoding and encryption applied to the data and provide a tool for creating exceptions for legitimate credit card numbers or for other information detected as credit card numbers.
Posted by ofer at 02:08 PM