1. Technical Field
The present disclosure relates to data-pattern matching, and, in particular, to methods and systems for determining that a given subject data pattern fully matches a given signature data pattern.
2. Description of Related Art
a. Intrusion Prevention Systems (IPSs) Generally
Packet-data communication, such as that conducted over the Internet, is extremely popular, and is becoming more so every day. People, companies, educational institutions, etc. routinely use Internet-connected computers and networks to conduct their affairs. Myriad types of data are transmitted over the Internet, such as correspondence, medical information, financial information, business plans, etc. Unfortunately, not all uses of the Internet are benign; on the contrary, a significant percentage of the data that is transmitted over the Internet every day is malicious. Examples of this type of data are viruses, spyware, malware, worms, etc.
Not unexpectedly, an industry has developed to combat these attempts to disrupt and harm not only these Internet-based communications, but also the networks and computers used to conduct them. This industry, and the effort to fight these threats generally, is often and herein referred to as “intrusion prevention,” as very commonly such efforts are focused on points of access to private (e.g., corporate) networks. One important aspect of intrusion prevention involves identifying known threats (e.g., files that are or contain viruses, worms, spyware, malware, etc.) by particular data patterns contained therein. These patterns are often and herein referred to as “signatures” of these security threats, and are also often and at times herein referred to as “triggers” and by other names.
As such, data (e.g., IP) packets flowing through, towards, or from a network segment, such as a particular router, switch, or network generally, are often screened—perhaps by an intermediate device, functional component, or other entity—for the presence of these signature data patterns. When particular packets, or sequences of packets, are identified as containing at least one of these signatures, those packets (or, again, sequences of packets) may be “quarantined,” such that those packets cannot cause harm to any more networks and/or computers. These packets, removed from the normal flow of data traffic, can then be further examined without holding up that traffic generally.
In particular, systems that carry out intrusion prevention (i.e., intrusion-prevention systems (IPSs)) use pattern-matching techniques to attempt to detect malicious data, and to prevent that data from entering a given network segment. Typically, IPSs check both packet headers and packet payloads in order to detect content-based security threats. Standard detection methods consist of using pattern-matching or string-matching algorithms to search for malicious packets containing predefined signatures that characterize a threat. Typically, IPSs are deployed in-line with the network segment to be protected, such that all data that flows into and out of the protected network segment must pass through the IPS.
It can thus be appreciated that it would be advantageous for an IPS to be able to quickly and accurately identify signature data patterns across one or more packets, and to do so in a way that uses relatively few computing resources such as processing time and memory. For example, it would be advantageous for an IPS to be able to identify signature data patterns at “line” or “wire” speeds, which, in modern networks, are typically at least 10 gigabits per second (Gbps). Further, it would be advantageous for an IPS to be able to efficiently identify a large number of signatures, identify signatures that overlap, and, because the location of a signature in a given packet is not always predictable, identify signatures having different lengths and starting at arbitrary locations in a data stream.
b. Pattern-Matching Techniques
Generally, pattern matching may be carried out using either approximate-pattern-matching techniques or exact-pattern-matching techniques. Approximate-pattern-matching techniques may be relatively less resource-intensive, but may result in “false positives” (i.e., the identification of given data patterns as malicious when in fact they are not). Accordingly, IPSs that employ only approximate-pattern-matching techniques may inefficiently quarantine network traffic that is actually benign.
On the other hand, exact-pattern-matching techniques—which require, for a given data pattern to be correctly identified as malicious (or at least as containing an exact signature of a threat), that the given data pattern match a signature data pattern exactly—are typically more resource-intensive than their approximate-pattern-matching counterparts, but generally do not result in as many, if any, false positives. It can thus be appreciated that it would be advantageous for an IPS to employ exact-pattern-matching techniques, but that such techniques may negatively impact effective network speeds.
i. Software-Based Solutions
Pattern matching can be carried out in software-based solutions as well as in hardware-based solutions. Software-based solutions, perhaps implemented using general-purpose processors, regularly employ pattern-matching algorithms that are well known in the art, including Knuth-Morris-Pratt, Boyer-Moore, and Aho-Corasick. It has proven difficult, however, for software-based solutions to keep up with rapidly increasing line speeds; software-based solutions typically do not support network traffic at a rate greater than a few hundred megabits per second (Mbps). Because software-based solutions can support only modest throughput, hardware-based solutions are often chosen.
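By way of illustration only, the following is a minimal Python sketch of one of the well-known algorithms named above, Aho-Corasick, which scans a data stream once while checking for many signatures at the same time, including signatures that overlap and that start at arbitrary offsets. The function names build_automaton and find_all, and the sample signatures, are hypothetical and are not taken from this disclosure; the sketch favors clarity over throughput.

```python
from collections import deque

def build_automaton(signatures):
    """Build goto, failure, and output tables for a set of byte-string signatures."""
    goto = [{}]   # goto[state][byte] -> next state
    fail = [0]    # fail[state] -> state to fall back to on a mismatch
    out = [[]]    # out[state] -> signatures that end at this state
    # Phase 1: insert every signature into a trie.
    for sig in signatures:
        state = 0
        for b in sig:
            if b not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][b] = len(goto) - 1
            state = goto[state][b]
        out[state].append(sig)
    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())  # depth-1 states fall back to the root
    while queue:
        state = queue.popleft()
        for b, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and b not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(b, 0)
            # Inherit signatures that end at the fallback state (overlaps).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out

def find_all(data, automaton):
    """Return (start_offset, signature) for every exact occurrence in data."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, b in enumerate(data):
        while state and b not in goto[state]:
            state = fail[state]
        state = goto[state].get(b, 0)
        for sig in out[state]:
            hits.append((i - len(sig) + 1, sig))
    return hits

# Hypothetical signatures and payload, chosen only to show overlapping matches.
automaton = build_automaton([b"he", b"she", b"his", b"hers"])
hits = find_all(b"ushers", automaton)
```

In this sketch, each input byte is examined once regardless of how many signatures are loaded, which is the property that makes this family of algorithms attractive for screening traffic against large signature sets.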
ii. Hardware-Based Solutions
Hardware-based solutions may employ a variety of hardware types, including a variety of memory types, depending on the specific pattern-matching technique a given solution employs. For example, memory in reconfigurable devices, such as the Block random access memory (Block RAM) contained in field-programmable gate arrays (FPGAs), is commonly utilized. Use of FPGA Block RAM is advantageous because, among other things, it inherently possesses parallelism that may be exploited to achieve high wire speeds, it is typically physically located relatively close to the processor and is therefore associated with minimal access delays, and it is easily reconfigurable and therefore easily updated as new signature patterns become known. However, the relatively high cost of Block RAM usually limits the extent of its use in IPSs. Block RAM is therefore typically employed only in less resource-intensive approximate-pattern-matching techniques.
Other, more conventional types of memory are static random access memory (SRAM) and dynamic random access memory (DRAM). In typical SRAM, each bit is stored using six transistors: four that form two cross-coupled inverters, and two additional access transistors. In typical DRAM, each bit is stored using one transistor and one capacitor. Because capacitors inherently leak charge, DRAM must be refreshed regularly. Accordingly, SRAM is generally faster and less power-intensive than DRAM. On the other hand, DRAM is generally less expensive and less space-consuming than SRAM, because it is less structurally complex. In each of Block RAM, SRAM, and DRAM, data is stored, retrieved, or modified using a memory address at which the data is stored.
One example of an approximate-pattern-matching technique implemented in hardware is a Bloom filter, which may be utilized to determine whether a given data pattern definitely does not match a signature data pattern, and therefore need not be further examined for an exact match. Generally, a Bloom filter is a data structure that reflects a set of signatures compactly by computing the result of at least one, and possibly multiple, hash functions on each signature in the set of signature data patterns, and flagging each of these hash results (i.e., memory addresses) by setting a simple binary indicator at that address. In this way, a given memory device may be configured to reflect to some degree the signatures that are contained in the set of signature data patterns.
Once configured in this way, the Bloom filter may be queried to determine whether a given subject might be, or definitely is not, contained in the set of signature data patterns. Given the nature of such a filter, the answer to this query might be a false positive, but will never be a false negative. Thus, approximate-pattern-matching techniques, including those that utilize a Bloom filter, may quickly indicate that a given subject does not match any signature data pattern, and therefore does not need to be further examined for an exact match. Further analysis is then typically performed on those subjects that cannot be classified as definitely benign, in order to evaluate whether such subjects exactly match a signature data pattern.
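The configuration and query steps described above can be sketched in Python as follows. This is an illustrative software model only, not a hardware implementation and not the method of this disclosure. The class name BloomFilter, the parameters num_bits and num_hashes, the use of salted SHA-256 as the hash functions, and the sample signatures are all assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Software model of a Bloom filter over byte-string signatures."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per binary indicator, for readability rather than compactness.
        self.bits = bytearray(num_bits)

    def _addresses(self, pattern):
        # Derive num_hashes addresses by salting one hash with an index
        # (an assumption; any set of independent hash functions would do).
        for k in range(self.num_hashes):
            digest = hashlib.sha256(bytes([k]) + pattern).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, pattern):
        """Configuration step: flag every address a signature hashes to."""
        for addr in self._addresses(pattern):
            self.bits[addr] = 1

    def might_contain(self, pattern):
        """Query step: False means definitely absent; True means possibly present."""
        return all(self.bits[addr] for addr in self._addresses(pattern))

# Hypothetical signature set.
signatures = [b"evil-payload", b"worm-marker", b"trojan-seq"]
bloom = BloomFilter()
for sig in signatures:
    bloom.add(sig)

# Every stored signature is always reported as possibly present.
stored_results = [bloom.might_contain(sig) for sig in signatures]
```

A True result from might_contain may be a false positive, because unrelated patterns can hash to addresses that were already flagged, so such subjects would still need an exact comparison. A False result is conclusive, because at least one address that the subject hashes to was never flagged by any signature.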