Recent years have seen exponential growth in the quantity of data generated and available. At the same time, there has been an explosion in connectivity and in the exchange of data. The importance of networks, including intranets, local area networks (LANs), wide area networks (WANs) and the Internet, has increased dramatically. Rapid exchange of electronic communications and data permeates modern workplaces. Unfortunately, this connectivity has also been exploited by spammers, hackers and others for unauthorized purposes.
Various systems for the detection and/or prevention of unauthorized data and software, also referred to as malware, are currently utilized with networks. For example, an intrusion detection system (“IDS”) searches data transmissions looking for strings of data that are indicative of malware. Processing data in transit, such as computer messages traversing a network, typically involves comparing message data to a set of rules that characterize instances of malware. Message data matching one or more of the rules is identified as malware. The rules are constantly updated and new rules added as new forms of malware are created and identified. When an instance of malware is identified in a network, the network may take steps to alert users, act to isolate the malware and/or prevent the malware from reaching its destination.
Data search systems frequently utilize tree search methodologies to process data against a rule set. Tree search methodologies use a tree data structure and sequentially compare the data set being evaluated to each rule of the rule set. If the data set fails to match the current rule being processed, the data set is compared to the next rule in the rule set, until either a match is identified, or the data has been processed against all the rules and it is determined that there is no match. In an IDS, if the data set does not match any of the rules in the rule set, the data set is not an instance of malware currently described in the rule set. When a tree search method is used, data communications are intercepted and held, such that an entire communication or data set is available for sequential processing against the rule set. Each time the data is compared to a rule, the entire data set must be available for comparison. Holding the data for comparison introduces latency in the transit of the data. As a result, searching for malware in network data transmissions utilizing tree search methods introduces latency across the network. In addition, sequential comparisons used in tree search methods are generally slow, which makes these methods unsuitable for many high speed networks, such as those operating at Gigabit speeds.
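The sequential comparison described above can be illustrated with a minimal sketch. It is not taken from any particular IDS: the rule set `RULES`, the function name `sequential_match` and the representation of each rule as a plain byte string are assumptions made for illustration, and the tree data structure itself is omitted so that only the rule-by-rule comparison is shown.

```python
from typing import Optional, Sequence

# Hypothetical rule set: each rule is a byte string treated as indicative of malware.
RULES: Sequence[bytes] = (b"evil_payload", b"DROP TABLE", b"\x90\x90\x90\x90")


def sequential_match(message: bytes, rules: Sequence[bytes] = RULES) -> Optional[int]:
    """Compare the held message to each rule in turn.

    Returns the index of the first matching rule, or None when the
    message has been processed against every rule without a match.
    """
    for index, pattern in enumerate(rules):
        # The entire message must be available for every comparison.
        if pattern in message:
            return index
    return None
```

Because the complete message is needed for each comparison, the message cannot be forwarded until the loop finishes, and in the worst case the number of comparisons grows linearly with the number of rules.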
In other data search systems, parallel processing is implemented in place of tree search methodologies, which results in increased search speed. Parallel processors are typically implemented in hardware. For example, parallel processing can be implemented using multiple processing cores or Field Programmable Gate Arrays (FPGAs). In a search system, processing resources are allocated to individual rules of the rule set. Accordingly, instead of sequentially processing the data set against each rule in the rule set, the data set may be processed against multiple rules in parallel utilizing the separate processors. While parallel processing eliminates at least some of the latency introduced by tree search methods, hardware requirements may limit the utility of this solution. For true parallel processing, a separate processor is required for each rule in the rule set. Accordingly, the addition of a rule to the rule set would require the addition of hardware. This is not practical in an IDS, which requires updates to the rule set for each new instance of malware. Moreover, in conventional parallel processing systems, the availability of processing resources limits the rule set to only hundreds or perhaps thousands of rules. Consequently, these types of parallel processing units are unable to handle rule sets of tens of thousands of rules, as are required in many applications.
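The allocation of processing resources to individual rules can likewise be sketched in software. The example below is a rough analogy only: worker threads stand in for the separate hardware processors, the names `matches_rule` and `parallel_match` are hypothetical, and in standard CPython the threads do not execute the comparisons truly simultaneously. The sketch is intended to show the structure of the approach, not its performance.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def matches_rule(message: bytes, pattern: bytes) -> bool:
    """Work assigned to one processing resource: test a single rule."""
    return pattern in message


def parallel_match(message: bytes, rules: Sequence[bytes], workers: int = 4) -> List[int]:
    """Submit every rule to a pool of workers and collect the matching rule indices.

    `workers` models the number of available processors (an assumption).
    When it is smaller than the number of rules, rules queue for a free
    worker, which mirrors the resource limit described in the text.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `rules` in its results.
        hits = list(pool.map(lambda pattern: matches_rule(message, pattern), rules))
    return [index for index, hit in enumerate(hits) if hit]
```

In this model, true parallelism would require `workers` to equal the number of rules, so each added rule calls for an added worker. That is the scaling limitation noted above for hardware implementations.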
Accordingly, there is a need for a system and method that provides for a large number of rules without excessive hardware requirements or the introduction of large latency.