Field
Embodiments of the present invention generally relate to pattern matching of data by a context-aware accelerator. In particular, systems and methods are provided for context-based pattern identification and matching of data by a hardware acceleration device based on one or more constraints/conditions.
Description of the Related Art
Pattern matching, in general, relates to a method of identifying a sequence of tokens whose content or parameters conform to one or more predefined patterns/formats. In operation, regular expressions, field-based constraints, and string-based conditions, among other such criteria, can be employed to search and match tokens as a function of a predefined pattern or set of patterns, wherein patterns are typically expressed in a specific syntax by which particular characters, fields, or strings are selected from a body of text-, character-, or symbol-based data. Exemplary applications of pattern matching include identifying the location and length of a pattern within a token sequence in order to identify some predefined component of the matched pattern, to substitute the matching pattern with some other token sequence, or to take any other desired action on the tokens (or the data group they form part of) that match.
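By way of a non-limiting software illustration of the applications just described, the following minimal Python sketch locates a pattern in a token sequence, reports its location and length, extracts a predefined component of the match, and substitutes the matched sequence with another. The pattern, the sample string, and the function names find_matches and redact are hypothetical and chosen only for illustration; they are not taken from any embodiment.

```python
import re

# Hypothetical pattern: a "user=<name>" token; the named group marks the
# predefined component of the match that we want to extract.
PATTERN = re.compile(r"user=(?P<name>\w+)")


def find_matches(text):
    """Return (location, length, extracted component) for every match."""
    return [
        (m.start(), m.end() - m.start(), m.group("name"))
        for m in PATTERN.finditer(text)
    ]


def redact(text):
    """Substitute every matched token sequence with a fixed replacement."""
    return PATTERN.sub("user=<redacted>", text)


sample = "GET /login?user=alice&next=/home user=bob"
matches = find_matches(sample)
redacted = redact(sample)

for start, length, name in matches:
    print(f"match at offset {start}, length {length}, name={name}")
print(redacted)
```

A software regular-expression engine is used here only to make the terms concrete; it says nothing about how a hardware accelerator would realize the same operations.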
Large amounts of data are transmitted on a daily basis through computer networks, particularly via the Internet. It will be appreciated that the Internet is intended to provide efficient transport of data from a first location to one or more endpoints, and that little consideration was conventionally given to the security of nodes on the network, giving unauthorized users relatively easy access, via the Internet, to networks as well as to nodes on those networks. Measures such as Intrusion Prevention Systems (IPSs), Firewalls, Intrusion Detection Systems (IDSs), and Application Delivery Controllers (ADCs), among other access control mechanisms, were then implemented to analyze network packets based on one or more rules/conditions that define the identifiers in packets indicating whether the packets are desired or undesired, wherein packets that match the rules may be denied or rejected and packets that are valid and normal are transmitted to end devices. Typically, network packets are examined by parsing the packets to extract header and payload portions, and subsequently matching the packets (or parsed portions thereof) against one or more rules/conditions/constraints defined by the access control devices to identify whether the conditions are met, based on which the packets are accepted or rejected. Such rules/conditions/constraints can include multiple strings, character-based expressions, or regular expressions, which are matched individually or in combination against incoming and outgoing packets to detect undesired packets and handle them accordingly.
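The parse-then-match flow described above can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the method of any particular access control device: the four-byte header layout, the Rule class, the two example rules, and the helper names parse, verdict, and make_packet are all hypothetical.

```python
import re
import struct
from dataclasses import dataclass
from typing import Optional

# Hypothetical toy wire format (not a real protocol): 2-byte destination
# port and 2-byte payload length, both big-endian, followed by the payload.
HEADER = struct.Struct("!HH")


@dataclass
class Rule:
    name: str
    pattern: re.Pattern          # regular expression applied to the payload
    port: Optional[int] = None   # field-based constraint; None means any port


def make_packet(port, payload):
    """Build a packet in the toy format (helper for the example only)."""
    return HEADER.pack(port, len(payload)) + payload


def parse(packet):
    """Split a packet into its header field (port) and its payload."""
    port, length = HEADER.unpack_from(packet)
    return port, packet[HEADER.size:HEADER.size + length]


def verdict(packet, rules):
    """Reject on the first rule whose port and pattern both match."""
    port, payload = parse(packet)
    for rule in rules:
        if rule.port is not None and rule.port != port:
            continue  # header constraint not met, so skip the payload scan
        if rule.pattern.search(payload):
            return ("reject", rule.name)
    return ("accept", None)


# Hypothetical rule set combining a header constraint with payload patterns.
RULES = [
    Rule("path-traversal", re.compile(rb"\.\./"), port=80),
    Rule("sql-keyword", re.compile(rb"(?i)union\s+select")),
]

print(verdict(make_packet(80, b"GET /../etc/passwd"), RULES))
print(verdict(make_packet(8080, b"GET /../etc/passwd"), RULES))
```

In this sketch the same payload is rejected on port 80 but accepted on port 8080, because the first rule applies only when its header constraint is satisfied. That is a very small instance of a match outcome depending on context rather than on the payload bytes alone.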
Due to the rapid increase in network bandwidth and in the sophistication of cyber attacks, the above-mentioned access control applications call for a high-performance, context-aware pattern matching and text parsing system. Beyond the networking area, data analysis also needs such a system because of the massive amount of unstructured data generated in real time.
Various hardware accelerators have been developed to perform string matching and regular expression pattern matching. However, due to the multitude of increasingly complicated rules and policies being developed for access control devices, these existing hardware accelerators are limited either in the types of rule syntax they support or in the memory footprint and performance of the compiled rule database. More importantly, given the strong context-awareness requirements of applications, integrating these context-unaware hardware accelerators has a notable negative impact on overall accuracy and system performance.
Therefore, there is a need for an accurate and precise context-aware pattern matching and text parsing system and method that can minimize the system's vulnerability to performance degradation. There is also a need for systems and methods that can identify, detect, analyze, and understand massive volumes of incoming unstructured packets at high speed and parse such packets for efficient pattern matching by a hardware acceleration device.