Regular expressions (also referred to as “regex” or “regexes”), which are essentially patterns for specifying and recognizing text strings, are a fundamental tool for processing semi-structured or unstructured data sets. Uses of regular expressions include flexible matching, pattern-based lookup, and pattern-based filtering. They are also used extensively in information extraction (IE). Their ease of use, together with extensive support across many, if not all, high-level programming and scripting languages, has led to very broad adoption.
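By way of illustration only, the following minimal sketch shows the three uses named above (matching, filtering, and extraction) with Python's standard `re` module. The log lines, the pattern names, and the helper functions `filter_lines` and `extract_disk_usage` are hypothetical and were invented for this example; they are not drawn from any particular system.

```python
import re

# Hypothetical machine log lines, invented for illustration only.
LOG_LINES = [
    "2024-01-05 12:00:01 ERROR disk /dev/sda1 usage 91%",
    "2024-01-05 12:00:02 INFO user alice logged in",
    "2024-01-05 12:00:03 ERROR disk /dev/sdb1 usage 97%",
]

# Matching / lookup: recognize an ISO-style date anywhere in a line.
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

# Filtering: select lines that carry an ERROR or WARN severity token.
SEVERITY = re.compile(r"\b(?:ERROR|WARN)\b")

# Extraction: pull structured fields out of free text via named groups.
DISK_USAGE = re.compile(r"disk (?P<device>\S+) usage (?P<percent>\d+)%")


def filter_lines(lines):
    """Return only the lines that match the severity pattern."""
    return [line for line in lines if SEVERITY.search(line)]


def extract_disk_usage(lines):
    """Return (device, percent) tuples for lines reporting disk usage."""
    records = []
    for line in lines:
        match = DISK_USAGE.search(line)
        if match:
            records.append((match.group("device"), int(match.group("percent"))))
    return records


if __name__ == "__main__":
    print(DATE.search(LOG_LINES[0]).group())
    print(filter_lines(LOG_LINES))
    print(extract_disk_usage(LOG_LINES))
```

Each pattern is compiled once and reused across lines, which is the usual practice when the same expression is applied to many inputs.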
More recently, a new class of enterprise applications has emerged that involves analytics over massive volumes of unstructured and semi-structured data. Examples include machine log analytics, social media analytics, and customer voice analytics, among a great variety of others. The analytic workflows that power these applications make extensive use of regular expressions. However, evaluating regular expressions can be very expensive and resource-intensive. Regex computations often dominate the overall cost of many IE tasks, and such costs have continued to increase considerably as the volumes of data being processed grow exponentially.