Regular expressions provide a powerful method for finding a search string within a target string, file, or stream of text data, such as a web document. Regular expressions are particularly useful for searching for patterns in semi-structured text data. They are also useful for locating specific HTTP header values or specific web page content.
Regular expressions are very powerful but difficult to use. For example, an expression does not always do what the user intended: either it is too simple and produces unintended results, or it is so complex that it is impossible to determine whether the expression is correct. Another difficulty is that a user cannot easily determine what the regular expression actually did. This is due to how the regular expression search engine operates. The search engine performs a matching operation, comparing the expression against a target string buffer or file, and returns either a match or a no-match value (e.g., a Boolean “found” or “not found”). The search result does not indicate “what” was found, and the search engine does not retrieve the actual objects that matched. A further difficulty is that executing a regular expression is normally much more expensive and resource-intensive than other search mechanisms, such as “substring” searches.
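To illustrate the difference between a Boolean “found”/“not found” result and retrieving what actually matched, the following is a minimal sketch using Python's standard `re` module. The helper names `found` and `what_matched`, the sample header, and the pattern are hypothetical and chosen only for this example; Python's `re.search` itself returns a match object, so the Boolean-only interface is simulated here by discarding that object.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Boolean-only interface (hypothetical): reports whether the pattern occurs, nothing more."""
    return re.search(pattern, text) is not None


def what_matched(pattern: str, text: str) -> list[str]:
    """Retrieves every non-overlapping substring that the pattern matched."""
    return [m.group(0) for m in re.finditer(pattern, text)]


header = "Content-Type: text/html; charset=utf-8"
pattern = r"[a-z]+/[a-z]+"  # hypothetical pattern: a lowercase MIME type such as "text/html"

print(found(pattern, header))         # True
print(what_matched(pattern, header))  # ['text/html']
```

With only the first function, a user learns that something matched but not which text; the second makes the matched objects visible so the user can check whether the expression did what was intended.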
Previous approaches to managing regular expressions relied on the experience of the user. For example, to improve search performance, the user had to manually rewrite their regular expressions as more complex expressions. However, as expression complexity increased, the user's ability to determine whether the expression was working correctly decreased, which reintroduced the original dilemma. Moreover, as expression complexity increases, execution time and resource consumption typically increase as well.
Users who provide regular expressions often use the syntax incorrectly. The result is a mistake in what the user is asking for rather than a syntax error: the expression is valid but does not express the user's intent. The regular expression search engine that processes the expression cannot tell whether the mistake was intentional, so the search often performs more processing operations than necessary and/or produces incorrect results.
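A common instance of a syntactically valid but unintended expression is an unescaped “.” in a host name, since “.” matches any character rather than a literal dot. The following minimal sketch, using Python's standard `re` module, illustrates the incorrect-results case; the host name, the sample strings, and the helper `accepted` are hypothetical and used only for illustration.

```python
import re

# Intended: match only the literal host name "www.example.com".
careless = r"www.example.com"   # valid syntax, but each "." matches ANY character
careful = r"www\.example\.com"  # each "." escaped so it matches only a literal dot

samples = ["www.example.com", "wwwXexampleYcom", "www-example-com"]


def accepted(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]


print(accepted(careless, samples))  # ['www.example.com', 'wwwXexampleYcom', 'www-example-com']
print(accepted(careful, samples))   # ['www.example.com']

# re.escape can build the literal pattern automatically.
print(re.escape("www.example.com"))  # www\.example\.com
```

Neither expression raises an error, so the engine has no way to flag the first one; the mistake only shows up as extra, unintended matches.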