A regular expression is often defined as a pattern matching language which can be employed to identify character strings, for example, to select specific strings from a set of character strings. More particularly, regular expressions are often defined as a context-independent syntax that can represent a wide variety of character sets and character set orderings.
In operation, regular expressions can be employed to search data based upon a predefined pattern or set of patterns. As such, this pattern-matching language employs a specific syntax by which particular characters or strings are selected from a body of text. Although simple examples of regular expressions can be easily understood, the syntax of regular expressions is often so complex that even the most experienced programmers have difficulty understanding them.
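As a minimal sketch of selecting strings from a body of text, the following uses Python's standard `re` module. The sample text and the order-code format (one uppercase letter, a hyphen, four digits) are assumptions made up purely for illustration.

```python
import re

# Hypothetical body of text; the contents are invented for this example.
text = "Order A-1001 shipped. Order B-22 is pending. Order C-3003 was returned."

# One uppercase letter, a hyphen, then exactly four digits.
pattern = re.compile(r"[A-Z]-[0-9]{4}")

# findall returns every non-overlapping substring that matches the pattern.
matches = pattern.findall(text)
print(matches)
```

Here `B-22` is not selected because it has only two digits, which shows how the pattern, rather than a fixed string, decides what is extracted.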
A recurring issue posed by the complex syntax of regular expressions is that many users lack the knowledge necessary to design and/or verify an expression. Although a specific syntax can be provided by which regular expressions are constructed, the complexity of the syntax is further demonstrated by the fact that most sets of data can be described using multiple different syntactical expressions. It will further be understood that the specific syntax for a regular expression can vary among tools and application areas. This variation further complicates understanding the intricacies of the regular expression mechanisms.
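The point that one set of strings can be described by several different expressions can be illustrated with a short sketch. The three patterns and the candidate strings below are hypothetical, and the syntax is that of Python's `re` module, which is only one of the dialects alluded to above.

```python
import re

# Three differently written patterns intended to describe the same
# two-element set of strings: {"gray", "grey"}.
patterns = [
    r"^gr(a|e)y$",     # grouping with alternation
    r"^gr[ae]y$",      # character range
    r"^(gray|grey)$",  # alternation of whole words
]

# Hypothetical candidate strings to test against each pattern.
candidates = ["gray", "grey", "gry", "graey", "greys"]

# For each pattern, collect the candidates it accepts.
results = [[s for s in candidates if re.search(p, s)] for p in patterns]

for p, accepted in zip(patterns, results):
    print(p, accepted)
```

All three patterns accept the same candidates, so a reader verifying an expression cannot rely on a single canonical spelling.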
Although sometimes very difficult to understand, regular expressions are a very powerful and useful tool in the field of data manipulation and extraction. The expressions can consist of constants and operators that denote sets of strings and operations over these sets, respectively. In operation, a user or programmer can perform advanced text pattern matching using the specific syntax of a regular expression. In most cases, regular expressions can provide more flexibility than simple wildcards in defining rules or views. The following table lists exemplary regular expression operators and their definitions. The syntax illustrated in the table is frequently employed to establish complex string pattern identifications.
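To make the comparison with simple wildcards concrete, the sketch below contrasts a `*` wildcard with a regular expression. The file names are hypothetical, and Python's `fnmatch` module stands in for a generic wildcard facility.

```python
import fnmatch
import re

# Hypothetical file names used only for illustration.
names = ["report2023.txt", "report_final.txt", "report20.txt", "summary2023.txt"]

# The * wildcard accepts any run of characters between "report" and ".txt".
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "report*.txt")]

# The regular expression requires one or more digits in that position.
regex = re.compile(r"^report[0-9]+\.txt$")
regex_hits = [n for n in names if regex.search(n)]

print(wildcard_hits)
print(regex_hits)
```

The wildcard also accepts `report_final.txt`, while the regular expression narrows the rule to names whose middle part consists only of digits.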
Menu Item                Character   Definition
-----------------------  ----------  ------------------------------------------------------------
Any Character            .           Matches any single character.
Character in Range       [ ]         Matches any single character from within the bracketed list.
                                     Within square brackets, most characters are interpreted
                                     literally.
Character Not in Range   [^ ]        Specifies a set of characters not to be matched.
Beginning of Line        ^           Matches the beginning of a line.
End of Line              $           Matches the end of a line.
Or                       |           Matches either the regular expression preceding it or the
                                     regular expression following it.
Group                    ( )         Groups one or more regular expressions to establish a logical
                                     regular expression consisting of sub-regular expressions.
                                     Used to override the standard precedence of certain operators.
0 or 1 Matches           ?           Specifies that the preceding regular expression is matched
                                     0 or 1 time.
0 or More Matches        *           Specifies that the preceding regular expression is matched
                                     0 or more times.
1 or More Matches        +           Specifies that the preceding regular expression is matched
                                     1 or more times.
Exactly n Matches        {n}         Specifies that the preceding regular expression is matched
                                     exactly n times.
At Least n Matches       {n,}        Specifies that the preceding regular expression is matched
                                     n or more times.
At Most n Matches        {,n}        Specifies that the preceding regular expression is matched
                                     n or fewer times.
n to m Matches           {n,m}       Specifies that the preceding regular expression is matched
                                     a minimum of n times and a maximum of m times. If not
                                     specified, n defaults to 0. If m is not specified, the
                                     default depends on whether the comma is present. If no comma
                                     is present, m defaults to n. If a comma is present, m
                                     defaults to a very large number.
New Line Character       \n          Matches a new line.
Tab Character            \t          Matches a tab character.
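The operators listed in the table can be exercised with a small check harness. This is a sketch under the assumption that Python's `re` module is the engine; the test strings are hypothetical, and other tools may not accept every form (for example `{,n}`) in the same way.

```python
import re

# Each case is (pattern, test string, whether a match is expected).
cases = [
    (r"b.t", "bat", True),            # any character
    (r"b.t", "bt", False),
    (r"b[aeiou]t", "bit", True),      # character in range
    (r"b[^aeiou]t", "bat", False),    # character not in range
    (r"^cat", "catalog", True),       # beginning of line
    (r"^cat", "bobcat", False),
    (r"cat$", "bobcat", True),        # end of line
    (r"cat|dog", "hotdog", True),     # or
    (r"^(ab)+$", "ababab", True),     # group
    (r"^(ab)+$", "aba", False),
    (r"^colou?r$", "color", True),    # 0 or 1 matches
    (r"^colou?r$", "colouur", False),
    (r"^ab*c$", "ac", True),          # 0 or more matches
    (r"^ab+c$", "ac", False),         # 1 or more matches
    (r"^a{3}$", "aaa", True),         # exactly n matches
    (r"^a{3}$", "aa", False),
    (r"^a{2,}$", "aaaa", True),       # at least n matches
    (r"^a{2,}$", "a", False),
    (r"^a{,2}$", "aa", True),         # at most n matches
    (r"^a{,2}$", "aaa", False),
    (r"^a{2,3}$", "aaa", True),       # n to m matches
    (r"^a{2,3}$", "aaaa", False),
    (r"a\nb", "a\nb", True),          # new line character
    (r"a\tb", "a\tb", True),          # tab character
    (r"a\tb", "a b", False),
]

# Collect every case whose actual outcome differs from the expectation.
failures = [
    (pattern, text)
    for pattern, text, expected in cases
    if bool(re.search(pattern, text)) != expected
]

print("failures:", failures)
```

Keeping the cases as data makes it straightforward to add a row for any operator whose behavior is in doubt in a particular tool.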
Because of the complex nature of the syntax involved in defining regular expressions, a reference sheet is most often required to assist in accurately formulating (and/or interpreting) a regular expression. As the table above illustrates, the syntax is complex enough that even the most skilled programmer often has difficulty designing a regular expression that coincides with a desired string pattern.