Regular expressions, or more generally patterns, describe sets of character strings. A pattern determines which character strings belong to the set. Accordingly, patterns can be employed to identify character strings, for example, to select specific strings from a larger set of character strings. Furthermore, regular expressions are often defined as a context-independent syntax that can represent a wide variety of character sets and character set orderings.
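As a minimal sketch of selecting specific strings from a set, the following uses Python's standard `re` module. The helper name `select_matching`, the sample pattern, and the word list are assumptions chosen only for illustration.

```python
import re

def select_matching(pattern, candidates):
    """Return the candidates that belong to the set described by pattern."""
    compiled = re.compile(pattern)
    # fullmatch requires the entire string to be a member of the pattern's set
    return [s for s in candidates if compiled.fullmatch(s)]

# Hypothetical input: "c.t" describes all three-character strings
# that start with "c" and end with "t".
words = ["cat", "cot", "coat", "cut", "ct"]
selected = select_matching(r"c.t", words)
print(selected)
```

Here the pattern acts as a membership test: "coat" and "ct" are rejected because they are not three characters long.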
In operation, regular expressions can be employed to search and match data based upon a predefined pattern or set of patterns. As such, patterns employ a specific syntax by which particular characters or strings are selected from a body of text. More specifically, the expressions can consist of constants and operators that denote sets of strings and operations over these sets, respectively. Using the specific syntax of a regular expression or other pattern language, advanced text pattern matching can be performed. The following table lists exemplary regular expression operators and their definitions. The syntax illustrated in the table is frequently employed to establish both simple and complex string pattern identifications.
Menu Item                 Character   Definition
Any Character             .           Matches any single character.
Character in Range        [ ]         Matches any single character from within the bracketed list. Within square brackets, most characters are interpreted literally.
Character Not in Range    [^ ]        Specifies a set of characters not to be matched.
Beginning of Line         ^           Matches the beginning of a line.
End of Line               $           Matches the end of a line.
Or                        |           Matches either the regular expression preceding it or the regular expression following it.
Group                     ( )         Groups one or more regular expressions to establish a logical regular expression consisting of sub-regular expressions. Used to override the standard precedence of certain operators.
0 or 1 Matches            ?           Specifies that the preceding regular expression is matched 0 or 1 time.
0 or More Matches         *           Specifies that the preceding regular expression is matched 0 or more times.
1 or More Matches         +           Specifies that the preceding regular expression is matched 1 or more times.
Exactly n Matches         {n}         Specifies that the preceding regular expression is matched exactly n times.
At Least n Matches        {n,}        Specifies that the preceding regular expression is matched n or more times.
At Most n Matches         {,n}        Specifies that the preceding regular expression is matched n or fewer times.
n to m Matches            {n,m}       Specifies that the preceding regular expression is matched a minimum of n times and a maximum of m times. If not specified, n defaults to 0. If m is not specified, the default depends on whether the comma is present. If no comma is present, m defaults to n. If a comma is present, m defaults to a very large number.
New Line Character        \n          Matches a new line.
Tab Character             \t          Matches a tab character.
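The operators in the table can be exercised with a short sketch, here written against Python's standard `re` module. The patterns and sample strings are assumptions chosen for illustration, and other regular expression dialects may differ in detail (for example, not every dialect accepts the `{,n}` form).

```python
import re

# Each entry: (operator name, pattern, sample text, whether the whole text matches)
examples = [
    ("Any character",      r"a.c",     "abc",    True),
    ("Character in range", r"[abc]x",  "bx",     True),
    ("Not in range",       r"[^abc]x", "bx",     False),
    ("Or",                 r"cat|dog", "dog",    True),
    ("Group",              r"(ab)+",   "ababab", True),
    ("0 or 1",             r"colou?r", "color",  True),
    ("0 or more",          r"ab*",     "a",      True),
    ("1 or more",          r"ab+",     "a",      False),
    ("Exactly n",          r"a{3}",    "aaaa",   False),
    ("At least n",         r"a{2,}",   "aaaa",   True),
    ("At most n",          r"a{,2}",   "aaa",    False),
    ("n to m",             r"a{2,3}",  "aaa",    True),
]

# Map each operator name to whether its pattern matches the entire sample text.
results = {
    name: re.fullmatch(pattern, text) is not None
    for name, pattern, text, _ in examples
}

# Line anchors: with re.MULTILINE, ^ and $ match at every line boundary.
lines = "alpha\nbeta\nalphabet"
starts = re.findall(r"^alpha", lines, flags=re.MULTILINE)
ends = re.findall(r"a$", lines, flags=re.MULTILINE)

# Tab character: split one tab-separated line into its fields.
fields = re.split(r"\t", "id\tname")

for name, _, _, expected in examples:
    print(f"{name:20} matched={results[name]} expected={expected}")
```

The group example shows how parentheses override precedence: without them, `ab+` would repeat only the `b`.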
Regular expressions are a useful tool in the data flow field, which pertains to the movement and transformation of data to and amongst storage media. At present, structured information is stored in data files of varied formats. The structure of the information depends on the format and therefore varies from format to format. The structure is known to the author of the format and is typically documented so that data in that format can be consumed by others. The state of the art in regular expressions allows one to define a regular expression for each such format that will match data units in files of the corresponding format. This permits one to ensure that a data unit conforms to a given format and to find the beginning and end of data units of a given format.
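Both uses described above, checking that a data unit conforms to a format and locating where data units begin and end, can be sketched as follows in Python. The record format, the `RECORD` pattern, and the function names `conforms` and `find_units` are hypothetical and not taken from any particular file format.

```python
import re

# Hypothetical record format (an assumption for illustration):
# a four-digit id, a comma, an uppercase name, a comma,
# and an amount with exactly two decimal places.
RECORD = re.compile(r"[0-9]{4},[A-Z]+,[0-9]+\.[0-9]{2}")

def conforms(unit):
    """Return True if the whole data unit conforms to the record format."""
    return RECORD.fullmatch(unit) is not None

def find_units(data):
    """Return the (start, end) offsets of every record found in data."""
    return [match.span() for match in RECORD.finditer(data)]

# Sample data with two records embedded in unrelated text.
data = "hdr 0001,ALICE,12.50 junk 0042,BOB,7.00\n"
spans = find_units(data)
units = [data[start:end] for start, end in spans]

print(units)
print(conforms("0001,ALICE,12.50"), conforms("1,alice,12.5"))
```

`fullmatch` is used for validation so that trailing or leading extra characters cause a unit to be rejected, while `finditer` reports offsets so the caller knows where each unit begins and ends within the surrounding data.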