In computer science, text stream filters are used in a variety of applications, such as parser filters and pattern matchers. Such a filter may modify a text stream being input to a device so that the stream has desired properties. For example, a text stream filter may be used to filter the input to a parser, thus serving as a parser filter.
A parser is a program for extracting syntactic, symbolic and/or semantic information from source code. A typical parser includes a front end, usually a scanner, for tokenizing an input source code stream. The scanner compares the input source code stream to a set of predefined patterns. When a pattern is matched, an action which has been defined for the pattern is executed. For example, the matched source code may be sent as a token to a syntax analysis subsystem. Scanners may be stand-alone programs or included in other applications, such as compiler front ends, text editors for syntax highlighting, and text filters.
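The pattern-matching loop of a scanner described above can be sketched as follows. This is a minimal illustration only, not the scanner of any particular parser: the pattern table `PATTERNS`, the token names, and the function `scan` are hypothetical, and the "action" is reduced to emitting a token to the caller.

```python
import re

# Hypothetical pattern table: (token name, regular expression).
PATTERNS = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# Combine all patterns into one alternation of named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in PATTERNS))


def scan(source):
    """Yield (token name, matched text) pairs for the input stream."""
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise ValueError(f"no pattern matches at offset {pos}")
        # The "action" for a matched pattern: emit a token, or skip whitespace.
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()
        pos = m.end()


tokens = list(scan("x = 42 + y"))
```

In this sketch the tokens would be handed to a syntax analysis stage; here they are simply collected in a list.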
A parser filter may be used upstream of a parser to modify the parser's input so that the parser can still process source code containing nonstandard constructs. Such modification may be useful, for example, for parsing a dialect of a language. A parser filter might, for instance, implement “ignore” rules that replace with blanks certain nonstandard text strings that the parser might not understand.
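An “ignore” rule of the kind described above might look like the following sketch. The rule set `IGNORE_RULES`, the nonstandard keyword `__far`, and the function `apply_ignore_rules` are assumptions chosen purely for illustration; they are not taken from any specific parser filter.

```python
import re

# Hypothetical ignore rule: a dialect-specific keyword that a
# standard parser would not understand.
IGNORE_RULES = [re.compile(r"\b__far\b")]


def apply_ignore_rules(text):
    """Replace each match with the same number of blanks."""
    for rule in IGNORE_RULES:
        text = rule.sub(lambda m: " " * len(m.group()), text)
    return text


original = "char __far *p;"
filtered = apply_ignore_rules(original)
```

Because every matched character is replaced by exactly one blank, the filtered text has the same length as the original, and the offsets of all remaining text are unchanged.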
Prior text filters have a number of limitations. First, they can replace input text only with blanks, or whitespace (“ignoring”), but not with other text (“replacing”). Prior text filters do not provide positioning information indicating, for example, the length of an original text block before a replacing action. Without positioning information, input text can be replaced only with a replacement string of the same length as the text being replaced; when only ignore rules are used, matched text is replaced with an equal number of whitespace characters. Replacements of arbitrary length require positioning information. As a consequence, ignore rules must often be defined to ignore certain source code constructs completely, so that significant amounts of source code may never be read by the parser, leading to missing symbols.
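To illustrate why arbitrary-length replacement depends on positioning information, the following sketch records, for each replacement, where it starts in the filtered text, the length of the original block, and the length of the replacement. With those records, an offset in the filtered text can be mapped back to the original text. The functions `replace_with_positions` and `to_original_offset` and the record layout are hypothetical and shown only as one possible approach.

```python
import re


def replace_with_positions(text, pattern, replacement):
    """Replace matches with a literal string and record, per replacement,
    (start in filtered text, original length, replacement length)."""
    out, records = [], []
    last = 0      # end of the previous match in the original text
    out_len = 0   # length of the filtered text built so far
    for m in re.finditer(pattern, text):
        out.append(text[last:m.start()])
        out_len += m.start() - last
        records.append((out_len, len(m.group()), len(replacement)))
        out.append(replacement)
        out_len += len(replacement)
        last = m.end()
    out.append(text[last:])
    return "".join(out), records


def to_original_offset(filtered_offset, records):
    """Map an offset in the filtered text back to the original text."""
    shift = 0
    for start, orig_len, new_len in records:
        if filtered_offset >= start + new_len:
            # Past this replacement: account for the length difference.
            shift += orig_len - new_len
        elif filtered_offset >= start:
            # Inside a replacement: map to the start of the original block.
            return start + shift
        else:
            break
    return filtered_offset + shift


original = "BEGIN a BEGIN b"
filtered, records = replace_with_positions(original, r"BEGIN", "{")
b_in_original = to_original_offset(filtered.index("b"), records)
```

Without the recorded lengths, a symbol position reported against the filtered text could not be related to the original source once a replacement changes the text length.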
Second, prior text filters do not support context-dependent patterns, i.e., they provide no mechanism for controlling the situations in which an ignore or replace rule is active. Additionally, prior text filters do not support full regular expressions as patterns. Furthermore, prior text filters used as parser filters are tightly interwoven with a specific parser's code, so that every parser requires its own tailored parser filter.
Thus, prior text filters are relatively inflexible.