Parsers are used to transform source code written in a programming language into parse trees or abstract syntax trees. They are also used to convert structured data into data trees. Typically, the input to the process is a text file containing the information to be parsed. A lexical scanner or similar program converts the text into a sequence of tokens, and the parser then transforms that sequence of tokens into a syntax tree. When a programming language is being parsed, the parse tree produced by the parser may then be transformed by a semantic analyzer, and is finally used to generate machine code or an intermediate file to be used by an interpreter. FIG. 1 shows the position of a parser in a typical compiler. When structured data is being parsed, the resulting data tree can be used by any program that knows how to manipulate the data it contains.
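The scanning step described above can be illustrated with a minimal Python sketch. The token categories (`NUMBER`, `IDENT`, `OP`), the `Token` type, and the function name `tokenize` are hypothetical and chosen only for this example; a real lexical scanner would define categories that match the language being parsed.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str  # token category, e.g. "NUMBER" or "IDENT"
    text: str  # the exact characters matched in the input


# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),  # whitespace is matched but not emitted
]
# One combined pattern; the name of the group that matched gives the token kind.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list[Token]:
    """Convert input text into a flat list of tokens for the parser."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append(Token(match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

For example, `tokenize("x = 3 + 42")` yields the five tokens `x`, `=`, `3`, `+`, `42` with kinds `IDENT`, `OP`, `NUMBER`, `OP`, `NUMBER`; this flat list is what a parser would then turn into a tree.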
Currently, there are two classes of parsers: top-down and bottom-up. Top-down parsers use a grammar that is specified in BNF (Backus-Naur Form or Backus-Normal Form). Grammars described in BNF are complex and far from intuitive. Bottom-up parsers are driven by a variety of tables that are not easy to construct by hand. In fact, a program called YACC (Yet Another Compiler-Compiler) was developed to take a BNF language description and generate the source code for a bottom-up parser.
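As a rough illustration of how a top-down parser follows a BNF grammar, the sketch below implements a recursive-descent parser, one common kind of top-down parser, with one method per nonterminal. The three-rule grammar, the class name `Parser`, and the tuple-based tree are assumptions made for this example only. The rules are written right-recursively because a recursive-descent parser cannot directly use a left-recursive rule such as `<expr> ::= <expr> "+" <term>`; since `+` and `*` are associative, this does not change the value of any expression.

```python
import re

# Hypothetical BNF grammar used only for this illustration:
#   <expr>   ::= <term> "+" <expr> | <term>
#   <term>   ::= <factor> "*" <term> | <factor>
#   <factor> ::= NUMBER | "(" <expr> ")"


class Parser:
    """Top-down (recursive-descent) parser: one method per nonterminal."""

    def __init__(self, text: str):
        # Minimal tokenizer: integers, the symbols + * ( ), and any other
        # non-space character (which the parser will reject).
        self.tokens = re.findall(r"\d+|[+*()]|\S", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        # <expr> ::= <term> "+" <expr> | <term>
        left = self.term()
        if self.peek() == "+":
            self.pos += 1
            return ("+", left, self.expr())
        return left

    def term(self):
        # <term> ::= <factor> "*" <term> | <factor>
        left = self.factor()
        if self.peek() == "*":
            self.pos += 1
            return ("*", left, self.term())
        return left

    def factor(self):
        # <factor> ::= NUMBER | "(" <expr> ")"
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        if tok == "(":
            self.pos += 1
            inner = self.expr()  # recursion handles nested parentheses
            self.expect(")")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")
```

For example, `Parser("2 + 3 * (4 + 1)").parse()` returns `("+", 2, ("*", 3, ("+", 4, 1)))`; the nesting of the tuples mirrors the order in which the grammar rules were applied.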
Both classes of parsers are designed to work on input that conforms to the language specification. A single error in the input can cause a cascade of follow-on errors to be reported unless some form of error mitigation is used. In some cases, the language specification goes so far as to include rules that describe incorrect usage. These rules match common programming mistakes so that, when one is found, the parser can take the appropriate corrective action.
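One widely used form of error mitigation, panic-mode recovery, is sketched below. It is a different technique from the error rules mentioned above and is shown only to illustrate how cascading reports can be suppressed. The statement form `NAME = NUMBER ;` and the function `parse_statements` are hypothetical. On an error, the parser records a single message and discards tokens up to the next `;`, so one mistake does not produce a string of follow-on errors.

```python
import re


def parse_statements(text: str):
    """Parse hypothetical statements of the form  NAME = NUMBER ;
    using panic-mode recovery: after an error, skip to the next ';'."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", text)
    pos = 0
    assignments: dict[str, int] = {}
    errors: list[str] = []

    def expect(test, what):
        # Consume one token if it satisfies `test`, otherwise raise.
        nonlocal pos
        tok = tokens[pos] if pos < len(tokens) else None
        if tok is None or not test(tok):
            raise SyntaxError(f"token {pos}: expected {what}, found {tok!r}")
        pos += 1
        return tok

    while pos < len(tokens):
        try:
            name = expect(lambda t: t[0].isalpha() or t[0] == "_", "a name")
            expect(lambda t: t == "=", "'='")
            value = expect(str.isdigit, "a number")
            expect(lambda t: t == ";", "';'")
            assignments[name] = int(value)
        except SyntaxError as err:
            errors.append(str(err))  # report this error once
            # Panic mode: discard tokens up to the synchronizing ';' ...
            while pos < len(tokens) and tokens[pos] != ";":
                pos += 1
            pos += 1  # ... and skip the ';' itself so parsing resumes after it
    return assignments, errors
```

For the input `a = 1; b = = 2; c = 3;`, the sketch records `a` and `c`, reports exactly one error for the malformed second statement, and resumes cleanly at `c`.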
Programming languages are recursive in nature: an expression, for example, may contain parenthesized subexpressions that themselves contain further subexpressions. Top-down parsers are recursive by nature as well, which can make them difficult to debug. Bottom-up parsers, while not recursive, require complex tables in order to parse the recursive structures found in programming languages and structured data. Regular expressions, used in text-search and find-and-replace programs such as “grep”, cannot be used to parse programming languages because they cannot describe arbitrarily deep nesting, such as balanced parentheses, and therefore cannot handle the recursive nature of programming languages.
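The limitation of regular expressions can be seen in a short Python sketch. The pattern `DEPTH_TWO` and the function `balanced` are written for illustration only. The regular expression accepts parentheses nested at most two levels deep, and each additional level would require a longer pattern, whereas the recursive function handles any depth (up to Python's recursion limit) because it calls itself for each nested group. Python's standard `re` module has no recursion construct.

```python
import re

# A regular expression can only encode a fixed nesting depth.
# This illustrative pattern accepts parentheses nested at most two deep.
DEPTH_TWO = re.compile(r"\((?:[^()]|\([^()]*\))*\)")


def balanced(text: str) -> bool:
    """Recursive recognizer: a group is '(' followed by any mix of
    ordinary characters and nested groups, then ')'."""

    def group(pos: int) -> int:
        # Returns the index just past the matching ')', or -1 on failure.
        if pos >= len(text) or text[pos] != "(":
            return -1
        pos += 1
        while pos < len(text) and text[pos] != ")":
            if text[pos] == "(":
                pos = group(pos)  # recurse into the nested group
                if pos == -1:
                    return -1
            else:
                pos += 1
        return pos + 1 if pos < len(text) else -1

    return group(0) == len(text)
```

Both accept `(a(b)c)`, but only `balanced` accepts `(a(b(c))d)`, which is nested three levels deep.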