Programming languages are traditionally processed by first reading the program text and then dividing it into contiguous sequences of input characters commonly referred to as tokens. Some of the tokens may be ignored in further processing (for example, comments and blank lines). The program that divides program text into tokens is called a lexical analyzer, or lexer.
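The token-dividing step described above can be sketched in a few lines of Python. The token names and patterns below (TOKEN_SPEC, lex, and so on) are illustrative inventions, not drawn from any particular language; this is a minimal sketch of the technique, not a production lexer.

```python
import re

# Illustrative token classes: each pairs a name with a regular expression.
TOKEN_SPEC = [
    ("NUMBER",  r"\d+"),
    ("IDENT",   r"[A-Za-z_]\w*"),
    ("OP",      r"[+\-*/=]"),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("COMMENT", r"#[^\n]*"),   # ignored in further processing
    ("SKIP",    r"[ \t\n]+"),  # blank space, also ignored
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def lex(text):
    """Divide program text into a list of (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            # No token class matches: a lexical error.
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup not in ("COMMENT", "SKIP"):
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `lex("x = 1 + 2  # note")` yields the five tokens for `x`, `=`, `1`, `+`, and `2`, with the trailing comment and the blank space discarded.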
The sequence of tokens is then processed into a parse tree, a data structure that organizes the input according to the basic rules that describe the programming language (i.e., the grammar rules). Such rules may describe the correct construction of statements, expressions, functions, classes, or other concepts that make up correct programs in the programming language. The program that organizes the input into a parse tree is referred to as a parser.
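As a concrete illustration of organizing tokens into a tree, here is a minimal hand-written parser for a toy grammar of additive expressions with parentheses. The grammar, the tuple-based tree shape, and the function names are illustrative assumptions, not a standard representation.

```python
# Toy grammar, assumed for illustration:
#   expr ::= term (("+" | "-") term)*
#   term ::= NUMBER | "(" expr ")"
def parse(tokens):
    """Organize a token list into a nested-tuple parse tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {peek()!r}")
        pos += 1

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = tokens[pos]
            expect(op)
            node = (op, node, term())  # left-associative tree
        return node

    def term():
        nonlocal pos
        tok = peek()
        if tok == "(":
            expect("(")
            node = expr()
            expect(")")
            return node
        if tok is not None and tok.isdigit():
            pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at token {peek()!r}")
    return tree
```

For instance, the token list `["1", "+", "(", "2", "-", "3", ")"]` parses to the tree `("+", 1, ("-", 2, 3))`, which mirrors the grammar rules applied.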
A number of systems exist for generating lexers and parsers from input files that describe the tokens and grammar rules for a programming language. Parsers are commonly generated from context-free grammars, often written in BNF (Backus-Naur Form) notation and referred to simply as grammars. Typically, a grammar is provided as input to a program called a parser generator, and this program writes another program, which is the desired parser. This parser generates parse trees that conform to the input grammar whenever the parser's input is correctly formed. Examples of such parser generators are Yacc and Bison.
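To make the grammar-input idea concrete, here is a small fragment in the rules notation accepted by Yacc and Bison, sketching grammar rules for additive expressions. The nonterminal names are illustrative, and semantic actions are omitted for brevity.

```yacc
%token NUMBER
%%
expr : expr '+' term
     | term
     ;
term : NUMBER
     | '(' expr ')'
     ;
```

Given a file containing rules like these, the parser generator emits a parser that builds trees conforming to `expr` and `term`.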
Program text can contain errors due to incorrect input provided by the programmer. One type of error is a lexical error, which occurs when the lexer cannot form a valid token from the input. Examples of lexical errors include a string that does not contain a closing quote mark, or an input character that is outside the character set of the language.
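The unterminated-string case can be sketched directly. The function below (an illustrative helper, not part of any standard lexer API) scans a string literal and raises a lexical error if the closing quote mark never appears before the end of the input.

```python
def lex_string(text, pos):
    """Scan a string literal whose opening quote is at `pos`.

    Returns (lexeme, next_pos), or raises a lexical error if the
    closing quote mark is missing before end of input.
    """
    assert text[pos] == '"'
    end = pos + 1
    while end < len(text):
        if text[end] == '"':
            return text[pos:end + 1], end + 1
        end += 1
    raise ValueError(f"lexical error: unterminated string starting at {pos}")
```

Scanning `'say "hi" now'` from the opening quote succeeds, while `'x = "oops'` raises the lexical error because no closing quote is found.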
Another type of error is a syntax error, where the program text is made up of correctly formed tokens, but the tokens do not conform to the grammar. Examples include expressions with unbalanced parentheses, or an IF or WHILE statement that is missing a closing delimiter. An important part of the lexing and parsing processes is to identify and report lexical and syntax errors, respectively.
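The unbalanced-parentheses example can be detected with a simple depth counter over the token stream. This is a minimal sketch of the check in isolation; in practice the detection falls out of the parser's grammar rules rather than a separate pass.

```python
def check_balanced(tokens):
    """Return a description of the first parenthesis syntax error, or None."""
    depth = 0
    for i, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                return f"syntax error: unmatched ')' at token {i}"
    if depth > 0:
        return f"syntax error: {depth} unclosed '('"
    return None
```

Note that every token here is correctly formed; the error lies only in how the tokens fit together, which is what distinguishes a syntax error from a lexical one.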
Even after a syntax error has been recognized, there are many reasons to continue processing the program. For example, further processing may uncover additional errors, or it may provide an opportunity to extract useful information from later functions or classes that are correctly formed. However, continued processing in the face of an error often results in cascading false errors, where subsequent tokens that are in fact correct are reported as erroneous.
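One common mitigation for such cascading errors is panic-mode recovery: after reporting an error, the parser discards tokens until a synchronizing token such as a statement terminator, then resumes. The sketch below assumes an invented toy statement form (IDENT '=' NUMBER, separated by ';') purely to illustrate the recovery idea; it is not a general recovery implementation.

```python
def parse_statements(tokens, sync=";"):
    """Parse ';'-separated toy statements of the form IDENT '=' NUMBER.

    On a malformed statement, skip to the next `sync` token so that one
    error yields one report instead of a cascade over later tokens.
    """
    good, errors = [], []
    i = 0
    while i < len(tokens):
        start = i
        # Panic mode: consume tokens up to the synchronizing token.
        while i < len(tokens) and tokens[i] != sync:
            i += 1
        stmt = tokens[start:i]
        i += 1  # step past the synchronizing token
        if len(stmt) == 3 and stmt[1] == "=" and stmt[2].isdigit():
            good.append((stmt[0], int(stmt[2])))
        else:
            errors.append(f"syntax error in statement starting at token {start}")
    return good, errors
```

Given the tokens for `x = 1 ; y + 2 ; z = 3 ;`, only the middle statement is reported as an error, and the correctly formed statements before and after it are still recovered.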