Compilers are used to convert one language (e.g., a programming language) into another language. The language being converted, called the source language, may be readily understood by one skilled in the computer programming art. A source program (written in the source language) may be translated into a target program (written in the target language) so that it may be executed on a computer.
Each programming language uses its own syntax and semantics. During the compiling process, the syntax and semantics of programs are verified. Syntax is the structure and specification of each language according to rules established for each language. These rules are referred to as the grammar. The semantics of each language is the meaning conveyed by and associated with the syntax of that language.
Compilers are typically constructed using two main components, namely, a lexical analyzer and a parser. The lexical analyzer reads the source statements and separates each word, symbol or number from the source statement into a "token". Each token is given a symbolic reference, often a number, and this symbolic reference is passed to the parser section of the compiler. The parser analyzes a stream of program expressions to determine whether or not the program expressions are syntactically correct. Once it is determined that a stream of expressions is syntactically correct, the stream of expressions can be compiled into executable modules.
In parsing a computer program input stream, the lexical analyzer uses a set of rules to group the predetermined characters into tokens. The lexical analyzer can recognize different types of tokens, such as identifiers, decimal constants, floating point constants, etc.
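The grouping of characters into tokens described above can be sketched as follows. This is a minimal illustration only: the token classes, their regular expressions, and the function name `tokenize` are assumptions made for the example and do not correspond to any particular compiler.

```python
import re

# Hypothetical token classes; names and patterns are illustrative assumptions.
# Order matters: FLOAT is tried before DECIMAL so "3.14" is not split.
TOKEN_SPEC = [
    ("FLOAT", r"\d+\.\d+"),
    ("DECIMAL", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Group the characters of `text` into (token_type, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

In this sketch the token type plays the role of the symbolic reference that would be handed to the parser; a production lexical analyzer would typically use numeric codes instead.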
The parser imposes a structure on the sequence of tokens using a set of rules appropriate for the language. Such rules are referred to as a context-free grammar and may be specified, for example, in Backus-Naur form.
Each grammar rule may be referred to as a "production". Tokens are detected and passed to the parser program. Each string in the input stream that is parsed as having correct syntax is accepted. For example, the string 5*2+3 is accepted while the string 9++8 is rejected because it is syntactically incorrect.
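The acceptance and rejection of the two example strings can be illustrated with a small recognizer. The grammar used here is an assumption, since the text does not specify one: an expression is a sum of terms, a term is a product of factors, and a factor is a number or a parenthesized expression, with no unary operators. The sketch uses recursive descent rather than a table-driven parser purely for brevity.

```python
import re


def accepts(text):
    """Return True if `text` is a syntactically correct arithmetic expression.

    Assumed grammar (illustrative only):
        expr   -> term  (("+" | "-") term)*
        term   -> factor (("*" | "/") factor)*
        factor -> NUMBER | "(" expr ")"
    """
    tokens = re.findall(r"\d+|[+\-*/()]", text)
    if "".join(tokens) != text.replace(" ", ""):
        return False  # the input contains characters outside the token set
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def factor():
        nonlocal pos
        tok = peek()
        if tok is not None and tok.isdigit():
            pos += 1
            return True
        if tok == "(":
            pos += 1
            if not expr() or peek() != ")":
                return False
            pos += 1
            return True
        return False

    def term():
        nonlocal pos
        if not factor():
            return False
        while peek() in ("*", "/"):
            pos += 1
            if not factor():
                return False
        return True

    def expr():
        nonlocal pos
        if not term():
            return False
        while peek() in ("+", "-"):
            pos += 1
            if not term():
                return False
        return True

    return expr() and pos == len(tokens)
```

Under this assumed grammar, `accepts("5*2+3")` returns `True`, while `accepts("9++8")` returns `False` because no production allows an operator to follow another operator directly.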
A left-to-right, rightmost-derivation (LR) parser accepts a subset of the context-free grammars. Each LR parser has an input, a push-down stack, an output, a driver program and a parsing table. The parsing table is created from the grammar of the language to be parsed. Thus, the parsing table is unique to each language and its grammar. The driver program reads tokens one at a time from the input stream. Based upon the information in the parsing table that corresponds to the token being analyzed, and based upon the current program state, the driver program shifts the input token onto the stack, reduces the stack by one of the productions, accepts the string of tokens, or rejects the string of tokens as being syntactically incorrect. Reduction is defined as the replacement of the right-hand side of a production with the left-hand side.
Each LR parser, for example, consists of a known modified finite automaton with an attached push-down stack. At each discrete instance during a parsing operation, parser control resides in one of the parser's machine states. The parser looks ahead in the input stream for a subsequent token.
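The shift, reduce, accept and reject actions of the driver program can be sketched as follows. The grammar, the state numbering and the hand-built SLR(1) table below are assumptions chosen for illustration (every number is abstracted to the single token `n`); they are not taken from any particular compiler.

```python
# Assumed grammar (illustrative only):
#   1: E -> E + T     2: E -> T     3: T -> T * n     4: T -> n
PRODUCTIONS = {1: ("E", 3), 2: ("E", 1), 3: ("T", 3), 4: ("T", 1)}  # lhs, rhs length

# Hand-built SLR(1) parsing table for the grammar above.
ACTION = {
    0: {"n": ("shift", 3)},
    1: {"+": ("shift", 4), "$": ("accept", None)},
    2: {"*": ("shift", 5), "+": ("reduce", 2), "$": ("reduce", 2)},
    3: {"+": ("reduce", 4), "*": ("reduce", 4), "$": ("reduce", 4)},
    4: {"n": ("shift", 3)},
    5: {"n": ("shift", 7)},
    6: {"*": ("shift", 5), "+": ("reduce", 1), "$": ("reduce", 1)},
    7: {"+": ("reduce", 3), "*": ("reduce", 3), "$": ("reduce", 3)},
}
GOTO = {0: {"E": 1, "T": 2}, 4: {"T": 6}}


def parse(tokens):
    """Run the driver loop; return (accepted, production numbers in reduction order)."""
    stack = [0]                      # push-down stack of parser states
    stream = list(tokens) + ["$"]    # "$" marks the end of the input
    pos = 0
    output = []
    while True:
        entry = ACTION[stack[-1]].get(stream[pos])
        if entry is None:            # no table entry: syntactically incorrect
            return False, output
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, rhs_len = PRODUCTIONS[arg]
            del stack[-rhs_len:]     # pop one state per right-hand-side symbol
            stack.append(GOTO[stack[-1]][lhs])
            output.append(arg)
        else:                        # accept
            return True, output
```

With this table, the token sequence for 5*2+3, `["n", "*", "n", "+", "n"]`, is accepted with reductions `[4, 3, 2, 4, 1]`, and the sequence for 9++8, `["n", "+", "+", "n"]`, is rejected when the second `+` finds no entry in state 4.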
Reductions, as mentioned above, consist of a production number P and a collection of terminal symbols R, taken as a pair, and are always considered first in each state of the parser. If look ahead symbol L is in set R for production P, then the reduction is performed. As output of the production, the number P is given to a semantic synthesizer. Then, as many states as there are symbols on the right hand side of production P are popped off the stack. The non-terminal on the left hand side of the production P is put in place for the next look ahead. The state exposed at the top of the push down stack takes control of the parser action.
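A single reduction step of the kind just described can be sketched as follows. The per-state reduction table, the production lengths and the function name `try_reduce` are hypothetical values introduced for the example; the `emit` callback stands in for the semantic synthesizer.

```python
# Hypothetical tables for illustration only.
# Each state maps to a list of (production number P, look-ahead set R) pairs.
REDUCTIONS = {7: [(3, {"+", "*", "$"})]}
RHS_LENGTH = {3: 3}   # assumed production 3: T -> T * n (three right-hand symbols)
LHS = {3: "T"}        # non-terminal on the left-hand side of production 3


def try_reduce(stack, lookahead, emit):
    """Attempt a reduction in the state on top of `stack`.

    Returns the left-hand-side non-terminal, which becomes the next
    look-ahead symbol, or None if no reduction applies.
    """
    for production, terminals in REDUCTIONS.get(stack[-1], []):
        if lookahead in terminals:
            emit(production)                      # hand P to the semantic synthesizer
            del stack[-RHS_LENGTH[production]:]   # pop one state per right-hand symbol
            return LHS[production]                # the exposed state now takes control
    return None
```

For example, calling `try_reduce` with the stack `[0, 2, 5, 7]` and look-ahead `"+"` emits production 3, leaves the stack as `[0]`, and returns `"T"`. The caller is assumed to retain the original look-ahead token so that it can be examined again after the non-terminal has been processed.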
In some situations, the input stream may include statements which are either similar in appearance or complex. An example of a complex statement (using, for example, AT&T DSP1616 programming language) is: EQU a0=a0+p p=x*y y=*r0++ x=*pt++
An example of a statement which is similar to the statement above is: EQU a0=a0-p p=x*y y=*r0++ x=*pt++
In such cases, a single parser may have difficulty with interpretation of the statement, especially if additional productions are used to parse individual subparts of that statement.
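To make concrete how closely the two example statements resemble each other, the following sketch splits both into tokens and reports the positions at which they differ. The tokenization rule (identifiers, the `++` operator, and single-character operators) is an assumption made for this illustration and is not a definition of the DSP1616 syntax.

```python
import re

# Assumed token pattern: identifiers, "++", or a single-character operator.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\+\+|[=+\-*]")


def split_tokens(statement):
    """Split a statement into tokens under the assumed pattern."""
    return TOKEN_PATTERN.findall(statement)


def differing_positions(first, second):
    """Return the token indices at which two equal-length statements differ."""
    a, b = split_tokens(first), split_tokens(second)
    if len(a) != len(b):
        raise ValueError("statements have different token counts")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


STATEMENT_ADD = "EQU a0=a0+p p=x*y y=*r0++ x=*pt++"
STATEMENT_SUB = "EQU a0=a0-p p=x*y y=*r0++ x=*pt++"
```

Under this assumed tokenization, each statement yields 21 tokens and `differing_positions(STATEMENT_ADD, STATEMENT_SUB)` returns `[4]`: the two statements differ only in the operator at token index 4, so a parser must carry the rest of the statement through identical states before or after resolving that single difference.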
Many programs that attempt to interpret natural languages (e.g., English) use a technique called speculative processing. When speculative processing is used in this way, the parser builds data structures such as "parse trees" (or "abstract syntax trees") in memory. These are then speculatively reduced in an attempt to match known patterns and determine their form and meaning. This technique is more clearly described in Aho, A., et al., Compilers: Principles, Techniques, and Tools, Addison-Wesley, 1986, page 49 et seq.