Compilers are programs that run on digital computers and convert one language into another. Typically, the language being converted, called the source language, is one that is easily read and understood by a computer programmer skilled in the programming arts. The source language is typically translated into a target language, which is often the machine language for the computer on which the compiler itself runs. The target or machine language is typically understandable only by the computer itself.
A compiler typically reads each statement of the source language and translates that statement into one or more target language statements. In some languages, a "forward reference" is allowed, wherein a language statement at one location in the source file refers to information that occurs at a later point in the source file. When this occurs, the compiler cannot complete the translation the first time it reads the source statement, since it does not yet know where the referenced information is located. Thus, when forward references are allowed, the compiler must make two or more passes over the source code in order to completely compile the source code into a target language, such as machine code or an intermediate code for later optimization.
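The forward-reference problem can be illustrated with a minimal sketch. The toy instruction set (`jmp`, `nop`, `halt`), the `name:` label syntax, and the function name `assemble` are assumptions made for this example only and do not come from any particular compiler. The first pass records where each label is located; the second pass emits code once every location is known.

```python
def assemble(lines):
    """Translate a toy program in two passes (hypothetical instruction set)."""
    # Pass 1: record the address of every label. No code is emitted yet,
    # because a jump may refer to a label defined later in the source.
    labels = {}
    address = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            labels[line[:-1]] = address
        else:
            address += 1  # each instruction occupies one slot

    # Pass 2: every label address is now known, so jumps can be resolved.
    output = []
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        op, *args = line.split()
        if op == "jmp":
            output.append(("jmp", labels[args[0]]))
        else:
            output.append((op, *args))
    return output


program = ["jmp end", "nop", "end:", "halt"]
print(assemble(program))
```

Here `jmp end` appears before `end:` is defined, so a single pass reading the first line would have no address to emit for the jump.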
Compilers are often constructed using two major sections, called a lexical scanner and a parser. The lexical scanner, often just called the scanner, reads the source statements and separates each word, symbol, or number from the source statement into a "token". Each token is given a symbolic reference, often a number, and this symbolic reference is passed to the parser section of the compiler. The parser section of the compiler groups multiple tokens into statements that are valid in the language. That is, the scanner separates the source code into a stream of tokens, then the parser analyzes multiple tokens to determine whether the tokens match a statement within the language. Once the parser finds a match between one or more tokens and a valid statement in the language, a parser action routine is called to output target language code corresponding to the statement of the language.
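The division of labor between scanner and parser can be sketched as follows. This is an illustrative example only: the numeric token codes, the single statement form `name = number ;`, the target instructions `LOAD_CONST` and `STORE`, and all function names are assumptions, not part of any real compiler.

```python
import re

# Hypothetical numeric codes serving as the symbolic reference for each token.
IDENT, NUMBER, ASSIGN, SEMI = 1, 2, 3, 4

TOKEN_SPEC = [
    (IDENT, re.compile(r"[A-Za-z_]\w*")),
    (NUMBER, re.compile(r"\d+")),
    (ASSIGN, re.compile(r"=")),
    (SEMI, re.compile(r";")),
]


def scan(source):
    """Scanner: split the source text into (code, text) tokens."""
    tokens, pos = [], 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        for code, pattern in TOKEN_SPEC:
            match = pattern.match(source, pos)
            if match:
                tokens.append((code, match.group()))
                pos = match.end()
                break
        else:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
    return tokens


def emit_assignment(name, value):
    """Parser action routine: output target code for one assignment."""
    return [f"LOAD_CONST {value}", f"STORE {name}"]


def parse(tokens):
    """Parser: match groups of tokens against the one valid statement form."""
    code, i = [], 0
    while i < len(tokens):
        kinds = [kind for kind, _ in tokens[i:i + 4]]
        if kinds != [IDENT, ASSIGN, NUMBER, SEMI]:
            raise SyntaxError(f"invalid statement at token {i}")
        code.extend(emit_assignment(tokens[i][1], tokens[i + 2][1]))
        i += 4
    return code


def compile_source(source):
    return parse(scan(source))


print(compile_source("x = 42; y = 7;"))
```

The parser never inspects raw characters; it sees only the token codes, and it calls the action routine only after a complete statement has been matched.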
Multiple passes for a compiler can take many forms. One form is to scan and parse the source language during a first pass, obtaining the required type information for the variables within the source, and then rescan and reparse the source again during a second pass using this type information to actually create the target language code. Thus, the first pass resolves the location of all information, so that during the second pass the location of all forward references is known and correct target language code can be created. Although it is relatively easy to use this technique to change an existing one pass compiler into a multi-pass compiler, this approach requires scanning and parsing the source code twice, thus adding significantly to the time required to compile the source into the target language. Another approach is to scan and parse the source code once, saving the needed type information and the parse trees for a second pass, which uses only the parse trees. This technique also works well; however, it generally affects the fundamental design of the entire compiler, for example by requiring that parsing rules be defined separately for the second pass, which makes it difficult to retrofit an existing compiler from one pass to two passes.
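The second approach described above, in which the source is scanned and parsed once and the second pass works only from the saved parse trees, might look like the following minimal sketch. The toy language (`int name`, `float name`, `print name`), the tuple representation of a parse tree, and the target instructions are all assumptions made for illustration.

```python
def first_pass(source):
    """Scan and parse once, saving parse trees and type information."""
    trees, types = [], {}
    for line in source.splitlines():
        words = line.split()
        if not words:
            continue
        if len(words) != 2:
            raise SyntaxError(f"invalid statement: {line!r}")
        keyword, name = words
        if keyword in ("int", "float"):
            types[name] = keyword          # type information for pass two
            trees.append(("decl", name))
        elif keyword == "print":
            trees.append(("print", name))  # type of name may not be known yet
        else:
            raise SyntaxError(f"unknown keyword: {keyword!r}")
    return trees, types


def second_pass(trees, types):
    """Generate target code from the saved trees; the source is not reread."""
    code = []
    for kind, name in trees:
        suffix = types[name].upper()
        if kind == "decl":
            code.append(f"ALLOC_{suffix} {name}")
        else:
            code.append(f"PRINT_{suffix} {name}")
    return code


source = "print total\nfloat total\n"
trees, types = first_pass(source)
print(second_pass(trees, types))
```

Note that `second_pass` needs its own rules for walking each kind of tree node, separate from the parsing rules in `first_pass`. That duplication is the design cost the paragraph above attributes to this approach.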
In most cases, two passes are all that are required to successfully convert the source code into target language code.
There is a need in the art, then, for efficient two pass compilers that do not rescan and reparse the source language. There is a further need in the art for compilers that are easy for the compiler writer to understand and that allow older one pass compilers to be retrofitted into two pass compilers. Still another need in the art is for two pass compilers that keep the code for both passes together in the compiler source code.