Compilers are programs that translate an input stream to a (usually different) output stream. Generally, compilers translate a program in some programming language (such as C or C++) into the binary machine instructions required for the program's execution on a specific computer architecture. In some cases, compilers are used as program analyzers to perform “source-to-source” translations, which modify a source program to support optimizations. Compilers are usually organized into several phases (or passes): lexical translation of the input into a token stream, parsing of the token stream, analysis, and code generation.
Lexical translation converts the input text stream into a sequence of tokens (essentially compressing the input stream). The parsing phase checks that the token stream constitutes a “valid” program in the programming language. The validity of a program must not be confused with the notion of a “correct program.” In English, for example, sentences can be valid but not correct, such as “The sky is always orange” or “The earth is flat.” The output of the parsing phase is (usually) a graph of the program, an intermediate representation (IR) that is used for subsequent analysis and transformations.
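As a rough illustration of these two phases, here is a minimal sketch in Python for a tiny arithmetic expression language. The grammar, the token kinds (NUM, NAME, OP), and the function names tokenize and parse are assumptions made for this example and are not taken from any particular compiler. The IR here is a tree of nested tuples, which is one simple form the "graph of the program" can take.

```python
import re

# Assumed token classes for a toy expression language.
TOKEN_RE = re.compile(r"(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>\S)")

def tokenize(text):
    """Lexical translation: turn the character stream into (kind, value) tokens."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup
        value = int(match.group()) if kind == "NUM" else match.group()
        tokens.append((kind, value))
    return tokens

def parse(tokens):
    """Parsing: check the tokens against the grammar and build a tree (the IR).

    Assumed grammar:
        expr   := term (('+' | '-') term)*
        term   := factor (('*' | '/') factor)*
        factor := NUM | NAME | '(' expr ')'
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, value = advance()
        if kind in ("NUM", "NAME"):
            return (kind, value)
        if (kind, value) == ("OP", "("):
            node = expr()
            if advance() != ("OP", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = factor()
        while peek() in (("OP", "*"), ("OP", "/")):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek() in (("OP", "+"), ("OP", "-")):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

print(parse(tokenize("2 + 3 * x")))
# ('+', ('NUM', 2), ('*', ('NUM', 3), ('NAME', 'x')))
```

In this sketch, the input "2 + * 3" is rejected with a SyntaxError because it is not a valid program. The input "1 / 0", by contrast, parses without complaint: it is valid, even though it is not a correct program.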
The analysis phase repeatedly examines the program to find patterns that can be transformed to make the program execute more efficiently. For example, this phase may remove program sequences that never execute, or it may move instances of a redundant computation to a single point in the program. Finally, the code generation phase converts the IR into machine instructions for the particular target computer architecture (e.g., IBM's PowerPC, Intel's Pentium or x86, MIPS, SPARC, Java bytecode, etc.).
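The following Python sketch shows what such transformations might look like on a very small scale. It assumes a hypothetical straight-line IR made of tuples, in which every name is assigned at most once, and a made-up target instruction set (ADD, SUB, MUL, MOV, RET). Real compilers work on richer IRs and real instruction sets, so this is only an illustration of the ideas, with common subexpression elimination standing in as one simple way to compute a redundant value at a single point.

```python
def remove_unreachable(ir):
    """Drop instructions after a 'return' in straight-line code; they never execute."""
    kept = []
    for instr in ir:
        kept.append(instr)
        if instr[0] == "return":
            break
    return kept

def eliminate_common_subexpressions(ir):
    """Compute each repeated (op, a, b) once and reuse the earlier result.

    Assumes each name is assigned at most once, so an earlier result stays valid.
    """
    seen = {}  # (op, a, b) -> name that already holds the value
    out = []
    for instr in ir:
        if instr[0] == "assign":
            _, dest, op, a, b = instr
            key = (op, a, b)
            if key in seen:
                out.append(("copy", dest, seen[key]))
            else:
                seen[key] = dest
                out.append(instr)
        else:
            out.append(instr)
    return out

# Made-up opcodes for a hypothetical three-operand target machine.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}

def generate(ir):
    """Code generation: map each IR instruction to one target instruction."""
    code = []
    for instr in ir:
        if instr[0] == "assign":
            _, dest, op, a, b = instr
            code.append(f"{OPCODES[op]} {dest}, {a}, {b}")
        elif instr[0] == "copy":
            _, dest, src = instr
            code.append(f"MOV {dest}, {src}")
        elif instr[0] == "return":
            code.append(f"RET {instr[1]}")
    return code

ir = [
    ("assign", "t1", "*", "a", "b"),
    ("assign", "t2", "*", "a", "b"),   # redundant: same computation as t1
    ("assign", "t3", "+", "t1", "t2"),
    ("return", "t3"),
    ("assign", "t4", "-", "a", "b"),   # never executes
]
optimized = eliminate_common_subexpressions(remove_unreachable(ir))
print(generate(optimized))
# ['MUL t1, a, b', 'MOV t2, t1', 'ADD t3, t1, t2', 'RET t3']
```

The second multiplication becomes a cheap copy, and the instruction after the return disappears entirely before any target code is produced.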
Notably, all of the above transformations are performed statically. That is, these transformations are carried out before the dynamic execution of the program. Generally, once the program has begun to execute, no further transformations are applied.
The desired output of the code generation phase is a sequence of machine instructions that mirrors the behavior of the original source program. Those instructions must be executed to effect the behavior of the program. If the target machine is a physical computer (such as the PowerPC or the Pentium), the program is “native” and generally runs very quickly. The target architecture might also be a virtual machine (VM), such as the Java Virtual Machine (JVM) or Zend's VM for the PHP language. Traditionally, a virtual machine is a program that simulates the behavior of an abstract computer architecture by repeating a cycle: an instruction is fetched, its operands are fetched, the operation is performed, and any results are saved.
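The cycle described above can be sketched in a few lines of Python. The stack-based instruction set used here (PUSH, ADD, SUB, MUL, HALT) is invented for this example and does not correspond to the JVM, the Zend VM, or any other real virtual machine.

```python
import operator

# Hypothetical binary operations of a toy stack machine.
BINARY_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}

def run(program):
    """Interpret a list of (opcode, operand) pairs and return the final stack."""
    stack = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        opcode, operand = program[pc]      # fetch the instruction
        pc += 1
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode in BINARY_OPS:
            right = stack.pop()            # fetch its operands
            left = stack.pop()
            result = BINARY_OPS[opcode](left, right)  # perform the operation
            stack.append(result)           # save the result
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return stack

# Computes 2 + 3 * 4 on the toy machine.
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("PUSH", 4),
    ("MUL", None),
    ("ADD", None),
    ("HALT", None),
]
print(run(program))  # [14]
```

Every simulated instruction costs several operations of the host machine (the fetch, the dispatch on the opcode, and the stack traffic), which is one reason an interpreted program of this kind tends to run more slowly than native code.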