A compiler is a computer program that reads a program written in one language, the source language, and translates it into an equivalent program in another language, the target or object language. Common source languages are human-readable languages such as FORTRAN, BASIC, and C. A program written in a source language consists of source code, which is a series of instructions. Object languages are typically the assembly language or machine language of a target machine, such as an Intel microprocessor-based computer.
There are two parts to a compiler: analysis and synthesis. The analysis part breaks up the source program into constituent pieces and creates an intermediate code representation of the source program. The synthesis part constructs the equivalent object program from the intermediate code.
The analysis part of a compiler includes lexical, syntax, and semantic analysis and intermediate code generation. This part is often referred to as the "front end" of a compiler because it depends primarily on the source language and is largely independent of the target machine. Briefly, lexical analysis consists of reading the characters of the source program and grouping them into a stream of tokens. Each token represents a logically cohesive sequence of characters. Syntax analysis then groups, or parses, the tokens of the source program into grammatical phrases that the compiler uses to synthesize output. Semantic analysis then checks the parsed source program for semantic errors. After performing syntax and semantic analysis, the compiler generates intermediate code from the parsed source program. The intermediate code is written in an intermediate language and consists of a series of instructions.
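To make the lexical-analysis step concrete, the following is a minimal Python sketch of a tokenizer for a small, hypothetical C-like expression language. The token categories (`NUMBER`, `IDENT`, `OP`) and the function name `tokenize` are illustrative assumptions, not taken from any particular compiler.

```python
import re
from typing import List, Tuple

# Token categories for a tiny, hypothetical C-like language (order matters:
# earlier alternatives win when several could match at the same position).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=<>!]=?|[();{}]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Group the characters of `source` into (kind, text) tokens."""
    tokens = []
    for match in MASTER.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not itself a token
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {match.start()}")
        tokens.append((kind, text))
    return tokens


print(tokenize("count = count + 10;"))
# [('IDENT', 'count'), ('OP', '='), ('IDENT', 'count'), ('OP', '+'), ('NUMBER', '10'), ('OP', ';')]
```

Each resulting token is a logically cohesive unit; a parser would then consume this list to build grammatical phrases.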
The synthesis part of a compiler typically includes code optimization and object code generation. This part is often referred to as the "back end" of the compiler because the code it generates depends on the target machine language rather than the source language. Code optimization attempts to improve the intermediate code for the program so that faster-running machine code results. Object code generation then generates object code from the improved intermediate code by, among other things, translating each intermediate code instruction into a sequence of machine instructions that perform the same task.
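The translation step of object code generation can be sketched as follows, assuming intermediate code in a three-address form (`dest = left op right`) and a made-up load/store target machine with a single register `R1`. The `ThreeAddress` type, the mnemonics, and the register name are hypothetical; a real code generator also performs register allocation and instruction selection.

```python
from typing import List, NamedTuple


class ThreeAddress(NamedTuple):
    """One intermediate-code instruction: dest = left op right."""
    dest: str
    op: str
    left: str
    right: str


# Mnemonics for a made-up load/store target machine (illustration only).
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def generate(ir: List[ThreeAddress]) -> List[str]:
    """Expand each intermediate instruction into target instructions doing the same work."""
    asm = []
    for ins in ir:
        asm.append(f"LOAD R1, {ins.left}")                # bring the left operand into a register
        asm.append(f"{OPCODES[ins.op]} R1, {ins.right}")  # combine it with the right operand
        asm.append(f"STORE {ins.dest}, R1")               # write the result back to memory
    return asm


# a = (b + c) * 4, already lowered to intermediate code
for line in generate([ThreeAddress("t1", "+", "b", "c"), ThreeAddress("a", "*", "t1", "4")]):
    print(line)
```

The `STORE t1, R1` immediately followed by `LOAD R1, t1` is the kind of redundancy a code-optimization pass could remove, for example when `t1` is not needed later.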
The use of instruction-level parallel processing in newer CPUs such as the Intel Pentium™ microprocessor has increased the need to optimize the order of instructions. With parallel processing, subsequent instructions are executed in parallel with preceding instructions. However, if the preceding instructions include a branch that transfers control away from the subsequent instructions, executing those subsequent instructions is unnecessary; the CPU must instead execute the instructions that the branch leads to. This circumstance is referred to as speculative instruction execution, since it is uncertain whether the parallel-executing instructions actually need to be executed. To reduce this uncertainty, compilers attempt to assess a program's likely path through its various branches. To select a profitable optimization, a compiler must first predict how often portions of a program execute. Once the more frequently executed portions are identified, any of a number of well-known optimizations can be applied to them. These optimizations include rearranging the object code so that the more frequently executed portions follow one another and can be executed in parallel.
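One simple way to make frequently executed portions follow one another is to chain basic blocks greedily along their most probable successors. The sketch below assumes branch probabilities are already known (from a profile or from static estimates) and uses hypothetical block names and probabilities; it is a simplified illustration, not a complete layout algorithm.

```python
from typing import Dict, List, Optional


def layout_blocks(entry: str, succ_prob: Dict[str, Dict[str, float]]) -> List[str]:
    """Greedy block ordering: after each placed block, place its most probable
    not-yet-placed successor so the likely path falls through in memory.
    succ_prob[b][s] is the estimated probability that control flows from b to s."""
    all_blocks = set(succ_prob) | {s for outs in succ_prob.values() for s in outs}
    placed: List[str] = []
    seen = set()
    # Grow a chain from the entry block first, then from any block not yet placed.
    for start in [entry] + sorted(all_blocks - {entry}):
        block: Optional[str] = start
        while block is not None and block not in seen:
            placed.append(block)
            seen.add(block)
            choices = [(p, s) for s, p in succ_prob.get(block, {}).items() if s not in seen]
            block = max(choices)[1] if choices else None  # most likely unplaced successor
    return placed


# Hypothetical control-flow graph with illustrative edge probabilities.
CFG = {
    "entry": {"test": 1.0},
    "test": {"hot": 0.9, "cold": 0.1},
    "hot": {"join": 1.0},
    "cold": {"join": 1.0},
    "join": {"exit": 1.0},
}
print(layout_blocks("entry", CFG))  # ['entry', 'test', 'hot', 'join', 'exit', 'cold']
```

The unlikely `cold` block moves to the end, so the likely path `entry → test → hot → join → exit` is contiguous.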
Another reason for optimizing the order of instructions is to reduce cache misses in computers that use cache memory between the CPU and main memory. Instructions may be arranged so that those most likely to be executed in sequence are stored in the same cache line or block. Thus, when a cache line is accessed for an instruction, the instructions most likely to follow are also immediately available to the CPU.
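The cache effect can be illustrated by counting how many distinct cache lines the likely execution path touches under two block orderings. This sketch assumes 64-byte cache lines and hypothetical block sizes in bytes, and ignores alignment and every other detail of a real memory hierarchy.

```python
from typing import Dict, List, Set


def lines_touched(order: List[str], sizes: Dict[str, int], path: List[str],
                  line_size: int = 64) -> Set[int]:
    """Cache-line indices touched when executing `path` with blocks laid out in `order`."""
    start: Dict[str, int] = {}
    addr = 0
    for block in order:  # assign consecutive byte addresses in layout order
        start[block] = addr
        addr += sizes[block]
    touched: Set[int] = set()
    for block in path:
        first = start[block] // line_size
        last = (start[block] + sizes[block] - 1) // line_size
        touched.update(range(first, last + 1))
    return touched


SIZES = {"entry": 16, "test": 16, "cold": 96, "hot": 16, "join": 16}  # hypothetical sizes
HOT_PATH = ["entry", "test", "hot", "join"]

source_order = ["entry", "test", "cold", "hot", "join"]
hot_first = ["entry", "test", "hot", "join", "cold"]
print(len(lines_touched(source_order, SIZES, HOT_PATH)))  # 2
print(len(lines_touched(hot_first, SIZES, HOT_PATH)))     # 1
```

With the cold block placed between `test` and `hot`, the hot path spans two cache lines; moving it to the end lets the whole hot path fit in one.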
Determining the more frequently executed portions of a program is often done through a process known as profiling. Dynamic profiling consists of compiling and then executing a program to collect the execution frequencies of its portions. Most profiles result from dynamically counting events during a program's execution. Based on these counts, a compiler can identify the frequently executed code and optimize it accordingly. However, dynamic profiling has a number of drawbacks. First, obtaining a profile of each program requires compiling the program twice and executing it in between: once to obtain the program's profile and once more to optimize the code with the benefit of the profile information. Second, it is often impractical to profile real-time and reactive systems. Third, optimization based on dynamic profiling is not automatic; it requires programmer intervention to provide the input data and run the program as part of the optimizing process. End users are typically untrained in dynamic profiling and in using profile information to optimize the programs they write.
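Compilers typically insert profiling counters automatically (GCC, for instance, offers `-fprofile-generate` to build an instrumented program and `-fprofile-use` to recompile with the collected data). The Python sketch below instead instruments a toy function by hand, purely to show what the collected counts look like; the function, block names, and input are hypothetical.

```python
from collections import Counter
from typing import List

counts: Counter = Counter()  # execution count per (hypothetical) basic block


def count_negatives(values: List[int]) -> int:
    """Toy program instrumented by hand: each basic block bumps its own counter."""
    counts["entry"] += 1
    negatives = 0
    for v in values:
        counts["loop_body"] += 1
        if v < 0:
            counts["then"] += 1
            negatives += 1
        else:
            counts["else"] += 1
    counts["exit"] += 1
    return negatives


count_negatives([3, -1, 4, 1, 5, -9, 2, 6])  # the "training" run that produces the profile
print(counts["then"] / counts["loop_body"])  # 0.25
```

For this input the `then` block ran on 2 of 8 iterations; a different input could produce a different profile, which is one reason the choice of input data matters.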
An alternative is static profiling, in which a compiler estimates relative frequencies (not absolute counts) through a static analysis of the program's code. Static analysis relies upon heuristics (commonly observed program behaviors) to predict which portions of a program execute most frequently. Heuristics are derived through observation of programs and are typically expressed as a probability, e.g., the probability that a branch of a certain kind will be taken. Since static analysis does not require executing the program to obtain the profile information, the drawbacks of dynamic profiling are avoided.
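Static profiling turns per-branch probabilities into relative block frequencies. A minimal sketch, assuming an acyclic control-flow graph whose edge probabilities have already been assigned by heuristics: give the entry block frequency 1.0 and propagate in topological order, so that each block receives the sum, over its predecessors, of predecessor frequency times edge probability. The block names and probabilities are hypothetical, loops are not handled (they need additional treatment of back edges), and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def estimate_frequencies(entry: str, succ_prob: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Relative execution frequency of each block in an acyclic CFG (entry = 1.0)."""
    preds: Dict[str, Set[str]] = {}  # graphlib expects node -> predecessors
    for src, outs in succ_prob.items():
        preds.setdefault(src, set())
        for dst in outs:
            preds.setdefault(dst, set()).add(src)
    freq = {block: 0.0 for block in preds}
    freq[entry] = 1.0
    # Predecessors are always visited first, so freq[block] is final when we reach it.
    for block in TopologicalSorter(preds).static_order():
        for dst, p in succ_prob.get(block, {}).items():
            freq[dst] += freq[block] * p
    return freq


CFG = {
    "entry": {"test": 1.0},
    "test": {"hot": 0.9, "cold": 0.1},  # hypothetical heuristic estimate
    "hot": {"join": 1.0},
    "cold": {"join": 1.0},
    "join": {"exit": 1.0},
}
print(estimate_frequencies("entry", CFG))
```

The results are relative frequencies, not counts: `hot` is estimated to run nine times as often as `cold`, without the program ever being executed.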
A prime example of present static profiling techniques is described by Thomas Ball and James Larus in a 1993 paper entitled "Branch Prediction for Free," which is hereby incorporated by reference. In their paper, Ball and Larus describe a number of heuristics that can be applied to branches in a program's code to predict whether each branch will be taken. These heuristics include, for example, a prediction (yes or no) that a comparison of a pointer against null in an if statement will fail. Based on these binary branch predictions, a compiler can estimate which portions of the program are most likely to be executed.
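The pointer-versus-null heuristic mentioned above can be expressed as a small predicate. The `Compare` record and the function name are hypothetical stand-ins for whatever representation a compiler uses for a branch condition; the sketch covers only the null-comparison case described in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Compare:
    """Hypothetical summary of the comparison that controls a branch."""
    op: str             # comparison operator, e.g. "==" or "!="
    is_pointer: bool    # is the tested operand a pointer?
    against_null: bool  # is it compared against null?


def pointer_heuristic(cmp: Compare) -> Optional[bool]:
    """Predict whether the branch condition evaluates to true.
    Returns None when the heuristic does not apply to this comparison."""
    if not (cmp.is_pointer and cmp.against_null):
        return None
    if cmp.op == "==":
        return False  # p == NULL is predicted to fail
    if cmp.op == "!=":
        return True   # so p != NULL is predicted to hold
    return None       # other operators are outside this heuristic


# if (p == NULL) { ... }  ->  predicted not to enter the then-branch
print(pointer_heuristic(Compare("==", True, True)))  # False
```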
Typically, several heuristics apply to a branch. Ball and Larus predict a branch's outcome with the first heuristic, taken from a precomputed static priority ordering, that applies to the branch, and they disregard the other heuristics. This approach works well for branch prediction, which simply produces a yes or no. However, it ignores valuable statistical information. Each heuristic is determined from empirical data, and each has an associated statistical probability that the branch will be taken. It is this probability that provides the basis for the binary prediction. For example, the heuristic mentioned above may empirically be correct 60% of the time; the prediction is therefore that the branch will be taken, since the comparison fails most of the time. Beyond determining the prediction, however, this statistical information is not used.
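The first-match policy described above can be sketched as follows. The branch representation (a dictionary of named boolean features), the heuristic named `other`, and its probability are hypothetical; the 0.60 value echoes the 60% example in the text, and none of these numbers are measured results.

```python
from typing import Callable, Dict, List, NamedTuple, Optional

Branch = Dict[str, bool]  # hypothetical: named features observed at a branch


class Heuristic(NamedTuple):
    name: str
    applies: Callable[[Branch], bool]  # does this heuristic cover the branch?
    taken_prob: float                  # empirical P(branch taken) when it applies


def predict_first_match(branch: Branch, ordered: List[Heuristic]) -> Optional[bool]:
    """Binary prediction from the highest-priority applicable heuristic.
    Lower-priority heuristics, and the probability itself, are discarded."""
    for h in ordered:
        if h.applies(branch):
            return h.taken_prob > 0.5
    return None  # no heuristic applies


# Priority order and probabilities are illustrative placeholders only.
HEURISTICS = [
    Heuristic("pointer", lambda b: b.get("pointer_vs_null", False), 0.60),
    Heuristic("other", lambda b: b.get("other_feature", False), 0.20),
]

print(predict_first_match({"pointer_vs_null": True, "other_feature": True}, HEURISTICS))  # True
print(predict_first_match({"other_feature": True}, HEURISTICS))                           # False
```

Both heuristics apply to the first branch, yet only the higher-priority one is consulted, and its 0.60 probability is reduced to `True`, the same answer a 0.99 probability would give.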
The primary drawback of static profiling techniques to date is their inaccuracy in predicting program behavior. The approach of Ball and Larus, like the other static profiling approaches that have been proposed, is not as accurate as dynamic profiling.
An object of the invention, therefore, is to provide an improved static profiling method for determining frequently executed portions of a program. Another object of the invention is to provide an optimizing compiler that employs this method in optimizing the compilation of source code into object code.