Trends in hardware design are changing. Rather than continuing to maximize clock speeds to increase single-threaded performance, hardware designs are moving toward lower clock speeds and a larger number of concurrent threads of execution in a system, e.g., using multiple microprocessors, microprocessors with multiple cores, and/or microprocessors with multiple hardware threads (for example, multiple hardware threads per core). Each hardware thread in a multithreaded processor is treated like an independent processor by the software resident in the computer. Such designs provide increased throughput at lower power cost, but they also degrade single-threaded performance because of the slower clock speeds. As a side effect, tasks that are not parallelized to take advantage of multiple hardware threads will appear to run more slowly on new systems than they did on older systems with faster clock speeds.
One particular area of concern for organizations that develop software is the possibility that compile times will increase. To benefit from the trend toward more hardware threads running at slower clock speeds, compilers need, where possible, to use those hardware threads to perform different parts of the compilation process simultaneously. In programs that consist of many small procedures, this is relatively straightforward: a compiler can spawn multiple threads to compile individual procedures on different processors and then gather the results produced by the threads to package the final program. The larger challenge is the compilation of large or legacy procedures.
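The per-procedure approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: compile_procedure is a hypothetical stand-in for a real per-procedure compile step, and the dictionary of procedure sources is invented for the example. In CPython, threads do not run CPU-bound work in parallel, so the sketch shows the spawn-and-gather structure rather than an actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def compile_procedure(name, source):
    """Hypothetical stand-in for compiling one procedure in isolation."""
    # A real compiler would parse, optimize, and emit code here.
    return f"<object code for {name}: {len(source)} chars of source>"


def compile_program(procedures, max_workers=4):
    """Compile each procedure on a worker thread, then gather the results."""
    names = list(procedures)
    sources = [procedures[n] for n in names]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order, so packaging is deterministic.
        objects = list(pool.map(compile_procedure, names, sources))
    return dict(zip(names, objects))


if __name__ == "__main__":
    program = {"main": "call f", "f": "return 1"}  # assumed toy input
    for name, obj in compile_program(program).items():
        print(name, "->", obj)
```

The sketch works only because each procedure is an independent unit of work; a single very large procedure would occupy one worker while the others sit idle.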
The compilation process consists of many steps. The most time-consuming of these steps tends to be the global optimization step. Most global optimizations are based on data flow analysis, a body of techniques that derive information about the flow of data along program execution paths. For example, one way to implement global common sub-expression elimination requires determining whether two textually identical expressions evaluate to the same value along any possible execution path of the program. As another example, if the result of an assignment is not used along any subsequent execution path, then the assignment can be eliminated as dead code. In each application of data flow analysis, every program point has associated with it a data flow value that represents an abstraction of the set of all possible program states that can be observed for that point. The set of possible data flow values is the domain for that application. For example, the domain of data flow values for reaching definitions is the set of all subsets of definitions in the program. A particular data flow value is a set of definitions, and each point in the program is associated with the exact set of definitions that can reach that point. The choice of abstraction depends on the goal of the analysis. For efficiency, only the information relevant to that goal is tracked.
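To make the reaching-definitions example concrete, the following is a minimal sketch of a standard iterative solver over a small control flow graph. The block names, definition labels, and input format are assumptions chosen for illustration; a production compiler would typically use bit vectors and a worklist rather than Python sets.

```python
def reaching_definitions(blocks, succs):
    """blocks: {block: [(def_id, var), ...]}; succs: {block: [successors]}."""
    # Group definition ids by the variable they assign.
    defs_of = {}
    for stmts in blocks.values():
        for d, var in stmts:
            defs_of.setdefault(var, set()).add(d)

    gen, kill = {}, {}
    for b, stmts in blocks.items():
        gen[b], kill[b] = set(), set()
        for d, var in stmts:
            # A later definition of var in the block supersedes earlier ones.
            gen[b] = {g for g in gen[b] if g not in defs_of[var]} | {d}
            kill[b] |= defs_of[var] - {d}

    preds = {b: [] for b in blocks}
    for b, targets in succs.items():
        for s in targets:
            preds[s].append(b)

    in_ = {b: set() for b in blocks}
    out = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        for b in blocks:
            in_[b] = set().union(*(out[p] for p in preds[b]))
            new_out = gen[b] | (in_[b] - kill[b])
            if new_out != out[b]:
                out[b], changed = new_out, True
    return in_, out


# Assumed toy program: B1 branches to B2 or B3, and B2 falls through to B3.
blocks = {
    "B1": [("d1", "x"), ("d2", "y")],  # d1: x = 1; d2: y = 2
    "B2": [("d3", "x")],               # d3: x = 3
    "B3": [("d4", "y")],               # d4: y = x
}
succs = {"B1": ["B2", "B3"], "B2": ["B3"]}
in_sets, out_sets = reaching_definitions(blocks, succs)
print(sorted(in_sets["B3"]))  # ['d1', 'd2', 'd3']
```

Here the data flow value at the entry of B3 is the set {d1, d2, d3}: both definitions of x can reach that point because B3 is reachable with or without passing through B2.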
The results of data flow analyses generally have the same form: for each instruction in the program, the results specify some property that must hold every time that instruction is executed. The analyses differ, however, in the properties they compute. For example, a constant-propagation analysis computes, for each point in the program and for each variable used by the program, whether that variable has a unique constant value at that point. As another example, a liveness analysis determines, for each point in the program, whether the value held by a particular variable at that point is sure to be overwritten before it is read again. If so, there is no need to preserve that value, either in a register or in a memory location.
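The liveness example can be sketched in the same spirit. The code below is an illustrative backward analysis over individual instructions; the tuple format (defined variables, used variables, successor indices) and the four-instruction program are assumptions made for the example, not a description of any particular compiler.

```python
def liveness(instrs):
    """instrs: list of (defined_vars, used_vars, successor_indices)."""
    live_in = [set() for _ in instrs]
    live_out = [set() for _ in instrs]
    changed = True
    while changed:  # iterate until a fixed point is reached
        changed = False
        # Visiting instructions in reverse speeds up a backward analysis.
        for i in reversed(range(len(instrs))):
            defs, uses, succs = instrs[i]
            out = set().union(*(live_in[s] for s in succs))
            # Live before i: whatever i reads, plus what is live after i
            # and not overwritten by i.
            inn = set(uses) | (out - set(defs))
            if out != live_out[i] or inn != live_in[i]:
                live_out[i], live_in[i], changed = out, inn, True
    return live_in, live_out


# Assumed toy program:
#   0: a = 1
#   1: b = a + 1
#   2: a = 5
#   3: return a + b
program = [
    ({"a"}, set(), [1]),
    ({"b"}, {"a"}, [2]),
    ({"a"}, set(), [3]),
    (set(), {"a", "b"}, []),
]
live_in, live_out = liveness(program)
print(sorted(live_out[1]))  # ['b']
```

After instruction 1, the variable a is not live because instruction 2 overwrites it before any read, so its value need not be preserved across that point.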
Spawning multiple threads to compile individual procedures on different processors, however, breaks down for code that is not built in a modular fashion. Many programs in use today are constructed from single monolithic procedures, which cannot benefit from compiling procedures in parallel. Some of these programs are old enough to predate the common use of modular programming styles, while others are written in older languages that discourage the use of multiple procedures. Regardless of the cause, there are many such programs in existence on many different platforms. Further, even programs written in a more modular style often contain some very large procedures as a result of poor design or maintenance. A solution is therefore needed to improve compile times for large procedures as hardware development trends continue to move toward slower clock speeds and many available hardware threads.