The present invention relates to the field of computer programming and, more particularly, to a method and apparatus for improving the compile speed of irreducible region commoning.
In computer programming, loops are well-known logic devices for performing similar operations a known number of times. A region within a program is either a loop or an entire procedure. Most regions are reducible, meaning that they have a single entry point. Some regions are irreducible, meaning that there are multiple places at which they can be entered. In most programs all or nearly all regions are reducible. One exception is that in some implementations of exception handling (e.g., C++ try/throw/catch), the exception-handling code appears as an alternate entry into the region.
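The distinction between a single-entry and a multiple-entry region can be illustrated with a minimal sketch. It assumes a loop region given as a set of block names and a control-flow graph given as a list of (source, destination) edges; the function names and the block names P, H and T are hypothetical and chosen only for illustration. A block counts as an entry if some edge reaches it from outside the region.

```python
def region_entries(region, edges):
    """Return the blocks of `region` that are reached by an edge from outside it."""
    return sorted({dst for src, dst in edges if dst in region and src not in region})


def is_irreducible(region, edges):
    """Treat a region as irreducible when it can be entered at more than one block."""
    return len(region_entries(region, edges)) > 1


loop = {"H", "T"}

# Reducible loop: the outside block P can only branch to H.
reducible_edges = [("P", "H"), ("H", "T"), ("T", "H")]

# Irreducible loop: the same cycle, but P can branch to either H or T.
irreducible_edges = [("P", "H"), ("P", "T"), ("H", "T"), ("T", "H")]

print(region_entries(loop, reducible_edges))    # ['H']
print(region_entries(loop, irreducible_edges))  # ['H', 'T']
```

In the second graph neither H nor T comes before the other on every path into the loop, which is the property that makes such regions awkward to process in a fixed order.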
Program translators (i.e., compilers) usually include optimizers, and one of the normal optimizations is some variant of common expression elimination, code motion, or partial redundancy elimination. Many of the algorithms are similar but with different capabilities, strengths and weaknesses. Several have multiple variants. Each has a name such as CSXCM (Common Subexpression Elimination and Code Motion), MRA (Morel-Renvoise Algorithm for partial redundancy elimination), PRE (Partial Redundancy Elimination), or LCM (Lazy Code Motion). They are collectively known as commoning algorithms. Commoning algorithms are well-known in the art and need not be further explained herein.
In most commoning algorithms, each block in a reducible region must be examined either once or twice, in order from first to last. In some other algorithms examination is performed in the order from last to first (and, during that examination, adjacent predecessors and successors may also be examined). The examination determines which elements, expressions or instructions within a block or region should be inserted or deleted to improve performance. The first block in a loop typically requires special handling during initialization, and special cleanup processing is needed at the end. The worst-case number of visits is O(n), where n is the number of blocks in the region. The time taken depends on the number of visits and on other factors (such as the number of expressions in the region) but is typically 2% to 5% of total compile time.
Irreducible regions are problematic. Since there are multiple entry points, no block is first, before all others. A partial ordering is used, from one entry to the bottom, but parts of the reducible-region algorithm cannot work. There are only three known approaches. The first is to process all blocks from top to bottom (and, for some algorithms, also from bottom to top), repeating this process n times. The second is to process all blocks repeatedly, stopping after a pass in which no information has changed. The third is to use a worklist, so that only nodes needing initialization or recalculation are processed. With either of the first two approaches, the worst-case number of visits is O(n²). The second approach adds work to detect that a change occurred but is generally used because typically only 3 or 4 passes are needed, and in large regions that makes it much faster than the first. Even then, in programs with large irreducible loops the cost of commoning may be well over 90% of total compile time, or several hundred times slower than for reducible loops. The third approach adds bookkeeping that sometimes reduces the number of visits, but its worst case is still O(n²).
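The second and third approaches can be sketched as follows. This is only an illustration under stated assumptions: a simplified available-expressions analysis stands in for the commoning analysis (the text above does not give the actual data-flow equations), and the function names, the block names E, A and B, and the gen/kill sets are all hypothetical.

```python
from collections import deque


def transfer(b, out, preds, gen, kill, universe):
    """Recompute the set of expressions available on exit from block b."""
    avail_in = set(universe)
    for p in preds[b]:
        avail_in &= out[p]  # available only if available from every predecessor
    return gen[b] | (avail_in - kill[b])


def solve_until_stable(order, entry, preds, gen, kill, universe):
    """Second approach: sweep all blocks until a pass changes nothing."""
    out = {b: set(universe) for b in order}  # optimistic initial guess
    out[entry] = set(gen[entry])
    passes = visits = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for b in order:
            if b == entry:
                continue
            visits += 1
            new_out = transfer(b, out, preds, gen, kill, universe)
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return out, passes, visits


def solve_worklist(order, entry, preds, succs, gen, kill, universe):
    """Third approach: revisit a block only when one of its inputs changed."""
    out = {b: set(universe) for b in order}
    out[entry] = set(gen[entry])
    work = deque(b for b in order if b != entry)
    queued = set(work)
    visits = 0
    while work:
        b = work.popleft()
        queued.discard(b)
        visits += 1
        new_out = transfer(b, out, preds, gen, kill, universe)
        if new_out != out[b]:
            out[b] = new_out
            for s in succs[b]:
                if s != entry and s not in queued:
                    work.append(s)
                    queued.add(s)
    return out, visits


# Hypothetical irreducible loop {A, B}: block E can branch into either A or B.
order = ["E", "A", "B"]
preds = {"E": [], "A": ["E", "B"], "B": ["E", "A"]}
succs = {"E": ["A", "B"], "A": ["B"], "B": ["A"]}
universe = {"x+y"}
gen = {"E": set(), "A": {"x+y"}, "B": set()}
kill = {"E": set(), "A": set(), "B": set()}

stable_out, passes, sweep_visits = solve_until_stable(order, "E", preds, gen, kill, universe)
work_out, work_visits = solve_worklist(order, "E", preds, succs, gen, kill, universe)
print(passes, sweep_visits, work_visits)  # 2 4 3
```

Both solvers reach the same result on this toy graph. The sweep needs an extra pass just to confirm that nothing changed, while the worklist pays for its queue bookkeeping on every visit, which is the trade-off described above.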
That problem has been mostly ignored because irreducible regions are rare in most situations. In some situations, however (e.g., exception handling, or the use of computed gotos or assigned gotos), irreducible regions become very common, and in some programs the number of passes required is close to n, not 3 or 4. That makes compilation very slow. In one example with n of about 575, about 565 passes over the 575 blocks were needed. The problem is aggravated by the fact that larger regions tend to contain more expressions, so the total time is O(n³).
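For scale, 565 passes over 575 blocks is 565 × 575 = 324,875 block visits, against 4 × 575 = 2,300 when four passes suffice. How the pass count can approach n is illustrated by the toy sketch below. It is an assumption-laden stand-in rather than the measured program: it uses a simple chain of blocks instead of a real irreducible region, tracks a single expression, and merely mimics the effect of an unfavorable visiting order, which in an irreducible region cannot always be avoided. The function name count_passes and its parameters are hypothetical.

```python
def count_passes(order, preds, killed_at):
    """Sweep blocks in `order` until stable; return the number of passes used."""
    # Optimistic start: the expression is assumed available everywhere
    # except after the blocks that kill it.
    avail_out = {b: b not in killed_at for b in order}
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for b in order:
            if b in killed_at:
                continue
            new = all(avail_out[p] for p in preds[b])
            if new != avail_out[b]:
                avail_out[b] = new
                changed = True
    return passes


n = 575
# Chain 0 -> 1 -> ... -> n-1; block 0 kills the expression.
preds = {0: []}
preds.update({i: [i - 1] for i in range(1, n)})

forward = count_passes(list(range(n)), preds, {0})
backward = count_passes(list(range(n - 1, -1, -1)), preds, {0})
print(forward, backward)  # 2 575
```

When the blocks are visited in the direction the information flows, one pass propagates the change and a second confirms stability. Visited against the flow, each pass moves the change forward by only one block, so the number of passes grows linearly with n and the number of visits quadratically.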
The only known ways to avoid spending excessive time on optimization are to convince programmers to write their programs differently, to detect large irreducible regions and simply not optimize them, to detect them and perform only intra-block local optimizations, or to use the worklist approach. While those may help compile speed, all but the worklist approach can harm program execution performance. The worklist approach adds work to keep track of the worklist and may be either faster or slower.
Hence, there is a need for an automated method for optimizing irreducible region commoning.