1. Field of the Invention
This invention relates to computing systems, and more particularly, to program code optimization.
2. Description of the Relevant Art
The performance of computing systems is generally dependent on both hardware and software. As obtaining further performance from hardware design becomes increasingly costly and/or difficult, attention turns to new methods of software design. For example, regarding the hardware of a system, the geometric dimensions of devices and metal routes on each generation of semiconductor chips continue to decrease. This reduction leads to increases in cross capacitance effects on wires, parasitic inductance effects on wires, and electrostatic field effects within transistors, which in turn increase on-chip circuit noise effects and propagation delays. In addition, the number of nodes that may switch per clock cycle significantly increases as more devices are used in each new generation. This trend leads to an increase in power consumption with each new generation of processors. Accordingly, the operational frequency of the processor may be limited by these noise and power effects, which may also limit the performance of the hardware.
In addition to improvements in hardware, software developers also seek ways to increase computing performance or optimize use of computing resources. When software developers write program code, the program code may not always be written in an efficient manner. Often, program code may be too large and complex for any individual to readily identify inefficiencies or opportunities for optimization. Additionally, project changes or changes in the personnel developing the program code may lead to unnecessary overhead or other inefficiencies being introduced into the program code. One approach to program code optimization is to develop and use sophisticated compilers to analyze the program code and perform optimizations. For example, loop structures are one type of program construct that may lead to bottleneck points in program performance. Therefore, optimizing compilers may include techniques for performing loop optimization in order to improve program performance.
Loop fusion is a loop transformation technique which replaces multiple loops with a single one. For example, consider the following piece of pseudo-code with adjacent loops:
int i, a[100], b[100];              /* line 1 */
for (i = 0; i < 100; i++) {
    a[i] = 1;
}
for (i = 0; i < 100; i++) {         /* line 5 */
    b[i] = 2;
}
The above code has two adjacent for-loop constructs. These adjacent loops have the same initial value of 0, the same trip count of 100, and the same increment value of 1. The above code is equivalent to the following code with a single for-loop construct:
int i, a[100], b[100];              /* line 8 */
for (i = 0; i < 100; i++) {
    a[i] = 1;
    b[i] = 2;
}                                   /* line 12 */
Most of the existing work on loop fusion concentrates on data reuse or on the creation of instruction level parallelism opportunities. Because loop fusion combines several loops into a single larger loop, it changes the order in which statements execute. The legality of each transformation is therefore determined by the data dependencies between statements. Thus, a reordering transformation such as loop fusion requires data dependence analysis beforehand.
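As an illustration of why data dependence analysis is needed, the following sketch compares a pair of separate loops with their fused form. The function names, the array size N, and the offset parameter are hypothetical and chosen only for this example. With an offset of 0 the second statement reads the element just written in the same iteration, so the fused loop produces the same result. With an offset of 1 the second statement reads an element that the fused loop has not yet written, so fusion would change the result.

```c
#define N 8  /* hypothetical trip count for this illustration */

/* Two separate loops: every a[i] is written before any b[i] is computed. */
void run_separate(int a[N + 1], int b[N], int offset)
{
    int i;
    for (i = 0; i < N; i++) {
        a[i] = i * 10;
    }
    for (i = 0; i < N; i++) {
        b[i] = a[i + offset];   /* offset is assumed to be 0 or 1 */
    }
}

/* Fused form: a[i] and b[i] are computed in the same iteration.
 * Equivalent to run_separate() only when offset == 0; for offset == 1
 * the read of a[i + 1] happens before that element is written. */
void run_fused(int a[N + 1], int b[N], int offset)
{
    int i;
    for (i = 0; i < N; i++) {
        a[i] = i * 10;
        b[i] = a[i + offset];
    }
}
```

If both functions are called on arrays whose elements of a are all initialized to -1, an offset of 1 yields b[0] equal to 10 for the separate loops but -1 for the fused loop, which is the kind of difference a dependence analysis is meant to detect before the transformation is applied.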
Loop fusion may reduce loop overhead and memory accesses, increase register usage, and may also lead to other optimizations. By potentially reducing the number of parallelizable loops in a program and increasing the amount of work in each of those loops, loop fusion can greatly reduce parallelization overhead. For example, fewer spawns and joins may be necessary. Often, however, the source code provided to a compiler has only small sets of loops that are control flow equivalent, are normalized, have the same iteration count, are adjacent, and have no fusion disqualifying conditions, such as an early exit statement within the loop.
In view of the above, efficient methods and mechanisms for optimizing code with non-adjacent loops are desired.
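One of the disqualifying conditions named above, an early exit statement, can be illustrated with a short sketch. The function names and the array size N below are hypothetical and serve only as an example. The first loop stops at the first negative element, while the second loop must always run to completion. Naively merging the two bodies makes the second statement subject to the early exit as well.

```c
#define N 6  /* hypothetical trip count for this illustration */

/* Separate loops: the search loop may exit early, the fill loop never does. */
int separate_with_exit(const int src[N], int b[N])
{
    int i, first_neg = -1;
    for (i = 0; i < N; i++) {
        if (src[i] < 0) {
            first_neg = i;
            break;              /* early exit */
        }
    }
    for (i = 0; i < N; i++) {
        b[i] = 2;
    }
    return first_neg;
}

/* Naive fusion: the break now also cuts the fill short, so elements of b
 * from the exit point onward are left unwritten. Not equivalent. */
int naive_fused_with_exit(const int src[N], int b[N])
{
    int i, first_neg = -1;
    for (i = 0; i < N; i++) {
        if (src[i] < 0) {
            first_neg = i;
            break;
        }
        b[i] = 2;
    }
    return first_neg;
}
```

Both functions return the same index, but for an input whose third element is negative the naive fused version writes only the first two elements of b, whereas the separate loops write all of them.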