Computers are being used today to perform a wide variety of tasks. Many different areas of business, industry, government, education, entertainment, and most recently, the home, are tapping into the enormous and rapidly growing list of applications developed for today's increasingly powerful computer devices. Computers have also become a key technology for communicating ideas, data, and trends between and among business professionals. These devices have become so useful and ubiquitous, it would be hard to imagine today's society functioning without them.
Computers operate by executing programs, or a series of instructions stored in its memory. These programs, and their series of instructions, are collectively referred to as software. Software is key to utility of computers. Software is what makes the computer devices function and perform useful tasks. Good software makes for effective machines, whereas poor software makes for difficult to use, less effective machines. Thus, the utility of the computer device often hinges upon the utility of the software written for the device.
Software is written by professionals referred to as programmers or software engineers. As programs have become larger and more complex, the task of writing software has become correspondingly more difficult. As a result, programmers typically code in "high level languages" to improve productivity. The use of high level language makes the task of writing extremely long and complex programs more manageable. The completed program, however, must be translated into machine executable language in order to run on a computer. Programmers rely upon compilers to translate their program written in high level language into a program comprised of machine executable code, known as "machine language."
Compiler efficiency and sophistication is directly related to the speed and reliability of the machine executable code. The process of translating the program written in high level language into a program written in machine language is referred to as compiling. The actual translation is performed by a software program referred to as a compiler. The compiler operates on the program written in high level language. The high level language program is referred to as source code. The compiler translates the source code into machine executable code. Ultimately, it is the machine executable code which will run on the computer. Thus, the speed and reliability of the executable code depends upon the performance and efficiency of the compiler. If the compiler is inefficient, the machine executable code will run slower. Other attributes, such as executable code size and reliability, may also be affected. Hence, it is critical to the speed and efficiency of the program that the compiler thoroughly optimizes the executable code during the translation process.
There are several different methods and techniques used to optimize source code. One technique, commonly referred to as "loop interchange," involves changing the order in which loops are executed. Rather than executing loops according to the way that the human programmer had originally written the computer program, the compiler rearranges the loops into a different, more efficient order so that the code can be executed much faster without impacting the final result. Another technique, known as "cache tiling" or "blocking," involves the compiler breaking large operational blocks into several smaller blocks. Subdividing the larger blocks into smaller blocks generally reduces the total number of cache misses that are required. Reducing the number of cache misses, directly increases the speed at which the code may be executed. Yet another method for improving the execution speed involves "register tiling" or "unrolling." A register tiling process further subdivides operations blocks so as to minimize the number of loads and stores which are required. Associated with any given hardware design is a limited number of registers. By keeping data in registers for an elapsed period of time, data items need not be loaded to or stored from memory each time the data is accessed.
In the past, various compilers have used one or more of these techniques to optimally compile their code. First, loop interchange as applied to change the ordering of the loops. The resulting code was then altered according to the most optimal cache tiling. Next, the modified code was then changed to reflect the most optimal register tiling. However, it has been discovered by the present inventors that although each of these techniques separately optimizes the code, that their final combined effects might not produce the most optimal performance. This is due to the observation made by the present inventors that these techniques are highly interdependent. Changing one of these factors transforms the other techniques. For instance, optimizing loop ordering might cause register tiling to become worse or vice versa. In addition, the various transformations might have contradictory effects on different characteristics of the machine that contribute to performance. For example, one loop ordering may improve cache behavior, but it might also seriously degrade scheduling behavior.
Thus, there is a need in the prior art for a compiler that can automatically determine how best to transform source code considering the tree optimizations discussed and considering various machine characteristics such as caches and instruction scheduling. The present invention provides a highly effective, elegant solution to this problem by employing a total machine model to estimate performance characteristics for a wide range of transformations; the set of transformations which produces the best overall execution time estimate is then selected.