1. Field of the Invention
The present invention relates to improvements in software optimization tools and, more particularly, to a profile-based preprocessor.
2. Background Information and Description of the Related Art
A "profile" is a characterization of the execution of a program. Illustratively, a profile provides the number of times each instruction is executed in each subroutine of a program. Consequently, profiling indicates which portions of the program require the most central processing unit (CPU) time and, thus, are candidates for optimization.
Conventional profiling tools are commonly used in hardware and software performance analysis. These tools typically add instrumentation instructions to each basic block of a program to analyze and test the program. The instrumentation instructions typically increment a counter or write a record to a trace list each time a particular basic block is executed.
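By way of illustration only, the counter-style instrumentation described above may be sketched as follows. The sketch assumes a hand-instrumented toy routine; the names `hit`, `block_counts`, and `process`, and the basic block labels, are hypothetical and are not taken from any particular profiling tool.

```python
from collections import Counter

# Hypothetical counter table: one entry per basic block label.
block_counts = Counter()


def hit(block_id):
    """Instrumentation call placed at the start of a basic block."""
    block_counts[block_id] += 1


def process(values):
    """Toy routine with three hand-instrumented basic blocks."""
    total = 0
    for v in values:
        if v < 0:
            hit("error_path")   # seldom executed in this example
            total -= 1
        else:
            hit("normal_path")  # executed often in this example
            total += v
    hit("exit")
    return total


result = process([1, 2, 3, -1, 4])
# block_counts now holds the execution count of each basic block.
```

After the run, the resulting counts indicate which basic blocks are executed most often and which are seldom executed.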
However, once a profile has been completed and the developer has ascertained which portions of the program require the most CPU time, problems arise as to how to optimize the program. One specific problem relates to the storage hierarchies of computer systems. FIG. 1b illustrates a typical storage hierarchy, which includes central processing unit ("CPU") 130, cache 140, random access memory ("RAM") 150, and disk memory 160. CPU 130 typically retrieves instructions and/or data directly from cache 140, which is the fastest, smallest, and most expensive storage device. Cache 140 allocates its memory space in groups of words. In this example, cache 140 allocates sixteen words per cache line and has 64 lines. Therefore, CPU 130 retrieves up to 64 groups of sixteen words from RAM 150 and stores them in cache 140.
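The arithmetic of this example may be sketched as follows. The constants are those of the example above (sixteen words per line, 64 lines); the function names `capacity_words` and `line_base` are hypothetical, and the sketch assumes that each group of sixteen words begins at a word address that is a multiple of sixteen.

```python
WORDS_PER_LINE = 16  # words allocated per cache line in the example
NUM_LINES = 64       # number of lines in the example cache


def capacity_words():
    """Total number of words the example cache can hold."""
    return WORDS_PER_LINE * NUM_LINES


def line_base(word_address):
    """First word address of the sixteen-word group containing word_address.

    Assumes groups are aligned on multiples of WORDS_PER_LINE.
    """
    return (word_address // WORDS_PER_LINE) * WORDS_PER_LINE
```

Under these assumptions, the example cache holds 1,024 words in total, and a reference to any one word brings in the entire sixteen-word group that contains it.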
A memory allocation problem arises when CPU 130 must repeatedly execute specific lines of code that are separated by multiple lines of code that are seldom or never executed. Specifically, referring to FIG. 1a, lines 100, 110, and 120 each contain either a conditional statement or a while statement. At line 100, if the condition is not met, then a jump to line 110 occurs and the else statements S8-S13 and Proc A are executed. Otherwise, statements S1-S7 are executed. However, statements S1-S7 could be, for example, error correction code, which is seldom, if ever, executed. Therefore, in this example, line 110 is satisfied often and, thus, statements S8-S13 and Proc A are executed often, while line 100 is seldom satisfied and, thus, statements S1-S7 are seldom executed.
However, as previously described, to test line 100, CPU 130 will allocate sixteen sequentially ordered words of code per line of cache 140. Therefore, statements S1-S7 will be stored in cache 140. As such, this allocation fills valuable cache memory with seldom or never executed code. Consequently, cache 140 may not be capable of storing other highly executed code such as, for example, lines 110 and 120. Therefore, CPU 130 would have to retrieve that highly executed code from RAM 150 or disk memory 160, significantly degrading operating speed and performance.
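The effect described above may be illustrated with a minimal sketch that counts how many sixteen-word groups must be cached to hold the frequently executed code. The word sizes assigned to each code region below are hypothetical and are chosen only for illustration; the second layout, in which the frequently executed regions are placed contiguously, is likewise an assumption included solely for comparison.

```python
WORDS_PER_LINE = 16  # words per cache line, as in the example above


def hot_lines(layout):
    """Return the indices of the sixteen-word groups holding any hot word.

    layout: list of (name, size_in_words, is_hot) tuples in address order.
    """
    lines = set()
    address = 0
    for _name, size, is_hot in layout:
        if is_hot:
            first = address // WORDS_PER_LINE
            last = (address + size - 1) // WORDS_PER_LINE
            lines.update(range(first, last + 1))
        address += size
    return lines


# Hypothetical sizes: seldom executed S1-S7 sits between hot regions.
original = [
    ("test at line 100", 4, True),
    ("S1-S7", 28, False),
    ("S8-S13", 24, True),
    ("call to Proc A", 4, True),
]

# Same regions with the hot code placed contiguously (assumed layout).
grouped = [
    ("test at line 100", 4, True),
    ("S8-S13", 24, True),
    ("call to Proc A", 4, True),
    ("S1-S7", 28, False),
]

original_lines = hot_lines(original)
grouped_lines = hot_lines(grouped)
```

With these assumed sizes, the same 32 words of frequently executed code occupy three cache lines in the first layout and two in the second, because in the first layout one line is shared with words of S1-S7.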
In an attempt to mitigate these problems, preprocessors have been developed to optimize software performance and reduce real memory requirements. Some conventional preprocessors statically scan and restructure a program at the source code level when the preprocessor detects a conditional statement in the program. To do so, the preprocessors make assumptions or "guesses" as to which path each conditional statement takes. Often these "guesses" are not representative of what actually occurs at program execution.
Other conventional preprocessors dynamically profile a program at the source code level, but restructure the program at the executable level. To do so, a conventional "postprocessor" restructures the executable code according to the profile information. However, this technique has significant disadvantages, such as the inability to restructure data with guaranteed functionality due to indirect/dynamic data references, which are references to data addresses that are calculated at run time. The program can also be restructured by a compiler at compile time. Doing so, however, requires an enormous amount of modification to the compiler and significantly increases compilation time.
Therefore, there is a great need for a preprocessor that dynamically profiles a program at the source code level, analyzes the conditional statements, procedures, and groups of data elements in that program to determine frequency of execution, and then restructures the program accordingly at the source code level.