Compilers convert a source program, which is usually written in a high level language, into low level object code, which typically consists of a sequence of machine instructions or assembly language. The constructs in the source program are converted into a sequence of assembly language instructions. To achieve the highest possible efficiency of operation, it is essential to know exactly how much time the program takes to execute. In order to concentrate on those parts of the program that need improvements in efficiency, the length of time that the program takes to execute, and other statistics about the program execution, must be determined at the function level, at the loop level, and at the basic block level.
The prior art technique uses a profiler to collect information about loops. However, the prior art profiler can only handle loops that have a single entry point, and it cannot handle loops that have multiple entry points. Moreover, the prior art profiler cannot handle loops that contain nested loops. Collecting information about these types of loops is very important because these loops form a major part of high level programs, and determining exactly how long each loop takes to execute, as well as other characteristics such as cache miss and hit rates, is essential for optimizing the execution of the program.
The prior art profiler also has trouble collecting profiling information about nested loops with common exit points. In these cases, the prior art profiler simply discontinues operation and informs the user that these loops cannot be profiled. Thus, the user has no way of knowing how much time these loops take to execute, and has no way of determining any statistics about these loops. This is a serious problem because many loops in programs have common exits. One of the most common loops that users write is a loop that includes an error statement, for example go to error, which removes control from a loop nest and sends it to a different place in the program. These loops cannot be profiled by the prior art profiler using the prior art technique.
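As an illustration of a loop nest with a common exit, the following is a minimal sketch only. Python has no go to statement, so an exception stands in for the jump to the error label. The names RowError and sum_rows are hypothetical and are not taken from any profiler described here.

```python
class RowError(Exception):
    """Stands in for the 'error' label that a go to would jump to."""


def sum_rows(rows):
    """Sum all values, leaving the whole loop nest on the first negative value."""
    total = 0
    try:
        for row in rows:          # outer loop
            for value in row:     # inner loop
                if value < 0:
                    # Control leaves BOTH loops through the same exit.
                    raise RowError(value)
                total += value
    except RowError:
        return -1                 # the common exit target
    return total
```

Both the inner loop and the outer loop are exited through the same transfer of control, which is the situation described above.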
Another problem is the manner in which the prior art profiler attempts to collect information about loops. Namely, the profiler changes the way in which the loop executes. That is, the prior art profiling technique used to collect information about the loop alters the loop, and thus skews the collected information. One way the profiler skews the information is that it changes the time of execution of the loop. The profiler also changes the behavior of the instruction cache, because it places instrumentation instructions inside the loop, and these instructions have to be loaded into the cache and then purged, depending on the cache replacement algorithm that is used by the processor.
When programs are executed, instructions are brought in from the main memory and then executed. In order to speed up the programs, a faster memory, called cache, is placed between the CPU and the main memory. Typically, a cache memory is composed of SRAM and is fairly small in size. Usually cache is about 1K or 2K in size, as opposed to main memory, which is typically about 1 or 2 gigabytes in super-computers. In order to further increase the speed of the programs, different kinds of cache replacement algorithms are used. When the CPU wants to execute instructions from the main memory, it brings the instructions from the main memory into the cache, and then it executes those instructions. Since cache is very fast, the CPU can retrieve the instructions from the cache very quickly and then execute them. In comparison, if the CPU retrieves the instructions from the main memory, it takes much longer to execute the instructions.
Typically, programs execute instructions in a tight loop, and when the instructions are repeatedly executed, it is more efficient to have the sequence of instructions stored in the cache instead of in the main memory, which speeds up the execution of the program. However, if the sequence of instructions that is being executed by the CPU is very large, for example greater than 1K, which is the typical size of the cache, then the computer will not be able to reuse the instructions that are loaded into the cache. This is because instructions would have to be purged from the cache to make room for subsequent instructions to be loaded, which is extremely inefficient. Thus, when the profiler places instrumentation instructions inside the loop, the number of instructions inside the loop may increase to such an extent that the loop instruction sequence is larger than the cache. This means that the loop cannot fit into the cache, and each instruction that is to be executed must be retrieved from the main memory, thereby increasing the time for which the program executes. Thus, the prior art profiler can increase the size of the loop that is being profiled such that it no longer fits in cache memory, and its instructions must consequently be fetched from main memory. Since main memory is significantly slower than cache memory, the prior art profiler does not determine an accurate time of execution of the loop.
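The effect can be sketched with a small simulation. This is an illustrative model only: it assumes a fully associative instruction cache with least-recently-used replacement, counts capacity in instructions rather than bytes, and uses made-up sizes (a 1000-instruction loop, 50 added instrumentation instructions, a 1024-instruction cache). The function name count_misses is hypothetical.

```python
from collections import OrderedDict


def count_misses(loop_size, cache_size, iterations):
    """Count instruction-cache misses for a loop body executed repeatedly.

    Assumes a fully associative cache with LRU replacement that holds
    `cache_size` instructions; both are simplifying assumptions.
    """
    cache = OrderedDict()  # instruction address -> None, least recent first
    misses = 0
    for _ in range(iterations):
        for addr in range(loop_size):
            if addr in cache:
                cache.move_to_end(addr)        # mark as most recently used
            else:
                misses += 1
                if len(cache) >= cache_size:
                    cache.popitem(last=False)  # purge least recently used
                cache[addr] = None
    return misses


CACHE_SIZE = 1024
ITERATIONS = 100

# Loop that fits in the cache: only the first iteration misses.
plain = count_misses(1000, CACHE_SIZE, ITERATIONS)

# Same loop with 50 instrumentation instructions added: it no longer fits,
# so under LRU every instruction fetch misses on every iteration.
instrumented = count_misses(1000 + 50, CACHE_SIZE, ITERATIONS)

print(plain, instrumented)  # prints "1000 105000"
```

Under these assumptions, adding 5 percent more instructions raises the miss count by a factor of about one hundred, which is why timing measured on the instrumented loop may not reflect the original loop.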
In order to determine characteristics of the loops, the compiler uses standard algorithms to recognize loops, such as an algorithm known as interval analysis. Interval analysis determines the entry and exit points of the loop. In order to collect information about the loops, the prior art profiler places instrumentation slots at the entry points of the loop, so that the profiler knows that a particular loop has been entered. Also, the profiler places instrumentation slots at the exit points, which allow the profiler to determine whether the loop has been exited. The placement of these instrumentation slots is critical because they mark the places where control enters and exits the loop and where the profiler collects information about the loop.
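Where the instrumentation slots would go can be sketched as follows. This is not interval analysis itself: the sketch assumes that loop recognition has already produced the set of basic blocks belonging to the loop, and it only classifies control flow edges as entries or exits. The control flow graph representation, the block names, and the function loop_entry_exit_edges are all hypothetical.

```python
def loop_entry_exit_edges(cfg, loop_blocks):
    """Return (entry_edges, exit_edges) for a loop in a control flow graph.

    `cfg` maps each basic block name to a list of successor block names.
    `loop_blocks` is the set of blocks already identified as the loop.
    """
    entries = []
    exits = []
    for src, successors in cfg.items():
        for dst in successors:
            if src not in loop_blocks and dst in loop_blocks:
                entries.append((src, dst))   # control enters the loop here
            elif src in loop_blocks and dst not in loop_blocks:
                exits.append((src, dst))     # control leaves the loop here
    return entries, exits


# A loop {head, body} with one entry and two exits, one of them an error jump.
cfg = {
    "pre": ["head"],
    "head": ["body", "after"],
    "body": ["head", "error"],
    "after": [],
    "error": [],
}
entries, exits = loop_entry_exit_edges(cfg, {"head", "body"})
print(entries)  # [('pre', 'head')]
print(exits)    # [('head', 'after'), ('body', 'error')]
```

An entry slot would be associated with each entry edge and an exit slot with each exit edge. A loop with more than one entry edge is the multiple entry case discussed in this section.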
The prior art technique only works for loops with a single entry point and either a single or multiple exit points. This is because the technique is derived from function level profiling, and functions have only one entry point and one exit point. The instrumentation slots are placed just before the entry point of the function and just after the exit point of the function, to indicate that control has entered the function and that control has exited the function. This technique was adapted for collecting information about loops. However, it is more difficult to collect information about loops than about functions, because loops, unlike functions, can have multiple entry points and multiple exit points.
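Function level profiling of this kind can be sketched with a wrapper that records a timestamp just before the function is entered and another just after it exits. This is a minimal illustration, not the prior art profiler itself; the names profile_data, profiled, and sum_to are hypothetical.

```python
import time
from functools import wraps

profile_data = {}  # function name -> [call count, total seconds]


def profiled(func):
    """Wrap `func` with an entry slot and an exit slot."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()                # entry slot
        try:
            return func(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - start  # exit slot
            record = profile_data.setdefault(func.__name__, [0, 0.0])
            record[0] += 1
            record[1] += elapsed
    return wrapper


@profiled
def sum_to(n):
    total = 0
    for i in range(n):
        total += i
    return total


result = sum_to(1000)
calls, seconds = profile_data["sum_to"]
print(result, calls)  # prints "499500 1"
```

The wrapper works because control can only enter the function at one place. A loop that can be entered at several places offers no single point at which an equivalent entry slot could be placed.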
Thus, the prior art profiler and profiling technique skew the collected information by changing the behavior and characteristics of the loops. Moreover, the prior art profiler and profiling technique cannot profile many types of loops, particularly loops that have multiple entry points or nested loops with common exits.