The present invention relates generally to program profiling, and relates more particularly to a technique for hot spot detection and monitoring that may be implemented in hardware to support adaptive runtime optimization.
Traditionally, program runtime performance has been enhanced through optimization performed in connection with compilation. In some cases, compiler optimization is based upon profile information (such as statistics about frequency of execution of conditionally executed statements) gathered from execution of a program with a set of sample inputs. This is an inherently static technique, however, since the program performance is optimized only for the set of sample inputs, possibly at the expense of other inputs. In other words, if the inputs to the program differ from the inputs considered by a compiler during optimization, the optimized program may actually perform worse on the different inputs than an unoptimized program.
A number of different techniques have been used to gather profile information about programs. Traditionally, the program is augmented with additional instructions that incrementally collect a profile of the program as it executes. Because these additional instructions are executed frequently, the program being profiled is often slowed down significantly. More recently, low-overhead profiling techniques have been proposed in which the microprocessor contains extra hardware designed to collect profile information. The profile information may then be analyzed to determine whether the program's performance can be improved or optimized. The basic approach of this technique, however, requires gathering profile information over the program's entire execution, which is then averaged into a large database and fed back into a static compiler. The disadvantage of this technique is that it does not automatically isolate the data that is most useful for optimization; instead, another process must analyze the complete data set in search of optimization opportunities. Because the profile data for a program's entire execution is often quite large, storing the data may require more space than is available, and analyzing it may take an undesirably long time. Furthermore, because an entire profile of the program must be continuously maintained at runtime, the profile represents only average behavior over an extended period of time, which makes it difficult to detect variations in program behavior.
Although such static optimization techniques are useful to increase program performance in some respects, additional opportunities for optimization may be realized through dynamic or adaptive techniques capable of monitoring and reporting program performance based upon specific runtime information. In particular, certain runtime optimizers have been proposed that gather program performance information during execution and make it available to a dynamic optimizer, which may modify the program code while the program is running in an attempt to improve its execution.
U.S. Pat. No. 5,151,981 for Instruction Sampling Instrumentation discloses a hardware-implemented technique that allows for periodic sampling of instructions that are executing. According to the technique, each time a predetermined instruction is executed, information associated with the execution of the instruction is stored. Although this approach gathers information that may be useful for runtime optimization, it requires the processor to be interrupted to copy out the gathered data whenever the memory buffer is full, and again whenever the predetermined instruction to be monitored is changed. Nor does this technique provide any means for automatically detecting optimization opportunities among the data that are collected.
U.S. Pat. No. 5,452,457 for Program Construct and Methods/Systems for Optimizing Assembled Code for Execution discloses a software-implemented technique that includes gathering, by means of executing additional instructions, information that may be used to perform optimizations that are environment-specific. However, as a software-implemented technique, it incurs substantial overhead that detracts from its overall objective of improving program performance.
The foregoing limitations and disadvantages of the prior art are overcome by the present invention, which provides a technique that may be implemented in hardware for supporting runtime optimization. An important observation fundamental to the present invention is that when a collection of intensively executed program blocks also has a small static footprint, it represents a highly favorable opportunity for runtime optimization. Such a set of blocks and their corresponding periods of execution are referred to as “hot spots”, that is, sets of intensively executed instructions. The present invention involves dynamic runtime hot spot detection, as well as hot spot monitoring to determine whether program execution has strayed from a set of hot spots that have previously been detected. Thus, a hot spot detector may be selectively enabled or disabled, depending upon whether detected hot spot portions of a program are currently being executed. Once a hot spot is detected, information regarding the hot spot may be supplied to an operating system or other supervisory process for an opportunity to optimize the hot spot to enhance overall program performance.
The present invention offers several advantages over the techniques proposed in the prior art. First, the technique of the present invention rapidly detects hot spots by continuously monitoring program execution rather than relying upon statistical sampling as taught in the prior art. Thus, hot spot information is available for optimization almost immediately during execution. Second, hot spots are detected during timeslices of execution, rather than over the entire execution of a program, so that optimizations can adapt the program to changes in its behavior throughout execution. Third, a processor is interrupted, or an operating system or other supervisory process is notified, only when a hot spot has been detected, resulting in minimal interference with normal execution. Fourth, since the technique may be implemented in hardware independent of other hardware mechanisms in a data processing unit, it will not significantly degrade processor performance, complicate the rest of the processor design, or increase cycle time.
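To make the detect-then-monitor cycle described above more concrete, the following Python sketch models it in software. It is a hypothetical illustration only, not the hardware design of the invention. It assumes the program blocks executed during each timeslice are available as a list of block addresses. The names `find_hot_spot` and `HotSpotMonitor`, and the thresholds `MAX_FOOTPRINT` and `HOT_FRACTION`, are invented for this example; the description above does not specify any values. In this model, a timeslice contains a hot spot when a few distinct blocks (small static footprint) account for most executions (intensive execution). While a hot spot is being monitored, the detector stays disabled. It is re-enabled once execution strays outside that hot spot.

```python
from collections import Counter

# Hypothetical thresholds, chosen only for illustration.
MAX_FOOTPRINT = 8    # a hot spot may contain at most this many distinct blocks
HOT_FRACTION = 0.9   # ...and must cover at least this share of executions


def find_hot_spot(block_trace):
    """Return the smallest set of most-executed blocks covering HOT_FRACTION
    of this timeslice's executions, or None if no such small set exists."""
    total = len(block_trace)
    if total == 0:
        return None
    hot, covered = set(), 0
    # Add blocks in order of decreasing execution count, up to MAX_FOOTPRINT.
    for addr, count in Counter(block_trace).most_common(MAX_FOOTPRINT):
        hot.add(addr)
        covered += count
        if covered >= HOT_FRACTION * total:
            return frozenset(hot)
    return None


class HotSpotMonitor:
    """Alternates between detecting a hot spot and monitoring a known one."""

    def __init__(self, notify):
        self.notify = notify   # stand-in for notifying the OS/supervisory process
        self.current = None    # hot spot being monitored; None = detector enabled

    def end_of_timeslice(self, block_trace):
        if self.current is None:
            # Detector enabled: notify only when a hot spot is actually found.
            hot = find_hot_spot(block_trace)
            if hot is not None:
                self.current = hot
                self.notify(hot)
        else:
            # Detector disabled: check whether execution stayed in the hot spot.
            inside = sum(1 for addr in block_trace if addr in self.current)
            if inside < HOT_FRACTION * len(block_trace):
                self.current = None  # execution strayed; re-enable the detector
```

For example, a timeslice that repeatedly executes a three-block loop yields one notification. Further timeslices spent in the same loop yield none. The detector is re-armed only after a timeslice in which most executions fall outside those three blocks. Requiring both a small footprint and high coverage reflects the observation above that intensively executed code with a small static footprint is the most favorable target for runtime optimization.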