The present invention relates generally to software, and more specifically to the profiling of software.
When software is compiled, it is converted from a high level "human readable" set of statements to a set of low level "machine readable" instructions. The control flow of the machine readable instructions can be very much like that of the human readable statements, or can be very different. During compilation, software can be "optimized" to increase the speed with which the final machine readable code executes.
Programs, or portions of programs, in addition to being optimized when compiled (i.e. at "compile-time"), can also be optimized when the software is executed (i.e. at "run-time"). This "dynamic optimization" can benefit from profiling information that typically includes the frequency with which portions of the program execute. Programs can be profiled while operating on test data, or while operating on actual end-user data. By profiling software in the end-user environment, the resulting profiling information reflects actual usage patterns, and can aid in the dynamic optimization process.
Efficient profiling at run-time can be difficult. Typical algorithms for collecting profiling information at run-time call for inserting extra program instructions into each profiled block of the end-user program. These algorithms can incur overhead penalties in the range of 3% to 40%. Examples of these algorithms can be found in: Thomas Ball and James Larus, "Optimally profiling and tracing programs," ACM Transactions on Programming Languages and Systems, 16(3): 1319-1360, July 1994; Thomas Ball and James Larus, "Efficient Path Profiling," MICRO-29, December 1996; and Alexandre Eichenberger and Sheldon M. Lobo, "Efficient edge profiling for ILP-processors," Proceedings of PACT '98, 12-18, October 1998.
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for an alternate method and apparatus for profiling software.
In one embodiment, a computer-implemented method of measuring a frequency of execution of a software program block includes reading a branch instruction from the software program block and decoding the branch instruction. The method further includes generating at least one update instruction to increment a counter, wherein the counter includes a counter value that represents the frequency of execution of the software program block.
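The read, decode, and update steps of this embodiment can be illustrated with a minimal software sketch. It is only an illustration under stated assumptions: the `Branch` and `UpdateInstruction` records, the idea that a decoded branch carries the identifier of its enclosing block, and all function names are hypothetical and are not taken from the specification.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Branch(NamedTuple):
    """Hypothetical branch record; the fields are assumptions for illustration."""
    block_id: int  # identifier of the program block that ends with this branch
    target: int    # identifier of the block the branch transfers control to


class UpdateInstruction(NamedTuple):
    """Hypothetical update instruction: increment the counter at this index."""
    counter_index: int


def decode(branch: Branch) -> int:
    # Decoding is reduced here to recovering the identifier of the block
    # that the branch was read from.
    return branch.block_id


def generate_updates(branch: Branch) -> List[UpdateInstruction]:
    # At least one update instruction is generated for each decoded branch.
    return [UpdateInstruction(counter_index=decode(branch))]


def profile(executed_branches: Iterable[Branch]) -> Counter:
    # Each counter value represents how often its block was executed.
    counters: Counter = Counter()
    for branch in executed_branches:
        for update in generate_updates(branch):
            counters[update.counter_index] += 1
    return counters
```

Calling `profile` on a sequence of executed branches returns one counter per block identifier seen in that sequence.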
In another embodiment, a method of instrumenting software includes inserting a profiling instruction configured to load a base address register in each compiled element, and separating each compiled element into at least one single-entry region. The method further includes inserting a second profiling instruction configured to load an offset register in at least one of the at least one single-entry region, and modifying at least one instruction within at least one of the at least one single-entry region to facilitate profiling of the at least one single-entry region.
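A toy instrumentation pass may help make the first three steps of this embodiment concrete. The sketch below rests on several assumptions that are not part of the specification: instructions are plain strings, a string ending in `:` is a label, every label is treated as the single entry of a new region, and the mnemonics `load_base` and `load_offset` are hypothetical. The final step, modifying instructions within a region, is not modeled.

```python
from typing import List


def split_into_regions(element: List[str]) -> List[List[str]]:
    # Assumption: control can enter only at the first instruction of the
    # element or at a label, so cutting at every label yields regions that
    # each have exactly one entry point.
    regions: List[List[str]] = []
    for instruction in element:
        if not regions or instruction.endswith(":"):
            regions.append([])
        regions[-1].append(instruction)
    return regions


def instrument(element: List[str], base: int) -> List[str]:
    # One base-register load per compiled element (hypothetical mnemonic).
    out = [f"load_base {base:#x}"]
    for offset, region in enumerate(split_into_regions(element)):
        body = list(region)
        if body[0].endswith(":"):
            # Keep the label first so that branches to it still reach the
            # offset-register load that follows.
            out.append(body.pop(0))
        # One offset-register load per single-entry region.
        out.append(f"load_offset {offset}")
        out.extend(body)
    return out
```

Here the region index doubles as the offset value; a real instrumenter would be free to choose offsets differently.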
In another embodiment, a method of profiling the execution of a software region includes reading an instrumented profiling instruction from the software region, extracting an identification (ID) value from the instrumented profiling instruction, and incrementing a value at a counter location, the counter location being a function of the ID value.
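As a rough illustration of a counter location that is a function of the ID value, the sketch below assumes a hypothetical encoding in which the ID occupies the low 8 bits of an instruction word, and a counter table of 8-byte entries starting at a base address. The field width, the entry size, and the function names are all assumptions made for this example.

```python
from typing import Dict

ID_MASK = 0xFF     # assumption: the ID is held in the low 8 bits of the word
COUNTER_SIZE = 8   # assumption: each counter occupies 8 bytes


def extract_id(instruction_word: int) -> int:
    # Extract the identification (ID) value from the instrumented instruction.
    return instruction_word & ID_MASK


def counter_location(base_address: int, id_value: int) -> int:
    # One possible function of the ID: index into a table of counters.
    return base_address + id_value * COUNTER_SIZE


def profile_instruction(memory: Dict[int, int], base_address: int,
                        instruction_word: int) -> int:
    # Increment the value stored at the counter location and return that
    # location. Memory is modeled as a dict from address to value.
    location = counter_location(base_address, extract_id(instruction_word))
    memory[location] = memory.get(location, 0) + 1
    return location
```

A linear mapping is used only because it is the simplest choice; any other function of the ID value would fit the description equally well.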
In another embodiment, a processor includes an execution unit configured to produce profiling information when encountering an instrumented program instruction in a user program, and a buffer adapted to receive the profiling information from the execution unit as the execution unit executes the user program. The profiling information of this embodiment can include a plurality of profile counter update instructions, and the processor can further include profiling hardware for executing the plurality of update instructions.
In another embodiment, a processing system includes a memory device and a motherboard configured to receive the memory device, and a processor coupled to the memory device and to the motherboard. In this embodiment, the processor can include a buffer for holding update instructions to be executed during free slots of a user program, and an execution unit configured to load the buffer after reading an instrumented profiling instruction from the memory.
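The role of the buffer can be approximated in software as follows. This is a simplified simulation, not a description of the hardware: the per-cycle model, the fixed capacity, the policy of dropping updates when the buffer is full, and all names are assumptions introduced for illustration.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Optional, Tuple


class ProfilingBuffer:
    """Holds pending counter updates until free slots are available."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.pending: Deque[int] = deque()
        self.counters: Dict[int, int] = {}
        self.dropped = 0

    def load(self, counter_id: int) -> bool:
        # Called when an instrumented profiling instruction is read.
        if len(self.pending) >= self.capacity:
            self.dropped += 1  # assumption: updates are dropped when full
            return False
        self.pending.append(counter_id)
        return True

    def drain(self, free_slots: int) -> int:
        # Execute at most `free_slots` pending updates, oldest first.
        executed = 0
        while self.pending and executed < free_slots:
            counter_id = self.pending.popleft()
            self.counters[counter_id] = self.counters.get(counter_id, 0) + 1
            executed += 1
        return executed


def run(cycles: Iterable[Tuple[Optional[int], int]],
        capacity: int = 4) -> ProfilingBuffer:
    # Each cycle is (counter_id or None, number of free slots in that cycle).
    buffer = ProfilingBuffer(capacity)
    for counter_id, free_slots in cycles:
        if counter_id is not None:
            buffer.load(counter_id)
        buffer.drain(free_slots)
    return buffer
```

In this model the user program never waits for a counter update; updates are applied only when a cycle reports spare slots, which is the property the buffer is meant to provide.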