1. Field of the Invention
The invention relates to software performance profiling support in microprocessors, and more particularly to a microprocessor-based device incorporating an on-chip trace cache capable of capturing software performance profile data.
2. Description of the Related Art
Software performance profiling refers to examining the execution times, frequencies and calling patterns of different software procedures within a software program. Performance profiling can be a very useful tool to a software engineer attempting to optimize the execution times of software applications. Various techniques for performing software profiling are currently used, including many based on statistical analysis. When performing software profiling, execution times and subroutine call linkage are sometimes captured by external (off-chip) instrumentation that monitors the system buses of the computer system which is executing the software. Alternatively, software can be "instrumented" or modified to provide profiling information directly to the computer system on which the software is executed.
The growth in software complexity, coupled with increasing processor clock speeds, has placed new burdens on application software developers and complicated the task of performance profiling. The costs associated with developing, debugging and optimizing new software products are now a significant factor in processor selection. Processor features that adequately facilitate software debug, including performance profiling, result in shorter customer development times and increase the processor's attractiveness for use within industry. The need to provide software debug support is particularly acute within the embedded products industry, where specialized on-chip circuitry is often combined with a processor core.
Logic analyzers, read-only memory (ROM) emulators and in-circuit emulators (ICE) are frequently employed to capture software performance profiling data. In-circuit emulators provide certain advantages over other debug environments, offering complete control and visibility over memory and register contents, as well as overlay and trace memory in case system memory is insufficient. However, use of traditional in-circuit emulators, which involves interfacing a custom emulator back-end with a processor socket to allow communication between emulation equipment and the target system, is becoming increasingly difficult and expensive in today's age of exotic packages and shrinking product life cycles.
In another approach (the "Background Debug Mode" by Motorola, Inc.), limited on-chip debug circuitry is provided for basic run control. Through a dedicated serial link requiring additional pins, this approach allows a debugger/performance profiler to start and stop the target system and apply basic code breakpoints by inserting special instructions in system memory. Breakpoint registers are used to generate off-chip trigger pulses that function to start and stop timers. The serial link, however, does not provide on-chip software performance profiling capture capabilities; additional dedicated pins and external trace capture hardware are required to provide profile data.
As mentioned, software itself is sometimes instrumented so that it can be analyzed to collect performance profiling data. Instrumented code is often generated by a compiler configured to insert profiling information in order to analyze selected procedures. For example, on procedure call prologues and exit epilogues, the compiler may insert code used to activate counters that track execution times. When an instrumented procedure call is executed during a program run, a jump to an inserted routine is performed to mark a counter/timer. The execution time of a parent procedure that calls other, ancillary procedures can be determined by subtracting the execution time(s) of the ancillary procedures from the total execution time of the parent procedure. By analyzing all of the procedures of a module, the total execution time of the module can be calculated. Of course, the execution time of a given procedure may vary depending on the state of variables within the procedure, so statistical sampling is required.
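By way of illustration only, the prologue/epilogue instrumentation and the parent-minus-ancillary subtraction described above might be sketched as follows. The Profiler class, its enter and exit methods, and the tick-based clock are hypothetical names introduced for this sketch and do not correspond to any particular compiler or tool. A simulated tick counter stands in for a hardware counter/timer so that the arithmetic is easy to follow, and recursion is not handled.

```python
from collections import defaultdict


class Profiler:
    """Hypothetical helper called from inserted prologue/epilogue code."""

    def __init__(self, clock):
        self.clock = clock                  # callable returning the current count
        self.stack = []                     # one [name, start, callee_time] frame per active call
        self.total = defaultdict(int)       # total (inclusive) time per procedure
        self.self_time = defaultdict(int)   # time excluding ancillary procedures

    def enter(self, name):
        # Prologue hook: mark the counter on procedure entry.
        self.stack.append([name, self.clock(), 0])

    def exit(self):
        # Epilogue hook: mark the counter on procedure exit.
        name, start, callee_time = self.stack.pop()
        elapsed = self.clock() - start
        self.total[name] += elapsed
        # Parent's own time = total time minus time spent in its callees.
        self.self_time[name] += elapsed - callee_time
        if self.stack:
            # Charge this call's total time to the calling procedure's frame.
            self.stack[-1][2] += elapsed


# Simulated counter (an assumption standing in for a real timer).
ticks = [0]


def clock():
    return ticks[0]


def work(n):
    ticks[0] += n


prof = Profiler(clock)
prof.enter("parent")
work(5)                 # parent runs for 5 ticks
prof.enter("child")
work(3)                 # ancillary procedure runs for 3 ticks
prof.exit()
work(2)                 # parent runs for 2 more ticks
prof.exit()

print(prof.total["parent"], prof.self_time["parent"])  # prints "10 7"
```

In this sketch the parent procedure's total time is 10 ticks, of which 3 belong to the ancillary procedure, leaving 7 ticks attributable to the parent itself. Each hook call also consumes time on the target, which is one reason instrumented code can distort the measurements it collects.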
Thus, many current solutions for software performance profiling have a variety of hardware and software limitations, including: the need to instrument code, increased packaging and development costs, circuit complexity, and bandwidth matching difficulties. A low-cost procedure for capturing profile data would be greatly desirable, especially because the limitations of the existing solutions are likely to be exacerbated in the future as internal processor clock frequencies continue to increase.