1. Field of the Invention
This invention relates generally to the field of computer processors. More particularly, the invention relates to an apparatus and method for flexible, accurate, and/or efficient code profiling.
2. Description of the Related Art
Program code “profiling” is a form of dynamic program analysis that gathers information as a program executes. Profiling may be used, for example, to determine the execution time of certain program functions as part of a debugging process. “Sampling” is a form of program code profiling in which a sampling profiler probes a target program's program counter at periodic intervals (e.g., using operating system interrupts). “Instrumentation” is another form of program code profiling in which additional instructions are added to existing program code to collect the necessary information.
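By way of illustration only, instrumentation can be sketched in software as follows. This is a minimal sketch, not a description of any particular profiler: the names `instrumented`, `call_counts`, `total_seconds`, and `work` are hypothetical, and the wrapper stands in for the "additional instructions" added to the existing program code.

```python
import time
from collections import defaultdict
from functools import wraps

# Hypothetical profile storage: per-function call count and accumulated time.
call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def instrumented(func):
    """Wrap func with extra instructions that record calls and elapsed time."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # These bookkeeping statements are the instrumentation overhead:
            # they run on every call and are not part of the original program.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@instrumented
def work(n):
    """Hypothetical target function being profiled."""
    return sum(range(n))
```

Calling `work(10)` returns 45 as it would without instrumentation, while the wrapper updates `call_counts["work"]` and `total_seconds["work"]` on each call. The bookkeeping executes inside the profiled program itself, so it adds to the program's running time.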
One problem is that current profiling techniques affect the operation of the underlying program code, typically reducing performance and yielding inaccurate results. For example, if additional profiling instructions are used, the extra overhead of those instructions means that either: (i) only simple profiling models are used, or (ii) profiling is performed only during a very small time window. Both approaches sacrifice profiling accuracy in order to reduce the cost of obtaining profile information. In addition, the extra instructions may have collateral effects on the events being profiled, yielding imprecise profile data. In fact, most systems today profile only the execution frequency of basic blocks and branch destinations. However, numerous other event types could potentially be profiled to enable sophisticated optimizations (e.g., L1 cache misses, branch mispredictions, translation lookaside buffer (TLB) misses, etc.). The problem with current hardware support for gathering this information is that it does not accurately associate the occurrences of such events, or the ratio of occurrences to non-occurrences, with individual instructions.
Several processors already include some sampling mechanisms for collecting profiling information. In these cases, the user can specify a software service routine to be invoked when certain execution characteristics are met. In a typical usage scenario, the user programs a routine to be invoked periodically, for example, every 100,000 retired instructions. The routine then accesses a hardware structure in which the addresses of the last N taken branches are recorded, reads them out, and accumulates them in memory. The value of ‘N’ is a hardware implementation parameter and is normally quite small (e.g., 4). Moreover, with this kind of profiling scheme it is not possible to obtain the ratio between the number of occurrences (taken) and non-occurrences (not taken) for a given instruction/event pair (e.g., a conditional branch retired), as the hardware only records the last N occurrences and does not record non-occurrences. If it did record them, the routine would need to be invoked very frequently (every N conditional branch instructions if possible), resulting in significant overhead. Furthermore, these schemes do not offer the option of specifying a filtering address range to identify certain portions of program code for profiling. Hence, the obtained profile information may belong to any instruction, including instructions which do not require optimization.
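The limitation described above can be illustrated with a small software model. This is a sketch under stated assumptions, not a model of any specific processor: the function name `simulate_last_n_sampling`, the trace format, and the choice to trigger the service routine every `period` retired branches (rather than every 100,000 retired instructions) are hypothetical simplifications, and the buffer is assumed not to be cleared when read.

```python
from collections import Counter, deque


def simulate_last_n_sampling(branch_trace, n=4, period=8):
    """Model a buffer holding the addresses of the last n taken branches.

    branch_trace: iterable of (address, taken) pairs, one per retired
    conditional branch. Returns (samples, true_taken, true_total).
    """
    last_taken = deque(maxlen=n)  # hypothetical hardware structure
    samples = Counter()           # what the service routine accumulates
    true_taken = Counter()        # ground truth, invisible to the routine
    true_total = Counter()
    for i, (addr, taken) in enumerate(branch_trace, start=1):
        true_total[addr] += 1
        if taken:
            true_taken[addr] += 1
            last_taken.append(addr)  # not-taken branches leave no record
        if i % period == 0:
            samples.update(last_taken)  # routine reads the buffer out
    return samples, true_taken, true_total


# Hypothetical trace: branch 0x10 is always taken; branch 0x20 is taken
# once in every four executions.
pattern = [(0x10, True), (0x20, False)] * 3 + [(0x10, True), (0x20, True)]
samples, true_taken, true_total = simulate_last_n_sampling(pattern * 6)
```

In this model the accumulated samples contain only addresses of taken branches. The routine sees branch 0x20 in the buffer but has no count of how often it executed without being taken, so its taken/not-taken ratio cannot be derived from the samples alone.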
Current processors also provide interfaces to monitor the behavior of an application. However, in these implementations, profiling information is obtained at a coarse granularity and merely identifies whether a small or large number of the desired events occurred. Once again, using these techniques, it is not possible to obtain the ratio between occurrences and non-occurrences of such events for individual instructions.
In summary, there is currently no simple, flexible, and inexpensive mechanism to obtain accurate profiling information.