To improve system performance, it is helpful to know which modules within a system are executed most frequently. These modules are referred to as "hot" modules. Within a hot module, it is also useful to know which lines of code are executed most frequently. These frequently executed code segments are known as "hot spots."
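As a rough illustration of the idea, hot modules and hot spots can be found by counting how often each module and line appears in an execution trace. The sketch below assumes a hypothetical trace made of (module, line) pairs; the module names, line numbers, and counts are invented for the example and do not come from any particular profiling tool.

```python
from collections import Counter

# Hypothetical execution trace: one (module, line) pair per executed line.
trace = (
    [("parser", 10)] * 600
    + [("parser", 11)] * 400
    + [("parser", 42)] * 20
    + [("logger", 7)] * 30
    + [("startup", 1)] * 1
)

def hot_modules(trace, top=1):
    """Return the `top` most frequently executed modules with their counts."""
    return Counter(module for module, _ in trace).most_common(top)

def hot_spots(trace, module, top=2):
    """Return the `top` most frequently executed lines within one module."""
    return Counter(line for m, line in trace if m == module).most_common(top)

print(hot_modules(trace))          # [('parser', 1020)]
print(hot_spots(trace, "parser"))  # [(10, 600), (11, 400)]
```

Here "parser" is the hot module, and lines 10 and 11 are its hot spots.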
A programmer hoping to improve system performance should focus on the hot modules and on the hot spots within those modules. Improving the most frequently executed modules and code segments will have the greatest effect on overall system performance. It does not make sense to spend much time improving modules or code segments which are rarely executed, as this will have little, if any, effect on overall system performance.
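The reasoning above can be made concrete with Amdahl's law, a standard formula that the text does not name but that captures the same point. The sketch assumes that execution frequency is a reasonable proxy for the share of run time a module consumes; the 80% and 1% shares and the 2x local speedup are illustrative values only.

```python
def overall_speedup(fraction, local_speedup):
    """Amdahl's law: speedup of the whole system when a part that takes
    `fraction` of the run time is made `local_speedup` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)

# Assumed shares of run time; both modules are made twice as fast.
hot = overall_speedup(0.80, 2.0)   # hot module: 80% of run time
cold = overall_speedup(0.01, 2.0)  # rarely executed module: 1% of run time

print(f"{hot:.3f} {cold:.3f}")  # 1.667 1.005
```

Doubling the speed of the hot module makes the whole system about 1.67 times faster, while the same effort spent on the rarely executed module yields roughly a 0.5% gain.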
Many modern processors contain hardware which allows performance data to be collected. For example, most modern processors have the capability to measure cycle time. Many also have the ability to count other events, such as cache misses, floating-point operations, bus utilization, and translation lookaside buffer (TLB) misses. To count cache misses, for example, a bit or a sequence of bits within a control register is set to a predetermined code. This bit sequence tells the processor to increment a counter every time there is a cache miss. When the bit sequence is reset, the processor stops counting cache misses, and the total number of cache misses can be read from another register or from a memory area.
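The control-register mechanism described above can be modeled in a few lines. This is a toy simulation, not the interface of any real processor: the class name, the 3-bit event-select field, and the code `0b101` are all hypothetical.

```python
class SimulatedProcessor:
    """Toy model of a processor with one event counter (all names hypothetical)."""

    EVENT_MASK = 0b111          # hypothetical event-select field in the control register
    COUNT_CACHE_MISSES = 0b101  # hypothetical "predetermined code" for cache misses

    def __init__(self):
        self.control_register = 0
        self.counter_register = 0

    def write_control(self, value):
        """Model of software writing the control register."""
        self.control_register = value

    def cache_miss(self):
        """Called by the model whenever a cache miss occurs."""
        if (self.control_register & self.EVENT_MASK) == self.COUNT_CACHE_MISSES:
            self.counter_register += 1


cpu = SimulatedProcessor()
cpu.cache_miss()                                        # counting is off: ignored
cpu.write_control(SimulatedProcessor.COUNT_CACHE_MISSES)
for _ in range(3):
    cpu.cache_miss()                                    # counted
cpu.write_control(0)                                    # reset the bit sequence
cpu.cache_miss()                                        # counting is off again: ignored
print(cpu.counter_register)  # 3
```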
Once a programmer identifies a code segment (i.e., a hot spot) that needs further performance analysis, the programmer then "instruments" the code to be tested. For example, suppose the programmer determines that a particular code segment, consisting of twenty instructions, is a hot spot that needs further performance analysis. The programmer will put a "hook" (i.e., an instruction or group of instructions) in front of the twenty instructions. The hook will typically be a jump instruction, causing execution to jump to an instrumentation routine. The instrumentation routine will start some type of performance analysis. For example, the instrumentation routine may set an appropriate bit or set of bits in a control register to turn on cache miss counting in the processor. The instrumentation routine then returns control to the instructions being tested. At the end of the code segment being tested, the programmer will insert another hook. This hook typically jumps to an instrumentation routine which turns off performance testing. In the example given, the instrumentation routine would set the appropriate bit or bits in the control register to stop cache miss counting, and then would store the cache miss count.
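The hook placement described above can be sketched as a pair of routines wrapped around the code under test. The sketch is a software model only: the `state` dictionary stands in for the processor's control and counter registers, and every function name is hypothetical. A real hook would be a jump instruction to a routine that writes the control register.

```python
# Hypothetical stand-ins for processor state and for the stored measurements.
state = {"counting": False, "misses": 0}
results = []

def record_cache_miss():
    """Model of the hardware: count a miss only while counting is enabled."""
    if state["counting"]:
        state["misses"] += 1

def start_hook():
    """Entry instrumentation routine: clear the count and turn on counting."""
    state["misses"] = 0
    state["counting"] = True

def stop_hook():
    """Exit instrumentation routine: turn off counting, then store the count."""
    state["counting"] = False
    results.append(state["misses"])

def code_segment_under_test():
    for _ in range(4):
        record_cache_miss()  # pretend each iteration misses the cache once

record_cache_miss()          # a miss before the entry hook is not counted
start_hook()                 # hook placed in front of the tested segment
code_segment_under_test()
stop_hook()                  # hook placed at the end of the tested segment
record_cache_miss()          # a miss after the exit hook is not counted
print(results)  # [4]
```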
One problem with this type of instrumentation is that the instrumentation routines may affect the performance results of the code being analyzed. For example, if any instruction in the instrumentation routines is in the same cache congruency class as an instruction in the code being tested, the instrumentation instruction could force the tested instruction out of the instruction cache. This would affect the cache hit/miss ratio and the cycles per instruction (CPI) measurement for the code being tested. Similar problems could occur with data cache measurements if any data access by the instrumentation routine forced data out of the data cache. Similar problems could also occur with other types of measurements, such as TLB measurements.
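The congruency-class conflict can be demonstrated with a small simulation. The sketch assumes a direct-mapped instruction cache with 64-byte lines and 128 congruency classes; these sizes, the addresses, and the function names are illustrative assumptions rather than properties of any specific processor.

```python
LINE_SIZE = 64   # bytes per cache line (assumed)
NUM_SETS = 128   # direct-mapped: one line per congruency class (assumed)

def congruency_class(address):
    """Congruency class (set index) that an address maps to."""
    return (address // LINE_SIZE) % NUM_SETS

def measured_misses(fetches):
    """Run (address, is_measured) fetches through the cache and count the
    misses that occur on fetches flagged as part of the measured code."""
    cache = {}   # congruency class -> line number currently resident
    misses = 0
    for address, is_measured in fetches:
        line = address // LINE_SIZE
        cls = line % NUM_SETS
        if cache.get(cls) != line:
            cache[cls] = line          # fill the line, evicting any previous one
            if is_measured:
                misses += 1
    return misses

TESTED = 0x1000                        # a tested instruction, already cached
HOOK = TESTED + LINE_SIZE * NUM_SETS   # same congruency class, different line
FAR_HOOK = HOOK + LINE_SIZE            # neighboring class, no conflict

warm = (TESTED, False)                 # earlier fetch that brings TESTED into the cache
print(measured_misses([warm, (TESTED, True)]))                     # 0
print(measured_misses([warm, (HOOK, False), (TESTED, True)]))      # 1
print(measured_misses([warm, (FAR_HOOK, False), (TESTED, True)]))  # 0
```

In the second case the instrumentation instruction evicts the tested instruction, so the measurement reports a miss that the tested code would not have incurred on its own.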
Consequently, it would be desirable to have a minimally intrusive system and method for measuring performance in an information handling system. It would also be desirable for the system and method to greatly reduce the chance of instrumentation code or data affecting the performance measurements of the tested code.