1. Field of the Invention
The present invention generally relates to data processing, and more specifically, the invention relates to counting instructions executed by programs running on data processing systems.
2. Background Art
In analyzing and enhancing performance of a data processing system and the applications executing within the data processing system, it is helpful to know which software modules within a data processing system are using system resources. Effective management and enhancement of data processing systems requires knowing how and when various system resources are being used. Performance tools are used to monitor and examine a data processing system to determine resource consumption as various software applications are executing within the data processing system. For example, a performance tool may identify the most frequently executed modules and instructions in a data processing system, or may identify those modules which allocate the largest amount of memory or perform the most I/O requests. Hardware performance tools may be built into the system or added at a later point in time.
Currently, processors have minimal support for counting carious instruction types executed by a program. Typically, only a single group of instructions may be counted by a processor by using the internal hardware of the processor. This is not adequate for some applications, where users want to count many different instruction types simultaneously. In addition, there are certain metrics that are used to determine application performance (counting floating point instructions for example), that are not easily measured with current hardware. Using the floating point example, a user may need to count a variety of instructions, each having a different weight, to determine the number of floating point operations performed by the program A scalar floating point multiply would count as one FLOP, whereas a floating point multiply-add instruction would count as 2 FLOPS. Similarly, a quad-vector floating point add would count as 4 FLOPS, while a quad-vector floating point multiply-add would count as 8 FLOPS.