1. Technical Field
The present invention relates in general to the field of computers, and, in particular, to computer processors. Still more particularly, the present invention relates to an improved method and system for evaluating processing steps that affect the average cycles per instruction (CPI) of a computer processor.
2. Description of the Related Art
A computer processor is capable of completing one or more instructions every clock cycle. Typically, instructions are completed in groups, which are processed concurrently by multiple processing units operating in parallel within the processor. Each processing unit is typically dedicated to a specific type of operation, such as performing an arithmetic function on a floating point number, performing an arithmetic function on a fixed point number, loading and storing data, setting processor condition registers, or calculating branch addresses. These multiple processing units typically permit pipelining of instructions, allowing a very high instruction throughput.
Performance analysis of processors includes calculating the average number of cycles per instruction (CPI). Although each instruction requires multiple steps and thus multiple clock cycles to complete, modern processors are able to process multiple instructions concurrently using multiple processing units as described above, thus reducing the average CPI.
As the term implies, CPI describes the average number of clock cycles required to complete an instruction. For example, if a processor takes an average of one clock cycle to complete each instruction, then the CPI is 1. If an average of two clock cycles is required to complete each instruction, then the CPI is 2. Conversely, if an average of only one clock cycle is required to complete two instructions, then the CPI is 0.5 (½).
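The arithmetic in the preceding examples can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `cycles_per_instruction` and the cycle and instruction counts are assumptions chosen to reproduce the three cases above, not values taken from any particular processor.

```python
from fractions import Fraction


def cycles_per_instruction(total_cycles: int, instructions_completed: int) -> Fraction:
    """Return the average CPI as an exact ratio of cycles to completed instructions.

    Hypothetical helper for illustration; the name and signature are assumptions.
    """
    if instructions_completed <= 0:
        raise ValueError("at least one instruction must have completed")
    return Fraction(total_cycles, instructions_completed)


# One cycle per instruction on average.
print(float(cycles_per_instruction(100, 100)))  # 1.0
# Two cycles per instruction on average.
print(float(cycles_per_instruction(200, 100)))  # 2.0
# One cycle per two instructions on average.
print(float(cycles_per_instruction(100, 200)))  # 0.5
```

A CPI below 1 is only possible when more than one instruction completes per cycle, which is what the multiple processing units described above allow.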
The processor's CPI performance depends on multiple factors. One factor is the number of cycles needed to actually process a group of instructions in the processing units located in the processor, including delays caused by data cache misses, data dependencies, and execution time within a processing unit. In addition, CPI performance is affected by flushes of a completion table corresponding to a group of instructions to be or being processed. A completion table flush may have any of several causes, including a global flush of all completion tables and pipeline stacks in the processor, an instruction branch misprediction, or an instruction cache miss. CPI performance is affected not only by the time required to re-fill the completion table, but also by the time during which the table is empty.
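One way to picture how these factors add up is to split the total cycle count into cycles during which the completion table holds work and cycles during which it is empty, with the empty cycles attributed to the cause of the flush. The following Python sketch does this under stated assumptions: the function `cpi_breakdown`, the key names, and all cycle counts are hypothetical and exist only to illustrate the accounting; the three flush causes are the ones named above.

```python
def cpi_breakdown(instructions_completed, busy_cycles, empty_cycles_by_reason):
    """Split average CPI into per-cause contributions.

    busy_cycles: cycles during which the completion table was not empty
        (an assumed, simplified bucket that lumps together execution time,
        data cache misses, and data dependencies).
    empty_cycles_by_reason: mapping from flush cause to the cycles the
        completion table spent empty or being re-filled for that cause.
    """
    if instructions_completed <= 0:
        raise ValueError("at least one instruction must have completed")
    breakdown = {"busy": busy_cycles / instructions_completed}
    for reason, cycles in empty_cycles_by_reason.items():
        breakdown["empty:" + reason] = cycles / instructions_completed
    total_cycles = busy_cycles + sum(empty_cycles_by_reason.values())
    breakdown["total"] = total_cycles / instructions_completed
    return breakdown


# Made-up counts for illustration only.
result = cpi_breakdown(
    instructions_completed=1000,
    busy_cycles=800,
    empty_cycles_by_reason={
        "global_flush": 20,
        "branch_misprediction": 120,
        "instruction_cache_miss": 60,
    },
)
for component, cpi in result.items():
    print(f"{component}: {cpi:.2f}")
```

With these assumed counts, the completion-table-empty cycles account for 200 of the 1000 total cycles, so a fifth of the overall CPI is attributable to flushes rather than to work in the processing units.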
In order to evaluate the reasons for CPI delay, there is a need for a method and system that monitor the average time wasted while a completion table is empty and then re-filled. Preferably, the method and system monitor and quantify the reasons for a completion table flush.