Several current microprocessor designs incorporate superscalar, pipelined, out-of-order execution architectures. Such architectures simultaneously handle multiple instruction groups, belonging to one or more programs, that are distributed across multiple pipeline processing stages of the processor. Such architectures are also able to distribute instructions to the various processing stages in an order other than that specified by the program, subject to instruction dependencies.
Processing instrumentation is incorporated into microprocessors to support analysis of executing programs by, for example, facilitating identification of processing performance bottlenecks for the computer program being analyzed. Common instrumentation techniques include collecting instrumentation data at a period related to the processing cycle time of the microprocessor. For example, instrumentation data for the instruction currently executing in the pipeline of the microprocessor is collected upon an occurrence of a sample pulse that is provided every one million processing cycles of the microprocessor. The collected instrumentation data can include, for example, the instruction's opcode and what is causing that instruction to stall. Reasons for an instruction's stalling include, for example, waiting for a data dependency or a cache miss.
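The periodic sampling described above can be illustrated with a minimal software sketch. It is not a description of any actual hardware: the `probe` callback, the `Sample` record, and the `toy_probe` behavior are hypothetical names introduced only for illustration, and the one-million-cycle period is taken from the example above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

SAMPLE_PERIOD = 1_000_000  # cycles between sample pulses, as in the example above


@dataclass
class Sample:
    cycle: int                   # cycle at which the sample pulse occurred
    opcode: str                  # opcode of the instruction that was sampled
    stall_reason: Optional[str]  # None means the instruction was not stalled


def collect_samples(
    probe: Callable[[int], Tuple[str, Optional[str]]],
    total_cycles: int,
    period: int = SAMPLE_PERIOD,
) -> List[Sample]:
    """Record one sample at every `period`-th cycle.

    `probe(cycle)` is a hypothetical stand-in for the instrumentation
    logic; it returns the opcode and stall reason seen at that cycle.
    """
    samples = []
    for cycle in range(period, total_cycles + 1, period):
        opcode, stall_reason = probe(cycle)
        samples.append(Sample(cycle, opcode, stall_reason))
    return samples


def toy_probe(cycle: int) -> Tuple[str, Optional[str]]:
    # Hypothetical behavior: odd-numbered sample pulses see a load stalled
    # on a cache miss, even-numbered pulses see an unstalled add.
    if (cycle // SAMPLE_PERIOD) % 2 == 1:
        return ("load", "cache_miss")
    return ("add", None)


samples = collect_samples(toy_probe, total_cycles=3_000_000)
```

With three million simulated cycles, the sketch yields three samples, one per sample pulse, each pairing an opcode with the reason (if any) that the sampled instruction was stalled.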
Typically, many instructions are executing at a given time in a superscalar, pipelined, out-of-order execution microprocessor. In assessing processing bottlenecks, the best indication of which stalls are actually delaying the processor, as opposed to stalls that may be hidden by the execution of other instructions, is obtained by examining the Next-To-Complete (NTC) instruction or group of instructions. Because instrumentation data samples are taken at random times so as not to skew the observed results, it is difficult to collect information about the NTC group of instructions without collecting information on all instructions active in the pipeline. Many stages of the processing pipeline must also be monitored for instruction stall conditions. Staging the stall conditions through the pipeline often adds complexity as the size of the pipeline and the number of simultaneously active instruction groups increase. For example, in a pipeline that is around twenty processing cycles long, with a stall occurring for an instruction at cycle six of the pipeline, the condition for that stall must be staged for fourteen cycles, until the completion cycle of that instruction, to properly indicate why the group of instructions including that stalled instruction may not be completing. Such staging for all required processing pipeline stages and for all active instructions requires a large number of latches to implement. This complexity increases further for out-of-order execution processing designs.
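The staging arithmetic in the example above can be made concrete with a rough cost model. This is only a sketch under stated assumptions: it assumes one latch per condition bit per staged cycle per instruction group, which is a simplification rather than a description of any particular design, and the function names and the figure of eight active groups are hypothetical.

```python
from typing import Iterable


def staging_cycles(pipeline_depth: int, stall_stage: int) -> int:
    """Cycles a stall condition must be carried from the stage where it
    arises to the completion stage at the end of the pipeline."""
    if not 1 <= stall_stage <= pipeline_depth:
        raise ValueError("stall_stage must lie within the pipeline")
    return pipeline_depth - stall_stage


def staging_latches(
    pipeline_depth: int,
    monitored_stages: Iterable[int],
    active_groups: int,
    bits_per_condition: int = 1,
) -> int:
    """Rough latch count, ASSUMING one latch per condition bit per staged
    cycle per active instruction group (a simplifying assumption)."""
    per_group = sum(
        staging_cycles(pipeline_depth, stage) for stage in monitored_stages
    )
    return per_group * bits_per_condition * active_groups


# The example from the text: a twenty-cycle pipeline, stall at cycle six.
example = staging_cycles(20, 6)  # 14

# Hypothetical scale-up: monitor every stage for eight active groups.
all_stages = staging_latches(20, range(1, 21), active_groups=8)  # 1520
```

Under these assumptions, a single monitored stage costs fourteen latches per group in the example, while monitoring all twenty stages for eight groups costs on the order of fifteen hundred, which illustrates how quickly the staging cost grows with pipeline depth and the number of active groups.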
Therefore, a more efficient processing instrumentation architecture for superscalar, pipelined, out-of-order execution microprocessors is needed to improve the processing performance monitoring of such processors.