Processors (e.g., microprocessors in computers) may include performance monitoring components (e.g., hardware, firmware, software) that are configurable to detect and count occurrences of various events. For example, performance monitoring components may count occurrences of instruction cache misses, data cache misses, events associated with instruction dependencies, events associated with resource conflicts, and so on. Upon detecting and counting an event, the performance monitoring components may capture and record information related to the event. For example, information such as the address of the instruction retiring at the time the counted event occurred may be acquired and written to a data store. This information may help explain how and why a processor performs as it does.
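The counting-and-recording behavior described above can be sketched as a small software model. This is a hypothetical illustration only, not any particular processor's interface: the `EventCounter` class, the event names, and the `sample_after` parameter (record a sample on every Nth occurrence) are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class EventCounter:
    """Hypothetical model of a single performance-monitoring counter."""
    event: str            # event type this counter is configured to detect (assumed name)
    sample_after: int     # assumed: capture a sample on every Nth occurrence
    count: int = 0
    samples: list = field(default_factory=list)  # stands in for the "data store"

    def observe(self, event: str, retiring_ip: int) -> None:
        # Ignore events this counter is not configured to count.
        if event != self.event:
            return
        self.count += 1
        # On every Nth occurrence, record the address of the retiring instruction.
        if self.count % self.sample_after == 0:
            self.samples.append(retiring_ip)


counter = EventCounter(event="dcache_miss", sample_after=2)
# Hypothetical trace of (event, address of instruction retiring at that moment).
trace = [("dcache_miss", 0x400100), ("icache_miss", 0x400104),
         ("dcache_miss", 0x400108), ("dcache_miss", 0x40010C),
         ("dcache_miss", 0x400110)]
for ev, ip in trace:
    counter.observe(ev, ip)
print(counter.count, [hex(a) for a in counter.samples])
```

Sampling every Nth occurrence, rather than every occurrence, is one common way to bound the volume of recorded data; the threshold here is purely illustrative.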
Processors may have a pipelined architecture. For example, early processors may have had a two-stage pipeline in which the next instruction was fetched while the current instruction was executed. Thus, an event to be counted may have been associated with either the fetching of an instruction or the execution of an instruction. Later processors may have had pipelines with more stages (e.g., fetch, decode, execute, retire). Events to be counted may have occurred as a result of an instruction located at any of these stages. For example, an instruction retiring may have been an event to be counted. When that event was counted, an instruction pointer and other information associated with the retiring instruction may have been saved.
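A minimal sketch can make the stage distinction concrete. Assuming an idealized four-stage, in-order pipeline with no stalls (an assumption; real pipelines stall, flush, and reorder), the hypothetical helper `pipeline_snapshot` below shows that, in any given cycle, the instruction being fetched is not the instruction retiring.

```python
STAGES = ("fetch", "decode", "execute", "retire")


def pipeline_snapshot(program, cycle):
    """Return which instruction address occupies each stage at a given cycle.

    Assumes an idealized in-order pipeline with no stalls: the instruction
    fetched at cycle i reaches stage s at cycle i + s.
    """
    snapshot = {}
    for s, name in enumerate(STAGES):
        idx = cycle - s
        # Stages not yet filled (or already drained) hold no instruction.
        snapshot[name] = program[idx] if 0 <= idx < len(program) else None
    return snapshot


program = [0x1000, 0x1004, 0x1008, 0x100C, 0x1010]  # hypothetical addresses
snap = pipeline_snapshot(program, 3)
# An instruction cache miss here would be caused by snap["fetch"],
# while a retirement-based sample would record snap["retire"].
print({k: hex(v) if v is not None else None for k, v in snap.items()})
```

Under these assumptions, an event raised in the fetch stage and the address captured at retirement refer to different instructions, which is why information recorded only at retirement may not identify the instruction that caused an earlier-stage event.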
Processors may also have a parallel architecture. For example, a processor may have two or more pipelines through which sets of instructions may progress. Thus, events to be counted may have occurred as a result of an instruction being processed in any of several pipelines. Pipelined architectures and/or parallel architectures may interact with multiple execution units. For example, a processor may have two integer processing units and two floating point processing units. Thus, instructions at different stages of different pipelines may be dispatched to different execution units. This complicates performance monitoring and can limit its effectiveness, because a counted event may be attributable to an instruction in any stage of any pipeline or in any execution unit, not only to the instruction that is retiring. Thus, while some useful information could conventionally be acquired from performance monitoring logic by capturing information associated with retiring instructions, other information may have been ignored or considered unobtainable due to complexities associated with multiple pipeline stages, parallel architectures, multiple execution units, and so on.