1. Field of the Present Invention
The present invention is related to the field of superscalar microprocessors, and more particularly to the sampling of microprocessor instructions for analyzing and optimizing the processor design.
2. History of Related Art
Advanced processors typically provide facilities that enable the processor to count occurrences of software-selectable events and to time the execution of processes within an associated data processing system. These facilities are typically referred to as performance monitors. Performance monitoring provides the ability to optimize software that is to be used by the system. A performance monitor may comprise any facility that is incorporated into the processor and is capable of monitoring selectable characteristics of the processor. A performance monitor may produce information relating to the utilization of a processor's instruction execution and storage control. The performance monitor can provide information, for example, regarding the amount of time that has passed between events in a processing system. A software engineer may use the timing data gathered with the performance monitor to optimize programs by relocating branch instructions and memory accesses (as just two examples). A performance monitor may also be used to gather data about the access times to the data processing system's L1 cache, L2 cache, and main memory. Using this data, system designers may identify performance bottlenecks specific to particular software or hardware environments. The information generated by performance monitors usually guides system designers toward ways of enhancing the performance of a given system or of developing improvements in the design of a new system.
A performance monitor typically includes a register that is configured to count the occurrence of one or more specified events. Typically, a programmable control register permits a user to select the events within the system to be monitored and specifies the conditions under which the counters are enabled. Typically, it is considered unnecessary and highly impractical to monitor every instruction that is executed by a microprocessor due to the extremely large number of instructions that are executed in a short period of time. Instead, performance monitoring is typically enabled for only a sample of instructions. Detailed information about the sample instructions is collected as the instructions execute. Typically, instruction sampling is based upon a deterministic variable such as the instruction's location within an internal queue of the processor. When the sample instructions are based upon such criteria, the instructions that are sampled for monitoring may not accurately represent the mix of instructions that are being executed by the hardware. Therefore, it would be desirable to implement a method of randomly sampling or selecting instructions for performance monitoring.
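The sampling bias described above can be illustrated with a small simulation. The following Python sketch is not drawn from any actual processor: the contents of the instruction groups, the group width of four, and the function names `sample_fixed_slot` and `true_mix` are assumptions chosen purely for illustration. It samples the instruction at one fixed queue position in every group and compares the result with the true instruction mix.

```python
from collections import Counter

# Hypothetical dispatch groups: each tuple is one set of instructions,
# listed by queue position. The repeating pattern (load first, branch
# last) is an assumption made only to illustrate the bias.
GROUPS = [("load", "add", "store", "branch")] * 100


def sample_fixed_slot(groups, slot):
    """Sample the instruction at the same queue position in every group."""
    return Counter(group[slot] for group in groups)


def true_mix(groups):
    """Count every instruction in every group."""
    return Counter(instr for group in groups for instr in group)


if __name__ == "__main__":
    print("true mix:    ", dict(true_mix(GROUPS)))
    print("fixed slot 0:", dict(sample_fixed_slot(GROUPS, 0)))
```

Under these assumed inputs, sampling from slot 0 observes only one instruction type even though four types execute equally often, which is the kind of misrepresentation a deterministic selection criterion may produce.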
The problem identified above is addressed in large part by a microprocessor as disclosed herein. The microprocessor includes a dispatch unit configured to receive a set of instructions from an instruction cache and to forward the set of instructions to an execution unit when the instructions are ready for execution. The dispatch unit may include sampling logic that is configured to select one of the instructions for performance monitoring from the set of instructions. The microprocessor further includes a performance monitor unit enabled to monitor performance characteristics of the selected instruction as it executes. The sampling logic may identify the instruction selected for monitoring as the instruction occupying an eligible position within the set of instructions. The eligible position from which the monitored instruction is selected may vary with each subsequent set of instructions. The sampling logic may include a selection mask that contains an asserted bit that identifies the position within the set of instructions from which the selected instruction is chosen. The selection mask may include a single bit for each position in the set of instructions and may be implemented as a shift register that periodically rotates the eligible position. The rotation of the eligible bit position may occur every clock cycle, every dispatch cycle, or at another suitable synchronous or asynchronous interval. The selection mask may contain multiple asserted bits, and the sampling logic may include a filter circuit that generates a selection vector based on the selection mask, where the selection vector includes only a single asserted bit.
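The rotating selection mask and the filter circuit summarized above can be modeled in software. The Python sketch below is a behavioral illustration only, not the disclosed hardware: the mask width of four, the left-rotation direction, the lowest-bit priority rule used by the filter, and the names `rotate_mask`, `filter_to_one_hot`, and `select_instruction` are all assumptions introduced for this example.

```python
WIDTH = 4  # assumed number of positions in a set of instructions


def rotate_mask(mask, width=WIDTH):
    """Rotate a width-bit mask left by one position, wrapping around.

    Models a shift register that moves the eligible position each interval.
    """
    full = (1 << width) - 1
    return ((mask << 1) | (mask >> (width - 1))) & full


def filter_to_one_hot(mask):
    """Reduce a mask to a selection vector with at most one asserted bit.

    Assumed priority rule: keep the lowest-numbered asserted position.
    Returns 0 when no bit in the mask is asserted.
    """
    return mask & -mask


def select_instruction(group, vector):
    """Return the instruction at the asserted position, or None."""
    if vector == 0:
        return None
    return group[vector.bit_length() - 1]


if __name__ == "__main__":
    group = ("load", "add", "store", "branch")  # hypothetical instructions
    mask = 0b0001
    for _ in range(WIDTH):
        vector = filter_to_one_hot(mask)
        print(f"{mask:04b} -> {select_instruction(group, vector)}")
        mask = rotate_mask(mask)
```

In this model the eligible position visits every slot once per `WIDTH` rotations, so each queue position contributes samples over time. When the rotation interval is not synchronized with dispatch, the slot that is eligible for any given set of instructions becomes harder to predict.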