1. Technical Field
The present invention relates to an improved data processing system and, in particular, to a system and method for randomly or pseudo-randomly selecting instructions for performance analysis in a microprocessor without systematic bias.
2. Description of Related Art
A typical data processing system utilizes processors to execute a set of instructions in order to perform a certain task, such as reading a specific character from the main memory. However, as the number of tasks required to be executed by the processor increases, the efficiency of the processor's access patterns to memory and the characteristics of such access become important factors for engineers who want to optimize the system. In addition, with multiple execution engines and/or multiple threads (e.g., in a superscalar SMT processor), analysis of the threads' utilization of the processor's resources is also critical. To analyze the performance of a microprocessor, it is useful to know how frequently certain events associated with selected instructions occur.
Currently, the prior art contains mechanisms that can count occurrences of software-selectable events, such as cache misses, instructions executed, I/O data transfer requests, and the time a given process may take to execute within a data processing system. One such mechanism is a performance monitor. Microprocessors may contain performance monitoring logic that counts the frequency of these selectable events. A performance monitor performs monitoring on selected characteristics to assist analysis of a system by determining a machine's state at a particular time. This analysis provides information on how the processor is used when instructions are executed and how it interacts with the main memory when data are stored. In addition, the performance monitor may provide the amount of time that has passed between events in a processing system. The performance monitor provides counts of events that may be used by engineers to analyze system performance. This analysis may lead to application code changes, such as relocation of branch instructions and memory accesses, to further optimize the performance of a system. Moreover, data may be gathered by the performance monitor on how the processor accesses the data processing system's level 1 and level 2 caches and main memory in order to identify performance bottlenecks that are specific to a hardware or software environment.
Events within the data processing system are counted by one or more counters within the performance monitor. The operation of such counters is managed by control registers, each of which comprises a plurality of bit fields. In general, both the control registers and the counters are readable and writable by software. Thus, by writing values to a control register, a user may select the events within the data processing system to be monitored and specify the conditions under which the counters are enabled.
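The use of bit fields in a control register can be illustrated with a minimal sketch. The field layout, the event codes, and the function names below are assumptions made purely for illustration and do not describe the register map of any actual processor.

```python
# Hypothetical control-register layout, for illustration only:
#   bit 0      : counter enable
#   bits 1..8  : event-select field
ENABLE_BIT = 1 << 0
EVENT_SHIFT = 1
EVENT_MASK = 0xFF << EVENT_SHIFT

# Assumed event codes; real hardware defines its own encodings.
EVENT_CACHE_MISS = 0x01
EVENT_INSTR_EXECUTED = 0x02


def select_event(control: int, event_code: int) -> int:
    """Return a control value whose event-select field holds event_code."""
    if not 0 <= event_code <= 0xFF:
        raise ValueError("event code does not fit in the 8-bit field")
    # Clear the old field, then write the new code into it.
    return (control & ~EVENT_MASK) | (event_code << EVENT_SHIFT)


def enable_counter(control: int) -> int:
    """Return a control value with the enable bit set."""
    return control | ENABLE_BIT


def decode(control: int) -> tuple:
    """Read back (enabled, event_code) from a control value."""
    enabled = bool(control & ENABLE_BIT)
    event_code = (control & EVENT_MASK) >> EVENT_SHIFT
    return enabled, event_code
```

In this sketch, software would select a cache-miss event and enable the counter with `enable_counter(select_event(0, EVENT_CACHE_MISS))`, and could later read the same value back with `decode`.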
One method to generate detailed performance monitoring and tracking is through instruction marking. Instruction marking is used to identify the frequency of selectable events as the instructions are being executed. With instruction marking, an instruction is selected for monitoring and any events associated with that instruction are reported as “marked” events. Instruction marking is especially useful in superscalar processors, since multiple instructions may be processed simultaneously as a group and any of the concurrent instructions may cause events which are monitored by the performance monitoring hardware. Without marking, it is therefore difficult to identify which instruction caused a particular event.
Existing systems employ a two-stage approach to mark instructions. The first stage of marking instructions is used to select the instructions that are eligible to be matched against operational code (opcode) mask values. In this stage, the instructions are selected to be marked in the instruction fetch unit by the Instruction Match CAM (IMC). The IMC first selects the instructions that are eligible to be matched and then compares the selected instructions to opcode/extended opcode mask values in each of the IMC array rows. If an instruction matches one or more IMC array masks, a mark is associated with the instruction and stored with the instruction in the instruction cache. When this instruction is fetched from the instruction cache, the instruction and mark are sent to and stored in the instruction buffer.
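The first-stage comparison can be sketched in software as a masked equality test against each row. This is only an illustrative model: the 32-bit instruction width, the placement of the opcode in the upper six bits, and the names `MatchRow` and `first_stage_mark` are assumptions, and an actual IMC performs the comparison in hardware.

```python
from typing import List, NamedTuple


class MatchRow(NamedTuple):
    """One hypothetical IMC row: bits selected by mask must equal value."""
    mask: int
    value: int


def first_stage_mark(instruction: int, rows: List[MatchRow]) -> bool:
    """Return True if the instruction matches at least one row."""
    return any((instruction & row.mask) == row.value for row in rows)


# Assumed encoding: the primary opcode occupies the top 6 bits of a
# 32-bit instruction word, so the mask 0xFC000000 isolates it.
rows = [MatchRow(mask=0xFC000000, value=0x80000000)]

matching = 0x80010000      # opcode bits equal the row value
non_matching = 0x7C000000  # different opcode bits
```

Here `first_stage_mark(matching, rows)` evaluates to true and `first_stage_mark(non_matching, rows)` to false, which corresponds to a mark being stored, or not stored, alongside the instruction.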
In the second stage, existing systems use a fixed queue position or pseudo-randomly pick an instruction within a wide dispatch group to mark. If this selected instruction has a mark set in the instruction buffer, a mark is also sent with the instruction when it is dispatched to an execution unit.
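A minimal sketch of this second stage follows, assuming a dispatch group modeled as a list of first-stage mark bits, one per slot. The function names and the use of a software random number generator are illustrative assumptions, since existing hardware would use its own selection logic.

```python
import random
from typing import List, Optional


def choose_slot(group_size: int, rng: random.Random,
                fixed_slot: Optional[int] = None) -> int:
    """Pick the dispatch-group slot to consider for marking.

    If fixed_slot is given, it models the fixed queue position;
    otherwise a slot is drawn pseudo-randomly.
    """
    if fixed_slot is not None:
        return fixed_slot
    return rng.randrange(group_size)


def mark_on_dispatch(buffer_marks: List[bool], slot: int) -> bool:
    """A mark accompanies the dispatched instruction only if the chosen
    slot already carries a first-stage mark in the instruction buffer."""
    return buffer_marks[slot]


# Example group of four slots in which only slot 1 holds a marked instruction.
group_marks = [False, True, False, False]
```

With `group_marks` as above, a mark is dispatched only when the chosen slot is 1; any other choice dispatches the group with no marked instruction.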
A user may select events within the system to be monitored and specify the conditions under which the counters are enabled. Since it is considered unnecessary and highly impractical to monitor every instruction that is executed by a microprocessor due to the extremely large number of instructions that are executed in a short period of time, performance monitoring is typically enabled for only a sample of instructions, or marked instructions. Using instruction sampling, one or more instructions are selected, i.e. sampled, and detailed information about the sampled instruction is collected as the instructions execute.
However, although pseudo-random sampling may be used to mark instructions, it tends to introduce a bias towards certain instruction streams. In other words, certain instructions are sampled and marked at a rate disproportionate to what is desired for a random sampling of executed instructions. This bias can occur due to the interdependencies between instructions in the instruction dispatch groups. For example, during group formation, instructions are not evenly distributed to all available slots in the dispatch group. Bias may also be introduced by reliance on available but limited execution resources in the processor itself (e.g., not enough Fixed Point Units to service all slots at once). In addition, certain dependencies or “hazards” exist when trying to execute multiple instructions within a code stream simultaneously. For example, as a result of the above biasing, the first slot of the instruction group may be four times more likely than the last slot of the group to be occupied by an instruction at any given time. Also, some instruction slots may be reserved for certain instruction opcodes or types, or that limitation may depend on the contents of the other slots in that group or in other groups being processed. As a result, some instruction opcodes may be inadequately represented, while other instructions have a much higher likelihood of being marked. Bias can also be introduced by instruction branch loops whose instruction count equals, or is a multiple of, the size of the fixed queue used to mark the instructions. Consequently, biased marking provides an inaccurate view of the performance of a machine.
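The loop-related bias can be illustrated with a simplified model. The sketch below assumes that dispatch groups are formed by packing consecutive instructions with no breaks and that the instruction in one fixed slot of every group is marked. Actual group formation is more complex, and the function name and parameters are hypothetical.

```python
from collections import Counter


def marks_per_instruction(loop_length: int, group_size: int,
                          fixed_slot: int, iterations: int) -> Counter:
    """Count how often each instruction of a loop body is marked.

    Simplifying assumption: the dynamic instruction stream is cut into
    groups of group_size consecutive instructions, and the instruction
    in fixed_slot of every group is the one that is marked.
    """
    counts = Counter()
    for position in range(loop_length * iterations):
        if position % group_size == fixed_slot:
            # Index of this instruction within the loop body.
            counts[position % loop_length] += 1
    return counts


# Loop length is a multiple of the group size: only two instructions
# of the eight-instruction loop body are ever marked.
aligned = marks_per_instruction(loop_length=8, group_size=4,
                                fixed_slot=0, iterations=100)

# Loop length shares no common factor with the group size: every
# instruction of the loop body is marked equally often.
unaligned = marks_per_instruction(loop_length=7, group_size=4,
                                  fixed_slot=0, iterations=100)
```

Under these assumptions, `aligned` contains counts only for loop instructions 0 and 4, while `unaligned` contains an equal count for all seven instructions, which shows how a fixed marking position can leave most of a loop body unsampled.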
Therefore, it would be advantageous to have an improved method and system for marking instructions to be able to identify the frequency of certain events associated with selectable instructions without introducing any bias in marking a random distribution of instructions.