The present invention relates generally to a dynamically-scheduled microprocessor, and particularly to a guard outcome predictor for a dynamically-scheduled microprocessor that executes predicated instructions.
A predicated instruction is a machine-level instruction whose operation is performed only if a specified condition is true. (For brevity, the term “instruction” will be used hereafter for “machine-level instruction”.) A conditional move, CMOV, is an example of a predicated instruction. A predicated instruction includes a guard and guarded instructions. The guard includes a guard operator and at least one guard source. The guard operator is applied to the guard sources to determine whether the specified condition is true. Possible guard operators include, for example: equal-to-zero, EQ; greater-than-zero, GT; and not-equal-to-zero, NEQ. A guard source may be a constant encoded within the predicated instruction or the contents of a register whose ID is encoded within the predicated instruction. Application of the guard operator to the guard sources yields a guard outcome. When the guard outcome is TRUE, the guarded instructions are executed. On the other hand, if the guard outcome is FALSE, then execution of the guarded instructions must not affect the architectural state of the microprocessor. In other words, if the guard outcome is FALSE, execution of the guarded instructions must not affect the state visible to an application program.
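The guard semantics described above can be illustrated with a minimal software sketch. The function names, the register dictionary, and the register IDs below are hypothetical and serve only to model the behavior; they are not part of any instruction set.

```python
# Hypothetical software model of guard evaluation; all names are illustrative.
GUARD_OPERATORS = {
    "EQ": lambda value: value == 0,   # equal-to-zero
    "GT": lambda value: value > 0,    # greater-than-zero
    "NEQ": lambda value: value != 0,  # not-equal-to-zero
}


def guard_outcome(guard_operator, guard_source_value):
    """Apply the guard operator to the guard source to get TRUE or FALSE."""
    return GUARD_OPERATORS[guard_operator](guard_source_value)


def cmov(regs, guard_operator, guard_reg, src_reg, dst_reg):
    """Conditional move: copy src to dst only when the guard outcome is TRUE."""
    if guard_outcome(guard_operator, regs[guard_reg]):
        regs[dst_reg] = regs[src_reg]
    # When the guard outcome is FALSE, the register state is left untouched.
    return regs


regs = {"r1": 0, "r2": 7, "r3": 99, "r4": 5}
cmov(regs, "EQ", "r1", "r2", "r3")  # r1 == 0, guard TRUE: r3 receives r2
cmov(regs, "GT", "r1", "r2", "r4")  # r1 > 0 is FALSE: r4 keeps its value
```

In this model, a FALSE guard outcome leaves the destination register holding whatever value it held before the predicated instruction.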
Execution of predicated instructions impacts the design and performance of dynamically-scheduled microprocessors, particularly those using register renaming. A brief description of dynamically-scheduled microprocessors and register renaming helps illustrate the difficulties presented by predicated instructions. A dynamically-scheduled microprocessor is one in which instructions may be issued to the functional units for execution in an order that is different from the order in which the instructions are fetched. To increase the number of instructions that may be issued in any given cycle, many dynamically-scheduled microprocessors use register renaming to eliminate write-after-write and write-after-read dependencies. Register renaming involves mapping the architectural registers named in the instructions to actual, physical, registers. Register renaming typically occurs after an instruction has been fetched.
FIG. 1 illustrates, in block diagram form, a prior dynamically-scheduled microprocessor. The microprocessor includes a data and instruction cache and a series of cascaded stages: a fetcher, mapper, dispatcher, execution pipes, and a retire unit. The fetcher fetches instructions from the memory hierarchy and decodes them to determine the operation of the instruction as well as all of the architectural registers and/or constants required for instruction execution. The order in which instructions are fetched is called the fetch order. For each fetched instruction, decoding yields zero or more source architectural registers and zero or more destination architectural registers. These two sets of architectural registers are then renamed to physical registers by the mapper. That is, for each instruction I in fetch order, the mapper maps I's source architectural registers to the physical registers that contain the corresponding latest values. Then, if instruction I names one or more destination registers, the mapper maps each of these registers to a unique and free physical register. For example, if instruction I names a single destination architectural register Ldx, the mapper will create a mapping between Ldx and some free physical register P1. This mapping will remain active until another instruction I2 that names Ldx as one of its destination registers subsequently enters the rename stage. When such an instruction occurs, the mapper creates a mapping between Ldx and a free physical register P2 and, in the process, unmaps P1 from Ldx. Because the mapping to P1 remains active until that point, for all instructions subsequent to I and up to and including I2 that name Ldx as a source register, the mapper will map this source register to P1. That is, P1 will contain the latest value of Ldx for these instructions. After register renaming, instructions are placed in the dispatch buffer of the dispatcher.
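The renaming behavior of the mapper may be easier to follow with a small software model. The following is a minimal sketch only; the class name, the list-based free list, and the integer physical register numbers are assumptions made for illustration and do not describe the circuitry of FIG. 1.

```python
class Mapper:
    """Hypothetical model of a rename stage; not an actual hardware design."""

    def __init__(self, architectural_regs, num_physical_regs):
        # Assumption: each architectural register starts mapped to its own
        # physical register, and the remaining physical registers are free.
        self.mapping = {reg: i for i, reg in enumerate(architectural_regs)}
        self.free_list = list(range(len(architectural_regs), num_physical_regs))

    def rename(self, sources, destinations):
        # Sources are renamed first so they see the latest existing mapping.
        physical_sources = [self.mapping[reg] for reg in sources]
        physical_destinations = []
        unmapped = []
        for reg in destinations:
            new_physical = self.free_list.pop(0)  # take a free physical register
            unmapped.append(self.mapping[reg])    # previous mapping is displaced
            self.mapping[reg] = new_physical
            physical_destinations.append(new_physical)
        return physical_sources, physical_destinations, unmapped


mapper = Mapper(["Ldx", "Lsrc"], num_physical_regs=6)
first = mapper.rename(sources=["Lsrc"], destinations=["Ldx"])   # instruction I
reader = mapper.rename(sources=["Ldx"], destinations=[])        # reads Ldx
second = mapper.rename(sources=["Ldx"], destinations=["Ldx"])   # instruction I2
```

Here physical register 2 plays the role of P1 and physical register 3 plays the role of P2: both the intervening reader and I2 itself read Ldx from register 2, and renaming I2 displaces register 2 in favor of register 3.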
The dispatcher dynamically selects from its buffer the next instruction to be issued to the execution pipes. The dispatcher issues instructions when their input dependencies have been resolved and when a suitable functional unit of the execution pipes is available. After the execution pipes have completed execution of an instruction, the result may be written back into a destination register, if one was allocated during register renaming. When all architectural constraints of an instruction have been satisfied, the retire unit retires, or commits, the instruction results to the architectural state of the microprocessor. If the retire unit commits results in program order, then committing an instruction that names one or more architectural destination registers frees one or more physical registers. That is, when the retire unit commits instruction I2 above, it will free physical register P1. Physical register P1 can be freed at this point because there are no longer any instructions in the system that require the value contained in P1. In general, physical registers can be freed only when freeing them will not prevent the processor from recovering and resuming execution after a mispredicted branch or a non-fatal exception. Recovery and execution resumption require, among other things, the ability to restore the register mapping and the list of free physical registers. A number of approaches to state recovery exist.
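The in-order freeing of physical registers at commit can be sketched as follows. This is a simplified model under stated assumptions: the class, its method names, and the string register names are hypothetical, and state recovery after a misprediction or exception is not modeled.

```python
from collections import deque


class RetireUnit:
    """Hypothetical in-order retire model; recovery is deliberately omitted."""

    def __init__(self):
        self.in_flight = deque()  # instructions in program order
        self.free_list = []       # physical registers released so far

    def track(self, name, unmapped_regs):
        # unmapped_regs: physical registers displaced when this instruction
        # was renamed (for example, P1 for instruction I2 in the text).
        self.in_flight.append(
            {"name": name, "unmapped": list(unmapped_regs), "done": False}
        )

    def mark_done(self, name):
        for entry in self.in_flight:
            if entry["name"] == name:
                entry["done"] = True

    def retire(self):
        # Commit strictly from the head of the queue, freeing the displaced
        # physical registers of each committed instruction.
        retired = []
        while self.in_flight and self.in_flight[0]["done"]:
            entry = self.in_flight.popleft()
            self.free_list.extend(entry["unmapped"])
            retired.append(entry["name"])
        return retired


unit = RetireUnit()
unit.track("I", ["P0"])
unit.track("I2", ["P1"])
unit.mark_done("I2")       # I2 finishes execution first
stalled = unit.retire()    # nothing commits: I is still outstanding
unit.mark_done("I")
retired = unit.retire()    # I commits, then I2, which frees P1
```

The sketch shows why P1 is safe to free only when I2 commits: every older instruction that could read P1 has already committed by then.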
The design and performance impact upon a dynamically-scheduled microprocessor of supporting predicated instructions arises from the additional data dependencies of predicated instructions. Predicated instructions include three sources of input data dependencies, as compared to the single source of a non-predicated instruction. The first input-dependency source, which is unique to predicated instructions, relates to the guard sources. To determine the guard outcome, a microprocessor must read the values of the guard sources and apply the guard operator to them. The second input-dependency source, which is not unique to predicated instructions, relates to the source registers named in the guarded instructions. If the guard outcome is TRUE, then the hardware performs the operations specified by the guarded instructions using the values of the sources for the guarded instructions. The reading of these sources induces an input dependency. The final input-dependency source relates to the destinations of the guarded instructions and is unique to predicated instructions. The general case is more easily explained using an example. If a guarded instruction I names a destination architectural register Ldst and if the guard outcome is FALSE, those instructions preceding the predicated instruction I in fetch order and those following it must all obtain the same value if they read from Ldst, assuming that no other intervening instructions write Ldst. But, because of the use of register renaming, instructions preceding I will read a physical register Pold while those following I will read another physical register Pnew, assuming that when the mapper renames the registers for I, it unmaps Pold when mapping Pnew to Ldst. To ensure that instructions preceding and following instruction I all obtain the same value if the guard outcome is FALSE, Pold must be read and its value written into Pnew. The read of this value induces the third input-dependency source.
It is possible to implement a more complicated register renaming mechanism that does not introduce this dependency source.
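The three input-dependency sources can be made concrete with a small model of a guarded move executing on renamed registers. This is a sketch under stated assumptions: the function, the dictionary standing in for the physical register file, and the register names Pguard and Psrc are hypothetical, while Pold and Pnew follow the naming used in the text.

```python
# Hypothetical model of a guarded move executing after register renaming.
GUARD_OPERATORS = {
    "EQ": lambda value: value == 0,
    "GT": lambda value: value > 0,
    "NEQ": lambda value: value != 0,
}


def execute_guarded_move(phys, guard_op, p_guard, p_src, p_old, p_new):
    # Dependency 1: the guard source is read to compute the guard outcome.
    outcome = GUARD_OPERATORS[guard_op](phys[p_guard])
    if outcome:
        # Dependency 2: the source of the guarded instruction is read.
        phys[p_new] = phys[p_src]
    else:
        # Dependency 3: the previously mapped register Pold is read and its
        # value copied into Pnew, so later readers of Ldst see the old value.
        phys[p_new] = phys[p_old]
    return outcome


false_case = {"Pguard": 0, "Psrc": 42, "Pold": 7, "Pnew": None}
false_outcome = execute_guarded_move(
    false_case, "GT", "Pguard", "Psrc", "Pold", "Pnew")

true_case = {"Pguard": 0, "Psrc": 42, "Pold": 7, "Pnew": None}
true_outcome = execute_guarded_move(
    true_case, "EQ", "Pguard", "Psrc", "Pold", "Pnew")
```

Only one of Psrc and Pold is actually consumed in any given execution, yet without knowledge of the guard outcome both must be treated as inputs.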
The additional sources of input data dependencies of predicated instructions affect the design and performance of a dynamically-scheduled microprocessor in two ways. First, both the data paths and the control circuitry must be designed to accommodate these dependencies. The scheduling unit must track these dependencies to determine when a predicated instruction may be issued. The processor must be designed to allow the bypassing of the values corresponding to these input dependencies to the predicated instruction. Additionally, the mapper must have sufficient bandwidth to rename the guard sources. The second way in which predicated instructions affect microprocessor performance is by limiting instruction-level parallelism. Instruction-level parallelism is limited because the scheduler cannot issue a predicated instruction until all of its input dependencies have been resolved, even though not all of them will be necessary to execute the predicated instruction. The guard sources are always necessary, as they are used to determine the guard outcome, and thus to determine whether the guarded instruction should be executed. If the guard outcome is found to be TRUE when the predicated instruction is executed, the scheduler need not have taken into account the availability of the value contained in Pold when deciding when to issue the predicated instruction to a functional unit. Pold is not required because the previously-mapped register need not be read. On the other hand, if the guard outcome is found to be FALSE, then the scheduler need not have taken into account the availability of the source operands of the guarded instruction when deciding when to issue the predicated instruction, since these values will not be needed.
Guard outcome prediction could alleviate some of the issues associated with execution of predicated instructions. Guard outcome prediction would simplify the design of microprocessor data paths and control circuitry by reducing the number of data dependencies that must be tracked. Additionally, guard outcome prediction would increase instruction-level parallelism by allowing instructions to be scheduled at the time all of the data input sources predicted to be necessary were available, rather than waiting for all three of the data input sources of a predicated instruction to become available. Any guard outcome prediction scheme would have to check and account for guard outcome mispredictions.
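The scheduling benefit can be illustrated by computing which inputs a scheduler would have to wait for with and without a predicted guard outcome. The sketch below is hypothetical: the function names and register names are illustrative, and it assumes, following the statement that the guard sources are always necessary, that the guard sources remain in the wait set even when a prediction is available. Misprediction checking is not modeled.

```python
def sources_to_wait_for(guard_srcs, guarded_srcs, p_old, predicted_outcome=None):
    """Return the set of inputs that must be ready before issue.

    predicted_outcome is None when no guard prediction is used.
    """
    needed = set(guard_srcs)  # assumption: guard sources are always awaited
    if predicted_outcome is None:
        # No prediction: all three dependency sources must be resolved.
        needed |= set(guarded_srcs) | {p_old}
    elif predicted_outcome:
        # Predicted TRUE: Pold is predicted to be unnecessary.
        needed |= set(guarded_srcs)
    else:
        # Predicted FALSE: the guarded sources are predicted to be unnecessary.
        needed.add(p_old)
    return needed


def can_issue(needed, ready):
    """An instruction may issue once every awaited input is ready."""
    return needed <= ready


ready_now = {"Pguard", "Psrc"}  # Pold is still being produced
without_prediction = sources_to_wait_for(["Pguard"], ["Psrc"], "Pold")
predicted_true = sources_to_wait_for(["Pguard"], ["Psrc"], "Pold", True)
predicted_false = sources_to_wait_for(["Pguard"], ["Psrc"], "Pold", False)
```

With Pold not yet available, the instruction can issue only under a TRUE prediction in this example; a real design would also need a mechanism to detect and recover from a wrong prediction.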
A body of work addresses the related fields of branch prediction and value prediction. Branch prediction deals with predicting the direction of a conditional branch in an early pipe stage and executing instructions only along the predicted control-flow. Branch prediction permits microprocessor execution to continue while the outcome of a condition is determined. Branch prediction is essential to the performance of superscalar and deeply pipelined microprocessors. Branch prediction also requires checking and accounting for misprediction, which the relevant art addresses. FIG. 2 illustrates a prior branch predictor, which is described in U.S. Pat. No. 5,758,142 to McFarling et al. entitled “Trainable Apparatus for Predicting Instruction Outcomes in Pipelined Microprocessors.” Value prediction reduces the issue time of instructions by predicting their input values. Like branch prediction, value prediction also requires checking and accounting for misprediction.
Guard outcome prediction, while similar to branch prediction, differs from it in at least four significant ways. First, guard prediction uses prediction to determine data flow, rather than control flow as in branch prediction. Guard prediction predicts the source of a value to be stored in the destination register named by the guarded instruction. In contrast, branch prediction predicts which instruction should be executed next. Second, as compared to branch prediction, guard outcome prediction can take more time without a performance penalty. Branch prediction is typically accomplished in a single cycle to allow predicted-taken branches to update the fetch address for the immediately following cycle. In contrast, guard outcome prediction can take several pipeline stages to produce its results without performance penalty. Guard outcome prediction can start at the time when instructions are fetched and wait to produce a prediction until the time register remapping is performed for the relevant predicated instruction, which is typically several cycles after the instruction is fetched. Third, as compared to branch prediction, guard outcome prediction requires higher prediction bandwidth. In non-predicated code, typically only a single branch prediction will be required for several instructions. With predicated code, every instruction may be decorated with a guard; i.e., there may be as high as a one-to-one correspondence between instructions and required predictions. Finally, guard outcome prediction rewards correlation of guard outcome predictions. With predicated instructions it is not infrequently the case that multiple instructions may be decorated with identical, or closely related, guards. Correlation of identical and/or closely related guards increases the likelihood of correctly predicting guard outcomes, and hence, of improving an application's performance.
Thus, a need exists for a guard outcome predictor to simplify the design, and speed up the execution, of a dynamically-scheduled microprocessor executing predicated instructions.
The guard prediction apparatus of the present invention predicts guard outcomes for predicated instructions, each of which specifies a guard operator to be applied to a guard source to generate the guard outcome. Briefly described, the guard prediction apparatus of the present invention includes a cache, availability logic, a selection circuit, a deduction circuit and write back circuitry. The cache stores previous predictions of guard outcomes for a set of guard sources and guard operators. The availability logic determines whether the cache includes a previous prediction that is relevant to a first guard source and first guard operator and, if so, couples that previous prediction to the selection circuit. The selection circuit generates the final guard outcome prediction by selecting between the previous prediction, if available, and an initial prediction, if a previous prediction is not available. Additionally, the guard prediction apparatus includes the deduction circuit, which deduces from the initial prediction of the guard outcome other consistent guard outcomes for a set of guard operators when applied to the guard source. The write back circuitry writes the initial prediction and the deduced guard outcomes into the cache for future use.
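The interplay of the cache, availability logic, selection circuit, deduction circuit and write back circuitry can be sketched in software. The following is a minimal, hypothetical model and not the claimed apparatus: the class and method names are illustrative, the initial prediction is simply passed in because its origin is not described here, the deduction rules are an assumption based on the EQ, GT and NEQ operators applied to the same guard source, and writing back only when no previous prediction exists is likewise an assumption.

```python
class GuardPredictor:
    """Hypothetical software model of the guard prediction data flow."""

    def __init__(self):
        # Cache: (guard source, guard operator) -> predicted guard outcome.
        self.cache = {}

    def lookup(self, guard_source, guard_operator):
        # Availability logic: is a relevant previous prediction cached?
        key = (guard_source, guard_operator)
        return key in self.cache, self.cache.get(key)

    @staticmethod
    def deduce(guard_operator, outcome):
        # Deduction (assumed rules): outcomes of other operators on the same
        # source that are logically implied by the given outcome.
        if guard_operator == "EQ":
            return {"NEQ": False, "GT": False} if outcome else {"NEQ": True}
        if guard_operator == "NEQ":
            return {"EQ": False} if outcome else {"EQ": True, "GT": False}
        if guard_operator == "GT":
            return {"EQ": False, "NEQ": True} if outcome else {}
        return {}

    def predict(self, guard_source, guard_operator, initial_prediction):
        available, previous = self.lookup(guard_source, guard_operator)
        if available:
            return previous  # selection: prefer the previous prediction
        # Write back the initial prediction and every deduced outcome.
        self.cache[(guard_source, guard_operator)] = initial_prediction
        deduced = self.deduce(guard_operator, initial_prediction)
        for other_operator, other_outcome in deduced.items():
            self.cache[(guard_source, other_operator)] = other_outcome
        return initial_prediction


predictor = GuardPredictor()
first = predictor.predict("r5", "EQ", initial_prediction=True)
second = predictor.predict("r5", "NEQ", initial_prediction=True)
third = predictor.predict("r5", "GT", initial_prediction=True)
```

In this example the second and third predictions come from the cache rather than from their own initial predictions, so guards on the same source are predicted consistently with the first prediction.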