1. Technical Field
The present invention relates to an apparatus for data processing in general, and in particular to a condition code register. Still more particularly, the present invention relates to a condition code register architecture for supporting multiple execution units.
2. Description of the Prior Art
Pipelined processors increase the rate of instruction execution by allowing a new instruction to begin execution before a previous instruction has finished execution, and superscalar processors can issue multiple operations per cycle. Thus, instruction-level parallelism available in programs can be exploited by pipelined, superscalar, or pipelined-superscalar processors. However, exploiting this potential parallelism also requires that instructions be fetched at a sufficient rate. Conditional branch instructions present a problem for instruction fetching because an instruction fetch unit cannot know with certainty which instructions to fetch until the conditional branch instruction is resolved. In addition, when a branch is detected, the target address of the instructions following the branch must be available before the execution of those instructions.
By utilizing branch prediction techniques that are known to have an accuracy rate of at least 95 percent, a prediction unit can be used to predict the outcome of branch instructions, allowing the instruction fetch unit to fetch subsequent instructions according to the predicted outcome. These instructions are then speculatively executed to allow the processor to make forward progress during the time the branch instruction is being resolved. If the prediction is correct, then the results of the speculative execution can be accepted as correct results, which greatly improves processor speed and efficiency. However, if the prediction is incorrect, the completely or partially executed instructions must be "flushed" from the pipeline, and execution of the correct branch must be initiated.
Early processors executed instructions in an order determined by the compiled machine-language code running on the processor and so are referred to as in-order or sequential processors. For superscalar processors, multiple pipelines can simultaneously process instructions when there are no data dependencies between the instructions in each pipeline. These instructions can be processed in different pipelines in an order which is not the order of the instructions in memory. These processors are referred to as out-of-order processors. Thus, greater parallelism and higher performance can be achieved by out-of-order processors having multiple pipelines in which instructions are processed in parallel in any efficient order by taking advantage of any parallel processing opportunities that may be provided by the compiled machine-language code.
Although out-of-order processors greatly improve processing throughput, they also increase processing complexity as compared to sequential processors. For example, out-of-order execution may result in conflicts between instructions attempting to use the same registers even though these instructions are otherwise independent.
Instructions generally produce two types of actions during execution: storing results that are directed to an architectural register location and/or setting condition codes that are directed to one or more architectural condition code registers. The results and the condition codes for any instruction that is speculatively executed cannot be stored in the architectural registers until all conditions prior to the instruction have been resolved. For example, several execution units may be capable of generating condition codes according to pipeline templates of varying lengths. When an execution unit generates a condition code, the generated condition code must be synchronized with condition codes from other execution units before the generated condition code can be committed.
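The need for synchronization can be illustrated with a small sketch. It assumes, purely for illustration, that each condition code write lands at its issue cycle plus its unit's latency and that nothing arbitrates between units; the function name, the tuple layout, and the cycle numbers are hypothetical and are not taken from the invention.

```python
def final_condition_code(instructions):
    """Return the condition code left in the register when writes are unsynchronized.

    `instructions` is a list of (issue_cycle, latency, cc_value) tuples given in
    program order. Each write is assumed to land at issue_cycle + latency.
    """
    # Order writes by the cycle in which they land; ties fall back to program order.
    writes = sorted(
        (issue + latency, order, cc)
        for order, (issue, latency, cc) in enumerate(instructions)
    )
    # The last write to land is the value that remains in the register.
    return writes[-1][2]


# Hypothetical case: an older long-latency instruction and a younger short one.
program = [
    (0, 4, "cc_from_older"),    # issued in cycle 0, takes 4 cycles
    (1, 1, "cc_from_younger"),  # issued in cycle 1, takes 1 cycle
]
unsynchronized = final_condition_code(program)
expected_by_program_order = program[-1][2]
```

In this hypothetical case the older instruction's write lands last, so the register ends up holding a value that program order says should already have been overwritten by the younger instruction.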
There are many approaches for maintaining program order in the condition code. One approach is to stall the pipelines while executing a long-latency condition code-updating instruction. The pipelines should be stalled long enough for the condition code to be written before the condition code can be read or written by a subsequent instruction. However, this approach requires the addition of an arbitration circuit for controlling which pipeline can write the condition code, and creates a performance problem if the long-latency condition code-setting instructions are important. Another approach is to extend all of the condition code pipelines to the length of the longest latency unit. Although this approach simplifies the condition code update arbiter, it necessitates a forwarding system to route results that are not yet committed and requires a more complex stall generation circuit that activates when a forward is not possible.
Consequently, it would be desirable to provide an improved condition code register architecture for supporting multiple execution units.
In accordance with a preferred embodiment of the present invention, a master execution unit is coupled to a master condition code register such that condition codes generated by the master execution unit are stored in the master condition code register. In addition, a non-master execution unit is coupled to a shadow condition code register such that condition codes generated by the non-master execution unit are stored in the shadow condition code register. A tag unit is coupled to the master execution unit and the non-master execution unit such that an entry within the master condition code register can be read only when a corresponding entry within the tag unit is referenced to the master execution unit or the master condition code register.
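The relationship among the master condition code register, the shadow condition code register, and the tag unit can be sketched as a toy software model. This is only an illustrative reading of the summary above, not the claimed hardware: the class and method names, the per-field tag granularity, the field names `cr0` and `cr1`, and the choice to return `None` when a master entry is not readable are all assumptions introduced here.

```python
from enum import Enum


class Owner(Enum):
    """Which side a tag entry currently references (hypothetical encoding)."""
    MASTER = "master"
    NON_MASTER = "non_master"


class ConditionCodeModel:
    """Toy model with one tag entry per condition code field (an assumption)."""

    def __init__(self, fields):
        self.master_ccr = {f: 0 for f in fields}   # written by the master unit
        self.shadow_ccr = {f: 0 for f in fields}   # written by the non-master unit
        self.tags = {f: Owner.MASTER for f in fields}

    def write_from_master(self, field, value):
        # Master-generated codes go to the master register; the tag references master.
        self.master_ccr[field] = value
        self.tags[field] = Owner.MASTER

    def write_from_non_master(self, field, value):
        # Non-master codes go to the shadow register; the tag references non-master.
        self.shadow_ccr[field] = value
        self.tags[field] = Owner.NON_MASTER

    def read_master(self, field):
        # A master entry is readable only when its tag references the master side.
        if self.tags[field] is not Owner.MASTER:
            return None  # assumed behavior: the caller must wait or look elsewhere
        return self.master_ccr[field]


model = ConditionCodeModel(["cr0", "cr1"])
model.write_from_master("cr0", 0b0100)
readable = model.read_master("cr0")
model.write_from_non_master("cr0", 0b0010)
blocked = model.read_master("cr0")
```

In this sketch the non-master write leaves the master entry untouched and only redirects the tag, so a reader is kept from consuming a stale master value.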
All objects, features, and advantages of the present invention will become apparent in the following detailed written description.