This disclosure relates to semiconductor devices, and more particularly to microprocessors that control the operation of electronic devices, and to electronic devices that use such microprocessors.
A microprocessor, also known as a Central Processing Unit (CPU), works by executing instructions. Some instructions result in branching points, where one path of execution can be chosen over another. A microprocessor can run faster if it speculatively predicts which path will be chosen and executes instructions along that path in advance. Such CPUs are known as out-of-order CPUs. The speed benefit diminishes, however, when a branch is mispredicted and recovery is required.
Out-of-order CPUs must contend with hazards such as Write-After-Write (WAW) and Write-After-Read (WAR). These hazards are avoided by register renaming, which is accomplished with the help of a rename table that keeps track of the renamed source and destination registers.
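By way of illustration only, register renaming can be sketched in software as follows. The class name `RenameTable`, the free list, and the register counts are assumptions made for this example and do not describe any particular hardware implementation. The sketch shows how giving each destination a fresh physical register removes WAW and WAR conflicts.

```python
class RenameTable:
    """Toy rename table mapping architectural to physical registers.

    Hypothetical model for illustration; real hardware differs.
    """

    def __init__(self, num_arch_regs, num_phys_regs):
        # Initially architectural register i maps to physical register i.
        self.mapping = list(range(num_arch_regs))
        # The remaining physical registers are free for allocation.
        self.free_list = list(range(num_arch_regs, num_phys_regs))

    def rename(self, dest, sources):
        """Rename one instruction; return (dest_phys, source_phys_list)."""
        # Sources read the current mapping before the destination is remapped.
        src_phys = [self.mapping[s] for s in sources]
        # A fresh physical register for the destination means this write
        # cannot collide with earlier readers (WAR) or writers (WAW).
        new_phys = self.free_list.pop(0)
        self.mapping[dest] = new_phys
        return new_phys, src_phys


table = RenameTable(num_arch_regs=4, num_phys_regs=8)
first = table.rename(dest=1, sources=[2, 3])   # r1 = r2 + r3
second = table.rename(dest=2, sources=[1, 3])  # r2 = r1 + r3 (WAR on r2)
third = table.rename(dest=1, sources=[3, 3])   # r1 = r3 + r3 (WAW on r1)
```

In this sketch the second instruction writes a different physical register than the one the first instruction read for r2, and the third instruction writes a different physical register than the first, so the three instructions may complete in any order that respects true data dependences.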
A remaining problem, however, is that whenever a branch instruction is mispredicted, the rename table has to be entirely flushed. Flushing is a problem because branches can be executed out of order, so there could be instructions waiting to retire that are older than the mispredicted branch instruction. The rename information for these older instructions has to be rebuilt into the rename table.
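The flush-and-rebuild recovery described above can be sketched as follows, purely as an illustration. The function name `rebuild_rename_table`, the tuple layout of the in-flight list, and the use of an integer age to order instructions are assumptions of this example, not details taken from any specific design.

```python
def rebuild_rename_table(retired_mapping, in_flight, branch_age):
    """Flush to the retired mapping, then replay renames older than the branch.

    retired_mapping: dict of architectural register -> physical register,
        reflecting only retired instructions (assumed available).
    in_flight: list of (age, dest_arch_reg, dest_phys_reg) for un-retired,
        register-writing instructions, oldest first (hypothetical layout).
    branch_age: age of the mispredicted branch.
    Returns (rebuilt_mapping, replay_steps).
    """
    # Flush: discard all speculative state and restart from retired state.
    mapping = dict(retired_mapping)
    steps = 0
    for age, dest, phys in in_flight:
        if age > branch_age:
            break  # younger, wrong-path instructions are not replayed
        # Older instructions are still valid, so their renames are restored.
        mapping[dest] = phys
        steps += 1
    return mapping, steps


retired = {0: 0, 1: 1, 2: 2}
in_flight = [(10, 1, 4), (11, 2, 5), (13, 1, 6), (14, 0, 7)]
rebuilt, steps = rebuild_rename_table(retired, in_flight, branch_age=12)
```

In this example two older instructions must be replayed before renaming can resume. The number of replay steps grows with the number of older instructions still waiting to retire, which is the source of the rebuild latency.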
The problem, then, manifests itself as delay. During the rebuild process the rename logic has to stall the front end of the pipe, preventing it from sending new instructions for renaming. This stalling results in delay, which amounts to the penalty for branch misprediction. The penalty depends not only on the rebuild latency, but also on the redirection latency and the depth of the front end of the pipe.
To reduce the stalling, check-pointing schemes have been proposed that operate prior to the dispatch stage, as part of the rename pipeline. In those schemes, traditionally each branch instruction starts a new check-point window. This approach is expensive in area, since it requires as many check-points as there are in-flight branches allowed in the machine.
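A traditional per-branch check-pointing scheme of the kind described above can be sketched as follows, for illustration only. The class name `CheckpointedRenameTable`, the `max_branches` parameter, and the assumption that branch identifiers increase in program order are all hypothetical. Reclaiming physical registers after recovery is omitted for brevity.

```python
class CheckpointedRenameTable:
    """Toy rename table that snapshots its mapping at every branch."""

    def __init__(self, num_arch_regs, max_branches):
        self.mapping = list(range(num_arch_regs))
        self.next_phys = num_arch_regs
        self.max_branches = max_branches
        self.checkpoints = {}  # branch id -> copy of the mapping

    def rename_dest(self, dest):
        # Allocate the next physical register for this destination.
        self.mapping[dest] = self.next_phys
        self.next_phys += 1
        return self.mapping[dest]

    def rename_branch(self, branch_id):
        # Each branch starts a new check-point window.
        if len(self.checkpoints) >= self.max_branches:
            return False  # no free check-point; renaming would have to stall
        self.checkpoints[branch_id] = list(self.mapping)
        return True

    def recover(self, branch_id):
        # Restore the snapshot directly instead of flushing and rebuilding.
        self.mapping = list(self.checkpoints[branch_id])
        # Check-points of this branch and all younger branches are released.
        self.checkpoints = {b: s for b, s in self.checkpoints.items()
                            if b < branch_id}

    def storage_entries(self):
        # Area cost: one full copy of the mapping per allowed in-flight branch.
        return self.max_branches * len(self.mapping)


t = CheckpointedRenameTable(num_arch_regs=4, max_branches=2)
t.rename_dest(1)
took_first = t.rename_branch(0)
t.rename_dest(2)
took_second = t.rename_branch(1)
t.rename_dest(1)
took_third = t.rename_branch(2)  # exceeds the two available check-points
t.recover(0)                     # branch 0 was mispredicted
```

The sketch makes the area cost visible: storage grows as the product of the number of allowed in-flight branches and the size of the mapping, and a branch that arrives when every check-point is in use cannot be renamed.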