1. Field of the Invention
This invention relates to decoding instructions out of program order within a microprocessor.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance through the use of pipelining, parallel execution, and high clock rates. Pipelining is an implementation technique whereby multiple instructions are overlapped during the execution process. Parallel execution refers to the simultaneously executing multiple instructions in a single clock cycle. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. At the end of a clock cycle, the resulting values are moved to the next pipeline stage.
Pipelining has several hazards associated with it. One particular hazard is stalling the pipeline due to branch instructions. When a branch instruction propagates through the pipeline, it is difficult to determine which instructions after the branch should be processed until the results of the branch instruction are know. For example, if the branch instruction is "taken", then the next instruction to be executed after the branch may be located at a particular address that is offset from the branch instruction's address. In contrast, if the branch instruction is "not taken", then the next instruction to be executed may be located at the address immediately following the branch instruction. As a result, the initial stages of the pipeline may be unable to determine which instructions should begin execution in the pipeline following the branch instruction. Thus, the pipeline may stall awaiting the results of the branch instruction.
In order to prevent the instruction pipeline from stalling, microprocessor designers may implement branch prediction schemes to provide the initial pipeline stages with a predicted result for each branch instruction. The initial stages of the pipeline speculatively execute instructions along the predicted path until the branch instruction executes and one of the following occurs: (1) the prediction is found to correct, in which case the instructions continue to execute and are no longer speculative, or (2) the prediction is found to be incorrect, in which case all pipeline stages executing instructions after the branch are flushed and the pipeline starts anew using the correct path.
While parallel execution and branch prediction improve overall instruction throughput for a microprocessor at a given clock cycle, process improvements have led to dramatically increased operating frequencies that have further increased the number of instructions that a microprocessor may execute in a fixed period of time. These advancements have placed increasing importance upon a microprocessor's ability to decode instructions. Instruction decoding typically refers to identifying the different fields within the instruction (e.g., the opcode field and any prefixes or operands) and then expanding the instruction into an internal format so that the microprocessor's functional units may easily execute the instruction.
While RISC (Reduced Instruction Set Computer) microprocessors have been implemented to simplify instruction decoding, microprocessors capable of executing older variable-length instruction sets such as the x86 instruction set have remained commercially important due to the vast amount of software available for the older instruction sets. Furthermore, operating frequencies have climbed so quickly that even RISC microprocessors may eventually need faster methods for decoding instructions.
One proposed method for quickly decoding large numbers of instructions involves using a number of parallel decoders. However, current implementations using parallel decoders have been limited in their throughput because of the "in-order" (i.e., in program order) nature of decoding. Most programs rely upon their instructions being executed in a particular order. This order is referred to as "program order". As previously noted, most modern microprocessors support out-of-order execution. However, these microprocessors must ensure that the instructions that are executed out-of-order do not aversely affect the intended operation of the program. This is accomplished through "dependency checking". Dependency checking refers to determining which instructions rely upon other instructions' prior execution to finction properly. Thus, dependency checking ensures that the only instructions that are executed out of order are those that will not adversely affect the desired operation of the program. For typical dependency checking hardware to operate correctly, it relies upon receiving decoded instructions that are in-order. Thus, typical instruction decoders receive and decode instructions in program order so that the program order will be preserved for the dependency checking hardware (typically the next stage in the instruction processing pipeline).
This in-order configuration affects decoder throughput by causing some decoders to stall in certain instances. For example, when a new set of instruction bytes is received by the decoders, each decoder must wait to output its results (i.e., its decoded instructions) until all decoders before it have output their results. If not, the following pipeline stages may receive the decoded instructions out-of-order.
For these reasons, a method and apparatus for quickly decoding a large number of instructions is desirable. In particular, a method capable of quickly decoding large numbers of instructions out of order is desirable.