1. Field of the Invention
This invention relates generally to processor pipelines and, more particularly, to a table for tracking operand locations through such a pipeline.
2. Description of the Related Art
Computers and many other types of machines are engineered around a "processor." A processor is an integrated circuit that executes programmed instructions on data stored in the machine's memory. There are many types of processors and there are several ways to categorize them. For instance, one may categorize processors by their intended application, such as microprocessors, digital signal processors ("DSPs"), or controllers. One may also categorize processors by the complexity of their instruction sets, such as reduced instruction set computing ("RISC") processors and complex instruction set computing ("CISC") processors. The operational characteristics on which these categorizations are based define a processor and are collectively referred to as the processor's architecture. More particularly, an architecture is a specification defining the interface between the processor's hardware and the processor's software.
One aspect of a processor's architecture is whether it executes instructions sequentially or out of order. Historically, processors executed one instruction at a time in a sequence. A program written in a high level language was compiled into object code consisting of many individual instructions for handling data. The instructions might tell the processor to load or store certain data from memory, to move data from one location to another, or to perform any one of a number of other data manipulations. The instructions would be fetched from memory, decoded, and executed in the sequence in which they were stored. This is known as the "sequential programming model." Out of order execution involves executing instructions in some order different from the order in which they are found in the program.
The sequential programming model creates what are known as "data dependencies" and "control dependencies." For instance, if one uses the variable x to calculate a result, one needs to know the value of x and that value might depend on results from previously executed instructions. Similarly, a group of instructions might contain two alternative subsets of instructions, only one of which will be executed, depending on some specified condition. Thus, the result of executing the group of instructions will depend on whether a branch is executed. Even out of order execution follows this sequential programming model, and it therefore also creates data and control dependencies.
A second aspect of a processor's architecture is whether it "pipelines" instructions. The processor fetches instructions from memory and feeds them into one end of the pipeline. The pipeline is made of several "stages," each stage performing some function necessary or desirable to process instructions before passing the instruction to the next stage. For instance, one stage might fetch an instruction, the next stage might decode the fetched instruction, and the next stage might execute the decoded instruction. Each stage of the pipeline typically moves the instruction closer to completion.
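The staged operation described above can be illustrated with a minimal sketch. The following Python model, with purely illustrative stage names and a toy "program," shows how each clock cycle moves every in-flight instruction one stage forward, so that up to three instructions are processed concurrently; it is a simplification, not a model of any real processor.

```python
# Minimal sketch of a three-stage pipeline (fetch, decode, execute).
# Each clock cycle moves every in-flight instruction one stage forward,
# so up to three instructions are being processed at once.
# Stage names and the toy "program" are illustrative only.

def simulate(program):
    fetch = decode = None
    completed = []
    pc = 0       # index of the next instruction to fetch
    cycle = 0
    while pc < len(program) or fetch is not None or decode is not None:
        cycle += 1
        execute = decode            # decode stage hands off to execute
        decode = fetch              # fetch stage hands off to decode
        if pc < len(program):
            fetch = program[pc]     # fetch the next instruction
            pc += 1
        else:
            fetch = None            # nothing left to fetch
        if execute is not None:
            completed.append(execute)   # instruction finishes this cycle
    return cycle, completed

cycles, done = simulate(["load r1", "add r2", "store r2"])
# Three instructions overlap in the pipeline and finish in 5 cycles,
# rather than the 9 cycles (3 stages x 3 instructions) a design that
# processed one instruction at a time would need.
```

The overlap in the final cycles is the source of the throughput advantage discussed below.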
Some advanced processor pipelines process selected instructions "speculatively." Exemplary speculative execution techniques include, but are not limited to, advanced loads, branch prediction, and predicate prediction. Speculative execution means that instructions are fetched and executed before resolving pertinent control dependencies. Speculative execution involves predicting which instructions will be needed, for example depending on whether a branch is taken; executing the fetched instructions; and then verifying both the execution and the prediction. The pipeline executes a series of instructions and, in the course of doing so, makes certain predictions about how control dependencies will be resolved. For instance, if two instructions are to be alternatively executed depending on the value of some quantity, then the pipeline has to guess what that value will be or which instruction will be executed. The pipeline then predicts the next instruction to be executed and fetches the predicted instruction before the previous instruction is actually executed.
A pipeline therefore has the tremendous advantage that, while one part of the pipeline is working on a first instruction, a second part of the pipeline can be working on a second instruction. Thus, more than one instruction can be processed at a time, thereby increasing the rate at which instructions are executed and, in turn, the processor's throughput.
A third aspect of a processor's architecture is whether the processor is "superscalar." Historically, processors executed only one instruction at a time, i.e., in any given clock cycle. Such a processor is called a "scalar" processor. More recently, "superscalar" processors have been designed that execute more than one instruction at a time. More technically, a scalar processor executes one instruction per clock cycle whereas a superscalar processor executes more than one instruction per clock cycle.
Superscalar processors typically use a pipeline as described above where different stages of a pipeline work on different instructions at any given time. Not only do superscalar processors work on several different instructions at a time, but each stage of a superscalar pipeline processes more than one instruction each clock cycle. A superscalar pipeline usually includes one or more stages having several execution units executing instructions in parallel. Each execution unit reads from and writes to storage through "functional unit ports." Thus, a pipeline including N execution units may be described as an N-way pipeline having N functional unit ports.
One of the pipeline's most significant challenges in speculative execution is verification. At the end of the pipeline, the results from executed instructions are temporarily stored in a buffer until all their data and control dependencies have been actually resolved. The pipeline then checks to see whether any problems occurred. If there are no problems, then the executed instructions are "retired." This is sometimes referred to as "commitment to an architectural state" or "retirement to a committed state." Retirement, or commitment, signals that all dependencies have been correctly resolved and that the execution results are finalized.
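The buffering-then-retirement step described above can be sketched as follows. This is a toy Python illustration, with invented field names, of in-order commitment: executed results wait in a buffer and are committed to the architectural register file only once their checks have passed, oldest first.

```python
# Toy sketch of in-order retirement. Executed results wait in a buffer;
# an entry is committed to the architectural register file only once it
# is marked problem-free. Field names and structure are illustrative.
from collections import deque

def retire(buffer, arch_regs):
    """Commit results from the head of the buffer, oldest first,
    stopping at the first entry whose checks have not yet passed."""
    retired = []
    while buffer and buffer[0]["ok"]:
        entry = buffer.popleft()
        arch_regs[entry["dest"]] = entry["value"]  # commit the result
        retired.append(entry["dest"])
    return retired

buf = deque([
    {"dest": "r10", "value": 7, "ok": True},
    {"dest": "r20", "value": 9, "ok": True},
    {"dest": "r30", "value": 1, "ok": False},  # checks not yet resolved
])
regs = {}
done = retire(buf, regs)
# r10 and r20 are retired to the architectural state;
# r30 remains speculative in the buffer.
```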
However, no pipeline correctly predicts all eventualities and, when problems occur, they must be repaired. Problems can typically be traced to executing an instruction that should not have been executed; omitting an instruction that should have been executed; or executing an instruction with incorrect data. The effects of such problems on subsequent execution of instructions must also be repaired. Repairing the effects of such a problem in a superscalar pipeline is a particular concern because the effects might propagate, creating other problems in instructions currently being processed. Once the problem and its effects have been repaired, the pipeline can then process the execution stream correctly.
Most pipelined processors "stall" the pipeline upon detecting a problem. As discussed above, the pipeline is usually divided into several stages. Progress through the stages is governed by a number of latches enabled by a signal generated by a particular part of the pipeline. If a problem is detected, the latches are disabled and the pipeline "stalls" such that the instructions can no longer be transferred into the next stage. The problem and its effects are then repaired, the latches are re-enabled, and the pipeline resumes.
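The latch-gating behavior just described can be sketched in a few lines. In this illustrative Python model, a single enable signal stands in for the latch enables between stages: when it is dropped, every stage holds its contents and nothing advances.

```python
# Sketch of a pipeline stall: a shared enable signal gates the latches
# between stages. When a problem is detected the enable is dropped and
# no instruction advances until it is raised again. Illustrative only.

def tick(stages, enable):
    """Advance the pipeline by one cycle if the latches are enabled."""
    if not enable:
        return stages               # stalled: every stage holds its value
    # Shift each instruction one stage toward completion; the instruction
    # in the last stage leaves the pipeline, and the first stage empties.
    return [None] + stages[:-1]

pipe = ["i3", "i2", "i1"]   # index 0 = first stage; "i1" is deepest
stalled = tick(pipe, enable=False)   # problem detected: nothing moves
moving = tick(pipe, enable=True)     # repaired: instructions advance
```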
Some processor pipelines "replay" in addition to stalling. Replay is the re-execution of instructions upon detecting an execution problem in the retirement of speculative results. The speculative results are not retired, i.e., used to update the architectural state of the processor, but are instead ignored. The pipeline corrects the problem and then re-executes the instructions. The new results are then checked for problems and retired.
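A replay can be reduced to its essentials in a short sketch. The following Python fragment is a deliberately simplified illustration, with invented names and a toy single-operation ALU: a result computed from predicted inputs is checked, and on a mismatch the instruction is simply re-executed with the corrected inputs rather than being retired.

```python
# Sketch of replay: a speculative result is verified before retirement;
# if verification fails, the result is discarded and the instruction is
# re-executed with the corrected input values. Illustrative only.

def execute(insn, regs):
    a, b = insn["src"]
    return regs[a] + regs[b]           # toy ALU: every instruction adds

def run_with_replay(insn, regs, predicted):
    result = execute(insn, predicted)  # first pass uses predicted values
    if predicted != regs:              # verification fails ...
        result = execute(insn, regs)   # ... so replay with correct values
    return result

insn = {"src": ("r11", "r12"), "dest": "r10"}
actual = {"r11": 4, "r12": 5}
mispredicted = {"r11": 4, "r12": 0}
value = run_with_replay(insn, actual, mispredicted)
# The replay produces the correct sum, 9, which can then be retired.
```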
One such processor is the Alpha 21164 microprocessor, commercially available from Digital Equipment Corporation. The Alpha 21164 stalls only the first three stages of the pipeline. If a problem occurs after the third stage, the Alpha 21164 replays the entire pipeline beginning with the repaired problem instead of stalling in midstream. However, replaying the entire pipeline regardless of the problem can be expensive in terms of time. The Alpha 21164 therefore combines expensive stalling with complex decision-making circuitry necessary to determine when to replay. Also, when the Alpha 21164 replays, it replays the entire pipeline even though the problem may be localized at some point in the pipeline.
A fourth aspect of computer architecture is storage utilization. Virtually all processors employ a type of memory known as a "register." A register is a high speed storage element used to temporarily store information. When the information necessary to execute an instruction is stored in a register, or several registers, the instruction can be executed more rapidly than if the information were stored in other kinds of storage.
Many processors not only have a number of registers, but include several types of registers for special purposes. As with processors, registers may be typed according to a number of distinct criteria derived from the functions they perform. Different types of registers serve different logic and architectural purposes in a processor. For instance, a general purpose register provides general purpose storage for calculations as well as for moving data into and out of the processor. Other types of registers found in some processors are the speculative and architectural registers discussed further below.
Registers hold "operand values" during execution. An operand is a part of an instruction specifying a storage location, such as a register, that provides data to or receives data from the results of executing an instruction. An operand value is the data provided to or received by the storage location specified by the operand. Consider the following instruction:

    Add R10 = R11, R12
This instruction tells the processor to add the operand value stored in the register R11 to the operand value stored in register R12 and to store the resulting sum, also an operand value, in register R10. In the sample instruction, R10, R11, and R12 are all operands because they specify a storage location for an operand value used in executing the instruction.
Operands may also be referred to as "source operands" and "destination operands." A source operand provides an operand value necessary to execute an instruction. A destination operand stores an operand value resulting from executing an instruction. In the sample instruction above, R10 is a destination operand because it specifies the storage location for the results of executing the instruction. Similarly, R11 and R12 are source operands because they provide operand values necessary for executing the instruction.
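The source/destination distinction in the sample instruction can be made concrete with a small sketch. The parser below is purely illustrative, assuming the "Add R10 = R11, R12" assembly format used in this document; it separates the destination operand (left of "=") from the source operands (right of "=").

```python
# Sketch of how the sample instruction "Add R10 = R11, R12" breaks down
# into a destination operand and source operands. The text format is the
# one used in this document's examples; the parser is illustrative only.

def parse(text):
    op, rest = text.split(None, 1)      # opcode, then the operand list
    dest, srcs = rest.split("=")        # destination is left of "="
    return {
        "opcode": op.strip(),
        "dest": dest.strip(),           # receives the execution result
        "sources": [s.strip() for s in srcs.split(",")],  # provide values
    }

insn = parse("Add R10 = R11, R12")
# insn["dest"] is "R10"; insn["sources"] is ["R11", "R12"]
```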
A register may hold an operand value that has not been retired or committed to an architectural state. Uncommitted operand values are referred to as "speculative" and a register holding an uncommitted operand value is referred to as a "speculative register." If an operand value has been committed, it is referred to as a "committed," "retired," or "architectural" operand value and the register is a "committed," "retired," or "architectural register."
Some processor pipelines may speculatively execute selected instructions as discussed above. This is particularly true of superscalar processors. Speculative, out of order execution sometimes requires a speculative "operand value." A speculative operand value may be stored in a speculative register. A speculative operand value may also be in some stage of the pipeline for processing another instruction. Consider the following instructions:

    Add R10 = R11, R12
    Add R20 = R10, R21
The operand value stored in the operand R10 as a result of executing the first instruction is speculative when used in the second instruction because it will not yet have been retired at that time.
Such a pipeline therefore typically includes a "bypass" stage. The bypass stage forwards the source operand to another point in the pipeline to obtain the speculative operand value for use in subsequent execution. Using the sample instructions immediately above, the source operand R10 is bypassed from the stage executing the first instruction to the stage executing the second instruction, to obtain the operand value therefrom. It is important that the operand be bypassed to the latest copy of the operand value in order to minimize, or at least lessen, the occurrence of problems in execution. In order to "bypass" operand values, the pipeline must consequently be able to determine where speculative operands are in the pipeline.
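Bypassing can be sketched in miniature. In this illustrative Python fragment, with invented names, a source operand read first consults the newest in-flight (speculative) copy of a register and falls back to the committed register file only when no producer is in flight, mirroring the two-instruction example above.

```python
# Sketch of operand bypassing: when a source operand's value has been
# produced by an earlier, not-yet-retired instruction, the pipeline
# forwards the newest in-flight copy instead of reading the stale
# committed register file. All names here are illustrative.

def read_operand(reg, in_flight, reg_file):
    """in_flight maps register name -> speculative values, newest last;
    fall back to the committed register file when none is in flight."""
    if in_flight.get(reg):
        return in_flight[reg][-1]   # bypass the latest speculative copy
    return reg_file[reg]            # no producer in flight: use committed

reg_file = {"R10": 0, "R21": 3}     # committed (architectural) values
in_flight = {"R10": [7]}            # first Add produced R10 = 7, unretired
# Second instruction, Add R20 = R10, R21, reads the bypassed value:
r20 = (read_operand("R10", in_flight, reg_file)
       + read_operand("R21", in_flight, reg_file))
# r20 == 10, using the speculative R10 (7) rather than the stale 0
```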
Pipelines typically track operands using large arrays of comparators. For instance, the Pentium® microprocessor manufactured and sold by Intel Corporation of Santa Clara, Calif., uses a large number of comparators to implement a "reservation station" that tracks operands in the pipeline thereof. However, as pipelines get wider and deeper in terms of ALUs and pipeline stages, respectively, operand value tracking with comparators becomes problematic. Larger networks of comparators require larger proportions of the processor die, which is undesirable. Larger networks also eventually become slower, thereby reducing performance in terms of frequency. Future generations of processors are generally expected to be both much deeper and much wider than current processors. Thus, operand value tracking using comparators is quickly becoming impracticable.
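The scaling problem can be seen with a back-of-the-envelope count. The formula below is a deliberate simplification for illustration, not a model of any real design: if each source operand read each cycle must be compared against every in-flight destination, the comparator count grows with both the width and the depth of the pipeline.

```python
# Back-of-the-envelope sketch of why comparator-based operand tracking
# scales poorly: each source operand read port is compared against every
# in-flight destination, so the comparator count grows with both the
# width (ways) and the depth (stages) of the pipeline. The formula is a
# simplification for illustration, not a model of any real processor.

def comparators(ways, depth, srcs_per_insn=2):
    in_flight_dests = ways * depth       # one destination per pipe slot
    read_ports = ways * srcs_per_insn    # source operands read per cycle
    return read_ports * in_flight_dests

narrow = comparators(ways=2, depth=5)    # (2*2) * (2*5)  = 40
wide = comparators(ways=8, depth=12)     # (8*2) * (8*12) = 1536
# Growing 4x wider and ~2.4x deeper multiplies the comparator count
# by roughly 38x, illustrating the die-area and speed concerns above.
```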
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.