1. Field of the Invention
The present invention relates generally to superscalar computers, and more particularly to a system and method for using tags to control instruction execution in a superscalar reduced instruction set computer (RISC).
2. Related Art
Processors used in conventional computer systems typically execute program instructions one at a time, in sequential order. The process of executing a single instruction involves several sequential steps. The first step generally involves fetching the instruction from a memory device. The second step generally involves decoding the instruction, and assembling any operands.
The third step generally involves executing the instruction, and storing the results. Some processors are designed to perform each step in a single cycle of the processor clock. Alternatively, the processor may be designed so that the number of processor clock cycles per step depends on the particular instruction.
Modern computers commonly use a technique known as pipelining to improve performance. Pipelining involves the overlapping of the sequential steps of the execution process. For example, while the processor is performing the execution step for one instruction, it might simultaneously perform the decode step for a second instruction, and perform a fetch of a third instruction. Pipelining can thus decrease the execution time for a sequence of instructions. Superpipelined processors attempt to further improve performance by overlapping the sub-steps of the three sequential steps discussed above.
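The benefit of overlapping the three steps can be illustrated with a simple cycle count. The following is a minimal sketch, assuming that each step takes exactly one processor clock cycle and that no stalls occur; the function names are hypothetical and are used for illustration only.

```python
def sequential_cycles(num_instructions: int, num_steps: int = 3) -> int:
    """Cycles needed when each instruction finishes every step before the next begins."""
    return num_instructions * num_steps


def pipelined_cycles(num_instructions: int, num_steps: int = 3) -> int:
    """Cycles needed when steps overlap: the pipeline fills once, then one
    instruction completes per cycle (assumes one cycle per step, no stalls)."""
    if num_instructions == 0:
        return 0
    return num_steps + (num_instructions - 1)
```

Under these assumptions, four instructions take twelve cycles when executed strictly in sequence and six cycles when the fetch, decode, and execute steps are overlapped.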
Another technique for improving performance involves executing more than one instruction simultaneously. Processors which utilize this technique are generally referred to as superscalar processors. The ability of a superscalar processor to execute two or more instructions simultaneously depends upon the particular instructions being executed. For example, two instructions which both require use of the same, limited processor resource (such as the floating point unit) cannot be executed simultaneously. This type of conflict is known as a resource conflict. Additionally, an instruction which depends on the result produced by execution of a previous instruction cannot be executed simultaneously with that previous instruction. The instruction which depends on the result of the previous instruction is said to have a data dependency on the previous instruction. Similarly, an instruction may have a procedural dependency on a previous instruction, which prevents the two instructions from being executed simultaneously.
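The resource conflict and data dependency checks described above can be sketched as follows. This is an illustrative sketch only: the Instr record, the can_issue_together function, and the assumption that the floating point unit is the only single-instance resource are hypothetical, and procedural dependencies are not modeled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    unit: str                   # functional unit required, e.g. "FPU" or "ALU"
    dest: Optional[str]         # register written by the instruction, if any
    srcs: Tuple[str, ...] = ()  # registers read by the instruction


def can_issue_together(first: Instr, second: Instr,
                       single_units=frozenset({"FPU"})) -> bool:
    """Return True if `second` may execute in the same cycle as `first`."""
    # Resource conflict: both need the same single-instance unit (assumed set).
    if first.unit == second.unit and first.unit in single_units:
        return False
    # Data dependency: `second` reads the register that `first` writes.
    if first.dest is not None and first.dest in second.srcs:
        return False
    return True
```

For example, under these assumptions two floating point multiplies would be rejected because of the resource conflict, and an integer add that reads the destination register of a preceding add would be rejected because of the data dependency.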
Thus, a superscalar processor seeks to execute more than one instruction at a time. In order to do this, a processor must contain a system for executing multiple instructions, called an Execution Unit (comprising, e.g., a floating point unit and an integer unit). The Execution Unit must be supplied with a group of instructions that it is to execute in the near future. This group of instructions is typically located in a so-called "instruction window." The window provides a "snapshot" of the instruction stream of a program.
The above-mentioned window is normally located in an Instruction Fetch Unit. The Instruction Fetch Unit fetches a group of instructions from memory, decodes the instructions, and sends them to a Superscalar Unit. The Superscalar Unit issues the instructions to the various functional units. To do so, the Superscalar Unit needs information showing which of the microprocessor's resources each instruction will use (e.g., Load/Store); the names of the registers where an instruction's inputs will come from (e.g., integer unit register file) and where its output will go (e.g., floating-point unit register file); information indicating what function the instruction will perform (e.g., add, multiply); etc.
With this information, the Superscalar Unit can track each instruction; once instructions are completed, it notifies the Instruction Fetch Unit to remove them from the window and to add new instructions in their place.
Current designs employ an instruction window that utilizes a First-In First-Out (FIFO) queue. The data in the FIFO can only be advanced by a "fixed amount." For example, an instruction window might contain four instructions (I0-I3) and may be changed in groups of four. In this case, after instructions I0, I1, I2, and I3 have executed, they are removed from the window and four more instructions are advanced into the window. The Superscalar Unit can easily follow the progress of the instructions through the window, since the window changes by a fixed amount each time a group of instructions is completed.
Fixed advance windows have some drawbacks. One instruction can delay the entire group of instructions from being removed from the window. For example, if I0, I1, and I2 are instructions that each execute in one cycle, and I3 is an instruction that requires many cycles to execute, then I0, I1, and I2 must remain in the window even after they are completed, until I3 completes execution, instead of being pushed out of the FIFO and replaced by three new instructions. This stalls the instruction stream (i.e., creates a bottleneck) and tends to limit performance.
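The cost of this stall can be estimated with a short calculation. The sketch below assumes, for illustration only, that all instructions in the window begin execution in the same cycle and that the window advances only when the slowest instruction finishes; the function name is hypothetical.

```python
def idle_cycles_in_fixed_window(latencies):
    """For one group in a fixed advance window, return the number of cycles
    each instruction sits completed in the window before the group is removed.

    `latencies` lists the execution time, in cycles, of each instruction.
    """
    group_done = max(latencies)  # the window cannot advance before this cycle
    return [group_done - latency for latency in latencies]
```

With I0, I1, and I2 taking one cycle each and I3 taking, for instance, eight cycles, idle_cycles_in_fixed_window([1, 1, 1, 8]) gives [7, 7, 7, 0]: three window slots are occupied by completed instructions for seven cycles each.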
One solution to the drawbacks of fixed advance windows (FIFOs) is an instruction window that can be advanced by a variable amount. This would permit instructions to be removed from the window immediately after they have been executed. Instruction execution is much more complex using variable advance windows, however, since an instruction may occupy several different locations in the FIFO over its lifetime. For example, if I1 is executed during the same cycle that I0 is retiring (completed), then in the next cycle I0 will be pushed out of the FIFO, and I1 will move into I0's slot in the FIFO (where I0 and I1 refer to slot locations in a FIFO and not instruction program order). The Superscalar Unit must know that the new I0 was once I1, so that it can retire instructions when they are executed, and so that the Execution Unit can write the corresponding results into the correct register file addresses of the functional units.
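The bookkeeping problem described above can be made concrete with a small model of a variable advance window. This is a minimal sketch under stated assumptions: instructions are retired only from the head of the FIFO, the window is refilled from a queue of waiting instructions, and the function name and the returned slot mapping are hypothetical constructs for illustration.

```python
from collections import deque


def advance_window(window, done, incoming):
    """Advance a variable advance window by however many head slots are complete.

    window:   list of instruction names in slot order (slot 0 is the head)
    done:     set of instruction names that have completed execution
    incoming: deque of instruction names waiting to enter the window
    Returns (new_window, moves), where moves maps each surviving instruction's
    old slot index to its new slot index.
    """
    size = len(window)
    retired = 0
    # Count the completed instructions at the head of the FIFO.
    while retired < size and window[retired] in done:
        retired += 1
    # Surviving instructions shift toward the head by the retired amount.
    moves = {old: old - retired for old in range(retired, size)}
    refill = [incoming.popleft() for _ in range(min(retired, len(incoming)))]
    return window[retired:] + refill, moves
```

If only the instruction in slot 0 has completed, the instruction that occupied slot 1 now occupies slot 0, and the returned mapping {1: 0, 2: 1, 3: 2} is exactly the information that must be tracked so that results are attributed to the correct instruction.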
Although this would improve performance, there are drawbacks associated with this system. Typically, it is necessary to employ a large principal queue (usually in the Superscalar Unit) that contains the instructions from the Instruction Fetch Unit. Additionally, several other centrally located queues would be needed to contain the decoded information associated with each instruction located in the principal queue (e.g., a resource queue, a destination register queue, etc.). The principal queue and the queues that contain the decoded information would all need to advance in parallel in order to keep track of instructions (i.e., where they should be sent, when they are executed, etc.).
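The requirement that all of these queues advance in parallel can be sketched as follows. The class below is a hypothetical model, not a description of any particular design; it assumes one principal queue and two decoded-information queues (resource and destination register).

```python
class ParallelQueues:
    """Hypothetical model of a principal queue with two side queues."""

    def __init__(self):
        self.instructions = []    # principal queue
        self.resources = []       # resource queue
        self.dest_registers = []  # destination register queue

    def push(self, instruction, resource, dest_register):
        # Each new instruction adds one entry to every queue.
        self.instructions.append(instruction)
        self.resources.append(resource)
        self.dest_registers.append(dest_register)

    def advance(self, count):
        # Every queue must drop the same number of head entries, or the
        # decoded information no longer lines up with its instruction.
        for queue in (self.instructions, self.resources, self.dest_registers):
            del queue[:count]

    def entry(self, slot):
        # Decoded information is found by slot position alone.
        return (self.instructions[slot], self.resources[slot],
                self.dest_registers[slot])
```

Because an entry is identified only by its slot position, each additional kind of decoded information requires another full-length queue and additional logic to shift it in step with the others.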
Using several queues to contain instructions is disadvantageous for several reasons: a large amount of chip area is dedicated to the plurality of queues; there is little flexibility in designing a system with more than one queue; and the control logic for directing data through the queues is intricate and inflexible.
Therefore, what is needed is a technique to "track" or monitor instructions after they are decoded. A system implementing such a technique must require only a small area on a chip, be flexible, and be able to properly monitor instructions as they advance through a "Variable Advance Instruction Window."