1. Field of the Invention
The present invention relates generally to superscalar computers, and more particularly to a system and method for using tags to control instruction execution in a superscalar reduced instruction set computer (RISC).
2. Related Art
Processors used in conventional computer systems typically execute program instructions one at a time, in sequential order. The process of executing a single instruction involves several sequential steps. The first step generally involves fetching the instruction from a memory device. The second step generally involves decoding the instruction, and assembling any operands.
The third step generally involves executing the instruction, and storing the results. Some processors are designed to perform each step in a single cycle of the processor clock. Alternatively, the processor may be designed so that the number of processor clock cycles per step depends on the particular instruction.
To improve performance, modern computers commonly use a technique known as pipelining. Pipelining involves the overlapping of the sequential steps of the execution process. For example, while the processor is performing the execution step for one instruction, it might simultaneously perform the decode step for a second instruction, and perform a fetch of a third instruction. Pipelining can thus decrease the execution time for a sequence of instructions.
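The benefit of overlapping can be made concrete with a small cycle count. The following is a minimal sketch, assuming an idealized three-step pipeline in which every step takes exactly one clock cycle and no stalls occur; the function names are hypothetical and are not part of any described embodiment.

```python
def sequential_cycles(num_instructions, num_stages=3):
    """Cycles when each instruction finishes all steps before the next starts."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=3):
    """Cycles for an ideal pipeline (assumed: one cycle per step, no stalls).

    The first instruction needs num_stages cycles to pass through the pipeline;
    every later instruction completes one cycle after its predecessor.
    """
    if num_instructions == 0:
        return 0
    return num_stages + (num_instructions - 1)


print(sequential_cycles(10))  # 30
print(pipelined_cycles(10))   # 12
```

Under these assumptions, ten instructions take 30 cycles when executed one at a time and 12 cycles when the fetch, decode and execute steps overlap.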
Processors of another class, called superpipelined processors, improve performance by overlapping the sub-steps of the three sequential steps discussed above.
Still another technique for improving performance involves executing multiple instructions simultaneously. Processors which utilize this technique are generally referred to as superscalar processors. The ability of a superscalar processor to execute two or more instructions simultaneously depends on the particular instructions being executed. For example, two instructions which both require the use of the same, limited processor resource (such as a floating point unit) cannot be executed simultaneously. This type of conflict is known as a resource dependency. Additionally, an instruction which uses the result produced by the execution of another instruction cannot be executed at the same time as the other instruction. An instruction which depends on the result of another instruction is said to have a data dependency on the other instruction. Similarly, an instruction set may specify that particular types of instructions must execute in a certain order relative to each other. These instructions are said to have procedural dependencies.
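The resource and data dependency checks described above can be sketched as follows. This is an illustration only: the `Instr` record, the unit names, and the assumption that the processor has exactly one functional unit per unit name are all hypothetical, and procedural dependencies are not modeled.

```python
from collections import namedtuple

# Hypothetical instruction record: functional unit used, destination, sources.
Instr = namedtuple("Instr", ["name", "unit", "dest", "srcs"])


def conflicts(first, second):
    """Return the reasons `second` cannot execute at the same time as `first`."""
    reasons = []
    # Assumed: one functional unit per unit name, so sharing it is a conflict.
    if first.unit == second.unit:
        reasons.append("resource dependency")
    # `second` reads the result that `first` produces.
    if first.dest in second.srcs:
        reasons.append("data dependency")
    return reasons


fmul = Instr("fmul", "fpu", "f2", ("f0", "f1"))
fadd = Instr("fadd", "fpu", "f5", ("f3", "f4"))
add = Instr("add", "alu0", "r3", ("r1", "r2"))
sub = Instr("sub", "alu1", "r4", ("r3", "r1"))

print(conflicts(fmul, fadd))  # ['resource dependency']
print(conflicts(add, sub))    # ['data dependency']
print(conflicts(fmul, add))   # []
```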
A further technique for improving performance involves executing instructions out of program order. Processors which utilize this technique are generally referred to as out-of-order processors. Usually, out-of-order processors are also superscalar processors. Data dependencies and procedural dependencies limit out-of-order execution in the same way that they limit superscalar execution.
From here on, the term "superscalar processor" will be used to refer to a processor that is: capable of executing multiple instructions simultaneously, or capable of executing instructions out of program order, or capable of doing both.
To execute instructions either simultaneously or out of order, a superscalar processor must contain a system called an Execution Unit. The Execution Unit contains multiple functional units for executing instructions (e.g., floating point multiplier, adder, etc.). Scheduling control is needed to dispatch instructions to the multiple functional units. With in-order issue, the processor stops decoding instructions whenever a decoded instruction creates a resource conflict or has a true dependency or an output dependency on an uncompleted instruction. As a result, the processor is not able to look ahead beyond the instructions with the conflict or dependency, even though one or more subsequent instructions might be executable. To overcome this limitation, processors isolate the decoder from the execution stage, so that it continues to decode instructions regardless of whether they can be executed immediately. This isolation is accomplished by a buffer between the decode and execute stages, called an instruction window.
To take advantage of lookahead, the processor decodes instructions and places them into the window as long as there is room in the window and, at the same time, examines instructions in the window to find instructions that can be executed (that is, instructions that do not have resource conflicts or dependencies). The instruction window serves as a pool of instructions, giving the processor lookahead ability that is constrained only by the size of the window and the capability of the instruction source. Thus, out-of-order issue requires a buffer, called an instruction window, between the decoder and the functional units, and the instruction window provides a snapshot of the piece of the program that the computer is executing.
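The difference between in-order issue and issue with lookahead through an instruction window can be sketched as follows. This is a simplified model under stated assumptions: the `Instr` record and the `ready_instructions` function are hypothetical, only true and output dependencies on registers are checked, and resource conflicts are ignored.

```python
from collections import namedtuple

# Hypothetical instruction record: destination register and source registers.
Instr = namedtuple("Instr", ["name", "dest", "srcs"])


def ready_instructions(window, pending, lookahead):
    """Return names of instructions in `window` that can execute this cycle.

    window:    instructions in program order, oldest first
    pending:   registers whose values have not been produced yet
    lookahead: False models in-order issue, True models an instruction window
    """
    pending = set(pending)
    ready = []
    for ins in window:
        true_dep = any(src in pending for src in ins.srcs)
        output_dep = ins.dest in pending
        blocked = true_dep or output_dep
        if blocked and not lookahead:
            break  # in-order issue cannot look past the blocked instruction
        if not blocked:
            ready.append(ins.name)
        # The result of this instruction is not available to later
        # instructions in the same cycle, whether or not it issues now.
        pending.add(ins.dest)
    return ready


window = [
    Instr("I0", "r1", ("r9",)),       # waits for r9
    Instr("I1", "r2", ("r1",)),       # depends on I0
    Instr("I2", "r3", ("r4", "r5")),  # independent
]
print(ready_instructions(window, {"r9"}, lookahead=False))  # []
print(ready_instructions(window, {"r9"}, lookahead=True))   # ['I2']
```

With in-order issue nothing can proceed while I0 waits, whereas the window lets the independent instruction I2 be found and executed.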
After the instructions have finished executing, instructions must be removed from the window so that new instructions can take their place. Current designs employ an instruction window that utilizes a First In First Out queue (FIFO). In certain designs, the new instructions enter the window and completed instructions leave the window in fixed size groups. For example, an instruction window might contain eight instructions (I0-I7) and instructions may be changed in groups of four. In this case, after instructions I0, I1, I2 and I3 have executed, they are removed from the window at the same time four new instructions are advanced into the window. Instruction windows where instructions enter and leave in fixed size groups are called "Fixed Advance Instruction Windows."
In other types of designs, the new instructions enter the window and completed instructions leave the window in groups of various sizes. For example, an instruction window might contain eight instructions (I0-I7) and may be changed in groups of one, two or three. In this case, after any of instructions I0, I1 or I2 have executed, they can be removed from the window and new instructions can be advanced into the window. Instruction windows where instructions enter and leave in groups of various sizes are called "Variable Advance Instruction Windows."
Processors that use Variable Advance Instruction Windows (VAIW) tend to have higher performance than processors that have Fixed Advance Instruction Windows (FAIW). However, fixed advance instruction windows are easier for a processor to manage since a particular instruction can only occupy a fixed number of locations in the window. For example, in an instruction window that contains eight instructions (I0-I7) and where instructions can be added or removed in groups of four, an instruction can occupy only one of two locations in the window (e.g., I0 and I4). In a variable advance instruction window, that instruction could occupy all of the locations in the window at different times; thus a processor that has a variable advance instruction window must have more resources to track each instruction's position than a processor that has a fixed advance instruction window.
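The tracking cost can be illustrated by enumerating the window locations an instruction may occupy. The sketch below assumes, for illustration only, that locations are numbered from 0 (where instructions leave) upward and that an advance of k moves every remaining instruction down by k locations; the function name is hypothetical.

```python
def reachable_slots(entry_slot, advance_sizes):
    """Return every window location an instruction may occupy over time.

    entry_slot:    location at which the instruction enters the window
    advance_sizes: group sizes by which the window is allowed to advance
    """
    seen = {entry_slot}
    frontier = [entry_slot]
    while frontier:
        slot = frontier.pop()
        for step in advance_sizes:
            nxt = slot - step
            if nxt >= 0 and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return sorted(seen)


# Fixed advance: eight locations, groups of four.
print(reachable_slots(4, [4]))        # [0, 4]
# Variable advance: eight locations, groups of one, two or three.
print(reachable_slots(7, [1, 2, 3]))  # [0, 1, 2, 3, 4, 5, 6, 7]
```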
Current designs use large queues to implement the instruction window. Using queues is disadvantageous for several reasons: a large amount of chip area is dedicated to a plurality of queues, especially when implementing a variable advance instruction window; there is limited flexibility in designing a system with more than one queue; and the control logic for directing data in queues is complex and inflexible.
Therefore, what is needed is a technique to "track" or monitor instructions as they move through the window. The system must be flexible and require a small area on a chip.
The present invention is directed to a technique for monitoring instruction execution of multiple instructions in parallel and out of program order using a system that assigns tags to the multiple instructions and maintains an instruction window that contains the multiple instructions. The system is a component of a superscalar unit which is coupled between a source of instructions and functional units which execute the instructions. The superscalar unit is in charge of maintaining the instruction window, directing instructions to the various functional units in the execution unit, and, after the instructions are executed, receiving new instructions from the source.
The present invention employs a tag monitor system, which is a part of the superscalar unit. The tag monitor system includes: a register file and a queue that operates on a First-In-First-Out basis (the queue is a multiple-advance, multiple output, recycling FIFO). The queue is coupled to the register file. The register file is coupled to the instruction source and is used to store instruction information (i.e., the resource requirements of each instruction). When an instruction is sent from the instruction source to the register file it is assigned a tag that is not currently assigned to any other instruction. The instruction information is then stored in the register file at an address location indicated by the tag of the instruction. Once an instruction's information is stored in the register file, it is said to be "in the instruction window." The tags of each instruction in the instruction window are stored in the queue. The tags are arranged in the queue in the same order as their corresponding instructions are arranged in the program.
When an instruction is finished, the queue advances and the tag of the instruction is effectively pushed out the bottom of the queue. The tag can then be reassigned to a new instruction that enters the instruction window. Accordingly, the tag is sent back to the top of the queue (in other words, it is recycled). It is also possible for several tags to be recycled at the same time when several instructions finish at the same time. In a preferred embodiment, instructions are required to finish in order. This is often necessary to prevent an instruction from incorrectly overwriting the result of another instruction. For example, if a program contains two instructions that write to the same location of memory, then the instruction that comes first in the program should write to the memory before the second. Thus, the results of instructions that are executed out of order must be held in some temporary storage area and the instructions themselves must remain in the instruction window until all previous instructions have been executed. When a group of instructions is completed, all of their results are moved from the temporary storage area to their real destinations. Then the instructions are removed from the window and their tags are recycled.
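The in-order completion requirement described above can be sketched as follows. This is a minimal software model, not a description of the hardware: the function name, the tuple layout of the window, and the use of dictionaries for temporary storage and memory are assumptions made for illustration.

```python
def retire_in_order(window, done, temp_results, memory):
    """Retire finished instructions from the bottom of the window, in order.

    window:       list of (tag, destination) pairs, oldest instruction first
    done:         set of tags whose instructions have finished executing
    temp_results: temporary storage, mapping tag -> result
    memory:       real destinations, mapping destination -> value
    Returns the tags that were retired and can now be recycled.
    """
    recycled = []
    # Stop at the first unfinished instruction, even if later ones are done.
    while window and window[0][0] in done:
        tag, dest = window.pop(0)
        memory[dest] = temp_results.pop(tag)  # move result to real destination
        recycled.append(tag)
    return recycled


# Tags 0 and 1 both write location "A"; tag 1 happens to finish first.
window = [(0, "A"), (1, "A"), (2, "B")]
temp_results = {1: 20, 2: 30}
done = {1, 2}
memory = {}

first_pass = retire_in_order(window, done, temp_results, memory)
print(first_pass, memory)   # [] {}

temp_results[0] = 10        # tag 0 finishes later
done.add(0)
second_pass = retire_in_order(window, done, temp_results, memory)
print(second_pass, memory)  # [0, 1, 2] {'A': 20, 'B': 30}
```

Because results are moved in program order, location "A" ends up holding the value written by the later instruction, as the program intends.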
The register file has write ports where new instruction information is received from the instruction source. The register file has a number of write ports equal to the number of new instructions that can be added to the window at one time. The register file has one entry for each instruction in the window. The register file also has one output port for every instruction in the window. Associated with each output port is an address port. The address port is used to select which register file entry's contents will be output on its corresponding output port.
The queue has an output for each slot (i.e., a specific buffer location in the queue) that shows the value of the tag stored in that slot. These outputs are connected to the read address ports of the register file. This connection causes the register file to provide an entry's contents on its corresponding output port when a tag value is presented by the queue to the read address ports. The outputs of the register file are sent to various locations in the superscalar unit and execution units where the instruction information is used for instruction scheduling, instruction execution, and the like.
It is possible that some of the locations in the instruction window may be empty at any given time. These empty window locations are called "bubbles." Bubbles sometimes occur when an instruction leaves the window and the instruction source cannot immediately send another instruction to replace it. If there are bubbles in the window, then some of the entries in the register file will contain old or bogus instruction information. Since all of the data in the register file is always available, there needs to be some way to qualify the data in the register file.
According to the present invention, a "validity bit" is associated with each entry in the instruction window to indicate whether the corresponding instruction information in the register file is valid. These validity bits can be held in the tag FIFO with the tags. There is one validity bit for each tag in the FIFO. These bits are updated each time a tag is recycled. If, when a tag is recycled, it gets assigned to a valid instruction, then the bit is asserted. Otherwise it is deasserted.
The validity bits are output from the tag monitor system along with the outputs of the register file. They are sent to the same locations as the outputs of the register file so that the superscalar unit or execution units will know if they can use the instruction information.
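The handling of bubbles with validity bits can be sketched as follows. This is a simplified software model under stated assumptions: the function names are hypothetical, the tag FIFO is modeled as a deque of (tag, valid) pairs with the oldest instruction first, and a bubble arises when the instruction source supplies fewer new instructions than the number of tags being recycled.

```python
from collections import deque


def advance(queue, regfile, count, new_infos):
    """Recycle `count` tags, updating each validity bit as its tag is recycled."""
    for i in range(count):
        tag, _ = queue.popleft()          # tag leaves the bottom of the FIFO
        if i < len(new_infos):
            regfile[tag] = new_infos[i]   # store info at the address given by the tag
            queue.append((tag, True))     # assigned to a valid instruction
        else:
            queue.append((tag, False))    # bubble: the entry keeps stale info


def qualified_outputs(queue, regfile):
    """Pair each register file output with the validity bit for its slot."""
    return [(regfile[tag], valid) for tag, valid in queue]


regfile = ["I0", "I1", "I2", "I3"]
queue = deque((tag, True) for tag in range(4))

# Two instructions leave, but the source can supply only one replacement.
advance(queue, regfile, 2, ["I4"])
print(qualified_outputs(queue, regfile))
# [('I2', True), ('I3', True), ('I4', True), ('I1', False)]
```

The last output still shows the old information for I1, and the deasserted validity bit tells the receiving logic not to use it.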
A feature of the present invention is that an instruction window can be maintained without storing instruction information in large queues. This simplifies design and increases operational flexibility. For example, for a window containing n instructions, the tag monitor system would contain a queue with n entries and a register file with n entries and n output ports. If each output of the queue is connected to its corresponding read address port on the register file (e.g., output 0 connected to read address port 0, output 1 connected to read address port 1, etc.) then the register file outputs will "display" (i.e., make available at the output ports) the information for each instruction in the window in program order (e.g., output port 0 will show instruction 0's information, output port 1 will show instruction 1's information, etc.). When the window advances, the queue advances and the addresses on the read address ports change. This causes the outputs of the register file to change to reflect the new arrangement of instructions in the window. It is necessary for the instruction information to be displayed in order on the register file outputs so that it can be sent to the rest of the superscalar unit in order. The superscalar unit needs to know the order of the instructions in the window so that it can schedule their execution and their completion.
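The arrangement described above, for n = 4, can be modeled in software as follows. This is an illustrative sketch only: the class and method names are hypothetical, the hardware ports are modeled as list indexing, and the instruction source is assumed always to supply a replacement for every recycled tag (so no bubbles occur).

```python
from collections import deque


class TagMonitor:
    """Software model of an n-entry register file addressed by a tag FIFO."""

    def __init__(self, initial_infos):
        # Entry i of the register file holds the information for tag i.
        self.regfile = list(initial_infos)
        # Tags in program order, oldest instruction first.
        self.queue = deque(range(len(self.regfile)))

    def advance(self, new_infos):
        """Retire the oldest instructions and recycle their tags."""
        for info in new_infos:
            tag = self.queue.popleft()  # tag pushed out the bottom of the queue
            self.regfile[tag] = info    # written at the address given by the tag
            self.queue.append(tag)      # tag recycled to the top of the queue

    def output_ports(self):
        """Queue output k drives read address port k of the register file."""
        return [self.regfile[tag] for tag in self.queue]


monitor = TagMonitor(["I0", "I1", "I2", "I3"])
monitor.advance(["I4", "I5"])
print(list(monitor.queue))     # [2, 3, 0, 1]
print(monitor.regfile)         # ['I4', 'I5', 'I2', 'I3']
print(monitor.output_ports())  # ['I2', 'I3', 'I4', 'I5']
```

In this model only the small tags move when the window advances. The information for I2 and I3 stays at the same register file addresses, yet the output ports still present the window in program order.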
Further features and advantages of the present invention, as well as the structure and operation of various embodiments of the present invention, are described in detail below with reference to the accompanying drawings.