1. Technical Field of the Invention
The present invention relates generally to data processing and, in particular, to processors that support out-of-order instruction execution. Still more particularly, the present invention relates to a system and method for managing the execution of an instruction group having multiple executable instructions.
2. Description of the Related Art
The evolution of microprocessors has reached the point where architectural concepts pioneered in vector processors and mainframe computers of the 1960s and 1970s, such as the CDC-6600 and Cray-1, are appearing in Reduced Instruction Set Computing (RISC) processors. Early RISC machines were very simple single-chip processors. As Very Large Scale Integrated (VLSI) technology improves, additional space becomes available on a semiconductor chip. Rather than increase the complexity of a processor architecture, most designers have decided to use the additional space to implement techniques that improve the execution of their current processor architecture. Two principal techniques utilized are on-chip caches and instruction pipelines.
A next step in this evolutionary process is the superscalar processor. The name implies that these processors are scalar processors that are capable of executing more than one instruction in each cycle. The elements of superscalar execution are an instruction fetching unit that can fetch more than one instruction at a time from a cache memory; instruction decoding logic that can decide when instructions are independent and thus can be executed simultaneously; and sufficient execution units to process several instructions at one time. It should be noted that the execution units may be pipelined, e.g., they may be floating point adders or multipliers, in which case the cycle time for each stage matches the cycle times for the fetching and decoding logic. In many systems, the high level architecture has remained unchanged from earlier scalar designs. Superscalar processor designs typically use instruction level parallelism for improved implementations of these architectures.
Within a superscalar processor, instructions are first fetched, decoded and then buffered. Instructions can be dispatched to execution units out of program order as resources and operands become available. Additionally, instructions can be fetched and dispatched speculatively based on predictions about branches taken. The result is a pool of instructions in varying stages of execution, none of which have completed by writing final results. As resources become available and branches are resolved, instructions are "retired" in program order. This preserves the appearance of a machine that executes the instructions in program order.
A superscalar processor typically tracks, or manages, speculatively executed instructions using a completion buffer. Each executed instruction in the buffer is associated with its results, which are generally stored in rename registers, and with any exception flags. A retire unit removes these executed instructions from the buffer, typically in program order, and then updates the designated registers with the computed results from the rename registers. A problem arises, however, when instructions are executed out of order, in particular when one of the instructions encounters an exception condition. The processor architecture requires that when an instruction has an exception, the processor must stop at that point in the program. The effects of instructions that follow the excepting instruction must not be reflected in the state of the machine, and no instruction that precedes it may remain unexecuted. This characteristic is generally known as a precise exception or interrupt. By retiring instructions in order, the processor can maintain precise exceptions. To accomplish this, conventional processors typically associate each executable instruction with an exception flag. Thus, a completion buffer contains as many exception flags as there are instructions tracked by the buffer. Furthermore, a separate cycle is used to read the completion status of each individual instruction, including its exception flag, to determine whether the instruction can be retired. Therefore, even though the processor can execute more than one instruction every cycle, it is generally limited to retiring only one instruction per cycle per read port of the retire unit, which in turn limits the processing "throughput" of the processor.
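The conventional retirement scheme described above can be illustrated with a minimal software sketch. It is a behavioral model under stated assumptions, not a description of any particular processor: the names `BufferEntry`, `retire_cycle` and `cycles_to_drain` are hypothetical, and discarding the younger instructions when an exception is found is only one simple way to model a precise exception.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BufferEntry:
    """One tracked instruction (hypothetical model of a completion buffer slot)."""
    name: str
    finished: bool = False   # execution has produced a result
    exception: bool = False  # per-instruction exception flag


def retire_cycle(buffer, read_ports=1):
    """Model one retire cycle.

    Returns (retired_names, faulting_name_or_None). At most `read_ports`
    instructions are examined, oldest first, so retirement stays in order.
    """
    retired = []
    for _ in range(read_ports):
        if not buffer or not buffer[0].finished:
            break  # the oldest instruction is not ready; younger ones must wait
        head = buffer.popleft()
        if head.exception:
            # Assumption: younger instructions are discarded so that none of
            # their effects reach the architected state (precise exception).
            buffer.clear()
            return retired, head.name
        retired.append(head.name)
    return retired, None


def cycles_to_drain(buffer, read_ports=1):
    """Count the retire cycles needed to empty a buffer of finished instructions."""
    cycles = 0
    while buffer:
        retired, fault = retire_cycle(buffer, read_ports)
        if not retired and fault is None:
            raise RuntimeError("oldest instruction never finished")
        cycles += 1
    return cycles
```

With a single read port, five finished instructions take five retire cycles in this model, which is the throughput limit the paragraph above describes.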
Accordingly, what is needed in the art is an improved processor architecture that mitigates the above described limitations.
It is therefore an object of the present invention to provide an improved processor.
It is another object of the present invention to provide a group completion table that manages the execution of instruction groups having more than one executable instruction and a method of operation thereof.
To achieve the foregoing objects, and in accordance with the invention as embodied and broadly described herein, the present invention provides a group completion table (GCT) that manages the execution of instruction groups having more than one executable instruction. The GCT includes a plurality of table entries, wherein each of the table entries is associated with a respective instruction group. Each table entry in the GCT includes a plurality of instruction completion identifiers, wherein each of the instruction completion identifiers corresponds to a specific instruction in the associated instruction group. The table entry also includes a trouble identifier that is utilized to flag the occurrence of any exception condition encountered in the execution of any instruction in the instruction group. In a related embodiment, the trouble identifier utilized in the table entry is a single bit.
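The table entry layout described above can be sketched as a small data structure. This is an illustrative model only: the class and method names (`GctEntry`, `GroupCompletionTable`, `report`) are hypothetical, the group size of five is an assumption borrowed from one embodiment, and the trouble identifier is modeled as a single boolean.

```python
from dataclasses import dataclass, field

GROUP_SIZE = 5  # assumption: five instructions per group


@dataclass
class GctEntry:
    """One table entry: a completion bit per instruction, one trouble bit per group."""
    finished: list = field(default_factory=lambda: [False] * GROUP_SIZE)
    trouble: bool = False

    def report(self, slot, exception=False):
        """Record that the instruction in `slot` finished executing."""
        self.finished[slot] = True
        if exception:
            self.trouble = True  # any exception in the group sets the single bit

    def group_finished(self):
        return all(self.finished)


class GroupCompletionTable:
    """A fixed number of entries, each associated with one instruction group."""

    def __init__(self, num_entries):
        self.entries = [GctEntry() for _ in range(num_entries)]

    def report(self, entry_index, slot, exception=False):
        self.entries[entry_index].report(slot, exception)
```

In this sketch an entry stores five completion bits but only one trouble bit, so the exception state of the whole group can be checked in a single read.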
The present invention introduces a novel method utilizing a single trouble identifier in a group completion table to track and manage the execution status of all of the instructions in an instruction group. Unlike conventional techniques that typically employ a separate exception indicator for each instruction, the present invention utilizes a single exception indicator for each instruction group. In this manner, the instructions in the instruction group are retired en bloc in one cycle, in contrast to the multiple cycles (equal to the number of instructions in the instruction group) required when employing conventional techniques. The present invention significantly increases the processing throughput of a processor, a substantial consideration in the design and use of processors.
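The en bloc retirement described above can be sketched as follows. The model is hypothetical: each group is represented as a dictionary with a `finished` list and a `trouble` flag, the function names are illustrative, and what happens after a group reports trouble is left to the caller because it is not specified in this section.

```python
def retire_oldest_group(table):
    """Try to retire the oldest group in one cycle.

    `table` is a list of groups in program order, each a dict of the form
    {"finished": [bool, ...], "trouble": bool} (a hypothetical layout).
    Returns "retired", "wait", "trouble" or "empty".
    """
    if not table:
        return "empty"
    head = table[0]
    if not all(head["finished"]):
        return "wait"      # some instruction in the group is still executing
    if head["trouble"]:
        return "trouble"   # single check covers every instruction in the group
    table.pop(0)           # the whole group leaves the table in this cycle
    return "retired"


def count_group_retire_cycles(table):
    """Count the cycles spent retiring groups until the table empties or stalls."""
    cycles = 0
    while table and retire_oldest_group(table) == "retired":
        cycles += 1
    return cycles
```

In this model two fully finished, trouble-free groups of five instructions retire in two cycles, whereas checking ten instructions one per cycle would take ten.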
In one embodiment of the present invention, the GCT further includes a plurality of write ports that, preferably, are coupled to an equal number of execution units. Those skilled in the art should readily appreciate that the execution units generally include fixed and floating point execution units.
In another embodiment of the present invention, a table entry further comprises a single exception information identifier for all the instructions in the associated instruction group.
In yet another embodiment of the present invention, the instruction group has five instructions. It should be noted that the number of instructions in the instruction group is not limited to five but may be any arbitrary number greater than one. The present invention does not contemplate limiting its practice to any one particular number. In a related embodiment, the last instruction in an instruction group is a delimiter identifier. In an advantageous embodiment, the delimiter identifier is a branch (BR) instruction. Alternatively, in another advantageous embodiment, the delimiter identifier is a no-operation (no-op) instruction.
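One way to picture the grouping described above is the following sketch, which splits an instruction stream into groups whose last instruction is either a branch or a no-op. Several details are assumptions made for illustration rather than taken from the text: a branch closes its group early, a group without a branch is closed with a no-op after four instructions, short groups are not padded, and branches are recognized by a hypothetical `br` mnemonic prefix.

```python
GROUP_SIZE = 5  # assumption: five instructions per group, delimiter included


def is_branch(instr):
    """Hypothetical convention: branch mnemonics start with 'br'."""
    return instr.startswith("br")


def form_groups(stream):
    """Split `stream` into groups that each end in a branch or a no-op."""
    groups, current = [], []
    for instr in stream:
        current.append(instr)
        if is_branch(instr):
            groups.append(current)       # the branch is the delimiter
            current = []
        elif len(current) == GROUP_SIZE - 1:
            current.append("nop")        # no branch seen: close with a no-op
            groups.append(current)
            current = []
    if current:
        current.append("nop")            # close the final partial group
        groups.append(current)
    return groups
```

For example, `form_groups(["add", "ld", "br L1", "mul"])` yields one group ending in the branch and a second group closed by a no-op.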
In another embodiment of the present invention, each instruction group is associated with a group tag number that corresponds to a table entry. Furthermore, in a related embodiment, an instruction in an instruction group is identified by the instruction group's group tag number concatenated with a multiple bit mask. The multiple bit mask indicates the location of the instruction in its instruction group.
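The identifier scheme described above can be sketched with simple bit operations. The encoding details are assumptions chosen for illustration: the mask is taken to be one-hot with one bit per group position, position 0 is mapped to the least significant bit, the group size is five, and the function names are hypothetical.

```python
GROUP_SIZE = 5  # assumption: one mask bit per position in a five-instruction group


def make_instruction_id(group_tag, slot):
    """Concatenate the group tag with a one-hot mask marking `slot`."""
    if not 0 <= slot < GROUP_SIZE:
        raise ValueError("slot out of range")
    return (group_tag << GROUP_SIZE) | (1 << slot)


def split_instruction_id(instruction_id):
    """Recover (group_tag, slot) from an identifier built above."""
    mask = instruction_id & ((1 << GROUP_SIZE) - 1)
    group_tag = instruction_id >> GROUP_SIZE
    slot = mask.bit_length() - 1  # index of the single set bit
    return group_tag, slot
```

Under these assumptions the group tag selects the table entry directly, and the mask can be applied to that entry's completion identifiers without decoding a position number.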
The foregoing description has outlined, rather broadly, preferred and alternative features of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features of the invention will be described hereinafter that form the subject matter of the claims of the invention. Those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.