For a long time, processors have executed instructions in their sequential order. This implies that instructions are forwarded to the processor's execution units in the same order as they appear in the program sequence, one after the other.
In such a system, read and write operations to any of the processor's registers occur in exactly the same sequence as indicated by the program. Therefore, there exists a one-to-one correspondence between the registers and their values. At any given point in execution, a register identifier precisely identifies the value contained in the corresponding register. This register value also represents the actual machine state and can be referred to as the architected register value.
In order to achieve higher instruction throughput, and thus higher performance, processors that issue, or initiate execution of, multiple independent instructions per clock cycle were introduced. Such processors are known as superscalar processors. Multiple instructions can be executed in a single cycle, as long as there are no data dependencies, procedural dependencies, or resource conflicts. When such dependencies or conflicts exist, only the first instruction in a sequence can be executed. As a result, the plurality of functional units in a superscalar architecture cannot always be fully utilized.
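The data-dependency condition described above can be illustrated with a minimal sketch. The names `Instr`, `data_dependency` and `can_issue_together` are hypothetical and do not appear in the source. The sketch considers register data dependencies only, and it assumes, conservatively, that any read-after-write, write-after-read or write-after-write overlap prevents two instructions from being issued in the same cycle. Procedural dependencies and resource conflicts are not modeled.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction model: one target register, several sources."""
    dest: Optional[str]      # register written, or None if nothing is written
    srcs: Tuple[str, ...]    # registers read


def data_dependency(first: Instr, second: Instr) -> Optional[str]:
    """Return the kind of register dependency of `second` on `first`, or None."""
    if first.dest is not None and first.dest in second.srcs:
        return "RAW"   # second reads the register that first writes
    if second.dest is not None and second.dest in first.srcs:
        return "WAR"   # second overwrites a register that first still reads
    if first.dest is not None and first.dest == second.dest:
        return "WAW"   # both write the same register
    return None


def can_issue_together(first: Instr, second: Instr) -> bool:
    """Assumption: any register dependency blocks same-cycle issue."""
    return data_dependency(first, second) is None


add = Instr("r1", ("r2", "r3"))   # r1 <- r2 + r3
sub = Instr("r4", ("r1", "r5"))   # r4 <- r1 - r5, needs the result of add
mul = Instr("r6", ("r7", "r8"))   # r6 <- r7 * r8, independent of add
```

With these definitions, `add` and `mul` may be issued together, whereas `sub` has to wait for `add`.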
The next step in the development of high-performance processors was the introduction of out-of-order processing. Out-of-order processors depart from the instruction sequence when executing a program, and process instructions in an order different from the sequential order.
But if an instruction A produces target data, and said target data is needed by an instruction B as source data, this data dependency has to be taken care of. When instructions are issued out-of-order, correspondence between registers and values breaks down. Several register values corresponding to one logical register may exist in parallel, because each write access to a certain logical register creates a new instance of said register.
The values of different register instances must not be confused. Therefore, register arrays have to be provided that can hold and identify a multitude of values per logical register. Before instructions can be dispatched to any of the execution units, it has to be indicated which instances of the addressed logical registers are to be used. The task of identifying the actual storage cell representing a logical register at a given moment is usually referred to as "register renaming".
A new instance of a certain logical register is created each time a write access to said logical register occurs. Thus, each instruction that modifies any register produces a new physical instance of said register, and for each new instance, a physical register in the register array has to be allocated.
When allocating a new physical register each time a logical register is modified, there also has to exist a mechanism for getting rid of old register instances. Otherwise, the system would accumulate an indefinite amount of register instances. An instance can be destroyed when its value is superseded and there are no outstanding references to said value.
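The allocation and release of register instances described above may be sketched as follows. The class `RenameTable` and its methods are hypothetical. The sketch assumes a simple free list and a per-register count of outstanding readers, and it applies the release condition exactly as stated above: an instance is released once it has been superseded and no outstanding reference remains. Recovery of an earlier machine state is deliberately not modeled here.

```python
class RenameTable:
    """Hypothetical mapping of logical registers to physical instances."""

    def __init__(self, logical_regs, num_physical):
        regs = list(logical_regs)
        if num_physical < len(regs):
            raise ValueError("need at least one physical register per logical one")
        # Initially, logical register i is held in physical register i.
        self.current = {reg: i for i, reg in enumerate(regs)}
        self.free = list(range(len(regs), num_physical))
        self.readers = [0] * num_physical         # outstanding references
        self.superseded = [False] * num_physical  # a newer instance exists

    def rename_source(self, logical):
        """Identify the instance to read and record an outstanding reference."""
        phys = self.current[logical]
        self.readers[phys] += 1
        return phys

    def rename_target(self, logical):
        """A write access creates a new instance of the logical register."""
        if not self.free:
            raise RuntimeError("no free physical register; dispatch must stall")
        old = self.current[logical]
        new = self.free.pop(0)
        self.current[logical] = new
        self.superseded[old] = True
        self._try_release(old)
        return new

    def source_read_done(self, phys):
        """An instruction has read its source; drop the reference."""
        self.readers[phys] -= 1
        self._try_release(phys)

    def _try_release(self, phys):
        # Release only when superseded AND no outstanding references remain.
        if self.superseded[phys] and self.readers[phys] == 0:
            self.superseded[phys] = False
            self.free.append(phys)
```

In this sketch, a physical register that is still referenced by a waiting instruction stays allocated after a newer instance has been created, and it returns to the free list only when the last reference has been dropped.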
When processing instructions in their sequential order, there always exists a defined "state" of the processor. This defined state must be saved for a program that is suspended, in order to provide for the possibility of "precise interrupts". In case an exception occurs, the processor has to be able to return to said defined state. Also in case a branch has been mispredicted, and several instructions following said branch have speculatively been executed, the processor has to be able to return to a well-defined, non-speculative machine state.
The question arises how said state, and the corresponding architected register values, can be defined in an out-of-order processing system. Even though instructions are processed out-of-order, it is desirable to advance said architected state in order.
One approach for defining an architected in-order state is the following: If an instruction is completed and all previous instructions have also been completed, the instruction's results can be stored as the corresponding register's in-order state, and the instruction can be considered "retired". Thus, the architected state of an out-of-order processing system can be defined by the most recently completed instruction of the continuous string of completed instructions. The corresponding architected register values are the values at the moment said instruction was completed. In case of exceptions, and in case of mispredicted branches, the machine resumes instruction execution at said architected state.
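The definition given above can be made concrete with a small sketch. The function names and the representation of a program as a list of (target register, result) pairs are assumptions made for illustration only. The architected state is obtained by applying the results of the continuous string of completed instructions, and it stops at the first instruction that has not yet completed, even if later instructions have already finished.

```python
def retired_count(completed):
    """Length of the continuous string of completed instructions.

    `completed` lists one flag per instruction in program order.
    """
    count = 0
    for done in completed:
        if not done:
            break
        count += 1
    return count


def architected_state(initial, program, completed):
    """Apply the results of all retired instructions to the initial state.

    `program` is a list of (target_register, result_value) pairs in
    program order (a hypothetical, simplified instruction model).
    """
    state = dict(initial)
    for target, value in program[:retired_count(completed)]:
        state[target] = value
    return state


program = [("r1", 10), ("r2", 20), ("r1", 30)]
completed = [True, False, True]   # the third instruction finished early
state = architected_state({"r1": 0, "r2": 0}, program, completed)
```

Here only the first instruction counts as retired, so the result of the third instruction, although already computed, is not yet part of the architected state.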
One concept, both for handling different register instances and for continuously advancing said architected in-order state, is the use of a reorder buffer in combination with a register file. Said reorder buffer is implemented as a first-in first-out (FIFO) buffer. When an instruction is decoded, it is assigned an entry at the top of the reorder buffer. When the instruction completes, its result value is written back to the allocated entry. When the value reaches the bottom of the buffer, and if no exception has occurred, it is written to the register file. If the instruction is not complete when it reaches the bottom, the reorder buffer does not advance until the instruction completes. While the speculative values of various register instances are contained in the reorder buffer, the register file holds the architected register values and thus defines the in-order state. In case an exception or a misprediction of a branch occurs, the contents of the reorder buffer are discarded and the in-order state is accessed.
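The behavior of such a reorder buffer may be illustrated by the following minimal sketch. The class `ReorderBuffer`, its method names and the dictionary-based entries are hypothetical. The sketch assumes that the register file is a plain dictionary, and that an exception is detected when the affected entry reaches the bottom of the buffer.

```python
from collections import deque


class ReorderBuffer:
    """Hypothetical FIFO reorder buffer in front of a register file."""

    def __init__(self, register_file):
        self.register_file = register_file   # architected, in-order values
        self.entries = deque()               # left end = bottom, right end = top

    def allocate(self, target):
        """On decode: assign an entry at the top of the buffer."""
        entry = {"target": target, "value": None, "done": False, "exception": False}
        self.entries.append(entry)
        return entry

    def write_back(self, entry, value, exception=False):
        """On completion: store the result in the allocated entry."""
        entry["value"] = value
        entry["done"] = True
        entry["exception"] = exception

    def retire(self):
        """Move completed entries from the bottom to the register file.

        Stops at the first incomplete entry. If the bottom entry raised an
        exception, all speculative entries are discarded instead.
        Returns the number of instructions retired.
        """
        retired = 0
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["exception"]:
                self.entries.clear()   # fall back to the in-order state
                break
            self.entries.popleft()
            self.register_file[head["target"]] = head["value"]
            retired += 1
        return retired


register_file = {"r1": 0, "r2": 0}
rob = ReorderBuffer(register_file)
first = rob.allocate("r1")
second = rob.allocate("r2")
rob.write_back(second, 7)         # the younger instruction completes first
stalled = rob.retire()            # nothing retires: the bottom entry is incomplete
rob.write_back(first, 5)
advanced = rob.retire()           # both entries now retire in order
```

Note that every retired result is copied from a buffer entry into the register file, which is the transfer step discussed in the following paragraphs.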
One disadvantage of this solution is that register values have to be transferred from said reorder buffer entries to the register file. In case a precise interrupt occurs, the values of said register file are accessed.
There exists a variety of different solutions that use a separate register file for holding the architected register values. The temporary values of the different register instances may either be contained, as described, in the reorder buffer, or in the instruction window itself, or in a separate temporary register array. All these solutions have one disadvantage in common: Register values have to be transferred from a temporary register storage, no matter how said storage is implemented, to a register array or a register file holding the architected in-order register values.
In the international application PCT/JP93/00553, "A system and method for retiring instructions in a superscalar microprocessor" to J. Wang, S. Garg and T. Deosaran, a system and method for keeping track both of architected state and rename instances of an out-of-order processing system's logical registers is provided. According to the technique disclosed, results of instructions executed out-of-order are first stored in a temporary buffer, until all previous instructions have been executed.
As soon as all previous instructions have been executed, and their results have been stored in order in a register array, the results of the instruction in question can be written to said register array, and the instruction is considered retired. To maintain the integrity of register array data, results of instructions are not written to the register array until the results of all previous instructions have been written. In this manner, the machine state is updated in sequential order. The solution described comprises means for assigning and writing instruction results to a temporary storage location, for transferring results from temporary storage to the register array, so that the register array is updated in-order, and for accessing both the temporary storage and the register array for subsequent operations.
Again, retiring register values is done by transferring them to a register array which holds the "final register values". Constant data traffic between said temporary register array and said final register array is therefore necessary.
In case the actual value of a certain logical register is to be determined, it first has to be checked whether there exists an instance in said temporary register file. In case there is no temporary instance, said final register array has to be accessed. This data access in two steps requires both time and additional logic.
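The two-step access described above can be sketched as follows. The function `read_register` and the two dictionaries are hypothetical. The sketch assumes that the temporary storage keeps, per logical register, a list of in-flight instance values ordered from oldest to youngest.

```python
def read_register(logical, temporary, final):
    """Return the most recent value of a logical register.

    Step 1: check the temporary storage for an in-flight instance.
    Step 2: only if none exists, access the final register array.
    """
    instances = temporary.get(logical)
    if instances:                  # at least one temporary instance exists
        return instances[-1]       # youngest instance
    return final[logical]


final_array = {"r1": 5, "r2": 7}
temporary_array = {"r1": [11, 12]}   # two instances of r1 are still in flight

value_r1 = read_register("r1", temporary_array, final_array)
value_r2 = read_register("r2", temporary_array, final_array)
```

Every read has to pass through the first check before the final register array can be consulted, which corresponds to the additional time and logic mentioned above.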