In computer systems that process instructions out of their sequential order, data dependencies between instructions that produce target data and instructions that use said target data as source data must be handled carefully. This is usually done by mechanisms known as "register renaming" and "register allocation".
A number of external instructions that have been fetched from the instruction stream are decoded into internal instructions, which are forwarded to a reservation station where they are analyzed for data dependencies. In the reservation station (or instruction window buffer), the instructions wait until all their source operands are available. As soon as this is the case, an instruction can be forwarded from the reservation station to one of the processor's functional units. Results produced by said functional units are written back to the instruction window buffer.
R. M. Tomasulo was the first to present an algorithm for dependency resolution concerning instructions that are to be processed out of order. A summary of his ideas can be found in the article "An efficient algorithm for exploiting multiple arithmetic units" by R. M. Tomasulo, IBM J. Res. Develop., pages 25-33, January 1967. This article is incorporated herein by reference.
The algorithm operates as follows: An instruction whose operands are not available when it enters the decode stage is forwarded to a reservation station. It waits in said reservation station until its data dependencies have been resolved and its operands are available. Once at a reservation station, an instruction can resolve its dependencies by monitoring the common data bus (the result bus). When all the operands for an instruction are available, it is dispatched to the functional unit for execution. Each register that is the target of a pending instruction is assigned a tag which identifies the result that will be written into the register. Since any register in the register file can serve as a source register for a later instruction, each register must be able to hold a tag. All instructions that are issued later and use said register as a source receive the same tag.
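The tagging and result-bus monitoring described above can be illustrated with a minimal software sketch. It is not taken from the cited article: the class and method names (`ReservationStation`, `issue`, `execute`, `broadcast`), the two supported operations, and the use of a running counter as the tag are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Optional

@dataclass
class Operand:
    value: Optional[int] = None  # filled in once the operand is available
    tag: Optional[int] = None    # tag of the pending producer, None if available

    @property
    def ready(self) -> bool:
        return self.tag is None

@dataclass
class Entry:
    op: str
    dest: str
    src1: Operand
    src2: Operand

class ReservationStation:
    """Hypothetical model: one station, tags are running entry numbers."""
    OPS = {"add": operator.add, "mul": operator.mul}

    def __init__(self, regs):
        self.regs = dict(regs)  # register name -> architected value
        self.reg_tag = {}       # register name -> tag of pending producer
        self.entries = {}       # tag -> waiting Entry
        self.next_tag = 0

    def _read(self, reg):
        # A register with a pending producer yields a tag instead of a value.
        if reg in self.reg_tag:
            return Operand(tag=self.reg_tag[reg])
        return Operand(value=self.regs[reg])

    def issue(self, op, dest, s1, s2):
        tag = self.next_tag
        self.next_tag += 1
        self.entries[tag] = Entry(op, dest, self._read(s1), self._read(s2))
        self.reg_tag[dest] = tag  # later readers of dest receive this tag
        return tag

    def ready_tags(self):
        return [t for t, e in self.entries.items()
                if e.src1.ready and e.src2.ready]

    def execute(self, tag):
        e = self.entries.pop(tag)
        result = self.OPS[e.op](e.src1.value, e.src2.value)
        self.broadcast(tag, result, e.dest)
        return result

    def broadcast(self, tag, result, dest):
        # Models the common data bus: every waiting entry compares tags.
        for e in self.entries.values():
            for s in (e.src1, e.src2):
                if s.tag == tag:
                    s.value, s.tag = result, None
        # Update the register only if this is still its latest producer.
        if self.reg_tag.get(dest) == tag:
            self.regs[dest] = result
            del self.reg_tag[dest]

rs = ReservationStation({"r1": 2, "r2": 3, "r3": 0, "r4": 0})
t0 = rs.issue("mul", "r3", "r1", "r2")  # r3 = r1 * r2
t1 = rs.issue("add", "r4", "r3", "r1")  # depends on t0 through r3
ready_before = rs.ready_tags()          # only t0 has all operands
rs.execute(t0)
ready_after = rs.ready_tags()           # t1 picked up r3 from the bus
rs.execute(t1)
```

In this sketch the second instruction never reads `r3` from the register file. It holds the tag of the first instruction and captures the value when that tag appears on the modeled result bus.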
Commonly, the entry number of the reservation station entry is used as the tag. The entry number can also be used for renaming: acquiring a tag for a result at the same time defines an entry in the reservation station, so that the entry number in the reservation station becomes the tag. This also holds if a reorder buffer, which is used in order to allow for precise interrupts, is incorporated into the reservation station. A solution with a combined reservation station and renaming approach can be found in the article "Instruction issue logic for high-performance, interruptible, multiple functional unit, pipelined computers" by G. S. Sohi, IEEE Transactions on Computers, vol. 39, number 3, March 1990, which is also incorporated herein by reference.
However, using just the entry number as a tag becomes problematic when speculative execution of instructions based on the prediction of conditional branches is attempted. A mispredicted branch causes several entries of the combined reservation station and reorder buffer, called the instruction window buffer, to be cleared. These are all the entries which are younger than the mispredicted branch. After the entries have been cleared, they are reused for loading the instructions from the correct path. The problem arises if an entry is reused while the previous, cleared instruction has already been sent to an execution unit and now wants to write back its result. Writing back said result, and especially picking it up by a dependent instruction, must be prohibited. The simple entry number is not enough to distinguish the two instructions. Some path ID has to be added.
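The stale write-back problem can be made concrete with a small sketch. The text above only states that some path ID has to be added. The particular scheme shown here, a per-entry counter that is incremented whenever the entry is purged, is an assumption chosen for illustration, as are the names `Tag`, `WindowBuffer`, `allocate`, `purge` and `write_back`. A hardware counter would have a fixed width and wrap around, which this sketch ignores.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tag:
    entry: int    # entry number in the instruction window buffer
    path_id: int  # hypothetical: value of the entry's purge counter at issue

class WindowBuffer:
    def __init__(self, size):
        self.path_id = [0] * size    # assumed per-entry purge counter
        self.valid = [False] * size
        self.result = [None] * size

    def allocate(self, entry):
        """Load an instruction into an entry and return its tag."""
        self.valid[entry] = True
        self.result[entry] = None
        return Tag(entry, self.path_id[entry])

    def purge(self, entries):
        """Clear entries younger than a mispredicted branch."""
        for e in entries:
            self.valid[e] = False
            self.path_id[e] += 1     # tags issued earlier become stale

    def write_back(self, tag, value):
        """Accept a result only if the tag matches the current occupant."""
        if not self.valid[tag.entry]:
            return False
        if tag.path_id != self.path_id[tag.entry]:
            return False
        self.result[tag.entry] = value
        return True

wb = WindowBuffer(4)
old = wb.allocate(2)   # instruction on the mispredicted path
wb.purge([2])          # branch resolved: entry 2 is cleared
new = wb.allocate(2)   # entry 2 reused for the correct path
stale_accepted = wb.write_back(old, 111)  # late result of the purged instruction
fresh_accepted = wb.write_back(new, 222)
```

Both tags name entry 2, so a comparison on the entry number alone would accept the late result. The additional path ID field is what allows the buffer to reject it.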
Instead of issuing instructions to one common reservation station, it might be advantageous to use separate window buffers, to which internal instructions are issued according to their type. Solutions exist in which separate window buffers are used for register operations (operations that modify registers) and storage operations (operations that write to or read from storage). Typically, an external instruction from the external instruction stream is decoded into a number of internal instructions, which are distributed to the different window buffers according to their type. This implies that a lot of data is exchanged between the internal instructions in the different window buffers, and that cross-referencing between related instructions in different window buffers occurs rather frequently.
In a solution where window buffer entries are numbered independently, a translation unit has to be implemented which translates entry numbers of one window buffer to entry numbers of another window buffer. Usually this is done by adding a constant. This implies additional hardware and a decrease in performance, because a large number of extra add operations have to be performed.
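As a minimal sketch of such a translation, assume two circular window buffers of equal size whose entry numbers differ by a fixed offset. The function name `translate_entry`, the buffer size of 8, the offset of 3 and the modulo wrap-around are all assumptions for illustration and are not taken from any particular implementation.

```python
def translate_entry(entry: int, offset: int, size: int) -> int:
    """Translate an entry number of one buffer into that of another.

    Assumes both buffers are circular with `size` entries, so the
    addition wraps around at the end of the buffer.
    """
    return (entry + offset) % size

SIZE = 8    # assumed number of entries per window buffer
OFFSET = 3  # assumed constant difference between the two numberings

# Every cross-reference between the buffers costs one such addition.
storage_entry = translate_entry(6, OFFSET, SIZE)
register_entry = translate_entry(storage_entry, -OFFSET, SIZE)
```

The sketch shows why the overhead accumulates: each cross-reference, in either direction, requires its own addition before the entry number can be used.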
There are further disadvantages to using separate, independent tag mechanisms.
In case of an exception, for example in case of a translation exception, instruction execution has to go back to the point where the exception occurred. When several separate window buffers are used, it is necessary to go back to the point where the exception occurred in each of these buffers. This implies that entry numbers have to be translated from one system to the other.
A similar problem arises when instructions have to be purged because a branch has been mispredicted. In this case, too, instruction execution has to go back to the purge point in each of the window buffers.
When instructions are committed, the result values produced by said instructions are recorded as architected register values. When separate window buffers are used, the relative order of the internal instructions distributed among the different window buffers has to be observed when committing said instructions. Here, too, a translation between the different tag systems is therefore necessary.