Almost all processors are designed to operate as a pipeline, the simplest of which consists of the fetch, decode, and execute stages. In the fetch stage, instructions are fetched (or read) from memory. In the decode stage, they are decoded to determine what operations to perform on which operands. The actual operations are performed in the execute stage. Most high-performance processors use additional pipeline stages to increase the operating speed, to increase the number of instructions that can be processed simultaneously (in one clock cycle), or to speculatively process instructions before it is known whether these instructions are to be processed at all.
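The three-stage flow described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the text: every stage takes exactly one clock cycle, there are no stalls or branches, and the function name run_pipeline and the instruction labels are hypothetical.

```python
def run_pipeline(program):
    """Simulate an idealized fetch/decode/execute pipeline.

    Returns one (fetch, decode, execute) tuple per clock cycle, showing
    which instruction occupies each stage. None marks an empty stage.
    """
    fetch = decode = execute = None
    pc = 0  # index of the next instruction to fetch
    trace = []
    # Keep clocking until nothing is left to fetch and the pipeline has drained.
    while pc < len(program) or fetch is not None or decode is not None:
        # On each clock edge, every instruction moves one stage forward.
        execute = decode
        decode = fetch
        if pc < len(program):
            fetch = program[pc]
            pc += 1
        else:
            fetch = None
        trace.append((fetch, decode, execute))
    return trace


if __name__ == "__main__":
    for cycle, stages in enumerate(run_pipeline(["i0", "i1", "i2"]), start=1):
        print(cycle, stages)
```

Under these assumptions, n instructions need n + 2 cycles, and once the pipeline is full, one instruction finishes its execute stage in every cycle.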
The results of executing instructions are stored in registers or in memory. Results that are used immediately or repeatedly are generally kept in registers, since registers can be accessed much faster than memory. The registers can be implemented using individual flip-flops or latches but are generally implemented as an SRAM array, known as a register file, to minimize the area they occupy. A 32-bit processor with 16 general-purpose registers, for example, would use a register file consisting of SRAM organized as 16 words of at least 32 bits per word. A register file is designed to support multiple read and write operations per clock cycle. For instance, a register file may support four read and two write operations to sustain execution of two instructions in each cycle, assuming that each instruction uses two operands and produces one result. Such a register file is said to have four read ports and two write ports. Processors may also have special-purpose registers that serve specific functions, such as keeping processor control and status information, providing debug or performance monitoring information, or aiding in the translation of virtual addresses to physical addresses. Although special-purpose registers may be better implemented as individual flip-flops and general-purpose registers as a register file, the same set of rules applies to reading and writing either type of register, as described below.
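The port arithmetic in the example above can be made concrete with a short behavioral model. This is a sketch, not a hardware description: the class name RegisterFile, the method cycle, and the rule that reads in a cycle see the values from before that cycle's writes are all assumptions made for illustration.

```python
class RegisterFile:
    """Behavioral model of a multi-ported register file."""

    def __init__(self, num_regs=16, width=32, read_ports=4, write_ports=2):
        self.mask = (1 << width) - 1      # keeps stored values within the word width
        self.words = [0] * num_regs       # e.g. 16 words of 32 bits
        self.read_ports = read_ports
        self.write_ports = write_ports

    def cycle(self, reads, writes):
        """Perform one clock cycle of accesses.

        reads  -- list of register numbers to read
        writes -- list of (register number, value) pairs to write
        Returns the values read. Assumption: reads observe the state from
        before this cycle's writes.
        """
        if len(reads) > self.read_ports:
            raise ValueError("not enough read ports")
        if len(writes) > self.write_ports:
            raise ValueError("not enough write ports")
        values = [self.words[r] for r in reads]
        for reg, value in writes:
            self.words[reg] = value & self.mask
        return values


if __name__ == "__main__":
    rf = RegisterFile()
    rf.cycle([], [(1, 5), (2, 7)])
    # Two instructions per cycle: four source reads and two result writes.
    print(rf.cycle([1, 2, 1, 2], [(3, 12), (4, 2)]))
```

With four read ports and two write ports, one call to cycle can serve two instructions that each read two operands and write one result. A fifth read in the same cycle is rejected.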
If an instruction is executed before all instructions that are earlier in the program sequence have executed, its results must not be written to the specified register or memory if the processor is to provide a programming model known as precise exceptions. Such behavior is required when an earlier instruction produces an error condition, in which case the results of this “prematurely executed” instruction must be discarded without affecting any of the processor's registers or memory. To be exact, the processor must behave as if it executed all instructions that are earlier than the one causing the error and none of the instructions that are later than the one causing the error. The results of any prematurely executed instructions must, therefore, be kept in temporary storage.
Many processors use a rename buffer to hold these temporary results until it is safe to update the intended destination registers or memory with the results. The rename buffer is said to hold the future state, as opposed to the architectural state, because it contains results that may or may not be written to their intended destination registers or memory. Once an instruction has executed without causing an error and all earlier instructions in the program sequence have completed, its results can be safely and permanently copied to its specified memory or destination registers. Such an instruction is said to be completed, and its destination registers are said to hold the architectural state. If an instruction causes an error, its results, as well as the results of any prematurely executed instructions in the rename buffer, are discarded.
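The completion and discard rules described above can be sketched as follows. This is a minimal model under stated assumptions: entries are kept in program order, results target registers only, and the names RenameBuffer, allocate, write_result, and complete are hypothetical rather than taken from any particular processor.

```python
class RenameBuffer:
    """Holds future results until they may update the architectural state."""

    def __init__(self, arch_regs):
        self.arch = arch_regs   # architectural registers: name -> value
        self.entries = []       # pending results, oldest instruction first

    def allocate(self, dest):
        """Reserve an entry, in program order, for an instruction's result."""
        entry = {"dest": dest, "value": None, "done": False, "error": False}
        self.entries.append(entry)
        return entry

    def write_result(self, entry, value=None, error=False):
        """Record the outcome of execution, which may happen in any order."""
        entry.update(value=value, done=True, error=error)

    def complete(self):
        """Complete executed instructions from the oldest onward.

        Stops at the first instruction that has not executed yet. If the
        oldest instruction caused an error, its result and every later
        result in the buffer are discarded.
        Returns the number of instructions completed.
        """
        completed = 0
        while self.entries and self.entries[0]["done"]:
            head = self.entries[0]
            if head["error"]:
                self.entries.clear()
                break
            self.arch[head["dest"]] = head["value"]
            self.entries.pop(0)
            completed += 1
        return completed


if __name__ == "__main__":
    regs = {"r1": 0, "r2": 0, "r3": 0}
    rb = RenameBuffer(regs)
    e1, e2, e3 = rb.allocate("r1"), rb.allocate("r2"), rb.allocate("r3")
    rb.write_result(e3, 30)          # executed prematurely
    rb.write_result(e1, 10)
    rb.complete()                    # only r1 becomes architectural
    rb.write_result(e2, error=True)  # an earlier instruction faults
    rb.complete()                    # r3's premature result is discarded
    print(regs)
```

In this sketch the result for r3 is produced first but never reaches the architectural registers, because the instruction writing r2 faults before r3 can be completed.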
Many high-performance processors execute a later instruction before an earlier one if the later instruction is ready to execute while the earlier one is not. They generally use an additional pipeline stage between the stage where the source operands are read and the stage where the instructions are executed, and they use a reservation station to hold the instructions in this intermediate stage. As an instruction enters the reservation station, it obtains its source operands from the instruction itself for immediate operands, or from memory, the register file, or the rename buffer for register operands. If a source operand is not yet valid in memory, the register file, or the rename buffer, it must be the destination of an earlier instruction that has not yet executed. When this earlier instruction is executed, its results are written to the rename buffer (assuming that all results are first written to the rename buffer before they are copied to memory or the register file) and to the source operand fields of the waiting instructions in the reservation station. The latter process, known as result forwarding, allows the waiting instructions to obtain their source operands without reading memory, the rename buffer, or the register file.
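Result forwarding into a reservation station can be sketched as follows. The model assumes that each pending result is identified by a tag and that an operand is either a valid value or the tag of its producer. The tag scheme and the names ReservationStation, issue, forward, and take_ready are illustrative assumptions, not details from the text.

```python
class ReservationStation:
    """Holds instructions until all of their source operands are valid."""

    def __init__(self):
        self.entries = []  # waiting instructions, in program order

    def issue(self, name, operands):
        """Add an instruction.

        Each operand is ("val", number) if it is already valid, or
        ("tag", producer_tag) if an earlier instruction has yet to produce it.
        """
        self.entries.append({"name": name, "operands": list(operands)})

    def forward(self, tag, value):
        """Write a newly produced result into every operand field waiting on it."""
        for entry in self.entries:
            entry["operands"] = [
                ("val", value) if op == ("tag", tag) else op
                for op in entry["operands"]
            ]

    def take_ready(self):
        """Remove and return the instructions whose operands are all valid."""
        ready, waiting = [], []
        for entry in self.entries:
            if all(kind == "val" for kind, _ in entry["operands"]):
                ready.append(entry)
            else:
                waiting.append(entry)
        self.entries = waiting
        return ready


if __name__ == "__main__":
    rs = ReservationStation()
    rs.issue("add", [("tag", "t0"), ("val", 3)])  # earlier, waits on t0
    rs.issue("sub", [("val", 9), ("val", 4)])     # later, already ready
    print([e["name"] for e in rs.take_ready()])
    rs.forward("t0", 5)                           # producer of t0 executes
    print([e["name"] for e in rs.take_ready()])
```

Here the later sub instruction leaves the reservation station first, and add becomes ready only after the value tagged t0 is forwarded into its operand field.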
Rename buffer is one of many names for the storage elements used to hold future results until the results can be safely and permanently written to their intended destination registers or memory. Likewise, reservation station is one of many names for the storage elements used to hold the source operands of instructions waiting to be executed.
The advantage of the operand file is that it eliminates the copying of results and operands between the register file, reservation station, and rename buffer, thereby greatly simplifying the design and reducing area and power consumption. Furthermore, it can also be used in multithreaded processors that spawn child threads by copying some or all of the parent thread's registers to each child thread's registers.