Many data processing systems are designed with reduced instruction set computer ("RISC") data processors. RISC data processors are also known as load/store data processors, for reasons which will become apparent below. RISC data processors are characterized by several features that increase their performance relative to other types of data processors.
RISC data processors predominantly execute instructions which may be broken into several discrete sequential steps. A single piece of hardware within the RISC data processor is dedicated to the execution of each one of these discrete steps. Therefore, several similar instructions, each in a different phase of execution, may execute simultaneously. This performance strategy is known as instruction pipelining.
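The effect of instruction pipelining on throughput can be illustrated with a minimal sketch. It assumes an idealized pipeline in which every stage takes exactly one clock cycle and no instruction ever stalls; the function names are hypothetical and chosen only for this illustration.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction must finish before the next begins."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when a new instruction enters the first stage every cycle.

    Assumes an ideal pipeline: one cycle per stage and no stalls.
    """
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


def pipeline_occupancy(num_instructions: int, num_stages: int):
    """Return one row per cycle listing which instruction occupies each stage.

    Instructions are numbered from 0 in program order; None marks an idle stage.
    """
    table = []
    for cycle in range(pipelined_cycles(num_instructions, num_stages)):
        row = []
        for stage in range(num_stages):
            instr = cycle - stage  # instruction that entered 'stage' cycles ago
            row.append(instr if 0 <= instr < num_instructions else None)
        table.append(row)
    return table
```

Under these assumptions, four instructions in a three-stage pipeline need six cycles instead of twelve, and in the third cycle all three stages are busy with different instructions.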
The typical RISC data processor also concurrently executes two or more different types of instructions. It may do so if it incorporates two or more different pipelines ("execution units") corresponding to two or more different classes of instructions, such as a floating point execution unit, a fixed point execution unit, a branch unit, and a load/store unit. These data processors do not wait until a first instruction completes before beginning a second instruction. They begin executing an instruction as soon as there is a pipeline stage available to accept the instruction. Some concurrent execution data processors begin two or more instructions each clock cycle. These data processors are described as "superscalar."
The combination of instruction pipelining and multiple execution units allows a RISC data processor to execute many different instructions simultaneously.
One disadvantage associated with RISC data processors is the complexity arising from the different latencies associated with the various instructions that the data processor executes. Instruction latency is the time, typically measured in machine clock cycles, that each instruction takes to produce a result or perform a function. For performance reasons, each pipeline is optimized to minimize the time each instruction takes to produce a result. However, not all instructions produce a result in the same amount of time. As a result, a fast instruction, such as a fixed point add, may finish before a slow instruction, such as a floating point multiply, even when the slow instruction begins earlier. This scenario is referred to as "out-of-order completion." Out-of-order completion must be accounted for either by software or in the hardware itself if the data processor is to maintain a coherent programming model.
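Out-of-order completion can be illustrated with a minimal sketch. The `Instr` record, the `completion_order` function, and the latency values used in the example are assumptions made for illustration only; they do not describe any particular data processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    issue_cycle: int  # clock cycle in which execution begins
    latency: int      # clock cycles needed to produce a result


def completion_order(program):
    """Return instruction names in the order their results become available.

    'program' lists instructions in program order. Ties in finish time are
    broken by program order.
    """
    finish = [
        (instr.issue_cycle + instr.latency, index, instr.name)
        for index, instr in enumerate(program)
    ]
    return [name for _, _, name in sorted(finish)]


# Hypothetical latencies: a slow floating point multiply that begins first
# and a fast fixed point add that begins one cycle later.
program = [
    Instr("fmul", issue_cycle=0, latency=4),
    Instr("add", issue_cycle=1, latency=1),
]
```

With these assumed latencies, `completion_order(program)` places `add` ahead of `fmul`, even though `fmul` began earlier.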
Two hardware solutions for out-of-order completion are the rename buffer (or reorder buffer) and the history buffer. The rename buffer temporarily stores the result of each instruction as it is generated by an execution unit. This step is known as instruction write-back. The rename buffer writes the result of a particular instruction to the appropriate architectural register once all instructions preceding the particular instruction have written their results to the appropriate architectural registers. This step is known as instruction completion or retirement. A rename buffer thus masks the out-of-order execution from the architectural registers. A history buffer stores the data held in each register immediately before some instruction modifies the register. If, for instance, the data processor receives an interrupt, it can restore its state as of a predetermined point in time by loading the contents of the history buffer into the appropriate architectural registers.
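The two approaches described above can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any actual circuit: the class names, method names, instruction tags, and the use of a dictionary for the architectural registers are all hypothetical.

```python
from collections import OrderedDict

_PENDING = object()  # marks an entry whose result has not been written back


class RenameBuffer:
    """Holds results until every earlier instruction has completed."""

    def __init__(self, registers):
        self.registers = dict(registers)  # architectural registers
        self.entries = OrderedDict()      # tag -> [dest, result], program order

    def dispatch(self, tag, dest):
        """Allocate an entry in program order for an instruction writing 'dest'."""
        self.entries[tag] = [dest, _PENDING]

    def write_back(self, tag, value):
        """Record a result from an execution unit, then complete what is ready."""
        self.entries[tag][1] = value
        self._complete()

    def _complete(self):
        # Retire from the oldest entry only, so registers change in program order.
        while self.entries:
            tag, (dest, value) = next(iter(self.entries.items()))
            if value is _PENDING:
                break
            self.registers[dest] = value
            del self.entries[tag]


class HistoryBuffer:
    """Lets results update registers at once, but logs each overwritten value."""

    def __init__(self, registers):
        self.registers = dict(registers)
        self.log = []  # (dest, old value), in the order registers were modified

    def write(self, dest, value):
        self.log.append((dest, self.registers[dest]))
        self.registers[dest] = value

    def checkpoint(self):
        """Return a marker for the current register state."""
        return len(self.log)

    def restore(self, checkpoint):
        """Undo, newest first, every modification made after 'checkpoint'."""
        while len(self.log) > checkpoint:
            dest, old_value = self.log.pop()
            self.registers[dest] = old_value
```

In this sketch, if a later `add` writes back before an earlier `fmul`, the rename buffer leaves the destination register of `add` unchanged until `fmul` has also written back. The history buffer takes the opposite approach: it applies each result immediately and relies on `restore` to undo modifications, for instance when an interrupt is received.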
Both hardware solutions to the out-of-order completion problem themselves suffer from disadvantages. The rename buffer, for instance, is shared among several, if not all, execution units within a data processor. However, not all execution units generate data results of the same size. For instance, some execution units modify special purpose registers in addition to or in place of the architectural registers. These special purpose registers are handled separately from the ordinary architectural registers. Also, the value of a special purpose register, such as a "sticky bit," may be some function of a particular instruction and the result of an immediately preceding instruction of the same type. Consequently, the rename buffer is often enlarged so that it accommodates all possible instruction results and is combined with an elaborate pointer circuit that identifies the last special purpose register result of a particular class of instruction. Such a solution may be an expensive addition to a RISC data processor.
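The dependence of a sticky bit on the preceding instruction of the same type can be illustrated with a minimal sketch. It assumes the common convention that a sticky bit, once set, remains set until explicitly cleared; the function name and the per-instruction flags are hypothetical.

```python
def sticky_after_each(initial, instruction_flags):
    """Return the sticky bit value that follows each instruction.

    'instruction_flags' lists, in program order, whether each instruction of
    a given class raised the condition. Each new value depends on both the
    current instruction's flag and the value left by the preceding
    instruction of the same class.
    """
    values = []
    sticky = initial
    for flag in instruction_flags:
        sticky = sticky or flag  # once set, the bit stays set
        values.append(sticky)
    return values
```

Because each value is computed from its predecessor, the correct value for a given instruction cannot be produced without knowing the last result of the same class of instruction, which is why a shared rename buffer needs some means of locating that result.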