Modern integrated data processors achieve remarkable performance by incorporating advanced features that were once the exclusive domain of mainframe-class processing systems: instruction pipelining, superscalar instruction dispatch, and out-of-order instruction execution. These strategies increase the number of instructions executed per clock cycle. Unfortunately, they also increase the complexity of modern integrated data processors.
The advanced features of modern integrated data processors increase their complexity in at least two ways. First, a data processor must now carefully coordinate when it updates a destination architectural register with the result of an instruction. Each instruction must update its destination architectural register in the original program order if the data processor is to be considered "precise." When an instruction causes an exception, a precise data processor appears to have executed all instructions preceding that instruction and none of the instructions following it. As a result, the advanced features of a precise data processor, and the complications they introduce, are invisible to a software programmer. Second, a high instruction completion rate requires a high-performance supply of instruction operands during instruction dispatch. Each instruction operand must be available as soon as a preceding instruction determines it and must be easily accessible from wherever it is stored.
Reorder or rename buffers are often used to coordinate when each instruction copies its result to the appropriate architectural register. When an instruction is dispatched, coordinating circuitry allocates a unique entry in one of these buffers to receive that instruction's result. Each execution unit writes its result into the specified entry whenever it finishes executing the instruction; this completion order usually differs from the original program order. The buffer, in turn, writes the contents of each entry to the appropriate architectural destination register once the instruction that generated those contents is the oldest instruction in the data processor. Reorder and rename buffers are typically first-in-first-out ("FIFO") queues, so the oldest instruction can be identified by maintaining a pointer to the earliest dispatched instruction still in the buffer.
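The allocate-at-dispatch, fill-out-of-order, write-back-in-order discipline described above can be sketched in Python. This is a minimal model under stated assumptions, not a hardware design: the names ReorderBuffer, RobEntry, dispatch, finish, and retire are hypothetical, the buffer is a fixed-size circular FIFO whose head index plays the role of the oldest-instruction pointer, and an exception is modeled as a flag set when an instruction finishes. When the head entry carries that flag, it and every younger entry are discarded, leaving the architectural registers in a precise state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RobEntry:
    dest: str                      # architectural destination register
    value: Optional[int] = None    # result, filled in when execution finishes
    done: bool = False
    exception: bool = False        # hypothetical flag: instruction faulted


class ReorderBuffer:
    """Hypothetical circular FIFO reorder buffer (a sketch, not a real design)."""

    def __init__(self, size, arch_regs):
        self.slots = [None] * size
        self.head = 0      # pointer to the oldest (earliest dispatched) entry
        self.tail = 0      # slot that the next dispatched instruction receives
        self.count = 0
        self.arch = dict(arch_regs)    # architectural register file

    def dispatch(self, dest):
        """Allocate a unique entry at dispatch; the slot index serves as the tag."""
        if self.count == len(self.slots):
            raise RuntimeError("reorder buffer full; dispatch must stall")
        tag = self.tail
        self.slots[tag] = RobEntry(dest)
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return tag

    def finish(self, tag, value, exception=False):
        """An execution unit writes its result, in any order."""
        entry = self.slots[tag]
        entry.value, entry.done, entry.exception = value, True, exception

    def retire(self):
        """Write finished results to architectural registers in program order.

        Returns (tags retired, tag of faulting instruction or None). This
        sketch drains every ready entry at once; hardware would retire a
        bounded number per cycle.
        """
        retired = []
        while self.count:
            entry = self.slots[self.head]
            if not entry.done:
                break                      # oldest instruction still executing
            if entry.exception:
                fault = self.head
                # Precise exception: discard the faulting entry and all younger ones.
                self.slots = [None] * len(self.slots)
                self.head = self.tail = self.count = 0
                return retired, fault
            self.arch[entry.dest] = entry.value
            self.slots[self.head] = None
            retired.append(self.head)
            self.head = (self.head + 1) % len(self.slots)
            self.count -= 1
        return retired, None


rob = ReorderBuffer(4, {"r1": 0, "r2": 0, "r3": 0})
a = rob.dispatch("r1")             # oldest instruction
b = rob.dispatch("r2")
c = rob.dispatch("r3")             # youngest instruction
rob.finish(c, 30)                  # c finishes first, out of program order
rob.finish(a, 10)
first = rob.retire()               # only a retires; b is still executing
rob.finish(b, 0, exception=True)
second = rob.retire()              # b faults, so b and c are discarded
print(first, second, rob.arch)     # ([0], None) ([], 1) {'r1': 10, 'r2': 0, 'r3': 0}
```

Retiring only from the head is what keeps the state precise: the younger instruction c finishes first, but its result never reaches the architectural register file ahead of the older, faulting instruction b.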
Future files are often used to provide instruction operands quickly at instruction dispatch. A future file holds, for some or all of the architectural registers, the most recent value of each register with respect to the original program order. The contents of a future file are also stored in a reorder or rename buffer, along with older values of the same architectural registers. If no instruction generates an exception, the reorder buffer will write its copy of the future file data into the architectural register at some "future" time. The future file can therefore provide the most current expected value of a particular architectural register from one convenient location at instruction dispatch. The future file also provides its data to an instruction execution unit before the reorder buffer writes its copy of that data to the appropriate architectural register, so the execution unit need not wait for the reorder or rename buffer to update the architectural register before it begins executing the instruction.
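A future file's role at dispatch can be modeled in a few lines. In this minimal sketch, FutureFile, FutureEntry, read_operand, dispatch_writer, and complete are hypothetical names. As an assumption not taken from the text, each register records the tag of its youngest in-flight writer, and a completing instruction updates the register only if it is still that youngest writer; this is one common way to keep only the most recent value in program order. An older result that arrives late is ignored here because the reorder or rename buffer keeps its own copy.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass
class FutureEntry:
    value: int
    pending_tag: Optional[int] = None    # tag of the youngest in-flight writer, if any


class FutureFile:
    """Hypothetical future file: most recent value of each register in program order."""

    def __init__(self, arch_regs: Dict[str, int]):
        self.regs = {name: FutureEntry(v) for name, v in arch_regs.items()}

    def read_operand(self, reg: str) -> Union[int, Tuple[str, int]]:
        """At dispatch: return the value if known, else ('wait', tag) naming its producer."""
        entry = self.regs[reg]
        if entry.pending_tag is None:
            return entry.value
        return ("wait", entry.pending_tag)

    def dispatch_writer(self, dest: str, tag: int) -> None:
        # The newly dispatched instruction becomes the youngest writer of dest.
        self.regs[dest].pending_tag = tag

    def complete(self, dest: str, tag: int, value: int) -> None:
        entry = self.regs[dest]
        # Update only if no younger instruction has since claimed dest;
        # otherwise this result is older in program order than what dest expects.
        if entry.pending_tag == tag:
            entry.value = value
            entry.pending_tag = None


ff = FutureFile({"r1": 5, "r2": 7})
ff.dispatch_writer("r1", tag=0)          # i0 writes r1
ff.dispatch_writer("r1", tag=1)          # i1, younger, also writes r1
waiting = ff.read_operand("r1")          # ('wait', 1): consumer waits on i1, not i0
ff.complete("r1", tag=1, value=42)       # i1 finishes first
ff.complete("r1", tag=0, value=9)        # i0's older result does not overwrite r1
print(waiting, ff.read_operand("r1"), ff.read_operand("r2"))
```

Either way, a dispatching instruction looks in one place, the future file, rather than searching the reorder buffer or waiting for the architectural register to be updated.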
Two future file architectures have been widely discussed: James E. Smith and Andrew R. Pleszkun, Implementation of Precise Interrupts in Pipelined Processors, in 12th Annual Symposium on Computer Architecture, at 41 (IEEE, June 1985), and Mike Johnson, Superscalar Microprocessor Design, at 94-95 (1991). Each of these two strategies is a compromise between operand access speed and exception handling speed. The future file described by Smith and Pleszkun provides fast lookup of operand data but requires numerous cycles to correct its data in the event of an exception. Conversely, the future file described by Johnson has a slower lookup cycle for operand data but requires only one cycle to correct its data in the event of an exception.
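To illustrate why correcting a future file after an exception can take many cycles, the sketch below assumes that recovery copies the precise architectural register file back into the future file through a limited number of write ports. The function recover_by_copy, the 32-register file, and the two-port figure are hypothetical; the point is only that the cycle count grows with the number of registers divided by the number of ports.

```python
from typing import Dict, Tuple


def recover_by_copy(arch: Dict[str, int], ports_per_cycle: int) -> Tuple[Dict[str, int], int]:
    """Rebuild a future file from the precise architectural register file.

    Hypothetical model: at most `ports_per_cycle` registers can be written
    per cycle. Returns the restored future file and the cycles spent.
    """
    if ports_per_cycle < 1:
        raise ValueError("need at least one write port")
    restored: Dict[str, int] = {}
    regs = list(arch.items())
    cycles = 0
    for start in range(0, len(regs), ports_per_cycle):
        # One cycle: copy the next group of registers through the write ports.
        for name, value in regs[start:start + ports_per_cycle]:
            restored[name] = value
        cycles += 1
    return restored, cycles


arch = {f"r{i}": i * 10 for i in range(32)}    # hypothetical 32-register file
future, cycles = recover_by_copy(arch, ports_per_cycle=2)
print(cycles)                                  # 16
```

The sketch does not model the one-cycle correction attributed to Johnson's future file, nor the slower operand lookup that comes with it.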