As data processing systems have evolved, the demand for higher performance has increased. Today, many data processing systems use pipelined processors to improve performance. A pipelined processor executes multiple instructions simultaneously, in an overlapping manner. With this technique, the pipelined processor completes a greater number of instructions within a given time, even though the execution time of an individual instruction increases slightly because of the added pipeline control. Typically, a pipelined processor executes an instruction in six stages: instruction fetch, instruction decode, data memory address generation, processor-resident operand fetch, instruction execution, and result writing.
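The trade-off between per-instruction latency and overall throughput can be made concrete with a simple timing model. This is a sketch under stated assumptions, not a model taken from this document: an unpipelined instruction is assumed to take time T, its work is assumed to divide evenly into k stages, and the pipeline control is assumed to add a fixed overhead δ per stage (T, k, δ and n are symbols introduced here for illustration).

```latex
\begin{align}
\tau &= \frac{T}{k} + \delta && \text{cycle time} \\
L_{\text{pipe}} &= k\tau = T + k\delta > T && \text{latency of one instruction} \\
T_{\text{pipe}}(n) &= (k + n - 1)\,\tau && \text{time for } n \text{ instructions} \\
S(n) &= \frac{nT}{(k + n - 1)\,\tau} \xrightarrow{\,n \to \infty\,} \frac{T}{T/k + \delta} < k && \text{speedup over unpipelined}
\end{align}
```

For example, with hypothetical values T = 12 ns, k = 6 (the six stages listed above) and δ = 0.5 ns, the cycle time is τ = 2.5 ns. A single instruction now takes 15 ns instead of 12 ns, but a new instruction completes every 2.5 ns in steady state, an asymptotic speedup of 12 / 2.5 = 4.8 rather than the ideal 6.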
Traditionally, all stages in a pipeline must advance at the same time. As a result, the time required to move an instruction one step down the pipeline is determined by the slowest pipe stage, and the throughput of such a traditional pipelined processor is therefore limited by that stage. To compensate for this limitation, current pipelined processor implementations either use faster pipe stages or allow functional units to execute independently at their own pace.
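The following Python sketch illustrates why the slowest stage sets the pace of a lockstep pipeline. The per-stage delays, the `STAGES` table and the function names are hypothetical; the model assumes one instruction enters the pipeline per cycle and ignores hazards.

```python
# Hypothetical per-stage delays (ns) for the six stages named above.
STAGES = {
    "fetch": 2.0,
    "decode": 2.0,
    "address_generation": 2.0,
    "operand_fetch": 5.0,  # deliberately the slowest stage in this example
    "execute": 2.0,
    "write": 2.0,
}


def cycle_time_ns(stage_delays, latch_overhead_ns=0.0):
    """In a lockstep pipeline every stage advances together, so the clock
    period must accommodate the slowest stage (plus any latch overhead)."""
    return max(stage_delays.values()) + latch_overhead_ns


def throughput_mips(stage_delays, latch_overhead_ns=0.0):
    """Steady-state throughput at one instruction per cycle, in millions of
    instructions per second (1000 divided by the period in ns)."""
    return 1000.0 / cycle_time_ns(stage_delays, latch_overhead_ns)


if __name__ == "__main__":
    print(throughput_mips(STAGES))  # 200.0 (5 ns period)
    faster_others = {**STAGES, "decode": 1.0, "execute": 1.0}
    print(throughput_mips(faster_others))  # still 200.0
    faster_slowest = {**STAGES, "operand_fetch": 2.5}
    print(throughput_mips(faster_slowest))  # 400.0
```

Speeding up any stage other than the slowest one leaves throughput unchanged in this model; only shortening the critical stage raises it.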
However, allowing the functional units to proceed independently at their own pace introduces various pipeline hazards. When a hazard is encountered, the offending instructions and the instructions that follow them are flushed and refetched. One example of a pipelined processor that uses such out-of-order execution is the PentiumPro™ processor available from Intel Corporation. It should be noted that PentiumPro™ is a trademark of Intel Corporation. The PentiumPro™ microprocessor allows memory read operations to be reordered ahead of some write operations. To perform this reordering, the central processing unit of the PentiumPro™ processor moves read operations around write operations, but the reordering is not observable from the program's point of view. For information regarding the PentiumPro™, refer to "Intel's . . . ," Byte Magazine, by Tom R. Halfhill, April 1995, pp. 42-58.
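One generic way to let reads move ahead of writes without the reordering being observable is conservative memory disambiguation. A read may execute early only if every older pending write has a known address that differs from the read's; a read that aliases an older write takes that write's value instead of reading memory. The Python sketch below illustrates this policy under stated assumptions. It is not a description of the PentiumPro™'s actual circuitry, and the names `PendingStore` and `resolve_load` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class PendingStore:
    """A write that has been issued but not yet committed to memory (hypothetical model)."""
    seq: int             # position of the store in original program order
    addr: Optional[int]  # None while the store's address is still being computed
    value: int


def resolve_load(
    load_seq: int,
    load_addr: int,
    store_queue: List[PendingStore],
    memory: Dict[int, int],
) -> Tuple[str, Optional[int]]:
    """Decide whether a read may execute ahead of older, uncommitted writes.

    Returns:
      ("stall", None)  if any older store address is unknown (aliasing cannot be ruled out),
      ("forward", v)   if an older store writes the same address; v is the youngest such value,
      ("bypass", v)    if no older store aliases, so the read may use memory now.
    Assuming stores reach memory in program order, any value returned matches
    what the read would observe if instructions executed strictly in order.
    """
    older = [s for s in store_queue if s.seq < load_seq]
    if any(s.addr is None for s in older):
        return ("stall", None)
    aliasing = [s for s in older if s.addr == load_addr]
    if aliasing:
        youngest = max(aliasing, key=lambda s: s.seq)
        return ("forward", youngest.value)
    return ("bypass", memory.get(load_addr, 0))  # unwritten locations read as 0 here


if __name__ == "__main__":
    memory = {0x100: 7, 0x200: 9}
    stores = [PendingStore(1, 0x100, 42), PendingStore(3, None, 5)]
    print(resolve_load(2, 0x200, stores, memory))  # ('bypass', 9)
    print(resolve_load(2, 0x100, stores, memory))  # ('forward', 42)
    print(resolve_load(4, 0x200, stores, memory))  # ('stall', None)
```

The stall case is the price of staying conservative: a later read waits until every older write address is known, which keeps the reordering invisible to the program.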
Additionally, the PowerPC™ 604 RISC microprocessor available from IBM Microelectronics also performs out-of-order instruction execution. It should be noted that PowerPC is a trademark of IBM Corporation. When dispatching an instruction, dispatch logic within the PowerPC™ 604 allocates the instruction to an appropriate execution unit. A reorder entry in a special completion buffer is allocated for each instruction, and dependency checking is performed between the instructions in a dispatch queue. Executed instructions are retired in the completion unit. In addition to storing the dispatched instructions, the completion unit updates register files and control registers as appropriate. Furthermore, the completion unit preserves sequential program semantics: it retires an instruction from the completion buffer only when all instructions ahead of it have completed and the instruction itself has finished execution. Thus, the completion unit, together with the reorder (completion) buffer, ensures that instructions executed out of order are retired in the same order in which they were originally provided.
While both the PowerPC™ 604 and PentiumPro™ microprocessor solutions provide significant advantages over more traditional pipelining implementations, the completion buffer in the PowerPC™ 604 and the reorder buffer in the Intel PentiumPro™ each require a dedicated table or memory storage location to track the original program order, and each requires results to be completed in order. Such strict ordering ensures that instructions accessing the same resource of a data processor execute in their original order, so that a correct result is obtained. Thus, these implementations require an extra table to perform the reordering and are also limited to completing instructions in order.
The extra table consumes additional circuit area and therefore increases the cost of the data processor.
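The in-order retirement rule described above for the completion buffer can be sketched as a simple queue. Entries are allocated in program order at dispatch, may finish in any order, and are released only from the head. This Python model is a hypothetical illustration (the class and method names are invented here), not IBM's implementation; it also shows why the buffer is an extra table whose capacity bounds how many instructions can be in flight.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    tag: int               # program-order sequence number (hypothetical field)
    finished: bool = False


class CompletionBuffer:
    """Hypothetical completion buffer: allocate in order, finish in any order, retire in order."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries = deque()  # oldest entry at the left (head)
        self.next_tag = 0

    def dispatch(self) -> int:
        """Allocate an entry at dispatch time; a full buffer forces dispatch to stall."""
        if len(self.entries) == self.capacity:
            raise RuntimeError("completion buffer full: dispatch must stall")
        tag = self.next_tag
        self.entries.append(Entry(tag))
        self.next_tag += 1
        return tag

    def finish(self, tag: int) -> None:
        """Mark an instruction as executed; execution order is unconstrained."""
        for entry in self.entries:
            if entry.tag == tag:
                entry.finished = True
                return
        raise KeyError(tag)

    def retire(self) -> list[int]:
        """Retire from the head only while the oldest remaining entry has finished."""
        retired = []
        while self.entries and self.entries[0].finished:
            retired.append(self.entries.popleft().tag)
        return retired


if __name__ == "__main__":
    cb = CompletionBuffer(capacity=4)
    a, b, c = cb.dispatch(), cb.dispatch(), cb.dispatch()
    cb.finish(b)
    cb.finish(c)         # younger instructions finish first
    print(cb.retire())   # []: the oldest instruction has not finished
    cb.finish(a)
    print(cb.retire())   # [0, 1, 2]: all three retire, in program order
```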
In addition to tracking the original program order, completion buffers typically have a completion pointer that indicates which resources should be released when an instruction completes. As previously discussed, a dispatched instruction may not execute for a significant amount of time. In the interim, instructions that appear later in the program sequence may execute. Because the completion buffer tracks every dispatched instruction and instructions execute out of order, the completion buffer may have to hold many instructions. Designing such a large completion buffer to meet cycle-time and area requirements is difficult at best.
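To see why the completion buffer can grow large, the following Python sketch simulates in-order dispatch, out-of-order finishing and in-order retirement, and reports the peak number of occupied entries. It assumes a hypothetical machine that dispatches and retires a fixed number of instructions per cycle, treats the head of the queue as the completion pointer, and uses invented per-instruction latencies; `peak_occupancy` is an illustrative name, not an existing API.

```python
from collections import deque
from typing import List


def peak_occupancy(latencies: List[int], dispatch_width: int = 1, retire_width: int = 1) -> int:
    """Return the most completion-buffer entries in use at once.

    latencies[i] is the (hypothetical) execution latency, in cycles, of the
    i-th instruction in program order. Instructions dispatch in order, finish
    out of order, and retire in order from the head of the buffer. The buffer
    is unbounded here, so the peak is the capacity needed to avoid stalls.
    """
    buffer = deque()  # (seq, finish_cycle), oldest first; the head acts as the completion pointer
    next_seq, cycle, peak = 0, 0, 0
    while next_seq < len(latencies) or buffer:
        # Dispatch: allocate entries in program order.
        for _ in range(dispatch_width):
            if next_seq < len(latencies):
                buffer.append((next_seq, cycle + latencies[next_seq]))
                next_seq += 1
        peak = max(peak, len(buffer))
        # Retire: release entries only from the head, and only once finished.
        for _ in range(retire_width):
            if buffer and buffer[0][1] <= cycle:
                buffer.popleft()
            else:
                break
        cycle += 1
    return peak


if __name__ == "__main__":
    print(peak_occupancy([1] * 30))                         # 2
    print(peak_occupancy([20] + [1] * 30))                  # 21
    print(peak_occupancy([20] + [1] * 30, retire_width=4))  # 21
```

In this model a single slow instruction at the head holds its entry for 20 cycles while younger instructions keep dispatching behind it, so the buffer needs about 21 entries. Widening retirement does not help, because nothing can retire past the unfinished oldest instruction.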
Therefore, a need exists for a mechanism which reduces the complexity associated with large completion buffers so that there is greater flexibility and better use of resources within a data processing system.