1. Field of the Present Invention
The present invention generally relates to the field of microprocessors and more particularly to a microprocessor utilizing a non-stalling execution pipeline for improved performance.
2. History of Related Art
The use of pipelined architectures in the design of microprocessor systems is well known. Pipelining improves performance by overlapping the execution of multiple instructions. In a pipelined microprocessor, the execution of each instruction occurs in stages, where each stage ideally completes in one clock cycle. Additional information concerning pipelining is available in Hennessy and Patterson, Computer Architecture: A Quantitative Approach, pp. 125-214 (Morgan Kaufmann 2d ed. 1996). Turning to FIG. 3, a simplified representation of an execution pipeline 300 in a conventional processor is presented. Pipeline 300 includes a set of latches or registers 302a, 302b, etc. (collectively or generically referred to herein as latches 302). Each latch 302 represents the termination of one pipeline stage and the beginning of another. In FIG. 3, pipeline 300 is full, such that each latch 302 contains information corresponding to an instruction that is proceeding through the pipeline. Each stage of pipeline 300 includes a functional logic block, represented in FIG. 3 by reference numerals 304a, 304b, etc., that defines the operation of the corresponding pipeline stage.
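As a rough illustration of the overlap described above, the sketch below compares the cycle counts of an unpipelined and an ideally pipelined processor and lists which instruction occupies each stage in a given cycle. It assumes every stage takes exactly one cycle and that no hazards or stalls occur; the function names are hypothetical.

```python
from typing import List, Optional


def unpipelined_cycles(n_instr: int, n_stages: int) -> int:
    # Without overlap, each instruction passes through all stages alone.
    return n_instr * n_stages


def pipelined_cycles(n_instr: int, n_stages: int) -> int:
    # Ideal overlap: the first instruction needs n_stages cycles, and each
    # later instruction completes one cycle after its predecessor.
    return 0 if n_instr == 0 else n_stages + n_instr - 1


def occupancy(n_instr: int, n_stages: int, cycle: int) -> List[Optional[int]]:
    # Index of the instruction in each stage during `cycle` (0-based);
    # instruction i enters stage 0 in cycle i. None means the stage is empty.
    return [cycle - s if 0 <= cycle - s < n_instr else None
            for s in range(n_stages)]
```

Under these assumptions, 100 instructions through a five-stage pipeline take `pipelined_cycles(100, 5)`, or 104 cycles, instead of the 500 cycles given by `unpipelined_cycles(100, 5)`.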
If an instruction flowing through pipeline 300 generates an exception at any stage, the pipeline must be stalled so that instructions in the pipeline do not collide. FIG. 3 indicates a stall condition signal 306 generated by logic block 304a. Stall condition signal 306 indicates that logic block 304a is unable to complete its assigned function for the current instruction (Instruction A) within the single-cycle timing constraint. Because Instruction A did not complete successfully, Instruction A must be retained in latch 302a for at least one more cycle. In addition, stall signal 306 must be routed to the preceding pipeline stages so that the instructions in each of those stages are not advanced in pipeline 300.
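The stall behavior described above can be modeled at the level of whole latches, as in the following sketch. It assumes the stall condition is reported as the index of the stage whose instruction could not complete; that stage and every earlier stage hold their contents, the stage after it receives a bubble, and later stages continue to drain. The names `advance` and `Latches` are hypothetical, and the model says nothing about gate-level timing.

```python
from typing import List, Optional, Tuple

Latches = List[Optional[str]]  # latches[0] feeds the first stage; None = bubble


def advance(latches: Latches, incoming: Optional[str],
            stall_stage: Optional[int]) -> Tuple[Latches, Optional[str], bool]:
    """One clock edge of a conventional stalling pipeline.

    Returns (new latch contents, instruction retired this cycle,
    whether `incoming` was accepted into the first stage).
    """
    n = len(latches)
    if stall_stage is None:
        # Normal case: every instruction moves one stage forward.
        return [incoming] + latches[:-1], latches[-1], True
    new: Latches = [None] * n
    # The stalling stage and every earlier stage hold their contents.
    new[: stall_stage + 1] = latches[: stall_stage + 1]
    # Later stages keep draining; new[stall_stage + 1] stays None (a bubble).
    new[stall_stage + 2:] = latches[stall_stage + 1: n - 1]
    # Nothing retires if the stalled instruction is in the last stage.
    retired = latches[-1] if stall_stage < n - 1 else None
    return new, retired, False
```

For instance, `advance(["D", "C", "B", "A"], "E", 1)` holds D and C, inserts a bubble behind C, and returns `(['D', 'C', None, 'B'], 'A', False)`.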
In a conventionally designed pipeline such as pipeline 300, an instruction is stalled by feeding the output of each latch 302 back to the latch's input. These feedback loops are indicated in FIG. 3 by reference numerals 308a, 308b, etc. Accordingly, each latch 302 can receive its input from one of two sources, namely, the output of the preceding stage or the output of the latch itself. In a typical configuration, this dual-input feature is implemented using a multiplexer for each bit of a latch 302, as depicted in FIG. 4. FIG. 4 illustrates the output of a bit 310 of a latch 302 being routed back to one of the inputs of a multiplexer 312k. The other input to multiplexer 312k is received from the output of a preceding stage in pipeline 300. Stall signal 306 serves as the select input to multiplexer 312k. It will be appreciated that the structure of FIG. 4 is repeated for each bit position in latch 302, so the number of multiplexers 312k that stall signal 306 must drive increases with the number of bits in latch 302. In addition, stall signal 306 must be routed to preceding stages to stall the instructions in preceding latches. This routing may require signal 306 to travel a considerable distance over an interconnect with an associated capacitive loading. Together, the number of multiplexers 312k driven by signal 306 and the distance that signal 306 must travel set a lower bound on the time required for stall signal 306 to stall pipeline 300. For processors with wide pipelines (i.e., 64 bits or more) operating at high frequencies (i.e., frequencies in excess of 1 GHz), stall signal 306 may be unable to halt the pipeline within a single cycle. Therefore, it would be desirable to implement a processor with a wide execution pipeline capable of high-speed execution free from the constraints imposed by the need to accommodate pipeline stalls.
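The per-bit hold/advance multiplexing of FIG. 4 and the resulting load on the stall signal can be sketched as follows. Bits are modeled as Python integers 0 and 1, and the fan-out count is a first-order figure that deliberately ignores the interconnect capacitance mentioned above; all function names are hypothetical.

```python
from typing import List


def mux2(select: int, in0: int, in1: int) -> int:
    # 2:1 multiplexer: select == 0 passes in0, select == 1 passes in1.
    return in1 if select else in0


def next_latch_value(latch_bits: List[int], prev_stage_bits: List[int],
                     stall: int) -> List[int]:
    # One multiplexer per bit position. The stall signal drives every select
    # input: 1 chooses the latch's own output (hold), 0 chooses the
    # preceding stage's output (advance).
    return [mux2(stall, new, old) for new, old in zip(prev_stage_bits, latch_bits)]


def stall_select_loads(bits_per_latch: int, latches_stalled: int) -> int:
    # First-order count of select inputs the stall signal must drive;
    # wire length and capacitance are not modeled.
    return bits_per_latch * latches_stalled
```

For example, stalling three 64-bit latches means the stall signal drives `stall_select_loads(64, 3)`, or 192, multiplexer select inputs before any wiring is considered.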
The problem identified above is addressed by the microprocessor, related method, and data processing system disclosed herein. The microprocessor includes a dispatch unit suitable for issuing an instruction executable by the microprocessor, an execution pipeline configured to receive the issued instruction, and a pending instruction unit. The pending instruction unit includes a set of pending instruction entries, and a copy of the issued instruction is maintained in one of these entries. In response to detecting a condition that prevents the instruction from successfully completing one of the stages in the pipeline during a current cycle, the execution pipeline is adapted to record an exception status with the copy of the instruction in the pending instruction unit and to advance the instruction to the next stage in the pipeline in the next cycle, thereby preventing the condition from stalling the pipeline. Preferably, the dispatch unit, in response to the instruction finishing pipeline execution with an exception status, is adapted to use the copy of the instruction to re-issue the instruction to the execution pipeline in a subsequent cycle. In one embodiment, the dispatch unit is adapted to deallocate the copy of the instruction in the pending instruction unit in response to the instruction successfully completing pipeline execution. The pending instruction unit may detect successful completion of the instruction by detecting when the instruction has been pending for a predetermined number of cycles without recording an exception status. In this embodiment, each entry in the pending instruction unit may include a timer field comprising a set of bits, wherein the number of bits in the timer field equals the predetermined number of cycles. The pending instruction unit may set successive bits in the timer field in successive cycles, such that successful completion of an instruction is indicated when the last bit in the timer field is set.
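A minimal sketch of one pending-instruction entry and its timer field, assuming the timer is a bit vector that fills one bit per cycle from the least significant end. The class `PendingEntry`, the function `step`, and the returned action strings are hypothetical names; so is the choice to clear the timer and exception status when an instruction is re-issued, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass
class PendingEntry:
    """Hypothetical model of one entry in the pending instruction unit."""
    instruction: str
    timer_width: int         # number of timer bits = predetermined cycle count
    timer: int = 0           # the timer field, as a bit vector
    exception: bool = False  # exception status recorded by the pipeline

    def tick(self) -> None:
        # Each cycle, set the next bit above those already set.
        if not self.timer_expired():
            self.timer |= 1 << self.timer.bit_length()

    def timer_expired(self) -> bool:
        # The instruction has left the pipeline once the last bit is set.
        return ((self.timer >> (self.timer_width - 1)) & 1) == 1

    def record_exception(self) -> None:
        self.exception = True


def step(entry: PendingEntry) -> str:
    """Advance one cycle and return the action the dispatch unit would take."""
    entry.tick()
    if not entry.timer_expired():
        return "pending"
    if entry.exception:
        # Assumption: the re-issued copy starts with a cleared timer and status.
        entry.timer, entry.exception = 0, False
        return "reissue"
    return "deallocate"  # pending for the full count with no exception
```

With a four-bit timer and no exception, four successive calls to `step` return `"pending"` three times and then `"deallocate"`; if `record_exception` is called first, the final call returns `"reissue"` instead.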
In one embodiment, the pending instruction unit holds a copy of each instruction pending in the execution pipeline at any given time. In various embodiments, the execution pipeline may comprise a load/store pipeline, a floating point pipeline, or a fixed point pipeline.
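The overall flow can be sketched as a cycle-level toy model in which no stage ever holds its latch: a failing stage records an exception in the instruction's pending copy, the instruction keeps moving, and the dispatch unit re-issues it after it leaves the last stage. The function `run`, its single-issue dispatch, and the `faults` set are hypothetical simplifications. The model ignores data dependencies and the ordering rules a real processor would enforce, so instructions may complete out of program order.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Set, Tuple


def run(program: List[str], n_stages: int,
        faults: Set[Tuple[str, int]]) -> Tuple[List[str], int]:
    """Toy non-stalling pipeline; returns (completion order, cycle count).

    Instruction names must be unique. `faults` lists hypothetical
    (instruction, stage) pairs whose first attempt cannot complete in one
    cycle; a second attempt always succeeds.
    """
    issue_queue: Deque[str] = deque(program)
    pipeline: List[Optional[str]] = [None] * n_stages  # pipeline[0] = first stage
    pending: Dict[str, bool] = {}  # instruction copy -> exception status
    fired: Set[Tuple[str, int]] = set()
    completed: List[str] = []
    cycles = 0

    while issue_queue or any(insn is not None for insn in pipeline):
        cycles += 1
        # 1. A stage that cannot finish records an exception in the pending
        #    copy instead of raising a stall signal.
        for stage, insn in enumerate(pipeline):
            key = (insn, stage)
            if insn is not None and key in faults and key not in fired:
                fired.add(key)
                pending[insn] = True
        # 2. The instruction in the last stage finishes pipeline execution.
        done = pipeline[-1]
        if done is not None:
            if pending[done]:
                pending[done] = False         # clear status, keep the copy
                issue_queue.appendleft(done)  # dispatch re-issues it next
            else:
                del pending[done]             # success: deallocate the copy
                completed.append(done)
        # 3. Every stage advances unconditionally; nothing is held.
        pipeline = [None] + pipeline[:-1]
        # 4. Dispatch issues at most one instruction per cycle.
        if issue_queue:
            insn = issue_queue.popleft()
            pipeline[0] = insn
            pending.setdefault(insn, False)   # allocate (or reuse) the copy
    return completed, cycles
```

With three stages and a single first-attempt failure of instruction B in stage 1, `run(["A", "B", "C"], 3, {("B", 1)})` returns `(['A', 'C', 'B'], 8)`: only B pays for the failure, taking two cycles more than the fault-free run, while A and C finish in the same cycles they would without it.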