The function of a microprocessor is to execute programs. A program comprises a group of instructions, which the processor fetches from memory or another known storage location and then executes. The processing of a single instruction can be divided into several distinct steps or stages: the instruction must be fetched, the instruction must be decoded, the operands must be assembled, the specified operation must be performed, and the results must be written to their destination. The processing of instructions is controlled by a periodic clock signal whose period is the processor cycle time.
Processor performance can be improved by reducing the time it takes to execute a program. One technique for increasing performance is to overlap the steps involved in executing several instructions. This technique is called pipelining. Each step in the pipeline completes a portion of the execution of an instruction, and each such step is called a pipe stage. Adjacent pipe stages are separated by clocked registers or latches. Provided that each pipe stage has a dedicated portion of the processor, the steps required to execute an instruction can proceed independently in different pipe stages. The result of each pipe stage is passed to the next stage through the register between them. Although pipelining does not decrease the total time required to execute an individual instruction, it reduces the average number of cycles per instruction, and thus the time required to execute a program, by permitting the processor to handle more than one instruction at a time.
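The throughput benefit of pipelining can be made concrete with a little arithmetic. The following is a minimal sketch, assuming an ideal pipeline in which every stage takes exactly one cycle and no stalls occur; the function names are hypothetical. Without pipelining, n instructions on a k-stage datapath take n·k cycles; with pipelining, the first instruction takes k cycles to fill the pipeline and each later instruction completes one cycle after its predecessor, for k + (n − 1) cycles in total.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # Each instruction passes through every stage before the next one starts.
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    # The first instruction needs n_stages cycles to fill the pipeline;
    # after that, one instruction completes every cycle (no stalls assumed).
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        u = unpipelined_cycles(n, 5)
        p = pipelined_cycles(n, 5)
        print(f"{n:5d} instructions: {u:5d} cycles unpipelined, "
              f"{p:5d} pipelined, {p / n:.3f} cycles/instruction")
```

Under these assumptions each instruction still spends five cycles in the pipeline, but the average cost per instruction approaches one cycle as the instruction count grows.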
Superscalar processors issue multiple instructions per cycle. In this manner, a processor with multiple execution units can execute multiple instructions concurrently. A superscalar processor therefore executes instructions concurrently within the same pipeline stage as well as across different pipeline stages. One basic design approach is to provide separate integer and floating-point execution units, so that there are separate integer and floating-point pipelines.
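A simple way to see how separate integer and floating-point pipelines allow more than one instruction to issue per cycle is to group an instruction stream into issue packets. This is a minimal sketch under stated assumptions: one integer and one floating-point pipeline, in-order issue, and no data dependencies between instructions. The function name `issue_groups` and the `"int"`/`"fp"` labels are hypothetical.

```python
from typing import List


def issue_groups(stream: List[str]) -> List[List[int]]:
    """Group instruction indices into per-cycle issue packets.

    stream[i] is "int" or "fp". Assumes one integer and one floating-point
    pipeline, in-order issue, and no dependencies between instructions.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = set()
    for i, kind in enumerate(stream):
        if kind not in ("int", "fp"):
            raise ValueError(f"unknown instruction kind: {kind!r}")
        if kind in used:
            # That pipeline already received an instruction this cycle;
            # with in-order issue, this instruction and all later ones wait.
            groups.append(current)
            current, used = [], set()
        current.append(i)
        used.add(kind)
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    print(issue_groups(["int", "fp", "int", "int", "fp"]))
```

With this model, five instructions issue in three cycles: an integer and a floating-point instruction pair up in the first and last cycles, while two back-to-back integer instructions must issue in separate cycles.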
An integer pipeline may be divided into five stages: prefetch, decoding, address generation and operand fetch, execution, and writeback. During the prefetch stage, integer instructions are fetched from memory or an instruction cache. During the decoding stage, the fetched instructions are decoded to determine the operation each instruction specifies and the operands it requires. During the address generation and operand fetch stage, the addresses of any memory operands are computed and the necessary operands are gathered. The operation of the instruction is then carried out (i.e., executed) during the execution stage. Finally, during the writeback stage, the results produced by the operation are stored in memory or another known location (e.g., a register file).
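The overlap of these five stages is often drawn as a pipeline diagram, with one row per instruction and one column per clock cycle. The following is a minimal sketch, assuming an ideal pipeline in which instruction i enters prefetch in cycle i and never stalls; the short stage names are hypothetical abbreviations for the five stages listed above.

```python
from typing import List, Optional

# Hypothetical abbreviations: prefetch, decode, address generation /
# operand fetch, execute, writeback.
STAGES = ["PF", "DEC", "AG/OF", "EX", "WB"]


def stage_of(instr: int, cycle: int) -> Optional[str]:
    # Instruction i enters PF in cycle i (ideal pipeline, no stalls),
    # so in cycle c it occupies stage c - i, if that index is in range.
    idx = cycle - instr
    return STAGES[idx] if 0 <= idx < len(STAGES) else None


def diagram(n_instructions: int) -> List[str]:
    total_cycles = len(STAGES) + n_instructions - 1
    rows = []
    for i in range(n_instructions):
        cells = [(stage_of(i, c) or ".").ljust(6) for c in range(total_cycles)]
        rows.append(f"I{i}: " + "".join(cells).rstrip())
    return rows


if __name__ == "__main__":
    print("\n".join(diagram(3)))
```

Reading any column of the printed diagram shows several instructions occupying different stages in the same cycle, which is the overlap that pipelining exploits.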
A floating-point pipeline for executing floating-point instructions is essentially the same as the pipeline used for integer instructions, with one important difference: the execution stage may require multiple cycles, whereas typical integer operations complete in a single cycle (i.e., have a one-cycle latency). Thus, several execution stages may be needed to complete an operation, and the number of execution stages varies with the operation. Because multiple instructions are being executed at the same time in a pipelined processor, this latency can adversely affect instruction execution. In particular, because a floating-point instruction usually has a longer execution latency than an integer instruction, a later integer instruction may complete execution before an earlier floating-point instruction has completed.
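The out-of-order completion described above follows directly from unequal latencies. This is a minimal sketch, assuming in-order issue of one instruction per cycle, a fixed number of front-end cycles before execution, and separate pipelines with no contention; the latency values in `EXEC_LATENCY` and the opcode names are hypothetical, since real latencies vary by design and by operation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical execution latencies in cycles; real values vary by design.
EXEC_LATENCY = {"int_add": 1, "fp_add": 3, "fp_div": 10}


@dataclass
class Instr:
    name: str
    op: str


def completion_cycles(program: List[Instr], front_end: int = 3) -> List[int]:
    # In-order issue, one instruction per cycle: instruction i spends
    # front_end cycles before executing, then finishes after its latency.
    return [i + front_end + EXEC_LATENCY[ins.op] for i, ins in enumerate(program)]


def completion_order(program: List[Instr]) -> List[str]:
    done = completion_cycles(program)
    # Sort by finishing cycle; ties are broken by program order.
    order = sorted(range(len(program)), key=lambda i: (done[i], i))
    return [program[i].name for i in order]


if __name__ == "__main__":
    prog = [Instr("fdiv", "fp_div"), Instr("add1", "int_add"), Instr("add2", "int_add")]
    print(completion_cycles(prog), completion_order(prog))
```

In this example the floating-point divide is first in program order but last to finish, so both integer additions would complete, and could update processor state, before it does.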
On a conventional microprocessor architecture, instructions written by the programmer in a certain sequential order must be completed in precisely that order. Furthermore, exceptions must be handled in a manner that bears a well-defined relation to the instruction sequence. For example, an exception may be defined to be handled immediately before the execution of the next sequential instruction. Because the execution pipeline for a floating-point instruction has more stages than an integer pipeline, processor state may be updated by an integer instruction before a preceding floating-point instruction has completed. Thus, there is a need to halt execution of instructions in the pipeline to accommodate possible exception handling.
There is also a need to handle exceptions that occur during the execution of floating-point instructions. When such an exception occurs, it must be handled, and any action required to deal with it must be accommodated by the pipeline structures. Typically, some of the pipelined instructions following the floating-point instruction that caused the exception must be flushed. This includes flushing subsequent instructions from the pipeline whenever handling the exception indicates that the instructions currently being executed should not be in the pipeline.
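The flushing step can be sketched as a partition of the in-flight instructions by program order. This is a minimal illustration, not a description of any particular pipeline: it assumes each in-flight instruction carries a program-order sequence number (hypothetical bookkeeping), that the faulting instruction itself is retained for the exception handler to inspect, and that every younger instruction in both the integer and floating-point pipelines is discarded. The names `InFlight` and `flush_on_exception` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class InFlight:
    seq: int   # program-order sequence number (hypothetical bookkeeping)
    text: str
    pipe: str  # "int" or "fp"


def flush_on_exception(
    in_flight: List[InFlight], faulting_seq: int
) -> Tuple[List[InFlight], List[InFlight]]:
    """Split in-flight instructions at a faulting floating-point instruction.

    Instructions up to and including the faulting one are kept (the faulting
    instruction is left for the handler); every younger instruction, in
    either pipeline, is flushed so that it cannot update processor state.
    """
    kept = [x for x in in_flight if x.seq <= faulting_seq]
    flushed = [x for x in in_flight if x.seq > faulting_seq]
    return kept, flushed


if __name__ == "__main__":
    window = [
        InFlight(0, "fdiv f1, f2, f3", "fp"),
        InFlight(1, "add r1, r2, r3", "int"),
        InFlight(2, "sub r4, r1, r5", "int"),
        InFlight(3, "fadd f4, f1, f5", "fp"),
    ]
    kept, flushed = flush_on_exception(window, faulting_seq=0)
    print([x.seq for x in kept], [x.seq for x in flushed])
```

Selecting by sequence number rather than by pipeline is what allows a single exception in the floating-point pipeline to remove younger instructions from the integer pipeline as well.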
The latency of floating-point operations also affects performance when there are data dependencies. A data dependency exists when an instruction in the instruction stream requires the result of a preceding instruction for its execution. In that case, the dependent instruction, and with it the pipeline, must be stalled until the earlier instruction completes execution and its result becomes available.
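The cost of such a stall can be estimated by tracking when each register's new value becomes available. This is a minimal sketch, assuming in-order issue where each instruction may start executing one cycle after its predecessor, a read-after-write dependency that delays the consumer until the producer's result is ready, and no structural hazards; the `Op` type, register names, and latency values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Op:
    dest: str
    srcs: Tuple[str, ...]
    latency: int  # execution cycles (hypothetical values)


def schedule(program: List[Op]) -> List[Tuple[int, int]]:
    """Return (start_cycle, stall_cycles) for each op under in-order issue.

    An op may start one cycle after its predecessor started, but a
    read-after-write dependency delays it until the producing op's result
    is ready. Because issue is in order, later ops are held back as well.
    """
    ready: Dict[str, int] = {}  # register -> cycle its new value is available
    result: List[Tuple[int, int]] = []
    prev_start = -1
    for op in program:
        earliest = prev_start + 1
        start = max([earliest] + [ready.get(r, 0) for r in op.srcs])
        result.append((start, start - earliest))
        ready[op.dest] = start + op.latency
        prev_start = start
    return result


if __name__ == "__main__":
    prog = [
        Op("f1", ("f2", "f3"), 4),  # floating-point multiply
        Op("f4", ("f1", "f5"), 3),  # floating-point add that needs f1
        Op("r1", ("r2",), 1),       # independent integer op
    ]
    print(schedule(prog))
```

Here the floating-point add stalls for three cycles waiting for `f1`, and the independent integer operation behind it is delayed by the same amount because the whole in-order pipeline is held.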
Moreover, a multi-cycle execution latency delays the storing of results until execution has completed. However, certain operations, such as loading data from memory or another known location, do not require multiple execution cycles, and their results could be available early in the pipeline. Thus, there is a need to allow certain instructions to write back their results before the writeback stage and before the execution-stage latency period has elapsed.
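The benefit of early writeback can be sketched by comparing when a result reaches the register file with and without it. This is a minimal illustration under stated assumptions: a fixed-length floating-point pipeline in which, by default, every instruction writes back only after the longest execution latency, and a load that needs no execution cycles and can therefore write back right after operand fetch. The constants, opcode names, and the function `writeback_cycle` are hypothetical.

```python
FRONT_END = 3         # hypothetical: prefetch, decode, and operand-fetch cycles
MAX_EXEC_LATENCY = 5  # hypothetical: longest floating-point execution latency


def writeback_cycle(op: str, issue_cycle: int, early_writeback: bool) -> int:
    """Cycle in which an instruction's result is written to the register file.

    Without early writeback, every instruction in this fixed-length pipeline
    writes back after the longest execution latency. With early writeback,
    a load, which needs no execution cycles, writes back after operand fetch.
    """
    if early_writeback and op == "load":
        return issue_cycle + FRONT_END
    return issue_cycle + FRONT_END + MAX_EXEC_LATENCY


if __name__ == "__main__":
    for early in (False, True):
        print("early" if early else "normal", writeback_cycle("load", 0, early))
```

Under these assumptions a load's result becomes available `MAX_EXEC_LATENCY` cycles sooner, which shortens the stall of any instruction that depends on the loaded value; operations that genuinely need the execution stages are unaffected.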
As will be seen, the present invention provides a floating-point pipeline for executing floating-point instructions. The pipeline includes a stage for handling exceptions and stalling the pipeline when an exception exists, and a stage for fetching source operands from memory or the register file. The present invention further permits flushing of the integer and floating-point pipeline structures when exceptions arise, and allows some results to be written back before the writeback stage.