1. Field of the Invention
The present invention generally relates to digital computers and more particularly to pipeline processing and pipeline processors.
2. Description of the Prior Art
A typical prior art pipeline architecture is shown in FIG. 1. It includes five stages in the pipeline data path. The first is an instruction fetch stage, into which an instruction on the instruction bus is strobed during each pipeline processor clock cycle. Next is an instruction decode stage, in which the instruction read into the instruction fetch register is decoded during the next clock cycle. An ALU stage follows and executes the decoded instruction during the next clock cycle. The ALU is used to calculate arithmetic results (including the comparison of operands for a conditional branch) and to calculate memory addresses, as in the case of a load-word-from-memory instruction. The number of cycles that must intervene before an operand loaded from memory is available for use in a subsequent instruction is called the load latency, and is a function of the access time of the memory. Systems usually have a load latency of no more than one clock cycle. Code optimization support software can often fill a single latency cycle with a useful instruction.
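By way of illustration only, the effect of load latency on an instruction sequence can be sketched as follows. The instruction tuples, the register names, and the function schedule_with_nops are hypothetical and are not taken from FIG. 1; the sketch assumes that results of non-load instructions are usable by the very next instruction.

```python
def schedule_with_nops(program, load_latency=1):
    """Insert no-operation slots so a loaded register is never read too early.

    program: list of (op, dest, srcs) tuples (hypothetical encoding).
    """
    scheduled = []
    ready_at = {}  # register -> first slot index at which its value may be read
    for op, dest, srcs in program:
        slot = len(scheduled)
        # The instruction may issue once every source register is ready.
        earliest = max([ready_at.get(r, 0) for r in srcs] + [slot])
        scheduled.extend([("nop", None, ())] * (earliest - slot))
        scheduled.append((op, dest, srcs))
        if dest is not None:
            # A load result arrives load_latency slots later than other results.
            delay = load_latency if op == "load" else 0
            ready_at[dest] = earliest + 1 + delay
    return scheduled


def count_nops(scheduled):
    return sum(1 for op, _, _ in scheduled if op == "nop")


# The load result is used immediately: one latency cycle is wasted.
dependent = [("load", "r1", ("r2",)), ("add", "r3", ("r1", "r4"))]
# An independent subtract fills the latency cycle with useful work.
filled = [("load", "r1", ("r2",)), ("sub", "r5", ("r6", "r7")),
          ("add", "r3", ("r1", "r4"))]

print(count_nops(schedule_with_nops(dependent)))  # 1
print(count_nops(schedule_with_nops(filled)))     # 0
```

The second sequence corresponds to the case in which code optimization software finds a useful instruction to occupy the single latency cycle.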
The next stage is a fill stage to accommodate a one-cycle data fetch load latency. It provides a register through which data moves in a single cycle so that all instructions can be processed in the same number of steps regardless of whether or not the instruction requires a memory access. The final stage in the pipeline is a write back stage. It takes one cycle.
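The uniform five-step flow described above can be sketched, under simplifying assumptions, as a stall-free timing model. The stage abbreviations (IF, ID, ALU, FILL, WB) and the function names are chosen here for illustration and do not appear in the drawings.

```python
# Hypothetical abbreviations for the five stages of the data path.
STAGES = ("IF", "ID", "ALU", "FILL", "WB")


def pipeline_diagram(n_instructions):
    """Map each instruction index to {cycle: stage} for an ideal pipeline.

    Instruction i enters the first stage in cycle i and advances one stage
    per clock cycle, whether or not it requires a memory access.
    """
    return {
        i: {i + s: name for s, name in enumerate(STAGES)}
        for i in range(n_instructions)
    }


def total_cycles(n_instructions):
    """Cycles needed to drain n instructions through the pipeline."""
    if n_instructions == 0:
        return 0
    return n_instructions + len(STAGES) - 1


diagram = pipeline_diagram(3)
print(diagram[1][3])    # ALU: instruction 1 is in the ALU stage in cycle 3
print(total_cycles(3))  # 7
```

Because every instruction occupies the fill stage for one cycle, all instructions reach the write back stage a fixed number of cycles after they are fetched.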
Bypass logic allows data in the pipeline to be used in the execution of subsequent instructions before the data reaches the final stage. An operand register associated with the ALU stores operands for ALU operations.
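A minimal sketch of the bypass selection follows, assuming that results still in the pipeline are held as (destination register, value) pairs ordered from youngest to oldest. The function read_operand and this data layout are illustrative assumptions, not a description of the circuit of FIG. 1.

```python
def read_operand(reg, register_file, in_flight):
    """Return the most recent value of reg.

    in_flight: results not yet written back, ordered youngest first
    (hypothetical representation of the pipeline latches).
    """
    for dest, value in in_flight:
        if dest == reg:
            return value  # bypassed from a pipeline latch
    return register_file[reg]  # no newer value in flight


register_file = {"r1": 10, "r2": 20}
in_flight = [("r1", 99), ("r1", 55)]  # two pending writes to r1

print(read_operand("r1", register_file, in_flight))  # 99
print(read_operand("r2", register_file, in_flight))  # 20
```

Searching youngest first ensures that, where several pending results target the same register, the operand supplied to the ALU is the one a sequential execution would have produced.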
The prior art recognizes that a major impediment to pipelined computer efficiency is the fact that the instruction decode is often followed by address preparation. This sequence takes approximately the same time as the instruction fetch, thus causing the operand fetch to occur just when it would be desirable to initiate the next instruction fetch for effective pipeline operation. In addition to the conflict between fetch cycles, the address preparation and execution cycles also conflict.
As one solution to this problem, U.S. Pat. No. 4,613,935 ('935) describes the use of two ALUs so that address preparation does not have to wait for instruction execution, or vice versa. The '935 patent does not suggest using the additional ALU for any function other than address calculation.
In certain systems the memory access has a load latency of two clock cycles. It is not practical to fill two load latency cycles with useful instructions, so a load latency of two cycles results in an excessive number of no-operation instructions, which reduce the overall efficiency of the pipeline operation. Prior art proposals for adding an address calculation adder stage ahead of the ALU in the pipeline are not altogether satisfactory. For example, the result of a subtract instruction, or of another arithmetic or logical operation, may be needed to calculate a memory address in the execution of a subsequent LOAD instruction. If the subtraction is performed in the ALU stage of the pipeline, the result will not be available for the next LOAD instruction until after a one-cycle delay. Similarly, with respect to conditional branch instructions, a comparison of operands must be made as soon as possible in order to minimize the delay caused by the pipeline.
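The one-cycle delay in the subtract-then-LOAD case can be illustrated with a simple timing calculation. The stage indices below, with a hypothetical address adder stage (ADDR) placed immediately ahead of the ALU stage, are assumptions made for this sketch; full bypassing of results is also assumed.

```python
# Hypothetical stage positions for a pipeline with an address adder
# placed one stage ahead of the ALU.
STAGE = {"IF": 0, "ID": 1, "ADDR": 2, "ALU": 3}


def stall_cycles(produce_stage, consume_stage, distance=1):
    """Cycles a consumer must wait for a producer issued `distance` slots earlier.

    The producer's result exists at the end of produce_stage; the consumer
    needs it at the start of consume_stage.
    """
    ready = STAGE[produce_stage] + 1          # first cycle the result is usable
    needed = distance + STAGE[consume_stage]  # cycle in which it is wanted
    return max(0, ready - needed)


# Subtract in the ALU stage, next instruction is a LOAD whose address is
# formed in the earlier adder stage: one cycle is lost.
print(stall_cycles("ALU", "ADDR"))  # 1
# Subtract followed by another ALU operation: no delay.
print(stall_cycles("ALU", "ALU"))   # 0
```

Under these assumptions the delay arises only because the consuming stage sits ahead of the producing stage; separating the two instructions by one slot (distance=2) removes it.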