A central processing unit of a data processing machine typically consists of three logical units: an instruction unit, a storage unit and an execution unit. The instruction unit fetches instructions from storage, decodes the instructions, generates addresses for operands and specifies the operations to be performed by generating execution opcodes for the execution unit. The storage unit maintains a cache for instructions and operands, retrieves instructions from either the cache or a main storage facility in response to instruction unit requests and supplies operands to the execution unit. Further, the storage unit stores results supplied by the execution unit into the cache. The execution unit performs the operation specified by the execution opcodes generated upon decoding of the instruction, and returns results to either the instruction unit or the storage unit.
The central processing unit is typically implemented in a pipelined manner. For instance, the system in which the present invention is implemented includes a five-stage pipeline processor. The five stages are:
(1) D-Cycle
The instruction to be executed is decoded by the instruction unit.
(2) A-Cycle
The addresses of operands for the instruction are generated by the instruction unit and passed to the storage unit.
(3) B-Cycle
The operands are fetched from either instruction unit general purpose registers or the storage unit cache.
(4) X-Cycle(s)
The execution opcodes corresponding to the operation specified by the instruction are executed. More than one X-Cycle may be necessary depending on the complexity of the execution operation.
(5) W-Cycle
The result of the instruction is supplied to the storage unit for storage in the cache or to the instruction unit for storage in general purpose registers.
When no interlocks exist in the pipeline and only one X-Cycle is required for execution of the execution opcode, the pipeline flows evenly with an execution opcode delivered to the execution unit in the A-Cycle, the operand or operands delivered in the B-Cycle, the execution opcode executed in the X-Cycle and the result returned in the W-Cycle.
If the execution unit requires multiple X-Cycles to execute the execution opcode, the pipeline interlocks until the operation is completed. Under these conditions, the instruction unit may deliver the execution opcode for the following instruction to the execution unit in the B-Cycle rather than the A-Cycle.
If the storage unit is unable to deliver an operand within one B-Cycle, such as when the data required for the operand is not resident in the cache, the execution unit waits in the X-Cycle until the operand is available.
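The timing behavior described above can be illustrated with a minimal sketch. It is a simplified model, not the described implementation: it assumes each stage holds one instruction at a time, that D, A, B and W each take one cycle, and that a late operand is modeled as extra cycles spent in the X stage. The names `schedule`, `total_cycles`, `x_cycles` and `operand_wait` are hypothetical, and the sketch does not model delivery of the execution opcode in the A-Cycle versus the B-Cycle.

```python
STAGES = ("D", "A", "B", "X", "W")


def schedule(instructions):
    """Return, per instruction, the cycle in which it enters each stage.

    instructions: list of (x_cycles, operand_wait) tuples, where x_cycles is
    the number of X-Cycles the execution opcode needs (at least 1) and
    operand_wait is the number of extra cycles the execution unit waits in
    the X stage for a late operand (assumed model of a cache miss).
    """
    x_index = STAGES.index("X")
    last = len(STAGES) - 1
    entries = []
    prev = None          # stage entry cycles of the previous instruction
    prev_w_done = 0      # cycle in which the previous instruction left W

    for x_cycles, operand_wait in instructions:
        dur = [1] * len(STAGES)
        dur[x_index] = x_cycles + operand_wait

        enter = [0] * len(STAGES)
        # D is free once the previous instruction has moved on to A.
        enter[0] = 0 if prev is None else prev[1]
        for s in range(1, len(STAGES)):
            ready = enter[s - 1] + dur[s - 1]   # finished the prior stage
            if prev is None:
                free = 0
            elif s < last:
                free = prev[s + 1]              # predecessor left stage s
            else:
                free = prev_w_done
            # Interlock: wait until the next stage is actually free.
            enter[s] = max(ready, free)

        prev_w_done = enter[last] + dur[last]
        entries.append(enter)
        prev = enter
    return entries


def total_cycles(instructions):
    """Total cycles from the first D-Cycle through the last W-Cycle."""
    entries = schedule(instructions)
    return entries[-1][-1] + 1 if entries else 0
```

Under these assumptions, three single-cycle instructions complete in seven cycles, while making the first instruction require three X-Cycles, or wait two extra cycles for its operand, delays every following instruction by two cycles.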
The storage unit typically operates according to a pipeline, or a queue, that is separate from the instruction unit pipeline and is designed to make the most efficient use of storage unit resources without impacting the flow of the instruction unit pipeline. When the storage unit is unable to deliver an operand within one B-Cycle of the instruction unit pipeline, the address for the operand is loaded into one of the fetch ports in the storage unit and the data is retrieved from the main storage system, or other storage unit resources are utilized to make the data available. When the data becomes available through the cache, the address is read from the fetch port through the storage unit pipeline and the operand is supplied to the instruction unit. When the operand is received by the instruction unit, the instruction unit pipeline is freed from the interlock and continues processing.
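The fetch port mechanism can be sketched in the same simplified fashion. The class below is an illustration only: the names `StorageUnit`, `request` and `service_fetch_ports`, the number of fetch ports, and the behavior when no fetch port is free are all assumptions and are not taken from the described system. The cache and main storage are modeled as plain dictionaries keyed by address.

```python
class StorageUnit:
    """Toy model of a cache backed by main storage, with fetch ports."""

    def __init__(self, main_storage, num_fetch_ports=2):
        self.main_storage = dict(main_storage)
        self.cache = {}
        # Each fetch port holds the address of one pending operand, or None.
        self.fetch_ports = [None] * num_fetch_ports

    def request(self, address):
        """Try to deliver an operand within one B-Cycle.

        Returns the operand on a cache hit. On a miss, the address is held
        in a free fetch port and None is returned, so the requester stays
        interlocked until the operand is delivered later.
        """
        if address in self.cache:
            return self.cache[address]
        for i, held in enumerate(self.fetch_ports):
            if held is None:
                self.fetch_ports[i] = address
                return None
        # Assumption: the source does not say what happens in this case.
        raise RuntimeError("no free fetch port")

    def service_fetch_ports(self):
        """Bring pending data into the cache and deliver the operands.

        Returns a list of (address, operand) pairs, one per fetch port that
        was holding an address. Each serviced port is freed.
        """
        delivered = []
        for i, address in enumerate(self.fetch_ports):
            if address is None:
                continue
            # Retrieve the data from main storage into the cache.
            self.cache[address] = self.main_storage[address]
            # Re-read the address from the fetch port and supply the operand.
            delivered.append((address, self.cache[address]))
            self.fetch_ports[i] = None
        return delivered
```

In this sketch a first request for an address that is not in the cache returns nothing and occupies a fetch port; after `service_fetch_ports` runs, the operand is delivered and a repeated request for the same address is satisfied directly from the cache.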
It is desirable that the storage unit pipeline be utilized in a manner that minimizes the effect of interlocks in the instruction unit pipeline which occur due to storage unit accesses. The fetch ports accomplish this goal to a certain degree by shortening the interlocks caused by the storage unit. However, it is desirable to further increase instruction unit pipeline throughput by reducing the number of interlocks caused by storage unit accesses, as well as their length.