A register file is found in many data processing units and comprises a plurality of registers coupled to an arithmetic/logic unit (ALU), the registers being employed for storing operands and results of arithmetic or logical operations such as floating point operations, various control operations, etc. The register file may be considered a local store, or cache, of high-speed, high-performance random access memory (RAM). The greater the number of registers within the register file, the greater the amount of data that may be stored within the CPU itself. Thus, as the register file is made larger, fewer accesses to typically slower system memory are required to retrieve operands and to store the results of ALU operations. As such, it can be appreciated that any improvements in speed and efficiency achieved in the operation of the register file and ALU have a direct impact upon the overall speed and processing efficiency of the CPU.
In some types of systems the register file is accessed (read) and updated (written) during a single CPU instruction cycle. However, as the CPU cycle time is reduced a problem is created in that there is not sufficient time for the register file to be accessed for an ALU operation and the result of the ALU operation written back to the register file during the same cycle. One solution to this problem is to access the register file during a first CPU cycle (cycle N) and to update the register file during a next consecutive CPU cycle (cycle N+1).
However, this solution creates a problem for those types of CPU instructions wherein a result generated during cycle N is required to be written back to the register file during cycle N+1 and is also required to be used as an operand during cycle N+1. Such a condition occurs in a pipelined CPU wherein the execution of instructions is overlapped such that a second instruction is begun before the execution of a preceding first instruction is completed.
FIG. 1a illustrates in block diagram form a portion of a conventional CPU 1 pipeline having a multiplexer (MUX) 2. MUX 2 receives a first input from a CPU databus and a second input from a result (R) output of an ALU 4. Interposed between the MUX 2 and the ALU 4 is the register file 3. The register file 3 comprises a plurality of registers, for example 16, 64, 128 or 256 registers. The number of bits (m) of the various data paths and the width of the individual registers vary between implementations and are usually within the range of eight to 128 bits. The register file 3 receives an update register address during a cycle N and an access register address during a cycle N+1. The update address is an address that selects a register wherein the ALU 4 result is written. The access address is an address that selects a register that is read out to either the A port or the B port and subsequently to the corresponding input of the ALU 4. The result (R) output of the ALU 4 is directed back to the input of the MUX 2 for updating a register within the register file 3. Of course, the R output of the ALU 4 is typically also directed to a number of other circuits that are not shown in the simplified block diagram of FIG. 1a.
In accordance with this conventional system and referring to FIG. 1b there is shown a first instruction that has the form A+B=B. That is, the operand stored within a register file location A is added to the operand stored within a register file location B and the result is written back to (updated in) register file location B. A next instruction is of the form C+B=D wherein one of the operands (B) is contained within the register updated by the previous instruction. Because the operations are pipelined within the CPU and execute in an overlapping manner with one another, the result of the first operation may not yet have been written to the register file when the second operation begins, in which case the second operation would read the old contents of location B. By way of example, an instruction of the form A+C=C followed by C+D=D presents the same problem in that the register file location to be updated (C) is also required as an operand for the second pipelined instruction.
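The hazard described above can be illustrated with a minimal behavioral sketch. It is not taken from the figures; it assumes a simplified model in which each instruction reads its operands in its own cycle and the previous instruction's result is written to the register file only after that read has occurred. The function name `run_without_bypass`, the tuple encoding of instructions, and the initial register values are all hypothetical and chosen only for illustration.

```python
def run_without_bypass(regs, program):
    """Model a register file read in cycle N and updated in cycle N+1.

    regs:    dict of register name -> value (hypothetical initial contents).
    program: list of (src1, src2, dest) tuples meaning src1 + src2 -> dest.
    Assumption: the operand read of an instruction happens before the
    previous instruction's result reaches the register file.
    """
    regs = dict(regs)
    pending = None  # (dest, value) produced in the previous cycle, not yet written
    for src1, src2, dest in program:
        # Access: operands come straight from the register file.
        a, b = regs[src1], regs[src2]
        # Update: the previous result is written during this same cycle,
        # too late for the read that has just taken place.
        if pending is not None:
            regs[pending[0]] = pending[1]
        pending = (dest, a + b)
    # Drain the last outstanding update.
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs


# A+B=B followed by C+B=D, as in the example of FIG. 1b.
initial = {"A": 1, "B": 2, "C": 10, "D": 0}
result = run_without_bypass(initial, [("A", "B", "B"), ("C", "B", "D")])
print(result)  # D is 12 (uses stale B=2) rather than the intended 13
```

In this model location B is correctly updated to 3, but the second instruction has already read the old value 2, so D receives 12 instead of 13.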
It is therefore an object of the invention to provide an improved CPU arithmetic/logical pipeline wherein an ALU result is directly provided as an operand during an immediately subsequent pipelined operation without first being updated within a register file.
It is another object of the invention to provide an improved CPU arithmetic/logical pipeline that includes circuitry for bypassing a local operand store when an ALU result is required as an input to the ALU during a cycle wherein the ALU result is also required for updating a location within the local operand store.
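The bypassing stated in the foregoing objects can be sketched behaviorally as follows. This is only an illustrative model under stated assumptions, not the circuitry of the invention: it assumes that a pending ALU result is selected in place of the register file output whenever the access address equals the update address of the preceding instruction. The names `run_with_bypass` and `read_operand`, the tuple encoding of instructions, and the register values are hypothetical.

```python
def read_operand(regs, pending, name):
    """Return the operand for `name`, bypassing the register file when the
    previous ALU result is destined for that same location (assumed rule)."""
    if pending is not None and pending[0] == name:
        return pending[1]  # forward the ALU result directly
    return regs[name]      # normal register file access


def run_with_bypass(regs, program):
    """Same timing model as a read-in-N, update-in-N+1 register file,
    with the hypothetical bypass applied to both operand reads.

    program: list of (src1, src2, dest) tuples meaning src1 + src2 -> dest.
    """
    regs = dict(regs)
    pending = None  # (dest, value) produced in the previous cycle
    for src1, src2, dest in program:
        a = read_operand(regs, pending, src1)
        b = read_operand(regs, pending, src2)
        # The register file is still updated in this cycle as before.
        if pending is not None:
            regs[pending[0]] = pending[1]
        pending = (dest, a + b)
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs


# A+B=B followed by C+B=D: the second instruction now sees the new B.
print(run_with_bypass({"A": 1, "B": 2, "C": 10, "D": 0},
                      [("A", "B", "B"), ("C", "B", "D")]))  # D is 13
```

The register file update still occurs one cycle after the access; only the operand source changes when the two addresses coincide.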