1. Field of the Invention
The present invention relates to the field of use of register files. More particularly, the present invention relates to using additional bits in the register file to handle write-after-write hazards and eliminate bypass comparators.
2. Description of Related Art
Register files are arrays in processors that store one or more registers. In processors capable of processing more than one instruction at a time, it is common to associate with each of these registers a bit which indicates whether the data inside each respective register is either: (1) updated and ready to be used; or, (2) being modified or produced and therefore not available. This bit is termed a "scoreboard" bit.
For example, if a scoreboard bit for a particular register is set, then the next instruction which needs to access this register cannot execute until the scoreboard bit for this register has been cleared. To clear this scoreboard bit, a preceding operation (i.e., the operation that is generating/modifying the data to be placed/returned to this register) needs to complete execution. Thus, if a program were to (1) execute a LOAD of a first value and place it into a register R4; and (2) execute an ADD of the first value with a second value contained in a register R5; then the ADD clearly depends on the LOAD operation. The use of the scoreboard bit by a circuit to "lock-out" access to a register that is being used is referred to as a "hardware interlock." The hardware interlock is used instead of placing the extra burden on software.
Thus, in a processor where there exist multiple execution units, and where one of the execution units has an operation that is waiting to be executed that depends on a result from a previous operation, the register that is waiting to receive the result is "locked-out" from being accessed until the register's scoreboard bit is cleared. After the result has been placed into the register and the scoreboard bit has been cleared, the execution unit containing the waiting operation can access the data in the register.
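The interlock described above can be illustrated with a minimal software sketch. The class name, method names, and the register count of eight are hypothetical and chosen only for illustration; the sketch models one scoreboard bit per register and the LOAD/ADD example on registers R4 and R5.

```python
class Scoreboard:
    """One bit per register; True means a result for that register is pending."""

    def __init__(self, num_registers):
        self.busy = [False] * num_registers

    def can_issue(self, source_regs):
        # An instruction may issue only if none of its source registers is locked out.
        return not any(self.busy[r] for r in source_regs)

    def start_write(self, dest_reg):
        # Set the bit when an operation that will write dest_reg begins.
        self.busy[dest_reg] = True

    def complete_write(self, dest_reg):
        # Clear the bit once the result has been placed into the register.
        self.busy[dest_reg] = False


sb = Scoreboard(8)                      # hypothetical 8-entry register file
sb.start_write(4)                       # LOAD into R4 begins; R4 is locked out
add_blocked = not sb.can_issue([4, 5])  # ADD of R4 and R5 cannot execute yet
sb.complete_write(4)                    # LOAD completes; scoreboard bit cleared
add_ready = sb.can_issue([4, 5])        # ADD may now execute
```

In this sketch the ADD is held back only while the bit for R4 is set, which is the lock-out behavior the hardware interlock provides without software involvement.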
In cases where an operation is waiting for a result to return from an execution unit, time can be saved by not having to wait for the result to be first placed into the register and then read out again by the waiting execution unit. Instead, bypassing is used to send the result to the waiting execution unit at the same time the result is sent to the register, significantly speeding up operations.
Bypassing is used where a processor contains some collection of data in a register file and also contains a set of execution units, each of which may take a varying amount of time to complete an operation. An execution unit can take a varying amount of time to complete an operation because, for example, the execution unit is a multicycle execution unit or because the processor has a pipelined implementation where no operation finishes immediately.
Without bypassing, an execution unit that is waiting for another operation to finish must wait until that operation is finished and the result sent back to the register file before reading the result out again. The execution unit must also wait until the scoreboard bit for the result is cleared and the result is read out before the instruction is issued. Thus, the time that elapses during the writing of the result into the register file and the reading out of the result again before the execution of the instruction that depends on the result adds additional delay.
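The delay described above can be made concrete with a small timing sketch. The function name and the one-cycle write and read latencies are assumptions for illustration only; actual latencies depend on the implementation.

```python
def operand_available_cycle(result_cycle, bypass, write_cycles=1, read_cycles=1):
    """Earliest cycle in which a dependent operation can receive its operand.

    result_cycle: cycle in which the producer's result appears on a result
                  return data path (hypothetical cycle numbering).
    write_cycles, read_cycles: assumed register file write and read latencies.
    """
    if bypass:
        # The result is forwarded to the waiting unit while it is being written.
        return result_cycle
    # The result is first written into the register file, then read out again.
    return result_cycle + write_cycles + read_cycles


with_bypass = operand_available_cycle(result_cycle=3, bypass=True)
without_bypass = operand_available_cycle(result_cycle=3, bypass=False)
cycles_saved = without_bypass - with_bypass
```

Under these assumed latencies, the dependent operation receives its operand two cycles earlier when the result is bypassed.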
FIG. 1 shows a prior art bypass circuit where a set of multiplexors (MUX) 12, 14, 22, and 24 is placed into a set of result return data paths 16 and 26. Result return data paths 16 and 26 return results from execution units 10 and 20, respectively, to a register file 30 (no control circuit is shown in FIG. 1 for simplicity).
FIG. 1 contains a set of register file scoreboard bits 28 along with register file 30. The output of register file 30 is fed to MUX 12, MUX 14, MUX 22, and MUX 24. The output of MUX 12 is used as one input to execution unit 10, while the output of MUX 14 is used as the other input to execution unit 10. The output of MUX 22 is used as one input to execution unit 20, while the output of MUX 24 is used as the other input for execution unit 20.
The output of execution unit 10 is returned on a result return data path 16 to register file 30. Similarly, the output of execution unit 20 is returned to register file 30 on a result return data path 26. Note that result return data path 16 and result return data path 26 might also be used by other execution units not shown in the figure. In addition, MUX 12, MUX 14, MUX 22, and MUX 24 receive both the output from execution unit 10 and the output from execution unit 20 through the use of result return data path 16 and result return data path 26, respectively.
Thus, in FIG. 1, every input of every execution unit has one three-input multiplexor that provides, as input, either the output of the register file or the result that is returning on one of the two result return data paths. As described below, every execution unit may also be able to latch the values that appear on its inputs, to handle situations where all the inputs are not available simultaneously.
For example, if execution unit 10 is an adder which executes in one cycle and the next instruction, which is also an ADD instruction, needs the result, both operations can issue sequentially. This is because the result from the first ADD instruction is written into the register file at the same time that the result is bypassed into the adder again, so that the subsequent ADD can use it immediately.
Each MUX outputs the data from one of its three inputs, depending on which control line is active. The control lines come from the system described in FIG. 2, below.
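The selection performed by each three-input MUX can be sketched as follows. The function and parameter names are hypothetical, and the sketch assumes that exactly one control line is active at a time.

```python
def operand_mux(sel_rf, sel_path16, sel_path26, rf_value, path16_value, path26_value):
    """Return the operand selected by three control lines (assumed one-hot)."""
    selects = [bool(sel_rf), bool(sel_path16), bool(sel_path26)]
    if sum(selects) != 1:
        raise ValueError("exactly one control line must be active")
    if sel_rf:
        return rf_value        # operand read from the register file
    if sel_path16:
        return path16_value    # operand taken from result return data path 16
    return path26_value        # operand taken from result return data path 26


from_regfile = operand_mux(1, 0, 0, rf_value=11, path16_value=22, path26_value=33)
from_path16 = operand_mux(0, 1, 0, rf_value=11, path16_value=22, path26_value=33)
from_path26 = operand_mux(0, 0, 1, rf_value=11, path16_value=22, path26_value=33)
```

When no control line is active, the sketch raises an error; in hardware this case would correspond to the operand not yet being available.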
FIG. 2 shows a bypass circuit 40 having a select register file control line (SRF) 66, a select B1 control line (SB1) 68, and a select B2 control line (SB2) 70 for determining from where an execution unit receives an operand. SRF 66, SB1 68, and SB2 70 are sent to one of the MUXes of FIG. 1. Thus, each of the MUXes in FIG. 1, specifically MUX 12, MUX 14, MUX 22, and MUX 24, receives control signals SRF 66, SB1 68, and SB2 70 from a bypass control circuit similar to bypass control circuit 40. In FIG. 2, a scoreboard bit line coming out of register file 30 provides the value of the scoreboard bit for the particular register being accessed, which determines whether to use the value from the register file or a value from one of the result return data paths.
Bypass circuit 40 also contains a first comparator 50 and a second comparator 60. One of the inputs for both first comparator 50 and second comparator 60 indicates the operand register address of the operand for which the current operation is waiting. For first comparator 50, the other input is the result return data path 16 register address, which indicates the register file address into which the result contained on result return data path 16 is returned after first execution unit 10 has completed the previous operation. For second comparator 60, the other input is the result return data path 26 register address, which indicates the register file address into which the result contained on result return data path 26 is returned after second execution unit 20 has completed the other previous operation.
First comparator 50 and second comparator 60 both operate in the same manner, which is to output a logical one if both inputs are equal. For example, if the operand register address is equal to the result return data path 16 register address, then first comparator 50 outputs a logical one.
The output of first comparator 50 is received by a first AND gate 52. First AND gate 52 also receives the output of a NOT gate 64. Similarly, the output of second comparator 60 is received by a second AND gate 62. Second AND gate 62 also receives the output of NOT gate 64.
The input to NOT gate 64 is the scoreboard bit line, which, as indicated above, provides the value which comes from one of the scoreboard bits from register file scoreboard bits 28. Specifically, the scoreboard bit used is the one associated with the register data being requested by the execution unit.
During operation of the circuit of FIG. 2, if the scoreboard bit coming out of register file scoreboard bits 28 indicates that the operand is to be retrieved from register file 30, then the value coming out of the register file is used, as SRF 66 has a value of a logical one. If the scoreboard bit coming from register file scoreboard bits 28 is a logical one, representing that the data in register file 30 is not valid, then the MUX uses the result coming from one of the result return data paths, depending on the output of bypass control circuit 40. Effectively, these three control lines (SRF 66, SB1 68, and SB2 70) together determine whether a valid result is available for the operation and thus allow the processor to issue the instruction and let the instruction execute.
The operand address comes from the instruction word and is the register address where the desired operand for the operation is located. For example, if an instruction is to add the value in register file 30 at location 4 to the value in register file 30 at location 5 and there is no valid data in register file 30 at location 4, then the execution unit executing the instruction waits until it detects a value destined for register file 30 at location 4 being returned on a result return data path before beginning to execute.
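The behavior of the bypass control circuit for a single execution unit input can be sketched as follows. This is an illustration under stated assumptions, not a gate-accurate model of FIG. 2: the function name is hypothetical, SB1 is assumed to select result return data path 16 and SB2 to select result return data path 26, an idle path is represented by None, and the signal polarity follows the behavioral description (a scoreboard bit of logical one means the register data is not valid).

```python
def bypass_control(operand_addr, scoreboard_bit, path16_addr, path26_addr):
    """Return (SRF, SB1, SB2) for one execution unit input.

    path16_addr, path26_addr: destination register address of the result on
    each result return data path, or None when the path is idle (assumption).
    """
    # Comparators 50 and 60: operand address versus each path's register address.
    match16 = path16_addr is not None and operand_addr == path16_addr
    match26 = path26_addr is not None and operand_addr == path26_addr
    pending = bool(scoreboard_bit)   # True: data in the register file is not valid
    srf = not pending                # read the operand from the register file
    sb1 = pending and match16        # take the operand from path 16
    sb2 = pending and match26        # take the operand from path 26
    return srf, sb1, sb2


# Operand at location 4 is not valid and nothing is returning: keep waiting.
waiting = bypass_control(4, True, None, None)
# A result destined for location 4 returns on path 16: bypass it.
bypassed = bypass_control(4, True, 4, 7)
# Operand at location 5 is valid: read it from the register file.
from_regfile = bypass_control(5, False, 4, 7)
# The operand is available when any one of the three control lines is active.
operand_ready = [any(lines) for lines in (waiting, bypassed, from_regfile)]
```

The first case corresponds to the example above, in which the execution unit waits until it detects a value destined for location 4 on a result return data path.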
A comparator is needed for each possible combination of result bus and execution unit input, because any execution unit can be waiting on any result return data path for a result. Therefore, in FIG. 1, where there are two result buses and four total operand inputs, eight comparators are needed because the bypass logic, consisting of two comparators per execution unit input, one for each bus, has to be duplicated for each of these inputs.
Generally, the number of comparators increases as the product of the number of execution units and the number of result return data paths. The number of return paths may increase with the number of execution units, to allow all or most of the execution units to be producing results simultaneously. This would lead to the number of comparators increasing as the square of the number of execution units. For example, if the number of execution units is doubled, the number of comparators might increase by a factor of 4.
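The growth in comparator count can be checked with a short calculation. The function name is hypothetical, and the sketch assumes two operand inputs per execution unit, as in FIG. 1, with the number of result return data paths growing in step with the number of execution units.

```python
def comparator_count(execution_units, return_paths, inputs_per_unit=2):
    """One comparator per (execution unit input, result return data path) pair."""
    return execution_units * inputs_per_unit * return_paths


fig1_count = comparator_count(execution_units=2, return_paths=2)
doubled_count = comparator_count(execution_units=4, return_paths=4)
growth_factor = doubled_count // fig1_count
```

With two execution units and two return paths the count is eight, matching FIG. 1; doubling both gives thirty-two, a factor of four.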
An apparatus is provided that includes a set of data storage units having a set of scoreboard bits associated with the set of data storage units. The apparatus also includes a first execution unit having an output coupled to the data storage unit and a first input; a first switching unit having an output coupled to the first input of the first execution unit and a first input coupled to the output of the first execution unit; and, a first bypass control unit coupled to the first switching unit. The first bypass control unit is configured to cause the first switching unit to couple the output of the first switching unit to the first input of the first switching unit based upon the set of scoreboard bits. A method is also provided that includes the steps of receiving a first instruction; and, storing a first address location and a first access path specifier for a first operand associated with the first instruction; wherein the first access path specifier indicates a source of the first operand.