Field of the Invention
The present invention relates to the field of computer systems and, more specifically, to register data processing.
Description of the Related Art
A modern computer system typically contains a plurality of execution units and a register file, where the register file is generally an integral part of a processing unit. The register file is normally a set of general purpose registers, where each register may be 16, 32, or 64 bits wide. One function of the register file, or general purpose registers, is to provide a temporary storage location during data processing. For example, an instruction unit may use general purpose registers to store addresses and data during an instruction fetch operation. It should be noted that the general purpose registers are frequently accessed during execution and, consequently, the rate of access to the register file is typically high.
One conventional approach to reducing the access rate to the register file is to employ a bypass circuit. A typical bypass circuit allows data to be distributed to various execution units before the data is stored in the register file.
FIG. 1 illustrates a conventional layout of a register file system 100. The register file system 100 includes a register file 102, a bypass circuit 104, and execution units 1, 2, 3. The register file 102 typically contains 16, 32, or 64 general purpose registers where each general purpose register could be 8, 16, 32 or 64 bits wide.
The bypass circuit 104 allows data to bypass the register file 102 so that the data can be used by an execution unit at the next clock cycle. For example, output data from execution unit 1 can be bypassed back to execution unit 1 through the bypass circuit 104. In other words, the output data of an execution unit can become the input data of an execution unit at the next clock cycle.
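The bypass behavior described above can be illustrated with a minimal software sketch. This is a behavioral model only, not a description of the circuit in FIG. 1: the class names `RegisterFileWithBypass` and `PendingResult`, the single in-flight result, and the register count are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingResult:
    """A result produced by an execution unit but not yet written back."""
    dest_reg: int
    value: int


class RegisterFileWithBypass:
    """Hypothetical model: one in-flight result can be forwarded to a reader."""

    def __init__(self, num_regs: int = 32) -> None:
        self.regs = [0] * num_regs
        self.pending: Optional[PendingResult] = None

    def produce(self, dest_reg: int, value: int) -> None:
        # Retire any older in-flight result, then hold the new one
        # on the bypass path instead of writing it immediately.
        self.write_back()
        self.pending = PendingResult(dest_reg, value)

    def read_operand(self, reg: int) -> int:
        # Bypass path: forward the in-flight result when it targets `reg`.
        if self.pending is not None and self.pending.dest_reg == reg:
            return self.pending.value
        # Otherwise read the stored value from the register file.
        return self.regs[reg]

    def write_back(self) -> None:
        # Commit the in-flight result to the register file, if there is one.
        if self.pending is not None:
            self.regs[self.pending.dest_reg] = self.pending.value
            self.pending = None


rf = RegisterFileWithBypass()
rf.produce(dest_reg=5, value=42)   # cycle n: an execution unit produces r5
forwarded = rf.read_operand(5)     # cycle n+1: operand arrives via the bypass
stale = rf.regs[5]                 # the register file itself is not yet updated
```

In this sketch the consumer obtains the new value one step after it is produced, even though the register file still holds the old value until `write_back` runs.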
A problem with the conventional approach is the time delay that occurs during bypasses. The time delay generally includes bypass circuit delay and wire (or bus) delay, where the bypass circuit delay is typically similar to a gate delay. A gate delay traditionally takes a small portion of a clock cycle. The wire delay includes long wire delay and short wire delay. A short wire delay normally takes a small portion of a clock cycle, whereas a long wire delay traditionally takes a large portion of a clock cycle. Thus, long wire delay reduces the overall system performance.
Referring back to FIG. 1, short wires 116 and 122 typically cause short wire delays because the bypass circuit 104 and execution unit 1 are physically located close to each other. On the other hand, long wires 112, 114, 120, and 122 may cause long wire delays because the distances between the bypass circuit 104 and execution units 2 and 3 are long. Long wire 112 may contain a plurality of transmission wires to transfer data from the bypass circuit 104 to execution unit 3, while long wire 114 may also contain a plurality of transmission wires to transfer data from the bypass circuit 104 to execution unit 2. Both long wires 112 and 114 may cause long wire delays, as indicated by alpha "α", during data transactions. Likewise, long wire 120 may contain a plurality of transmission wires for transferring data from execution unit 2 to the bypass circuit 104. Similarly, long wire 122 transfers data from execution unit 3 to the bypass circuit 104.
Both long wires 120 and 122 typically cause long wire delays, as indicated by beta "β" in FIG. 1, during data transactions. Each long wire delay typically consumes a large portion of a clock cycle and, consequently, long wire delays (the alpha and beta delays) are not desirable because such delays decrease overall system performance.
Another problem with the conventional approach is poor work load distribution between clock cycles. Work load typically refers to all work to be accomplished within a clock cycle. For example, a work load includes handling delays as well as functional executions. Poor work load distribution generally results in lower system performance.
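The effect of wire delay on the work load of a clock cycle can be sketched with simple arithmetic. The percentages below are purely hypothetical values chosen for illustration; the text above states only that gate and short wire delays take a small portion of a clock cycle and that long wire delays take a large portion. The function name `execution_budget_percent` is likewise an assumption.

```python
def execution_budget_percent(gate_delay: int, wire_delay: int) -> int:
    """Percent of one clock cycle left for functional execution.

    Both arguments are percentages of a clock cycle (hypothetical values).
    """
    used = gate_delay + wire_delay
    if gate_delay < 0 or wire_delay < 0 or used > 100:
        raise ValueError("delays must be non-negative and fit in one cycle")
    return 100 - used


# Hypothetical delay figures, in percent of a clock cycle.
GATE_DELAY = 10        # bypass circuit delay, similar to a gate delay
SHORT_WIRE_DELAY = 5   # e.g. bypass circuit to a nearby execution unit
LONG_WIRE_DELAY = 40   # e.g. bypass circuit to a distant execution unit

near_budget = execution_budget_percent(GATE_DELAY, SHORT_WIRE_DELAY)
far_budget = execution_budget_percent(GATE_DELAY, LONG_WIRE_DELAY)
```

Under these assumed figures, a cycle that includes a long wire delay leaves noticeably less time for functional execution than one that includes only a short wire delay, which is one way to picture an uneven work load between clock cycles.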
Therefore, it is desirable to have a scheme that improves work load distribution and reduces long wire delay to enhance system performance. As will be seen, an embodiment of a register file system having at least one local bypass circuit provides improved work load distribution and, at the same time, reduces long wire delays.