1. Field of the Invention
The present invention generally relates to register allocation and more specifically to avoiding access conflicts for registers in a multi-banked register file.
2. Description of the Related Art
To achieve high data processing throughput, processors typically feature a large number of cores, each capable of supporting many threads. Rather than minimizing memory latency through the use of caches, these highly parallel processors instead hide the latency to main memory by running many simultaneous threads. Sustaining many threads requires large register files, because the state for each thread must be local to a core to minimize latency.
Building a single, monolithic, multi-ported register file of such a size is impractical for area, power, and performance reasons. Thus, modern large register files are typically partitioned in some manner. One possibility is to statically partition the register file into separate per-thread partitions. The disadvantage of this approach, though, is that it precludes a continuum of processing configurations ranging from a few threads with high register counts to many threads with low register counts. Also, different threads may have different register usage patterns, but under a thread-partitioned hardware scheme a thread with high register usage cannot utilize unused registers from another thread's partition.
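The limitation of static per-thread partitioning can be illustrated with a minimal sketch. The register counts, the equal-sized partitions, and the function names `fits_static` and `fits_shared` are assumptions chosen for illustration only and are not taken from any particular hardware design.

```python
def fits_static(demands, total_regs, num_partitions):
    """True if every thread fits inside its own fixed-size partition.

    Assumes equal-sized partitions (a hypothetical simplification).
    """
    per_thread = total_regs // num_partitions
    return (len(demands) <= num_partitions
            and all(d <= per_thread for d in demands))


def fits_shared(demands, total_regs):
    """True if the threads fit when registers are drawn from one shared pool."""
    return sum(demands) <= total_regs


# Hypothetical workload: one thread with high register usage, two with low usage.
demands = [96, 16, 16]
static_ok = fits_static(demands, total_regs=256, num_partitions=4)
shared_ok = fits_shared(demands, total_regs=256)
```

With these assumed numbers, each static partition holds 64 registers, so the 96-register thread does not fit even though the three threads together need only 128 of the 256 registers.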
An alternate possibility is the use of banked register files in which the register file is divided into multiple smaller banks, each of which can be independently accessed. Typically, each bank has only a single read and write port. This approach is substantially more area-, power-, and latency-efficient than building a large multi-ported register file. However, banked register files introduce complexity due to the need for a switch to connect each of the different register banks to each of the different inputs to one or more function units. The switch interconnects the register file outputs and the function unit inputs, and it consumes significant area and power.
In addition to the switch, a straightforward method to enable full connectivity between function units and banks also requires staging registers. The staging registers compensate for the constraint that at most one operand can be read from each bank on a given clock cycle by holding function unit inputs in the case of conflicts. For instance, if a function unit requires two or more input operands from the same bank, those operands are read from that bank during different clock cycles and buffered by the staging registers. The staging registers also require a scheduling mechanism to sequence data into the staging registers. Staging registers not only consume area and power, but also require support for multi-cycle operations that have longer latency.
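The effect of bank conflicts on operand reads can be sketched as follows. This is an illustrative model only: the bank count, the interleaved mapping in `bank_of` (register r resides in bank r mod the number of banks), and the function name `schedule_reads` are assumptions not drawn from the description above, as is the simplification that a register named twice is read only once.

```python
NUM_BANKS = 4  # hypothetical bank count


def bank_of(reg, num_banks=NUM_BANKS):
    """Assumed interleaved mapping: register r resides in bank r mod num_banks."""
    return reg % num_banks


def schedule_reads(operands, num_banks=NUM_BANKS):
    """Group operand reads into cycles, allowing one read per bank per cycle.

    Returns a list of cycles; each cycle maps a bank number to the register
    read from that bank. Operands read before the final cycle would have to
    be held in staging registers until the remaining operands arrive.
    """
    queues = {}
    # dict.fromkeys drops repeated registers while keeping operand order.
    for reg in dict.fromkeys(operands):
        queues.setdefault(bank_of(reg, num_banks), []).append(reg)

    cycles = []
    while any(queues.values()):
        # Each bank with pending reads supplies exactly one operand this cycle.
        cycles.append({bank: q.pop(0) for bank, q in queues.items() if q})
    return cycles
```

Under these assumptions, operands in registers 0, 1, and 2 fall in three different banks and are read in a single cycle, whereas operands in registers 0, 4, and 1 require two cycles because registers 0 and 4 both map to bank 0.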
Accordingly, what is needed in the art is a system and method for reducing the size of the switch and the staging registers that are needed to eliminate register bank conflicts for banked register files.