The present invention relates to electronic data processing, and more specifically concerns an organization for general-purpose register files in superscalar or very long instruction word (VLIW) processor architectures having a large number of execution units connected to the same registers.
Semiconductor process trends indicate that transistor gate delays are decreasing at a rate significantly faster than signal-transmission delays through the conductors joining the transistors. As a result, the cycle time of the next generation of microprocessor chips will be increasingly limited by interconnection hardware structures, rather than by transistor structures as in the past.
One important structure required by all microprocessors is a file of general-purpose or architectural registers. Register files in modern processors, especially those in superscalar, VLIW, and other regularized architectures, are dominated both in timing and in chip area by the metal interconnections required for data and address lines. This situation becomes even worse because of the increasing parallel-execution width of present and future designs, that is, because of the larger number of instructions that can be executed in parallel. The importance of interconnect area and delay in large regular structures such as register files has not been appreciated in the past.
Some approximations employed in the industry characterize chip real estate by a small set of parameters: the number of registers in a register file, the size (number of bits) of each register, and the number of ports in the register file, usually three or four times the execution width of the processor. Parallel-execution width depends upon the particular computer technology, but wider is better to exploit instruction parallelism. The number of bits in each register is dictated by architectural considerations. The area of a large, metal-limited register file increases roughly linearly with the number and size of the registers, but rises much faster with the number of ports. The latency or delay time of a register file is also roughly proportional to the number of ports. That is, the large register files required by modern architectures and allowed by new transistor technology reach a state of diminishing returns with respect to the number of ports in a register file.
In order to obtain maximum benefit from the latest semiconductor processes, which speed up transistors more than interconnects, microprocessor designers want performance to be limited by transistor-dominated structures rather than by metal-dominated ones. That is, the register file must be taken off the critical path that limits the performance of the entire processor. The desire for wider machines with increased parallelism, however, exacerbates the register-file problem by growing the register file much more than linearly. Thus, there is a pressing need for highly ported register files that are less dominated by their interconnection area and latency time.
The invention employs multiple copies of a register file in a processor having a number of execution units that access the register file. Each group of execution units can read from and write to its own copy of the file registers by a set of local read and write ports. In addition, all of the register-file copies are synchronized by writing data to remote write ports in the other copies of the register file. The interconnections between the execution units and the register-file copies thus grow less rapidly than they otherwise would, and the difference becomes greater as the execution width of the machine increases.
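By way of illustration only, the replicated organization described above can be sketched as a small behavioral model. The sketch is not the claimed hardware. The names `ReplicatedRegisterFile` and `ports_per_copy` are hypothetical, as is the assumption that each execution unit uses two read ports and one write port. Writes are modeled as reaching all copies at once, which ignores any propagation delay on the remote write ports.

```python
class ReplicatedRegisterFile:
    """Behavioral model: one full copy of the register file per cluster."""

    def __init__(self, num_clusters, num_registers):
        # Each cluster of execution units owns one complete copy.
        self.copies = [[0] * num_registers for _ in range(num_clusters)]

    def read(self, cluster, reg):
        # Reads use only the local read ports of the cluster's own copy.
        return self.copies[cluster][reg]

    def write(self, cluster, reg, value):
        # Local write port: update the writer's own copy.
        self.copies[cluster][reg] = value
        # Remote write ports: keep every other copy synchronized.
        for other, copy in enumerate(self.copies):
            if other != cluster:
                copy[reg] = value


def ports_per_copy(width, clusters, reads_per_unit=2, writes_per_unit=1):
    """Ports on one copy, assuming width divides evenly among clusters.

    Read ports serve only the local cluster; write ports (local plus
    remote) must accept results from every execution unit.
    """
    units_per_cluster = width // clusters
    read_ports = reads_per_unit * units_per_cluster
    write_ports = writes_per_unit * width
    return read_ports + write_ports


rf = ReplicatedRegisterFile(num_clusters=2, num_registers=8)
rf.write(cluster=0, reg=3, value=42)
print(rf.read(cluster=1, reg=3))          # value written by cluster 0
print(ports_per_copy(width=8, clusters=1))  # single monolithic file
print(ports_per_copy(width=8, clusters=4))  # four copies of the file
```

Under these assumed port counts, an eight-wide machine needs 24 ports on a single monolithic file but only 12 ports on each of four copies, because the read ports are divided among the copies while the write ports are not.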
In one embodiment, not all of the registers are writable by the remote write ports. Each file copy is divided into local and global registers. While all copies of the global registers continue to be written by the remote write ports, the local registers can be written only by a local cluster of execution units. Other embodiments divide the registers into global and local according to other criteria.
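The local/global division can likewise be sketched behaviorally. This is a minimal illustration under stated assumptions, not the claimed implementation. The class name `PartitionedRegisterFile` is hypothetical, and so is the choice to treat the lowest-numbered registers as global; the text leaves the dividing criterion open.

```python
class PartitionedRegisterFile:
    """Behavioral model: each copy holds global and local registers."""

    def __init__(self, num_clusters, num_global, num_local):
        # Assumption: registers 0 .. num_global-1 are global, the rest local.
        self.num_global = num_global
        total = num_global + num_local
        self.copies = [[0] * total for _ in range(num_clusters)]

    def is_global(self, reg):
        return reg < self.num_global

    def read(self, cluster, reg):
        # Reads always come from the cluster's own copy.
        return self.copies[cluster][reg]

    def write(self, cluster, reg, value):
        # The local write port always updates the writer's own copy.
        self.copies[cluster][reg] = value
        # Only global registers are driven onto the remote write ports.
        if self.is_global(reg):
            for other, copy in enumerate(self.copies):
                if other != cluster:
                    copy[reg] = value


rf = PartitionedRegisterFile(num_clusters=2, num_global=4, num_local=4)
rf.write(cluster=0, reg=1, value=10)  # global: visible to both clusters
rf.write(cluster=0, reg=6, value=99)  # local: visible to cluster 0 only
print(rf.read(1, 1), rf.read(1, 6))
```

In this model a local register carries no remote write ports at all, so a value written to it by one cluster is never seen by another cluster's copy.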