Storage spaces in a computer system or other processor-based system are typically partitioned into memory and registers. Conventional register file configurations are described in, for example, M. J. Flynn, “Computer Architecture: Pipelined and Parallel Processor Design,” Jones and Bartlett Publishers, Boston, Mass., 1995, and G. A. Blaauw and Frederick P. Brooks, “Computer Architecture: Concepts and Evolution,” Addison-Wesley, Reading, Mass., 1997, both of which are incorporated by reference herein.
A given register file may be a so-called “general purpose” register file, which typically refers to a register file utilizable for storage of intermediate or otherwise temporary results associated with multiple instruction functions within the processor. Historically, only one instruction would be actively accessing a general purpose register file per processor cycle, such that the number of required register ports was minimal. However, modern processors typically have many instructions active in a given processor cycle, and thus multiple register file accesses per processor cycle. For example, a multithreaded processor provides high concurrency through simultaneous execution of multiple distinct instruction sequences or “threads,” with temporary results being stored in register files.
These and other similar arrangements in modern processors can result in a substantial increase in the “port pressure,” that is, the number of required register file ports. Unfortunately, a significant problem associated with register file port pressure is that an increase in the number of register file ports also substantially increases the power dissipation of the processor. Typically, the power consumption associated with register file ports is primarily attributable to the write ports of the register file.
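By way of illustration only, the growth in port pressure with the number of instructions active per cycle may be sketched as follows. The sketch assumes a simplified uniform model, not taken from any cited reference, in which every instruction reads two source registers and writes one destination register, and in which any instruction may access any register. The function name `required_ports` and its parameters are hypothetical.

```python
def required_ports(instructions_per_cycle,
                   reads_per_instruction=2,
                   writes_per_instruction=1):
    """Return (read, write, total) port counts for a simplified model.

    Assumption: each active instruction needs its own read and write
    ports, i.e. there are no restrictions on which registers it may use.
    """
    read_ports = instructions_per_cycle * reads_per_instruction
    write_ports = instructions_per_cycle * writes_per_instruction
    return read_ports, write_ports, read_ports + write_ports


# Compare a single-issue processor with wider configurations.
for width in (1, 2, 4, 8):
    reads, writes, total = required_ports(width)
    print(f"{width} instr/cycle: {reads} read + {writes} write = {total} ports")
```

Under these assumptions the port count grows linearly with the number of instructions active per cycle. Actual processors may deviate from this uniform model, since individual instructions can require more or fewer operands.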
An example of a modern processor which includes a general purpose register file is the Texas Instruments (TI) C64x, described in the TMS320C6000 CPU and Instruction Set Reference Guide, SPRZ168B, www-s.ti.com/sc/psheets/sprz168b/sprz168b.pdf, which is incorporated by reference herein. The TI C64x utilizes a type of Very Long Instruction Word (VLIW) architecture in which up to eight instructions per cycle can issue, with one instruction controlling one execution unit of the processor. The processor register file includes 64 registers. Configuring the C64x such that each instruction can access all 64 registers requires 26 read ports and 18 write ports, for a total of 44 ports. Because a single register file with that many ports is impractical, the designers of the C64x instead split register file access into two halves and divided the ports between those halves, thereby placing restrictions on the programmer. Even with this split, the C64x still requires a total of 44 ports.
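The port arithmetic in the foregoing example may be restated as a short sketch. The read and write port counts are those given above. The equal division of ports between the two halves is an assumption made here for illustration, and all variable names are hypothetical.

```python
# Port counts stated above for unrestricted access to all 64 registers.
READ_PORTS = 26
WRITE_PORTS = 18
total_ports = READ_PORTS + WRITE_PORTS

# Assumption for illustration: the ports are divided equally
# between the two halves of the register file.
HALVES = 2
read_ports_per_half = READ_PORTS // HALVES
write_ports_per_half = WRITE_PORTS // HALVES
ports_per_half = read_ports_per_half + write_ports_per_half

print(f"total ports: {total_ports}")
print(f"per half: {read_ports_per_half} read + {write_ports_per_half} write "
      f"= {ports_per_half}")
print(f"total across halves: {ports_per_half * HALVES}")
```

Under this assumption, splitting the register file reduces the number of ports on each half but leaves the overall total unchanged.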
Reducing port pressure is thus an important aspect of modern processor design, particularly for multithreaded processors and other processors in which many instructions may be active in a given processor cycle. A need exists in the art for techniques for providing reductions in port pressure, so as to decrease processor power consumption, without impacting the desired level of concurrency.