In superscalar microprocessors, performance may be improved by using register renaming techniques, in which logical registers referred to by instructions are mapped onto a larger set of physical registers. The physical register mappings for an instruction may be assigned, or “allocated”, in or around the renaming stage of the microprocessor pipeline, and may remain allocated until the corresponding instruction is retired.
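As an illustration only, the renaming idea can be sketched in Python. This is a simplified model under stated assumptions: the class name `RenameTable`, its `rename` method, and the first-in-first-out free list are hypothetical and not taken from any particular microprocessor. De-allocation at retirement is left out.

```python
from collections import deque


class RenameTable:
    """Hypothetical sketch: maps logical registers onto a larger physical pool."""

    def __init__(self, num_logical, num_physical):
        assert num_physical >= num_logical
        # Initially, logical register i is mapped to physical register i.
        self.map = list(range(num_logical))
        # The remaining physical registers start out free (assumed FIFO order).
        self.free = deque(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one instruction writing logical `dest` and reading `srcs`."""
        # Sources read the current mapping before the destination is remapped.
        phys_srcs = [self.map[s] for s in srcs]
        if not self.free:
            raise RuntimeError("no free physical register: renaming must stall")
        # The destination gets a fresh physical register ("allocation").
        phys_dest = self.free.popleft()
        self.map[dest] = phys_dest
        return phys_dest, phys_srcs
```

For example, with 4 logical and 8 physical registers, two successive writes to logical register 1 land in two different physical registers (4, then 5), so the second instruction does not have to wait for the first to retire before writing its result.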
In superscalar microprocessors containing long pipelines, the number of clock cycles between the renaming stage and the retire stage can be substantial. To achieve high performance, the pipeline needs to remain filled with instructions as much as possible. Architectural features such as branch prediction may be used to keep the pipeline filled, which requires numerous physical registers to be allocated and de-allocated within the pipeline at the same time. Furthermore, a single instruction may require multiple physical registers, which in turn calls for a large pool of available physical registers. Managing such a large physical register pool can require high-speed circuits that occupy a relatively large die area.
Various register allocation/de-allocation methods have been implemented in microprocessors, one of which is the “ad hoc” algorithm. FIG. 1 illustrates the “ad hoc” register allocation/de-allocation algorithm. In the “ad hoc” algorithm, as a group of instructions is renamed, the destination registers used by the instructions are counted. That number of registers is then allocated, removed from a register pool, and used during execution of the renamed instructions. Once the renamed instructions are retired, the allocated registers are de-allocated and returned to the pool.
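The three steps of the “ad hoc” algorithm (count, allocate, return on retirement) can be sketched in Python as follows. This is a minimal model, not a hardware description: the names `Instruction`, `num_dests`, `AdHocAllocator`, `rename_group`, and `retire_group` are hypothetical, and the pool is modeled as a plain list of free register numbers.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    name: str
    num_dests: int  # hypothetical field: destination registers this instruction writes


class AdHocAllocator:
    """Hypothetical sketch of "ad hoc" register allocation/de-allocation."""

    def __init__(self, pool_size):
        # Free physical registers, identified by number.
        self.pool = list(range(pool_size))

    def rename_group(self, group):
        # Step 1: count the destination registers needed by the whole group.
        # Allocation cannot begin until this count is known.
        needed = sum(insn.num_dests for insn in group)
        if needed > len(self.pool):
            return None  # not enough free registers: renaming stalls
        # Step 2: allocate exactly that many registers, removing them from the pool.
        allocated = self.pool[:needed]
        del self.pool[:needed]
        return allocated

    def retire_group(self, allocated):
        # Step 3: once the group retires, its registers return to the pool.
        self.pool.extend(allocated)
```

The sketch makes the sequential dependency visible: `rename_group` cannot take anything from the pool until `needed` has been computed over the entire group.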
At least two characteristics of the “ad hoc” method of register allocation/de-allocation make it undesirable in high-performance microprocessors. First, the number of registers required by an instruction group must be determined before register allocation can occur, which lengthens the rename stage of the pipeline and degrades overall microprocessor performance. Second, in microprocessors in which many instructions may be in the renaming stage at once, the number of register destinations corresponding to each group of instructions can be large, further degrading performance. In FIG. 1, for example, between zero and eight registers must be allocated and removed from a pool of 256 registers each cycle.
One approach to improving the “ad hoc” method of register allocation/de-allocation is to allocate registers in groups, or “blocks”. In the block register allocation method, the registers are grouped into blocks, which are then allocated and de-allocated as atomic units. In FIG. 1, for example, instead of allocating each register individually, a block of eight registers would be allocated at once, so that at most 32 allocation operations (32 × 8 = 256) are needed to allocate all 256 registers in the worst case. A disadvantage of the block register allocation method is that if an instruction group does not need every register in a block, the unneeded registers are wasted.
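A minimal Python sketch of block allocation, using the FIG. 1 figures of a 256-register pool divided into blocks of eight, might look like this. The names `BlockAllocator`, `allocate_block`, and `free_block` are hypothetical, and the sketch assumes that one instruction group never needs more than one block.

```python
class BlockAllocator:
    """Hypothetical sketch of block register allocation."""

    def __init__(self, pool_size=256, block_size=8):
        assert pool_size % block_size == 0
        self.block_size = block_size
        # A free block is identified by the number of its first register,
        # so 256 registers in blocks of 8 give 32 free blocks.
        self.free_blocks = list(range(0, pool_size, block_size))

    def allocate_block(self, needed):
        """Allocate one whole block for a group that needs `needed` registers.

        Returns (registers, wasted), or None if no block is free.
        Assumes a group never needs more than one block.
        """
        assert 0 <= needed <= self.block_size
        if not self.free_blocks:
            return None  # pool exhausted: renaming would stall
        base = self.free_blocks.pop(0)
        block = list(range(base, base + self.block_size))
        # Registers the group does not use stay tied up until the block is freed.
        wasted = self.block_size - needed
        return block, wasted

    def free_block(self, block):
        # The block is returned as an atomic unit, whether or not it was fully used.
        self.free_blocks.append(block[0])
```

For a group that needs only three registers, `allocate_block(3)` still consumes a full block of eight and reports five wasted registers, which illustrates the disadvantage described above.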
Another register allocation method is the “worst case” register allocation method, in which the number of registers allocated during each cycle equals the maximum number of registers that could be required by an instruction group. For example, in FIG. 1, if an instruction group 100 contained eight instructions and each instruction could write two destinations, the “worst case” register allocation method would allocate 16 (8 × 2) registers during each renaming cycle. After the renaming cycle, any registers not needed by the instruction group may be returned to the register pool 105. Although this method facilitates high-performance microprocessor design, it requires an extensive amount of circuitry due to the dynamic nature of the register grouping.
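The “worst case” method can be sketched in Python using the example's parameters of eight instructions per group and two destinations per instruction. This is an illustrative model only. The function name `worst_case_rename` and its list-based pool are hypothetical, and the sketch says nothing about how the return path would be built in hardware.

```python
MAX_INSNS_PER_GROUP = 8  # instructions per group, as in the example above
MAX_DESTS_PER_INSN = 2   # destinations each instruction may write
WORST_CASE = MAX_INSNS_PER_GROUP * MAX_DESTS_PER_INSN  # 16 registers per cycle


def worst_case_rename(pool, dest_counts):
    """Hypothetical sketch of "worst case" allocation for one renaming cycle.

    pool: list of free physical register numbers (modified in place).
    dest_counts: destinations written by each instruction in the group.
    Returns the registers actually used, or None if the pool is too small.
    """
    assert len(dest_counts) <= MAX_INSNS_PER_GROUP
    assert all(0 <= d <= MAX_DESTS_PER_INSN for d in dest_counts)
    if len(pool) < WORST_CASE:
        return None  # even a group needing few registers must stall here
    # Reserve the worst-case number up front, without waiting for a count.
    reserved = pool[:WORST_CASE]
    del pool[:WORST_CASE]
    # After renaming, registers the group did not need go back to the pool.
    needed = sum(dest_counts)
    used, unused = reserved[:needed], reserved[needed:]
    pool.extend(unused)
    return used
```

Unlike the “ad hoc” sketch, allocation here does not depend on counting the group's destinations first. The cost is the extra step of returning the unused registers to the pool after every cycle.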