In typical high-performance, superscalar microprocessors, one technique for improving performance is register renaming, in which logical registers referred to by instructions are mapped onto a larger set of physical registers. Mapping logical registers to physical registers helps eliminate false dependencies that would otherwise exist among instructions that reuse the same logical registers. Traditionally, a structure such as a register alias table (RAT) stores the logical-to-physical mappings, whereas another structure, such as a freelist table (“freelist”), holds the unused or “free” physical registers until they are allocated and used by the rename unit.
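The renaming step described above can be illustrated with a minimal software sketch. It is not a description of any particular hardware: the `Renamer` class, the register names (`r0`, `p0`, and so on), and the identity initial mapping are all assumptions made for illustration.

```python
from collections import deque


class Renamer:
    """Toy rename unit: a RAT plus a freelist of unused physical registers."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register r<i> initially maps to physical p<i>.
        self.rat = {f"r{i}": f"p{i}" for i in range(num_logical)}
        # The remaining physical registers start out free.
        self.freelist = deque(f"p{i}" for i in range(num_logical, num_physical))

    def rename(self, dest, srcs):
        # Sources read the current mappings before the destination is remapped.
        phys_srcs = [self.rat[s] for s in srcs]
        if not self.freelist:
            raise RuntimeError("no free physical register; rename must stall")
        phys_dest = self.freelist.popleft()
        self.rat[dest] = phys_dest
        return phys_dest, phys_srcs


renamer = Renamer(num_logical=4, num_physical=8)
i1 = renamer.rename("r1", ["r2", "r3"])  # r1 = r2 + r3
i2 = renamer.rename("r2", ["r1", "r0"])  # r2 = r1 + r0 (true dependency on i1)
i3 = renamer.rename("r1", ["r3", "r0"])  # r1 = r3 + r0 (reuses logical r1)
```

In this sketch the first and third instructions both write logical register `r1`, but they receive different physical destinations, so the false (write-after-write and write-after-read) dependencies disappear, while the second instruction still reads the physical register written by the first.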
Multi-threaded processors, which can execute several instruction streams (“threads”) concurrently, may allocate physical registers from the freelist using either a hard-partitioned freelist or a shared one. A shared freelist usually requires a larger freelist table and associated logic, but it has the performance advantage of making all of the registers in the freelist available to a single active thread when the processor is running in single-thread mode. A hard-partitioned freelist requires less hardware but can constrain performance, because the number of registers per thread is fixed.
An example of a prior art shared register allocation technique for a two-threaded processor is illustrated in FIG. 1. When a register is allocated for either or both threads, it is read from the freelist 105 and written into the appropriate RAT 110 as a renamed register. Furthermore, a separate structure such as a re-order buffer (ROB) 115 tracks allocated registers so that they can be returned to the freelist when no longer needed.
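The flow through the freelist, the per-thread RATs, and the ROB can be sketched in software as follows. This is only an illustrative model, not the circuit of FIG. 1: the sizes, the function names `allocate` and `retire_oldest`, and the policy of returning the superseded mapping to the freelist at retirement are assumptions.

```python
from collections import deque

NUM_PHYSICAL = 8  # hypothetical total number of physical registers

# One RAT per thread (logical index -> physical index); assumed initial state.
rats = {
    0: {0: 0, 1: 1},
    1: {0: 2, 1: 3},
}
# Shared freelist: every register not currently mapped by either RAT.
freelist = deque(range(4, NUM_PHYSICAL))
# ROB entries record (thread, physical register superseded by the rename).
rob = deque()


def allocate(thread, logical_dest):
    """Rename one destination register for `thread` from the shared freelist."""
    if not freelist:
        return None  # allocation stalls until a register is returned
    new_phys = freelist.popleft()
    old_phys = rats[thread][logical_dest]
    rats[thread][logical_dest] = new_phys
    rob.append((thread, old_phys))
    return new_phys


def retire_oldest():
    """Retire the oldest ROB entry and return its superseded register."""
    _thread, old_phys = rob.popleft()
    freelist.append(old_phys)
    return old_phys


a = allocate(0, 0)       # thread 0 renames its logical register 0
b = allocate(1, 1)       # thread 1 renames its logical register 1
freed = retire_oldest()  # thread 0's old mapping goes back to the freelist
```

Both threads draw from the same pool, so either one can use any free register, and a register only becomes available again once the ROB releases it.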
One shortcoming of the prior art shared register allocation technique illustrated in FIG. 1 is that one thread or other group of instructions or micro-operations (“uops”) may deprive other threads or groups of uops of physical registers for periods of time, thereby preventing them from completing tasks until more physical registers are available in the freelist.
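A small simulation can make this starvation effect concrete. The sketch below is a deliberately simplified assumption: requests are granted strictly in arrival order, no registers are returned during the burst, and the function name `simulate` and the request pattern are hypothetical.

```python
from collections import deque


def simulate(requests, num_free):
    """Grant rename requests in order from one shared pool.

    `requests` is a sequence of thread ids, one per requested register.
    Returns (granted, stalled) counts per thread. Retirement is not modeled,
    so no register is returned to the pool during the run.
    """
    freelist = deque(range(num_free))
    granted = {0: 0, 1: 0}
    stalled = {0: 0, 1: 0}
    for thread in requests:
        if freelist:
            freelist.popleft()
            granted[thread] += 1
        else:
            stalled[thread] += 1
    return granted, stalled


# Thread 0 issues a burst of six requests before thread 1 issues two.
granted, stalled = simulate([0] * 6 + [1] * 2, num_free=6)
```

Under these assumptions thread 0 consumes the entire shared pool, and both of thread 1's requests stall even though thread 1 holds no renamed registers of its own.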
A prior art example of a partitioned register allocation technique is illustrated in FIG. 2. The partitioned register allocation technique of FIG. 2 allocates specific registers to specific threads or groups of uops, and this allocation does not change. Consequently, if a thread or group of uops to which a group of registers has been assigned is dormant, the assigned registers go unused, wasting physical register space.
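The fixed partitioning can likewise be sketched in software. The class name `PartitionedFreelist`, the equal split between threads, and the sizes used are assumptions for illustration and are not taken from FIG. 2.

```python
from collections import deque


class PartitionedFreelist:
    """Toy hard-partitioned freelist: each thread owns a fixed register pool."""

    def __init__(self, num_physical, num_threads):
        per_thread = num_physical // num_threads  # assumed equal split
        self.pools = {
            t: deque(range(t * per_thread, (t + 1) * per_thread))
            for t in range(num_threads)
        }

    def allocate(self, thread):
        # A thread may only draw from its own pool, never from another's.
        pool = self.pools[thread]
        return pool.popleft() if pool else None

    def free_count(self, thread):
        return len(self.pools[thread])


fl = PartitionedFreelist(num_physical=8, num_threads=2)
# Thread 0 is busy and asks for five registers; thread 1 is dormant.
got = [fl.allocate(0) for _ in range(5)]
idle = fl.free_count(1)
```

Here thread 0's fifth request fails once its own four registers are exhausted, while the four registers reserved for the dormant thread 1 remain idle.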