1. Field of the Invention
The invention relates generally to processors and computers. More particularly, the invention relates to allocating registers in an out-of-order, superscalar machine.
2. Description of the Related Art
In a superscalar microprocessor (also referred to herein as a "machine"), decoders produce a sequence of simpler hardware-executable micro-operations ("micro-ops" or "μ-ops") from a sequence of instructions (i.e., from macro-instructions) received from a fetcher. A typical feature in such machines (e.g., in an out-of-order, superscalar machine) is register renaming by a renamer. The renamer renames temporary destination registers of each micro-op from the decoders by allocating a previously de-allocated physical register (resource) in the machine to replace each of these temporary destination registers. To perform register renaming, the machine must have a pool of unused physical registers that can be allocated for each instruction that writes a register. As long as additional physical registers are available, the renamer continues to assign them for the temporary destination registers. The renamer may also record data on dependencies between the micro-ops and on the reassignment of additional physical registers in a dependency table. By so doing, the renamer can remove false dependencies from a sequence of instructions. The renamer assigns the renamed micro-op entries in a reorder buffer or queue and sends the micro-ops to a scheduler. The scheduler assigns instructions for execution in an order that may not follow the original order of the instruction sequence.
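By way of illustration only, the renaming step described above may be sketched in software as follows. The sketch is a simplified model rather than a description of any particular machine: the function name rename, the (dest, src1, src2) micro-op format, the initial one-to-one mapping, and the list-based free pool are all assumptions made for the example, and de-allocation of physical registers is not modeled.

```python
def rename(micro_ops, num_physical):
    """Rename the destination register of each (dest, src1, src2) micro-op.

    Hypothetical model: every architectural register named in the sequence
    starts out mapped to a physical register of its own, and the remaining
    physical registers form the pool of free registers.
    """
    arch_regs = sorted({reg for op in micro_ops for reg in op})
    mapping = {reg: i for i, reg in enumerate(arch_regs)}  # architectural -> physical
    free_pool = list(range(len(arch_regs), num_physical))

    renamed = []
    for dest, src1, src2 in micro_ops:
        # Sources read the current mapping, so true dependencies are kept.
        p1, p2 = mapping[src1], mapping[src2]
        if not free_pool:
            raise RuntimeError("no free physical register available")
        # Every write receives a fresh physical register, which is what
        # removes the false (write-after-write, write-after-read) dependencies.
        pd = free_pool.pop(0)
        mapping[dest] = pd
        renamed.append((pd, p1, p2))
    return renamed


# r1 is written twice; after renaming the two writes target different
# physical registers (5 and 7), so they no longer conflict.
ops = [("r1", "r2", "r3"), ("r4", "r1", "r2"), ("r1", "r5", "r3")]
print(rename(ops, num_physical=8))  # [(5, 1, 2), (6, 5, 1), (7, 4, 2)]
```

In this model the second micro-op still reads physical register 5, the value produced by the first micro-op, so the true dependency is preserved while the later write to r1 is free to proceed independently.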
An allocator, included as part of the renamer, determines which physical registers are free and sends them to other portions of the renamer for assignment. Typically, a bit is reserved in an allocation or register status bit vector for each register and the status of that bit is an indication of whether the physical register is free or not. For example, a 32-bit register status bit vector may be used for a system having 32 registers (that have 2^32 possible allocation/de-allocation states). A 1 value for a bit in the 32-bit register status bit vector may represent a free physical register and a 0 value may represent a physical register that is unavailable because it is already being used. To determine which physical registers are free, the allocator may have to search for the first occurrence of a bit value of 1 in the 32-bit register status bit vector. Such a search could be performed from the least significant bit ("LSB") side of the 32-bit register status bit vector, or from the most significant bit ("MSB") side, or from both sides. When a bit is found having a value of 1, the identification ("ID") of that physical register is sent to the other portions of the renamer.
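The search described above may be modeled, for illustration, as a bit-by-bit scan of the register status bit vector. The function names find_free_from_lsb and find_free_from_msb are hypothetical, and the loop is a software stand-in for what would be combinational logic in hardware; the convention that a 1 bit marks a free register follows the text.

```python
def find_free_from_lsb(status, width):
    """Return the ID of the first free register scanning up from bit 0."""
    for reg_id in range(width):
        if (status >> reg_id) & 1:  # 1 means the register is free
            return reg_id
    return None  # no free register


def find_free_from_msb(status, width):
    """Return the ID of the first free register scanning down from the top bit."""
    for reg_id in range(width - 1, -1, -1):
        if (status >> reg_id) & 1:
            return reg_id
    return None


status = 0b01011100  # registers 2, 3, 4 and 6 are free
print(find_free_from_lsb(status, 8))  # 2
print(find_free_from_msb(status, 8))  # 6
```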
In machines having, for example, a 128-bit register status bit vector instead of a 32-bit register status bit vector (a 128-bit register status bit vector may be used for a system having 128 registers having 2^128 possible allocation/de-allocation states), it may be necessary to find four physical registers that are free for allocation each clock cycle. In such systems, the 128-bit register status bit vector may be searched for two bits having values of 1 from the LSB side and two other bits having values of 1 from the MSB side. A problem would occur, however, if only one physical register were available for renaming, in which case, two of these searches would provide no results (i.e., they would be empty), and the other two searches would provide the same result.
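The collision described above can be reproduced with a small model of the four searches. This is a sketch under stated assumptions: the function name find_four and the use of None to mark an empty search are invented for the example, and the searches are computed sequentially here although they would run in parallel in hardware.

```python
def find_four(status, width):
    """Model two searches from the LSB side and two from the MSB side.

    Returns [lsb_first, lsb_second, msb_first, msb_second], with None
    standing for a search that found nothing.
    """
    free = [i for i in range(width) if (status >> i) & 1]
    from_lsb = free[:2]          # two lowest free registers
    from_msb = free[::-1][:2]    # two highest free registers

    def pad(results):
        return results + [None] * (2 - len(results))

    return pad(from_lsb) + pad(from_msb)


# Plenty of free registers: four distinct results.
plenty = (1 << 0) | (1 << 1) | (1 << 126) | (1 << 127)
print(find_four(plenty, 128))   # [0, 1, 127, 126]

# Only register 5 is free: two searches are empty, two return the same ID.
print(find_four(1 << 5, 128))   # [5, None, 5, None]
```

The independent searches have no knowledge of one another, which is why the same register can be reported twice when the free registers are scarce.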
Considering again a machine having a 32-bit register status bit vector, the vector may be latched in a 32-bit wide flip-flop ("FF"). One de-allocated physical register may be allocated each clock cycle. A so-called "leading one" detector or "find first one" ("FF1") detector is used to identify the physical register that has been de-allocated by examining the 32-bit register status bit vector. The leading one detector converts all 1s to 0s except the least significant 1, which represents the physical register that is to be allocated. The resulting 32-bit vector is termed a 1 "hot" vector and represents a decoded binary number. This decoded binary number is typically encoded as a 5-bit binary number ID for the physical register that is to be allocated. Similarly, a 128-bit 1 "hot" vector represents a decoded binary number and would typically be encoded as a 7-bit binary number ID. More generally, an N-bit 1 "hot" vector represents a decoded binary number and would typically be encoded as an n-bit binary number ID, where N = 2^n (and, hence, n = log_2 N). Typically, n is an integer and N is an integral power of 2. However, if N = 2^(n+α) (and, hence, (n+α) = log_2 N), with 0 < α < 1, then such an N-bit decoded binary number may be encoded as an (n+1)-bit binary number ID, as is well-known.
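For illustration, the leading one detection and the subsequent encoding may be sketched as follows. The function names leading_one, encode and id_width are hypothetical, and the expression status & -status is a common software idiom for isolating the least significant 1; it is used here as a stand-in and is not asserted to be how a hardware detector is built.

```python
def leading_one(status):
    """Clear every 1 except the least significant one (1 "hot" result)."""
    return status & -status


def encode(one_hot):
    """Encode a 1 "hot" vector as the binary index of its single set bit."""
    if one_hot == 0 or one_hot & (one_hot - 1):
        raise ValueError("expected exactly one set bit")
    return one_hot.bit_length() - 1


def id_width(num_registers):
    """Number of ID bits needed for N registers (N >= 2): the ceiling of log_2 N."""
    return (num_registers - 1).bit_length()


one_hot = leading_one(0b01011100)
print(format(one_hot, "08b"))   # 00000100
print(encode(one_hot))          # 2
print(id_width(32), id_width(128), id_width(48))  # 5 7 6
```

The last call shows the case in which N is not an integral power of 2: with N = 48, log_2 N lies between 5 and 6, so a 6-bit ID is used.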
Once the ID is available, a status register is updated, the 1-valued bit is cleared, and the ID is allocated (i.e., the ID is sent to another portion of the renamer for allocation of the corresponding physical register). The 1 of the leading one detector is inverted, with a value coming out of the register, and the leading one detector is latched for the next cycle.
For example, an 8-bit register status bit vector 01011100 indicates that registers R_0, R_1, R_5 and R_7 are unavailable because they have already been allocated and are being used, reading the 8-bit register status bit vector 01011100 from right to left, from LSB to MSB. However, registers R_2, R_3, R_4 and R_6 are available because they have not yet been allocated and are free to be used, again reading the 8-bit register status bit vector 01011100 from right to left, from LSB to MSB. The 8-bit register status bit vector 01011100 may be latched in an 8-bit wide FF and again, one de-allocated physical register may be allocated each clock cycle. A leading one detector converts all 1s to 0s except the least significant 1, yielding the 8-bit 1 "hot" vector 00000100, which is the decoded binary number that represents the physical register that is to be allocated. This decoded binary number 00000100 is encoded as a 3-bit binary number ID (010) representing the physical register (R_2) that is to be allocated. In general, the encoded value representing the physical register R_i, for i = 0, 1, 2, . . . , 7, is given by the 3-bit binary value of (i).
The updated 8-bit register status bit vector 01011000, reflecting the allocation of the physical register R_2 (assuming that no physical registers were de-allocated during the immediately preceding clock cycle), may then be latched in the 8-bit wide FF, and the leading one detector again converts all 1s to 0s except the least significant 1, yielding the 8-bit 1 "hot" vector 00001000, which is the decoded binary number that represents the next physical register (R_3) that is to be allocated. This decoded binary number 00001000 is encoded as the 3-bit binary number ID (011) representing the physical register (R_3) that is to be allocated.
The updated 8-bit register status bit vector 01010001, reflecting the allocation of the physical register R_3, as well as the de-allocation of the physical register R_0 during the immediately preceding clock cycle, may then be latched in the 8-bit wide FF, and the leading one detector again converts all 1s to 0s except the least significant 1, yielding the 8-bit 1 "hot" vector 00000001, which is the decoded binary number that represents the next physical register (R_0) that is to be allocated. This decoded binary number 00000001 is encoded as the 3-bit binary number ID (000) representing the physical register (R_0) that is to be allocated. An updated 8-bit register status bit vector 01010000, reflecting the allocation of the (recently de-allocated) physical register R_0 (assuming that no physical registers were de-allocated during the immediately preceding clock cycle), may then be latched in the 8-bit wide FF, and the leading one detector is again latched for the next cycle, and so forth.
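The three clock cycles of the 8-bit example above may be replayed with a short simulation. The function name step and the freed_mask parameter (a bit vector of registers de-allocated during the cycle) are assumptions made for the example, as is the simplification that allocation and de-allocation are merged into the status vector in a single update.

```python
def step(status, freed_mask=0):
    """Model one clock cycle of the allocator.

    Allocates the lowest-numbered free register (if any), then marks the
    registers in freed_mask as free again.
    Returns (allocated_id_or_None, next_status).
    """
    one_hot = status & -status                       # least significant 1 only
    reg_id = one_hot.bit_length() - 1 if one_hot else None
    next_status = (status & ~one_hot) | freed_mask   # clear allocated bit, add frees
    return reg_id, next_status


status = 0b01011100
# Cycle 1: nothing freed. Cycle 2: register 0 is de-allocated. Cycle 3: nothing freed.
for freed in (0b0, 0b1, 0b0):
    reg_id, status = step(status, freed)
    print(format(reg_id, "03b"), format(status, "08b"))
# 010 01011000
# 011 01010001
# 000 01010000
```

The printed IDs (010, 011, 000) and status vectors agree with the values given in the text for registers 2, 3 and 0.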
One problem associated with this typical allocation procedure is that, as the size of the physical register pool N increases, it takes longer to find a free physical register. The present invention is directed to avoiding, or at least reducing, this problem and other problems.