1. Technical Field
The present invention relates to microprocessors, and more particularly to the efficient utilization of rename buffers in a superscalar processor.
2. Description of Related Art
Microprocessors have been made faster and more powerful through the use of reduced instruction set computer (RISC) processors. Further advances in the field of RISC processors have led to the development of superscalar processors. Such processors allow speculative execution, out-of-order instruction execution, and dispatching of instructions beyond dependent instructions. To support such speculative and out-of-order operations, superscalar processors use rename buffers. A rename buffer allows a dispatch unit to rename memory buffers, so that an operand or result destined for a location to which execution units temporarily cannot write can be assigned a rename location instead. The rename buffers are limited in number, causing decreased performance when all of the rename buffers are busy but not all of the execution units in the processor are busy.
To help improve performance during times when all of the rename buffers are busy, a method for using a virtual rename buffer has been disclosed. A virtual rename buffer, as its name implies, is not actually a physical buffer. Rather, it is simply an address that is assigned to an instruction so that the instruction can be dispatched to the appropriate execution unit. Thus, the instruction can be operated upon but cannot be finished until an actual, physical rename buffer becomes available. This saves time by allowing part of the execution of the instruction to be accomplished while waiting for a physical buffer to become free.
Patel et al. (U.S. Pat. No. 5,758,117) provide a method and system for reducing dispatch stalls and for efficiently utilizing rename buffers in a superscalar processor. The method includes tracking allocation and deallocation of real rename buffers for instructions dispatched by a dispatch unit, and providing at least one virtual rename buffer for allocation of an instruction when the real rename buffers have been allocated. The method further includes tagging the instruction allocated to the at least one virtual rename buffer with a rename buffer busy signal, wherein the rename buffer busy signal indicates to an execution unit of the processor that the instruction cannot be completed.
The system disclosed by Patel et al. includes a plurality of rename buffers, a dispatch unit coupled to the plurality of rename buffers, and an allocation/deallocation table coupled to the dispatch unit and the plurality of rename buffers. Further, the table includes a plurality of real rename buffer slots and at least one virtual rename buffer slot. Additionally, a rename busy signal is provided via the table for an instruction allocated to the at least one virtual rename buffer slot.
Greater efficiency results from effectively controlling the use of virtual rename buffers in conjunction with real rename buffers. The virtual rename buffers allow dispatches to execution units to continue even after all of the real rename buffers have been allocated. Thus, processor performance is improved by reducing the number of stalls in a dispatch unit due to a lack of real rename buffers.
Detecting the wrapping of a multiple-slotted resource is often required in microprocessor designs, particularly in buffer renaming. Virtual renaming will likely become more important in microprocessor designs as the number of rename buffers increases, driven both by the increasing use of superscalar processors and by the longer execution pipe latencies needed to obtain higher processor frequencies.
As an example of a virtual rename scheme that has previously been disclosed, consider FIG. 1. An instruction 100 can be dispatched to superscalar units based on a 32-buffer virtual rename space 110 while implementing only 16 physical rename buffers 120. The dispatched instruction's sources 130 are mapped to the rename buffer 140 allocated for the instruction producing the previous result, assuming that the instruction 100 is dependent upon a previous instruction. The target 150 is allocated a unique rename buffer 160 from the 32-buffer virtual rename space. These results are saved in an instruction queue 170 and the instruction 100 can be issued. However, the instruction 100 cannot be issued to the execution unit from the queue 170 until one of the physical buffers 120 associated with the unique rename buffer 160 is free.
One method of mapping this scheme is to divide the 32-buffer virtual rename space 110 into an upper portion 180 and a lower portion 190 and to overlay the 16 physical rename buffers 120 over the upper portion 180 and the lower portion 190 of the virtual rename space. Thus, physical buffer '0' 120a is mapped onto virtual buffers '0' 190a and '16' 180a; physical buffer '1' 120b is mapped to virtual buffers '1' 190b and '17' 180b, and so on. Using this map, the instruction allocated to virtual buffer '16' 180a cannot issue until the instruction allocated to virtual buffer '0' 190a is completed, thereby freeing physical buffer '0' 120a. Determining whether or not the physical buffer associated with the allocated virtual buffer is free requires wrap detection. For a superscalar processor in which instruction queues may issue to multiple units in a speculative or out-of-order fashion, the determination of whether or not to issue instructions to the execution unit becomes a critical path in the machine, even more so as cycle times become more aggressive. One solution is to add a cycle to the issue determination to alleviate the critical path.
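The overlay described above amounts to taking the virtual buffer number modulo the number of physical buffers. The following is a minimal sketch of that mapping, assuming the 32/16 sizes of FIG. 1; the function names `physical_index`, `in_upper_half`, and `aliases` are illustrative and do not appear in the disclosure.

```python
PHYSICAL_BUFFERS = 16  # physical rename buffers 120
VIRTUAL_BUFFERS = 32   # virtual rename space 110


def physical_index(virtual_buffer):
    """Physical buffer overlaid by a virtual buffer (its four low-order bits)."""
    return virtual_buffer % PHYSICAL_BUFFERS


def in_upper_half(virtual_buffer):
    """True if the virtual buffer lies in the upper portion 180."""
    return virtual_buffer >= PHYSICAL_BUFFERS


def aliases(physical_buffer):
    """The two virtual buffers (lower, upper) that share one physical buffer."""
    return (physical_buffer, physical_buffer + PHYSICAL_BUFFERS)


print(physical_index(16))  # 0: virtual '16' shares physical buffer '0'
print(aliases(1))          # (1, 17)
```

Because two virtual buffers alias each physical buffer, some additional state is needed to tell which of the two currently owns it. That is the role of wrap detection.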
The logic used to implement this solution is shown in FIG. 2. The encoded address of each of the 32 virtual rename buffers 110 includes one high order bit in addition to the encoded address of the corresponding physical buffer. Similarly, each of the 16 physical buffers 120 has a "virtual bit" associated with it to indicate whether the buffer is associated with the upper portion 180 or the lower portion 190 of the virtual rename space 110. For example, before any of the buffers are allocated, each of these 16 virtual bits would contain a logic '0' to indicate that the physical buffer is currently mapped to the lower portion 190 of the virtual rename space 110. That is, the virtual bit associated with physical buffer '0' 120a would indicate that the buffer is currently associated with virtual buffer '0' 190a; the virtual bit associated with physical buffer '1' 120b would indicate that the buffer is currently associated with virtual buffer '1' 190b, and so on. Once the contents of a physical buffer are written into the architected buffer file, the virtual bit associated with that physical buffer is toggled to indicate that the physical buffer now maps to the opposite half of the virtual rename space. Thus, when physical buffer '0' 120a with a virtual bit value of '0' writes its contents to the architected buffer file, the virtual bit is toggled from '0' to '1' to indicate that physical buffer '0' 120a is now mapped to virtual rename buffer '16' 180a.
Referring now to FIG. 2, the four lower order bits of the target buffer pointer 200 are input to a 4-to-16 decoder 210. The 16 orthogonal signals 215 are connected to the select inputs of the 16-to-1 multiplexer 220. The 16 virtual bits 225 associated with the 16 physical rename buffers 120 are connected to the input of the multiplexer 220 such that the multiplexer uses the 16 orthogonal signals 215 to select the virtual bit corresponding to the physical buffer mapping to the target buffer pointer 200. The virtual bit 235 that is selected by the multiplexer 220 is compared with the higher order bit 240 of the target buffer pointer 200 using an exclusive-OR gate 245. If the virtual bit 235 and the higher order bit 240 match, then the exclusive-OR gate 245 will output a '0', indicating that the instruction may issue to the execution unit; if the signals do not match, then the output will be a '1', indicating that the instruction is not allowed to issue.
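The decode, select, and compare path of FIG. 2 can be modeled in software as follows. This is a behavioral sketch only, assuming a 5-bit target buffer pointer and 16 virtual bits as described above; the function names `issue_blocked` and `retire` are illustrative, and the model says nothing about the timing or loading of the actual circuit.

```python
PHYSICAL_BUFFERS = 16


def issue_blocked(target_pointer, virtual_bits):
    """Model the exclusive-OR gate 245 output: 0 = may issue, 1 = may not issue."""
    low_bits = target_pointer & 0xF
    # 4-to-16 decoder 210: exactly one of the 16 select signals is active.
    one_hot = [int(i == low_bits) for i in range(PHYSICAL_BUFFERS)]
    # 16-to-1 multiplexer 220: pick the virtual bit of the selected buffer.
    selected = 0
    for select, bit in zip(one_hot, virtual_bits):
        selected |= select & bit
    high_bit = (target_pointer >> 4) & 1  # higher order bit 240
    return selected ^ high_bit


def retire(physical_buffer, virtual_bits):
    """Toggle the virtual bit once the buffer is written to the architected file."""
    virtual_bits[physical_buffer] ^= 1


virtual_bits = [0] * PHYSICAL_BUFFERS   # all buffers start in the lower half
print(issue_blocked(16, virtual_bits))  # 1: virtual '16' must wait for '0'
retire(0, virtual_bits)
print(issue_blocked(16, virtual_bits))  # 0: physical '0' now maps to '16'
```

Note that the model reads all 16 virtual bits to answer a single query, which mirrors the wide multiplexer input the text goes on to criticize.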
The problem with the solution illustrated in FIG. 2 is that the capacitive load associated with the decode logic requires repowering to drive the load. Therefore, additional stages are needed to determine issue. As the virtual rename space becomes larger, the capacitive load grows exponentially. The latch bits may be replicated to avoid the repower stages, but this increases the chip area and loads critical signals in the dispatch cycle. Furthermore, with this implementation, the multiplexer requires an input equal in size to the number of physical rename buffers to determine whether to issue. This increases both the wiring area and the power dissipation as the number of rename buffers is increased. Thus, a small, fast, and scalable method of wrap detection is needed to increase the utility of virtual renaming as renaming schemes become larger.
Accordingly, the present invention meets the need for a small, fast, and scalable method of wrap detection. The method reduces the capacitive load on the virtual rename address, thereby avoiding delay and costly repowering stages. The method allows for a small wiring area while allowing pre-existing logic to be utilized.
Two pointers are used to manage buffer renaming: the allocation pointer, which points to the next virtual rename buffer to be allocated to a target, and the completion pointer, which points to the next physical buffer position to free as instructions complete. In addition, a target buffer pointer is used to capture the value of the allocation pointer before it is incremented in preparation for allocation of the next rename buffer. When a rename buffer is allocated to an instruction's target, the current value of the allocation pointer is assigned to the instruction as a target and is placed with the instruction into an instruction queue as the target buffer pointer. The allocation pointer is then incremented modulo the size of the virtual rename space to be ready for allocation to a subsequent instruction's target. When instructions complete, the completion pointer is likewise incremented modulo the size of the virtual rename space. A virtual bit is associated with each of the physical rename buffers.
The lower order bits of the target buffer pointer are compared to the lower order bits of the completion pointer to determine whether the target buffer pointer is greater than or equal to the completion pointer. The most significant bit of the target buffer pointer indicates whether the virtual rename buffer is in the upper half or the lower half of the virtual rename space. If the lower order bits of the target buffer pointer are greater than or equal to the lower order bits of the completion pointer, the instruction may issue only if the highest physical buffer is associated with the same half of the virtual rename space as the target buffer pointer. Otherwise, if the lower order bits of the target buffer pointer are smaller than the lower order bits of the completion pointer, the instruction may issue only if the highest physical buffer of the physical rename space is associated with the opposite half of the virtual rename space from the target buffer pointer.
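The issue rule just described can be sketched as a single comparison plus one virtual bit. The sketch below assumes the 32/16 sizes used earlier and assumes that physical buffers are freed strictly in order by the completion pointer; the names `may_issue` and `agrees_with_per_buffer_lookup` are illustrative and not taken from the disclosure. The second function is only a cross-check: it steps the completion pointer through two full wraps and confirms that the rule gives the same answer as reading the virtual bit of the target's own physical buffer.

```python
PHYSICAL_BUFFERS = 16
VIRTUAL_BUFFERS = 32
LOW_MASK = PHYSICAL_BUFFERS - 1


def may_issue(target_pointer, completion_pointer, top_virtual_bit):
    """Wrap detection using only the pointers and the highest buffer's virtual bit."""
    target_half = (target_pointer >> 4) & 1  # most significant bit of the target
    if (target_pointer & LOW_MASK) >= (completion_pointer & LOW_MASK):
        # Not yet passed by the completion pointer: must be in the same half.
        return top_virtual_bit == target_half
    # Already passed by the completion pointer: must be in the opposite half.
    return top_virtual_bit != target_half


def agrees_with_per_buffer_lookup():
    """Cross-check against sampling each buffer's own virtual bit (in-order frees)."""
    virtual_bits = [0] * PHYSICAL_BUFFERS
    completion = 0
    for _ in range(2 * VIRTUAL_BUFFERS):
        top_bit = virtual_bits[PHYSICAL_BUFFERS - 1]
        for target in range(VIRTUAL_BUFFERS):
            by_lookup = virtual_bits[target & LOW_MASK] == ((target >> 4) & 1)
            if by_lookup != may_issue(target, completion, top_bit):
                return False
        # Free the next buffer: toggle its virtual bit, advance the pointer.
        virtual_bits[completion & LOW_MASK] ^= 1
        completion = (completion + 1) % VIRTUAL_BUFFERS
    return True


print(may_issue(16, 0, 0))  # False: physical '0' has not been freed yet
print(may_issue(16, 1, 0))  # True: the completion pointer has passed buffer '0'
print(agrees_with_per_buffer_lookup())  # True
```

Under the in-order assumption, every buffer the completion pointer has not yet passed holds the same virtual bit as the highest buffer, and every buffer it has passed holds the opposite bit, which is why one bit and one magnitude comparison suffice in this model.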
Such a logical implementation avoids the need to sample the virtual bit of the desired physical buffer, that is, the physical buffer corresponding to the virtual rename buffer allocated to a particular instruction.
The above steps are the logical steps used by the wrap detector to determine if an instruction can issue in the virtual renaming scheme. As with most logic, the logic can be tuned to perform the steps in order of the availability of signals.