1. Field of the Invention
This invention relates to sharing resources on a semiconductor between multiple functional units to reduce the area consumed by register renaming logic and particularly to providing a way to share a CAM mapper between two distinct physical register files.
2. Description of Background
Before our invention, increasing the performance of present-day superscalar pipelined microprocessors beyond what technology scaling provides required maximizing the concurrency and overlap in instruction processing. Microarchitectural techniques for instruction-level parallelism can be used to achieve increased concurrency in instruction processing. Out-of-order execution and speculative execution are two powerful techniques that are exploited in modern high-performance processors to increase the amount of concurrency. If the operand data is ready and the required execution resources are free, more concurrency in the pipeline and more performance can be achieved by allowing instructions to be executed out of order. However, while the instructions are processed out of order, they are forced to be committed in program order, which preserves the succession in the architectural states of the machine.
In speculative execution, predictions are made about the outcome of branches, and the instructions following those branches are allowed to be speculatively processed in parallel with other instructions. This also increases concurrency and improves performance. If a prediction proves wrong, the speculatively executed instructions are flushed and not committed.
However, to apply these microarchitectural techniques, one has to overcome the instruction data-dependence constraints. In particular, artificial (false) dependences are created by reuse of limited architectural register and memory storage. Such false dependences include write after read (WAR) and write after write (WAW). A WAR occurs when an instruction that writes a new value must wait for all preceding instructions to read the old value. A WAW occurs when more than one instruction writes to the same register or memory location. Executing such instructions out of order may overwrite the value of a register produced by one instruction before it has been read by a subsequent one. Therefore, these false data dependences must be eliminated before one can make use of out-of-order and speculative execution.
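The two false dependences described above can be illustrated with a minimal sketch. The `Instr` type, the register names, and the `false_dependences` helper are hypothetical and introduced only for illustration; the sketch considers register operands only, not memory locations.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """Hypothetical instruction: one destination register, several sources."""
    dest: str
    srcs: Tuple[str, ...]


def false_dependences(earlier: Instr, later: Instr) -> List[str]:
    """Return the false dependences of `later` on `earlier` (program order)."""
    found = []
    # WAR: the later instruction writes a register the earlier one still reads.
    if later.dest in earlier.srcs:
        found.append("WAR")
    # WAW: both instructions write the same register.
    if later.dest == earlier.dest:
        found.append("WAW")
    return found


add = Instr(dest="r1", srcs=("r2", "r3"))   # r1 <- r2 + r3
load = Instr(dest="r2", srcs=("r4",))       # r2 <- [r4], reuses the name r2
mul = Instr(dest="r1", srcs=("r5", "r6"))   # r1 <- r5 * r6, reuses the name r1

print(false_dependences(add, load))  # WAR on r2
print(false_dependences(add, mul))   # WAW on r1
```

Neither pair shares a value that flows from one instruction to the other; the ordering constraint arises only because a register name is reused.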
These dependences and the associated ordering constraints would not occur if a different register name were assigned every time an instruction writes a new value. By applying register renaming operations, each destination architectural (logical) register name is mapped into a unique physical register location in the register file. This, in turn, eliminates all of the false dependences. When an instruction is decoded, its destination logical register number is mapped into a physical register location that is not currently assigned to a logical register. The destination logical register is said to be renamed to the designated physical register. The assigned physical register is therefore removed from the list of free physical registers. All subsequent references to that destination register will point to the same physical register until another instruction that writes to the same logical register is decoded. At that time, the logical register is renamed to a different physical location selected from the free list, and the map is updated to enter the new logical-to-physical mapping.
The physical registers of old mappings are returned to the free list to be reused once their values are no longer needed. At the same time, the renaming logic also provides a mapping table to look up the physical registers assigned to the source logical registers of the instruction. The source operand values are read from these physical locations in the register file. If the free list does not have enough registers, the instruction dispatch is suspended until the needed registers become available. A shadow copy of the register state can also be kept in the register mapper. When an instruction flush occurs, the shadow map is used to restore the register state prior to the flush point so that the machine can resume execution. Thus, it is clear that to facilitate the application of out-of-order and speculative execution to gain machine performance, a register renaming function must be implemented.
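The renaming flow described above (map lookup for sources, allocation from the free list, stall when the list is empty, release of old mappings, and a shadow copy for flush recovery) can be sketched as follows. This is a simplified software model, not the hardware of any particular processor; the class `RenameMap`, its method names, and the initial identity mapping are assumptions made for illustration. The snapshot also assumes that no physical register is released between `checkpoint` and `restore`.

```python
from collections import deque


class RenameMap:
    """Hypothetical logical-to-physical register mapper with a free list."""

    def __init__(self, num_logical, num_physical):
        # Assumption: logical register i initially maps to physical register i.
        self.table = list(range(num_logical))
        self.free = deque(range(num_logical, num_physical))

    def rename(self, dest, srcs):
        """Rename one instruction.

        Returns (new_phys_dest, phys_srcs, old_phys_dest), or None when the
        free list is empty and dispatch must be suspended.
        """
        if not self.free:
            return None
        # Sources are looked up before the destination mapping is updated.
        phys_srcs = [self.table[s] for s in srcs]
        old = self.table[dest]
        new = self.free.popleft()
        self.table[dest] = new
        return new, phys_srcs, old

    def release(self, phys):
        """Return an old mapping to the free list once its value is dead."""
        self.free.append(phys)

    def checkpoint(self):
        """Take a shadow copy of the mapper state."""
        return list(self.table), deque(self.free)

    def restore(self, snapshot):
        """Restore the state saved before a flush point."""
        table, free = snapshot
        self.table = list(table)
        self.free = deque(free)


mapper = RenameMap(num_logical=4, num_physical=6)
shadow = mapper.checkpoint()
print(mapper.rename(dest=1, srcs=[1, 2]))  # first write to logical register 1
print(mapper.rename(dest=1, srcs=[1]))     # second write gets a new location
print(mapper.rename(dest=0, srcs=[]))      # free list exhausted: dispatch stalls
mapper.restore(shadow)                     # flush: mapping returns to the shadow copy
```

In the usage lines, the second instruction reads logical register 1 from the physical location assigned by the first, which is how true dependences are preserved while the name reuse is removed.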
One structure for performing rapid renaming and search is the content-addressable memory (CAM). A CAM compares input search data against a table of stored data and returns the address of the matching data. CAMs have a single-clock-cycle throughput, making them faster than other hardware- and software-based search systems. CAMs can be used in a wide variety of applications requiring high search speeds.
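The search behavior of a CAM can be modeled in a few lines. The sketch below is a sequential software model only: a hardware CAM compares all entries in parallel, whereas this loop visits them one at a time. The class name `Cam`, its methods, and the choice to return the lowest matching address are assumptions made for illustration.

```python
class Cam:
    """Hypothetical software model of a content-addressable memory."""

    def __init__(self, size):
        # None marks an entry that holds no valid data.
        self.entries = [None] * size

    def write(self, address, data):
        """Store data at a given address, as in an ordinary memory."""
        self.entries[address] = data

    def search(self, key):
        """Return the address whose stored data equals key, or None.

        Assumption: when several entries match, the lowest address wins.
        """
        for address, data in enumerate(self.entries):
            if data is not None and data == key:
                return address
        return None


cam = Cam(size=4)
cam.write(2, 7)          # entry 2 holds the value 7
print(cam.search(7))     # address of the matching entry
print(cam.search(9))     # no entry matches
```

Used as a register mapper, one plausible arrangement is for each entry's address to stand for a physical register and its stored data for a logical register number, so that a search by logical register returns the physical location; that arrangement is an assumption here, not a statement about the invention.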
However, the speed of a CAM comes at the cost of increased silicon area and power consumption, two design parameters that designers strive to reduce. As CAM applications grow and demand larger CAM sizes, the power problem is further exacerbated. Reducing CAM size, per-chip CAM count, and power consumption without sacrificing processing efficiency or speed is therefore of great need in the industry.
Recent designs have opted to provide larger register rename pools to more aggressively exploit out-of-order execution. As a result, the register rename logic has grown in both area and power and now constitutes a large fraction of resource usage. Thus, reducing the overall area and power devoted to register renaming without sacrificing performance is of great need in the industry.