1. Field of the Invention
The invention relates to handling of exchange (XCHG) instructions, and more particularly to accelerating XCHG instructions in a processor by a content addressable memory implementation.
2. Background Information
Many instruction set architectures (ISAs) contain an XCHG instruction. An XCHG instruction exchanges data contents between two registers, i.e., a source register and a destination register. In architectures that use a stack-based register file, such as the IA-32 (Intel® Architecture) floating point instruction set, the XCHG instruction is used frequently. In this case, compilers use the XCHG instruction to move data from a given register to the top-of-stack (TOS) position. Once moved, the data is used in a subsequent operation. This is done because many of the instructions implicitly reference the TOS register. Therefore, the data must be relocated to the TOS register before the operation on that data can proceed.
The basic method of executing the XCHG instruction is to read both registers from the register file (RF), and then write each data value back to the register other than the one from which it was read. For example, if register 0 (r0) contained data value A, and register 3 (r3) contained data value B, then the instruction XCHG r0,r3 would place data value B in r0 and data value A in r3. Any subsequent instruction that needed to access either r0 or r3 would have to stall at dispatch until the XCHG instruction had completed execution.
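By way of illustration only, the basic method described above may be sketched as follows. The register file is modeled here as a Python list, and the function name xchg_basic is an assumption made for this example rather than part of any disclosed implementation.

```python
def xchg_basic(rf, a, b):
    """Exchange the contents of registers a and b in register file rf."""
    # Read both source registers from the register file.
    val_a = rf[a]
    val_b = rf[b]
    # Write each value back to the opposite register.
    rf[a] = val_b
    rf[b] = val_a


# r0 holds data value A and r3 holds data value B, as in the example above.
rf = ["A", None, None, "B"]
xchg_basic(rf, 0, 3)
```

In this sketch both writes depend on both reads, which is why a later instruction that touches r0 or r3 cannot safely proceed until the exchange has finished.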
An improvement in the performance of most types of instructions, including XCHG instructions, was realized through the concept of register renaming. Register renaming maps the logical registers of each instruction onto a larger set of physical registers. The unit that performs the logical to physical mapping is commonly referred to as the register alias table (RAT). The destinations of the XCHG instruction can be mapped to physical registers different from those of the sources (as an example of nomenclature, physical register #78 is written p78). Therefore, the need for dispatching stalls is eliminated. FIG. 1 illustrates an example of a RAT and an RF structure before and after an XCHG r0,r3 instruction.
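A minimal sketch of renaming applied to an exchange is given below. It assumes a toy model with 8 logical and 16 physical registers, a simple free list, and no retirement logic; the class name RenamedRegisterFile, its methods, and the physical register numbers it produces are hypothetical and are not taken from FIG. 1.

```python
class RenamedRegisterFile:
    """Toy model: a RAT mapping logical registers onto a larger physical RF."""

    def __init__(self, num_logical=8, num_physical=16):
        self.rat = list(range(num_logical))      # logical -> physical mapping
        self.prf = [None] * num_physical         # physical register file
        self.free_list = list(range(num_logical, num_physical))

    def read(self, logical):
        return self.prf[self.rat[logical]]

    def write(self, logical, value):
        self.prf[self.rat[logical]] = value

    def xchg(self, a, b):
        # Remember the source mappings before renaming the destinations.
        src_a, src_b = self.rat[a], self.rat[b]
        # Rename: each destination receives a fresh physical register.
        self.rat[a] = self.free_list.pop(0)
        self.rat[b] = self.free_list.pop(0)
        # Execute: the data is still physically copied between registers.
        self.prf[self.rat[a]] = self.prf[src_b]
        self.prf[self.rat[b]] = self.prf[src_a]
        # The old physical registers would be reclaimed at retirement,
        # which this sketch does not model.
        return src_a, src_b


m = RenamedRegisterFile()
m.write(0, "A")
m.write(3, "B")
old = m.xchg(0, 3)
```

Because the destinations live in newly allocated physical registers, the source registers remain intact after the exchange in this model. The data values are nevertheless copied, which is the cost that the re-mapping approaches discussed next attempt to avoid.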
Other optimizations include attempts to re-map the renamed registers without physically moving the data. One example requires one or more additional pipeline stages to accomplish the re-mapping, which offsets the performance gain achieved by eliminating the data transfer. Another example is to swap the contents of the RAT entries corresponding to the logical registers of the XCHG instruction. This example can be very expensive in terms of implementation, because a RAT entry not only contains the physical register number but often also contains several status fields related to the logical register, and sometimes even embedded logic such as tag comparators.
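The RAT entry swap described above may be sketched as follows, purely as an illustration. The fields valid and data_ready stand in for the status fields mentioned in the text; their names, the RatEntry class, and the physical register numbers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class RatEntry:
    phys_reg: int             # physical register number
    valid: bool = True        # hypothetical status field
    data_ready: bool = True   # hypothetical status field


def xchg_by_rat_swap(rat, a, b):
    """Exchange two logical registers by swapping their whole RAT entries."""
    # Every field of each entry moves, not only the physical register number.
    rat[a], rat[b] = rat[b], rat[a]


# Physical register file, keyed by physical register number; never written here.
prf = {10: "A", 13: "B"}
rat = [RatEntry(10), RatEntry(11), RatEntry(12), RatEntry(13, data_ready=False)]
xchg_by_rat_swap(rat, 0, 3)
```

No data value moves in this sketch, yet each swap carries the full entry, status fields included. In hardware that width, together with any logic embedded in the entry, is what makes the approach costly.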