The present invention relates to digital computer systems, and more particularly, but not by way of limitation, to methods and apparatus for executing instructions in such systems.
In x86 computer systems, the floating point unit (FPU) comprises a plurality of data registers. Floating point instructions treat this plurality of data registers as a register stack. All addressing of the data registers is relative to the register on the top of the stack. The register number of the current top-of-stack register is stored in a stack TOP field. Thus, load operations decrement TOP by one and load a value into the new top-of-stack register, while store operations store the value from the current top-of-stack register in memory and then increment TOP by one.
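By way of illustration only, the TOP-relative addressing described above may be sketched in Python as follows. The class name `FpuStack`, the method names, and the choice of eight registers with wrap-around of TOP are assumptions made for this sketch; empty-register tracking and exceptions are not modeled.

```python
class FpuStack:
    """Toy model of a register stack addressed relative to a TOP field."""

    NUM_REGS = 8  # assumption: eight data registers, with TOP wrapping modulo 8

    def __init__(self):
        self.regs = [0.0] * self.NUM_REGS
        self.top = 0

    def physical(self, i):
        """Map the stack-relative name ST(i) to a physical register number."""
        return (self.top + i) % self.NUM_REGS

    def load(self, value):
        # Load: decrement TOP by one, then write the new top-of-stack register.
        self.top = (self.top - 1) % self.NUM_REGS
        self.regs[self.top] = value

    def store(self):
        # Store: read the current top-of-stack register, then increment TOP by one.
        value = self.regs[self.top]
        self.top = (self.top + 1) % self.NUM_REGS
        return value


stack = FpuStack()
stack.load(1.5)
stack.load(2.5)   # 2.5 is now ST(0); 1.5 has become ST(1)
```

In this sketch the same physical register is named ST(0) after it is loaded and ST(1) after the next load, which is the sense in which all addressing is relative to the top of the stack.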
Many floating point instructions, however, only operate on the top one or two registers of a register stack. Thus, if the desired information is located in, e.g., the fourth stack register, one or more operations must be performed before the information in the fourth stack register can be moved into the top register of the stack where it can be operated upon. This creates a "bottleneck" in the stack. To this end, the floating point exchange register contents instruction (FXCH) is used in the IA-32 computer architecture to exchange the floating point information in a selected stack register with that in the top register of the stack. For example, the instruction
FXCH ST(0), ST(i)
will exchange the information in the top register in the stack (denoted ST(0)) with the ith register in the stack (denoted ST(i)). In this way, the bottleneck in the register stack can be alleviated by putting desired information at the top of the stack, where it can then be operated upon by most floating point instructions. More information regarding the FXCH instruction may be found in the Intel Architecture Software Developer's Manual, Volumes 1-3, which are hereby incorporated by reference.
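The effect of the exchange may be sketched, purely as an illustration, with a short Python function. The function name `fxch`, the list-based register file, and the sample values are hypothetical and chosen only for this example.

```python
def fxch(regs, top, i):
    """Exchange the contents of ST(0) and ST(i).

    regs is a list standing in for the physical register file (an assumption
    of this sketch); top is the physical number of the top-of-stack register.
    """
    n = len(regs)
    st0 = top % n          # physical register currently named ST(0)
    sti = (top + i) % n    # physical register currently named ST(i)
    regs[st0], regs[sti] = regs[sti], regs[st0]


regs = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
top = 2              # ST(0) is physical register 2
fxch(regs, top, 3)   # ST(3) is physical register 5
```

After the call, the value formerly held in ST(3) sits at the top of the stack, where instructions that only address the top one or two registers can reach it.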
In many computer architectures, instructions such as FXCH must be executed by emulation because the native hardware that supports such an instruction is not present. One way of emulating the FXCH instruction in such architectures is through a technique called register renaming. In register renaming, the physical registers in question (e.g., ST(0) and ST(i)) are mapped into a stack register map. To exchange the contents of the two physical registers, the pointers that map the physical registers into the stack register map are changed or "re-pointed" from their original register to the other register, and thus the operation is performed. At least one problem with register renaming, however, is that it requires the pointers to be stored in additional hardware, which adds to the cost and complexity of the system and consumes valuable space.
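Register renaming of this kind may be pictured with the following minimal Python sketch. The class `RenamedStack` and its `pointers` table are hypothetical names, and the TOP field is left out for brevity; the sketch is not intended to describe any particular hardware implementation.

```python
class RenamedStack:
    """Toy stack register map: stack names point at physical registers."""

    def __init__(self, num_regs=8):
        self.phys = [0.0] * num_regs
        # pointers[i] is the physical register currently named ST(i).
        # This table stands in for the additional hardware the technique needs.
        self.pointers = list(range(num_regs))

    def read(self, i):
        return self.phys[self.pointers[i]]

    def write(self, i, value):
        self.phys[self.pointers[i]] = value

    def fxch(self, i):
        # Re-point the two map entries; no register contents are moved.
        self.pointers[0], self.pointers[i] = self.pointers[i], self.pointers[0]


stack = RenamedStack()
stack.write(0, 1.0)
stack.write(3, 4.0)
stack.fxch(3)   # ST(0) now reads 4.0 and ST(3) reads 1.0
```

The exchange touches only the pointer table, which is why it is fast, and the pointer table is the extra storage that the technique requires.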
Another way of emulating the FXCH is to sequentially execute at least three micro-code instructions as follows:
move temp:=ST(0);
move ST(0):=ST(i);
move ST(i):=temp;
This is the traditional method of exchanging the contents of two registers. This sequence of instructions uses a temporary register to switch the contents of the top register ST(0) and the ith register ST(i). This method of emulation, with its three micro-code instructions, consumes three times as many clock cycles as the single FXCH instruction and, in some cases, may consume even more, depending upon the latency associated with the move operations. Thus, there exists a need for methods and apparatus for emulating the FXCH instruction that do not add excess hardware and that consume relatively few clock cycles. More generally, there exists a need for methods and apparatus for exchanging the contents of two registers in a relatively quick and efficient manner.
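The three-move sequence may be sketched in Python as follows, for illustration only. The function `run_sequential`, the dictionary of register names, and the `(dest, src)` encoding of a move are assumptions of this sketch; the returned count is simply the number of moves issued and is not a cycle-accurate model.

```python
def run_sequential(regs, micro_ops):
    """Execute (dest, src) moves one at a time, in program order.

    Returns the number of moves issued. Each move here must wait for the one
    before it, because it reads or overwrites a register the earlier move used.
    """
    for dest, src in micro_ops:
        regs[dest] = regs[src]
    return len(micro_ops)


regs = {"ST0": 1.0, "STi": 2.0, "temp": None}
swap = [
    ("temp", "ST0"),   # move temp := ST(0)
    ("ST0", "STi"),    # move ST(0) := ST(i)
    ("STi", "temp"),   # move ST(i) := temp
]
issued = run_sequential(regs, swap)
```

In this sketch the exchange needs three dependent moves and one extra register, compared with the single FXCH instruction it emulates.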
In one embodiment of the present invention, there is a processor-based computer system having dependency checking logic and a register stack, wherein the system overrides the dependency checking logic such that move instructions associated with the stack registers may be executed in parallel. In another embodiment, the system operates such that it can be determined whether a stack underflow exception has occurred and, if it has, the move instructions can be flushed and a micro-code handler algorithm invoked that operates to allow execution of the move instructions in parallel without a stack underflow exception.
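One way to picture moves executing in parallel is sketched below in Python. The sketch assumes, purely for illustration, that both moves read their source registers before either writes its destination; the function `run_parallel` and the register names are hypothetical, and the stack underflow handling is not modeled.

```python
def run_parallel(regs, micro_ops):
    """Execute (dest, src) moves as one group.

    All sources are read first and all destinations are written afterwards,
    which models the moves proceeding together rather than one after another.
    """
    values = [regs[src] for _, src in micro_ops]       # read phase
    for (dest, _), value in zip(micro_ops, values):    # write phase
        regs[dest] = value


regs = {"ST0": 1.0, "STi": 2.0}
run_parallel(regs, [("ST0", "STi"), ("STi", "ST0")])
```

Under this assumption two moves suffice and no temporary register is needed. If the same two moves were executed strictly one after the other, the second would read the value just written by the first and both registers would end up holding the same value, which is the kind of ordering that dependency checking ordinarily enforces.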