1. Field of the Invention
The present invention generally relates to a method and apparatus for saving and restoring the state of various registers in a microprocessor. More particularly, a system is provided which allows the state of a floating point execution unit to be restored without requiring all of the data to be restored from memory.
2. Description of Related Art
In response to consumer demand for increased multimedia capabilities and functionality, the computer industry has introduced enhancements and new technology that aid in the processing of multimedia computer software applications. One example is the multimedia extensions (MMX) to the Intel microprocessor architecture. These MMX instructions provide capabilities that allow software vendors to create applications with enhanced multimedia functions.
The architecture of Intel microprocessors is such that MMX instructions use the floating point unit (FPU) registers for instruction computation. An MMX instruction, once executed, writes over the previous floating point state. Thus, the previous floating point state cannot be preserved without first saving that FP state to memory. The previous floating point state can then be restored by loading the FP information from memory, and the floating point instructions can continue execution from the restored FP state. The Intel MMX architecture requires that the state accumulated by the MMX instructions be emptied (by executing an EMMS instruction) before floating point instructions are allowed to execute. Otherwise, a floating point exception will be generated. In other words, the MMX state must be erased and the floating point state must be reset before the floating point instructions can be executed without causing a floating point exception. Thus, the MMX state cannot be preserved without first being saved into memory before execution of the EMMS instruction. The previous MMX state can be restored by loading the information from memory, and the MMX instructions can continue execution from the restored MMX state. However, state restoration from memory takes a large number of CPU cycles and is very costly in terms of system performance.
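The save, overwrite, empty and restore sequence described above may be illustrated by the following simplified software model. The class ToyFpu and its method names are hypothetical and are used for illustration only. The model assumes, consistent with the published Intel architecture but not stated in this description, that an MMX write sets the upper 16 bits of the aliased register to ones and marks every register as non-empty, and that EMMS marks every register as empty. The model does not account for clock cycles.

```python
# Toy model (illustrative only): eight 80 bit registers shared by FP and MMX code.
MASK64 = (1 << 64) - 1


class ToyFpu:
    def __init__(self):
        self.regs = [0] * 8      # each integer models one 80 bit data register
        self.tag_word = 0xFFFF   # assumption: 0b11 per register means "empty"

    def mmx_write(self, index, value64):
        # Assumption: the MMX value replaces the low 64 bits and the
        # upper 16 bits of the aliased register are set to ones.
        self.regs[index] = (0xFFFF << 64) | (value64 & MASK64)
        self.tag_word = 0x0000   # assumption: all registers marked non-empty

    def emms(self):
        # Empty the MMX state so floating point code may run again.
        self.tag_word = 0xFFFF

    def save(self):
        # Stand-in for saving registers and tag word to memory.
        return (list(self.regs), self.tag_word)

    def restore(self, image):
        # Stand-in for reloading the saved image from memory.
        regs, tag_word = image
        self.regs = list(regs)
        self.tag_word = tag_word


fpu = ToyFpu()
fpu.regs[0] = 0x3FFF8000000000000000   # 80 bit encoding of 1.0
fpu.tag_word = 0xFFFC                  # register 0 non-empty, others empty
saved = fpu.save()                     # must happen before any MMX write
fpu.mmx_write(0, 0x1122334455667788)   # overwrites the floating point value
clobbered = fpu.regs[0]
fpu.emms()
fpu.restore(saved)                     # floating point state is back
```

Without the call to save() before the MMX write, the value held in register 0 would be unrecoverable, which is the situation the present description is concerned with.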
The term "context switch" as used herein will refer to the state saving and restoring process. It should be noted that the floating point/MMX state does not have to be saved and then restored for certain types of instructions to execute. The present invention may have less utility in these cases where the loss of the state information does not affect the correctness of the execution of the code. The present invention is a technique that rapidly saves and restores the state of the floating point/MMX unit if it is determined by the programmer that the floating point/MMX state must be preserved.
The FPU circuitry within Intel x86 architecture microprocessors provides the user with an FPU data register file having eight 80 bit FPU data registers, which are accessed in a stack-like manner. The floating point registers are visible to, and available for use by, the programmer. The Intel architecture also provides a 16 bit control register and a 16 bit status register. A data register tag word is also provided that contains eight 2 bit fields, each associated with one of the eight data registers. The tag word is used to improve context switching and stack performance by maintaining empty/non-empty status for each of the eight data registers.
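By way of illustration, the eight 2 bit fields of the tag word may be decoded as in the following sketch. The function names are hypothetical. The four field encodings (valid, zero, special, empty) and the placement of the field for register 0 in the least significant bits follow the published Intel architecture and are assumptions with respect to this description, which refers only to empty/non-empty status.

```python
# Assumed 2 bit encodings, per the published Intel architecture.
TAG_MEANINGS = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word):
    """Split a 16 bit tag word into eight 2 bit fields, register 0 first."""
    return [TAG_MEANINGS[(tag_word >> (2 * i)) & 0b11] for i in range(8)]


def is_empty(tag_word, index):
    """Return True when the field for the given data register reads empty."""
    return ((tag_word >> (2 * index)) & 0b11) == 0b11


# Register 0 holds a valid value; registers 1 through 7 are empty.
statuses = decode_tag_word(0xFFFC)
```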
Further, the Intel architecture contains an instruction pointer to the memory location containing the instruction word and a data pointer to the memory location containing the operand associated with the current instruction (if any). Also, the last instruction opcode is stored in an eleven bit register. The aforementioned control register, status register, tag word, instruction pointer, data pointer and opcode define the floating point environment (ENV). This environment, in combination with the floating point registers (REG), constitutes the floating point state.
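The composition of the floating point state from ENV and REG may be sketched as a pair of records, as follows. The class and field names are hypothetical. The widths of the control register, status register, tag word and opcode are taken from the description above; the pointer widths are not specified there and are left unconstrained, and the default tag word of all ones (every register empty) is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FpEnvironment:
    """ENV: the six items that define the floating point environment."""
    control: int = 0
    status: int = 0
    tag: int = 0xFFFF              # assumption: all registers start empty
    instruction_pointer: int = 0   # width not specified in the description
    data_pointer: int = 0          # width not specified in the description
    opcode: int = 0

    def __post_init__(self):
        # Truncate each field to the width given in the description.
        self.control &= 0xFFFF     # 16 bit control register
        self.status &= 0xFFFF      # 16 bit status register
        self.tag &= 0xFFFF         # eight 2 bit fields
        self.opcode &= 0x7FF       # eleven bit opcode register


@dataclass
class FpState:
    """Floating point state: ENV plus the eight data registers (REG)."""
    env: FpEnvironment = field(default_factory=FpEnvironment)
    reg: List[int] = field(default_factory=lambda: [0] * 8)


REG_BITS = 8 * 80          # eight 80 bit data registers
REG_BYTES = REG_BITS // 8  # bytes needed to hold REG alone
state = FpState()
```

Under this sketch, REG alone occupies 80 bytes, which gives a sense of how much data a full save or restore must move in addition to ENV.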
As noted above, when Intel architecture microprocessors execute MMX instructions, the FPU registers are utilized for instruction computation. Thus, 64 bits of each 80 bit FPU register will be utilized by MMX instructions. When a task switches from floating point operation to MMX, the environment and register state is saved using a floating point save (FNSAVE or FSAVE) instruction, which stores both the floating point environment and the floating point registers to main memory. Alternatively, a floating point store environment instruction (FNSTENV or FSTENV) stores only the floating point environment to main memory. The floating point load environment (FLDENV) or floating point restore (FRSTOR) instructions are then executed to load the environment, or the environment and registers, respectively, from main memory when floating point operations are to be resumed from the saved state information. The restoring portion of this context switch, performed by the FLDENV and FRSTOR instructions, can cost from 32 to 95 CPU cycles, while the saving portion, performed by the FNSTENV and FNSAVE instructions, can cost from 48 to 151 cycles. Thus, it can be seen that conventional techniques may take from 80 to 246 clock cycles to save the floating point state to main memory and then restore the registers to their previous state when switching between floating point and multimedia operations.
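The combined figure of 80 to 246 clock cycles follows from adding the low and high ends of the save and restore ranges stated above, as the following short calculation shows. The constant and function names are hypothetical; the cycle counts are those given in this description and are not independently measured.

```python
# Cycle ranges as stated in the description (low end, high end).
RESTORE_CYCLES = (32, 95)   # FLDENV and FRSTOR
SAVE_CYCLES = (48, 151)     # FNSTENV and FNSAVE


def round_trip(save, restore):
    """Add the low ends and the high ends of a save and a restore range."""
    return (save[0] + restore[0], save[1] + restore[1])


total = round_trip(SAVE_CYCLES, RESTORE_CYCLES)
print(total)  # (80, 246)
```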
Usually, the FNSAVE and FRSTOR instructions are used in conjunction with one another, as are the FNSTENV and FLDENV instructions. It should be noted that the FNSAVE, FRSTOR, FNSTENV and FLDENV instructions were part of the Intel instruction set long before the introduction of MMX to the computer industry in 1996. These instructions became important to the execution of MMX instructions because MMX code shares floating point register usage with the Intel floating point instructions.
Therefore, a need exists for a technique that can quickly and efficiently save and restore the state of the floating point register file, when switching between floating point and multimedia operations, using a minimum number of clock cycles.