1. Field of the Invention
The present invention generally relates to a method and apparatus for saving and restoring the state of various registers in a microprocessor. More particularly, a system is provided which allows the state of a floating point execution unit to be restored without requiring all of the data and instructions to be retrieved from main memory.
2. Description of Related Art
Based upon consumer demands for increased multimedia capabilities and functionality, the computer industry has responded with enhancements and new technology that will aid in the processing of multimedia computer software applications. One example is the multimedia extensions (MMX) to the Intel microprocessor architecture. These MMX instructions provide capabilities that will allow software vendors to create applications with enhanced multimedia functions.
The architecture of Intel microprocessors is such that MMX instructions use the floating point unit (FPU) registers for instruction computation. The FPU circuitry within Intel x86 architecture microprocessors provides the user with an FPU data register file having eight 80-bit FPU data registers, which are accessed in a stack-like manner, i.e. the data is sequentially accessed from the top of the register file. The floating point registers are visible to, and available for use by, the programmer. The Intel architecture also provides a 16-bit control register and a 16-bit status register. A data register tag word is also included that contains eight 2-bit fields, each associated with one of the eight data registers. The tag word is used to improve context switching and stack performance by maintaining empty/non-empty status for each of the eight data registers.
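The tag word layout described above can be illustrated with a minimal sketch. It assumes that field i occupies bits 2i and 2i+1 of the 16-bit tag word and that the encoding 11 marks an empty register, as in the conventional x87 tag encoding; the function names are hypothetical and chosen only for illustration.

```python
# Conventional x87 tag encodings (assumed here; only "empty" matters for
# the empty/non-empty status discussed in the text).
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word):
    """Split a 16-bit tag word into eight 2-bit fields, register 0 first."""
    if not 0 <= tag_word <= 0xFFFF:
        raise ValueError("tag word must fit in 16 bits")
    return [(tag_word >> (2 * i)) & 0b11 for i in range(8)]


def non_empty_registers(tag_word):
    """Return the indices of the data registers whose tag is not 'empty'."""
    return [i for i, tag in enumerate(decode_tag_word(tag_word)) if tag != 0b11]
```

With all sixteen bits set, every register is tagged empty and `non_empty_registers` returns an empty list, which is the kind of check that lets save and restore logic skip registers holding no live data.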
It should be noted that context, or task, switching is controlled by the operating system (OS), such as OS/2, Windows 95, Windows NT, or the like. When a context switch is desired, the operating system generates a trap which will be received by a trap handler. The trap handler then saves the state of the previous context by causing FSAVE, FSTENV, or the like to be executed. It should also be noted that task switching can occur within a single context. For example, a switch can occur between different tasks in the floating point context. Similarly, task switching can also occur entirely within the MMX context.
Further, the Intel architecture contains an instruction pointer to the memory location containing the last floating point instruction word and a data pointer to the memory location containing the operand associated with the last floating point instruction (if any).
As noted above, when Intel architecture microprocessors execute MMX instructions, the FPU registers are utilized for instruction computation. Thus, 64 bits of the 80-bit FPU registers will be utilized by MMX instructions. When, for example, a task switch from a floating point operation to MMX operations occurs, the OS trap handler will cause the register state to be saved using the floating point save (FSAVE) instruction. The FSAVE instruction stores the register state (whether floating point or MMX) to main memory. Execution of the FSAVE instruction by the microprocessor may take from 53 to 155 CPU clock cycles. The number of clock cycles is dependent upon the mode in which the microprocessor is operating, e.g. 16 bit, 32 bit, real mode, protected mode, or the like. Then, when a task switch back to floating point operations is desired, the operating system may use the floating point restore (FRSTOR) instruction to restore the floating point registers from main memory to the state they were in when the FSAVE instruction was executed. The FRSTOR execution may take from 75 to 95 CPU clock cycles, depending on the mode of the microprocessor. Thus, it can be seen that conventional techniques may take from 128 to 250 clock cycles in order to save the state of the floating point registers to main memory and then restore the registers to their previous state when switching between floating point and MMX operations.
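The combined figure follows directly from adding the two ranges quoted above. The short sketch below merely reproduces that arithmetic; the constant and function names are hypothetical.

```python
# Cycle ranges quoted in the text, as (minimum, maximum) CPU clock cycles.
FSAVE_CYCLES = (53, 155)
FRSTOR_CYCLES = (75, 95)


def round_trip_cycles(save, restore):
    """Add the best-case and worst-case costs of one save plus one restore."""
    return (save[0] + restore[0], save[1] + restore[1])


print(round_trip_cycles(FSAVE_CYCLES, FRSTOR_CYCLES))  # prints (128, 250)
```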
Therefore, a need exists for a technique that can quickly and efficiently save and restore the state of the floating point register file when switching between floating point and MMX operations or between different tasks in the same context, using a minimum number of clock cycles.
In contrast to the prior art, the present invention provides a system and method that reduces the latency associated with execution of FSAVE and FRSTOR instructions when switching tasks between floating point and MMX operations, or between specific tasks within the floating point/MMX contexts.
Broadly, the present invention reduces the latency associated with saving and restoring the state of the floating point registers in a microprocessor when switching tasks between floating point and MMX operations, or between tasks within the same context. The present invention maintains a secondary register file along with the primary floating point register file in the CPU. The primary register file will keep the state of the floating point task "as is" upon the occurrence of a task switch to MMX, or another context. The address of the area where the FPU state is saved is maintained in a save area address register. The secondary register file is then utilized by the other context to store intermediate results of executed instructions. In the majority of cases when a context switch back to floating point operations occurs, the previous state is restored from the primary register file without incurring the latency of retrieving the instructions and data from the memory subsystem. In addition to the secondary register file, a snooping mechanism will use the address of the state save area to determine if the state save area was modified. If the state save area is modified, then the floating point state must be restored from the memory subsystem in a conventional manner. However, the floating point save area will seldom be modified and the penalty for maintaining the floating point state in the CPU is negligible. Further, the present invention will allow the microprocessor to operate in a compatible manner with current operating systems and application software.
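The behavior summarized above may be easier to follow as a small software model. The following is only an illustrative sketch, not the claimed hardware: the class name, method names, the list-based register files, and the default save area size of 108 bytes are all assumptions made for the example.

```python
class FpuStateTracker:
    """Toy model of a primary/secondary register file with save-area snooping."""

    def __init__(self, save_area_size=108):
        # save_area_size is an assumed value; the real size depends on the mode.
        self.save_area_size = save_area_size
        self.primary = [0] * 8       # keeps the floating point state "as is"
        self.secondary = [0] * 8     # scratch registers for the other context
        self.save_area_address = None  # models the save area address register
        self.primary_valid = False

    def fsave(self, address):
        """Record where the state was saved and hand the other context a clean file."""
        self.save_area_address = address
        self.primary_valid = True
        self.secondary = [0] * 8

    def snoop_write(self, address, length=1):
        """Invalidate the retained state if a write overlaps the save area."""
        if self.save_area_address is None:
            return
        start = self.save_area_address
        end = start + self.save_area_size
        if address < end and address + length > start:
            self.primary_valid = False

    def frstor(self, address, memory_image):
        """Restore from the primary file when possible, otherwise from memory."""
        if self.primary_valid and address == self.save_area_address:
            path = "fast"            # state never left the CPU
        else:
            self.primary = list(memory_image)
            path = "slow"            # conventional restore from memory
        self.save_area_address = None
        self.primary_valid = False
        return path
```

In this model a write elsewhere in memory leaves the fast path available, while any write that overlaps the recorded save area forces the conventional restore from the memory image, mirroring the two cases described in the summary.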
It is expected that the present invention will reduce the latency associated with the execution of the FSAVE and FRSTOR instructions to approximately 3-4 cycles.
Therefore, in accordance with the previous summary, objects, features and advantages of the present invention will become apparent to one skilled in the art from the subsequent description and the appended claims taken in conjunction with the accompanying drawings.