1. Field of the Invention
This invention relates generally to the field of microprocessors and, more particularly, to the issuing of instructions and the handling of register stacks within floating point units.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Generally speaking, a pipeline comprises a number of stages at which portions of a particular task are performed. Different stages may simultaneously operate upon different items, thereby increasing overall throughput. Although the instruction processing pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
Due to the widespread acceptance of the x86 family of microprocessors, efforts have been undertaken by microprocessor manufacturers to develop superscalar microprocessors which execute x86 instructions. Such superscalar microprocessors achieve relatively high performance characteristics while advantageously maintaining backwards compatibility with the vast amount of existing software developed for previous microprocessor generations such as the 8086, 80286, 80386, and 80486.
Microprocessors compatible with the x86 instruction set are configured to operate upon various data types in response to various instructions. For example, certain x86 instructions are defined to operate upon an integer data type. Another data type employed in x86 compatible microprocessors is the floating point data type. Floating point numbers are represented by a significand and an exponent. The base for the floating point number is raised to the power of the exponent and multiplied by the significand to arrive at the number represented. In x86 compatible microprocessors, base 2 is used. The significand comprises a number of bits used to represent the most significant digits of the number. Typically, the significand comprises one bit to the left of the binary point, and the remaining bits to the right of the binary point. The bit to the left of the binary point, known as the integer bit, is typically not explicitly stored. Instead, it is implied in the format of the number. Additional information regarding floating point numbers and operations performed thereon may be obtained in the Institute of Electrical and Electronics Engineers (IEEE) standard 754.
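The representation described above can be illustrated with a minimal Python sketch for the 32-bit single precision format. The function names `decode_single` and `reconstruct_single` are hypothetical and chosen only for illustration, and the reconstruction assumes a normalized value (biased exponent between 1 and 254), for which the integer bit is implied to be one.

```python
import struct

def decode_single(value):
    """Split an IEEE 754 single precision value into its three stored fields."""
    # Reinterpret the 32-bit float encoding as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits, bias of 127
    fraction = bits & 0x7FFFFF             # 23 stored bits right of the binary point
    return sign, biased_exponent, fraction

def reconstruct_single(sign, biased_exponent, fraction):
    """Rebuild a normalized value from its fields (assumes 1 <= biased_exponent <= 254)."""
    # The integer bit is not stored, so it is restored here as the leading 1.
    significand = 1.0 + fraction / 2**23
    return (-1.0) ** sign * significand * 2.0 ** (biased_exponent - 127)
```

For instance, 6.5 equals 1.625 multiplied by 2 raised to the power 2, so the sketch yields a sign of 0, a biased exponent of 129, and a stored fraction of 0x500000.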
Floating point numbers can represent numbers within a much larger range than can integer numbers. For example, a 32 bit signed integer can represent the integers from -2^31 to 2^31 - 1 when two's complement format is used. A single precision floating point number as defined by IEEE 754 comprises 32 bits (a one bit sign, an 8 bit biased exponent, and 24 bits of significand, of which the implied integer bit is not stored) and has a range from 2^-126 to 2^127 in both positive and negative numbers. A double precision (64 bit) floating point value has a range from 2^-1022 to 2^1023 in both positive and negative numbers. Finally, an extended precision (80 bit) floating point number (in which the integer bit is explicitly stored) has a range from 2^-16382 to 2^16383 in both positive and negative numbers.
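The exponent limits quoted above follow from the width of the biased exponent field. The following Python sketch derives them, assuming the IEEE 754 convention that the all-zeros and all-ones exponent encodings are reserved; the function names are hypothetical.

```python
def exponent_range(exponent_bits):
    """Return (min, max) unbiased exponents for normalized numbers."""
    bias = 2 ** (exponent_bits - 1) - 1
    # Biased values 0 and all-ones are reserved, leaving 1 .. 2*bias for normalized numbers.
    return 1 - bias, bias

def int_range(bits):
    """Return (min, max) for a two's complement signed integer of the given width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
```

With exponent fields of 8, 11, and 15 bits (single, double, and extended precision respectively), the sketch reproduces the exponent limits of -126 to 127, -1022 to 1023, and -16382 to 16383.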
Floating point data types and floating point instructions produce challenges for the x86 compatible microprocessor designer. For example, the eight data registers of an x86 compatible floating point unit (FPU) are configured to store values up to 80 bits in length, while x86 integer registers store values that are 32 bits or less. Furthermore, the FPU data registers are configured to operate as a stack, i.e., FPU instructions address the FPU data registers relative to the register on the top of the stack. The top of stack (TOS) is stored as a pointer in the status register.
Because the FPU must accommodate 80-bit floating point operands, designing the FPU to efficiently manipulate the register stack is difficult. For example, the exchange registers instruction (FXCH) swaps the contents of the destination register and the TOS register. Typically this instruction involves three steps: (1) the contents of the TOS register are copied to a temporary storage register, (2) the contents of the destination register are copied to the TOS register, and (3) the contents of the temporary storage register are copied into the destination register.
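The three-step exchange described above can be modeled in software as follows. This is only an illustrative Python sketch of the conventional approach, not a description of any particular hardware; the class name `RegisterStack` and its members are hypothetical, and register contents are modeled as ordinary Python floats rather than 80-bit values.

```python
class RegisterStack:
    """Toy model of an eight-entry FPU register stack with a top-of-stack pointer."""

    def __init__(self):
        self.regs = [0.0] * 8  # eight data registers
        self.tos = 0           # index of the register currently on top of the stack

    def physical(self, offset):
        """Map a stack-relative offset, e.g. ST(offset), to a physical register index."""
        return (self.tos + offset) % 8

    def fxch(self, offset):
        """Swap the TOS register with ST(offset) using a temporary register."""
        dest = self.physical(offset)
        temp = self.regs[self.tos]            # step 1: TOS -> temporary
        self.regs[self.tos] = self.regs[dest] # step 2: destination -> TOS
        self.regs[dest] = temp                # step 3: temporary -> destination
```

Each of the three steps moves a full-width operand, which is why the data paths and the temporary register are costly when the operands are up to 80 bits wide.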
The additional data paths, temporary storage registers, and control circuitry for instructions that manipulate the register stack increase the size of the microprocessor, particularly in light of the large size of the operands (up to 80 bits). This in turn disadvantageously reduces the maximum clock rate at which the microprocessor can operate, increases the power dissipation of the microprocessor, and reduces the yield in manufacturing the microprocessor. Therefore, a more efficient mechanism for handling register stack manipulations in floating point units is desired.
Furthermore, overall demand on floating point units has continued to increase as application programs have increasingly incorporated more graphics and multimedia routines. MMX (multimedia extension) instructions have been added to the x86 instruction set to increase multimedia performance. However, these instructions are typically performed within the floating point unit, thereby increasing the need for a higher throughput of instructions through the combined floating point/MMX unit. Thus a mechanism for increasing the number of instructions executed per clock cycle in a floating point unit is also desired.
Register addressing within floating point units that execute MMX instructions is further complicated because x86 floating point instructions use stack relative addressing to access the FPU registers, while MMX instructions typically use absolute (non-stack-relative) addressing to access the FPU registers. Consequently, FPU and MMX instructions can have one of six effects on the top of stack: 1) push (decreases the top of stack by one); 2) pop (increases the top of stack by one); 3) double pop (increases the top of stack by two); 4) exchange (which switches the top of stack register with another register); 5) reset (resets the top of stack); or 6) no change. Thus an efficient mechanism for dealing with both stack-relative and non-stack-relative register addressing in a floating point unit is desired.
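The six effects listed above can be summarized as updates to a three-bit top of stack pointer. The Python sketch below is illustrative only and rests on several assumptions not stated in the text: the pointer wraps modulo eight, a reset sets the pointer to zero, and an exchange leaves the pointer itself unchanged because only register contents are swapped. The function name `next_tos` and the effect labels are hypothetical.

```python
# Pointer adjustment for each effect that moves (or leaves) the top of stack.
TOS_DELTAS = {
    "push": -1,       # decreases the top of stack by one
    "pop": 1,         # increases the top of stack by one
    "double_pop": 2,  # increases the top of stack by two
    "exchange": 0,    # assumption: contents swap, pointer unchanged
    "no_change": 0,
}

def next_tos(tos, effect):
    """Return the top of stack pointer after an instruction with the given effect."""
    if effect == "reset":
        return 0  # assumption: reset returns the pointer to register 0
    # Assumption: eight registers, so the pointer wraps modulo 8.
    return (tos + TOS_DELTAS[effect]) % 8
```

Under these assumptions, a push from pointer value 0 wraps to 7, and a double pop from 7 wraps to 1.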