1. Field of the Invention
This invention relates generally to the field of microprocessors and, more particularly, to multipliers that are usable to perform floating point calculations in microprocessors.
2. Description of the Related Art
Microprocessors are typically designed with a number of "execution units" that are each optimized to perform a particular set of functions or instructions. For example, one or more execution units within a microprocessor may be optimized to perform memory accesses, i.e., load and store operations. Other execution units may be optimized to perform general arithmetic and logic functions, e.g., shifts and compares. Many microprocessors also have specialized execution units configured to perform more complex floating-point arithmetic operations including multiplication and reciprocal operations. These specialized execution units typically comprise hardware that is optimized to perform one or more floating-point arithmetic functions.
Most microprocessors must support multiple data types. For example, x86 compatible microprocessors must execute instructions that are defined to operate upon an integer data type and instructions that are defined to operate upon floating-point data types. Floating-point data can represent numbers within a much larger range than integer data. For example, a 32-bit signed integer can represent the integers between −2^31 and 2^31−1 (using two's complement format). In contrast, a 32-bit ("single precision") floating-point number as defined by the Institute of Electrical and Electronics Engineers (IEEE) Standard 754 has a range (in normalized format) from 2^−126 to 2^127×(2−2^−23) in both positive and negative numbers.
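The two ranges described above can be compared numerically. The following is a minimal sketch in Python; the constant and function names are hypothetical and chosen only for illustration. The round-trip helper relies on the standard `struct` module to confirm that both floating-point bounds are exactly representable in the 32-bit format.

```python
import struct

# Range of a 32-bit two's complement signed integer.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1

# Range of a normalized IEEE-754 single-precision magnitude.
FLOAT32_MIN_NORMAL = 2.0 ** -126
FLOAT32_MAX = 2.0 ** 127 * (2 - 2.0 ** -23)


def roundtrips_as_float32(x):
    """Return True if x survives packing into 32 bits unchanged."""
    return struct.unpack("<f", struct.pack("<f", x))[0] == x
```

Both bounds round-trip through the 32-bit encoding, and the largest single-precision value exceeds the largest 32-bit integer by many orders of magnitude.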
Turning now to FIG. 1A, an exemplary format for an 8-bit integer 100 is shown. As illustrated in the figure, negative integers are represented using the two's complement format 104. To negate an integer, all bits are inverted to obtain the one's complement format 102. A constant of one is then added to the least significant bit (LSB).
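The negation procedure just described (invert all bits, then add one at the LSB) can be sketched for the 8-bit case as follows. The function name `negate_8bit` is hypothetical, and the mask `0xFF` is an assumption used to model an 8-bit register in Python, whose integers are otherwise unbounded.

```python
def negate_8bit(value):
    """Negate an 8-bit two's complement value given as an int in 0..255."""
    ones_complement = ~value & 0xFF       # invert all eight bits
    return (ones_complement + 1) & 0xFF   # add one at the LSB, discard carry out
```

For example, `negate_8bit(0x05)` yields `0xFB`, the 8-bit two's complement pattern for −5, and negating that pattern again returns `0x05`.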
Turning now to FIG. 1B, an exemplary format for a 32-bit (single precision) floating-point number 106 is shown. A floating-point number is represented by a significand, an exponent and a sign bit. The base for the floating-point number is raised to the power of the exponent and multiplied by the significand to arrive at the number represented. In microprocessors, base 2 is typically used. The significand comprises a number of bits used to represent the most significant digits of the number. Typically, the significand comprises one bit to the left of the radix point and the remaining bits to the right of the radix point. In order to save space, the bit to the left of the radix point, known as the integer bit, is not explicitly stored. Instead, it is implied in the format of the number. Additional information regarding floating-point numbers and operations performed thereon may be obtained in IEEE Standard 754 (IEEE-754). Unlike the integer representation, two's complement format is not typically used in the floating-point representation. Instead, sign and magnitude form is used. Thus, only the sign bit is changed when converting from positive value 106 to negative value 108.
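The field layout described above can be illustrated with a short Python sketch. The function names are hypothetical. The field widths (1 sign bit, 8 exponent bits, 23 stored significand bits) and the exponent bias of 127 are those of IEEE-754 single precision; the bias is not stated in the text above and is supplied here as a known property of that format.

```python
import struct


def decode_float32(x):
    """Split x into (sign, biased_exponent, fraction) fields."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF            # stored bits right of the radix point
    return sign, biased_exponent, fraction


def value_of_normal(sign, biased_exponent, fraction):
    """Rebuild a normalized value; the integer bit is implied, not stored."""
    significand = 1 + fraction / 2 ** 23
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - 127)
```

Decoding 6.5 and −6.5 gives the same exponent and fraction fields and differs only in the sign bit, which is the sign and magnitude behavior described for values 106 and 108.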
In the x86 architecture, the floating point format supports a number of special cases. These special cases may appear in one or more operands or one or more results for a particular instruction. FIG. 2 illustrates the sign, exponent, and significand formats of special and exceptional cases that are included in the IEEE-754 floating-point standard. The special and exceptional cases shown in FIG. 2 include a zero value, an infinity value, NaN (not-a-number) values, and a denormal value. An 'x' in FIG. 2 represents a value that can be either one or zero. NaN values may include a QNaN (quiet not-a-number) value and a SNaN (signaling not-a-number) value as defined by a particular architecture. The numbers depicted in FIG. 2 are shown in base 2 format as indicated by the subscript 2 following each number. As shown, a number with all zeros in its exponent and significand represents a zero value in the IEEE-754 floating-point standard. A number with all ones in its exponent, a one in the most significant bit of its significand, and zeros in the remaining bits of its significand represents an infinity value. The remaining special and exceptional cases are depicted similarly.
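One way to detect these special cases is to inspect the exponent and significand fields directly. The sketch below is illustrative only and makes two assumptions that differ from FIG. 2: it uses the 32-bit single-precision encoding, in which the integer bit is implied rather than stored (so the stored fraction of an infinity is all zeros), and it distinguishes QNaN from SNaN by the most significant fraction bit, which is the x86 convention. The function name is hypothetical.

```python
def classify_float32_bits(bits):
    """Classify a 32-bit single-precision bit pattern (sign bit ignored)."""
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0:
        # All-zero exponent: zero if the fraction is zero, else denormal.
        return "zero" if fraction == 0 else "denormal"
    if exponent == 0xFF:
        # All-ones exponent: infinity if the fraction is zero, else NaN.
        if fraction == 0:
            return "infinity"
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return "normal"
```

For instance, the pattern `0x7F800000` classifies as an infinity, while `0x7FC00000` classifies as a QNaN.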
Given the substantial differences in floating point and integer formats, microprocessor designers have typically used two sets of execution units, i.e., one set optimized to perform arithmetic on integer instructions and one set optimized to perform arithmetic on floating point instructions. Unfortunately, this approach has some potential drawbacks. Die space on a microprocessor is a relatively scarce commodity, and the die space required to implement complex execution units such as multipliers is significant. Thus, duplicating multipliers for both integer and floating point formats consumes precious real estate that could be used to implement additional functionality.
The recent addition of three-dimensional graphics instructions (e.g., AMD's 3DNow™ instructions) to the standard x86 instruction set has further complicated matters by increasing the performance demands on the microprocessor's arithmetic execution units (and multiplier execution units in particular). As those skilled in the art will appreciate, 3DNow™ instructions are so-called SIMD (single instruction, multiple data) instructions that have operands that include multiple floating point values packed together.
As a result, a method for executing arithmetic instructions with different instruction and data formats is needed. In particular, a method for executing multiply instructions having different data types without dramatically increasing the die space used is desired.
The problems outlined above are in large part solved by the multiplier and method for performing multiplication described herein. In one embodiment, a single multiplier may be configured to perform scalar floating point multiplication and packed floating point multiplication, e.g., single-instruction multiple-data (SIMD) multiplication.
The multiplier may include selection logic that is configured to select a multiplier operand and a multiplicand operand from among a plurality of different potential sources, wherein the potential sources may include one or more of the following: a floating point operand, a packed floating point operand, or the result of a previous iterative multiplication instruction. For example, the multiplier may perform scalar 90-bit format floating point multiplication (i.e., single, double, extended, or internal precision) or packed 90-bit format multiplication (i.e., 2 packed 32-bit floating point values), or it may operate on the results of a previous multiplication instruction (i.e., via an internal bypass mechanism for instructions such as iterative multiplication operations). Some instructions, such as reciprocal instructions (used to perform division) and square root instructions, may be implemented using iterative algorithms that perform a number of different multiplication operations before obtaining the correct result. In these situations it may be particularly advantageous for a multiplier to have the capability of internally bypassing the results of a previous multiplication operation directly to the selection logic for another multiplication operation.
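To illustrate why an iterative reciprocal benefits from such a bypass, the sketch below uses the Newton-Raphson recurrence x ← x·(2 − b·x). This particular algorithm is an assumption: the text above says only that iterative algorithms built from multiplications may be used, and does not name one. The function name, the fixed starting estimate of 0.75, the restriction of b to the range [1, 2), and the iteration count are likewise illustrative choices.

```python
def reciprocal(b, iterations=6):
    """Approximate 1/b for 1 <= b < 2 using only multiplies and subtracts."""
    if not 1.0 <= b < 2.0:
        raise ValueError("this sketch assumes a significand in [1, 2)")
    x = 0.75                      # crude initial estimate (assumption)
    for _ in range(iterations):
        product = b * x           # first multiplication of the step
        x = x * (2.0 - product)   # second multiplication reuses that product
    return x
```

Each pass consumes the product formed immediately before it, which is the kind of dependency that an internal bypass path could serve without writing intermediate results back to a register file.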
In some embodiments, square root and divide instructions may be translated into a series of special multiplication opcodes, wherein each multiplication opcode is configured to perform a particular function in addition to standard multiplication. For example, one of the iterative special multiplication instructions may be a "BACKMUL" instruction that accepts three source operands, A, B, and Q, and calculates the value B·Q−A.
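The arithmetic performed by such an instruction can be modeled as follows. This is a behavioral sketch only: the function name `backmul` is hypothetical, exact rational arithmetic from the standard `fractions` module stands in for whatever internal precision the hardware uses, and the interpretation of the result as an indicator of whether a candidate quotient Q lies above or below A/B is an assumption, not something stated above.

```python
from fractions import Fraction


def backmul(a, b, q):
    """Return B*Q - A exactly, given float operands a, b and q."""
    return Fraction(b) * Fraction(q) - Fraction(a)
```

With A = 1.0, B = 3.0 and Q set to the double-precision value nearest 1/3, the result is negative, since that Q is slightly below the true quotient (for a positive divisor). With A = 6.0, B = 3.0 and Q = 2.0, the result is exactly zero.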
In some embodiments, the multiplier may be further configured to detect divide operations having a divisor that is exactly a power of two. The multiplier may be configured to execute these divide operations without proceeding through the entire iterative division process. Instead, the multiplier may be configured to shift the exponent and round the significand to the appropriate precision without significant additional hardware.
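The shortcut described above can be sketched in software using the standard `math.frexp` and `math.ldexp` functions. The function names below are hypothetical, and the sketch models only the exponent adjustment; rounding the significand to a narrower target precision, which the text above also mentions, is not modeled.

```python
import math


def is_power_of_two(b):
    """True if the finite, nonzero float b is exactly +/- 2**k for an integer k."""
    mantissa, _ = math.frexp(b)   # b == mantissa * 2**exponent, 0.5 <= |mantissa| < 1
    return abs(mantissa) == 0.5


def divide(a, b):
    """Divide a by b, bypassing full division when b is a power of two."""
    if is_power_of_two(b):
        mantissa, exponent = math.frexp(b)
        result = math.ldexp(a, 1 - exponent)   # adjust only the exponent of a
        return -result if mantissa < 0 else result
    return a / b                  # stands in for the iterative division path
```

For example, `divide(3.0, 8.0)` takes the shortcut path and returns 0.375, whereas `divide(1.0, 3.0)` falls through to ordinary division.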
In some embodiments, the multiplier may be further configured to perform independent multiplication instructions during the idle clock cycles that may occur during complex iterative instructions such as square root and non-power-of-two divides.