The present invention is directed, in general, to processing systems and, more specifically, to a microprocessor having a floating point unit capable of converting numbers between scalar and SIMD values.
The demand for ever-faster computers requires that state-of-the-art microprocessors execute instructions in the minimum amount of time. Microprocessor speeds have been increased in a number of different ways, including increasing the speed of the clock that drives the processor, reducing the number of clock cycles required to perform a given instruction, implementing pipeline architectures, and increasing the efficiency at which internal operations are performed. This last approach usually involves reducing the number of steps required to perform an internal operation.
Efficiency is particularly important in the floating point unit of a microprocessor. In floating point representation, every number may be represented by a significand (or mantissa) field, a sign bit, and an exponent field. Although the size of these fields may vary, the IEEE-754 standard defines the most commonly used floating point notation and forms the basis for floating point units (FPUs) in x86 type processors. The IEEE-754 standard includes a single precision format, a single extended precision format, a double precision format, and a double extended precision format. Single precision format comprises 32 bits: a sign bit, 8 exponent bits, and 23 significand bits. Single extended precision format comprises 44 bits: a sign bit, 11 exponent bits, and 32 significand bits. Double precision format comprises 64 bits: a sign bit, 11 exponent bits, and 52 significand bits. Double extended precision format comprises 80 bits: a sign bit, 15 exponent bits, and 64 significand bits.
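The field widths listed above can be checked with a short sketch. The dictionary below only restates the text. `split_single` is a hypothetical helper that uses Python's standard `struct` module to extract the three fields from a real single precision encoding. Note that the 23 stored single precision significand bits omit the implicit leading 1.

```python
import struct

# (sign bits, exponent bits, significand bits) for each format named in the text.
FORMATS = {
    "single": (1, 8, 23),
    "single extended": (1, 11, 32),
    "double": (1, 11, 52),
    "double extended": (1, 15, 64),
}

def total_bits(fmt: str) -> int:
    """Total width of a format: sign + exponent + significand bits."""
    return sum(FORMATS[fmt])

def split_single(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, significand field) of x encoded as IEEE-754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    significand = bits & 0x7FFFFF  # 23 stored bits; the leading 1 is implicit
    return sign, exponent, significand

for name in FORMATS:
    print(f"{name}: {total_bits(name)} bits")  # 32, 44, 64, 80
print(split_single(-1.5))  # (1, 127, 4194304)
```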
It can be advantageous in a load-store implementation of IEEE-754 to represent all numeric values contained in the register file of the floating point unit as properly rounded values in a proprietary internal format whose range and precision exceed those of the widest supported IEEE-754 format. One such proprietary format is disclosed in U.S. patent application Ser. No. 09/377,140, entitled "Formatting Denormal Numbers for Processing in a Pipelined Floating Point Unit," which is commonly assigned to the assignee of the present application. The disclosure of U.S. patent application Ser. No. 09/377,140 is hereby incorporated by reference into the present disclosure as if fully set forth herein. The internal proprietary format disclosed in U.S. patent application Ser. No. 09/377,140 comprises 93 bits: a sign bit, 17 exponent bits, 70 significand bits, and a 5-bit tag field.
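To make the 93-bit layout concrete, the following sketch packs and unpacks the four fields in a Python integer. Only the field widths come from the text. The field ordering (sign in the top bit, tag in the low bits) and the helper names `pack_internal` and `unpack_internal` are assumptions made for illustration; the actual bit placement of the proprietary format is not given here.

```python
# Field widths from the text; the ordering is an assumption for illustration.
SIGN_BITS, EXP_BITS, SIG_BITS, TAG_BITS = 1, 17, 70, 5
TOTAL_BITS = SIGN_BITS + EXP_BITS + SIG_BITS + TAG_BITS  # 93

def _check(value: int, width: int, name: str) -> None:
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name} does not fit in {width} bits")

def pack_internal(sign: int, exponent: int, significand: int, tag: int) -> int:
    """Pack the fields into one 93-bit word (assumed order: sign | exponent | significand | tag)."""
    _check(sign, SIGN_BITS, "sign")
    _check(exponent, EXP_BITS, "exponent")
    _check(significand, SIG_BITS, "significand")
    _check(tag, TAG_BITS, "tag")
    word = sign
    word = (word << EXP_BITS) | exponent
    word = (word << SIG_BITS) | significand
    word = (word << TAG_BITS) | tag
    return word

def unpack_internal(word: int) -> tuple[int, int, int, int]:
    """Inverse of pack_internal: return (sign, exponent, significand, tag)."""
    tag = word & ((1 << TAG_BITS) - 1)
    word >>= TAG_BITS
    significand = word & ((1 << SIG_BITS) - 1)
    word >>= SIG_BITS
    exponent = word & ((1 << EXP_BITS) - 1)
    sign = word >> EXP_BITS
    return sign, exponent, significand, tag
```

Because the internal significand keeps its most significant bit explicitly (a 70-bit field with bit 69 as the leading digit), a normalized value has bit 69 set in the `significand` field.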
In some applications, it may be advantageous to use the internal proprietary format to store denormal values from single precision, double precision, or any other precision with lesser range as normal values in the extended range provided by the internal format. In "normal" floating point representation, the leading binary digit of the significand is assumed always to equal 1. Because it is known to equal 1, the leading binary digit of the significand may, in some floating point representations, be omitted and the exponent value adjusted accordingly. Denormal values are those that cannot be represented in normalized form (i.e., values having the smallest possible exponent and a significand that is non-zero but whose leading digit is 0).
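The distinction between normal and denormal encodings can be illustrated for single precision. In this sketch, the helpers `classify_single` and `single_value` are hypothetical. They decode the bit pattern by hand, restoring the implicit leading 1 only for normal values. The final line shows the point made above: the smallest single precision denormal is an ordinary normal number in the wider double precision range.

```python
import struct

def single_bits(x: float) -> int:
    """Raw 32-bit pattern of x rounded to IEEE-754 single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def classify_single(x: float) -> str:
    bits = single_bits(x)
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        return "inf" if fraction == 0 else "nan"
    if exponent == 0:
        # Smallest exponent: zero if the significand is zero, denormal otherwise.
        return "zero" if fraction == 0 else "denormal"
    return "normal"

def single_value(bits: int) -> float:
    """Decode a finite single precision bit pattern by hand."""
    sign = -1.0 if bits >> 31 else 1.0
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("infinity or NaN")
    if exponent == 0:
        # Denormal (or zero): no implicit leading 1; exponent fixed at the minimum, -126.
        return sign * fraction * 2.0 ** (-126 - 23)
    # Normal: restore the implicit leading 1 in front of the 23 stored bits.
    return sign * ((1 << 23) | fraction) * 2.0 ** (exponent - 127 - 23)

tiny = 2.0 ** -149                       # smallest positive single precision denormal
print(classify_single(tiny))             # denormal
print(float.hex(tiny))                   # 0x1.0000000000000p-149 (normal as a double)
```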
If an internal proprietary format is used to store denormal values as normal values in the extended range provided by the internal format, and the register file of the FPU is also used to store single instruction-multiple data stream (SIMD) values, including SIMD integers (e.g., Intel MMX format) and SIMD IEEE-754 values (e.g., Intel Streaming SIMD Extensions (SSE) or AMD 3DNow!), then a problem may occur if SIMD instructions are allowed to reference the results of scalar IEEE-754 instructions directly through the register file, without first storing the scalar results to memory. The problem arises because denormal IEEE-754 results are held in the register file in normalized form, and that form becomes visible to the SIMD instruction stream, whose operands typically occupy the significand portion of the scalar IEEE-754 format(s). The programmer of the SIMD instruction stream (human or compiler) may be unaware of this non-standard but equivalent method of representing denormal numbers as normal numbers within the CPU and therefore may not account for it in the SIMD program. The SIMD program may then produce results that differ from those obtained when the values pass through memory, and that may be incorrect.
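The mismatch can be shown with the smallest positive double extended denormal. The sketch below simplifies by keeping a 64-bit significand, instead of the 70-bit internal one, so that both encodings can be viewed as four 16-bit SIMD lanes. `normalize` and `words16` are hypothetical helpers. The value is identical in both encodings, but a packed-word SIMD instruction reading the significand bits sees different data.

```python
EXT_BIAS = 16383   # double extended exponent bias
SIG_WIDTH = 64     # double extended significand, explicit integer bit included

def normalize(sig: int) -> tuple[int, int]:
    """Shift a non-zero significand left until its MSB is set; return (new significand, shift count)."""
    if sig == 0:
        raise ValueError("zero has no normalized form")
    shift = 0
    while not (sig >> (SIG_WIDTH - 1)) & 1:
        sig <<= 1
        shift += 1
    return sig, shift

def words16(sig: int) -> list[int]:
    """View a 64-bit significand as four 16-bit lanes, low lane first (a packed-word SIMD view)."""
    return [(sig >> (16 * i)) & 0xFFFF for i in range(4)]

# Smallest positive double extended denormal: exponent field 0, significand 1.
ext_sig = 0x0000_0000_0000_0001
int_sig, shift = normalize(ext_sig)
# The normalized form needs an exponent below the external minimum (1 - EXT_BIAS).
int_unbiased_exp = (1 - EXT_BIAS) - shift

print(words16(ext_sig))    # [1, 0, 0, 0]      what SIMD code expects to see
print(words16(int_sig))    # [0, 0, 0, 32768]  what it sees in normalized form
print(int_unbiased_exp)    # -16445
```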
Therefore, there is a need in the art for improved microprocessor designs that are capable of converting denormal number representations between scalar and SIMD formats. More particularly, there is a need in the art for an improved floating point unit that provides an efficient conversion in the register file of denormal numbers between scalar and SIMD formats.
The limitations inherent in the prior art described above are overcome by the present invention which provides an improved pipelined floating point unit. In an advantageous embodiment of the present invention, the pipelined floating point unit comprises: a) a first plurality of pipelined functional units capable of processing operands conforming to a single instruction-multiple data stream (SIMD) instruction set architecture (ISA); b) a second plurality of pipelined functional units capable of processing operands conforming to a scalar instruction set architecture (ISA); and c) a first format fault detection circuit associated with at least one of the first plurality of pipelined functional units capable of determining whether a first operand is a denormal number and, in response to the determination, generating a first fault signal.
According to one embodiment of the present invention, the first fault signal causes a number conversion circuit associated with the pipelined floating point unit to modify a significand and an exponent of at least one operand in a data register associated with the pipelined floating point unit to thereby convert the at least one operand to a denormal number.
According to another embodiment of the present invention, the first format fault detection circuit determines the first operand is denormal by examining a tag field associated with the first operand.
According to still another embodiment of the present invention, the first format fault detection circuit further determines the first operand is denormal by determining if a most significant bit (MSB) of the first operand is set to Logic 1.
According to yet another embodiment of the present invention, the pipelined floating point unit further comprises a second format fault detection circuit associated with at least one of the second plurality of pipelined functional units capable of determining whether a second operand is a denormal number and, in response to the determination, generating a second fault signal.
According to a further embodiment of the present invention, the second fault signal causes a number conversion circuit associated with the pipelined floating point unit to modify a significand and an exponent of at least one operand in a data register associated with the pipelined floating point unit to thereby convert the at least one operand from an external denormal representation to a normalized number in the internal format.
According to a still further embodiment of the present invention, the second format fault detection circuit determines the second operand is denormal by examining a tag field associated with the second operand.
According to a yet further embodiment of the present invention, the second format fault detection circuit further determines the second operand is denormal by determining if a most significant bit (MSB) of the second operand is set to zero.
According to another embodiment of the present invention, the second format fault detection circuit is further capable of determining whether the second operand is encoded as a single instruction-multiple data stream (SIMD) number and, in response to the determination, generating the second fault signal.
According to yet another embodiment of the present invention, the second format fault detection circuit determines the second operand is encoded as the SIMD number by examining a tag field associated with the second operand.
According to a further embodiment of the present invention, the second fault signal causes a number conversion circuit associated with the pipelined floating point unit to modify the tag field to indicate that the second operand is a scalar number.
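The two detection circuits described above reduce to simple predicates on the tag field and the significand MSB. The following sketch models a register entry as a small dataclass. The `Tag` encodings, the `Reg` layout, and the 64-bit significand width are hypothetical simplifications, not the proprietary encoding. The first detector flags a denormal still in internal (MSB set) form, and the second flags a denormal in external (MSB clear) form or an operand tagged as SIMD.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Tag(Enum):
    """Hypothetical value-class encodings held in the tag field."""
    ZERO = auto()
    NORMAL = auto()
    DENORMAL = auto()
    SIMD = auto()

SIG_WIDTH = 64  # simplification; the internal format described above has 70 significand bits

@dataclass(frozen=True)
class Reg:
    tag: Tag
    sign: int
    exp: int
    sig: int

def msb(sig: int) -> int:
    return (sig >> (SIG_WIDTH - 1)) & 1

def first_format_fault(op: Reg) -> bool:
    """SIMD-side detector: a denormal still held in normalized internal form (MSB set)."""
    return op.tag is Tag.DENORMAL and msb(op.sig) == 1

def second_format_fault(op: Reg) -> bool:
    """Scalar-side detector: a denormal in external form (MSB clear) or a SIMD-tagged operand."""
    return op.tag is Tag.SIMD or (op.tag is Tag.DENORMAL and msb(op.sig) == 0)
```

For a SIMD-tagged operand, the handler would then rewrite the tag to indicate a scalar number. That step is not modeled here.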
Existing applications that use both scalar and SIMD floating-point instructions are typically coarse grained (i.e., instructions from the two instruction set architectures (ISAs) are rarely interleaved). In the case of Intel MMX instructions and AMD 3DNow! instructions, using the results produced by the other ISA (scalar or SIMD) is not recommended, although such use is implicitly defined by the use of the double extended IEEE-754 format for the scalar ISA and by the mapping of the MMX and SSE storage formats onto the double extended format.
The present invention first requires that the functional units implementing the SIMD ISA be able to identify the scalar IEEE-754 values that represent what would be a denormal number in any of the IEEE-754 formats used for that value in memory. In some processor designs provided by National Semiconductor Corporation, this is accomplished by encoding a value class as part of the proprietary internal format, one possible encoding being denormal. This encoding is described in U.S. patent application Ser. No. 09/377,140, previously incorporated by reference into the present disclosure.
The present invention also requires that the MSB of the significand of a scalar IEEE-754 value be explicit rather than implicit, which is the case in the internal format described in U.S. patent application Ser. No. 09/377,140. The functional units in the floating point unit can therefore distinguish a denormal value stored in external format (MSB reset) from one stored in internal format (MSB set). A SIMD functional unit can thus detect and report an internal-to-external fault when it encounters a denormal operand in internal format.
This fault can be used to activate an internal-to-external microcode fault handler that walks through the register file, converting the value in every register with an internal-to-external conversion primitive. A "primitive" is one of the arithmetic operations implemented by a functional unit in the FPU. Microcode can apply a primitive to a data value held in, for example, one of the architectural registers of the instruction set's register stack. This conversion primitive performs the identity transformation on all values except denormals in internal format, which it denormalizes by shifting their significands to the right, by an amount determined by the exponent value, until the resulting exponent is all zeroes. This conversion primitive is very similar to the primitive that performs store conversion from internal format into the widest external format, such as double extended format. Once the microcode fault handler has converted all registers, which it can do without branching and in a pipelined fashion, execution of the SIMD instruction stream resumes. Because all registers have been converted, the fault cannot recur in this instance of the SIMD instruction stream.
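A minimal sketch of the internal-to-external primitive and the register-walking handler follows. It assumes a 17-bit internal exponent with a bias of 65535 and a 64-bit significand in both formats. The bias and the significand width are illustrative choices, since the text does not give them. `to_external` and `internal_to_external_handler` are hypothetical names. The Python uses a conditional for readability, whereas the text describes microcode that applies the primitive to every register without branching.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Tag(Enum):
    """Hypothetical value classes (others omitted)."""
    NORMAL = auto()
    DENORMAL = auto()

EXT_BIAS = 16383   # double extended exponent bias
INT_BIAS = 65535   # assumed bias for a 17-bit internal exponent (not given in the text)
SIG_WIDTH = 64     # simplification: 64-bit significand in both formats

@dataclass(frozen=True)
class Reg:
    tag: Tag
    sign: int
    exp: int  # internal-biased exponent, or the external exponent field (0) for an external denormal
    sig: int

def msb(sig: int) -> int:
    return (sig >> (SIG_WIDTH - 1)) & 1

def to_external(r: Reg) -> Reg:
    """Identity, except that denormals held in normalized internal form are denormalized."""
    if r.tag is not Tag.DENORMAL or not msb(r.sig):
        return r
    ext_exp = r.exp - INT_BIAS + EXT_BIAS  # rebias; <= 0 for a value below the external normal range
    shift = 1 - ext_exp                    # right shift that raises the exponent to the denormal minimum
    # Values were already rounded to external precision, so only zero bits may be shifted out.
    if r.sig & ((1 << shift) - 1):
        raise ValueError("value is not exactly representable as an external denormal")
    return Reg(r.tag, r.sign, 0, r.sig >> shift)  # exponent field becomes all zeroes

def internal_to_external_handler(regfile: list[Reg]) -> None:
    """Convert every register, so the fault cannot recur in this SIMD instruction stream."""
    for i, r in enumerate(regfile):
        regfile[i] = to_external(r)
```

Running the handler a second time leaves the register file unchanged, because every denormal now has its MSB clear and the primitive reduces to the identity.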
However, because all registers have been converted, if a subsequent scalar IEEE-754 instruction stream happens to reference these register file values, they will be in the wrong format: by definition at least one denormal was present, and it is now in external format unless it has since been overwritten. Therefore, in one embodiment of the present invention, the functional units implementing the scalar IEEE-754 ISA also detect an external-to-internal fault when they encounter a denormal in external format (MSB reset). This fault causes an external-to-internal microcode handler to walk through the register file, converting all registers with an external-to-internal conversion primitive. This conversion primitive performs the identity transformation on all values except denormals in external format, which it normalizes by shifting their significands to the left until the MSB is set, starting from a constant exponent (the minimum exponent of the external format) and decrementing it by one for each bit position shifted. This conversion primitive is very similar to the primitive that performs load conversion from the widest external format into internal format. Once the microcode handler has converted all registers, pipelined and without branches, execution of the scalar instruction stream resumes.
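The external-to-internal primitive can be sketched under the same illustrative assumptions: a 17-bit internal exponent with an assumed bias of 65535, a 64-bit significand, and hypothetical `Tag`/`Reg` types. The exponent starts at the external minimum, rebiased into the internal range, and drops by one for each position the significand is shifted left.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Tag(Enum):
    """Hypothetical value classes (others omitted)."""
    NORMAL = auto()
    DENORMAL = auto()

EXT_BIAS = 16383   # double extended exponent bias
INT_BIAS = 65535   # assumed bias for a 17-bit internal exponent (not given in the text)
SIG_WIDTH = 64     # simplification: 64-bit significand in both formats

@dataclass(frozen=True)
class Reg:
    tag: Tag
    sign: int
    exp: int
    sig: int

def msb(sig: int) -> int:
    return (sig >> (SIG_WIDTH - 1)) & 1

def to_internal(r: Reg) -> Reg:
    """Identity, except that denormals in external form (MSB clear) are normalized."""
    if r.tag is not Tag.DENORMAL or msb(r.sig):
        return r
    if r.sig == 0:
        raise ValueError("a denormal must have a non-zero significand")
    sig = r.sig
    exp = (1 - EXT_BIAS) + INT_BIAS  # constant start: external minimum exponent, rebiased
    while not msb(sig):
        sig <<= 1  # shift the significand left ...
        exp -= 1   # ... and decrement the exponent once per bit position
    return Reg(r.tag, r.sign, exp, sig)

def external_to_internal_handler(regfile: list[Reg]) -> None:
    """Convert every register before the scalar instruction stream resumes."""
    for i, r in enumerate(regfile):
        regfile[i] = to_internal(r)
```

For an external denormal with significand 1, the loop shifts 63 times, so the internal exponent ends 63 below the rebiased external minimum. Values that are already normalized pass through unchanged.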
At the very limited cost of one fault detection circuit per functional unit in the floating point unit (plus some additional interconnect between the register file and the load and/or store conversion unit), identical results can be obtained for all instruction streams, and only the instruction streams that perform non-recommended data transfers through the register file pay a performance penalty.
The foregoing has outlined rather broadly the features and technical advantages of the present invention so that those skilled in the art may better understand the detailed description of the invention that follows. Additional features and advantages of the invention will be described hereinafter that form the subject of the claims of the invention. Those skilled in the art should appreciate that they may readily use the conception and the specific embodiment disclosed as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. Those skilled in the art should also realize that such equivalent constructions do not depart from the spirit and scope of the invention in its broadest form.
Before undertaking the DETAILED DESCRIPTION OF THE INVENTION, it may be advantageous to set forth definitions of certain words and phrases used throughout this patent document: the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or; the phrases "associated with" and "associated therewith," as well as derivatives thereof, may mean to include, be included within, interconnect with, contain, be contained within, connect to or with, couple to or with, be communicable with, cooperate with, interleave, juxtapose, be proximate to, be bound to or with, have, have a property of, or the like; and the term "controller" means any device, system or part thereof that controls at least one operation; such a device may be implemented in hardware, firmware or software, or some combination of at least two of the same. It should be noted that the functionality associated with any particular controller may be centralized or distributed, whether locally or remotely. Definitions for certain words and phrases are provided throughout this patent document; those of ordinary skill in the art should understand that in many, if not most, instances, such definitions apply to prior, as well as future, uses of such defined words and phrases.