1. Field of the Invention
This invention relates in general to the field of data processing in computers, and more particularly to an apparatus and method for calculating the square root of floating point operands.
2. Description of the Related Art
Software programs that execute on a microprocessor consist of macro instructions that together direct the microprocessor to perform a function. Each macro instruction directs the microprocessor to perform a specific operation that is part of the function such as loading data from memory, storing data in a register, or adding the contents of two registers.
A macro instruction may prescribe a simple operation, such as moving the contents of one register location to another register location. In contrast, it may prescribe a complex operation, such as deriving the cosine of a floating point number. Compared to the manipulation of integer data, the manipulation of floating point data by the microprocessor is complex and time consuming. Movement of integer data requires only a few cycles of a microprocessor clock; derivation of a cosine requires hundreds of machine cycles. Because floating point operations are inherently more complex than integer operations, conventional microprocessors employ a dedicated floating point unit to improve the speed and efficiency of floating point calculations. The dedicated floating point unit may be part of the same mechanical package as the remainder of the microprocessor or it may reside in a separate mechanical package.
Within an x86-compatible microprocessor, a floating point macro instruction is decoded into a sequence of floating point micro instructions that direct the microprocessor to execute a floating point operation. The sequence of floating point micro instructions is passed to the floating point unit. The floating point unit executes the sequence of floating point micro instructions and provides a result of the floating point operation in a result register. Likewise, an integer macro instruction is decoded into a sequence of integer micro instructions that direct the microprocessor to execute an integer operation. The sequence of integer micro instructions is passed to the integer unit. The integer unit executes the sequence of integer micro instructions and provides a result of the integer operation in a result register.
In recent years, desktop computational demands have placed a greater burden upon microprocessor designers to add increasingly more functionality to a microprocessor's instruction set. In fact, floating point operations are now so common that a vast majority of present day floating point units perform their computations on operands that adhere to an industry standard format, called extended-precision format. An operand in extended-precision format has 64 significant bits.
Although the extended-precision format is employed internal to a floating point unit for computational purposes, operands may be stored in memory in formats having less than 64 significant bits: an operand stored in memory in single-precision format, for example, has only 24 significant bits. When the operand is provided from memory to the floating point unit, however, it is converted to extended-precision format. Subsequent computations performed with the converted operand thus yield a result in extended-precision format. For many applications, a higher precision result is welcomed. But for other applications, such as 3D graphics applications, such precision is unnecessary. More specifically, an operand representing information for a pixel on a video monitor need only be provided in single-precision format; the precision afforded by extra bits cannot be distinguished when the pixel is displayed. Hence, graphics applications routinely cause extended-precision results to be rounded or truncated to single-precision format.
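The reduction from 64 significant bits to 24 significant bits described above can be illustrated with a minimal sketch. It operates on the significand only, treated as a plain integer, and ignores sign and exponent handling apart from reporting a carry. The function names and the choice of round-to-nearest-even are assumptions made for illustration; they are not taken from any particular floating point unit.

```python
EXTENDED_BITS = 64  # significant bits in extended-precision format
SINGLE_BITS = 24    # significant bits in single-precision format
DROPPED_BITS = EXTENDED_BITS - SINGLE_BITS  # 40 low-order bits are discarded


def truncate_significand(sig64):
    """Keep the 24 most significant bits and discard the rest (hypothetical helper)."""
    return sig64 >> DROPPED_BITS


def round_significand(sig64):
    """Round a 64-bit significand to 24 bits, nearest with ties to even.

    Returns (sig24, exponent_increment). The increment is 1 when rounding
    carries out of the top bit, in which case the exponent must be bumped.
    """
    kept = sig64 >> DROPPED_BITS
    remainder = sig64 & ((1 << DROPPED_BITS) - 1)
    half = 1 << (DROPPED_BITS - 1)
    if remainder > half or (remainder == half and (kept & 1)):
        kept += 1
    if kept >> SINGLE_BITS:  # carry out, e.g. 0xFFFFFF + 1
        return kept >> 1, 1
    return kept, 0
```

In either case the 40 low-order bits of the extended-precision significand do not survive the conversion, which is the point the passage makes about pixel data.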
Although conventional microprocessors provide for an extended-precision result to be rounded/truncated to single-precision format, their floating point logic still performs computations on extended-precision operands. For computational operations where the time required to perform a computation is not a function of the number of significant bits in its associated operands, translation to and from extended-precision format is transparent to the applications. However, the time required to execute many floating point computations is directly proportional to the number of significant operand bits. Extraction of a square root is exemplary of this case.
Typically, a square root is computed in a microprocessor using one of a class of iterative techniques whereby successive bits of the square root are calculated, one bit during each iteration. For instance, calculation of a square root in a conventional floating point unit requires roughly 64 iterations to compute the significant bits of an extended-precision result: one bit of the extended-precision result is generated during each of the 64 iterations. But if the extended-precision square root is specified by a macro instruction to be returned in single-precision format, which has only 24 significant bits, then 40 of the iterations are essentially wasted in the computation of significant bits that are ultimately not used. Execution of the macro instruction is unnecessarily delayed. This problem affects execution time for any application requiring calculation of a square root in any format less precise than extended-precision format.
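The one-bit-per-iteration behavior can be illustrated with a simplified sketch. It uses a trial-and-compare loop on integers rather than the remainder-based recurrences found in hardware, so it is only a stand-in for the class of techniques mentioned above. The function name, the fixed-point layout of the operand, and the parameter names are assumptions made for illustration.

```python
import math


def sqrt_significand(x_fixed, frac_bits, result_bits):
    """Compute result_bits significant bits of sqrt(x), one bit per iteration.

    x_fixed is x * 2**frac_bits for a real x in [1, 4), so sqrt(x) lies in
    [1, 2) and the result has one integer bit and result_bits - 1 fraction
    bits. Returns (root, iterations), where
    root == floor(sqrt(x) * 2**(result_bits - 1)).
    """
    # Rescale x so the comparison trial*trial <= target is exact on integers.
    shift = 2 * (result_bits - 1) - frac_bits
    target = x_fixed << shift if shift >= 0 else x_fixed >> -shift

    root = 0
    iterations = 0
    for bit in range(result_bits - 1, -1, -1):
        trial = root | (1 << bit)   # tentatively set the next result bit
        if trial * trial <= target:
            root = trial            # keep the bit if the square still fits
        iterations += 1
    return root, iterations


# Operand x = 2.0 with 2 integer bits and 62 fraction bits.
x_fixed = 2 << 62
root64, iters64 = sqrt_significand(x_fixed, 62, 64)
root24, iters24 = sqrt_significand(x_fixed, 62, 24)

# Cross-check the 24-bit root against Python's exact integer square root.
assert root24 == math.isqrt(2 << 46)
print(iters64, iters24, iters64 - iters24)
```

Under these assumptions the loop runs 64 times for a 64-bit result and 24 times for a 24-bit result, a difference of 40 iterations. The 24-bit root also equals the top 24 bits of the 64-bit root, so stopping early loses nothing that a single-precision result would have retained.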
Therefore, what is needed is a microprocessor for calculating a square root faster than has heretofore been provided.
In addition, what is needed is a microprocessor that calculates a square root of a floating point operand where the number of calculated significant bits in the square root is less than the number of significant bits in the floating point operand.
Furthermore, what is needed is a method for performing single-precision square root calculation in a microprocessor that eliminates unnecessary clock cycles associated with the performance of extended-precision square root extraction.