1. Field of the Invention
The present invention relates to the field of floating-point arithmetic, and, more specifically, to a method for generating look-up table entries for evaluation of mathematical functions.
2. Description of the Related Art
Floating-point instructions are used within microprocessors to perform high-precision mathematical operations for a variety of numerically-intensive applications. Floating-point arithmetic is particularly important within applications that perform the rendering of three-dimensional graphical images. Accordingly, as graphics processing techniques grow more sophisticated, a corresponding increase in floating-point performance is required.
Graphics processing operations within computer systems are typically performed in a series of steps referred to collectively as the graphics pipeline. Broadly speaking, the graphics pipeline may be considered as having a front end and a back end. The front end receives a set of vertices and associated parameters which define a graphical object in model space coordinates. Through a number of steps in the front end of the pipeline, these vertices are assembled into graphical primitives (such as triangles) which are converted into screen space coordinates. One distinguishing feature of these front-end operations (which include view transformation, clipping, and perspective division) is that they are primarily performed using floating-point numbers. The back end of the pipeline, on the other hand, is typically integer-intensive and involves the rasterization (drawing on a display device) of geometric primitives produced by the front end of the pipeline.
High-end graphics systems typically include graphics accelerators coupled to the microprocessor via the system bus. These graphics accelerators include dedicated hardware specifically designed for efficiently performing operations of the graphics pipeline. Most consumer-level graphics cards, however, only accelerate the rasterization stages of the graphics pipeline. In these systems, the microprocessor is responsible for performing the floating-point calculations in the initial stages of the graphics pipeline. The microprocessor then conveys the graphics primitives produced from these calculations to the graphics card for rasterizing. For such systems, it is clear that increased microprocessor floating-point performance may result in increased graphics processing capability.
One manner in which floating-point performance may be increased is by optimizing the divide operation. Although studies have shown that division represents less than 1% of all instructions in typical floating-point code sequences (such as SPECfp benchmarks), these instructions occupy a relatively large portion of execution time. (For more information on the division operation within floating-point code sequences, please refer to "Design Issues in Division and Other Floating-Point Operations", by Stuart F. Oberman and Michael J. Flynn, published in IEEE Transactions on Computers, Vol. 46, No. 2, February 1997, pp. 154-161.) With regard to the front-end stages of the graphics pipeline, division (or, equivalently, the reciprocal operation) is particularly critical during the perspective correction operation. A low-latency divide operation may thus prevent a potential bottleneck and result in increased graphics processing performance.
One means of increasing performance of the divide operation is through the use of dedicated floating-point division hardware. Because floating-point hardware is relatively large as compared to comparable fixed-point hardware, however, such an implementation may use a significant portion of the hardware real estate allocated to the floating-point unit. An alternative approach is to utilize an existing floating-point element (such as a multiplier) to implement division based on iterative techniques like the Goldschmidt or Newton-Raphson algorithms.
Iterative algorithms for division require a starting approximation for the reciprocal of the divisor. A predetermined equation is then evaluated using this starting approximation. The result of this evaluation is then used for a subsequent evaluation of the predetermined equation. This process is repeated until a result of the desired accuracy is reached. In order to achieve a low-latency divide operation, the number of iterations needed to achieve the final result must be small. One means to decrease the number of iterations in the division operation is to increase the accuracy of the starting approximation. The more accurate the starting approximation, the more quickly the division may be performed.
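By way of illustration only, the iterative refinement described above may be sketched with the Newton-Raphson reciprocal recurrence x <- x * (2 - d * x). The function name, the tolerance, and the two seed values below are hypothetical choices made for this sketch and are not taken from any particular implementation.

```python
def newton_reciprocal(d, seed, tol=1e-10, max_iter=50):
    """Refine seed toward 1/d; return (estimate, iterations used).

    Illustrative sketch only: tol and max_iter are assumed values.
    """
    x = seed
    for i in range(max_iter):
        # 1 - d*x is the relative error of x as an estimate of 1/d.
        if abs(1.0 - d * x) <= tol:
            return x, i
        # One Newton-Raphson step; the relative error is squared each time.
        x = x * (2.0 - d * x)
    return x, max_iter


d = 1.5
coarse, n_coarse = newton_reciprocal(d, seed=0.5)    # poor starting approximation
fine, n_fine = newton_reciprocal(d, seed=0.666)      # better starting approximation
print(n_coarse, n_fine)
```

Because the error is squared on each pass, the better seed reaches the same tolerance in fewer iterations, which is the motivation for an accurate starting approximation.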
Starting approximations for floating-point operations such as the reciprocal function are typically obtained through the use of a look-up table. A look-up table is a read-only memory (ROM) which stores a predetermined output value for each of a number of regions within a given input range. For floating-point functions such as the division operation, the look-up table is located within the microprocessor's floating-point unit. For many floating-point functions, the input range is bounded by a single binade of floating-point values (a "binade" refers to a range of numbers between consecutive powers of 2). Input ranges for other floating-point functions, however, may span more than one binade.
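As a minimal sketch of such a table, the following models a reciprocal look-up over the single binade [1, 2), with a Python list standing in for the ROM. Storing the reciprocal of each region's midpoint is an assumption made for this example, and the names build_reciprocal_table, lookup_reciprocal, and index_bits are hypothetical.

```python
def build_reciprocal_table(index_bits):
    """One stored output per region of [1, 2); 2**index_bits regions in total."""
    n = 1 << index_bits
    width = 1.0 / n
    # Assumed fill rule: reciprocal of the midpoint of each region.
    return [1.0 / (1.0 + (i + 0.5) * width) for i in range(n)]


def lookup_reciprocal(table, x):
    """Return the stored approximation of 1/x for x in [1, 2)."""
    if not 1.0 <= x < 2.0:
        raise ValueError("x must lie in the binade [1, 2)")
    # The leading fraction bits of x select the region.
    index = int((x - 1.0) * len(table))
    return table[index]


table = build_reciprocal_table(index_bits=5)
approx = lookup_reciprocal(table, 1.3)
print(approx, 1.0 / 1.3)
```

Every input that falls within the same region returns the same stored value, so the result is only a starting approximation and not the exact reciprocal.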
Because a single output value is assigned for each region within a function's input range, some amount of error is inherently introduced into the result provided by the table look-up operation. One means of reducing this error is to increase the number of entries in the look-up table. This limits the error in any given entry by narrowing the range of input arguments that the entry must cover. Often, however, the number of entries required to achieve a satisfactory degree of accuracy in this manner is prohibitively large. Large tables have the unfortunate properties of occupying too much space and slowing down the table look-up (large tables take longer to index into than smaller tables).
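The trade-off between table size and error may be illustrated by measuring the worst-case relative error of a single-table reciprocal approximation over [1, 2) for several table sizes. This is a sketch under assumed conditions: each entry is taken to be the reciprocal of its region's midpoint, and the function name and sampling density are hypothetical.

```python
def max_relative_error(index_bits, samples_per_region=16):
    """Worst sampled relative error of a midpoint-reciprocal table on [1, 2)."""
    n = 1 << index_bits
    worst = 0.0
    for i in range(n):
        lo = 1.0 + i / n
        entry = 1.0 / (lo + 0.5 / n)  # assumed fill rule: midpoint reciprocal
        for s in range(samples_per_region + 1):
            x = lo + s / (samples_per_region * n)
            # |entry - 1/x| / (1/x) simplifies to |entry * x - 1|.
            worst = max(worst, abs(entry * x - 1.0))
    return worst


for bits in (4, 5, 6, 7):
    print(bits, 1 << bits, max_relative_error(bits))
```

Under these assumptions, each additional index bit roughly halves the worst-case error while doubling the number of entries, which is why reaching high accuracy with a single table quickly becomes expensive.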
In order to decrease table size while still maintaining accuracy, "bipartite" look-up tables are utilized. Bipartite look-up tables actually include two separate tables: a base value table and a difference value table. The base table includes function output values (or "nodes") for various regions of the input range. The values in the difference table are then used to calculate function output values located between nodes in the base table. This calculation may be performed by linear interpolation or various other techniques. Depending on the slope of the function for which the bipartite look-up table is being constructed, table storage requirements may be dramatically reduced while maintaining a high level of accuracy. If the function changes slowly, for example, the number of bits required for difference table entries is much less than the number of bits in the base table entries. This allows the bipartite table to be implemented with fewer bits than a comparable naive table (one which does not employ interpolation).
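One possible arrangement of a bipartite table for the reciprocal function is sketched below. It is an illustrative assumption and not the method of any particular prior art table: the input fraction is split into three k-bit fields x0, x1, and x2, the base table is indexed by (x0, x1), the difference table by (x0, x2), and each difference entry is a slope-based correction evaluated at the centre of the x0 segment. Entries are kept as Python floats for clarity, whereas a hardware table would store them with a limited number of bits.

```python
K = 4  # bits per field (assumed); the input carries 3*K fraction bits


def build_bipartite(k=K):
    """Return (base, diff) tables approximating 1/x on [1, 2)."""
    n = 1 << k
    s1, s2, s3 = 2.0 ** -k, 2.0 ** -(2 * k), 2.0 ** -(3 * k)
    mid2 = (n - 1) / 2 * s3  # centre of the range covered by the low field
    # Base table: function value (node) for each (x0, x1) region.
    base = [[1.0 / (1.0 + x0 * s1 + x1 * s2 + mid2) for x1 in range(n)]
            for x0 in range(n)]
    # Difference table: slope of 1/x at the centre of the x0 segment,
    # multiplied by the offset of x2 from the centre of its range.
    diff = []
    for x0 in range(n):
        centre = 1.0 + x0 * s1 + s1 / 2
        slope = -1.0 / centre ** 2
        diff.append([slope * (x2 * s3 - mid2) for x2 in range(n)])
    return base, diff


def bipartite_reciprocal(base, diff, x, k=K):
    """Approximate 1/x for x in [1, 2) with two look-ups and one addition."""
    n = 1 << k
    bits = int((x - 1.0) * (1 << (3 * k)))  # truncate to 3*k fraction bits
    x0 = bits >> (2 * k)
    x1 = (bits >> k) & (n - 1)
    x2 = bits & (n - 1)
    return base[x0][x1] + diff[x0][x2]


base, diff = build_bipartite()
entries = len(base) * len(base[0]) + len(diff) * len(diff[0])
approx = bipartite_reciprocal(base, diff, 1.3)
print(entries, approx, 1.0 / 1.3)
```

With k = 4, the two tables together hold 512 entries, whereas a single table indexed by all 12 fraction bits would need 4096. The difference entries are also small in magnitude, which is what allows them to be stored with fewer bits than the base entries.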
Prior art bipartite look-up tables provide output values that minimize the maximum relative error over a given input interval. This use of relative error to measure the accuracy of the look-up table output values is questionable, however, because of a problem known as "wobbling precision". Wobbling precision refers to the fact that a difference of one unit in the least significant bit of an input value to the look-up table does not represent the same relative error throughout a binade. Because the absolute size of that unit is constant within the binade while the magnitude of the value nearly doubles, the relative error at the start of the binade is approximately twice the relative error at the end of the binade. A look-up table constructed in this manner is thus not as accurate as possible.
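The factor of two associated with wobbling precision may be checked numerically by comparing the relative size of one least-significant-bit step at the two ends of a binade. The sketch below assumes a hypothetical format with 8 fraction bits; the function and variable names are illustrative only.

```python
def relative_lsb(x, fraction_bits):
    """Relative size of one LSB step at x, for x in the binade [1, 2)."""
    # The absolute step is constant across the binade; only x changes.
    return 2.0 ** -fraction_bits / x


p = 8                      # assumed number of fraction bits
start = 1.0                # first value in the binade
end = 2.0 - 2.0 ** -p      # last representable value in the binade
ratio = relative_lsb(start, p) / relative_lsb(end, p)
print(relative_lsb(start, p), relative_lsb(end, p), ratio)
```

The ratio equals end / start, which approaches 2 as the number of fraction bits grows, so the same one-bit difference carries a different relative weight depending on where in the binade it occurs.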
It would therefore be desirable to have a bipartite look-up table having output values with improved accuracy.