1. Field of the Invention
The present invention relates to the field of floating-point arithmetic, and, more specifically, to a look-up table capable of evaluating a plurality of mathematical functions.
2. Description of the Related Art
Floating-point instructions are used within microprocessors to perform high-precision mathematical operations for a variety of numerically-intensive applications. Floating-point arithmetic is particularly important within applications that perform the rendering of three-dimensional graphical images. Accordingly, as graphics processing techniques grow more sophisticated, a corresponding increase in floating-point performance is required.
Graphics processing operations within computer systems are typically performed in a series of steps referred to collectively as the graphics pipeline. Broadly speaking, the graphics pipeline may be considered as having a front end and a back end. The front end receives a set of vertices and associated parameters which define a graphical object in model space coordinates. Through a number of steps in the front end of the pipeline, these vertices are assembled into graphical primitives (such as triangles) which are converted into screen space coordinates. One distinguishing feature of these front-end operations (which include view transformation, clipping, and perspective division) is that they are primarily performed using floating-point numbers. The back end of the pipeline, on the other hand, is typically integer-intensive and involves the rasterization (drawing on a display device) of geometric primitives produced by the front end of the pipeline.
High-end graphics systems typically include graphics accelerators coupled to the microprocessor via the system bus. These graphics accelerators include dedicated hardware specifically designed for efficiently performing operations of the graphics pipeline. Most consumer-level graphics cards, however, only accelerate the rasterization stages of the graphics pipeline. In these systems, the microprocessor is responsible for performing the floating-point calculations in the initial stages of the graphics pipeline. The microprocessor then conveys the graphics primitives produced from these calculations to the graphics card for rasterizing. For such systems, it is clear that increased microprocessor floating-point performance may result in increased graphics processing capability.
One manner in which floating-point performance may be increased is by optimizing the divide operation (this is equivalent to the reciprocal operation in many embodiments). Although studies have shown that division represents less than 1% of all instructions in typical floating-point code sequences (such as SPECfp benchmarks), these instructions occupy a relatively large portion of execution time. (For more information on the division operation within floating-point code sequences, please refer to "Design Issues in Division and Other Floating-Point Operations", by Stuart F. Oberman and Michael J. Flynn, published in IEEE Transactions on Computers, Vol. 46, No. 2, February 1997, pp. 154-161). With regard to the front-end stages of the graphics pipeline, division (or, equivalently, the reciprocal operation) is particularly critical during the perspective correction operation. A low-latency divide operation may thus prevent a potential bottleneck and result in increased graphics processing performance.
Additional floating-point performance may be gained by optimization of the reciprocal square root operation (1/sqrt(x)). Most square roots in graphics processing occur in the denominators of fractions, so it is accordingly advantageous to provide a function which directly computes the reciprocal of the square root. Since the reciprocal square root operation is performed during the common procedures of vector normalization and viewing transformations, optimization of this function represents a significant potential performance enhancement.
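By way of illustration only, the following sketch shows why a direct reciprocal square root is convenient for vector normalization: the vector length appears only in a denominator, so a single 1/sqrt(x) result can be multiplied into each component. The names `rsqrt` and `normalize` are hypothetical, and `rsqrt` is merely a software stand-in for a hardware operation.

```python
import math


def rsqrt(x):
    """Software stand-in (an assumption) for a hardware 1/sqrt(x) operation."""
    return 1.0 / math.sqrt(x)


def normalize(v):
    """Scale v to unit length using one rsqrt and one multiply per component."""
    inv_len = rsqrt(sum(c * c for c in v))
    return tuple(c * inv_len for c in v)


unit = normalize((3.0, 4.0, 0.0))
print(unit)
```

No separate square root or divide is needed per component; the cost is one reciprocal square root followed by multiplications.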
One means of increasing the performance of the reciprocal and reciprocal square root operations is through the use of dedicated floating-point hardware. Because floating-point hardware is relatively large compared to equivalent fixed-point hardware, however, such an implementation may use a significant portion of the hardware real estate allocated to the floating-point unit. An alternate approach is to utilize existing floating-point elements (such as a multiplier) to implement these functions based on iterative techniques like the Goldschmidt or Newton-Raphson algorithms.
Iterative algorithms for division require a starting approximation for the reciprocal of the divisor. A predetermined equation is then evaluated using this starting approximation. The result of this evaluation is then used for a subsequent evaluation of the predetermined equation. This process is repeated until a result of the desired accuracy is reached. In order to achieve a low-latency divide operation, the number of iterations needed to achieve the final result must be small. One means to decrease the number of iterations in the division operation is to increase the accuracy of the starting approximation. The more accurately the first approximation is determined, then, the more quickly the division may be performed.
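As a minimal sketch of this relationship, the following uses the well-known Newton-Raphson reciprocal iteration x_next = x * (2 - d * x), which squares the relative error at each step. The function name `newton_reciprocal`, the tolerance, and the two seed values are assumptions chosen for illustration; the iteration converges only when the seed satisfies 0 < d * seed < 2.

```python
def newton_reciprocal(d, seed, tol=1e-12, max_iter=20):
    """Refine `seed` toward 1/d; return (estimate, iterations used)."""
    x = seed
    for i in range(max_iter):
        # Stop once the relative error |1 - d*x| is within tolerance.
        if abs(1.0 - d * x) <= tol:
            return x, i
        # Newton-Raphson step for f(x) = 1/x - d; the relative error is squared.
        x = x * (2.0 - d * x)
    return x, max_iter


d = 1.5
rough, rough_iters = newton_reciprocal(d, seed=1.0)   # seed off by about 50%
good, good_iters = newton_reciprocal(d, seed=0.66)    # seed off by about 1%
print(rough_iters, good_iters)
```

Because each step roughly doubles the number of correct bits, the more accurate seed reaches the same tolerance in fewer iterations.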
Starting approximations for floating-point operations such as the reciprocal function are typically obtained through the use of a look-up table. A look-up table is a read-only memory (ROM) which stores a predetermined output value for each of a number of regions within a given input range. For floating-point functions such as the division operation, the look-up table is located within the microprocessor's floating-point unit. An input range for a floating-point function is typically bounded by a single binade of floating-point values (a "binade" refers to a range of numbers between consecutive powers of 2). Input ranges for other floating-point functions, however, may span more than one binade.
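The following sketch models such a table in software for the reciprocal function over the single binade [1, 2). It is an illustration under stated assumptions, not a description of any particular hardware: the function names are hypothetical, entries are stored as full-precision floats rather than fixed-width ROM words, and each entry holds the reciprocal of its region's midpoint. Positive inputs outside the binade are reduced to it by adjusting the exponent.

```python
import math


def build_reciprocal_table(index_bits):
    """One entry per region of [1, 2): the reciprocal of the region midpoint."""
    n = 1 << index_bits
    return [1.0 / (1.0 + (i + 0.5) / n) for i in range(n)]


def lookup_reciprocal(table, x):
    """Approximate 1/x for x in [1, 2) using the leading fraction bits as index."""
    n = len(table)
    return table[int((x - 1.0) * n)]


def approx_reciprocal(table, x):
    """Approximate 1/x for any positive x by reducing it to the binade [1, 2)."""
    m, e = math.frexp(x)          # x = m * 2**e with 0.5 <= m < 1
    return math.ldexp(lookup_reciprocal(table, 2.0 * m), 1 - e)


table = build_reciprocal_table(7)  # 128 entries
samples = [1.0 + k / 10000.0 for k in range(10000)]
max_rel_err = max(abs(x * lookup_reciprocal(table, x) - 1.0) for x in samples)
print(len(table), max_rel_err)
```

With midpoint entries, the relative error of a 128-entry table stays below 2**-8, which is the kind of starting accuracy an iterative algorithm would then refine.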
Because a single output value is assigned for each region within a function's input range, some amount of error is inherently introduced into the result provided by the table look-up operation. One means of reducing this error is to increase the number of entries in the look-up table. This limits the error in any given entry by decreasing the range of input arguments that map to it. Often, however, the number of entries required to achieve a satisfactory degree of accuracy in this manner is prohibitively large. Large tables have the unfortunate properties of occupying too much space and slowing down the table look-up (large tables take longer to index into than smaller tables).
In order to decrease table size while still maintaining accuracy, "bipartite" look-up tables are utilized. Bipartite look-up tables actually include two separate tables: a base value table and a difference value table. The base table includes function output values (or "nodes") for various regions of the input range. The values in the difference table are then used to calculate function output values located between nodes in the base table. This calculation may be performed by linear interpolation or various other techniques. Depending on the slope of the function for which the bipartite look-up table is being constructed, table storage requirements may be dramatically reduced while maintaining a high level of accuracy. If the function changes slowly, for example, the number of bits required for difference table entries is much less than the number of bits in the base table entries. This allows the bipartite table to be implemented with fewer bits than a comparable naive table (one which does not employ interpolation).
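One common way to organize a bipartite table can be sketched as follows. This is a simplified software model under stated assumptions and is not asserted to be the arrangement of any specific prior art system: the input fraction is split into three fields of n0, n1 and n2 bits, the base table is indexed by the first two fields, the difference table by the first and third, and entries are stored as floats rather than fixed-width words. All function and variable names are hypothetical.

```python
def build_bipartite(f, df, n0, n1, n2):
    """Build (base, diff) tables approximating f on [1, 2); df is f's derivative."""
    s0, s1, s2 = 1 << n0, 1 << n1, 1 << n2
    w0 = 1.0 / s0            # width of a region selected by the first field
    w1 = w0 / s1             # width of a sub-interval selected by the second field
    w2 = w1 / s2             # width of a cell selected by the third field
    # Base entry: f at the midpoint of sub-interval (i, j).
    base = [[f(1.0 + i * w0 + (j + 0.5) * w1) for j in range(s1)]
            for i in range(s0)]
    # Difference entry: one slope per region i, times the offset of cell k
    # from the midpoint of its sub-interval.
    diff = [[df(1.0 + (i + 0.5) * w0) * ((k + 0.5) * w2 - 0.5 * w1)
             for k in range(s2)]
            for i in range(s0)]
    return base, diff


def bipartite_lookup(base, diff, x):
    """Approximate f(x) for x in [1, 2) as a base value plus a difference value."""
    s0, s1, s2 = len(base), len(base[0]), len(diff[0])
    t = int((x - 1.0) * s0 * s1 * s2)     # leading n0+n1+n2 fraction bits
    i, j, k = t // (s1 * s2), (t // s2) % s1, t % s2
    return base[i][j] + diff[i][k]


samples = [1.0 + m / 10000.0 for m in range(10000)]

# Reciprocal: f(x) = 1/x, f'(x) = -1/x**2
rb, rd = build_bipartite(lambda x: 1.0 / x, lambda x: -1.0 / (x * x), 4, 4, 4)
recip_err = max(abs(bipartite_lookup(rb, rd, x) - 1.0 / x) for x in samples)

# Reciprocal square root: f(x) = x**-0.5, f'(x) = -0.5 * x**-1.5
sb, sd = build_bipartite(lambda x: x ** -0.5, lambda x: -0.5 * x ** -1.5, 4, 4, 4)
rsqrt_err = max(abs(bipartite_lookup(sb, sd, x) - x ** -0.5) for x in samples)

entries = sum(len(row) for row in rb) + sum(len(row) for row in rd)
print(entries, recip_err, rsqrt_err)
```

With a 4/4/4 split, the two tables together hold 512 entries, whereas a naive table indexed by the same 12 bits would need 4096. The difference entries are also small in magnitude when the function changes slowly, which is why they can be stored in fewer bits than the base entries.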
As described above, increasing the efficiency of the reciprocal and reciprocal square root functions may lead to increased floating-point performance (and thus, increased graphics processing performance). Prior art systems have implemented a single function (such as the reciprocal function) using a look-up table, but such systems do not take advantage of the potential savings of optimizing both the reciprocal and reciprocal square root functions using look-up tables. In those systems, the potential performance gain from a second function is outweighed by the additional overhead required by a separate look-up table.
It would therefore be desirable to have a multi-function look-up table which implements both the reciprocal and reciprocal square root functions with minimal overhead. It would further be desirable for the multi-function look-up table to be a bipartite look-up table.