Computation devices that perform arithmetic operations are well known in the art. In order to perform such operations, these computation devices typically comprise an arithmetic logic unit or the like. The arithmetic logic unit or, as it is sometimes referred to, a math engine, implements circuitry in hardware used to perform separate arithmetic functions. Such functions range from relatively simple operations such as addition and multiplication to more complex operations such as computing exponents, logarithms, inverses and the like. While a variety of techniques exist in the prior art for approximating the values of more complex functions, a technique that is often used relies on tables of point values and slope values to approximate the output value of a function.
Referring to FIGS. 1-3, there is illustrated an example of the use of point and slope values to approximate the output value of a function. Referring to FIG. 1, an arbitrary function 102 is illustrated as a continuous line. For each of a plurality of known, discrete input values, labeled x0 through xp, there are corresponding known output values, labeled ƒ(x0) through ƒ(xp). The discrete input values are often referred to as points, whereas the output values are referred to as point values. Furthermore, linear approximations of the function between the point values are referenced in terms of slopes. This is further illustrated in FIG. 2.
In FIG. 2, a plurality of points labeled xj, xj+1 and xj+2, and their corresponding point values ƒ(xj), ƒ(xj+1) and ƒ(xj+2) according to an arbitrary function 202 are shown. Between each of the points, piecewise linear approximations 204, 206 of the curve 202 are also shown. Each of the linear approximations 204, 206 is characterized by a slope, m, according to well known geometry principles. When attempting to approximate the output value, ƒ(x), for an arbitrary input, x, it is first determined which pair of adjacent points the input value x falls between. In the example illustrated in FIG. 2, the input value x falls between the points xj and xj+1. In particular, the input value, x, differs from the point, xj, by value, Δx, as illustrated. Using the well known equation for a line, the approximated (or estimated) output value, ƒ′(x), corresponding to the input value, x, may be calculated according to the equation:

$$f'(x) = f(x_j) + m\,\Delta x \qquad \text{(Eq. 1)}$$
The difference between the estimated output value, ƒ′(x), and the true output value, ƒ(x), as shown in FIG. 2, is the error that results from the approximation nature of the method illustrated in FIG. 2. Assuming that a sufficient number of points and point values are used, the error resulting from the above-described method can be kept relatively small, while still maintaining the relative ease of implementation of this method. A technique for implementing this method is further illustrated with respect to FIG. 3.
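The method of FIG. 2 can be illustrated in software. The following is a minimal sketch, not an implementation taken from this description: the function names `build_tables` and `approximate`, the choice of equally spaced points, and the use of the chord between adjacent points as the slope are all assumptions made for illustration.

```python
def build_tables(f, lo, hi, n_points):
    """Build point values and slopes for n_points equal segments on [lo, hi].

    Assumption: points are equally spaced and each slope is that of the
    chord joining f(x_j) to f(x_j+1).
    """
    step = (hi - lo) / n_points
    points = [lo + j * step for j in range(n_points)]
    point_values = [f(xj) for xj in points]
    slopes = [(f(xj + step) - f(xj)) / step for xj in points]
    return point_values, slopes, step


def approximate(x, lo, step, point_values, slopes):
    """Estimate f(x) as f(x_j) + m * dx, following Eq. 1."""
    # Select the point x_j at or immediately below x (clamped to the last segment).
    j = min(int((x - lo) / step), len(point_values) - 1)
    dx = x - (lo + j * step)
    return point_values[j] + slopes[j] * dx


def max_error(f, lo, hi, n_points, n_samples=10000):
    """Largest observed |f'(x) - f(x)| over evenly spaced samples of [lo, hi]."""
    point_values, slopes, step = build_tables(f, lo, hi, n_points)
    worst = 0.0
    for i in range(n_samples + 1):
        x = lo + (hi - lo) * i / n_samples
        err = abs(approximate(x, lo, step, point_values, slopes) - f(x))
        worst = max(worst, err)
    return worst
```

Calling `max_error(lambda x: 1.0 / x, 1.0, 2.0, 32)` and then repeating the call with 64 points gives a rough sense of how the error shrinks as points are added.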
The technique previously described with respect to FIG. 2 may be implemented using a point table 302, a slope table 304, a multiplier 306 and an adder 308. The implementation illustrated in FIG. 3 operates upon input values represented as a signed mantissa and an exponent value. Equation 2 below illustrates a signed mantissa and exponent, base-2 representation, i.e., binary.

$$X = (\pm)\,\text{Mantissa} \times 2^{\text{Exponent}} \qquad \text{(Eq. 2)}$$
In essence, the mantissa represents the significant digits of a value and the exponent value represents a relative magnitude of the significant digits. A sign bit, labeled S in the figures, indicates whether the mantissa value is positive or negative. In this manner, a very large range of values may be represented depending on the number of bits used. The mantissa, for example x, may be further divided into a first portion, labeled x0, and a second portion, labeled Δx. As shown, the first portion x0 comprises the most significant bits of the mantissa and defines the points as previously described. For example, if the first portion x0 comprises the five most significant bits, there are 32 points available. The remaining least significant bits define the second portion illustrated as Δx in FIG. 3. In implementing Equation 1, the first portion of the mantissa, or point, is used to index the point table 302 to provide a corresponding point value, ƒ(x0). Likewise, the first portion of the mantissa is also used to index the slope table 304 to provide a corresponding slope value, m. The values in the point table 302 and the corresponding values in the slope table 304 are constants defined according to the function being approximated. Furthermore, the values in the point table 302 and slope table 304 are defined over a limited range for which the approximation is valid. As shown, the resulting slope value, m, is multiplied by the value of the second portion of the mantissa, Δx, by the multiplier 306 and the resulting product is added to the point value, ƒ(x0), by the adder 308. The output of the adder 308 is the mantissa of the output of the function.
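The division of the mantissa into an index portion and a Δx portion can be sketched as follows. This is an illustrative model only, under several assumptions not stated above: a 23-bit fraction field representing a mantissa in the range [1, 2), a five-bit index, slopes taken as chords between adjacent points, and table entries held as floating point numbers rather than the fixed-point constants a hardware table would store. The names `make_tables` and `approx_mantissa` are hypothetical.

```python
MANT_BITS = 23                        # assumed width of the mantissa fraction field
INDEX_BITS = 5                        # most significant bits select one of 32 points
DELTA_BITS = MANT_BITS - INDEX_BITS   # remaining bits form delta-x


def make_tables(f):
    """Build a point table and a slope table for mantissas in [1, 2)."""
    n = 1 << INDEX_BITS
    step = 1.0 / n
    point_table = [f(1.0 + i * step) for i in range(n)]
    slope_table = [
        (f(1.0 + (i + 1) * step) - f(1.0 + i * step)) / step for i in range(n)
    ]
    return point_table, slope_table


def approx_mantissa(frac_bits, point_table, slope_table):
    """Look up the point value and slope, then multiply and add as in FIG. 3."""
    index = frac_bits >> DELTA_BITS                # first portion: table index
    delta = frac_bits & ((1 << DELTA_BITS) - 1)    # second portion: delta-x bits
    dx = delta / (1 << MANT_BITS)                  # weight of delta-x within the mantissa
    return point_table[index] + slope_table[index] * dx
```

For a reciprocal function the result falls in the range (0.5, 1], so in this sketch any renormalization is left to the sign/exponent processing rather than handled here.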
Additionally, sign/exponent processing 310 is performed on the input value sign, s, and exponent in order to provide the output value sign and exponent, as shown in FIG. 3. The particular processing implemented by the sign/exponent processing block 310 depends upon the representation of the exponent as well as the particular function being approximated. For example, in order to avoid negative exponent values, it is a common practice to add an offset or bias to the exponent value equivalent to a mid-point of the range of values that may be represented by the exponent. Thus, if 8 bits are used to represent exponents, an offset of 128 will prevent any negative exponent values. This is illustrated in Table 1 below.
TABLE 1

Exponent Without Offset    Exponent With Offset
−128                       0
0                          128
128                        256
In order to operate upon the exponent, it therefore becomes necessary to first remove the offset when processing the exponent and, when processing is completed, to add the offset value once again. Additionally, the nature of the function being approximated affects the processing of the exponent. For example, where an inverse function is being implemented, processing of the true value of the exponent can be as simple as inverting each binary bit of the biased exponent value and then subtracting two (one if the input is an exact multiple of 2.0). In another example, implementation of a square root function requires subtracting the biased exponent value from 381 (383 if the input is an exact multiple of 4.0), then dividing by two. A third example would be the logarithm function where the input exponent is simply unbiased and concatenated as an integer value to the fixed point fractional mantissa result. Such sign and exponent processing is well known to those having ordinary skill in the art.
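The exponent processing for an inverse function can be checked numerically. The sketch below assumes the IEEE 754 single-precision layout (8-bit exponent with a bias of 127, 23-bit fraction), which differs from the offset of 128 used in Table 1, and it reads "an exact multiple of 2.0" as meaning that the fraction bits are all zero, i.e., the input is an exact power of two. Both are assumptions, as are the helper names `float_fields` and `reciprocal_exponent`. Zero, denormal, infinite and NaN inputs are not handled.

```python
import struct


def float_fields(x):
    """Split a value into IEEE 754 single-precision sign, biased exponent, fraction."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, biased_exp, fraction


def reciprocal_exponent(biased_exp, fraction):
    """Biased exponent of 1/x: invert the bits, then subtract two (or one)."""
    inverted = ~biased_exp & 0xFF
    # Subtract one when the mantissa is exactly 1.0, two otherwise.
    return inverted - (1 if fraction == 0 else 2)
```

For example, 3.0 has a biased exponent of 128; inverting the eight bits gives 127, and subtracting two gives 125, which is the biased exponent of 1/3.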
Regardless, as can be seen in FIG. 3, the implementation of this technique is relatively simple, requiring only two tables 302, 304, a multiplier 306 and an adder 308. However, the precision obtained by this technique is limited by the number of values stored in the point table 302 and the slope table 304. That is, greater precision is gained only by significantly enlarging the overall size of the point table 302 and the slope table 304. In some instances, the required precision may lead to a prohibitively large set of tables. For example, the so-called DirectX8 standard calls for up to 22 bits of precision when calculating reciprocal values and reciprocal square root values. The size of the tables required to achieve this level of precision using the implementation shown in FIG. 3 would be prohibitively large. Therefore, a need exists for a technique that provides the necessary precision when approximating arithmetic functions, and that is relatively simple to implement.
The DirectX9 standard typically provides a maximum absolute error equivalent to only 9 to 10 bits of precision when calculating sine and cosine function values. Further, the DirectX9 standard, if implemented in software, typically requires 8 instructions to calculate sine and cosine function values. Since a pipelined hardware implementation of the DirectX9 standard may require up to four 24-bit hardware multipliers, such an implementation would be relatively costly. Additionally, the size of the tables required to achieve this level of precision using the implementation shown in FIG. 3 would be prohibitively large.
According to another known technique, a Taylor series approximation may be used to compute the sine and cosine functions. However, to provide a high level of precision, such as 8 or 9 bits of precision, a large amount of processing resources is required to implement the Taylor series approximations.
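For illustration, a truncated Taylor series for the sine function about zero can be sketched as follows. The function name `taylor_sin` and the term-by-term recurrence are choices made for this example and are not drawn from any particular known implementation.

```python
import math


def taylor_sin(x, n_terms):
    """Sum the first n_terms of sin(x) = x - x^3/3! + x^5/5! - ..."""
    term = x
    total = x
    for k in range(1, n_terms):
        # Each new term is the previous one times -x^2 / ((2k)(2k+1)).
        term *= -x * x / ((2 * k) * (2 * k + 1))
        total += term
    return total


def terms_needed(x, tolerance):
    """Smallest number of terms for which the error at x is below tolerance."""
    n = 1
    while abs(taylor_sin(x, n) - math.sin(x)) >= tolerance:
        n += 1
    return n
```

Comparing `terms_needed(0.5, 2.0 ** -9)` with `terms_needed(math.pi, 2.0 ** -9)` suggests how the number of terms, and with it the number of multiplications, grows with the magnitude of the argument.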
According to another technique, a floating point argument of a function addresses a floating point interpolating memory. A memory and a decoder produce coefficients in response to a floating point exponent. A polynomial evaluator produces a floating point representation of the evaluated function. However, since the floating point argument addresses the memory, additional computations are required in order to convert the floating point argument into a value for addressing the memory. As a result, additional processing and additional complexity are required, resulting in increased cost and/or increased delay in processing and evaluating the polynomial.