In graphics and scientific applications, logarithm computations occur frequently. Logarithms obey the identity log(x×y) = log(x) + log(y). A natural logarithm value can be converted to a base 2 logarithm by the equation
log2(x) = ln(x)/ln(2),

where x may be a normalized floating-point number such that x = 1.x0x1x2…xn × 2^E. Then log2(x) = E + log2(1.x0x1x2…xn), where log2(1.x0x1x2…xn) ∈ [0, 1).
One mechanism for implementing a logarithm (e.g., log2(x)) in a processing system, such as a central processing unit (CPU), is through the evaluation of a polynomial. For example, an optimal computation result can be realized by the following 7th order minimax polynomial:

−3.245537847 + (7.133969564 + (−7.494129806 + (5.781438731 + (−2.985020854 + (0.9780947497 + (−0.1839396192 + 0.01512535671·x)·x)·x)·x)·x)·x)·x,

which has a maximum error of 2.919206449×10⁻⁷. The polynomial method, common on CPUs, may be implemented using seven multiply and accumulate (MAC) operations, with high precision required for the intermediate results of this computation.
Another mechanism for implementing a logarithm (e.g., log2(x), where x equals z0+z1) is through the use of a Taylor series approximation, such as the 2nd order series shown in equation (1):
y = log2(x) = f(z0) + (z1 × f′(z0)) + ((z1²/2) × f″(z0))    Eq. (1)

A Taylor series approximation is typically implemented using dedicated hardware, as opposed to simply a CPU, since these operations are not common and multiple specialized tables are typically added to the processor. An example implementation of the 2nd order Taylor series described above is shown in FIG. 1.
FIG. 1 is a block diagram that illustrates functionality involved in processing a Taylor series approximation. That is, shown are various functional blocks of a processing mechanism 10, the functional blocks representing hardware devices and/or interconnect components (e.g., busses, wires) delineated in FIG. 1 according to the functionality being performed. The processing mechanism 10 includes registers 14 and 46, transfer blocks 20 and 22, lookup tables 24-28 (e.g., tableA 24, tableB 26, and tableC 28), round & truncate block 30, square function block 32, multiply blocks 34 and 36, add blocks 38 and 40, a subtractor block 42, and an interrupt block 44. The components and/or corresponding functionality would be understood by one having ordinary skill in the art, and thus discussion of the same except where noted below is omitted for brevity. The transfer block 20 is designated with an x, which represents variable z0 corresponding to the most significant eight (8) bits of the mantissa in register 14, wherein the top eight (8) bits correspond to address locations for various functions (e.g., base and derivative functions) of z0 located in tables 24-28. The transfer block 22 is designated with an xp, which represents variable z1 (herein also referred to as a least significant bits source operand) corresponding to the remaining fifteen (15) least significant bits of the mantissa in register 14. The register 14 (e.g., data input register) may hold data corresponding to a log2 (x) function in single precision (e.g., 32-bit) IEEE-754 floating point format, which includes a sign bit, exponent bits (e.g., 8 bits), and fractional or mantissa bits (e.g., 23 bits) in normalized format (i.e., leading one hidden, leading zeroes removed, and exponent scaled accordingly). The tableA 24 corresponds to the base z0 function, tableB 26 corresponds to the first derivative of the z0 function, and tableC 28 corresponds to the second derivative of the z0 function divided by two.
One way to reduce the complexity of the processing mechanism 10 shown in FIG. 1 is to simplify Equation (1). For example, Equation (1) can be rewritten in Horner's form as
y = f(z0) + z1 × (f′(z0) + (z1 × f″(z0)/2))    Eq. (2)
Rewriting equation (1) as equation (2) eliminates the need for the square function block 32. Thus, the 2nd order Taylor series in Horner's form may be implemented using three table lookups and two multiply-add operations. One exemplary instruction set for achieving the Taylor series implementation for the architecture described in FIG. 1 can be described as follows:

FRAC R4 := Normalize((R0&0x7FFF) | ((((R0&0x7F800000)>>23)−9)<<23));  (1)
LOGTL1 R1 := TableLookup1[R0&0x007F8000];  (2)
LOGTL2 R2 := TableLookup2[R0&0x007F8000];  (3)
LOGTL3 R3 := TableLookup3[R0&0x007F8000];  (4)
FMAD R5 := R2 + (R4*R3);  (5)
FMAD R6 := R1 + (R4*R5);  (6)
EXPADD R7 := ((R0&0x7F800000)>>23) + R6.  (7)

Briefly, instruction (1) normalizes the input. In other words, the hardware described above in association with FIG. 1 performs computations based on the IEEE-754 floating point number format; thus, in instruction (1), the least significant fifteen (15) bits of the mantissa are converted into a normalized floating point number. In instructions (2)-(4), the value in R0 is truncated (quantized) to eight (8) bits, and three table lookups are performed (ƒ(z0), ƒ′(z0), and ƒ″(z0)/2), with the results stored in registers R1, R2, and R3, respectively. In instructions (5) and (6), the operations corresponding to equation (2) are performed using two floating point, fused multiply-and-add operations (FMAD). In instruction (7), the exponent term is added to complete the logarithm.
Several problems with the above-described architecture and instruction set are evident. For example, instructions (1) and (7) are non-standard on most architectures. Further, a significant number of instructions is required (seven in the sequence above). Also, as mentioned above, these operations are typically implemented in a dedicated processing unit, which may result in a low return on investment if the log function is invoked infrequently. It would be desirable to reduce the number of instructions and implement such operations in an architecture that provides for frequent hardware utilization.