FIG. 1 shows a generic processing core 100 that is believed to describe many different types of processing core architectures, such as Complex Instruction Set (CISC), Reduced Instruction Set (RISC) and Very Long Instruction Word (VLIW). The generic processing core 100 of FIG. 1 includes: 1) a fetch unit 103 that fetches instructions (e.g., from cache and/or memory); 2) a decode unit 104 that decodes instructions; 3) a schedule unit 105 that determines the timing and/or order of instruction issuance to the execution units 106 (notably, the scheduler is optional); 4) an execution stage 106 having execution units that execute the instructions (typical instruction execution units include branch execution units, integer arithmetic execution units (e.g., ALUs), floating point arithmetic execution units (e.g., FPUs) and memory access execution units); and 5) a retirement unit 107 that signifies successful completion of an instruction. Notably, the processing core 100 may or may not employ microcode 108. In the case of micro-coded processors, the micro-ops are typically stored in a non-volatile machine readable medium (such as a Read Only Memory (ROM)) within the semiconductor chip on which the processor is constructed, and cause the execution units within the processor to perform the desired function called out by the instruction.
FIG. 2 shows a process for calculating transcendental functions or other functions with an approximation as presently described. For any such function having an input operand X of n bits, the input operand can be divided into two sections, X1 and X2. Specifically, X=[x1, x2, x3, . . . , xm−1, xm, xm+1, xm+2, . . . , xn]=[X1, X2], where X1=[x1, x2, x3, . . . , xm−1] and X2=[xm, xm+1, xm+2, . . . , xn]. As shown in FIG. 2, X1 is used as an input parameter to a look-up table 201 that produces coefficients C0, C1 and C2 in response thereto. The X2 term is kept as an individual term and is also squared to produce an X2^2 term. The approximation takes the form f(X)=C0+C1·X2+C2·(X2^2) and is valid where X1≦X<X1+2^(−m).
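The table-driven evaluation described above can be illustrated with a minimal software sketch. It is not the implementation of FIG. 2; it rests on several assumptions that are not in the text: the operand is taken as X = 1.x1x2...xn in the range [1, 2), the table index is formed from the leading m fraction bits so that each segment has width 2^(−m), and the coefficients come from a simple three-point quadratic interpolation per segment rather than from a minimax fit. The names build_table and approximate are hypothetical.

```python
def build_table(f, m):
    """Build 2**m entries of (C0, C1, C2), one per segment of [1, 2).

    Assumption: coefficients are obtained by fitting a quadratic through
    the start, midpoint and end of each segment (not a minimax fit).
    """
    w = 2.0 ** -m                      # segment width, 2^(-m)
    table = []
    for i in range(2 ** m):
        x1 = 1.0 + i * w               # segment start, i.e. the value of X1
        f0, f1, f2 = f(x1), f(x1 + w / 2), f(x1 + w)
        c0 = f0
        c1 = (-3 * f0 + 4 * f1 - f2) / w
        c2 = 2 * (f2 - 2 * f1 + f0) / (w * w)
        table.append((c0, c1, c2))
    return table


def approximate(table, m, n, x_bits):
    """Evaluate C0 + C1*X2 + C2*X2^2 for X = 1 + x_bits / 2**n.

    x_bits is an integer holding the n fraction bits of the operand.
    """
    index = x_bits >> (n - m)                 # X1: leading m bits select the entry
    low = x_bits & ((1 << (n - m)) - 1)       # X2: remaining n - m bits
    x2 = low / (1 << n)                       # numeric value of X2, in [0, 2^(-m))
    c0, c1, c2 = table[index]
    return c0 + c1 * x2 + c2 * (x2 * x2)


if __name__ == "__main__":
    n, m = 23, 6
    recip_table = build_table(lambda x: 1.0 / x, m)
    x_bits = 12345
    x = 1.0 + x_bits / (1 << n)
    print(approximate(recip_table, m, n, x_bits), 1.0 / x)
```

In this sketch a larger m shrinks each segment and reduces the approximation error at the cost of a larger table, which mirrors the trade-off between table size and precision.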
The formulation of FIG. 2 can be implemented in a processing core (such as any processing core 100 referred to above in FIG. 1) to approximate various functions (such as a reciprocal function (1/X) and others). Here, the different functions are realized with different sets of coefficient values. For example, a first portion of the look-up table contains sets of C0, C1 and C2 coefficients (as a function of X1) for a reciprocal function, a second portion of the look-up table contains sets of C0, C1 and C2 coefficients (as a function of X1) for another function, etc. The value of m (which defines the size of X1 and therefore the number of entries for a given function in the look-up table 201) depends on the function being approximated and the target precision. More details concerning the implementation of the formulation of FIG. 2 may be found in Piñeiro, J.-A., et al., High-Speed Function Approximation Using a Minimax Quadratic Interpolator, IEEE Transactions on Computers, Vol. 54, No. 3, March 2005.
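The arrangement of one look-up table holding a separate portion of coefficients per function, each with its own value of m, might be sketched as follows. This is an illustration under stated assumptions only: the second function (a reciprocal square root), the chosen values of m, the operand format X = 1.x1x2...xn in [1, 2), the use of the leading m bits as the index, and the three-point quadratic fit used to generate coefficients are all hypothetical choices, as are the names LUT, PORTIONS and evaluate.

```python
import math


def segment_coeffs(f, x1, w):
    """Quadratic through the start, midpoint and end of one segment.

    Assumption: a stand-in for however the real coefficients are derived.
    """
    f0, f1, f2 = f(x1), f(x1 + w / 2), f(x1 + w)
    return (f0, (-3 * f0 + 4 * f1 - f2) / w, 2 * (f2 - 2 * f1 + f0) / (w * w))


# Hypothetical function set: name -> (function, m). The value of m differs
# per function, so each function occupies a different number of entries.
FUNCTIONS = {
    "recip": (lambda x: 1.0 / x, 6),
    "rsqrt": (lambda x: 1.0 / math.sqrt(x), 5),
}

LUT = []        # single flat table of (C0, C1, C2) entries
PORTIONS = {}   # name -> (base offset of the function's portion, m)
for name, (f, m) in FUNCTIONS.items():
    PORTIONS[name] = (len(LUT), m)
    w = 2.0 ** -m
    LUT.extend(segment_coeffs(f, 1.0 + i * w, w) for i in range(2 ** m))


def evaluate(name, x_bits, n=23):
    """Approximate the named function at X = 1 + x_bits / 2**n."""
    base, m = PORTIONS[name]
    index = x_bits >> (n - m)                        # X1 selects the entry
    x2 = (x_bits & ((1 << (n - m)) - 1)) / (1 << n)  # value of X2
    c0, c1, c2 = LUT[base + index]                   # offset into the portion
    return c0 + c1 * x2 + c2 * x2 * x2
```

The same evaluation path serves both functions in this sketch; only the base offset and the index width change, which is one way to read the statement that different functions are realized with different sets of coefficient values.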