As is known to those of skill in the art, a polynomial is a mathematical expression comprising one or more terms, each of which consists of a constant multiplied by one or more variables raised to a non-negative integer exponent (e.g. $a + bx + cx^{2}$, where a, b and c are the constants and x is the variable).
Polynomials are very common as they can be used to calculate a variety of values and/or model certain behavior. For example, a point (a1, a2, a3) 102 is determined to be on one side of a triangle 104 defined by three points (0,0,0), (b1, b2, b3), and (c1, c2, c3) as shown in FIG. 1a if equation (1) below is true:
$$a_1 b_2 c_3 - a_1 b_3 c_2 - a_2 b_1 c_3 + a_2 b_3 c_1 + a_3 b_1 c_2 - a_3 b_2 c_1 \geq 0 \tag{1}$$
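By way of illustration only, the test of equation (1) can be sketched in Python as follows. The left-hand side of equation (1) is the 3×3 determinant whose rows are the three points, so the sketch simply evaluates that determinant term by term. The function names `side_polynomial` and `on_non_negative_side` are hypothetical and are chosen here for illustration; integer inputs are used so that the arithmetic is exact.

```python
def side_polynomial(a, b, c):
    """Left-hand side of equation (1): det of the matrix with rows a, b, c."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    c1, c2, c3 = c
    return (a1 * b2 * c3 - a1 * b3 * c2 - a2 * b1 * c3
            + a2 * b3 * c1 + a3 * b1 * c2 - a3 * b2 * c1)


def on_non_negative_side(a, b, c):
    """True when equation (1) holds for point a and triangle (0, b, c)."""
    return side_polynomial(a, b, c) >= 0


# Triangle with vertices (0,0,0), (1,0,0), (0,1,0); points above and below it.
print(on_non_negative_side((0, 0, 1), (1, 0, 0), (0, 1, 0)))   # True
print(on_non_negative_side((0, 0, -1), (1, 0, 0), (0, 1, 0)))  # False
```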
In another example, a line passing through the point (0,0,0) and (v1, v2, v3) 106 is determined to pass through a sphere 108 defined by a radius r and centre (c1, c2, c3), as shown in FIG. 1b, if equation (2) is true:
$$(v_1 c_1 + v_2 c_2 + v_3 c_3)^{2} - (v_1^{2} + v_2^{2} + v_3^{2})(c_1^{2} + c_2^{2} + c_3^{2} - r^{2}) \geq 0 \tag{2}$$
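A minimal sketch of the test of equation (2), again in Python and again using exact integer inputs, is given below. The function name `line_hits_sphere` is an assumption made for this example only.

```python
def line_hits_sphere(v, c, r):
    """True when equation (2) holds for the line through (0,0,0) and v."""
    v1, v2, v3 = v
    c1, c2, c3 = c
    v_dot_c = v1 * c1 + v2 * c2 + v3 * c3
    v_dot_v = v1 * v1 + v2 * v2 + v3 * v3
    c_dot_c = c1 * c1 + c2 * c2 + c3 * c3
    return v_dot_c ** 2 - v_dot_v * (c_dot_c - r * r) >= 0


# Line along the x axis; unit-radius spheres centred on and off the axis.
print(line_hits_sphere((1, 0, 0), (5, 0, 0), 1))  # True: the line passes through
print(line_hits_sphere((1, 0, 0), (0, 5, 0), 1))  # False: the line misses
print(line_hits_sphere((1, 0, 0), (0, 1, 0), 1))  # True: the line is tangent
```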
When a polynomial is evaluated in hardware it may be evaluated using fixed point or floating point number representations. As is known to those skilled in the art, a fixed point number representation is a representation of a number that has a fixed number of digits after the radix point (e.g. decimal point or binary point). In contrast, a floating point number representation is a representation of a number in which the radix point is not fixed (i.e. it can “float”). In other words, the radix point can be placed anywhere within the representation.
The most common floating point standard is the Institute of Electrical and Electronics Engineers (IEEE) standard for floating-point arithmetic (IEEE-754). IEEE-754 specifies that floating point numbers are represented by three numbers: sign, exponent and mantissa (s, exp, mant). In general the three numbers (s, exp, mant) are interpreted, for a fixed integer bias, as shown in equation (3):
$$(-1)^{s} \times 2^{\mathit{exp} - \mathit{bias}} \times 1.\mathit{mant} \tag{3}$$
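The interpretation of equation (3) may be illustrated with the following Python sketch, which splits a 32-bit encoding into its three fields and then rebuilds the value from them. The helper names `decode_f32` and `value_from_fields` are hypothetical. The sketch assumes a normal number; zero, subnormal, infinite and not-a-number encodings use reserved exponent codes and are not handled here.

```python
import struct


def decode_f32(x):
    """Split the 32-bit encoding of x into the fields (s, exp, mant)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    s = bits >> 31              # 1 sign bit
    exp = (bits >> 23) & 0xFF   # 8 exponent bits
    mant = bits & 0x7FFFFF      # 23 mantissa bits
    return s, exp, mant


def value_from_fields(s, exp, mant, ew=8, mw=23):
    """Equation (3) for a normal number: (-1)^s * 2^(exp - bias) * 1.mant."""
    bias = 2 ** (ew - 1) - 1
    significand = 1 + mant / 2 ** mw   # the implied leading 1 plus the fraction
    return (-1) ** s * 2.0 ** (exp - bias) * significand


fields = decode_f32(-6.5)
print(fields)                      # (1, 129, 5242880)
print(value_from_fields(*fields))  # -6.5
```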
IEEE-754 defines the four basic formats shown in Table 1 for floating point numbers with varying degrees of precision. In particular, they are encoded with 16, 32, 64 and 128 bits respectively.
TABLE 1

Type     Name   Sign Width   Exponent Width (ew)   Mantissa Width (mw)   Bias ($2^{ew-1}-1$)   Roundoff Error (u)
Half     F16    1            5                     10                    15                    $2^{-11}$
Single   F32    1            8                     23                    127                   $2^{-24}$
Double   F64    1            11                    52                    1023                  $2^{-53}$
Quad     F128   1            15                    112                   16383                 $2^{-113}$
Floating point representations allow a greater range of numbers for the same number of bits (compared to fixed point numbers). Accordingly, both very large integers and small fractional numbers can be represented using floating point representations. However, since floating point numbers only have a limited number of bits, they are prone to rounding errors. In particular, if the binary widths of the exponent and mantissa are ew and mw respectively, the number of bits of precision (or significant bits) is mw+1, because the floating point format has an implied bit of precision. The roundoff error u is half the distance between 1 and the next representable floating point value, i.e. $u = 2^{-(mw+1)}$.
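As a check on these relationships, the following Python sketch derives the precision, bias and roundoff error from ew and mw for the four basic formats. The dictionary `FORMATS` and the helpers `bias` and `roundoff` are names assumed for this example. The final line further assumes that the Python `float` type is an F64 value, which holds on typical platforms.

```python
import sys
from fractions import Fraction

# (ew, mw) for the four basic formats
FORMATS = {"F16": (5, 10), "F32": (8, 23), "F64": (11, 52), "F128": (15, 112)}


def bias(ew):
    """Exponent bias 2^(ew-1) - 1."""
    return 2 ** (ew - 1) - 1


def roundoff(mw):
    """Roundoff error u = 2^-(mw+1), kept exact as a fraction."""
    return Fraction(1, 2 ** (mw + 1))


for name, (ew, mw) in FORMATS.items():
    print(f"{name}: precision {mw + 1} bits, bias {bias(ew)}, u = 2^-{mw + 1}")

# sys.float_info.epsilon is the distance between 1 and the next float,
# so half of it should equal u for F64.
print(Fraction(sys.float_info.epsilon) / 2 == roundoff(52))  # True
```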
This rounding error inherent in floating point numbers means that performing arithmetic operations (e.g. evaluating polynomials) using floating point representations of the numbers (referred to herein as floating point arithmetic) does not always follow standard real number arithmetic rules. For example, (4) illustrates some of the problems with floating point arithmetic, where “^” above an operation (e.g. $\hat{\times}$, $\hat{+}$, $\hat{\div}$) denotes a floating point operation:
$$\begin{gathered}
a \mathbin{\hat{+}} 0 \neq a \\
a \mathbin{\hat{+}} (b \mathbin{\hat{+}} c) \neq (a \mathbin{\hat{+}} b) \mathbin{\hat{+}} c \\
a \mathbin{\hat{\times}} (b \mathbin{\hat{\times}} c) \neq (a \mathbin{\hat{\times}} b) \mathbin{\hat{\times}} c \\
a \mathbin{\hat{\times}} (b \mathbin{\hat{+}} c) \neq (a \mathbin{\hat{\times}} b) \mathbin{\hat{+}} (a \mathbin{\hat{\times}} c) \\
a \mathbin{\hat{\times}} (1 \mathbin{\hat{\div}} a) \neq 1 \\
a \mathbin{\hat{\times}} b = 0 \nRightarrow a = 0 \text{ or } b = 0
\end{gathered} \tag{4}$$
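Several of these failures can be reproduced directly. The following Python sketch is illustrative only and assumes that the Python `float` type is an F64 value with round-to-nearest arithmetic, as on typical platforms; the operand values are chosen for this example.

```python
import math

big, small = 1e200, 1e-200

# Addition is not associative: the 1.0 is rounded away when added to 1e16 first.
sum_left = (1.0 + 1e16) + -1e16    # 0.0
sum_right = 1.0 + (1e16 + -1e16)   # 1.0

# Multiplication is not associative: only the left grouping overflows.
prod_left = (big * big) * small    # inf
prod_right = big * (big * small)   # finite, close to 1e200

# Multiplication does not distribute over addition.
dist_left = big * (big + -big)         # 0.0
dist_right = big * big + big * -big    # inf + -inf gives nan

# A number times its own reciprocal need not be 1.
recip = 49.0 * (1.0 / 49.0)        # one rounding step below 1.0

# A product can underflow to zero although neither factor is zero.
zero_product = small * small       # 0.0

print(sum_left, sum_right)
print(prod_left, math.isfinite(prod_right))
print(dist_left, dist_right)
print(recip == 1.0, zero_product)
```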
Accordingly floating point arithmetic is prone to error and adding more precision (e.g. bits to the floating point representation) does not always solve the problem. For example, consider a floating point implementation of the polynomial shown in equation (5):
$$\Bigl(\bigl(\bigl(333.75\,b^{6} + a^{2}\,(11a^{2}b^{2} - 121b^{4} - 2)\bigr) + 5.5\,b^{8}\bigr) - a^{2}b^{6}\Bigr) + \frac{a}{2b} \tag{5}$$
with the inputs a=77617 and b=33096. If the IEEE-754 single floating point representation (F32) is used (i.e. ew=8 and mw=23) the result is 1.17260361...; and if the IEEE-754 double floating point representation (F64) is used (i.e. ew=11 and mw=52) the result is 1.7260394005317847..., despite the fact that the correct answer is −0.827396.
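The correct answer can be confirmed with exact rational arithmetic. The following Python sketch evaluates equation (5) once with `fractions.Fraction` operands and once with `float` operands. The function name `poly5` is hypothetical, and the coefficients are passed in as parameters so that the same code serves both number types. The floating point value obtained depends on the platform and on the order of operations, so it need not match the figures quoted above; the sketch only shows that it is far from the exact value.

```python
from fractions import Fraction


def poly5(a, b, k1, k2):
    """Equation (5), with k1 = 333.75 and k2 = 5.5 in the caller's number type."""
    a2 = a * a
    b2 = b * b
    b4 = b2 * b2
    b6 = b4 * b2
    b8 = b4 * b4
    return (((k1 * b6 + a2 * (11 * a2 * b2 - 121 * b4 - 2))
             + k2 * b8) - a2 * b6) + a / (2 * b)


# Exact evaluation: 333.75 = 1335/4 and 5.5 = 11/2.
exact = poly5(Fraction(77617), Fraction(33096), Fraction(1335, 4), Fraction(11, 2))

# Floating point evaluation (assumed to be F64 on typical platforms).
approx = poly5(77617.0, 33096.0, 333.75, 5.5)

print(exact, float(exact))  # -54767/66192, approximately -0.827396
print(approx)               # platform dependent, and far from the exact value
```

The polynomial part of the expression is exactly −2 for these inputs, but each of the large terms is near 10^36, well beyond the range over which 53 bits of precision can resolve a difference of 2, so the cancellation is lost.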
However, in certain situations evaluation of a polynomial using floating point arithmetic is required. Accordingly, there is a desire to be able to accurately evaluate polynomials using floating point arithmetic.
The embodiments described below are provided by way of example only and are not limiting of implementations which solve any or all of the disadvantages of known systems and methods for evaluating polynomials using floating point components.