1. Field of the Invention
The present invention relates to an arithmetic and logical unit (ALU), and more particularly, to a pipelined divider with a small lookup table.
2. Discussion of the Related Art
Typically, division is performed relatively infrequently in an ALU in comparison to other arithmetic operations. For this reason, dividers have been designed to occupy a small area, and have mostly been implemented in an iterative manner. Accordingly, the delay of a division operation is relatively long in comparison to that of other operations. In spite of the low frequency of division, this long delay greatly affects the performance of the whole system.
Recently, as the degree of integration of very large scale integrated (VLSI) circuits has increased and the number of applications requiring high performance has grown, structures that perform division without iteration have been proposed. In particular, three-dimensional (3-D) graphics processing has emerged as an important application of a processor, and a pipelined divider having a high throughput is necessary to process 3-D graphics at a high speed.
FIG. 1 is a block diagram of a conventional divider proposed by P. Hung.
Referring to FIG. 1, P. Hung has proposed a pipelined divider based on a modified Taylor series expansion. According to this pipelined divider, up to second-order terms of the Taylor series of a reciprocal of a divisor Y are stored in one lookup table LUT1, and a division operation is performed with only two multiplications, using one lookup table reference and first and second multipliers MUL1 and MUL2. Thus, the divider can be constructed in the form of a pipeline.
The divider of P. Hung has the advantage that it can somewhat reduce the size of the lookup table LUT1 in comparison to earlier dividers. However, as shown in FIG. 6, since this divider still requires a large lookup table LUT1 (for instance, 13 KB for single precision and 440 MB for double precision), it has the disadvantage that it occupies a large chip area. This problem becomes more severe as the precision becomes higher.
Meanwhile, existing dividers may be broadly classified into a digit recurrence type, such as the Sweeney-Robertson-Tocher (SRT) type, and a multiplicative type.
The multiplicative type is a kind of approximation type, and uses multipliers and lookup tables. Two well-known multiplicative algorithms are the Newton-Raphson algorithm and the series expansion algorithm; both calculate a quotient by reading an approximate reciprocal from a lookup table and refining it with the multipliers.
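The Newton-Raphson approach described above can be illustrated with a minimal sketch. The function names, the midpoint seed table, and the use of floating-point values in place of fixed-point hardware words are assumptions made only for illustration; they are not taken from any particular divider.

```python
def build_seed_table(index_bits):
    """Hypothetical seed table: reciprocal of the midpoint of each
    sub-interval of [1, 2), one entry per value of the upper bits."""
    entries = 1 << index_bits
    return [1.0 / (1.0 + (i + 0.5) / entries) for i in range(entries)]


def newton_raphson_divide(x, y, table, index_bits, iterations):
    """Approximate x / y for a normalized divisor 1 <= y < 2.

    The seed reciprocal r is read from the table and then refined with
    the Newton-Raphson step r <- r * (2 - y * r), which uses only
    multiplications and a subtraction.
    """
    index = int((y - 1.0) * (1 << index_bits))  # upper fraction bits of y
    r = table[index]
    for _ in range(iterations):
        r = r * (2.0 - y * r)
    return x * r
```

Each refinement step roughly squares the relative error of the reciprocal, so a more accurate seed (a larger table) needs fewer iterations, which is the trade-off between table size and delay discussed here.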
These methods have the disadvantage that they require a large lookup table. For example, a divider of the Newton-Raphson type or the series expansion type having a 16-bit seed requires a lookup table of 64 KB in total, and can perform a single-precision division operation in two iterations. With an 8-bit seed, the lookup table is small, i.e., 128 B, but the delay time greatly increases since three iterations must be performed. An accurate quotient approximation method using the Taylor series uses plural lookup tables, and requires lookup tables of 19.5 KB in order to perform a single-precision operation in one iteration. Here, the single-precision format follows the IEEE (Institute of Electrical & Electronics Engineers) standard, and is composed of a sign part of one bit, an exponent part of 8 bits, and a fraction part of 23 bits. In the normalized form of the fraction part, the most significant bit (MSB) is “1”, but the MSB is omitted from the floating-point representation as a hidden bit.
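The single-precision layout described above can be illustrated as follows. This is a minimal sketch; the helper names decode_single and significand are hypothetical and are introduced only for illustration.

```python
import struct


def decode_single(value):
    """Split a number, rounded to IEEE single precision, into its
    1-bit sign, 8-bit exponent and 23-bit fraction fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


def significand(fraction):
    """Restore the hidden leading 1 of a normalized number.
    The result lies in the range [1, 2)."""
    return 1.0 + fraction / (1 << 23)
```

For example, decode_single(1.5) yields a fraction field whose only set bit is the top one, and significand() of that field gives back 1.5 once the hidden bit is restored. The restored significand is the kind of normalized operand in [1, 2) that the dividers discussed here operate on.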
Recently proposed pipelined dividers include the structure proposed by A. Liddicoat and the structure proposed by P. Hung (See FIGS. 1 and 2).
The divider of A. Liddicoat calculates up to the third-order term of the Newton-Raphson algorithm, and reduces the delay time using parallelism.
The divider of P. Hung uses the Taylor series expansion, and has the advantage that its structure is simple and its lookup table is relatively small in comparison to earlier dividers. According to the algorithm proposed by P. Hung, the division operation can be expressed by the following equation 1 through the Taylor series expansion.
[Equation 1]
$$\frac{X}{Y} = \frac{X}{Y_h + Y_1} = \frac{X}{Y_h}\left(1 - \frac{Y_1}{Y_h} + \left(\frac{Y_1}{Y_h}\right)^2 - \cdots\right) \qquad (1)$$
Here, Yh is the value formed by the upper p bits of Y, and Y1 is the value obtained by subtracting Yh from Y. X and Y are normalized fixed-point numbers, and thus the respective values satisfy the boundary conditions of the equation 2.
[Equation 2]
$$1 \le X,\; Y < 2, \qquad 1 \le Y_h < 2 - 2^{-p+1}, \qquad 0 \le Y_1 < 2^{-p+1} \qquad (2)$$
By keeping only the first two terms of the Taylor series in the equation 1, that is, the terms up to first order in Y1/Yh, the following equation 3 is obtained.
[Equation 3]
$$\frac{X}{Y} \approx \frac{X\,(Y_h - Y_1)}{Y_h^{2}} \qquad (3)$$
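The accuracy of the equation 3 can be checked with a short derivation. It is a worked illustration that uses only Y = Yh + Y1 and the boundary conditions of the equation 2; it is not a statement taken from the cited work.

```latex
\frac{X\,(Y_h - Y_1)}{Y_h^{2}}
  = \frac{X\,(Y_h - Y_1)(Y_h + Y_1)}{Y_h^{2}\,(Y_h + Y_1)}
  = \frac{X}{Y}\cdot\frac{Y_h^{2} - Y_1^{2}}{Y_h^{2}}
  = \frac{X}{Y}\left(1 - \left(\frac{Y_1}{Y_h}\right)^{2}\right)

\text{Since } Y_h \ge 1 \text{ and } 0 \le Y_1 < 2^{-p+1}:\qquad
0 \le \left(\frac{Y_1}{Y_h}\right)^{2} < 2^{-2p+2}
```

Under these conditions the relative error of the approximation is below 2^(-2p+2), so the number of upper bits p directly determines both the achievable precision and the number of lookup table entries.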
According to the equation 3, the division operation can be performed by multiplying the dividend X by Yh−Y1 obtained from the divisor Y, and then multiplying the result by 1/Yh^2. Here, Yh−Y1 can be obtained without actually performing a subtraction, by modifying the Booth encoding of the multiplier MUL1 used for the multiplication of X by Yh−Y1.
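The two-multiplication flow of the equation 3 can be sketched in software as follows. This is only an illustrative model under stated assumptions: the function names, the 52-bit fixed-point width, and the floating-point table entries are hypothetical, and the subtraction Yh−Y1 is carried out explicitly rather than through a modified Booth encoding.

```python
FRAC_BITS = 52  # assumed fixed-point fraction width (matches a Python float)
ONE = 1 << FRAC_BITS


def split_divisor(y_fixed, p):
    """Split Y into Yh (upper p bits, including the integer bit)
    and Y1 = Y - Yh, both as fixed-point integers."""
    low_bits = FRAC_BITS - (p - 1)
    y_h = (y_fixed >> low_bits) << low_bits
    return y_h, y_fixed - y_h


def build_table(p):
    """Hypothetical lookup table holding 1 / Yh^2 for every Yh in [1, 2)."""
    step = 1 << (FRAC_BITS - (p - 1))
    return [(ONE / (ONE + i * step)) ** 2 for i in range(1 << (p - 1))]


def hung_divide(x, y, p, table):
    """Approximate x / y as x * (Yh - Y1) * (1 / Yh^2), for 1 <= y < 2."""
    y_fixed = int(y * ONE)                     # exact for a float in [1, 2)
    y_h, y_1 = split_divisor(y_fixed, p)
    index = (y_h - ONE) >> (FRAC_BITS - (p - 1))
    product = x * ((y_h - y_1) / ONE)          # first multiplication (MUL1)
    return product * table[index]              # second multiplication (MUL2)
```

With p = 12, for instance, the table in this sketch has 2^11 entries, and the result agrees with x / y to a relative error below 2^(-22). Increasing p improves the accuracy but doubles the number of table entries for each additional bit.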
As described above, the algorithm proposed by P. Hung obtains an approximation of 1/Yh^2 from one lookup table LUT1. This algorithm can somewhat reduce the size of the lookup table in comparison to earlier algorithms. Nevertheless, the divider proposed by P. Hung also has the disadvantage that its chip area is still large in comparison to a divider implemented with an existing iterative algorithm. For the 32-bit single precision, the lookup table occupies about 13 KB, and for the 64-bit double precision, implementation is practically impossible (See FIG. 6). Accordingly, reducing the size of the lookup table is a very important issue in a pipelined divider.