1. Field
Some aspects herein relate to methods and apparatus for use in the synthesis and manufacture of integer multipliers that are correct up to a given error tolerance and which perform multiplication as a sum of addends with faithful rounding.
2. Related Art
When modern integrated circuit (IC) designs are produced, they usually start with a high level design specification which captures the basic functionality required but does not include the detail of implementation. High level models of this type are usually written in a high level programming language to derive some proof of concept and validate the model, and can be run on general purpose computers or on dedicated processing devices. Once this has been completed and the model has been reduced to register transfer level (RTL), either using commercially available tools or manually, this RTL model can then be optimised to determine a preferred implementation of the design in silicon.
The implementation of some types of multiplier in hardware often involves the determination of a number of partial products which are then summed, each shifted by one bit relative to the previous partial product. Visually this can be considered as a parallelogram array of bits as shown in FIG. 1, where the black circles represent bits.
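As an illustration only, the sum of shifted partial products can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes unsigned n bit operands and a plain AND array (no Booth recoding).

```python
def partial_products(a, b, n):
    """Return the n partial products of a * b, one per bit of b.

    Row j is a AND-ed with bit j of b and then shifted left by j places,
    which gives the parallelogram layout of the array.
    """
    rows = []
    for j in range(n):
        b_j = (b >> j) & 1          # bit j of the multiplier
        rows.append((a if b_j else 0) << j)
    return rows


def multiply_by_sum_of_addends(a, b, n):
    """Sum the shifted partial products to recover the full product."""
    return sum(partial_products(a, b, n))
```

For n bit unsigned inputs, summing the rows reproduces the full 2n bit product a·b.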
When a function such as a multiplier is to be produced and manufactured in hardware, a circuit to perform the multiplication is derived at Register Transfer Level (RTL) and from this a netlist of gates to produce a hardware circuit is derived. There are many applications in which the full result of a fixed point multiplication is not required, but an appropriately rounded result can be returned. A common approach is to use faithful rounding, which produces a result that is rounded to the required machine precision value immediately above or below the precise result, but not always to the nearest one of these values.
A challenge is to create the most effective trade off between silicon area and error properties. Even for simple truncation schemes there is a wealth of design options and trade offs that can be made. Gathering error statistics for even modestly sized multipliers is extremely time consuming. In order to facilitate high level datapath synthesis capable of searching the design space of single or interconnected truncated multipliers in an acceptable time, analytic formulae must be found. The majority of truncated multiplication schemes for an n by n bit multiplication of inputs a and b producing an n bit output y are structured as follows: truncate the multiplier sum of addends array by removing the value contained in the least significant k columns, denoted $\Delta_k$, prior to the addition of the partial products [1]. A hardware-efficient function of the two multiplicands f(a,b) is then introduced into the remaining columns. Once the resultant array is summed, a further n−k columns are truncated; the result is then the approximation to the multiplication to the required precision. The structure of this general multiplier truncation scheme can be found in FIG. 1. This shows k columns of the array being truncated, and n bits being truncated from the sum of addends to produce a result of the required precision.
This formulation covers all the truncation schemes cited in this paper, as well as trivially incorporating non truncation schemes such as round to nearest, up: $k=0$ and $f=2^{n-1}$.
The scheme may be summarised algebraically:
$$y = \left\lfloor \frac{ab + 2^{k} f(a,b) - \Delta_k}{2^{n}} \right\rfloor, \qquad a, b, n, k \in \mathbb{N}$$
The error introduced by doing so is:
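$$\begin{aligned}
\varepsilon &= ab - 2^{n} \left\lfloor \frac{ab + 2^{k} f(a,b) - \Delta_k}{2^{n}} \right\rfloor \\
\varepsilon &= \left( \left( ab + 2^{k} f(a,b) - \Delta_k \right) \bmod 2^{n} \right) + \Delta_k - 2^{k} f(a,b) \\
\varepsilon &= T + \Delta_k - 2^{k} f(a,b)
\end{aligned}$$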
where $T = \left( ab + 2^{k} f(a,b) - \Delta_k \right) \bmod 2^{n}$. A design that exhibits faithful rounding is one such that: $\forall a, b:\ |\varepsilon| < 2^{n}$
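The definitions above can be exercised with a minimal Python sketch. It assumes unsigned n bit inputs and an AND array in which the partial product bit a_i AND b_j sits in column i+j. The helper names (`delta_k`, `truncated_multiply`, `error`, `is_faithful`) are hypothetical, and the compensation function f is supplied by the caller.

```python
def delta_k(a, b, n, k):
    """Value held in the k least significant columns of the AND array."""
    total = 0
    for i in range(n):
        for j in range(n):
            if i + j < k:
                bit = ((a >> i) & 1) & ((b >> j) & 1)
                total += bit << (i + j)
    return total


def truncated_multiply(a, b, n, k, f):
    """y = floor((ab + 2^k f(a,b) - delta_k) / 2^n)."""
    # Python's >> on integers is a floor division by a power of two.
    return (a * b + (f(a, b) << k) - delta_k(a, b, n, k)) >> n


def error(a, b, n, k, f):
    """epsilon = ab - 2^n y."""
    return a * b - (truncated_multiply(a, b, n, k, f) << n)


def is_faithful(n, k, f):
    """Exhaustively check |epsilon| < 2^n over all n bit a and b."""
    limit = 1 << n
    return all(abs(error(a, b, n, k, f)) < limit
               for a in range(limit) for b in range(limit))
```

For instance, `is_faithful(4, 0, lambda a, b: 8)` checks the round to nearest case k=0, f=2^(n−1) for n=4. The exhaustive loop visits 2^(2n) input pairs, so it is only practical for small n.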
Note that if the correct answer is exactly representable then this exact answer must be returned by a faithfully rounded scheme, otherwise $|\varepsilon| \ge 2^{n}$. Early truncation schemes considered f being constant [1], [2], and are referred to as Constant Correction Truncated (CCT) schemes. Following these, the proposal to make f a function of a and b appeared, termed Variable Correction Truncation (VCT), where the most significant column that is truncated is used as the compensating value for f [3]. A hybrid between CCT and column promoting VCT has been proposed which only uses some of the partial product bits of the promoted column, termed Hybrid Correction Truncation [4]. Arbitrary functions of the most significant truncated column have been considered along with their linearisation; one of these linearisations requires promoting all but the four most extreme partial product bits and adding a constant, and is called LMS truncation because it targets the least mean square error [5], [6]. Forming approximations to the carries produced by $\Delta_k$ has also been put forward, termed carry prediction [7]. Apart from the AND array, a negative two's complement array has also been considered [8]. The particular case of constant multipliers has also been considered [9]. Further, modified Booth multipliers have been studied while applying CCT and VCT [10]. Faithfully rounded integer multipliers have also been constructed by truncating, deleting and rounding the multiplication during the array construction, reduction and final integer addition [11]. In terms of applications, DSP has been the main focus area, but truncated multipliers also appear in creating floating point multipliers, where a 1 ulp accuracy is permitted [12]. The evaluation of transcendental functions has also been considered, utilising truncated multipliers as well as truncated squarers [13].
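The two simplest choices of compensation function described above can be sketched as follows. This is an illustrative reading of the descriptions in the text rather than the exact formulation of the cited schemes. The names `cct_f` and `vct_f` are hypothetical, and an unsigned AND array with bit a_i AND b_j in column i+j is assumed.

```python
def cct_f(constant):
    """Constant correction: f ignores its inputs and returns a fixed value."""
    return lambda a, b: constant


def vct_f(k):
    """Variable correction: f sums the partial product bits in column k - 1,
    the most significant of the k truncated columns."""
    def f(a, b):
        col = k - 1
        # Column col holds the bits a_i AND b_(col - i) for i = 0..col.
        return sum(((a >> i) & 1) & ((b >> (col - i)) & 1)
                   for i in range(col + 1))
    return f
```

Either function returns a value that can be used as the f(a,b) term of the general truncation scheme. With k=0 no column is truncated and `vct_f(0)` always returns 0.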
When surveying how the statistics of the error have been analysed, in general either exhaustive simulation was performed or error bounds were given without proof. In the original CCT paper [1], the maximum error was stated without a full proof. In advanced compensation schemes such as [7], it is commented that it is difficult to know what kind of error is being generated and, while exhaustive searches were conducted for n<9, for sizes above this the only option was to resort to random test vectors. In [14], finding the best compensation function requires searching a space exponential in n and is only feasible for n<13. Further, the schemes either find compensating functions heuristically or attempt to minimise the average absolute error or mean square error.
The issue with truncating bits in sum of products operations is that it is complex to determine the effect of truncation, and usually error statistics need to be gathered, which is time consuming and can lead to many iterations being required during RTL synthesis to produce just one sum of addends unit.