1. Field of the Invention
This invention relates to a floating-point serial-pipelined multiplier. More particularly, the present invention allows both addition and multiplication to be performed in one array of cells by delaying the Y operand, whereas the exponent propagates through the delay unchanged in the addition mode. This allows re-ordering of the mantissa digits for the multiply mode.
2. Description of the Related Art
Architectures well-matched to the computationally intensive task of real-time SONAR signal processing have been proposed by WHITEHOUSE and SPEISER (1981) based on the systolic techniques of KUNG (1979). These architectures are characterised by arrays of identical, limited capability systolic processing elements (SPE's) with minimum complexity nearest neighbour interconnections which allow a large number of parallel arithmetic operations to occur. However, direct handling of the large-order matrices typical of SONAR applications is difficult. Physically realisable systolic arrays are currently of limited size and partitioning of the matrix computation is mandatory. This partitioning diminishes the benefits of the systolic approach.
A design methodology which minimises the partitioning problem is the bit-serial approach proposed by LYON (1981). It is based on minimum area serial data paths and fixed-point serial processing elements and allows many more processing elements to be included in a systolic array. For SONAR, however, the use of fixed-point arithmetic imposes unacceptable constraints. In particular, the large dynamic ranges typical of the application can only be met indirectly by increasing the precision of the representation (i.e. the number of bits). This solution slows the achievable processing rate and incurs area overheads of O(n) where n is the number of bits in the representation. Further, number overflows must be prevented by periodic rescaling which increases quantisation noise in intermediate values. The net effect is a degradation of processor throughput.
To optimise system performance a floating-point format is required. This permits both precision and dynamic range to be independently optimised, and it has been shown that it can be implemented with a methodology related to the work of LYON. Bit serial data paths and processing elements are used and an associated mode bit differentiates between mantissa and exponent.
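The trade-off between the two formats can be illustrated numerically. The following is a minimal sketch under an idealised model that is not taken from the cited work: it assumes a two's complement fixed-point word of n bits, and a floating-point word whose m-bit mantissa is scaled by a power of two drawn from the 2^e codes of an e-bit exponent, with no codes reserved. The function names are hypothetical.

```python
import math

# Each doubling of the representable magnitude ratio adds about 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def fixed_point_dynamic_range_db(n_bits: int) -> float:
    """Ratio of largest to smallest non-zero magnitude of an n-bit word, in dB.

    Assumes two's complement: magnitudes span 1 LSB up to 2**(n_bits - 1) LSBs.
    """
    return (n_bits - 1) * DB_PER_BIT


def floating_point_dynamic_range_db(m_bits: int, e_bits: int) -> float:
    """Dynamic range of an m-bit mantissa scaled by an e-bit exponent, in dB.

    Assumes the exponent can shift the mantissa by up to 2**e_bits - 1
    binary places and that no exponent codes are reserved.
    """
    exponent_span = 2 ** e_bits - 1
    return (m_bits - 1 + exponent_span) * DB_PER_BIT


if __name__ == "__main__":
    print(round(fixed_point_dynamic_range_db(16), 1))
    print(round(floating_point_dynamic_range_db(16, 4), 1))
```

Under these assumptions the fixed-point range grows only linearly with word length, whereas each additional exponent bit roughly doubles the exponent's contribution, which is why precision and dynamic range can be chosen independently.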
Each multiply/accumulate SPE as defined by WHITEHOUSE and SPEISER is constructed from several registers and a multiplier and accumulator. The major element is the multiplier. Current work on floating-point multipliers for systolic processor applications such as NASH (1984) uses parallel algorithms to implement the multiplication. The problem of implementing minimum complexity units is addressed as an application of this invention.
A study of the work on serial fixed-point multipliers, originally proposed by JACKSON et al. (1968), reveals many references to their inherent capability to perform concurrently the operation X × Y + Z. The present invention demonstrates a use for the addition operator which is applicable to the direct handling of time division multiplexed exponents in a serial floating-point data format. A canonic multiplication cell is described which when used in a linear array implements a serial-pipelined floating-point multiplier. For an m-bit mantissa and e-bit exponent, the area of the multiplier is O(m), independent of the exponent length. Further, the exponent length is arbitrary and allows the dynamic range to be varied during task execution.
This low complexity VLSI implementation, optimised for both precision and dynamic range, has significant implications for real-time signal processors with computation rates in excess of 1000 million floating-point operations per second.
Serial-pipelined multipliers of the general type to which the present invention is directed have attracted continuing attention for digital signal processing applications and reference may be had to a report of JACKSON, L. B., KAISER, J. F. and McDONALD, H. S. "An approach to the Implementation of Digital Filters", published in IEEE Trans. on Audio and Electroacoust., Vol. AU-16, pp. 413-21, September 1968. Further examples are LYON, R. F., "Two's Complement Pipeline Multipliers", IEEE Trans. on Comm. COM-24, pp. 418-425, 1976, also BALDWIN, G. L., MORRIS, B. L., FRASER, D. B. and TRETOLA, A. R., "A Modular, High-Speed Serial Pipeline Multiplier for Digital Signal Processing", IEEE J. Solid-State Circuits, SC-10, pp. 307-13, October 1975, and PEKMESTZI, K. Z. and PAPADOPOULIS, G. D., "Cellular two's Complement Serial-Pipeline Multipliers", The Radio and Electronics Engineer, Vol. 49, No. 11, pp. 575-580, November 1979.
Number representations have been restricted in these examples to sign-magnitude and two's complement fixed point, and none are directed to a serial floating-point multiplier of the type to which the present invention is directed. However, the implementations have varied from the direct addition of partial products to a five level re-coded Booth's algorithm. Examination of these implementations indicates that a minimum complexity design which offered most in terms of modularity was that proposed by PEKMESTZI and PAPADOPOULIS above. A brief description of a modification of this multiplier follows.
Operands are represented in two's complement notation: ##EQU1##
As the operands are less than one, the product is also less than one and can be written

$$A = (X \cdot Y + 2) \bmod 2 \qquad (2.3)$$
Substituting (2.1) and (2.2) and after some algebra,

$$A = \Big[ \sum x_i 2^{-i} \sum y_j 2^{-j} + y_0 \Big( \sum \bar{x}_i 2^{-i} + 2^{-n+1} \Big) + \Big( \sum \overline{(y_j x_0)}\, 2^{-j} + 2^{-m+1} \Big) + (\bar{x}_0 y_0) \Big] \bmod 2 \qquad (2.4)$$

where an overbar represents logical negation.
Use of modulo 2 in equation (2.4) indicates that carries are not considered in the analysis.
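The modulo-2 reduction can be checked numerically. The sketch below is illustrative only and assumes the usual fractional two's complement convention, in which a word with bits b_0, b_1, ..., b_{n-1} has the value -b_0 + b_1 2^{-1} + ... + b_{n-1} 2^{-(n-1)}; the helper names are hypothetical. It shows that reducing (X.Y + 2) modulo 2 yields the two's complement bit pattern of the product directly.

```python
from fractions import Fraction


def from_bits(bits):
    """Value of a fractional two's complement word [b0, b1, ..., b_{n-1}].

    Assumed convention: value = -b0 + sum(b_i * 2**-i for i >= 1).
    """
    value = Fraction(-bits[0])
    for i, b in enumerate(bits[1:], start=1):
        value += Fraction(b, 2 ** i)
    return value


def to_bits(value, n):
    """Encode a value in [-1, 1) as n bits by reducing (value + 2) mod 2."""
    wrapped = (value + 2) % 2           # lies in [0, 2); negatives map to [1, 2)
    scaled = wrapped * 2 ** (n - 1)     # LSB has weight 2**-(n-1)
    if scaled.denominator != 1:
        raise ValueError("value needs more than n bits")
    word = int(scaled)
    return [(word >> (n - 1 - i)) & 1 for i in range(n)]


if __name__ == "__main__":
    x = from_bits([1, 1, 0])            # -1/2
    y = from_bits([0, 1, 1])            # +3/4
    product_bits = to_bits(x * y, 5)    # exact product needs 5 bits here
    print(x * y, product_bits, from_bits(product_bits))
```

Exact rational arithmetic is used so that the bit pattern can be compared without rounding effects. Note that the single case X = Y = -1 gives a product of +1, which lies outside the range and wraps to -1 under this reduction.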
The algorithm of (2.4) may be implemented with the basic cell of FIG. 4, for example. However, this figure differs somewhat from that presented by PEKMESTZI and PAPADOPOULIS. An additional delay element has been used for the partial product and carry terms to provide a simpler control structure. A further difference is the allowance for both operands to be serial.
A complete n × n serial-pipelined multiplier is constructed of n basic cells. Three cell types are required, corresponding to the last, second to last and other stages. Cell types are differentiated from each other by two links as shown in FIG. 4. Link positions are given in Table 1.
TABLE 1
Link positions for a K-stage multiplier array.

STAGE     1 . . . K-2     K-1     K
LINK A    1-2             1-3     1-3
LINK B    1-4             1-2     1-3
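For illustration, the stage-dependent link selection of Table 1 can be written as a small lookup. This is a sketch only: the function name and the returned dictionary layout are hypothetical, and an array of at least three stages is assumed so that all three cell types occur.

```python
def link_positions(stage: int, k: int) -> dict:
    """Return the Link A and Link B positions for a stage of a k-stage array.

    Stages are numbered from 1 to k, as in Table 1.
    """
    if not 1 <= stage <= k:
        raise ValueError("stage must lie between 1 and k")
    if stage == k:                      # last stage
        return {"A": "1-3", "B": "1-3"}
    if stage == k - 1:                  # second to last stage
        return {"A": "1-3", "B": "1-2"}
    return {"A": "1-2", "B": "1-4"}     # stages 1 .. k-2


if __name__ == "__main__":
    for s in range(1, 6):
        print(s, link_positions(s, 5))
```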
Operands are entered with the least significant bit first. A control signal is entered in parallel with the data such that it is low only during the entry of the most significant bit (MSB). This signal performs the following functions:
(a) Latches $Y_i$ in cell $(n-i)$ in order to invert the order of the bits in the Y operand.
(b) Truncates the partial product sum from the preceding cell and substitutes the carry.
(c) Realises via an EXCLUSIVE-OR gate the third term of (2.4).
(d) Initialises all partial product terms to zero except in the final two stages. The second to last stage has a weight of $2^{-n}$ added, and so effects rounding instead of truncation. In the last stage the term $(y_0 2^{-n+1})$ is added.
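The bit ordering, the control signal and the rounding step of (d) can be sketched behaviourally as follows. This is not a model of the cell of FIG. 4: it assumes n-bit fractional two's complement operands held as signed integers scaled by 2^{n-1}, and all function names are hypothetical. Adding a weight of 2^{-n}, which is half of the least significant bit retained, before discarding the low-order bits turns truncation into rounding.

```python
def serialise_lsb_first(word: int, n: int) -> list:
    """Bits of an n-bit word in the order they enter the array (LSB first)."""
    return [(word >> i) & 1 for i in range(n)]


def control_signal(n: int) -> list:
    """Control stream for one n-bit word: low (0) only while the MSB enters."""
    return [1] * (n - 1) + [0]


def truncate_product(x: int, y: int, n: int) -> int:
    """n-bit product of two n-bit fractions, low-order bits simply discarded.

    x and y are signed integers scaled by 2**(n-1); so is the result.
    """
    return (x * y) >> (n - 1)


def round_product(x: int, y: int, n: int) -> int:
    """As truncate_product, but with a weight of 2**-n added beforehand.

    In the full product (scaled by 2**(2*(n-1))) that weight is 2**(n-2).
    """
    return (x * y + (1 << (n - 2))) >> (n - 1)


if __name__ == "__main__":
    n = 4
    x, y = 5, 3                         # 5/8 and 3/8; exact product is 15/64
    print(serialise_lsb_first(x, n), control_signal(n))
    print(truncate_product(x, y, n), round_product(x, y, n))
```

With these operands the exact product 15/64 lies between 1/8 and 2/8; truncation returns 1/8 while the rounded result is 2/8, the nearer of the two.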
The term $2^{-m+1}$ of (2.4) is added to the first stage by setting the carry input to that stage high and holding the partial product input low. Terms $y_0 \sum \bar{x}_i 2^{-i}$ and $(\bar{x}_0 y_0)$ are realised by inverting the $x_i$ in the second to last stage.
However, it is desirable to implement a serial floating-point multiplier modified for purposes of the present invention as discussed above with respect to FIG. 4.