1. Field of the Invention
This invention relates to the efficient implementation of multiply/accumulate/delay algorithms of the form ##EQU1## which are commonplace computations in the implementation of various digital signal processing systems, such as digital filters and correlators, and more particularly relates to efficient implementations of FIR digital filters using a canonic signed-digit approach.
2. Description of the Prior Art
Finite impulse response (FIR) filters have become common over the last fifteen years for performing digital filtering. Such digital filters, which may be implemented in dedicated hardware, or by digital signal processors or by microprocessors, implement a filtering algorithm comprised of multiplying the input or delayed elements of the input with coefficients and summing the products to obtain an output. By altering the coefficients, many different filtering characteristics can be obtained. For some applications, it is necessary to allow the user of the filter to alter these coefficients. Thus, such filters are best implemented by using some form of programmable structure.
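By way of illustration only, the filtering algorithm described above can be sketched in Python as follows. The sketch assumes the usual convolution sum y(n) = C_0 x(n) + C_1 x(n-1) + ... with input samples before n = 0 taken as zero. The function name `fir_filter` and its arguments are hypothetical and are not taken from any of the circuits discussed here.

```python
def fir_filter(x, coeffs):
    """Direct evaluation of y[n] = sum over k of coeffs[k] * x[n - k].

    Samples before the start of x are assumed to be zero.
    """
    y = []
    for n in range(len(x)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * x[n - k]  # multiply a delayed input by its coefficient
        y.append(acc)                # sum of the products is the output sample
    return y


# A 3-tap filter with all coefficients equal to 1 (a running sum)
print(fir_filter([1, 2, 3, 4], [1, 1, 1]))  # [1, 3, 6, 9]
```

Changing only the list of coefficients changes the filtering characteristic, which is the property that motivates a programmable structure.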
Specialized circuits such as the DSP 56200 available from Motorola of Schaumburg, Ill., have been specifically designed for implementing FIR digital filtering algorithms at high speed with programmable hardware. Such filters commonly have one or more delay elements, one or more coefficient multipliers that multiply either the input or the output of the delay elements with coefficients, and one or more adders that sum the output of the coefficient multipliers. Each such structure is relatively simple. However, if a filter providing thirty or more taps or delay elements is implemented, providing thirty or more repetitions of each such delay and computational element, a large amount of area on a semiconductor die is required. Various high speed programmable digital filters are described in C. Golla, F. Nava, F. Cavallotti, A. Cremonesi, P. Piacentini, G. Casagrande, and G. Campardo, "A 30M samples/s programmable filter processor," Proc. IEEE Int. Solid-State Circuits Conf., pp. 116-117, 1990; M. Hatamian and S. K. Rao, "A 100 MHz 40-tap programmable FIR filter chip," Proc. Int. Symp. Circuits and Systems, vol. 4, pp. 3053-6, May 1-3, 1990; and J. B. Evans, Y. C. Lim, and B. Liu, "A high speed programmable digital FIR filter," Proc. ICASSP-90, vol. 2, pp. 969-71, Apr. 3-6, 1990. These articles describe complex circuits having maximum data sample rates up to 100 Megahertz.
Frequently, to save space on a die, various components such as the adder and the multipliers may be shared in a time multiplexing manner. However, such multiplexing slows down the processing speed of the filter, resulting in a lower maximum data sample rate for the filter. The Motorola DSP 56200 is an example of such a chip as it uses a single multiplier. Due to the sharing of the multiplier, the Motorola DSP 56200 typically cannot process signals having high data sample rates. Typically, such circuits are used for much lower data sample rates such as below 1 MHz.
It has been known for a long time that finite impulse response filter coefficients can be implemented through the use of canonic signed-digits (CSD). Canonic signed-digits are described in, among other sources, H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients," IEEE Trans. Circuits Syst., vol. CAS-36, pp. 1044-1047, July 1989.
The advantages of CSD may be seen as follows. It is well known that a signed-digit representation of a fractional number x can be written as:
$$x = \sum_{k=0}^{M} S_k 2^{-k}$$
or
$$x = S_0 2^{0} + S_1 2^{-1} + \cdots + S_M 2^{-M}$$
where $S_k$ is an element of the set {-1,0,+1}. In the above equation, x has a word length of M+1 digits. The number of non-zero terms (i.e., where $S_k$ is not equal to 0) in x is the number of non-zero digits.
In general, there are several different signed-digit representations for a given number x. A minimal representation is one that requires the least number of non-zero digits. There may be more than one minimal representation for any given number x.
A canonic signed-digit (CSD) representation for a number is defined as the minimal representation for which no two non-zero digits $S_k$ are adjacent. The advantage of a minimal signed-digit representation such as the CSD representation is that there are fewer non-zero terms in the equation. By having fewer non-zero terms, less hardware is needed in a physical implementation to represent the number.
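As a small illustrative check of these definitions, the following Python sketch enumerates every four-digit signed-digit representation of x = 3/8 (an arbitrarily chosen value), then picks out the minimal representations and the CSD representation. The names `reps`, `minimal`, and `csd` are hypothetical, and digits are listed from the weight 2^0 down to the weight 2^-3.

```python
from itertools import product

M = 3              # word length is M + 1 = 4 digits
target_scaled = 3  # x = 3/8, stored as x * 2**M to keep the arithmetic exact

# All digit strings (S_0, ..., S_M) over {-1, 0, +1} whose value equals x
reps = [d for d in product((-1, 0, 1), repeat=M + 1)
        if sum(s * 2 ** (M - k) for k, s in enumerate(d)) == target_scaled]


def nonzeros(digits):
    """Number of non-zero digits in a representation."""
    return sum(1 for s in digits if s != 0)


fewest = min(nonzeros(d) for d in reps)
minimal = [d for d in reps if nonzeros(d) == fewest]

# CSD: the minimal representation with no two adjacent non-zero digits
csd = [d for d in minimal
       if all(d[k] == 0 or d[k + 1] == 0 for k in range(M))]

print(len(reps))  # 5
print(minimal)    # [(0, 0, 1, 1), (0, 1, 0, -1)]
print(csd)        # [(0, 1, 0, -1)]
```

In this example there are two minimal representations, 1/4 + 1/8 and 1/2 - 1/8, and only the second has no adjacent non-zero digits.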
It is easy to convert a binary number to CSD representation. For example, the number x=0.011111 (in binary) can also be represented by a signed-digit number as:
$$x = 0.10000\bar{1}$$
where $\bar{1}$ represents a minus 1. Note that this CSD representation of x has fewer non-zero digits than the original representation of x; i.e., two non-zero values instead of five non-zero values in the original representation. To see that this is the same number as x, we can separate out the negative digit and subtract it from the positive digits:
$$0.100000 - 0.000001 = 0.011111$$
to get the original representation of x. One advantage of a CSD representation is that it simplifies multiplication.
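One possible way to perform such a conversion is sketched below in Python. It uses a commonly known digit-recoding procedure (the non-adjacent form) and is offered as an assumption, not as the method of any cited reference. The fractional number is assumed to be supplied as an integer scaled by 2^M, and the names `to_csd` and `csd_fraction_digits` are hypothetical.

```python
def to_csd(n):
    """Recode integer n into digits from {-1, 0, +1}, least significant first,
    such that no two non-zero digits are adjacent."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)  # n mod 4 == 1 gives +1, n mod 4 == 3 gives -1
            n -= d           # n is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_fraction_digits(x_scaled, frac_bits):
    """Digits S_0 ... S_M (weights 2^0 ... 2^-M) of x = x_scaled / 2**frac_bits."""
    lsb_first = to_csd(x_scaled)
    lsb_first += [0] * (frac_bits + 1 - len(lsb_first))  # pad up to M + 1 digits
    return lsb_first[::-1]


# x = 0.011111 in binary, i.e. 31/64
print(csd_fraction_digits(0b011111, 6))  # [0, 1, 0, 0, 0, 0, -1]
```

The printed digits correspond to 2^-1 - 2^-6, matching the two non-zero digits of the example above.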
Whatever the means used to represent multiplier coefficient digits (e.g., decimal, BCD, binary, signed-digit, etc.), the multiplication operation is easily defined as simply a sequence of addition operations where the various partial products are added, with each partial product being computed by multiplying a multiplier coefficient digit by the multiplicand data and then performing an appropriate shift. In a general programmable multiplier, there must be hardware available to accomplish a multiplication and shift operation to generate a partial product for each multiplier digit.
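The shift-and-add view of multiplication described above can be sketched in Python as follows. The sketch assumes integer data and a coefficient supplied as a list of signed digits, least significant first. The function name `multiply_signed_digit` is hypothetical, and the count of partial products is returned only to make the cost of each representation visible.

```python
def multiply_signed_digit(data, digits_lsb_first):
    """Multiply integer data by a coefficient given as digits in {-1, 0, +1}.

    Returns the product and the number of non-zero partial products added.
    """
    acc = 0
    partial_products = 0
    for shift, d in enumerate(digits_lsb_first):
        if d == 0:
            continue                # a zero digit contributes nothing
        acc += d * (data << shift)  # digit times data, then the appropriate shift
        partial_products += 1
    return acc, partial_products


binary_31 = [1, 1, 1, 1, 1]      # 31 = 16 + 8 + 4 + 2 + 1
csd_31 = [-1, 0, 0, 0, 0, 1]     # 31 = 32 - 1

print(multiply_signed_digit(7, binary_31))  # (217, 5)
print(multiply_signed_digit(7, csd_31))     # (217, 2)
```

Both calls produce the same product, but the CSD coefficient needs two partial products where the plain binary coefficient needs five.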
A multiplier digit whose value is zero generates a zero partial product. However, if programmable general-purpose multipliers are employed, the hardware to generate every partial product must still be present unless the locations of the zero multiplier digits are known in advance, even though the zero partial products never need to be generated. An advantage of a minimal signed-digit multiplier coefficient representation, in particular CSD, is that it guarantees a certain minimum number of zero digits.
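The guarantee can be checked exhaustively for a small word length. The Python sketch below assumes unsigned 8-bit coefficient values and the same non-adjacent recoding procedure as an illustrative stand-in for CSD conversion; the name `csd_nonzero_count` is hypothetical. Because no two non-zero CSD digits are adjacent, a recoded 8-bit value, which occupies at most nine digits, can have at most five non-zero digits.

```python
def csd_nonzero_count(n):
    """Count the non-zero digits in the non-adjacent (CSD) recoding of integer n."""
    count = 0
    while n != 0:
        if n & 1:
            n -= 2 - (n & 3)  # strip a +1 or -1 digit; n is then a multiple of 4
            count += 1
        n >>= 1
    return count


BITS = 8
worst_binary = max(bin(n).count("1") for n in range(2 ** BITS))
worst_csd = max(csd_nonzero_count(n) for n in range(2 ** BITS))

print(worst_binary, worst_csd)  # 8 5
```

In plain binary the worst-case coefficient (all ones) has no zero digits at all, whereas the recoded form always leaves at least four of its nine digit positions at zero.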
Algorithms for computing CSD coefficients in FIR filters that meet arbitrary specifications have been developed, as in H. Samueli, "An improved search algorithm for the design of multiplierless FIR filters with powers-of-two coefficients," IEEE Trans. Circuits Syst., vol. CAS-36, pp. 1044-1047, July 1989, for example.
As mentioned previously, a drawback of using such CSD coefficients in programmable filters is that they may still lead to severe hardware inefficiencies. In a digital FIR filter, which is most commonly structured in either the direct configuration shown in FIG. 1(a) or the transposed configuration shown in FIG. 1(b), the input data samples x(n) are delayed by a string of unit delays $z^{-1}$ and processed by an array of multipliers $C_k$, $k = 0, \ldots, N-1$, followed by adders $1_0$ through $1_{N-1}$. These multipliers are often called "taps" and the multiplier coefficients $C_k$ are often called tap coefficients. Examples of individual stages or filter taps are shown in FIGS. 2(a) and 2(b) for filters of FIGS. 1(a) and 1(b), respectively. Each multiplier and adder is typically a full hardware implementation of the arithmetical function for performing computations on a full input data word. The multiplier coefficient hardware typically has the same number of bits of precision for each tap as will be required for the most precise coefficient that will be used when the filter is programmed.
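The two configurations compute the same output and differ only in where the delays sit. The Python sketch below is a behavioral model under the usual textbook definitions of the direct and transposed forms, with all registers initially zero. It is not a description of the figures themselves, and the names `fir_direct` and `fir_transposed` are hypothetical.

```python
def fir_direct(x, coeffs):
    """Direct form: delay the input, multiply each delayed sample, then sum."""
    delay = [0] * len(coeffs)  # delay[k] holds x(n - k)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(c * d for c, d in zip(coeffs, delay)))
    return y


def fir_transposed(x, coeffs):
    """Transposed form: multiply the current input by every coefficient,
    then delay and accumulate the partial sums."""
    n_taps = len(coeffs)
    state = [0] * n_taps       # state[k] is the registered partial sum feeding tap k
    y = []
    for sample in x:
        products = [c * sample for c in coeffs]
        y.append(products[0] + state[0])
        # each register takes the next tap's product plus the sum behind it
        state = [products[k + 1] + state[k + 1] for k in range(n_taps - 1)] + [0]
    return y


x = [1, 2, 3, 4]
print(fir_direct(x, [1, 1, 1]))      # [1, 3, 6, 9]
print(fir_transposed(x, [1, 1, 1]))  # [1, 3, 6, 9]
```

In both models every tap owns one multiplier and one adder, which is why the per-tap hardware cost is set by the most demanding coefficient the filter must accept.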
While we have referred to FIGS. 1(a) and 1(b) as an FIR filter, it should be noted (and is well known) that the same structures can implement the well-known correlation operation. Moreover, by appropriately interconnecting such structures, it is possible to implement a wide variety of DSP systems. For example, an interconnection of two FIR filter blocks is shown in FIG. 1(c) which, aside from a single pipeline delay of one input, implements the second-order Infinite Impulse Response (IIR) filter shown in FIG. 1(d).
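As an illustration of the correlation use, the Python sketch below loads an FIR structure with a reference pattern in reversed order, so that each output sample is the sum of products of the pattern with the most recent input samples. The pattern, the signal, and the name `fir_filter` are hypothetical, and the arrangement is a common textbook one, not one taken from the figures.

```python
def fir_filter(x, coeffs):
    """y[n] = sum over k of coeffs[k] * x[n - k], with x taken as zero before n = 0."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]


pattern = [1, 2, 3]
signal = [0, 0, 1, 2, 3, 0, 0]

taps = pattern[::-1]               # reversed pattern used as tap coefficients
scores = fir_filter(signal, taps)  # sliding correlation of signal with pattern

print(scores)                      # [0, 0, 3, 8, 14, 8, 3]
print(scores.index(max(scores)))   # 4
```

The largest output occurs at the sample where the pattern has just finished arriving, which is the behavior expected of a correlator.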
In a straightforward programmable implementation of an FIR filter, whether or not CSD coefficients are used, many filter-tap multipliers would significantly waste valuable computational resources. All multiplier taps of a programmable structure would need to accommodate "difficult" coefficient values, that is, coefficients requiring relatively many non-zero digits. Yet in actual implementations, for a typical specific filter algorithm, most taps would not require such extreme capabilities. For example, the coefficient values that require more non-zero digits are often only those near the center of the FIG. 1(a) or FIG. 1(b) tap array of a typical lowpass FIR filter. Therefore, CSD has apparently not been used in a programmable digital filter because the lack of knowledge about where difficult (many non-zero digits) coefficients might appear results in a large percentage of wasted hardware for virtually any filter algorithm.
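The mismatch between what each tap needs and what every tap must provide can be examined with a short Python sketch. Everything in it is an assumption chosen for illustration: a 31-tap Hamming-windowed sinc lowpass design, a cutoff of 0.1 times the sample rate, 12 fractional bits, and the non-adjacent recoding used as a stand-in for CSD conversion. The names `lowpass_taps` and `csd_nonzero_count` are hypothetical.

```python
import math


def csd_nonzero_count(n):
    """Count the non-zero digits in the non-adjacent (CSD) recoding of integer n."""
    count = 0
    while n != 0:
        if n & 1:
            n -= 2 - (n & 3)  # strip a +1 or -1 digit
            count += 1
        n >>= 1
    return count


def lowpass_taps(n_taps, cutoff):
    """Hamming-windowed sinc coefficients; cutoff is a fraction of the sample rate."""
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        t = k - mid
        ideal = (2 * cutoff if t == 0
                 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t))
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    return taps


FRAC_BITS = 12
quantized = [round(c * 2 ** FRAC_BITS) for c in lowpass_taps(31, 0.1)]
counts = [csd_nonzero_count(q) for q in quantized]

# A programmable structure sized for the worst tap provisions max(counts) per tap
provisioned = len(counts) * max(counts)
print(counts)
print(sum(counts), "non-zero digits used;", provisioned, "provisioned")
```

Printing `counts` shows how many non-zero digits each tap of this particular design needs, and the final line compares that total against a structure in which every tap is sized for the most demanding coefficient.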