The implementation of a Sum of Products (SOP) expression can be used to realize signal-processing applications on hardware. An SOP implementation is hereby discussed with reference to a hardware realization of a digital filter. A digital filter is an electronic circuit that processes discrete signal samples to perform a desired transfer function operation on them. The digital filter is either a Finite Impulse Response (FIR) filter or an Infinite Impulse Response (IIR) filter.
The following components are used in the digital filter implementation.
Unit sample delay (11): A unit sample delay (Z−1) is shown in FIG. 1(a). A unit sample delay is a Parallel In Parallel Out (PIPO) register as it captures the sample at its input on the next positive clock edge.
Multiplier (12): Takes in an input ‘x’ and gives a multiplied output ‘y’=‘a*x’, where ‘a’ is the multiplicand. Its symbol is a triangle with a multiplicand by its side as shown in FIG. 1(b).
‘P1’ bit adders (13): Takes in two inputs each of width ‘p1’ bits and gives an added output of ‘p1’ bits. Hence this adder is called the ‘p1’ bit parallel adder and its symbol is ‘+’ in a circle as shown in FIG. 1(b).
‘P1’ bit subtractor (15): Takes in two inputs each of width ‘p1’ bits, subtracts one input from the other and gives an output of ‘p1’ bits. Hence this subtractor is called the ‘p1’ bit parallel subtractor and its symbol is a circle with ‘+’ in it and ‘−’ on one of the inputs as shown in FIG. 1(b).
Parallel shifter (14): Takes in an input ‘x’ and gives a ‘z’ bit left shifted output ‘y’. For example, if ‘x=101’ and ‘z=2’ then ‘y=10100’. Its symbol is a line with ‘<<z’ on it (where ‘z’ can be any positive number) as shown in FIG. 1(b).
The working of the unit sample delay is evident from the timing diagram shown in FIG. 1(c). The samples X0, X1, X2 . . . X7 at the input of the unit sample delay in FIG. 1(c) appear at its output delayed by one Clock (Clk) period, as they are captured by the next positive edge of the Clk. The transfer function H(Z) of an FIR filter has the generic form given in equation (2a):
$$H(Z) = \sum_{k=0}^{N} b_k z^{-k}, \qquad (2a)$$
where $b_k$ represents the coefficients, and $z^{-k}$ represents a delay of k clock cycles.
The equation of a FIR filter is given below:
$$y(n) = \sum_{k=0}^{N} b_k \, x(n-k), \qquad (2b)$$
where y(n) is the output and x(n−k) is the delayed input.
The FIR filter has different types of implementations in hardware. Two of the important implementations of the FIR filter are the direct form of coefficient bank implementation as shown in FIG. 2(a) and the transposed form of implementation as shown in FIG. 2(b). The direct form of the FIR filter has an input X connected to the first unit sample delay. Unit sample delays (11) are connected serially to each other. Taps S0 to SN obtained from unit sample delays (11) are connected to respective coefficient multipliers (12) b0 to bN. The output of the multipliers (12) is added through a plurality of adders (13,15) to form the output Y of the filter.
The transposed form of the FIR filter (FIG. 2(b)) has the input X connected to the plurality of coefficient multipliers b0 to bN (12) resulting in taps S0 to SN. The taps S0 to SN are connected to the plurality of unit sample delays (11) and adders (13,15) to form the output Y of the filter.
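By way of illustration only, the two structures can be modelled in software to confirm that both compute equation (2b). The sketch below is not part of the described hardware; the function names `fir_direct` and `fir_transposed`, and the use of Python lists to stand in for the unit sample delays (11), are assumptions made for this example.

```python
def fir_direct(b, x):
    """Direct form: the input runs through a delay line; each tap is
    multiplied by its coefficient and the products are summed."""
    delay = [0] * len(b)                 # delay[k] holds x(n-k)
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]    # shift the delay line by one sample
        y.append(sum(bk * sk for bk, sk in zip(b, delay)))
    return y


def fir_transposed(b, x):
    """Transposed form: the input is multiplied by every coefficient at
    once; the products pass through a chain of delays and adders."""
    n = len(b)
    state = [0] * (n - 1)                # registers between the adders
    y = []
    for sample in x:
        taps = [bk * sample for bk in b]
        y.append(taps[0] + (state[0] if state else 0))
        # Each register captures its tap plus the previous value of the
        # register one position further down the chain.
        state = [
            taps[i + 1] + (state[i + 1] if i + 1 < n - 1 else 0)
            for i in range(n - 1)
        ]
    return y


if __name__ == "__main__":
    coeffs = [1, 2, 3]
    samples = [1, 0, 0, 0, 2]
    print(fir_direct(coeffs, samples))
    print(fir_transposed(coeffs, samples))
```

Both functions return the same sequence for the same input, which is the sense in which the two forms are interchangeable; they differ only in where the delays sit relative to the multipliers.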
Given below is the equation of the IIR filter:
$$H(Z) = \left( \sum_{m=0}^{M} b_m z^{-m} \right) \Big/ \left( 1 - \sum_{l=1}^{L} a_l z^{-l} \right); \qquad (2c)$$
where H(Z) is the transfer function, $b_m$ and $a_l$ are the coefficients and $z^{-k}$ represents a delay of k clock cycles.
The equation of the IIR filter can also be written as in equation (2d);
$$y(n) = \sum_{l=1}^{L} a_l \, y(n-l) + \sum_{m=0}^{M} b_m \, x(n-m), \qquad (2d)$$
where y(n) is the output, y(n−l) is the delayed output and x(n−m) is the delayed input.
The implementation of a coefficient is the implementation of a number in hardware. Hence the terms coefficient and number are used interchangeably in the specification.
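For illustration, the difference equation (2d) can be evaluated sample by sample as sketched below. This is a software model only, under the assumption that the feedback sum runs over the index l from 1 to L; the function name `iir_filter` and the convention that the list `a` starts at the first feedback coefficient are choices made for this example.

```python
def iir_filter(b, a, x):
    """Evaluate y(n) = sum_l a_l*y(n-l) + sum_m b_m*x(n-m).

    b[m] is the feed-forward coefficient b_m (m = 0..M).
    a[l-1] is the feedback coefficient a_l (l = 1..L).
    Samples before n = 0 are taken as zero.
    """
    y = []
    for n in range(len(x)):
        feed_forward = sum(
            b[m] * x[n - m] for m in range(len(b)) if n - m >= 0
        )
        feedback = sum(
            a[l - 1] * y[n - l] for l in range(1, len(a) + 1) if n - l >= 0
        )
        y.append(feed_forward + feedback)
    return y


if __name__ == "__main__":
    # One feedback coefficient of 0.5: the impulse response halves each step.
    print(iir_filter([1], [0.5], [1, 0, 0, 0]))
```

With an empty feedback list the function reduces to the FIR case of equation (2b).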
The structure of a direct and transposed form IIR filter is shown in FIG. 2(c). The direct form structure has a plurality of unit sample delays (11), and two sets of direct form coefficient banks (1) formed by a plurality of adders (13,15) and a plurality of multipliers (12) connected to the taps. The transposed form of the IIR filter has a plurality of unit sample delays (11), and two sets of transposed form coefficient banks (1) formed by a plurality of adders (13,15) and a plurality of multipliers (12) connected to the taps as shown in FIG. 2(c). The parallel direct or transposed form of coefficient bank is the area-consuming part of parallel digital filter implementations. An existing way of implementing the parallel direct or transposed form of coefficient bank using the shift (12) and add (13,15) has been described in FIG. 2(b). A left shift has ‘0’ cost in parallel implementations, as it only requires lines set at ‘0’ to be appended at the least significant end of the parallel bus. The remaining part of the shift (12) and add (13,15) implementation comprises adders. The implementation of parallel multipliers using shift and add is shown in FIG. 2(d).
An existing method for implementing the direct form of coefficient bank is described as follows. The existing method has two parts. Part 1 is an algorithm shown in FIG. 2(e), which takes the coefficient taps as input and gives the equation of the direct form of coefficient bank and the horizontal subexpressions as output. Part 2 is the generalized structure shown in FIG. 2(g), which is obtained by mapping the equation of the direct form of coefficient bank and the horizontal subexpressions onto hardware.
The method is explained by using coefficient bank implementation for a given example.
Assume that the example coefficients of an FIR filter are given by Set 1={−79, 1044, −5890, 27916, 49362, −8382, 1628, −154, 63, 126}. The equation form of the direct form of coefficient bank for the example is as given below:
Y=[−79]S0+[1044]S1+[−5890]S2+[27916]S3+[49362]S4+[−8382]S5+[1628]S6+[−154]S7+[63]S8+[126]S9  (2e)
FIG. 2(f1) shows the direct form of implementation of the example FIR filter. Structure (1) is the direct form of coefficient bank, which comprises unit sample delays (11), multipliers and adders (12,13,15). The execution of the existing method (FIG. 2(e)) is explained with reference to the coefficients of Set 1.
Step 1: This step illustrates the conversion of coefficients into Canonical Signed Digit (CSD) representation. Here the value of ‘i’ is −1.
−79=i0i0001; 1044=10000010100; −5890=i01001000000i0; 27916=100i0i0100010i00; 49362=10i0000010i010010; −8382=i0000i01000010; 1628=10i010i00i00; −154=i0i010i0; 63=100000i; 126=100000i0
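The conversion in Step 1 can be illustrated with a short software sketch. The routine below uses the common non-adjacent-form recurrence to produce signed digits; it is offered as one possible way of obtaining the strings listed above, not as the algorithm of FIG. 2(e), and the names `to_csd` and `csd_string` are hypothetical.

```python
def to_csd(n):
    """Return the signed digits of integer n, least significant first.

    Each digit is -1, 0 or 1 and no two adjacent digits are nonzero.
    """
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            # n % 4 == 1 gives digit +1; n % 4 == 3 gives digit -1, which
            # leaves a multiple of 4 so the next digit is forced to 0.
            d = 2 - (n % 4)
        digits.append(d)
        n = (n - d) // 2
    return digits


def csd_string(n):
    """Format the digits most significant first, writing -1 as 'i'."""
    symbols = {1: "1", 0: "0", -1: "i"}
    return "".join(symbols[d] for d in reversed(to_csd(n))) or "0"


if __name__ == "__main__":
    for c in [-79, 1044, -5890, 27916, 49362, -8382, 1628, -154, 63, 126]:
        print(c, csd_string(c))
```

Each nonzero digit corresponds to one shifted copy of the input, so the number of nonzero digits minus one is the number of adders needed for that coefficient on its own.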
The direct form of coefficient bank with CSD representation of coefficients is as given below:
Y=(2^0−2^4−2^6)S0+(2^2+2^4+2^10)S1+(−2^1+2^8+2^11−2^13)S2+(−2^2+2^4+2^8−2^10−2^12+2^15)S3+(2^1+2^4−2^6+2^8−2^14+2^16)S4+(2^1+2^6−2^8−2^13)S5+(−2^2−2^5+2^7−2^9+2^11)S6+(−2^1+2^3−2^5−2^7)S7+(−2^0+2^6)S8+(−2^1+2^7)S9  (2f1)
or
Y=S0−S8+2(−S2+S4+S5−S7−S9+2(S1−S3−S6+2(S7+2(−S0+S1+S3+S4+2(−S6−S7+2(−S0−S4+S5+S8+2(S6−S7+S9+2(S2+S3+S4−S5+2(−S6+2(S1−S3+2(S2+S6+2(−S3+2(−S2−S5+2(−S4+2(S3+2(S4))))))))))))))))  (2f2)
The number of adders required to implement equation (2f2) is 38. Hereafter, adders and subtractors are both referred to as adders for ease of explanation.
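The nested form (2f2) and its adder count can be checked with the following sketch, given for illustration only. It groups the signed digits of Set 1 by bit weight (one "row" per power of two), evaluates the nested expression from the most significant row downwards, and counts one adder for every operand after the first. The helper names are assumptions of this example.

```python
COEFFS = [-79, 1044, -5890, 27916, 49362, -8382, 1628, -154, 63, 126]


def to_csd(n):
    """Signed digits of n (each -1, 0 or 1), least significant first."""
    digits = []
    while n != 0:
        d = 0 if n % 2 == 0 else 2 - (n % 4)
        digits.append(d)
        n = (n - d) // 2
    return digits


def rows_from_csd(coeffs):
    """rows[j] lists (sign, tap index) for each nonzero digit of weight 2^j."""
    digits = [to_csd(c) for c in coeffs]
    rows = [[] for _ in range(max(len(d) for d in digits))]
    for tap, tap_digits in enumerate(digits):
        for weight, sign in enumerate(tap_digits):
            if sign:
                rows[weight].append((sign, tap))
    return rows


def evaluate_nested(rows, taps):
    """Evaluate row0 + 2(row1 + 2(row2 + ...)) from the innermost bracket out."""
    acc = 0
    for row in reversed(rows):
        acc = 2 * acc + sum(sign * taps[tap] for sign, tap in row)
    return acc


def adder_count(rows):
    """Every operand after the first costs one adder or subtractor."""
    return sum(len(row) for row in rows) - 1


if __name__ == "__main__":
    rows = rows_from_csd(COEFFS)
    taps = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]   # arbitrary tap values S0..S9
    print(evaluate_nested(rows, taps), sum(c * s for c, s in zip(COEFFS, taps)))
    print(adder_count(rows))
```

The multiplication by 2 inside `evaluate_nested` stands for the one-bit left shift between rows, which costs no adder in a parallel implementation.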
Step 2: Horizontal subexpressions are formed from the CSD representation of coefficients. Table 1 shows the horizontal optimization for the example. From Table 1, it is seen that the subexpression in row 2^0 is (S0−S8), and in row 2^6 it is −(S0−S8). Since S0−S8 is present in two ‘rows’, it is called a horizontal subexpression.
TABLE 1: Showing Horizontal Optimization
Similarly S1−S3, −S2+S5, S3+S4 are other horizontal subexpressions. The equation of the direct form of coefficient bank of the example after horizontal optimization is:
Y=X1+2(X3+S4−S7−S9+2(X2−S6+2(S7+2(−S0+S1+X4+2(−S6−S7+2(−X1−S4+S5+2(S6−S7+S9+2(−X3+X4+2(−S6+2(X2+2(S2+S6+2(−S3+2(−S2−S5+2(−S4+2(S3+2(S4))))))))))))))))  (2g).
The equation can also be written as equation (2i) for showing hardware mapping:
Y=X1+2(E1+2(E2+2(S7+2(E3+2(−E4+2(E5+2(E6+2(E7+2(−S6+2(X2+2(E8+2(−S3+2(−E9+2(−S4+2(S3+2(S4))))))))))))))))  (2i),
where the expressions are E1=X3+S4−S7−S9, E2=X2−S6, E3=−S0+S1+X4, E4=S6+S7, E5=−X1−S4+S5, E6=S6−S7+S9, E7=−X3+X4, E8=S2+S6, E9=S2+S5, and the horizontal subexpressions are X1=S0−S8, X2=S1−S3, X3=−S2+S5, X4=S3+S4. The number of adders required to implement equation (2i) is 34.
The structure of the direct form of coefficient bank from the existing method of implementation for the above example is shown in FIG. 2(f2). In FIG. 2(f2), substructures SS [X1 to X4] show the hardware mapping of horizontal subexpressions X1 to X4, substructures SS [E1 to E9] are the hardware mapping of the expressions E1 to E9 and substructure SS [Y] is the hardware mapping of the equation Y. The generalized structure of the direct form of coefficient bank from the existing method of implementation is shown in FIG. 2(g), where the substructures SS [X1 to Xm] show the hardware mapping of horizontal subexpressions X1 to Xm, substructures SS [E1 to Ep] are the hardware mapping of expressions E1 to Ep and SS [Y] is the hardware mapping of the equation.
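As an illustrative check, equation (2i) can be evaluated in software with the stated definitions of X1 to X4 and E1 to E9 and compared against equation (2e). The adder tally below assumes one adder per operand after the first in each subexpression and one adder per nesting level of Y; the function and constant names are hypothetical.

```python
COEFFS = [-79, 1044, -5890, 27916, 49362, -8382, 1628, -154, 63, 126]


def y_horizontal(S):
    """Evaluate equation (2i) for tap values S = [S0, ..., S9]."""
    S0, S1, S2, S3, S4, S5, S6, S7, S8, S9 = S
    # Horizontal subexpressions, each shared by more than one row.
    X1 = S0 - S8
    X2 = S1 - S3
    X3 = -S2 + S5
    X4 = S3 + S4
    # Row expressions.
    E1 = X3 + S4 - S7 - S9
    E2 = X2 - S6
    E3 = -S0 + S1 + X4
    E4 = S6 + S7
    E5 = -X1 - S4 + S5
    E6 = S6 - S7 + S9
    E7 = -X3 + X4
    E8 = S2 + S6
    E9 = S2 + S5
    # One entry per bit weight 2^0 .. 2^16, in the order they appear in (2i).
    rows = [X1, E1, E2, S7, E3, -E4, E5, E6, E7,
            -S6, X2, E8, -S3, -E9, -S4, S3, S4]
    acc = 0
    for row in reversed(rows):
        acc = 2 * acc + row          # one-bit shift, then one adder
    return acc


# Operand counts used for the adder tally (assumed costing model).
X_OPERANDS = [2, 2, 2, 2]
E_OPERANDS = [4, 2, 3, 2, 3, 3, 2, 2, 2]
NUM_ROWS = 17
ADDERS = sum(n - 1 for n in X_OPERANDS + E_OPERANDS) + (NUM_ROWS - 1)

if __name__ == "__main__":
    taps = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
    print(y_horizontal(taps), sum(c * s for c, s in zip(COEFFS, taps)))
    print(ADDERS)
```

Under this costing model the tally is 4 adders for the horizontal subexpressions, 14 for E1 to E9 and 16 for the nesting of Y.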
FIG. 2(i1) shows the transposed form of implementation of the example FIR filter. Structure (2) is the transposed form of coefficient bank implementation. Individual equations for the transposed form implementation of the example coefficient bank are: S0=−79X, S1=1044X, S2=−5890X, S3=27916X, S4=49362X, S5=−8382X, S6=1628X, S7=−154X, S8=63X, S9=126X.
The execution of the steps in the existing method (FIG. 2(h)) is explained below.
Step 1: Conversion of coefficients into Canonical Signed Digit (CSD) or any other representation. The conversion of individual coefficients to CSD with reference to Set 1 is as given below:
S0=−79X=(i0i0001)X; S1=1044X=(10000010100)X; S2=−5890X=(i01001000000i0)X; S3=27916X=(100i0i0100010i00)X; S4=49362X=(10i0000010i010010)X; S5=−8382X=(i0000i01000010)X; S6=1628X=(10i010i00i00)X; S7=−154X=(i0i010i0)X; S8=63X=(100000i)X; S9=126X=(100000i0)X
Step 2: Vertical subexpressions are formed from the CSD representation of coefficients. Table 2 shows the vertical subexpressions for the above stated example. In Table 2, the expression (2^0−2^6) is present in column S0, and in column S8 as −(2^0−2^6). The expression is also present in columns S3 and S9, as (−2^2)(2^0−2^6) and (−2^1)(2^0−2^6) respectively. As the expression (2^0−2^6) is present in different columns, it is called a vertical subexpression. Similarly (1−(2^2)) and (1+(2^3)) are other vertical subexpressions.
TABLE 2: Showing Vertical Optimization
Equations of the coefficients after the vertical optimization:
S0=−79X=(Y1−(2^4))X
S1=1044X=((1+(2^2)+(2^8))*(2^2))X
S2=−5890X=((−1+(2^7)+(2^10)*(Y2))*(2^1))X
S3=27916X=((−Y1+(2^2)−(2^8)−(2^10)+(2^13))*(2^2))X
S4=49362X=((Y3−(2^5)*(Y2)−(2^13)*(Y2))*(2^1))X
S5=−8382X=((1+(2^5)*(Y2)−(2^12))*(2^1))X
S6=1628X=((−Y3+(2^5)*(Y2)+(2^9))*(2^2))X
S7=−154X=((−1+(2^2)*(Y2)−(2^6))*(2^1))X
S8=63X=(−Y1)X
S9=126X=(−Y1*(2^1))X,
where Y1=1−(2^6), Y2=1−(2^2), Y3=1+(2^3) are vertical subexpressions.
It is observed from the above analysis that the number of adders required to implement the transposed form coefficient bank is 20. For the example, the vertical subexpressions are mapped into hardware as substructures SS (Y1, Y2, Y3) as shown in FIG. 2(i2). The individual equations S0 to S9 of the transposed form of coefficient bank are mapped into hardware as substructures SS (S0 to S9) as shown in FIG. 2(i2). The substructures are formed from the plurality of adders (13,15) and shifter (14).
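The factored coefficient equations can be checked numerically, as in the sketch below. Each right-hand side is written in the form that evaluates to its coefficient with Y1=1−(2^6), Y2=1−(2^2) and Y3=1+(2^3); in particular, the signs and shifts shown for S4, S8 and S9 are those the arithmetic requires. The adder tally assumes one adder per term after the first and treats shifts and sign changes as free; the names used are hypothetical.

```python
# Vertical subexpressions shared between coefficients.
Y1 = 1 - 2**6      # -63
Y2 = 1 - 2**2      # -3
Y3 = 1 + 2**3      # 9

# Coefficient -> factored form using Y1, Y2, Y3 and powers of two.
FACTORED = {
    -79:   Y1 - 2**4,
    1044:  (1 + 2**2 + 2**8) * 2**2,
    -5890: (-1 + 2**7 + 2**10 * Y2) * 2**1,
    27916: (-Y1 + 2**2 - 2**8 - 2**10 + 2**13) * 2**2,
    49362: (Y3 - 2**5 * Y2 - 2**13 * Y2) * 2**1,
    -8382: (1 + 2**5 * Y2 - 2**12) * 2**1,
    1628:  (-Y3 + 2**5 * Y2 + 2**9) * 2**2,
    -154:  (-1 + 2**2 * Y2 - 2**6) * 2**1,
    63:    -Y1,
    126:   -Y1 * 2**1,
}

# Number of added terms in Y1, Y2, Y3 and then in S0 .. S9 (assumed costing).
TERMS = [2, 2, 2] + [2, 3, 3, 5, 3, 3, 3, 3, 1, 1]
ADDERS = sum(t - 1 for t in TERMS)

if __name__ == "__main__":
    for coeff, value in FACTORED.items():
        print(coeff, value, coeff == value)
    print(ADDERS)
```

Because Y1, Y2 and Y3 are each built once and reused, their adders are counted once in the tally regardless of how many coefficients use them.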
The generalized structure of the transposed form of the coefficient bank (2) from the existing method of implementation is shown in FIG. 2 (j). It is formed with substructures SS (Y1 to Ym) formed from mapping the vertical subexpressions Y1 to Ym into hardware and substructures SS (S0 to Sn) formed from mapping the individual equations S0 to Sn into hardware. The substructures are formed from the plurality of adders (13,15) and shifter (14). It is observed that the existing structure of the coefficient bank is inefficient, when high magnitude coefficients are required to be implemented.
The constraints in the existing method for SOP expression implementation are further illustrated by implementing a number with CSD representation. The CSD representation of 45 is 10i0i01, and the structure of the CSD implementation of the number (45) is shown in FIG. 2(k1). It is observed that the number of adders required to implement 45 in the CSD based implementation of FIG. 2(k1) is 3.
In the CSD implementation, the input is scaled by (2^z) through a ‘z’ bit left shift (where ‘z’ is a positive number) and then added to or subtracted from the other shifts of the input. Another existing way of implementing 45 in hardware is as 9*5, which is shown in FIG. 2(k2). The number of adders required for implementing 45 as in FIG. 2(k2) is 2. The input x is multiplied by the number 9 and then the result 9x is multiplied by 5 to get an output of 45x.
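The two ways of multiplying by 45 can be compared with the following sketch, provided for illustration. Left shifts stand for the zero-cost wiring of the parallel shifter (14) and each `+` or `-` stands for one adder or subtractor. The function names are assumptions of this example, and the factored version assumes 9 is built as 8+1 and 5 as 4+1.

```python
def times45_csd(x):
    """45 = 2^6 - 2^4 - 2^2 + 2^0, i.e. CSD digits 10i0i01: three adders."""
    return (x << 6) - (x << 4) - (x << 2) + x


def times45_factored(x):
    """45 = 9 * 5 with 9 = 2^3 + 1 and 5 = 2^2 + 1: two adders."""
    nine_x = (x << 3) + x            # first adder: 9x
    return (nine_x << 2) + nine_x    # second adder: 4*(9x) + 9x = 45x


if __name__ == "__main__":
    for x in (1, 7, -3):
        print(x, times45_csd(x), times45_factored(x))
```

The saving comes from reusing the intermediate result 9x, which the CSD form cannot do because every term there is a shift of the original input.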
It is evident from the implementation of the coefficient multiplier 45 that there can be an area efficient way of implementing coefficients, so as to reduce the number of adder/subtractors in the resultant hardware. Thus, a requirement is felt for an efficient method to achieve a reduced or minimal area implementation of a Sum of Products expression.