The following references are related to this field and may provide information relevant to the subject matter herein; the disclosure of each of the following documents is incorporated herein by reference:
[1] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing. Pearson Education, 2011.
[2] F. Van de Sande, N. Lugil, F. Demarsin, Z. Hendrix, A. Andries, P. Brandt, W. Anklam, J. Patterson, B. Miller, M. Rytting, M. Whaley, B. Jewett, J. Liu, J. Wegman, and K. Poulton, “A 7.2 GSa/s, 14 bit or 12 GSa/s, 12 bit signal generator on a chip in a 165 GHz fT BiCMOS process,” IEEE Journal of Solid-State Circuits, vol. 47, no. 4, pp. 1003-1012, 2012.
[3] M.-J. Choe, K.-H. Baek, and M. Teshome, “A 1.6-GS/s 12-bit return-to-zero GaAs RF DAC for multiple Nyquist operation,” IEEE Journal of Solid-State Circuits, vol. 40, no. 12, pp. 2456-2468, 2005.
[4] J. Xiao, B. Chen, T. Y. Kim, N.-Y. Wang, X. Chen, T.-H. Chih, K. Raviprakash, H.-F. Chen, R. Gomez, and J. Chang, “A 13-bit 9 GS/s RF DAC-based broadband transmitter in 28 nm CMOS,” in 2013 Symposium on VLSI Circuits (VLSIC), 2013, pp. C262-C263.
[5] B. Mohr, N. Zimmermann, B. Thiel, J. Mueller, Y. Wang, Y. Zhang, F. Lemke, R. Leys, S. Schenk, U. Bruening, R. Negra, and S. Heinen, “An RFDAC based reconfigurable multi-standard transmitter in 65 nm CMOS,” in 2012 IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2012, pp. 109-112.
[6] K. Parhi, VLSI Digital Signal Processing Systems: Design and Implementation. Wiley, 1999.
[7] Z.-J. Mou and P. Duhamel, “Short-length FIR filters and their use in fast nonrecursive filtering,” IEEE Transactions on Signal Processing, vol. 39, no. 6, pp. 1322-1332, 1991.
[8] C. Cheng and K. Parhi, “Hardware efficient fast parallel FIR filter structures based on iterated short convolution,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 8, pp. 1492-1500, 2004.
[9] Y.-C. Tsao and K. Choi, “Area-efficient parallel FIR digital filter structures for symmetric convolutions based on fast FIR algorithm,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 2, pp. 366-371, 2012.
[10] Z.-J. Mou and P. Duhamel, “A unified approach to the fast FIR filtering algorithms,” in 1988 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-88), vol. 3, 1988, pp. 1914-1917.
[11] I.-S. Lin and S. Mitra, “Fast FIR filtering algorithms based on overlapped block structure,” in 1993 IEEE International Symposium on Circuits and Systems (ISCAS '93), vol. 1, 1993, pp. 363-366.
[12] C. Cheng and K. Parhi, “Low-cost parallel FIR filter structures with 2-stage parallelism,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 54, no. 2, pp. 280-290, 2007.
[13] Y.-C. Tsao and K. Choi, “Area-efficient VLSI implementation for parallel linear-phase FIR digital filters of odd length based on fast FIR algorithm,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 59, no. 6, pp. 371-375, 2012.
[14] R. Crochiere and L. Rabiner, Multirate Digital Signal Processing. Prentice Hall, 1983.
[15] Z.-J. Mou, “Symmetry exploitation in digital interpolators/decimators,” IEEE Transactions on Signal Processing, vol. 44, no. 10, pp. 2611-2615, 1996.
Finite impulse response (FIR) filters are commonly used in digital communication systems to perform a number of operations, including channel shaping or matched filtering, channel up/downconversion, channel synchronization, RF pre-distortion, and adaptive equalization. As a result, there has been a significant amount of research into methods of implementing FIR filters in FPGAs, ASICs and DSP processors (henceforth “FPGAs”). Over the years, a number of efficient techniques and structures have been identified and are now widely known to the industrial and academic communities [1].
Due to recent advances in RF integrated circuit technology, a current trend in the industry is to use wideband digital-to-analog and analog-to-digital converters to digitize or output large (up to multiple GHz) blocks of RF spectrum [2]-[5]. While this is very convenient from a system perspective, the higher sampling rates necessitated by these wideband signals place a significant computational burden upon the FPGA which processes the digital signal. In many applications, the required sampling rate of the digital signal exceeds the maximum clock frequency of the FPGA, which means that many traditional FIR filter structures are not directly applicable.
The straightforward method of handling this scenario is to construct a so-called parallel FIR filter, in which the high-frequency input data is represented as L parallel streams, where L is the ratio of the sampling frequency to the FPGA clock frequency. The original prototype filter is decomposed into a set of L² subfilters, which are applied to the incoming parallel data. The outputs of these subfilters are then combined to generate L parallel output data streams [6]. The downside of the parallel FIR approach is its cost, which increases linearly with L: each of the L² subfilters has N/L taps, so an N-tap prototype requires L·N multiplications per clock cycle. As the required sampling frequency increases, the number of parallel data streams becomes large, which can render even modest filtering operations computationally expensive. Consequently, there has been renewed interest in structures for the implementation of high-speed FIR filters in recent years [6]-[13].
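The traditional parallel structure can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not a hardware description: coefficients and samples are plain integer lists, every subfilter is an ordinary convolution, and the helper names `conv` and `parallel_fir` are hypothetical. The input and the prototype are split into L polyphase components (the form formalized in equation (2)), all L² subfilter products are formed, and products whose phase index j + k wraps past L pick up one extra low-rate delay.

```python
def conv(a, b):
    """Full linear convolution of two coefficient lists."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def parallel_fir(h, x, L):
    """Traditional L-parallel FIR: L*L subfilters applied to L input streams.

    Returns the interleaved output and the number of multipliers needed per
    clock cycle (every subfilter tap counts as one multiplier).
    """
    X = [x[j::L] for j in range(L)]   # L parallel input streams X_j
    H = [h[k::L] for k in range(L)]   # L polyphase components H_k of the prototype
    n_out = len(h) + len(x) - 1
    P = -(-n_out // L)                # output samples per phase (ceiling division)
    Y = [[0] * P for _ in range(L)]
    multipliers = 0
    for j in range(L):
        for k in range(L):            # one subfilter H_k per input stream X_j
            multipliers += len(H[k])
            phase, extra_delay = (j + k) % L, (j + k) // L
            for p, v in enumerate(conv(H[k], X[j])):
                Y[phase][p + extra_delay] += v
    y = [Y[n % L][n // L] for n in range(n_out)]   # re-interleave the phases
    return y, multipliers


h = [1, 2, 3, 4, 5, 6, 7, 8]          # example 8-tap prototype (N = 8)
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
for L in (1, 2, 4):
    y, mults = parallel_fir(h, x, L)
    assert y == conv(h, x)            # same result as the single-rate filter
    print(L, mults)                   # 8, 16, 32: the cost grows as L * N
```

The output is identical for every L, while the multiplier count doubles each time L doubles, which is the linear growth described above.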
One such structure is known as the Fast FIR Algorithm (FFA), described in detail in [6], [7], and [10]. The FFA decomposes the parallel filtering operation in a manner which allows some of the subfilters to be replaced with pre- and post-adders. This reduces the number of multiplications required to perform the filtering operation at the cost of extra additions—an advantageous tradeoff because the hardware cost of adders is typically much less than that of multipliers.
One downside of the standard FFA is that it is not well-suited to the case where the filter coefficients are symmetrical. Under normal circumstances, the number of multiplications required for a symmetrical filter may be cut in half by adding the input samples that share a common coefficient before multiplying. However, when the FFA is applied to a symmetrical filter, most of the FFA's subfilters are not symmetrical, thereby negating much of the benefit of the FFA. To address this deficiency, Tsao and Choi (T&C) proposed a modified FFA which is better-suited to symmetrical filters [9]. By changing the method used to construct the subfilters, T&C were able to generate a structure in which a higher proportion of the subfilters have symmetrical coefficients, resulting in a lower computational cost for symmetrical filters than the traditional FFA.
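The halving of multiplications for a symmetrical filter can be shown with a small Python sketch. It assumes integer data and uses hypothetical function names; `fir_direct` evaluates the direct-form FIR sum with zero initial conditions, and `fir_folded` adds the two input samples that share each coefficient before multiplying, leaving the centre tap unpaired when N is odd.

```python
def fir_direct(h, x):
    """Direct form: y[n] = sum_i h[i] * x[n - i], with x[n] = 0 for n < 0."""
    return [sum(h[i] * x[n - i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def fir_folded(h, x):
    """Symmetric FIR (h[i] == h[N-1-i]) using pre-addition of paired samples.

    Returns (outputs, multiplications per output sample).
    """
    N = len(h)
    if any(h[i] != h[N - 1 - i] for i in range(N)):
        raise ValueError("coefficients must be symmetric")

    def xs(k):                                   # input sample, zero outside its range
        return x[k] if 0 <= k < len(x) else 0

    pairs = N // 2
    y = []
    for n in range(len(x)):
        # Add the two samples sharing h[i] first, then multiply once.
        acc = sum(h[i] * (xs(n - i) + xs(n - (N - 1 - i))) for i in range(pairs))
        if N % 2:                                # odd N: the centre tap has no partner
            acc += h[pairs] * xs(n - pairs)
        y.append(acc)
    return y, pairs + N % 2


h = [1, 2, 3, 2, 1]
x = [3, 1, 4, 1, 5, 9, 2, 6]
y, mults = fir_folded(h, x)
assert y == fir_direct(h, x)
print(mults, "multiplications per output instead of", len(h))   # 3 instead of 5
```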
However, the T&C FFA was designed to handle only single-rate FIR filters. The algorithm cannot be directly applied to multirate filters, such as the high-rate interpolators and decimators which are needed to convert to and from the RF sampling frequency in modern systems. As a result, this critical filtering stage typically remains relatively expensive in modern systems. In this paper, a technique for reducing the computational complexity of multirate filters through the use of the T&C FFA is presented. This technique is applicable to all multirate filters, but is particularly useful for high-speed interpolators and decimators.
The output of a single-rate N-tap FIR filter with filter coefficients $h_i$ for an infinite-length input sequence $x_n$ is expressed as
$$y_n = \sum_{i=0}^{N-1} h_i \, x_{n-i}, \qquad n = 0, 1, 2, \ldots \tag{1}$$
Unlike the single-rate filter of equation (1), an L-parallel FIR filter receives L input samples every clock cycle. Such an L-parallel filter can be expressed using a polyphase decomposition as given below:
$$\sum_{i=0}^{L-1} Y_i(z^L)\, z^{-i} = \sum_{j=0}^{L-1} X_j(z^L)\, z^{-j} \sum_{k=0}^{L-1} H_k(z^L)\, z^{-k} \tag{2}$$
where
$$X_j = \sum_{p=0}^{\infty} z^{-p} x_{Lp+j}, \qquad H_k = \sum_{p=0}^{N/L-1} z^{-p} h_{Lp+k}, \qquad \text{and} \qquad Y_i = \sum_{p=0}^{\infty} z^{-p} y_{Lp+i}$$
for $i, j, k = 0, 1, 2, \ldots, L-1$. For example, the standard 2-by-2 FFA described in [6] receives two input samples each clock cycle, as shown in FIG. 1a. This filter may be expressed mathematically as
$$\begin{aligned}
Y &= Y_0 + z^{-1} Y_1 = (H_0 + z^{-1} H_1)(X_0 + z^{-1} X_1) \\
  &= \{H_0 X_0 + z^{-2} H_1 X_1\} + z^{-1}\{H_0 X_1 + H_1 X_0\} \\
  &= \{H_0 X_0 + z^{-2} H_1 X_1\} + z^{-1}\{(H_0 + H_1)(X_0 + X_1) - H_0 X_0 - H_1 X_1\}
\end{aligned} \tag{3}$$
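The filter of equation (3) can be represented in matrix form as given below:
$$y_2 = q_{s,2} \times g_{s,2} \times h_{s,2} \times p_{s,2} \times x_2 \tag{4}$$
where
$$p_{s,2}^T = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \qquad q_{s,2} = \begin{bmatrix} 1 & 0 & D \\ -1 & 1 & -1 \end{bmatrix},$$
$D = z^{-1}$, $g_{s,2} = I_3$ (the 3×3 identity matrix), $h_{s,2} = p_{s,2} \times [H_0 \; H_1]^T$ (its three entries act as a diagonal matrix in the product of equation (4)), $y_2 = [Y_0 \; Y_1]^T$, and $x_2 = [X_0 \; X_1]^T$. Note that in this paper, the superscript T denotes transposition of a matrix and the subscript s indicates that the matrix in question refers to a standard FFA.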
From equation (4), it is apparent that an implementation of the 2-by-2 standard FFA requires three subfilters (from subfilter matrix $h_{s,2}$), one pre-adder (from pre-add matrix $p_{s,2}$), and three post-adders (from post-add matrix $q_{s,2}$). Assuming a symmetrical original prototype filter, only one of the FFA subfilters, $(H_0 + H_1)$, is symmetrical.
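Equation (4) can be checked numerically with a short Python sketch. This is a non-authoritative illustration: it assumes integer coefficient lists, models the delay D as one zero prepended to a low-rate sequence, and uses hypothetical helper names (`conv`, `add`, `at`, `ffa2_standard`, `is_symmetric`). The three subfilter products follow the rows of $p_{s,2}$, and the two outputs follow the rows of $q_{s,2}$.

```python
def conv(a, b):
    """Full linear convolution of two coefficient lists."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(a, b, sign=1):
    """Element-wise a + sign * b, zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [u + sign * v for u, v in zip(a, b)]


def at(seq, p):
    """Sample p of a sequence, zero beyond its end."""
    return seq[p] if p < len(seq) else 0


def ffa2_standard(h, x):
    """2-by-2 standard FFA following equation (4); D is one low-rate delay."""
    H0, H1, X0, X1 = h[0::2], h[1::2], x[0::2], x[1::2]
    # Pre-add p_{s,2}: subfilters [H0, H0+H1, H1] fed by [X0, X0+X1, X1]
    a0 = conv(H0, X0)
    a1 = conv(add(H0, H1), add(X0, X1))
    a2 = conv(H1, X1)
    # Post-add q_{s,2}: Y0 = a0 + D*a2, Y1 = -a0 + a1 - a2
    Y0 = add(a0, [0] + a2)
    Y1 = add(add(a1, a0, -1), a2, -1)
    n_out = len(h) + len(x) - 1
    return [at(Y0, n // 2) if n % 2 == 0 else at(Y1, n // 2) for n in range(n_out)]


def is_symmetric(c):
    return c == c[::-1]


h = [1, 2, 3, 3, 2, 1]          # symmetric example prototype, N = 6
x = [3, 1, 4, 1, 5, 9, 2, 6, 5]
assert ffa2_standard(h, x) == conv(h, x)
H0, H1 = h[0::2], h[1::2]
print(is_symmetric(H0), is_symmetric(add(H0, H1)), is_symmetric(H1))   # False True False
```

Only three subfilter convolutions are evaluated instead of the four a traditional 2-parallel filter would need, and only $(H_0 + H_1)$ comes out symmetrical.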
In [9], T&C presented an alternate 2-by-2 FFA which is specifically defined for the case of a symmetrical prototype filter. The 2-by-2 T&C FFA, which is shown in FIG. 1b, is mathematically expressed as
$$Y = \Big\{\tfrac{1}{2}\big[(H_0 + H_1)(X_0 + X_1) + (H_0 - H_1)(X_0 - X_1)\big] - H_1 X_1 + z^{-2} H_1 X_1\Big\} + z^{-1}\Big\{\tfrac{1}{2}\big[(H_0 + H_1)(X_0 + X_1) - (H_0 - H_1)(X_0 - X_1)\big]\Big\} \tag{5}$$
Equation (5) may be expressed in matrix form as
$$y_2 = q_{c,2} \times g_{c,2} \times h_{c,2} \times p_{c,2} \times x_2 \tag{6}$$
where
$$p_{c,2}^T = \begin{bmatrix} 1 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}, \qquad q_{c,2} = \begin{bmatrix} 1 & 1 & -1 + D \\ 1 & -1 & 0 \end{bmatrix},$$
$g_{c,2} = \mathrm{diag}[\tfrac{1}{2} \;\; \tfrac{1}{2} \;\; 1]$, and $h_{c,2} = p_{c,2} \times [H_0 \; H_1]^T$. In this paper, diagonal matrices such as $g_{c,2}$ are represented using the notation diag[rowval], where rowval is the set of values on the main diagonal. The subscript c is used to indicate that a matrix belongs to a T&C FFA.
Equation (6) indicates that an implementation of the 2-by-2 T&C FFA requires three subfilters, two pre-adders, and four post-adders, which is an increase of two adders compared to the standard FFA. However, two of the FFA subfilters, $(H_0 + H_1)$ and $(H_0 - H_1)$, are symmetrical (strictly, $(H_0 - H_1)$ is antisymmetric, which permits the same halving of multiplications).
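Equation (6) can be verified the same way. The Python sketch below is illustrative only: the helper names are hypothetical, exact `fractions.Fraction` arithmetic stands in for the ½ gains of $g_{c,2}$ (which hardware would realize as shifts), and D is modelled as one prepended zero. It also confirms that, for a symmetric even-length example prototype, $(H_0 + H_1)$ is symmetric and $(H_0 - H_1)$ is antisymmetric.

```python
from fractions import Fraction

HALF = Fraction(1, 2)   # the 1/2 entries of g_{c,2}


def conv(a, b):
    """Full linear convolution of two coefficient lists."""
    if not a or not b:
        return []
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def add(a, b, sign=1):
    """Element-wise a + sign * b, zero-padding the shorter sequence."""
    n = max(len(a), len(b))
    a, b = a + [0] * (n - len(a)), b + [0] * (n - len(b))
    return [u + sign * v for u, v in zip(a, b)]


def at(seq, p):
    return seq[p] if p < len(seq) else 0


def ffa2_tc(h, x):
    """2-by-2 T&C FFA following equation (6); D is one low-rate delay."""
    H0, H1, X0, X1 = h[0::2], h[1::2], x[0::2], x[1::2]
    # Pre-add p_{c,2}: subfilters [H0+H1, H0-H1, H1] fed by [X0+X1, X0-X1, X1]
    b0 = [HALF * v for v in conv(add(H0, H1), add(X0, X1))]          # gain 1/2
    b1 = [HALF * v for v in conv(add(H0, H1, -1), add(X0, X1, -1))]  # gain 1/2
    b2 = conv(H1, X1)                                                # gain 1
    # Post-add q_{c,2}: Y0 = b0 + b1 + (-1 + D) b2, Y1 = b0 - b1
    Y0 = add(add(add(b0, b1), b2, -1), [0] + b2)
    Y1 = add(b0, b1, -1)
    n_out = len(h) + len(x) - 1
    return [at(Y0, n // 2) if n % 2 == 0 else at(Y1, n // 2) for n in range(n_out)]


def is_symmetric(c):
    return c == c[::-1]


def is_antisymmetric(c):
    return c == [-v for v in c[::-1]]


h = [1, 2, 3, 3, 2, 1]          # symmetric example prototype, N = 6
x = [3, 1, 4, 1, 5, 9, 2, 6, 5]
assert ffa2_tc(h, x) == conv(h, x)
H0, H1 = h[0::2], h[1::2]
print(add(H0, H1), add(H0, H1, -1))   # [3, 6, 3] [-1, 0, 1]
```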
Similarly, the matrix forms of the 3-by-3 standard and T&C FFAs are given in equations (7) and (8), respectively.y3=qs,3×gs,3×hs,3×ps,3×x3  (7)where
            p              s        ,        3            T        =          [                                    1                                0                                0                                1                                0                                1                                                0                                1                                0                                1                                1                                1                                                0                                0                                1                                0                                1                                1                              ]        ,          ⁢            q              s        ,        3              =          [                                    1                                              -              D                                                          -              D                                            0                                D                                0                                                              -              1                                                          -              1                                            D                                1                                0                                0                                                0                                2                                0                                              -              1                                                          -              1                                            1                              ]        ,gs,3=I6, hs,3=ps,3×[H0H1H2]T, y3=[Y0 Y1 Y2]T, and x3=[X0 X1 X2]T.y3=qc,3×gc,3×hc,3×pc,3×x3  (8)where
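$$
p_{c,3}^{T} = \begin{bmatrix}
1 & 1 & 0 & 1 & 1 & 1 \\
1 & -1 & 1 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & -1
\end{bmatrix}, \qquad
q_{c,3} = \begin{bmatrix}
1-D & 1+D & -1-D & D & -2D & 0 \\
1-D & -1-D & D & 0 & D & D \\
0 & 0 & 1 & 0 & 1 & -1
\end{bmatrix},
$$
$g_{c,3} = \operatorname{diag}[\tfrac{1}{2}\ \tfrac{1}{2}\ 1\ 1\ \tfrac{1}{2}\ \tfrac{1}{2}]$, and $h_{c,3} = p_{c,3} \times [H_0\ H_1\ H_2]^T$.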
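As a numerical sanity check of these matrices, the following Python/NumPy sketch filters a signal with the 3-by-3 T&C FFA and compares the result against direct convolution. It assumes that the output is formed as $Y = q_{c,3}\, g_{c,3}\, \operatorname{diag}(h_{c,3})\, p_{c,3} X$, i.e., that the same pre-add matrix $p_{c,3}$ combines the polyphase inputs $X_0, X_1, X_2$, and that $D$ is a one-sample delay at the 1/3 rate. The function `tc_ffa3` and the constants `P_C3`, `G_C3`, `Q_C3` are hypothetical names introduced only for this illustration.

```python
import numpy as np

# p_{c,3}: rows give the subfilters H0+H1, H0-H1, H1, H0+H1+H2, H0+H2, H0-H2.
P_C3 = np.array([[1, 1, 0], [1, -1, 0], [0, 1, 0],
                 [1, 1, 1], [1, 0, 1], [1, 0, -1]], dtype=float)
# g_{c,3}: diagonal scaling applied to the six subfilter outputs.
G_C3 = np.array([0.5, 0.5, 1.0, 1.0, 0.5, 0.5])
# q_{c,3}: each entry (c0, c1) stands for the polynomial c0 + c1*D.
Q_C3 = [[(1, -1), (1, 1), (-1, -1), (0, 1), (0, -2), (0, 0)],
        [(1, -1), (-1, -1), (0, 1), (0, 0), (0, 1), (0, 1)],
        [(0, 0), (0, 0), (1, 0), (0, 0), (1, 0), (-1, 0)]]

def tc_ffa3(h, x):
    """Filter x by h with the 3-by-3 T&C FFA (len(h) and len(x) must be multiples of 3)."""
    h = np.asarray(h, dtype=float)
    x = np.asarray(x, dtype=float)
    Hk = np.vstack([h[k::3] for k in range(3)])   # polyphase components H0, H1, H2
    Xk = np.vstack([x[k::3] for k in range(3)])   # polyphase inputs X0, X1, X2
    sub_h = P_C3 @ Hk                             # h_{c,3} = p_{c,3} [H0 H1 H2]^T
    sub_x = P_C3 @ Xk                             # pre-added inputs
    s = [G_C3[j] * np.convolve(sub_h[j], sub_x[j]) for j in range(6)]
    n = Hk.shape[1] + Xk.shape[1]                 # per-phase output length after one D
    Y = np.zeros((3, n))
    for r in range(3):
        for j in range(6):
            c0, c1 = Q_C3[r][j]
            Y[r, :n - 1] += c0 * s[j]             # undelayed term
            Y[r, 1:] += c1 * s[j]                 # D: one-sample delay at the 1/3 rate
    return Y.T.reshape(-1)                        # interleave: y[3n + r] = Y_r[n]
```

For random `h` and `x`, `tc_ffa3(h, x)[:len(h) + len(x) - 1]` should match `np.convolve(h, x)`, and the single extra trailing sample is zero. The check holds for any `h`; symmetry only matters for the hardware cost of the subfilters.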
The above equations indicate that an implementation of the 3-by-3 standard FFA requires six subfilters, three pre-adders, and seven post-adders. In contrast, the 3-by-3 T&C FFA requires six subfilters, five pre-adders, and twelve post-adders. As with the 2-by-2 FFAs already discussed, the advantage of the 3-by-3 T&C FFA is that it yields a greater number of symmetrical subfilters: four, as opposed to two in the case of the standard 3-by-3 FFA.
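The "four versus two" count can be reproduced by splitting a symmetric filter into its three polyphase components and applying either $p_{c,3}$ or a standard pre-add matrix. This sketch counts antisymmetric subfilters as symmetrical, since an antisymmetric filter can also share multipliers between mirrored taps. The standard matrix `P_S3` uses an assumed subfilter ordering (H0, H1, H2, H0+H1, H1+H2, H0+H1+H2), and the function names are hypothetical.

```python
import numpy as np

# p_{c,3} of the 3-by-3 T&C FFA, as given in the text.
P_C3 = np.array([[1, 1, 0], [1, -1, 0], [0, 1, 0],
                 [1, 1, 1], [1, 0, 1], [1, 0, -1]])
# Standard 3-by-3 FFA pre-adds (assumed ordering):
# H0, H1, H2, H0+H1, H1+H2, H0+H1+H2.
P_S3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [1, 1, 0], [0, 1, 1], [1, 1, 1]])

def is_linear_phase(c):
    """True if c is symmetric or antisymmetric."""
    c = np.asarray(c, dtype=float)
    return np.allclose(c, c[::-1]) or np.allclose(c, -c[::-1])

def count_linear_phase_subfilters(h, P):
    """Apply pre-add matrix P to the 3 polyphase components of h; count (anti)symmetric results."""
    h = np.asarray(h, dtype=float)
    Hk = np.vstack([h[k::3] for k in range(3)])
    return sum(is_linear_phase(s) for s in P @ Hk)
```

For a generic symmetric filter whose length is a multiple of 3, the standard matrix should give 2 (H1 and H0+H1+H2) and the T&C matrix 4 (H1, H0+H1+H2, H0+H2, and the antisymmetric H0-H2), because H0 and H2 are mirror images of each other while H1 is itself symmetric.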
It is possible to construct a composite L-by-L FFA by cascading multiple stages of 2-by-2 and/or 3-by-3 FFAs. In this scenario, $L=\prod_{i=0}^{W-1} l_i$, where $l_i = 2$ or 3 is the FFA order of the $i$th stage of the decomposition and $W$ is the total number of stages in the decomposition. To illustrate, consider decomposing an N-tap FIR filter to generate a 6-by-6 FFA. The first cascading stage applies a 2-by-2 FFA ($l_0=2$) to the FIR filter, which results in three subfilters of length $N/2$. The second cascading stage applies a 3-by-3 FFA ($l_1=3$) to each of these subfilters, decomposing each into six subfilters of length $N/6$. Thus, the 6-by-6 FFA has a total of $3 \times 6 = 18$ subfilters, each of length $N/6$.
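The bookkeeping of a composite FFA can be sketched as follows. The hypothetical helper `composite_ffa_size` multiplies the per-stage orders to obtain $L$ and the per-stage subfilter counts (3 for a 2-by-2 stage, 6 for a 3-by-3 stage) to obtain the total number of subfilters. It assumes N is a multiple of L, so that every subfilter has exactly N/L taps.

```python
from math import prod

# Number of subfilters produced by one l-by-l FFA stage.
SUBFILTERS_PER_STAGE = {2: 3, 3: 6}

def composite_ffa_size(stages, n_taps):
    """Return (L, number of subfilters, taps per subfilter) for a cascade of FFA stages."""
    if any(l not in SUBFILTERS_PER_STAGE for l in stages):
        raise ValueError("each stage must be a 2-by-2 or 3-by-3 FFA")
    L = prod(stages)
    if n_taps % L:
        raise ValueError("n_taps is assumed to be a multiple of L")
    n_sub = prod(SUBFILTERS_PER_STAGE[l] for l in stages)
    return L, n_sub, n_taps // L
```

For example, `composite_ffa_size([2, 3], 48)` returns `(6, 18, 8)`, matching the 6-by-6 case above with N = 48. The stage order does not change these totals, although it does change which subfilters are symmetrical.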
In general, an L-by-L standard FFA may be expressed mathematically as

$$Y_p = Q_s H_s P_s X_p \qquad (9)$$

where $X_p$ and $Y_p$ are permuted versions of $X=[X_0\ X_1\ \cdots\ X_{L-1}]^T$ and $Y=[Y_0\ Y_1\ \cdots\ Y_{L-1}]^T$, respectively. The permutation process is described in detail in [6]. $P_s$, $H_s$, and $Q_s$ are the general pre-add, subfilter, and post-add matrices, which are described by equations (10)-(12).
$$P_s = p_{s,l_0} \otimes p_{s,l_1} \otimes \cdots \otimes p_{s,l_{W-2}} \otimes p_{s,l_{W-1}} \qquad (10)$$

$$H_s = \operatorname{diag}(P_s H) \quad \text{and} \quad H = [H_0\ H_1\ \cdots\ H_{L-1}]^T \qquad (11)$$

$$Q_s = \prod_{i=0}^{W-1} B_{s,W-1-i}, \qquad (12)$$

where $B_{s,i}$, which represents the post-add matrix at cascade stage $i$, is equal to

$$B_{s,i} = I_{F_{l_0}} \otimes I_{F_{l_1}} \otimes \cdots \otimes I_{F_{l_{i-1}}} \otimes Q_{s,i},$$

and $F_{l_i}$, the number of subfilters generated by an $l_i$-by-$l_i$ FFA, is 3 for $l_i=2$ and 6 for $l_i=3$. $Q_{s,i}$ is an expanded version of $q_{s,l_i}$, in which each element of the original matrix is replaced by an $m$-by-$m$ matrix, where $m=L/\prod_{n=0}^{i} l_n$. The rules for this replacement are as follows: each 1 is replaced by $I_m$, each 0 is replaced by $0_m$, and each $z^{-1}$ is replaced by the $m$-unfolded version of $z^{-1}$. This type of transformation appears repeatedly in this document and will be referred to as an $m$-by-$m$ delay unfolding transformation.
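The $m$-by-$m$ delay unfolding transformation can be checked numerically. In this sketch, a matrix whose entries are polynomials of degree at most one in $z^{-1}$ is stored as a pair `(q0, q1)` meaning $q_0 + z^{-1} q_1$, where $z^{-1}$ is one sample of delay at the rate of the signals the matrix acts on. The unfolded $z^{-1}$ routes stream $k-1$ to stream $k$ in the same slow-rate cycle and stream $m-1$ to stream 0 with one delay at the 1/m rate. Scaling $I_m$ and the unfolded delay by coefficients other than 1 (such as $-1$) is an assumption that goes beyond the three replacement rules stated above. The example post-add matrix `[[1, 0, z^-1], [-1, 1, -1]]` is a standard 2-by-2 form with an assumed subfilter ordering (H0, H0+H1, H1), and all function names are hypothetical.

```python
import numpy as np

def unfolded_delay(m):
    """m-unfolded z^-1 as (U0, U1), meaning U0 + D*U1 with D one delay at the 1/m rate."""
    U0 = np.eye(m, k=-1)      # stream k <- stream k-1, same slow-rate cycle
    U1 = np.zeros((m, m))
    U1[0, m - 1] = 1.0        # stream 0 <- stream m-1, previous slow-rate cycle
    return U0, U1

def expand(q0, q1, m):
    """Delay-unfold q0 + z^-1 q1: c -> c*I_m and c*z^-1 -> c*(U0 + D*U1)."""
    U0, U1 = unfolded_delay(m)
    Q0 = np.kron(q0, np.eye(m)) + np.kron(q1, U0)
    Q1 = np.kron(q1, U1)
    return Q0, Q1

def apply_delay_poly(A0, A1, S):
    """Return (A0 + z^-1 A1) S, where each row of S is a stream delayed by one sample via z^-1."""
    out = A0 @ S
    out[:, 1:] += (A1 @ S)[:, :-1]
    return out

def unfold(S, m):
    """Row j*m + k holds the k-th polyphase stream S[j, k::m] of sequence j."""
    return np.vstack([s[k::m] for s in S for k in range(m)])

# Example post-add matrix [[1, 0, z^-1], [-1, 1, -1]] (assumed ordering H0, H0+H1, H1).
q0 = np.array([[1.0, 0.0, 0.0], [-1.0, 1.0, -1.0]])
q1 = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
m = 4
S = np.random.default_rng(1).standard_normal((3, 8 * m))  # three subfilter output sequences
y_full = apply_delay_poly(q0, q1, S)                       # post-adds at the original rate
Q0, Q1 = expand(q0, q1, m)
y_unfolded = apply_delay_poly(Q0, Q1, unfold(S, m))       # post-adds on m parallel streams
print(np.allclose(unfold(y_full, m), y_unfolded))          # True
```

In a cascade, the expanded matrix then enters $B_{s,i}$ through a Kronecker product with identities, e.g. `np.kron(np.eye(F), Q0)`. Note that for the matrix dimensions to agree, the stage-0 post-add matrix must be applied last, i.e. it is the leftmost factor of $Q_s$.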
The L-by-L T&C FFA applies the same cascading process but uses both the standard and the T&C basic FFA structures. Both are needed because applying the T&C FFA to a non-symmetrical filter does not produce symmetrical subfilters, so the additional cost of the T&C FFA brings no benefit when decomposing subfilters that are not symmetrical. To minimize the overall system cost, it is prudent to apply the T&C decomposition only to the symmetrical subfilters and the standard FFA decomposition to the non-symmetrical subfilters.
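The selection rule can be sketched as a recursion that, at each stage, uses the T&C FFA for a symmetrical (or antisymmetrical) subfilter and the standard FFA otherwise. To stay within the matrices shown in this section, the sketch uses only 3-by-3 stages, so the filter length is assumed to be a multiple of $3^{\text{stages}}$. The standard ordering in `P_S3` is assumed, and `decompose` and `is_linear_phase` are hypothetical names.

```python
import numpy as np

# p_{c,3} (T&C) as given in the text, and an assumed ordering of the standard pre-adds.
P_C3 = np.array([[1, 1, 0], [1, -1, 0], [0, 1, 0],
                 [1, 1, 1], [1, 0, 1], [1, 0, -1]], dtype=float)
P_S3 = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1],
                 [1, 1, 0], [0, 1, 1], [1, 1, 1]], dtype=float)

def is_linear_phase(c):
    """True if c is symmetric or antisymmetric."""
    c = np.asarray(c, dtype=float)
    return np.allclose(c, c[::-1]) or np.allclose(c, -c[::-1])

def decompose(h, stages, selective=True):
    """Cascade `stages` 3-by-3 FFAs on h.

    With selective=True, (anti)symmetric subfilters get the T&C FFA and all
    others the standard FFA; with selective=False, the standard FFA is used
    everywhere. Returns (leaf subfilters, #T&C decompositions, #standard ones).
    """
    h = np.asarray(h, dtype=float)
    if stages == 0:
        return [h], 0, 0
    use_tc = selective and is_linear_phase(h)
    P = P_C3 if use_tc else P_S3
    Hk = np.vstack([h[k::3] for k in range(3)])   # polyphase components
    leaves, n_tc, n_std = [], int(use_tc), int(not use_tc)
    for sub in P @ Hk:
        sub_leaves, a, b = decompose(sub, stages - 1, selective)
        leaves += sub_leaves
        n_tc += a
        n_std += b
    return leaves, n_tc, n_std
```

For a generic 27-tap symmetric filter and two stages, the selective strategy should use five T&C and two standard decompositions and yield 16 (anti)symmetric leaf subfilters out of 36, whereas using the standard FFA everywhere yields only 4. Applying the T&C FFA to the two non-symmetrical first-stage subfilters would add pre- and post-adders without increasing that count.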
Reference [9] discusses the process of constructing an L-by-L T&C FFA, but does not provide an explicit mathematical model for the resulting filter. In this document, one such model is presented for the proposed FFA-based multirate architecture. This model can be easily generalized to the single-rate T&C FFA.