When building Digital-to-Analog Converters (DACs), there are many techniques that may be used. Depending upon the characteristics of the digital signal being recreated in the analog domain and on the required system characteristics, a DAC designer is free to choose from among all these techniques in order to best suit their application.
The generic DAC system shown in FIG. 1 demonstrates a number of features that are common to most systems that involve a DAC. An N-bit digital input signal and a sample clock are driven into the actual DAC (180). The output from the DAC is a discrete-time analog signal, where the digital signal has been represented in the analog domain by a series of discrete signal levels that change at discrete times governed by the sample clock. This discrete-time analog signal is then filtered by an analog reconstruction filter (190) to produce a continuous-time analog output. Depending upon the application, the DAC output can be represented in a number of different domains, including (but not limited to) voltage, current, charge, and pulse width. Depending upon the application, the reconstruction filter complexity and construction can vary widely, and in certain applications may not exist; in these applications the system does not require a continuous-time analog output but instead operates directly from the discrete-time output of the DAC.
Three of the most important characteristics of a system are the DAC resolution (the width of the digital sample bus), the DAC sample rate (the frequency at which the data is updated), and the signal bandwidth (the fraction of the bandwidth made available by the sample rate that the signal actually occupies).
The DAC resolution is fundamental in that it determines the Quantization Noise (QN) of the signal, and the QN in turn sets a fundamental floor on how accurately a digital approximation can represent an ideal signal. In general, the resolution of a DAC at its input is expressed by the digital signal bit width, and the resolution at the DAC output is expressed in Least Significant Bits, or LSBs. One LSB is the smallest nonzero amount by which the discrete-time output of a DAC at one time can differ from its output at any other time. The QN is, in general, a uniform error with a width equal to one LSB, and appears as a white (i.e. flat) noise source in the frequency domain. A DAC with a larger number of bits at its input will have a higher resolution and therefore a lower QN.
Nyquist Vs. Oversampled DAC Systems
One common way of categorizing DAC systems is based on the signal bandwidth. In a “Nyquist” system the signal bandwidth can be as large as half the sample rate (i.e. the Nyquist bandwidth), whereas in many other systems, typically referred to as “Oversampled” systems, the signal bandwidth is a smaller fraction (often much smaller, perhaps as low as 1/100 or 1/1000) of the Nyquist bandwidth.
There are many reasons to construct Oversampled DAC systems, many of which are well beyond the scope of the present disclosure, however one of the most important reasons in recent years has been to trade off sample rate for digital resolution. Practically speaking, this means that for a constant signal bandwidth, it is possible to trade off resolution in amplitude (i.e. digital resolution) for resolution in time (i.e. sample rate), allowing systems with higher sample rates to have lower digital resolution and yet have the same accuracy.
For Nyquist systems with a sample rate FDACNYQUIST with N-bit quantization, the Signal-to-Noise Ratio (SNR) due to Quantization, also known as Signal-to-Quantization Noise Ratio (SQNR), over the full Nyquist signal band FDACNYQUIST/2 for a full-scale sine wave signal is well known to be given by Equation 1:
$$\mathrm{SQNR}_{NYQUIST} = 1.76\ \mathrm{dB} + N \cdot 6.02\ \mathrm{dB} \qquad \text{Equation 1}$$
If a Nyquist DAC were to run at a faster rate, M·FDACNYQUIST, the SQNR would remain constant, with the QN spread across a wider bandwidth M·FDACNYQUIST/2. If instead this DAC were treated as an Oversampled DAC with a sample rate M·FDACNYQUIST (the factor M is also known as the Oversample Ratio or OSR) the resulting SQNR over the original signal bandwidth FDACNYQUIST/2 is given by Equation 2:
$$\mathrm{SQNR}_{OVERSAMPLED} = 1.76\ \mathrm{dB} + \left(N + \log_{4} M\right) \cdot 6.02\ \mathrm{dB} \qquad \text{Equation 2}$$
In other words, if an Oversampled DAC can be run with an OSR of 4, it can operate with one fewer bit on the digital input signal than the equivalent Nyquist DAC. For many DAC circuit architectures, removing one bit from the digital input can halve the circuit complexity, so a higher OSR makes the DAC easier to build.
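The bit-for-rate trade-off can be illustrated with a short calculation. This sketch assumes the QN is white, so that keeping only 1/M of the band lowers the in-band noise power by 10·log10(M) dB, which corresponds to one bit for every factor of 4 in OSR; the function names are illustrative.

```python
import math

def sqnr_oversampled_db(n_bits, osr):
    """SQNR in dB over the signal band for an N-bit DAC run at an OSR of `osr`.

    Assumes white quantization noise and no noise shaping.
    """
    return 1.76 + 6.02 * n_bits + 10.0 * math.log10(osr)

def bits_saved(osr):
    """Equivalent number of input bits gained purely from oversampling."""
    return 10.0 * math.log10(osr) / 6.02

if __name__ == "__main__":
    for osr in (1, 4, 16, 64, 256):
        print(osr, round(bits_saved(osr), 2), round(sqnr_oversampled_db(12, osr), 2))
```

Under these assumptions an 11-bit DAC at an OSR of 4 matches the SQNR of a 12-bit Nyquist DAC.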
Another advantage to building an Oversampled DAC instead of a Nyquist DAC is in the simplification of the reconstruction filter 190. In the frequency domain, the discrete-time analog signal output from the DAC will have copies of the digital input signal located around P·FDAC, where P ranges over all integers, both positive and negative. Usually, only one of these copies (often P=0, i.e. around DC) is desired and the others are undesirable “images”, and the reconstruction filter is used to attenuate all images while passing the desired signal to the analog output. The closer in frequency the desired signal is to its immediate neighbor images, the harder (or more expensive) the reconstruction filter is to design, requiring a high-order design, high-accuracy and/or high-quality components, or both. For Oversampled DAC systems, the higher the OSR the further apart the images are in frequency from the desired signal, and therefore the lower the reconstruction filter complexity (and cost) will be.
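As a worked illustration of the image spacing, assume (for this example only) a baseband signal occupying 0 to B and a sample rate of FDAC = 2·M·B, where M is the OSR. The symbol B is introduced here and is not part of the original text.

```latex
\[
\text{desired band: } 0 \le f \le B,
\qquad
\text{nearest image } (P = 1):\; F_{DAC} - B \le f \le F_{DAC} + B
\]
\[
F_{DAC} = 2 M B
\;\Rightarrow\;
\text{nearest image edge} = (2M - 1)\,B
\]
\[
M = 1:\ \text{edge at } B \ (\text{no transition band}),
\qquad
M = 4:\ \text{edge at } 7B,
\qquad
M = 16:\ \text{edge at } 31B
\]
```

The reconstruction filter therefore has a transition band from B to (2M-1)·B in which to reach its stop-band attenuation, which widens rapidly as the OSR increases.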
As digital circuit complexities have grown significantly over the last several decades, minimum transistor drawn sizes have shrunk, voltage rails have dropped, and transistor speeds have increased. These smaller transistors and lower voltage rails have, in general, resulted in poorer analog performance for any block that is integrated onto the same die as a digital circuit. This reduced analog performance has made the design of the DAC and reconstruction filter harder. At the same time, however, the increase in transistor speed has made Oversampled DAC structures easier to build, somewhat compensating for the poorer analog performance.
Quantization Noise Shaping
Equation 2 describes the SQNR of an Oversampled DAC that comes only from increasing sample rate. Beyond this, SQNR can be further improved by also doing Quantization Noise Shaping (QNS). In QNS systems, discrete-time filters are wrapped around a quantizer, and as a result the Quantization Noise can be filtered (or “Shaped”) to reduce its amplitude in the signal band.
One particular form of QNS is the Delta-Sigma Modulator, also variously known as a DSM, Δ-Σ Modulator, ΔΣM, Sigma-Delta Modulator, SDM, Σ-Δ Modulator, or ΣΔM. A DSM is shown in FIG. 2. A high-resolution input signal X is combined with the output Y of a quantizer (240) through two discrete-time filters (a Feed-Forward Filter FDSM(z) 210 and Feed-Back Filter GDSM(z) 220) and an addition block (230), and fed as input into the quantizer. The quantizer is a nonlinear element, producing an output that has reduced amplitude resolution (i.e. a smaller number of discrete signal levels) compared to its input. The input can itself be quantized (i.e. it has a number of discrete signal levels) or it can be a continuous (i.e. analog) input. In the extreme situation, the quantizer can be implemented as a simple slicer, producing a single-bit output. In order to analyze a DSM, a simplifying assumption is often made, replacing the quantizer with an additional adder and a uniform QN source (also known as the “Quantization Error” or E), which results in the small-signal model shown in FIG. 3. For clarity, FIG. 3 and all succeeding figures use the same numeric identifiers to identify components that are in common with earlier figures, and furthermore will use similar numbers to identify similar components.
FIG. 3 shows how the quantization error E is added (using the addition block 340) to the system in place of the quantizer 240 of FIG. 2. This substitution converts the nonlinear circuit of FIG. 2 into a linear model that we can analyze in Equation 3:
$$Y = \frac{F_{DSM}(z)}{1 - F_{DSM}(z)\cdot G_{DSM}(z)}\cdot X + \frac{1}{1 - F_{DSM}(z)\cdot G_{DSM}(z)}\cdot E = STF_{DSM}(z)\cdot X + NTF_{DSM}(z)\cdot E$$
$$STF_{DSM}(z) = \frac{F_{DSM}(z)}{1 - F_{DSM}(z)\cdot G_{DSM}(z)}$$
$$NTF_{DSM}(z) = \frac{1}{1 - F_{DSM}(z)\cdot G_{DSM}(z)} \qquad \text{Equation 3}$$
Equation 3 introduces two new terms, the Signal Transfer Function STFDSM(z) and the Noise Transfer Function NTFDSM(z), which are the filters that the signal input X and error input E see at the output Y respectively.
If a DSM has an STFDSM(z) whose magnitude is greater than that of NTFDSM(z) (i.e. |FDSM(z)|>1) in a certain frequency band, the SQNR due to QNS in this band will be greater than that of an Oversampled system that does not use QNS. Note that this observation holds no matter what form Equation 3 takes; while many QNS systems are low-pass and are used to suppress QN at low frequencies (close to DC, sometimes referred to as Baseband or BB), it is just as valid to build a band-pass QNS system that suppresses QN in a narrow band around another frequency, potentially a Radio Frequency (RF) or an Intermediate Frequency (IF).
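As a worked example, consider a first-order low-pass DSM. The particular filter choices below (a delaying integrator for the Feed-Forward Filter and a gain of -1 for the Feed-Back Filter) are assumptions made for illustration and are not specified in the text; they are substituted into the STF and NTF definitions with denominator 1 - FDSM(z)·GDSM(z).

```latex
\[
F_{DSM}(z) = \frac{z^{-1}}{1 - z^{-1}}, \qquad G_{DSM}(z) = -1
\;\Rightarrow\;
1 - F_{DSM}(z)\cdot G_{DSM}(z) = \frac{1}{1 - z^{-1}}
\]
\[
STF_{DSM}(z) = z^{-1}, \qquad NTF_{DSM}(z) = 1 - z^{-1}
\]
\[
\left| NTF_{DSM}\!\left(e^{j\omega}\right) \right| = 2\left|\sin\!\left(\tfrac{\omega}{2}\right)\right|
\]
```

In this example the signal passes with only a one-sample delay, while the QN is multiplied by a gain that goes to zero at DC and stays below unity for |ω| < π/3, so the QN is suppressed at low frequencies.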
In general the order of NTFDSM(z) will determine how much QN will be suppressed and therefore how much the SQNR is improved for a given OSR, and in general a higher-order NTFDSM(z) will have a better SQNR than a lower-order one. At the same time, however, the nonlinearity of the quantizer means that modulators with a high-order NTFDSM(z) often end up being unstable, which makes for a challenging design task and often results in a severely limited input signal range, especially for a DSM that uses a 1-bit quantizer.
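The effect of NTF order on in-band QN can be estimated numerically. This sketch assumes the common noise-shaping family NTF(z) = (1 - z^-1)^L, which is not specified in the text, and ignores stability and quantizer overload entirely; the function name and the midpoint-rule integration are illustrative choices.

```python
import math

def inband_noise_gain(order, osr, steps=10000):
    """In-band QN power relative to the unshaped quantizer's total QN power.

    Assumes white QN shaped by NTF(z) = (1 - z**-1)**order and a signal band
    extending from DC to pi/osr radians per sample.
    """
    band_edge = math.pi / osr
    dw = band_edge / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * dw                              # midpoint rule
        # |1 - exp(-jw)| = 2*sin(w/2), raised to 2*order for power
        total += (2.0 * math.sin(w / 2.0)) ** (2 * order) * dw
    return total / math.pi

if __name__ == "__main__":
    for order in (0, 1, 2, 3):
        gain_db = 10.0 * math.log10(inband_noise_gain(order, 16))
        print(order, round(gain_db, 1))
```

With order 0 the result reduces to 1/OSR, the plain oversampling gain; each additional order lowers the in-band noise further for the same OSR.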
An alternative QNS system, known as an Error Feedback Modulator (EFM), is shown in FIG. 4. As with a DSM, an EFM uses the quantizer 240 to produce a low-resolution output Y; however, where a DSM feeds Y back to be combined with X, an EFM uses an additional subtraction block 450 to calculate the quantization error and then feeds this back instead. As with the DSM, the EFM uses a Feed-Forward Filter FEFM(z) 410 and Feed-Back Filter GEFM(z) 420; however, the filter designs tend to be different between the two modulators.
As with the DSM, the small-signal model of an EFM is created by replacing the quantizer with E and an addition element 540, as shown in FIG. 5, and as with the DSM, this model can be solved as shown in Equation 4. As with Equation 3, any frequency band where Equation 4's STFEFM(z) is greater than NTFEFM(z) will have an increased SQNR compared with an Oversampled system that does not use QNS.
$$Y = F_{EFM}(z)\cdot X + \left(1 + F_{EFM}(z)\cdot G_{EFM}(z)\right)\cdot E = STF_{EFM}(z)\cdot X + NTF_{EFM}(z)\cdot E$$
$$STF_{EFM}(z) = F_{EFM}(z)$$
$$NTF_{EFM}(z) = 1 + F_{EFM}(z)\cdot G_{EFM}(z) \qquad \text{Equation 4}$$
Comparing Equation 3 with Equation 4, the forms that STFEFM(z) and NTFEFM(z) take are very different compared to STFDSM(z) and NTFDSM(z). In order for a low-pass DSM to pass signals around DC, FDSM(z) must take the form of an integrator with a large gain at DC, while for an EFM, FEFM(z) takes the form of a flat gain (perhaps even unity) across all frequencies. This difference, in turn, means that STFDSM(z) will not be flat across all frequencies and will tend to have signal amplitude droops close to the band edges, whereas an EFM will have a flat STFEFM(z). Finally, because FDSM(z) for a high-SQNR DSM takes the form of a high-order integrator, the DSM stability challenge is significantly harder than for an EFM.
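The EFM behaviour can be simulated in a few lines. This is a minimal sketch under stated assumptions: FEFM(z) = 1 (flat STF), GEFM(z) = -z^-1 (so that NTFEFM(z) = 1 - z^-1), a round-to-nearest-integer quantizer, and an error defined as E = Y - V where V is the quantizer input. None of these specific choices come from the text.

```python
import math

def efm_first_order(x):
    """First-order EFM, assuming F_EFM(z) = 1 and G_EFM(z) = -z**-1.

    Returns the quantized output samples and the quantization error samples.
    """
    y, e = [], []
    e_prev = 0.0
    for sample in x:
        v = sample - e_prev            # adder: input plus filtered error
        out = math.floor(v + 0.5)      # quantizer: round to nearest integer
        err = out - v                  # subtraction block: E = Y - V
        y.append(out)
        e.append(err)
        e_prev = err
    return y, e

if __name__ == "__main__":
    x = [3.3 * math.sin(2.0 * math.pi * 0.01 * n) for n in range(1000)]
    y, e = efm_first_order(x)
    # The accumulated output tracks the accumulated input to within half an LSB.
    print(abs(sum(y) - sum(x)))
```

Because Y - X = E·(1 - z^-1) in this configuration, the running sum of the output error equals the most recent E, which is why the low-frequency content of Y follows X so closely.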
QNS can be applied both to Analog/Digital Converters (ADCs) and Digital/Analog Converters, however ADCs tend to be implemented using the DSM structure, whereas DACs tend to be implemented using the EFM structure. This is because in an ADC, the QNS is a full analog system and the EFM error subtraction block 450 is an extremely challenging block to construct, whereas in a DAC, QNS is implemented using a full Digital Signal Processing (DSP) system and the error subtraction block is trivial.
In addition to applications in ADC and DAC systems, QNS also finds use in several other applications, such as in Fractional-N based frequency synthesis and in network timing jitter control. Applications in these areas and in others are beyond the scope of this disclosure; however the underlying implementations of QNS tend to be similar.
Multi-Bit Error Feed-Back Modulation DAC Systems
For Audio DAC applications, where the signal frequency content goes from DC to approximately 20 kHz, many existing DAC systems use a high-order QNS 1-bit DAC operating at sample rates between 2 and 20 MHz, resulting in OSRs of 100 to 1000 or more. However, for DSP-modulation-based communication systems, such as Digital Subscriber Line (DSL) and various RF technologies (such as WiFi, Cellular RF, WPAN, and more), the signal frequency content can be significantly higher, up to hundreds of MHz or potentially even GHz, and it is simply impractical to build DACs with such extremely high OSRs. As a result, all such systems are built using multi-bit DACs. If the application and technology support it, these DACs may be built as Oversampled DACs and take advantage of QNS to further improve their resolution and increase their SQNR while reducing the complexity of the analog portions of the designs. In addition to QNS, there are several additional techniques that a DAC system designer will likely use to minimize the effects of circuit non-idealities on the output signal's SNR; however, these are beyond the scope of this disclosure.
A practical EFM Multi-Bit DAC system is shown in FIG. 6. A high-resolution digital input X is fed through a modified EFM structure to produce a lower-resolution intermediate digital signal Y, which in turn drives the DAC 680 and Reconstruction Filter 690 to produce the analog output signal. Comparing this figure to the generic EFM of FIG. 4, the Feed-Forward Filter 410 is removed to produce a flat STFEFM(z), leaving just the DSP Feed-Back Filter 620. The Quantizer 240 and Subtractor 450 are implemented with two nonlinear operators, an “MSB” operator 640 and an “LSB” operator 650, which trivially split the output from the adder 630 into two portions: one consisting of a number of the Most Significant Bits (MSBs, often referred to as “integer” bits), which go to the output, and the second consisting of the remaining Least Significant Bits (LSBs, often referred to as “fractional” bits), which form the digital error feed-back signal. In order to keep the width of the digital busses within the EFM under control, a second set of MSB/LSB operators (641 and 651) is often used to split X into integer and fractional portions (XINT and XEFM), allowing the EFM to operate only on the fractional bits XEFM. This in turn requires a final adder 631 to combine XINT with the output from the EFM, YEFM, to produce the output Y.
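The MSB/LSB splitting described above maps directly onto integer shifts and masks. The following sketch assumes the simplest possible Feed-Back Filter 620, a single sample delay, and two's-complement integer inputs; the function name, the `frac_bits` parameter, and the choice of filter are illustrative and are not taken from FIG. 6.

```python
def multibit_efm(samples, frac_bits):
    """Integer EFM that drops `frac_bits` LSBs from each sample.

    Assumes the feed-back filter is a single delay (first-order shaping).
    Returns the low-resolution outputs and the final fractional residue.
    """
    mask = (1 << frac_bits) - 1
    residue = 0                           # delayed fractional bits (filter 620)
    out = []
    for x in samples:
        x_int = x >> frac_bits            # MSB operator 641: integer bits of X
        x_efm = x & mask                  # LSB operator 651: fractional bits of X
        v = x_efm + residue               # adder 630
        y_efm = v >> frac_bits            # MSB operator 640: carry out, 0 or 1
        residue = v & mask                # LSB operator 650: fed back next sample
        out.append(x_int + y_efm)         # adder 631: Y = XINT + YEFM
    return out, residue

if __name__ == "__main__":
    data = [(i * 37) % 1000 - 500 for i in range(20)]
    y, r = multibit_efm(data, 4)
    # No input information is lost: it is either in Y or still in the residue.
    print(sum(y) * 16 + r == sum(data))
```

In this form the EFM is just an accumulator on the fractional bits whose carry is added to the integer bits, which illustrates why the error subtraction is trivial in a digital implementation.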
Implementing QNS elements (specifically elements 620, 630, 640, and 650) for the system of FIG. 6 is described below.
Efficient DSP Filters Using Sum-of-Products Structures
One of the most important structures in most DSP filters is a so-called Sum-Of-Products or SOP structure, and the implementation of SOP structures often determines the area and power of a DSP block. The most important SOP structure variant involves constant multipliers of a number of delayed versions of an input signal, described by Equation 5:
$$Y = B_{0,0}\cdot X + B_{0,1}\cdot X\cdot z^{-1} + B_{0,2}\cdot X\cdot z^{-2} + \dots + B_{0,N}\cdot X\cdot z^{-N} = X\cdot\sum_{i=0}^{N} B_{0,i}\cdot z^{-i}$$
$$H_{0}(z) = \frac{Y}{X} = \sum_{i=0}^{N} B_{0,i}\cdot z^{-i} \qquad \text{Equation 5}$$
The filter H0(z) described in Equation 5 is a Finite Impulse Response (FIR) filter. Infinite Impulse Response (IIR) filters may also be implemented using SOP structures and are discussed further below; however, it is easier to discuss efficient implementation techniques of FIR filters first. FIGS. 7A and 7B show the two most common SOP FIR Filters. FIG. 7A is the Direct Form I (DF-I) structure, which uses a series of delay elements 710 to create multiple delayed versions of the input X, each of which is multiplied by a constant factor using the multiplication elements 720, then added together with the adder elements 730 to produce the output Y. The SOP structure is readily seen to be built from the multiplication and addition elements. FIG. 7B is the Direct Form II (DF-II) structure, which drives all multiplication elements with X (instead of delayed versions of X as in the DF-I structure) and places delay elements 712 into the output adder chain instead. DF-I and DF-II structures are mathematically identical, but create separate implementation challenges.
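The equivalence of the two structures can be demonstrated with a small behavioral model. This is a sketch only: it models the signal flow described for FIGS. 7A and 7B using Python lists in place of delay elements, and the function names are illustrative.

```python
def fir_df1(b, x):
    """DF-I as described for FIG. 7A: delay the input, then multiply and sum."""
    delay = [0] * (len(b) - 1)            # delayed copies of X (elements 710)
    out = []
    for sample in x:
        taps = [sample] + delay
        out.append(sum(c * t for c, t in zip(b, taps)))   # SOP (720, 730)
        delay = taps[:-1]                 # shift the delay line by one sample
    return out

def fir_df2(b, x):
    """DF-II as described for FIG. 7B: multiply X first, delay in the adder chain."""
    state = [0] * (len(b) - 1)            # delay elements 712 in the adder chain
    out = []
    for sample in x:
        prods = [c * sample for c in b]   # every multiplier is driven by X
        out.append(prods[0] + (state[0] if state else 0))
        # Each register stores its product plus the register further down the chain.
        state = [
            prods[i + 1] + (state[i + 1] if i + 1 < len(state) else 0)
            for i in range(len(state))
        ]
    return out

if __name__ == "__main__":
    coeffs = [3, -1, 2, 5]
    data = list(range(-5, 10))
    print(fir_df1(coeffs, data) == fir_df2(coeffs, data))
```

Integer coefficients are used so that the two outputs can be compared for exact equality; an impulse input reproduces the coefficient list in both cases.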
Efficient SOP implementations focus on using low-cost (both in area and power) implementations for the multiplication elements 720 and the addition elements 730, and in almost all cases involve merging them together into a single structure. Efficient multiplication is normally accomplished using Canonical Signed Digit (CSD) techniques, replacing arbitrary multiplication operations by a series of additions and subtractions of power-of-2 factors, which can be implemented very efficiently by shifting the bits of the input left or right. Efficient addition is normally performed by minimizing the number of carry propagate operations and by combining multiple additions together using Carry Save Arithmetic (CSA) techniques, which produce a redundant output form that requires a final Carry Propagate Adder (CPA) to produce the final output.
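CSD recoding can be sketched as follows. The code converts an integer constant into digits from the set {-1, 0, +1} with no two adjacent nonzero digits, then multiplies using only shifts, additions, and subtractions. The function names are illustrative, and the recoding rule shown is the standard non-adjacent form, used here as an assumed concrete instance of the CSD technique.

```python
def to_csd(value):
    """Return the CSD digits (-1, 0, +1) of an integer, least significant first."""
    digits = []
    while value != 0:
        if value & 1:
            # Choose +1 when value mod 4 == 1 and -1 when value mod 4 == 3,
            # which forces the next digit to be zero.
            d = 2 - (value & 3)
            value -= d
        else:
            d = 0
        digits.append(d)
        value >>= 1
    return digits

def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using shifts and add/subtract."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc

if __name__ == "__main__":
    print(to_csd(7))                      # 7 = 8 - 1
    print(csd_multiply(13, to_csd(7)))
```

For example, the constant 7 needs three additions in plain binary (4 + 2 + 1) but only one subtraction in CSD form (8 - 1).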
FIGS. 8A and 8B show DF-I and DF-II FIR structures using CSD and CSA techniques. The CSD elements 820 replace the multiplication elements 720, producing several shifted (i.e. multiplied by a power of two) versions of their inputs. In the DF-I structure of FIG. 8A, all CSD outputs are added together using a single very wide CSA structure 830, producing a redundant partial sum that is combined by the CPA 835 to produce Y. In the DF-II structure of FIG. 8B, double-wide delay elements 812 propagate the redundant outputs from multiple fewer-input (i.e. narrower) CSA structures 831, which eventually produce a redundant partial sum that is combined by the CPA 835 to produce Y. In addition to the use of CSD and CSA techniques, there are several additional area and power optimizations that come from combining common CSD/CSA sub-expressions, which further improve implementation efficiency; however, these optimizations are well beyond the scope of the present disclosure.
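The carry-save idea can be modeled with bitwise operations on integers. This sketch assumes a tree built from 3:2 compressors (full adders applied bit-wise) followed by a single ordinary addition standing in for the CPA; the function names and the reduction order are illustrative and do not describe the specific structures 830, 831, or 835.

```python
def csa_3to2(a, b, c):
    """3:2 carry-save compressor: three operands in, (sum, carry) pair out.

    No carry propagates between bit positions; each bit is an independent
    full adder, so a + b + c == sum_bits + carry_bits.
    """
    sum_bits = a ^ b ^ c
    carry_bits = ((a & b) | (a & c) | (b & c)) << 1
    return sum_bits, carry_bits

def csa_tree_sum(operands):
    """Reduce many operands to a redundant pair, then do one final addition."""
    ops = list(operands)
    while len(ops) > 2:
        a, b, c = ops.pop(), ops.pop(), ops.pop()
        ops.extend(csa_3to2(a, b, c))     # three operands become two
    return sum(ops)                       # stand-in for the final CPA

if __name__ == "__main__":
    values = [13, 7, 200, 45, 9, 1023, 64]
    print(csa_tree_sum(values) == sum(values))
```

Only the last step involves carry propagation across the full word width, which is why a CSA tree followed by one CPA is generally faster than a chain of ordinary adders.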
Which structure, DF-I or DF-II, is most efficient depends heavily on the situation in which they are being used. In the DF-I structure, the critical path from X (or from the outputs of the delay elements 710) to Y passes through the CSD elements 820, the CSA 830, and the CPA 835. The CSD elements, consisting only of wires, are extremely fast. The CSA, because it does not need to propagate carry bits, is fast with relatively shallow logic cones, however the CPA is generally either slow because of deep logic cones for the simplest adder structures or it has a large gate count (and therefore a large area and power) when using faster and more advanced adder structures. At high clock rates, meeting digital timing through the CSD/CSA/CPA combination can be quite challenging, requiring large areas and/or high power dissipation. A common solution to the problem, shown in FIG. 9, is to insert an explicit pipeline register (an additional double-wide delay element) 913 between the CSA and CPA, allowing more time for the CSD/CSA structure and the CPA to evaluate. In extreme situations, meeting timing may even require the insertion of additional pipeline registers inside the CSA and/or the CPA, effectively increasing the pipeline delay 913. These additional pipeline registers add latency through the filter, which may have to be accounted for elsewhere in the system.
As with the DF-I structure, the DF-II structure's critical path is through the CSD elements 820, the CSA elements 831, and the CPA 835. As with the DF-I structure, pipeline registers can be inserted between the final CSA and the CPA, and they can also be inserted into the CPA. Because the DF-II CSA elements 831 are narrower than the DF-I CSA element 830, they tend to be faster. However, unlike the DF-I structure, it is impossible to add pipeline registers into the CSA, because these registers would change the filter response and not just the latency. At the same time, the double-wide delay elements 812 tend to increase both area and power, and DF-II structures also tend to have fewer available optimizations due to common CSD/CSA sub-expressions than do their DF-I equivalents. Finally, DF-II structures do not lend themselves well to parallel DSP implementations, which will be discussed later. As a result of all these factors, DF-II structures tend not to be used at the highest clock rates; therefore the remainder of this disclosure will focus on DF-I structures.
IIR filters, in which the output is a function of previous outputs in addition to the inputs, can also be implemented using SOP structures. An IIR filter is described by Equation 6 and a DF-I structure that implements H0(z) from Equation 6 is shown in FIG. 10. Comparing this structure to FIG. 7A, the FIR (B0,i) portion appears in the delay elements 1010, the multiplication elements 1020, and the addition elements 1030. The IIR (A0,i) portion, which feeds the Y output back into the filter, appears as delay elements 1011 and multiplication elements 1021, and re-uses the addition elements 1030.
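$$
\begin{aligned}
Y &= B_{0,0} \cdot X + B_{0,1} \cdot X \cdot z^{-1} + B_{0,2} \cdot X \cdot z^{-2} + \dots + B_{0,N} \cdot X \cdot z^{-N} \\
  &\quad + A_{0,1} \cdot Y \cdot z^{-1} + A_{0,2} \cdot Y \cdot z^{-2} + \dots + A_{0,N} \cdot Y \cdot z^{-N} \\
  &= X \cdot \sum_{i=0}^{N} B_{0,i} \cdot z^{-i} + Y \cdot \sum_{i=1}^{N} A_{0,i} \cdot z^{-i} \\
H(z) &= \frac{Y}{X} = \frac{\sum_{i=0}^{N} B_{0,i} \cdot z^{-i}}{1 - \sum_{i=1}^{N} A_{0,i} \cdot z^{-i}}
\end{aligned}
\qquad \text{Equation 6}
$$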
Similar to FIG. 8A, FIG. 11 shows a DF-I IIR filter implemented using CSD and CSA techniques. As before, the generic multipliers 1020 and 1021 are replaced with CSD multiplication elements 1120 and 1121, whose multiple outputs are added together with the wide CSA adder 1130 and the CPA 1135.
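As a rough illustration of what a CSD multiplication element produces, the sketch below recodes an integer coefficient into canonical signed digits and lists the shifted, signed copies of the input that a CSA would then sum. The function names `to_csd` and `csd_partial_products` are hypothetical, and the coefficient is assumed to be an integer (for example a fixed-point value that has already been scaled); none of this is taken from the figures.

```python
def to_csd(value):
    """Return the canonical signed digits (-1, 0, +1) of an integer, LSB first."""
    digits = []
    while value != 0:
        if value & 1:
            # +1 when value is 1 mod 4, -1 when value is 3 mod 4; this choice
            # guarantees that the next digit is zero (no adjacent non-zero digits).
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_partial_products(x, coeff):
    """Shifted, signed copies of x whose sum equals x * coeff."""
    return [d * (x << k) for k, d in enumerate(to_csd(coeff)) if d != 0]


# 7 = 8 - 1, so only two partial products are needed instead of three.
print(to_csd(7))                    # [-1, 0, 0, 1]
print(csd_partial_products(13, 7))  # [-13, 104]
```

Because no two adjacent digits are non-zero, a coefficient with W bits never needs more than about W/2 partial products, which is what keeps the multi-operand adder following the CSD elements small.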
As with FIG. 8A, the structure in FIG. 11 has a critical timing path, originating from X or from one of the delay units 1010 or 1011, going through one of the CSD multiplication elements 1120 or 1121, through the CSA 1130 and finally the CPA 1135. In order to better meet timing, a pipeline register can be added between the CSA and CPA, as is shown in FIG. 12. Similar to FIG. 9, the pipeline register (a double-wide delay element 1213) is added between the CSA and CPA, and the feedback to the multiplication elements is taken from the newly-delayed redundant output. The single-wide delay elements 1011 are replaced with double-wide delay elements 1211 and the feedback multiplication CSD elements 1221 are modified to use the redundant feedback. Finally, in order to keep the filter transfer function H0(z) identical, one of the feedback delay elements is removed, effectively replaced by the pipeline register 1213. As with FIG. 9, this pipeline register will increase the filter latency, and this may need to be accounted for elsewhere in the system.
If the structure in FIG. 12 still has a critical timing path that is too long, the designer cannot simply increase the pipeline delay 1213, as is possible with the similar FIR structure, because this will modify the filter response. Instead, the filter may be unrolled with the recurrence relation in Equation 7, which has a starting point H0(z) from Equation 6. All succeeding forms of Hj(z) are built by “unrolling” Hj-1(z) one clock cycle in order to calculate the previous version of Y, then applying substitution and simplification rules, taking advantage of the fact that standard addition and multiplication operators are both commutative and distributive. Each unrolling operation increases the length of the FIR (Bj,i) portion by one tap and raises both the lowest and the highest power of z^−1 in the IIR (Aj,i) portion by one, therefore increasing the allowed latency in the feedback path. There are other similar transformations that also allow increased latency in the feedback loop of an IIR filter, but because their net effect is the same they are not discussed in this disclosure. There are several practical considerations for coefficient sensitivity and noise amplification that should also be considered when unrolling an IIR filter; however, these are beyond the scope of the present disclosure.
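$$
\begin{gathered}
\left. Y = X \cdot \sum_{i=0}^{N+j} B_{j,i} \cdot z^{-i} + Y \cdot \sum_{i=1+j}^{N+j} A_{j,i} \cdot z^{-i} \;\right|_{j \ge 0} \\[1ex]
\left.
\begin{aligned}
B_{j,i} &=
\begin{cases}
B_{j-1,i} & i = 0 \\
B_{j-1,i} + A_{j-1,i} \cdot B_{j-1,i} & i = 1 \dots N+j-1 \\
A_{j-1,i} \cdot B_{j-1,i} & i = N+j
\end{cases} \\[1ex]
A_{j,i} &=
\begin{cases}
A_{j-1,i} + A_{j-1,i} \cdot B_{j-1,i} & i = 1+j \\
A_{j-1,i} \cdot B_{j-1,i} & i = N+j
\end{cases}
\end{aligned}
\;\right|_{j > 0} \\[1ex]
\left. H_j(z) = \frac{\sum_{i=0}^{N+j} B_{j,i} \cdot z^{-i}}{1 - \sum_{i=1+j}^{N+j} A_{j,i} \cdot z^{-i}} \;\right|_{j \ge 0}
\end{gathered}
\qquad \text{Equation 7}
$$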
From a practical perspective, this means that if evaluating the CSA 1130 for H0(z) requires three clock cycles in order to meet timing, the filter can be unrolled twice, and H2(z) can be implemented as shown in FIG. 13. The CSA pipeline delay 1313 is increased from one cycle to three and the Feed-Forward and Feed-Back multiplication elements 1320 and 1321 are modified to use the unrolled filter values.
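The unrolling step can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it does not transcribe the piecewise coefficient rules, but performs the substitution itself, replacing the lowest-delay feedback term of Hj-1(z) with a copy of the original difference equation delayed by j samples. The list layout (`b[i]` multiplies X·z^−i, `a[i]` multiplies Y·z^−i, `a[0]` unused), the function names, and the example coefficients are all hypothetical. Exact rational arithmetic is used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction


def unroll_once(b, a, b0, a0, j):
    """Advance H_{j-1} to H_j by eliminating the feedback tap at delay j."""
    g = a[j]                       # coefficient of the term being substituted
    new_b = list(b) + [0]          # FIR portion grows by one tap
    new_a = list(a) + [0]          # highest feedback delay grows by one
    new_a[j] = 0                   # lowest feedback delay disappears
    for k, coeff in enumerate(b0):
        new_b[j + k] += g * coeff  # delayed feed-forward terms
    for k in range(1, len(a0)):
        new_a[j + k] += g * a0[k]  # delayed feedback terms
    return new_b, new_a


def unroll(b0, a0, times):
    """Apply unroll_once repeatedly, starting from H_0."""
    b, a = list(b0), list(a0)
    for j in range(1, times + 1):
        b, a = unroll_once(b, a, b0, a0, j)
    return b, a


def impulse_response(b, a, length):
    """Response to a unit impulse with zero initial state."""
    y = []
    for n in range(length):
        feed_forward = b[n] if n < len(b) else 0
        feedback = sum(a[i] * y[n - i] for i in range(1, len(a)) if n - i >= 0)
        y.append(feed_forward + feedback)
    return y


# Hypothetical second-order example (N = 2).
b0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
a0 = [Fraction(0), Fraction(1, 2), Fraction(1, 4)]
b2, a2 = unroll(b0, a0, 2)

print(a2[:3])  # feedback taps at delays 1 and 2 are now zero
print(impulse_response(b2, a2, 32) == impulse_response(b0, a0, 32))  # True
```

After two unrolls the first non-zero feedback tap sits at z^−3, which is the slack that allows a three-cycle delay in the feedback loop without changing the response.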
Efficient Parallel DSP Filters Using Sum-of-Products Structures
Even with the unrolled filter structure of FIG. 13, the maximum throughput of a DSP filter will be limited by the maximum implementable clock rate because this is a serial filter, operating on one input sample every clock cycle. For example, if a digital core can operate with a maximum feasible 1 GHz clock rate, the maximum sample rate of the filter is limited to 1 gigasample per second (i.e. 1 Gsps). In order to operate on higher sample rate signals (for example 8 Gsps), the DSP filter must be implemented using parallel techniques, i.e. it must operate on parallel blocks of data. Each data block consists of a number of successive input samples and produces a number of successive output samples, and all operations are performed in parallel. The block width (often written as “P”) determines the degree of parallelization in the system, and is determined by the required improvement in throughput. For example, in the case of 8 Gsps operation with a 1 GHz clock, we require P=8. A naïve parallel implementation of FIG. 12 with P=2 (the minimum for “parallel” operation) is shown in FIG. 14. The increased complexity of this structure requires a significant change in drawing style compared to previous figures; however, as before, similar elements have been given similar identification numbers to make comparisons easier.
The two-wide input signal block consists of the signals X and X·z^−1, while the two-wide output signal block consists of the signals Y and Y·z^−1. Delayed versions of the input signal block are produced by the delay elements 1410, which take the form z^−2, reflecting the fact that each clock edge delays the signal by the block width (P=2). Similarly-delayed versions of the output signal block (in redundant form) are produced by the delay elements 1411. Appropriately-delayed versions of the input and outputs are fed into two identical merged CSD/CSA structures 1430 and 1431. Each CSD/CSA structure combines elements 1120, 1221, and 1230 and embodies the required Sum-of-Products structure in order to implement H0(z). Finally, the redundant CSD/CSA outputs are combined by two CPA structures 1435 to produce the output signal block. As before, pipeline delay elements can be added to the CPA structures in order to improve timing; however, pipeline delay inserted into either CSD/CSA structure will change the filter response. Higher parallelization factors (P>2) can be readily constructed by interconnecting multiple CSD/CSA structures with delay units z^−P in a manner similar to FIG. 14.
The throughput of this naïve parallel structure is no better than the serial structure because the critical path goes through both CSD/CSA structures: the output from 1430 feeds directly into the A0,1 input of 1431. As a result, the critical path of FIG. 14 is twice as long as in FIG. 12, which means that its maximum clock rate is cut in half, which cancels the throughput improvement that would otherwise be achieved by using a parallel structure.
The structure in FIG. 15 avoids this limitation by replacing the CSD/CSA structure 1431 which implements H0(z) with the structure 1531 which implements H1(z), i.e. the once-unrolled version of H0(z) described by Equation 7. This removes the A0,1 input (replacing it with A1,N+1 and B1,N+1) and as a result the output from 1430 no longer feeds directly into 1531. Instead, one of the delay registers 1411 appears between 1430 and 1531 which cuts the critical path in half and approximately doubles the throughput. If the CSD/CSA structures have difficulty meeting timing, both 1430 and 1531 can be unrolled twice to produce H2(z) and H3(z), which in turn allows an increased latency of two samples in the feed-back loop, meaning that an additional pipeline delay register z^−2 can be inserted into the CSD/CSA structures. Additional pipeline registers can be inserted by unrolling the filter further.
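A behavioural sketch of the P=2 arrangement may help here. It is only a software model under stated assumptions: the "older" lane stands in for the H0(z) structure and the "newer" lane for the H1(z) structure, each loop pass represents one clock, and both lanes read only outputs produced in earlier clocks. The helper names, the coefficient layout (`b[i]` multiplies X·z^−i, `a[i]` multiplies Y·z^−i, `a[0]` unused), and the example values are hypothetical, and the H1(z) coefficients are obtained by direct substitution of a one-sample-delayed copy of the original difference equation.

```python
from fractions import Fraction


def unroll_first(b0, a0):
    """Coefficients of H1: substitute a one-sample-delayed H0 for Y*z^-1."""
    g = a0[1]
    b1 = list(b0) + [0]
    a1 = [0, 0] + list(a0[2:]) + [0]   # the tap at delay 1 is removed
    for k in range(len(b0)):
        b1[1 + k] += g * b0[k]
    for k in range(1, len(a0)):
        a1[1 + k] += g * a0[k]
    return b1, a1


def tap_sum(coeffs, history, n):
    """Sum of coeffs[i] * history[n - i]; samples outside history count as zero."""
    return sum(c * history[n - i] for i, c in enumerate(coeffs)
               if c != 0 and 0 <= n - i < len(history))


def serial_filter(b0, a0, x):
    """Reference: one output sample per clock using H0."""
    y = []
    for n in range(len(x)):
        y.append(tap_sum(b0, x, n) + tap_sum(a0, y, n))
    return y


def parallel_filter(b0, a0, x):
    """P = 2 block form; len(x) is assumed to be even."""
    b1, a1 = unroll_first(b0, a0)
    y = [0] * len(x)
    for n in range(1, len(x), 2):
        registered = y[:n - 1]   # outputs from earlier clocks only
        older = tap_sum(b0, x, n - 1) + tap_sum(a0, registered, n - 1)  # H0 lane
        newer = tap_sum(b1, x, n) + tap_sum(a1, registered, n)          # H1 lane
        y[n - 1], y[n] = older, newer
    return y


b0 = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)]
a0 = [Fraction(0), Fraction(1, 2), Fraction(1, 4)]
x = [Fraction(k % 5, 3) for k in range(16)]
print(parallel_filter(b0, a0, x) == serial_filter(b0, a0, x))  # True
```

Neither lane consumes the other lane's result from the same clock, which is the property that removes the lane-to-lane dependency from the critical path.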
In general, it is possible to transform a serial IIR filter H0(z) implemented with a structure of FIG. 12 into a parallel equivalent with a block size P by applying Equation 7 to produce P−1 unrolled versions of H0(z). The P filters H0(z) through HP-1(z) are then implemented using efficient CSD/CSA Sum-of-Products implementations and connected with delay registers (with values z^−P) in a manner similar to FIG. 15, so that the resulting parallel filter has P times the throughput of the original serial filter. For block sizes P>2, the drawing of the resulting filter will be significantly more complex; however, it is nonetheless readily derived from FIG. 15.
Error Feed-Back Modulator Implementation with Sum-of-Products Structures
Similar to serial implementations of DSP filters, serial implementations of Error Feed-Back Modulators are limited by the maximum feasible clock rate of the DSP block.
Assuming that the desired STFEFM(z) is given by Equation 8, FIG. 16 shows a practical serial implementation of an EFM using an SOP structure with CSD/CSA/CPA techniques. Note that even though STFEFM(z) takes the form of an FIR filter, Equation 8 uses the A0,i notation of an IIR filter (not the B0,i notation of an FIR filter) to emphasize the fact that the actual EFM implementation places the SOP structure in a feed-back loop.
$$
\begin{aligned}
STF_{EFM} &= 1 + G_{EFM}(z) \\
          &= 1 + \sum_{i=1}^{N} A_{0,i} \cdot z^{-i} \\
          &= 1 + z^{-1} \cdot \sum_{i=1}^{N} A_{0,i} \cdot z^{-(i-1)}
\end{aligned}
\qquad \text{Equation 8}
$$
Comparing FIG. 16 to FIG. 6, the filter G_EFM(z) 620 is implemented as an SOP structure with the delay operators 1621, the CSD multiplication operators 1622, the CSA adder 1630 (which also takes the input X_EFM as an input) and the CPA adder 1635. The output from the CPA is separated into integer and fractional portions with the MSB/LSB operators 1640 and 1650, and the integer portion is used as the output Y_EFM while the fractional portion (the error) is fed back into the filter.
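The signal flow described above can be illustrated with a minimal behavioral sketch. It is not the circuit of FIG. 16: the function name `efm_serial`, the integer fixed-point format with `frac_bits` fractional bits, and the integer feedback coefficients are assumptions chosen for illustration, and the CSD/CSA/CPA arithmetic is collapsed into ordinary integer addition.

```python
def efm_serial(x, coeffs, frac_bits):
    """Behavioral model of a serial error feedback modulator (illustrative only).

    x         -- input samples as integers scaled by 2**frac_bits (assumed format)
    coeffs    -- feedback coefficients; coeffs[i] weights the error from i+1 samples ago
    frac_bits -- number of LSBs treated as the fractional (error) portion
    """
    mask = (1 << frac_bits) - 1
    errs = [0] * len(coeffs)          # errs[i] holds the error from i+1 samples ago
    out = []
    for sample in x:
        # Sum of products: input plus weighted previous errors (CSD + CSA + CPA).
        v = sample + sum(c * e for c, e in zip(coeffs, errs))
        y = v >> frac_bits            # MSB operator: integer portion is the output
        e = v & mask                  # LSB operator: fractional portion is the error
        out.append(y)
        errs = [e] + errs[:-1]        # delay line: newest error first
    return out


# First-order example: constant input of 0.25 (4/16) with a single feedback tap.
print(efm_serial([4] * 8, [1], 4))    # [0, 0, 0, 1, 0, 0, 0, 1]
```

With a constant input of one quarter, the output is one in every fourth sample, so the average output equals the input even though each output sample carries only the integer portion.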
Notice that the critical path in this EFM goes through the CSD and CSA (both fast) and the CPA (slow). So long as the CPA is in the critical path, it will tend to limit the maximum clock speed (and therefore the throughput) of the structure. Unfortunately, the nonlinear MSB/LSB operators follow the CPA and are neither commutative nor distributive, so it is impossible to remove the CPA from the critical path as was done in FIG. 12. There has been some research on structures in which the MSB/LSB operators are applied to the redundant CSA outputs and the redundant signal is fed back (removing the CPA from the critical path and increasing throughput); however, this modification tends to reduce the performance of the EFM significantly, and furthermore circuit-level optimizations within the CSD and CSA structures can easily change the EFM's characteristics in unpredictable ways. As a result, these structures have found limited use.
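A small numeric sketch shows why the MSB/LSB operators cannot simply be moved ahead of the final addition. The 4-bit fractional format and the values `a` and `b` are assumptions for illustration; `a` and `b` stand in for the two words of a redundant (unresolved) sum whose true value is `a + b`.

```python
FRAC_BITS = 4  # assumed fixed-point format: 4 fractional bits


def msb(v, frac_bits=FRAC_BITS):
    """Integer portion of a fixed-point value."""
    return v >> frac_bits


def lsb(v, frac_bits=FRAC_BITS):
    """Fractional portion of a fixed-point value."""
    return v & ((1 << frac_bits) - 1)


a, b = 12, 9                      # 0.75 and 0.5625 in the assumed format

add_first = msb(a + b)            # resolve the sum, then split: 21 >> 4 = 1
split_first = msb(a) + msb(b)     # split each word, then add:   0 + 0  = 0

print(add_first, split_first)     # 1 0
print(lsb(a + b), lsb(a) + lsb(b))  # 5 21
```

Splitting before the carries are resolved loses the carry from the fractional bits into the integer bits, so the two orderings disagree.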
Similar to DSP filters, parallelization can be applied to EFMs in an attempt to improve their throughput. FIG. 17, drawn in a style similar to FIG. 14, shows a naïve approach to creating a 2-wide parallel (i.e. P=2) version of FIG. 16, which is also similar to the parallel quantization noise shaper used as part of U.S. Pat. No. 7,873,227. The input to the P=2 EFM consists of the 2-wide block X_EFM and X_EFM·z^−1, and the output is the 2-wide block Y_EFM and Y_EFM·z^−1. CSD elements 1622 and CSA 1630 are merged to create the two identical merged CSD/CSA structures 1730 and 1731, and the single-sample z^−1 delay elements 1621 are replaced with double-sample z^−2 delay elements 1721. As with the serial EFM structure, the parallel EFM structure requires the CPA 1635 to be in the critical path.
As with the naïve parallel IIR filter, this parallel EFM's throughput is limited by the need to evaluate multiple arithmetic blocks (in this case the CSD/CSA/CPA) in series: the output from the merged CSD/CSA operator 1730 is fed (through CPA and LSB operators) into the merged CSD/CSA structure 1731 without any delay element. As a result, the throughput of this 2-wide parallel EFM is, to a first order, limited to approximately the same throughput as an equivalent serial EFM. This is similar to the naïve parallel IIR filter; however, because the EFM requires the CPA operator to be evaluated for every output sample, the parallel EFM will tend to have lower throughput than the equivalent parallel IIR. This is at least partially offset by the fact that the EFM only requires evaluation of the LSBs in each CPA, which will in most situations be faster than evaluating the full CPA in an IIR.
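The in-block dependency can be made explicit with a behavioral sketch of a 2-wide block. The names `efm_step` and `efm_block2`, the integer fixed-point format, and the integer coefficients are assumptions for illustration; the sketch models only the data dependencies, not the merged CSD/CSA hardware.

```python
def efm_step(sample, coeffs, errs, frac_bits):
    """One sample: sum of products, then MSB/LSB split (illustrative model)."""
    v = sample + sum(c * e for c, e in zip(coeffs, errs))
    return v >> frac_bits, v & ((1 << frac_bits) - 1)


def efm_block2(x, coeffs, frac_bits):
    """Naive 2-wide block processing: two samples per (modeled) clock cycle."""
    assert len(x) % 2 == 0, "input length must be a multiple of the block size"
    errs = [0] * max(len(coeffs), 1)   # errs[i] = error from i+1 samples ago
    out = []
    for k in range(0, len(x), 2):
        # Lane 0 uses only errors stored from previous clock cycles.
        y0, e0 = efm_step(x[k], coeffs, errs, frac_bits)
        # Lane 1 needs e0 from the SAME clock cycle: no register separates the
        # two lanes, so both additions sit in series on the critical path.
        y1, e1 = efm_step(x[k + 1], coeffs, [e0] + errs[:-1], frac_bits)
        out += [y0, y1]
        errs = ([e1, e0] + errs)[:len(errs)]   # advance the delay line by two
    return out


print(efm_block2([4] * 8, [1], 4))   # [0, 0, 0, 1, 0, 0, 0, 1]
```

The block version produces the same sequence a sample-by-sample evaluation would, but each modeled clock cycle still contains two dependent evaluations, which is why the block structure alone does not raise throughput.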
As with the serial EFM, attempts to “unroll” the parallel EFM to create an equivalent structure to FIG. 15 are frustrated by the nonlinear MSB/LSB operators that force all CPAs to be evaluated in series and make it essentially impossible to compute previous error feedback values in parallel using a recurrence relation similar to Equation 7. One possible approach, similar to one used in parallel Decision Feed-Back Equalizers, involves parallel speculative pre-computation of previous error feed-back values followed by a final selection stage; however, the complexity cost of this approach grows exponentially with the block size P, limiting it to only very small block sizes. For extremely high throughput EFMs where the block size is large (for example P=8 with a clock rate of 1 GHz for 8 Gsps throughput) this becomes impractical.
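The precompute-then-select pattern can be sketched for the simplest case. The following is an assumption-laden illustration, not a structure from the cited art: it assumes a first-order loop with a unity feedback coefficient, a hypothetical function name `efm_speculative_block`, and a small number of fractional bits so that every candidate incoming error can be enumerated. It shows only the pattern itself and does not model how the cost scales with P.

```python
def efm_speculative_block(x_block, e_in, frac_bits):
    """Speculative evaluation of one block of a first-order EFM (illustrative).

    x_block   -- the P input samples of the block (integers scaled by 2**frac_bits)
    e_in      -- error carried in from the previous block
    frac_bits -- number of fractional bits; 2**frac_bits candidates per lane
    """
    n_candidates = 1 << frac_bits
    mask = n_candidates - 1

    # Stage 1 (mutually independent, so parallelizable in hardware): every lane
    # computes its (output, next error) pair for EVERY possible incoming error.
    tables = [
        [((s + e) >> frac_bits, (s + e) & mask) for e in range(n_candidates)]
        for s in x_block
    ]

    # Stage 2: selection chain. Only table lookups remain in series.
    out = []
    e = e_in
    for table in tables:
        y, e = table[e]
        out.append(y)
    return out, e


print(efm_speculative_block([4, 4, 4, 4], 0, 4))   # ([0, 0, 0, 1], 0)
```

Even in this reduced case each lane carries 2**frac_bits candidate results, which gives a sense of how quickly the hardware cost of speculation accumulates.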
Therefore, improvements in noise shaping devices to enable high throughput are desirable.