Parallel processing has been widely used in modern high-speed communication systems to improve system throughput. One of the major challenges in parallel processing is speeding up the block decision feedback loop in a parallel processing decision feedback circuit. In decision feedback circuits such as decision feedback equalizers, decision feedback decoders, and differential pulse code modulation circuits, current and past decisions are used to adjust the detection or decoding of subsequent symbols. In a parallel processing decision feedback circuit, serial input data is buffered to form a block of N data samples, which is processed by N parallel branches that each contain a decision feedback unit. Since the decision in each branch depends on the decisions of the L preceding branches, the N decision feedback units are connected in series and form a timing-critical block decision feedback loop. The latency of this block decision feedback loop limits both the operating clock frequency and the block size of a parallel processing decision feedback circuit, and these two parameters largely determine the speed and throughput of the system.
One technique used to reduce the latency of a block decision feedback loop is to pre-compute all possible decision values and then use a block decision feedback multiplexer (“MUX”) loop to select the final decision values. In the following exemplary description, the symbols to be detected by a parallel processing decision feedback circuit are assumed to be binary, i.e., chosen from an alphabet of size A=2. A typical pre-computation-based parallel processing decision feedback circuit 100 is shown in FIG. 1. It consists of N parallel branches that process N input signal values r0(k), r1(k), . . . , rN-1(k) and generate N output signal values d0(k), d1(k), . . . , dN-1(k) in parallel, where k is the discrete time index. The output decision dn(k) of the nth branch, for n=0, 1, . . . , N−1, depends on the L decisions immediately preceding it. All possible decision values of dn(k), i.e., pn0(k), pn1(k), . . . , pnM-1(k), where M=2^L, are pre-computed at pre-compute and decision stage 102, outside the block decision feedback MUX loop, in order to reduce the loop latency. The block decision feedback MUX loop consists of N series-connected M-to-1 MUXs, which form a timing-critical path, shown by the dashed lines in FIG. 1. As one of ordinary skill will appreciate, the critical path is the path with the longest computation time and zero delay (i.e., the path for branch rN-1(k), whose decision must wait for the selections of all preceding MUXs in the chain).
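The pre-computation scheme can be illustrated with a minimal Python sketch. It assumes binary symbols mapped to +1/-1, a feedback term that subtracts a weighted sum of the L previous decisions (the tap weights b are hypothetical), a simple sign slicer, and L ≥ 1; all function names are hypothetical. precompute() forms the M = 2^L candidate decisions for every branch outside the loop, and mux_loop() then performs the N chained M-to-1 selections that make up the block decision feedback MUX loop. A serial reference decoder is included so the two can be compared decision by decision.

```python
# Sketch only: binary (+1/-1) symbols and hypothetical feedback taps b[0..L-1], L >= 1.


def slicer(y: float) -> int:
    """Binary decision (alphabet size A = 2)."""
    return 1 if y >= 0 else -1


def feedback_decision(x: float, b: list[float], past: list[int]) -> int:
    """Decide one symbol; past[i] is the decision i+1 steps back (most recent first)."""
    return slicer(x - sum(bi * di for bi, di in zip(b, past)))


def serial_dfe(r: list[float], b: list[float], init: list[int]) -> list[int]:
    """Serial reference: one decision per step, each feeding the next."""
    past, out = list(init), []
    for x in r:
        d = feedback_decision(x, b, past)
        out.append(d)
        past = [d] + past[:-1]
    return out


def select_index(past: list[int]) -> int:
    """MUX select m in [0, 2**L): bit i is set when past[i] == +1."""
    return sum(1 << i for i, d in enumerate(past) if d == 1)


def precompute(block: list[float], b: list[float]) -> list[list[int]]:
    """Candidates p_n^m for every branch n and pattern m (done outside the loop)."""
    L = len(b)
    table = []
    for x in block:
        row = []
        for m in range(2 ** L):  # M = 2**L candidate decisions per branch
            past = [1 if (m >> i) & 1 else -1 for i in range(L)]
            row.append(feedback_decision(x, b, past))
        table.append(row)
    return table


def mux_loop(table: list[list[int]], past: list[int]) -> tuple[list[int], list[int]]:
    """N chained M-to-1 selections: branch n must wait for branch n-1's decision."""
    out = []
    for row in table:
        d = row[select_index(past)]  # one M-to-1 MUX per branch
        out.append(d)
        past = [d] + past[:-1]
    return out, past  # returned 'past' plays the role of the block registers


def parallel_dfe(r: list[float], b: list[float], N: int, init: list[int]) -> list[int]:
    """Process r in blocks of N samples: pre-computation, then the MUX loop."""
    if len(r) % N != 0:
        raise ValueError("len(r) must be a multiple of the block size N")
    past, out = list(init), []
    for k in range(0, len(r), N):
        block_out, past = mux_loop(precompute(r[k:k + N], b), past)
        out.extend(block_out)
    return out


if __name__ == "__main__":
    r = [0.9, -0.2, 0.4, -1.1, 0.3, 0.7, -0.5, 0.1]
    b = [0.5, -0.25]  # L = 2, so M = 4 candidates per branch
    init = [1, -1]
    print(parallel_dfe(r, b, N=4, init=init))
    print(serial_dfe(r, b, init))
```

Because every candidate is computed with exactly the same arithmetic as the serial decoder, the MUX loop only has to pick the right one; the remaining serial dependency is the chain of selections in mux_loop(), which is what limits the clock in the circuit described above.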
In a synchronous parallel processing decision feedback circuit, the final output signal values of all branches must be ready before a common clock edge at the sampling flip-flops. Each MUX's output signal value depends not only on the preceding decisions within the current block in the current clock cycle but also on the decisions of the previous block from the previous clock cycle. Together, this inter-block dependency and the intra-block dependency create an intra-clock-cycle dependency, which requires the overall latency of the block decision feedback MUX loop to be less than one clock period, i.e.,
$$N\tau_{MUX} + t_{su} + t_{h} < \frac{1}{f_{clk}} \tag{1}$$
where fclk is the operating clock frequency, τMUX is the latency of an M-to-1 MUX, tsu is the setup time of a flip-flop, and th is the hold time of a flip-flop. Without loss of generality, we ignore tsu and th. Equation (1) can then be reformulated as
$$f_{clk} < \frac{1}{N\tau_{MUX}} \tag{2}$$
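Equations (1) and (2) can be checked numerically with a small Python sketch. The MUX latency of 50 ps used below is a hypothetical value, not taken from any particular technology, and the throughput function assumes that each of the N branches delivers one decision per clock cycle. Exact fractions are used so that the strict inequality in Equation (1) is evaluated without rounding error; the function names are hypothetical.

```python
import math
from fractions import Fraction


def f_clk_bound_ghz(n: int, tau_mux_ps: int, t_su_ps: int = 0, t_h_ps: int = 0) -> Fraction:
    """Strict upper bound on f_clk from Eq. (1); t_su = t_h = 0 gives Eq. (2)."""
    loop_ps = n * tau_mux_ps + t_su_ps + t_h_ps
    return Fraction(1000, loop_ps)  # 1/ps = 1000 GHz


def throughput_bound_gsps(n: int, tau_mux_ps: int, t_su_ps: int = 0, t_h_ps: int = 0) -> Fraction:
    """Assuming one decision per branch per clock cycle: symbol rate < N * f_clk (Gsym/s)."""
    return n * f_clk_bound_ghz(n, tau_mux_ps, t_su_ps, t_h_ps)


def max_block_size(f_clk_ghz, tau_mux_ps: int, t_su_ps: int = 0, t_h_ps: int = 0) -> int:
    """Largest N with N*tau_MUX + t_su + t_h strictly below one clock period."""
    budget_ps = Fraction(1000) / Fraction(f_clk_ghz) - t_su_ps - t_h_ps
    return max(math.ceil(budget_ps / tau_mux_ps) - 1, 0)


if __name__ == "__main__":
    tau = 50  # hypothetical M-to-1 MUX latency in ps
    for n in (4, 8, 16):
        print(n, float(f_clk_bound_ghz(n, tau)), float(throughput_bound_gsps(n, tau)))
    print(max_block_size(2, tau))  # largest block size for a 2 GHz clock
```

In this idealized model (tsu = th = 0), N·fclk < 1/τMUX for every N: doubling the block size halves the achievable clock frequency, so the throughput bound of the block decision feedback MUX loop does not improve.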
As can be seen from Equation (2), the iteration bound of a block decision feedback MUX loop is approximately NτMUX. As one of ordinary skill will appreciate, the iteration bound is the maximum loop bound for a given circuit, where the loop bound of a particular feedback loop is the lower bound on the iteration period imposed by that loop. The loop bound is typically expressed as the quotient of the loop's computation time divided by the number of delays in the loop.
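The definitions of loop bound and iteration bound translate directly into a few lines of Python. In this sketch each loop is described by a hypothetical (computation time in ps, number of delays) pair; modeling the block decision feedback MUX loop as N·τMUX of computation with a single delay is an assumption made for illustration, chosen to be consistent with Equation (2), and the second loop is purely hypothetical.

```python
from fractions import Fraction


def loop_bound(computation_time_ps: int, delays: int) -> Fraction:
    """Loop bound = loop computation time / number of delays in the loop."""
    if delays <= 0:
        raise ValueError("a feedback loop must contain at least one delay")
    return Fraction(computation_time_ps, delays)


def iteration_bound(loops: list[tuple[int, int]]) -> Fraction:
    """Iteration bound = maximum loop bound over all feedback loops of the circuit."""
    return max(loop_bound(t, w) for t, w in loops)


if __name__ == "__main__":
    N, tau_mux_ps = 8, 50  # hypothetical block size and MUX latency
    loops = [
        (N * tau_mux_ps, 1),  # block MUX loop, assumed to contain one delay
        (300, 2),             # another, purely hypothetical, loop
    ]
    print(iteration_bound(loops))
```

With these numbers the MUX loop dominates, so the iteration bound equals N·τMUX, matching the estimate from Equation (2).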
In addition, both the operating clock frequency and the block size of parallel processing decision feedback circuit 100 are limited by the latency of the block decision feedback MUX loop: per Equation (2), for a given τMUX, a larger block size N forces a lower operating clock frequency.
It is, therefore, desirable to mitigate at least some of the foregoing problems by providing a parallel processing decision feedback circuit with a higher operating clock frequency and a larger block size.