In serial transmission systems operating at high bit rates over standard printed circuit boards or coaxial cables, data receivers may receive significantly distorted signals. Intersymbol interference (ISI) generated by the finite bandwidth of the channel, reflections due to impedance mismatches, and other limiting factors of the transmission media increase the probability of erroneously recognizing a received bit. For these reasons, it becomes necessary to place, at the receiver input, a circuit that recovers the signal before it is sent to a re-sampler. Otherwise, the signal arriving at the sampler could be affected by amplitude reduction (vertical eye closure) and/or by timing jitter (horizontal eye closure), as depicted in FIG. 1.
Inside the receiver, a clock and data recovery (CDR) block reconstructs the correct clock timing so that the received data are re-sampled ideally at the middle of the “eye”. However, horizontal (timing) and vertical (amplitude) degradation of the eye negatively affects the capability of the CDR to correctly recover the incoming signal (bit). In fact, as a consequence of the timing jitter and of the amplitude reduction suffered by the transmitted data pulse signal, the CDR is required to have enhanced precision in positioning the sampling clock at the center of the eye and enhanced sensitivity to small amplitude signals.
A typical serial transmission chain is shown in FIG. 2. A linear equalizer is usually placed at the input of the receiver. It implements a frequency transfer function intended to match the inverse of the transfer function H(s) of the transmission channel. If such a match is achieved, the aperture of the “eye” is improved, both horizontally and vertically.
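The matching of the equalizer to the inverse of H(s) can be illustrated with a minimal numeric sketch. It assumes, purely for illustration, a single-pole low-pass channel and an equalizer with one zero placed on the channel pole plus one higher-frequency pole that keeps the equalizer realizable. The pole frequencies and all function names are hypothetical and do not come from the figures.

```python
import math

# Hypothetical corner frequencies, chosen only for illustration.
F_POLE = 2.0e9    # channel pole (Hz)
F_LIMIT = 20.0e9  # bandwidth-limiting pole of the equalizer (Hz)


def channel_response(f_hz, f_pole_hz):
    """Assumed single-pole low-pass channel: H = 1 / (1 + j f/f_pole)."""
    return 1.0 / complex(1.0, f_hz / f_pole_hz)


def equalizer_response(f_hz, f_zero_hz, f_limit_hz):
    """High-pass equalizer: a zero at f_zero and a pole at f_limit."""
    return complex(1.0, f_hz / f_zero_hz) / complex(1.0, f_hz / f_limit_hz)


def equalized_gain(f_hz):
    """Magnitude of channel times equalizer, with the zero on the channel pole."""
    h = channel_response(f_hz, F_POLE)
    e = equalizer_response(f_hz, F_POLE, F_LIMIT)
    return abs(h * e)
```

In this sketch the channel alone attenuates a tone at F_POLE to about 0.707 of its amplitude, while the cascade of channel and equalizer stays close to unity gain until the frequency approaches F_LIMIT.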
As the operating frequency increases, such a linear equalizer, acting as a high pass filter that matches the inverse of the transfer function of the transmission channel, may be inadequate to provide sufficient compensation of the channel frequency losses. As a result, a different equalization technique, known as decision feedback equalization (DFE), is implemented between the linear equalizer and the re-sampler. The DFE may even completely replace traditional linear equalization.
FIG. 3a shows an example of the degradation of a unit pulse (namely a pulse whose amplitude is 1 Volt and whose duration is one bit unit interval (UI)) caused by the finite bandwidth and other limiting factors of the transmission channel. The resulting pulse has a lower peak value and a longer duration.
Considering the transmission channel as a linear system, a generic received signal can be seen as the superposition of individual pulses of positive or negative polarity, depending on whether positive or negative bits are transmitted. An example of a train of adjacent data pulses having the same amplitude and sign, as received, is shown in FIG. 3b.
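The superposition described above can be sketched in a few lines. The pulse response values below are hypothetical, sampled once per unit interval, and are not taken from FIG. 3a; the function name is likewise only illustrative.

```python
# Hypothetical pulse response, one sample per UI (not taken from FIG. 3a).
# Index 0 is a precursor, index 1 the cursor (C0), the rest are postcursors.
PULSE = [0.1, 0.6, 0.25, 0.1, 0.05]


def received_samples(bits, pulse=PULSE):
    """Superpose one scaled copy of the pulse per transmitted bit (+1 or -1)."""
    out = [0.0] * (len(bits) + len(pulse) - 1)
    for n, bit in enumerate(bits):
        for k, p in enumerate(pulse):
            # The pulse of bit n is shifted by n unit intervals.
            out[n + k] += bit * p
    return out
```

For a single bit the function returns the pulse itself. For two adjacent bits of the same sign, the tail of the first pulse adds to the peak of the second, which is the situation shown for a longer train in FIG. 3b.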
If we assume the receiver to be correctly sampling each bit of the received data pulse signal at its pulse peak (C0 or cursor value), the postcursor amplitude values of the pulse tails of the bits preceding the bit being sampled, and possibly the precursor amplitude values of the successive bits as received, add to the cursor value as an ISI contribution to the sampled amplitude of the incoming signal. The known DFE technique is based on the principle that, because the previous data bits are known, their contribution to the ISI on the incoming data bit may be determined and removed by subtracting a quantity equal to the ISI that they produce on the incoming data bit.
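As a worked illustration of this principle, the sketch below computes the amplitude sampled at the cursor as the sum of the cursor value and the postcursor tails of earlier bits, and then subtracts the ISI computed from the known previous bits. The cursor and postcursor values are hypothetical, and precursor ISI is left out because it depends on bits that have not yet been decided.

```python
C0 = 0.6                   # hypothetical cursor value
POST = [0.25, 0.1, 0.05]   # hypothetical postcursor values c1, c2, c3


def sampled_amplitude(bits, n):
    """Amplitude at the sampling instant of bit n: cursor plus postcursor ISI."""
    y = C0 * bits[n]
    for i, c in enumerate(POST, start=1):
        if n - i >= 0:
            y += c * bits[n - i]  # tail of the bit sent i unit intervals earlier
    return y


def dfe_correct(y, previous_bits):
    """Subtract the ISI of known previous bits; previous_bits[0] is the most recent."""
    isi = sum(c * b for c, b in zip(POST, previous_bits))
    return y - isi
```

With bits `[1, 1, 1, -1]`, the last bit is sampled at about -0.2 instead of -0.6 because the three preceding positive bits add 0.4 of ISI. Subtracting that ISI restores a value close to the full cursor amplitude of -0.6.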
A DFE uses sampled values (bn) and respective sampling errors (en) to estimate channel-dependent coefficients (ci), multiplies each coefficient by the corresponding previous bit, and subtracts the results from the incoming data bit. An exemplary implementation of a DFE using four coefficients is shown in FIG. 4. The bn value is provided by a comparator COMP1 that checks whether its input is positive or negative and produces a bn signal whose amplitude is set to +vth or −vth, according to the input signal polarity. A second comparator COMP2 compares the input and the output of the comparator COMP1 to provide error information to an estimator (LMS) of the coefficients ci.
In a practical implementation, the comparator COMP1 may not be physically present because it can be seen as part of the sampling flip-flop FF1. In this case, for the generation of the sampling error information (en), the input and the output of the flip-flop FF1 can be directly monitored by any circuit adapted to perform the logic function of the COMP2 comparator. Commonly, Least Mean Squares (LMS) algorithms are employed to estimate the coefficients ci and find the set of coefficients ci that minimizes or reduces the mean square error en between the value of the expected bits (+/− a certain threshold Vth) and the received bits.
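The adaptation loop can be sketched behaviorally as follows. This is a minimal sketch under stated assumptions, not the circuit of FIG. 4: decisions are represented as +1 or -1, the error is taken as the corrected input minus Vth times the decision, and each coefficient is updated by the step size times the error times the corresponding previous decision. The channel model, the step size `mu`, and all function names are hypothetical.

```python
import random


def channel(bits, pulse):
    """Hypothetical channel: pulse[0] is the cursor, pulse[1:] the postcursors."""
    return [
        sum(p * bits[n - k] for k, p in enumerate(pulse) if n - k >= 0)
        for n in range(len(bits))
    ]


def lms_dfe(received, n_taps, vth, mu=0.01):
    """Decision feedback equalizer whose coefficients are adapted by LMS."""
    c = [0.0] * n_taps      # estimated coefficients c1..cN
    past = [0.0] * n_taps   # previous decisions, most recent first
    decisions = []
    for y in received:
        # Subtract the estimated ISI of the previous decisions.
        z = y - sum(ci * bi for ci, bi in zip(c, past))
        b = 1.0 if z >= 0 else -1.0   # role of COMP1: polarity decision
        e = z - vth * b               # role of COMP2: sampling error
        # LMS update: move each coefficient along error times previous bit.
        c = [ci + mu * e * bi for ci, bi in zip(c, past)]
        past = [b] + past[:-1]
        decisions.append(b)
    return c, decisions


def demo(seed=0, n_bits=5000):
    """Run the sketch on random data through a two-postcursor channel."""
    rng = random.Random(seed)
    bits = [rng.choice([-1.0, 1.0]) for _ in range(n_bits)]
    rx = channel(bits, [1.0, 0.3, 0.15])
    coeffs, decisions = lms_dfe(rx, n_taps=2, vth=1.0)
    return bits, coeffs, decisions
```

In this noiseless example the estimated coefficients approach the postcursor values 0.3 and 0.15 of the assumed channel. A practical receiver would typically use a simplified update (for example, one based only on the signs of the error and data), which is not modeled here.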
Whether a single estimated coefficient is used (simplest implementation with a single correction tap) or several coefficients are used (more refined implementation with several correction taps) for enhanced ISI removal, correct behavior of a DFE circuit in terms of data recovery requires that the first or only correction, by the first (c1) of the estimated coefficients, be effected before the next bit is sampled. To satisfy this requirement, the DFE feedback path for the first or only estimated coefficient c1 cannot have a signal propagation delay greater than the bit period (Tbit), and usually the propagation delay is smaller than the bit period.
On the other hand, to improve the capability of the Clock Recovery to correctly recover the incoming signal phase, the Clock Recovery needs to receive, as its input, the same signal equalized by the DFE corrections. In the case of a Clock Recovery based on the analysis of the data transitions (as is the case, for example, with bang-bang CDRs), the DFE feedback needs to be applied before the transition of the data, which imposes the constraint that the maximum delay for the application of the DFE correction be less than Tbit/2. Reference is directed to the article “NRZ Timing Recovery Technique for Band Limited Channels,” by Bang-Sup Song, IEEE Journal of Solid-State Circuits, Vol. 32, No. 4, April 1997.
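The two timing constraints can be made concrete with a short calculation. The bit rate and feedback delay used in the usage note are hypothetical numbers chosen for illustration, and the function name is not part of any described circuit.

```python
def dfe_timing_budget(bit_rate_bps, feedback_delay_s):
    """Check a first-tap feedback delay against the Tbit and Tbit/2 limits."""
    t_bit = 1.0 / bit_rate_bps
    return {
        "t_bit": t_bit,
        # Data recovery: the correction must arrive before the next sample.
        "meets_data_recovery": feedback_delay_s < t_bit,
        # Transition-based clock recovery: before the next data transition.
        "meets_clock_recovery": feedback_delay_s < t_bit / 2.0,
    }
```

For example, at an assumed 6.25 Gb/s the bit period is 160 ps, so a hypothetical feedback delay of 100 ps satisfies the data recovery constraint (less than 160 ps) but violates the clock recovery constraint (less than 80 ps).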
Receivers often use a half-rate clock, meaning that the frequency of the clock, which is generally recovered from the incoming data bitstream, is half the bit rate of the transmitted data pulse signal, and both the rising and falling edges are used to sample the incoming data. On the other hand, because the DFE corrects the incoming bit for the ISI of a single previous bit or of several previous bits, a DFE implementation such as that shown in FIG. 4 will necessarily be a full-rate system.
The DFE can be adapted to a half-rate clocking scheme of the receiver by using a multiplexer that selects which of the two samples (the data sampled by the rising clock edge and the one sampled by the falling clock edge) has to be alternately used as the previous bit (pre-cursor bit) to be multiplied by the ci coefficient before being subtracted from the input bit (cursor bit), as with the exemplary circuit of FIG. 5. The flip-flops FF1 and FF3 provide a sampled value of their input at the rising edge of the clock, while the flip-flops FF2 and FF4 provide a sampled value of their input at the falling edge of the clock. The multiplexers select their input 1 on the high level of the clock and their input 2 on the low level of the clock.
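The role of the multiplexer can be sketched behaviorally for a single correction tap. The sketch assumes that even-numbered bits are sampled on the rising edge (FF1) and odd-numbered bits on the falling edge (FF2), and that input 1 of the multiplexer is the FF1 output and input 2 is the FF2 output. This mapping and the function names are assumptions made for illustration and are not read from FIG. 5.

```python
def full_rate_dfe(received, c1):
    """Reference single-tap DFE: one sampler holds the previous decision."""
    prev = 0.0
    decisions = []
    for y in received:
        b = 1.0 if y - c1 * prev >= 0 else -1.0
        decisions.append(b)
        prev = b
    return decisions


def half_rate_dfe(received, c1):
    """Single-tap DFE with two samplers and a multiplexer on the feedback."""
    ff1 = 0.0  # output of the rising-edge sampler
    ff2 = 0.0  # output of the falling-edge sampler
    decisions = []
    for n, y in enumerate(received):
        rising = (n % 2 == 0)
        # Ahead of a rising edge the clock is low, so the multiplexer passes
        # input 2 (FF2); ahead of a falling edge it is high and passes input 1.
        prev = ff2 if rising else ff1
        b = 1.0 if y - c1 * prev >= 0 else -1.0
        if rising:
            ff1 = b
        else:
            ff2 = b
        decisions.append(b)
    return decisions
```

Because the multiplexer always passes the most recently sampled bit, the half-rate arrangement produces the same decisions as the full-rate reference in this idealized, delay-free model.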
In this description, the clock ck of the multiplexers has been depicted as being the same clock of the flip-flops. However, it is possible to have some difference between the clock of the multiplexers and the clock of the flip-flops, without changing the basic concept.
As previously stated, the DFE corrections have a settling time requirement of Tbit for the data recovery and of Tbit/2 for the clock recovery. Considering FF1 and FF3 as providing sampled values of their inputs at every rising edge of the clock in FIG. 6, the generation of signC1 and signC3 is more critical than the generation of signC2 and signC4, because the C1 and C3 multiplexers switch during the commutation of FFout1 and FFout3, while the C2 and C4 multiplexers switch while FFout2 and FFout4 are stable. Therefore, signC1 and signC3 are affected by the FF1 and FF3 clock to Q delay, while signC2 and signC4 are only affected by the multiplexer delay. Considering the operation of FF2 and FF4, which provide sampled values of their inputs at the falling edge of the clock in FIG. 6, the evaluation of signC1 and signC3 is still critical because the multiplexers that generate them switch during the commutation of FFout2 and FFout4.
Although signC1 and signC3 appear to have the same timing requirements, in practical implementations signC1 usually represents the bottleneck of the system. This is because the signC3 multiplexer inputs FFout3 and FFout4 have already been converted into high swing digital voltage levels by two samplers in series (FF1 and FF3, and FF2 and FF4, respectively). In contrast, the signC1 multiplexer inputs FFout1 and FFout2 each come from a single flip-flop (FF1 and FF2, respectively) that, depending on the data rate, the channel, and the transmission amplitude, may sample an analog signal of small amplitude. In fact, the COMP1 squarer may commonly be omitted to minimize or reduce the feedback delay or, even if it is present, its capability to convert its input into high swing digital voltage levels may be too low. The FF1 and FF2 clock to Q delays can therefore cause the total delay to fail the condition of being less than Tbit/2.
Prior art implementations address the problem of the DFE critical timing path by using a sense amplifier based FF1 and FF2 sampler, as described in the article “A 6.25-Gb/s Binary Transceiver in 0.13-μm CMOS for Serial Data Transmission Across High Loss Legacy Backplane Channels”, by Payne et al., IEEE Journal of Solid-State Circuits, Vol. 40, No. 12, December 2005. The problem of the DFE critical timing path may also be addressed by implementing the first correction via the loop unroll technique. Reference is directed to the article “A 6.4-Gb/s CMOS SerDes Core With Feed-Forward and Decision-Feedback Equalization,” by Beukema et al., IEEE Journal of Solid-State Circuits, Vol. 40, No. 12, December 2005. The first implementation, which boosts the sensitivity of the first flip-flop, is intrinsically bandwidth limited because the ratio of bandwidth to sensitivity is limited by the technology. The second implementation requires a hardware overhead, increasing the area and power consumption of the stage.