1. Field of the Disclosure
Embodiments of the present disclosure relate to an analog baud-rate Mueller-Muller algorithm based clock and data recovery (CDR).
2. Description of the Related Art
Integrated circuits (ICs) may need to communicate with other ICs or modules in any given system design. The ever-increasing processing and computation speed of ICs has created a growing demand for high-bandwidth input and output (IO) on these ICs, which is achieved by increasing the signaling rate of each IO pin as well as the number of IO pins on the chip. Today, internal circuits can run at tens of Gbps, but the performance of the link is limited by the characteristics of the channel, namely, the electrical path from one IC die to the other. In order to achieve desired data rates over existing channels, many multi-Gbps links use complex signal processing to overcome the channel limitations. One way to improve IO performance is to change the signaling method and the channel media by using high-speed serializer/deserializer (SERDES) links. These circuits convert data between serial and parallel interfaces in each direction.
Implementations of SERDES are sometimes combined with implementations of encoding/decoding circuits. The purpose of encoding/decoding is typically to place at least statistical bounds on the rate of signal transitions to allow for easier clock recovery in the receiver, to provide framing, and to provide DC balance. A common coding scheme used with SERDES is 8B/10B encoding. This supports DC-balance, provides framing, and guarantees transitions. The guaranteed transitions allow a receiver to extract the embedded clock. The control codes allow framing, typically on the start of a packet.
The parallel-side interface of an 8B/10B SERDES may have 1 clock line, 1 control line, and 8 data lines; alternatively, the clock and control lines may be integrated into the data lines. Another common coding scheme used with SERDES is 64B/66B encoding. This scheme statistically delivers DC balance and transitions. Framing is delivered through the deterministic transitions of the added framing bits. SERDES can also be implemented in combination with pseudo-random binary sequence (PRBS) data scrambling. A number of other coding schemes that provide the necessary transitions for clock extraction could also be used to implement SERDES.
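As an illustration of the kind of sequence used for PRBS scrambling and link testing, the following minimal sketch generates a PRBS7 pattern with a 7-bit linear feedback shift register. The choice of PRBS7, the polynomial x^7 + x^6 + 1, and the function name `prbs7` are assumptions made for this example only; the disclosure does not specify a particular PRBS.

```python
def prbs7(seed=0x7F, nbits=127):
    """Return nbits of a PRBS7 pattern (assumed polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the register stays at zero")
    bits = []
    for _ in range(nbits):
        # Feedback is the XOR of the two oldest bits in the 7-bit register.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits


one_period = prbs7(nbits=127)
ones = sum(one_period)          # number of 1s in a single period
zeros = len(one_period) - ones  # number of 0s in a single period
```

Over one 127-bit period the counts of ones and zeros differ by exactly one, which is why such a sequence is close to DC-balanced and rich in transitions in a statistical rather than guaranteed sense.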
A clock and data recovery (CDR) circuit is used to adaptively align the sampling clock at the receiver with the incoming data and is critical for a high-speed serializer/deserializer (SERDES) link. Working with a received signal can pose the issues of clock recovery and optimum phase selection. Clock recovery is the process of synchronizing a receiver clock with the transmitter clock used when the signal was generated. Phase selection is the process of selecting a phase with respect to the receiver clock at which to sample the received signal. Such a phase selection is acceptable when it provides a good signal-to-noise ratio (SNR) for accurate data recovery from the received signal. The process of clock recovery, and sometimes phase selection as well, is called clock and data recovery. This is a useful capability because it frees a designer from matching trace lengths and delays across all parallel data streams. The concerns it raises are the area, power, and latency of the implemented CDR.
Many clock and data recovery schemes today use a phase-locked loop (PLL). This method is costly in both area and power because PLLs are known to consume a relatively large amount of each, depending on the application. Another method of CDR is 2×-oversampling of the data. When data rates are lower (less than 5 or 6 Gbps), a popular choice of CDR is bang-bang CDR, which relies on 2×-oversampling of the incoming data. As the data rate goes up to 10 Gbps and above, it is no longer practical to oversample at the required timing accuracy.
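For illustration, the early/late decision made by a 2×-oversampled bang-bang CDR can be sketched as follows. This is a minimal model of the classic Alexander-style phase detector, which is assumed here as a representative example; the function names and the +1/−1 sign convention are hypothetical and not taken from the disclosure.

```python
def bang_bang_vote(prev_data, edge, curr_data):
    """Return +1 (clock early), -1 (clock late), or 0 (no information).

    prev_data and curr_data are consecutive data samples; edge is the extra
    sample taken halfway between them (the second sample per bit).
    """
    if prev_data == curr_data:
        return 0  # no data transition, so no timing information
    # The edge sample still matching the old bit means it was taken before
    # the transition, so the sampling clock is early.
    return 1 if edge == prev_data else -1


def accumulate_votes(data_samples, edge_samples):
    """Sum the votes; edge_samples[i] lies between data_samples[i] and [i + 1]."""
    return sum(
        bang_bang_vote(data_samples[i], edge_samples[i], data_samples[i + 1])
        for i in range(len(edge_samples))
    )
```

Note that every bit requires two samples, one at the data center and one at the edge, which is the oversampling cost that becomes impractical at higher data rates.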
Thus, a baud-rate CDR (no oversampling) is a preferred method. Without oversampling, some sort of manipulation (addition, subtraction, or comparison) of adjacent incoming data samples is required to extract timing information. One such algorithm for timing extraction is the Mueller-Muller (MM) algorithm, which was first described in a journal article in 1976. The MM algorithm is a method (also called a timing error detector) for generating a timing error signal H(−1). The MM algorithm requires only one sample per symbol. It has been implemented in some applications for long-distance telecom. However, those applications have at most 10-12 lanes per ASIC, so the power, area, and latency of the receiver are not major considerations, and such applications can afford a fairly sophisticated, computation-heavy MM-based CDR. For a modern CPU, the number of high-speed lanes is on the order of hundreds, and thus the power, area, and latency of the receiver itself are critical to overall CPU performance. Therefore, it is important to balance performance, complexity, power usage, and area in the baud-rate CDR design.
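To make the idea of extracting timing from adjacent baud-rate samples concrete, the following sketch evaluates the commonly cited form of the MM timing error, e_k = y_k·d_(k−1) − y_(k−1)·d_k, where y_k is the sample and d_k the decision for symbol k. It is a behavioral illustration only, not the circuit of the present disclosure: the triangular pulse response, the ±0.25 UI sampling offsets, the random ±1 data, and all function names are assumptions chosen for the example.

```python
import random


def pulse(t, half_width=1.5):
    """Hypothetical triangular pulse response; t is in unit intervals (UI)."""
    return max(0.0, 1.0 - abs(t) / half_width)


def sample_channel(symbols, tau):
    """Take one sample per symbol, tau UI away from each pulse peak."""
    n = len(symbols)
    samples = []
    for k in range(n):
        y = 0.0
        for m in range(-2, 3):  # neighbours whose pulses can overlap symbol k
            j = k - m
            if 0 <= j < n:
                y += symbols[j] * pulse(m + tau)
        samples.append(y)
    return samples


def mm_timing_error(samples):
    """Average e_k = y_k * d_(k-1) - y_(k-1) * d_k over the whole record."""
    decisions = [1 if y >= 0 else -1 for y in samples]
    errors = [
        samples[k] * decisions[k - 1] - samples[k - 1] * decisions[k]
        for k in range(1, len(samples))
    ]
    return sum(errors) / len(errors)


rng = random.Random(1)  # fixed seed so the run is repeatable
symbols = [rng.choice((-1, 1)) for _ in range(2000)]
early = mm_timing_error(sample_channel(symbols, -0.25))    # sampling too early
centered = mm_timing_error(sample_channel(symbols, 0.0))   # sampling at the peak
late = mm_timing_error(sample_channel(symbols, +0.25))     # sampling too late
```

Under these assumptions the averaged error is positive when sampling early, negative when sampling late, and close to zero at the pulse peak, so its sign can steer the sampling phase using only one sample per symbol.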
A digital MM-based CDR uses two front-end 4.5-bit ADCs to digitize the incoming data signal, then applies the MM algorithm to the digitized data to extract timing information. The major disadvantage of this approach is that it requires two very fast ADC front ends (6.25 Gbps), which consume a large amount of power and area. In addition, the accuracy of timing extraction is limited by ADC quantization. Finally, this implementation requires the use of a TX pre-cursor or an RX FFE, which adds latency to the serial link.