High-speed serial links are used to transmit data from chip to chip over wired media, such as a printed circuit board or a backplane. The general link model is displayed in FIG. 1. A transmitter 1 sends data over a data channel 2 to a receiver 3. Transmitter 1 and receiver 3 are integrated on-chip. The data channel 2 can be a combination of printed circuit board, connectors, backplane wiring and cable. In general, the receiver 3 has to perform clock recovery to account for variations in the symbol timing.
The aggregate data rates in future chip-to-chip communication will soon reach several Tbit/s in some applications. Since serial links are analog in nature, the ordinary scaling in power and area seen for digital logic does not apply. Hence, the area and power consumption of the chip input/output interface are increasing relative to those of the logic. On the receiver side, most power is spent on clock generation. Consequently, it is a challenge to find a serial link receiver architecture which minimizes area and power consumption.
In high-speed links, sub-rate receiver architectures are frequently used. This allows clocking the receiver at an integer fraction 1/S of the data rate, thereby relaxing the requirements on the sampling latches and the clock distribution circuitry. Thus, sub-rate receivers allow exploring the speed limits of a given technology and reducing the power consumption.
Typical values for S range between 2 and 8. FIG. 2 displays the required sample clocks for a quarter-rate (S=4) receiver, where four data bits D0 to D3 are sampled in one clock cycle. In order to also extract the timing information, the incoming data signal has to be over-sampled with an over-sampling factor M, which is typically either 2 or 3; in FIG. 2, M=2. Hence, the clock generator has to supply a total of S×M equidistantly spaced clock phases, i.e. a quarter-rate receiver with an over-sampling factor M=2 generates S×M=8 clock phases φ1 to φ8, as depicted in FIG. 2. Additionally, means have to be provided to shift these clocks φ1 to φ8 in phase by a controlled amount in order to align them to the phase of the incoming data signal. This phase shift should not be limited to a finite phase range, so that plesiochronous operation is possible. Plesiochronous operation is operation that is almost, but not quite, synchronous: the clocks at the two ends of the link have nominally the same frequency but may differ by a small offset, so the required phase shift keeps growing over time.
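The phase spacing described above can be checked with a short calculation. The following Python sketch is illustrative only; the function names are hypothetical and not taken from the source. It assumes that one sub-rate clock cycle spans S unit intervals (UI), so that S×M equidistant phases are spaced 1/M UI apart.

```python
from fractions import Fraction


def sample_phases_ui(S, M):
    """Offsets of the S*M sample clocks within one sub-rate clock cycle, in UI.

    One clock cycle spans S unit intervals, so S*M equidistant phases
    are spaced 1/M UI apart.
    """
    return [Fraction(k, M) for k in range(S * M)]


def sample_phases_deg(S, M):
    """The same offsets expressed in degrees of the sub-rate clock cycle."""
    n = S * M
    return [Fraction(360 * k, n) for k in range(n)]


if __name__ == "__main__":
    # Quarter-rate receiver with 2x over-sampling, as in FIG. 2.
    ui = sample_phases_ui(4, 2)
    deg = sample_phases_deg(4, 2)
    for i, (p_ui, p_deg) in enumerate(zip(ui, deg), start=1):
        print(f"phi{i}: {float(p_ui):.2f} UI, {float(p_deg):.1f} deg")
```

For S=4 and M=2 this yields eight phases spaced half a unit interval apart, i.e. 45 degrees of the quarter-rate clock cycle.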
In a dual loop architecture for clock and data recovery (CDR), which is described in S. Sidiropoulos, M. Horowitz, “A Semi-Digital Dual Delay-Locked Loop,” IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1683-1692, November 1997, the clock phases are generated from a clean local reference clock. A second loop, functioning as a digital delay locked loop, then locks the sampling phases to the random input data.
In J. Kim, M. Horowitz, “Adaptive Supply Serial Links with Sub-1-V Operation and Per-Pin Clock Recovery,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. 1403-1413, November 2002, a sub-rate dual-loop clock and data recovery circuit is described. FIG. 3 shows the CDR circuit for S=4 and an over-sampling factor M=2. A reference clock φref enters a phase-locked loop (PLL) 12, which generates at its output a number k of clock phases φ1 to φk. These clock phases φ1 to φk are then fed to a phase rotator 7, which allows the phase to be set by a digital value given by a digital control signal ctrl. The clock coming out of the phase rotator 7 enters a phase generator 8, which provides S×M=8 equidistantly spaced clocks to be used in S×M=8 sampling latches 9. The resulting samples (four data bits and four edge bits) then enter a digital loop filter 10, which in turn controls the phase rotator 7. This forms a digital delay locked loop (DLL) 11, which tracks the phase and small frequency deviations of the input data.
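The tracking behaviour of such a digital loop can be illustrated with a strongly idealized behavioural model. The Python sketch below is not the circuit of the cited paper: the early/late decision, the up/down vote counter used as loop filter, the rotator resolution and all names and parameter values are assumptions made for illustration only. It also assumes that every bit yields a timing decision, which real data with limited transition density would not.

```python
def track_phase(ppm_offset, n_bits, steps_per_ui=32, filter_threshold=4):
    """Idealized bang-bang tracking of a drifting data phase.

    ppm_offset       -- assumed frequency offset of the data, in ppm
    n_bits           -- number of bits to simulate
    steps_per_ui     -- assumed rotator resolution (steps per unit interval)
    filter_threshold -- assumed vote count needed to move the rotator

    Returns (max_abs_error_ui, total_steps, rotator_code).
    """
    drift = ppm_offset * 1e-6   # phase drift per bit, in UI
    data_phase = 0.0            # phase of the incoming data, in UI (unbounded)
    total_steps = 0             # unwrapped count of rotator steps
    votes = 0                   # state of the up/down vote counter
    max_err = 0.0

    for _ in range(n_bits):
        data_phase += drift
        err = data_phase - total_steps / steps_per_ui
        max_err = max(max_err, abs(err))

        # Idealized early/late decision: only the sign of the error is used.
        votes += 1 if err > 0 else -1

        # Loop filter: move the rotator by one step once enough votes agree.
        if votes >= filter_threshold:
            total_steps += 1
            votes = 0
        elif votes <= -filter_threshold:
            total_steps -= 1
            votes = 0

    # The rotator setting wraps around, so the phase range is unlimited.
    rotator_code = total_steps % steps_per_ui
    return max_err, total_steps, rotator_code


if __name__ == "__main__":
    max_err, steps, code = track_phase(ppm_offset=200, n_bits=100_000)
    print(f"max error {max_err:.4f} UI, {steps} steps, final code {code}")
```

In this model the rotator setting wraps around many times while the phase error stays on the order of one rotator step, which is the property that makes a rotator-based loop suitable for tracking small frequency deviations.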
In the embodiment according to FIG. 3, the phase shift is achieved by inserting the phase rotator 7 in the digital DLL 11. The phase rotator 7, however, increases the loop delay, suffers from non-linearity, and requires careful control of the signal slew rates.
The circuit described in K.-L. Wong et al., “A 27-mW 3.6 Gb/s I/O Transceiver,” IEEE J. Solid-State Circuits, vol. 39, no. 4, pp. 602-612, April 2004, achieves a simultaneous shift in the clock phases by introducing a programmable imbalance in the charge pump currents. This has the disadvantage of limiting the adjustable delay range to a limited number of unit intervals, which rules out plesiochronous operation.
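Why a bounded delay range rules out plesiochronous operation can be seen from a simple estimate. The Python sketch below uses hypothetical numbers (a 2 UI range and a 100 ppm frequency offset) that do not come from the cited paper; it only assumes that a frequency offset makes the data phase drift by a constant fraction of a unit interval per bit.

```python
from fractions import Fraction


def bits_until_range_exhausted(range_ui, ppm_offset):
    """Number of bits after which a bounded phase range is used up.

    range_ui   -- assumed adjustable delay range, in unit intervals
    ppm_offset -- assumed frequency offset between the two clocks, in ppm

    Returns None for a zero offset, since the range is then never exhausted.
    """
    if ppm_offset == 0:
        return None
    # The phase drifts by ppm_offset * 1e-6 UI per bit.
    return Fraction(range_ui) * 1_000_000 / abs(ppm_offset)


if __name__ == "__main__":
    # Hypothetical example: 2 UI of range, 100 ppm offset.
    print(bits_until_range_exhausted(2, 100))
```

Under these assumptions the range is consumed after a finite number of bits, however large the range is made, whereas a phase shift that wraps around has no such limit.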