1. Technical Field
The present invention relates to a system and method for automatically correcting duty cycle, quadrature relationship, and amplitude relationship between two reference clocks in a closed loop phase rotator sub-system, which may be embodied on an integrated circuit.
2. Description of the Related Art
A common clock generation architecture in high data-rate serializer-deserializer (SERDES) input/output (I/O) cores employs a single low-noise phase locked loop (PLL). The PLL signal is distributed to one or more of a transmitter (Tx), a receiver (Rx), or a transceiver subsystem which includes “phase rotators” that offset the frequency and phase of the fixed-frequency PLL clock so that the local receiver clock can be phase-locked to an incoming data stream.
Referring to FIG. 1, a block diagram of a clock-generation subsystem is shown. In this example, a PLL 10 output clock is divided by two using a quadrature divider 12. The quadrature divider 12 produces output “In-phase” (I) 13 and “Quadrature-phase” (Q) 14 clocks, each divided by two from the PLL clock frequency and shifted from each other by 90 degrees. These quadrature clocks 13 and 14 are distributed to one or more data I/O cores 15. Each I/O core normally contains a local clock buffer 16 which may be used to improve the quadrature accuracy and/or duty cycle of the I and Q clocks 13 and 14, respectively, before they are applied to local phase rotators 17 and 18.
This local clock buffer is implemented using open-loop methods, meaning that the clock signals pass through the buffers 16 without any feedback correction applied to improve the output duty cycle and/or quadrature relationship of the clocks. Other devices may include latches 21.
The clocks 13 and 14 are shown being distributed to a number of data receivers 15 for illustrative purposes. Each receiver 15 can have an independent input data stream (e.g., Data Input 1) which must be frequency and phase locked by the local clock-and-data recovery (CDR) unit 22. In a typical application, to achieve frequency and phase lock, the CDR 22 updates the phase of the local edge phase rotator 17 such that an edge crossing from the output of edge rotator 17 is coincident with the edge crossing of the incoming data stream (Data Input 1). Data rotator 18 is programmed to a phase offset suitable for data detection, normally ½ bit interval, shifted from the edge rotator 17. Latches 21 are used to capture the data and edge information which is processed by CDR unit 22. It is understood that the basic data and edge detection receiver is sufficient for illustration purposes of the phase rotator based clock generation system.
The phase rotators 17, 18 are capable of generating output clocks 19 and 8 with phase varying from 0 to 360 degrees by mixing and combining the I and Q input clocks with varying weights, as shown in FIG. 2.
Referring to FIG. 2, the phase rotator 17, 18 works by summing a weighted combination of input quadrature clocks 23 and 24 to create an output clock 30 with programmable phase. As an example, a digital-to-analog converter (DAC) 26 outputs weights 27 and 28. When the I clock DAC (IDAC) 27 weight, which is input to an I clock mixer 25, is set to 1 (meaning 100% selected) and the Q clock DAC (QDAC) 28 weight is set to 0 (meaning 0% selected), the phase rotator 17, 18 outputs the I clock, which by definition has a 0 degree phase. Similarly, when the Q clock DAC 28 weight is set to 1 and the I clock DAC 27 weight is set to 0, the phase rotator outputs the Q clock, which has a 90 degree phase if and only if the input Q clock 24 is in perfect quadrature with the I clock. Intermediate phases can be achieved by weighting the I and Q clocks with corresponding multipliers which achieve the desired output phase, as shown in table 33 in FIG. 2.
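The weighted summation described above can be sketched numerically. The following Python fragment is a minimal illustration only: it assumes ideal sinusoidal clocks represented as phasors, and the function name rotator_output and its iq_error_deg parameter are hypothetical. It does not reproduce the weight values of table 33.

```python
import math

def rotator_output(i_weight, q_weight, iq_error_deg=0.0):
    """Phase (degrees) and amplitude of i_weight*I + q_weight*Q.

    Assumption: I and Q are ideal unit-amplitude sinusoids. I sits at
    0 degrees and Q at (90 + iq_error_deg) degrees.
    """
    q_phase = math.radians(90.0 + iq_error_deg)
    # Phasor sum of the two weighted clocks.
    re = i_weight + q_weight * math.cos(q_phase)
    im = q_weight * math.sin(q_phase)
    phase_deg = math.degrees(math.atan2(im, re)) % 360.0
    return phase_deg, math.hypot(re, im)

# I weight 1, Q weight 0 selects the I clock (0 degrees).
print(rotator_output(1.0, 0.0))
# I weight 0, Q weight 1 selects the Q clock; the result is 90 degrees
# only when the quadrature error is zero.
print(rotator_output(0.0, 1.0))
print(rotator_output(0.0, 1.0, iq_error_deg=5.0))
```

In this sketch, any quadrature error on the Q input appears directly as a phase error on the rotator output whenever the Q weight is nonzero.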
A common implementation of a phase rotator in a serializer/deserializer core using ½ rate clocking (meaning the I and Q clock frequencies are ½ of the received data rate) employs a total of 64 phase steps from 0 to 360 degrees, achieving a phase resolution of 5.625 degrees. Because one full clock period spans two received data bits at ½ rate, such a design provides a time resolution of 32 steps across one received data bit duration. Although the detailed circuit implementation of the phase rotator components (DACs 27 and 28, mixers 25, and summer 29) can use many different techniques, every phase rotator implementation will be limited in phase accuracy by the fundamental accuracy of the I and Q input clocks 23 and 24.
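The resolution figures above follow from simple arithmetic, which the following Python fragment restates. The helper step_time_ps and the 10 Gb/s example rate are illustrative assumptions, not part of the described design.

```python
PHASE_STEPS = 64  # rotator steps per full clock period (0 to 360 degrees)

# Phase resolution of one rotator step.
deg_per_step = 360.0 / PHASE_STEPS

# With half-rate clocking, one clock period spans two data bits,
# so half of the steps fall within a single bit interval.
steps_per_bit = PHASE_STEPS // 2

def step_time_ps(data_rate_gbps):
    """Time per rotator step in picoseconds (hypothetical helper)."""
    bit_time_ps = 1000.0 / data_rate_gbps
    return bit_time_ps / steps_per_bit

print(deg_per_step, steps_per_bit, step_time_ps(10.0))
```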
Since the I and Q clocks are distributed from an LC PLL to many Tx/Rx cores, the quadrature relationship of the clocks can become mismatched due to different I/Q path delays in the clock distribution. Further, the duty cycle of the clocks can become inaccurate due to mismatch and delay differences in clock buffer devices.
Referring to FIG. 3, a timing diagram illustrates quadrature clocks. The clock waveform crossing times T1, T2, T3 and T4 can be related to the resulting duty cycle and quadrature relationship error through the following definitions and formulas:
T = average clock ½ period = T4/2  (1)
DUTYI = T2/(2*T)*100%  (2)
DUTYQ = (T3−T1)/(2*T)*100%  (3)
IQ = integ(I*Q) = (T3−T2+T1)/T*90 deg  (4)
A perfect IQ clock has DUTYI=50%, DUTYQ=50%, and IQ=90 degrees, meaning the I and Q clock + and − polarity durations are identical, and the Q clock is delayed from the I clock by exactly 90 degrees, which corresponds to ¼ of the full clock period 2*T. To see how non-50% duty cycle and non-90 degree quadrature can translate to time jitter in the clock generator, the values T1, T2, T3 and T4 can be expressed as a function of the duty cycle and quadrature relationship as follows:
T1 = (IQ/180 deg + (DUTYI−DUTYQ)/100%)*T  (5)
T2 = DUTYI/50%*T  (6)
T3 = T1 + DUTYQ/50%*T  (7)
T4 = 2*T  (8)
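As a consistency check, formulas (1) through (8) can be written as a pair of inverse mappings. The Python sketch below is illustrative only; the function names are hypothetical, duty cycles are in percent, IQ is in degrees, and times are in arbitrary units.

```python
def errors_from_crossings(t1, t2, t3, t4):
    """Formulas (1)-(4): crossing times -> (DUTYI %, DUTYQ %, IQ deg)."""
    t = t4 / 2.0                               # (1)
    duty_i = t2 / (2.0 * t) * 100.0            # (2)
    duty_q = (t3 - t1) / (2.0 * t) * 100.0     # (3)
    iq_deg = (t3 - t2 + t1) / t * 90.0         # (4)
    return duty_i, duty_q, iq_deg

def crossings_from_errors(duty_i, duty_q, iq_deg, t):
    """Formulas (5)-(8): (DUTYI %, DUTYQ %, IQ deg, T) -> crossing times."""
    t1 = (iq_deg / 180.0 + (duty_i - duty_q) / 100.0) * t   # (5)
    t2 = duty_i / 50.0 * t                                  # (6)
    t3 = t1 + duty_q / 50.0 * t                             # (7)
    t4 = 2.0 * t                                            # (8)
    return t1, t2, t3, t4

# A perfect IQ clock places the crossings at T/2, T, 3T/2 and 2T.
print(crossings_from_errors(50.0, 50.0, 90.0, 1.0))
# Round trip with arbitrary example errors recovers the inputs.
print(errors_from_crossings(*crossings_from_errors(52.0, 48.0, 85.0, 1.0)))
```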
To simplify the jitter analysis, it can be assumed that the phase rotator creates an edge clock from either the I or Q signal and a data clock from the Q or I signal, respectively, at the clock crossing intervals 0, T1, T2, T3, and T4 in FIG. 3. Since the edge clock sets the time reference for the sampling system, the data clock jitter can be computed as the difference from an ideal sampling position (T/2 delayed from the edge clock in this description) to the actual sampling position (T/2 delayed+error).
In an asynchronous clock-recovery system, the edge phase will shift from 0 to T4 over time as the receiver system tracks an incoming data signal with a non-coherent (different frequency than local PLL) clock. Therefore, at the waveform crossing intervals, the possible edge and data sampling positions are given as shown in Table 1:
TABLE 1: Data Sample Jitter
Edge Sample    Data Sample    Data Sample Jitter
0              T1             (T1 − T/2)
T1             T2             (T2 − T1 − T/2)
T2             T3             (T3 − T2 − T/2)
T3             T4             (T4 − T3 − T/2)
The peak-to-peak data sample jitter added by the non-ideal quadrature clocks can be expressed as the maximum of the sample jitter in Table 1 minus the minimum of the sample jitter. Because the common T/2 term cancels in the difference, this can be compactly computed as:
Data Sample Jitter = max(T1, T2−T1, T3−T2, T4−T3) − min(T1, T2−T1, T3−T2, T4−T3)  (9)
The jitter is zero only when T1 and the separations between T2 and T1, T3 and T2, and T4 and T3 all equal T/2. This condition can happen only if the I and Q clocks are in perfect quadrature (Q delayed from I by T/2) and have a 50% duty cycle.
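Formula (9) can be evaluated directly from the four crossing times. The Python sketch below is illustrative; the function name and the example crossing times are assumptions chosen for the demonstration, with T normalized to 1.

```python
def data_sample_jitter(t1, t2, t3, t4):
    """Formula (9): peak-to-peak data sample jitter from crossing times."""
    intervals = (t1, t2 - t1, t3 - t2, t4 - t3)
    return max(intervals) - min(intervals)

# Ideal clocks (T = 1): every interval equals T/2, so the jitter is zero.
print(data_sample_jitter(0.5, 1.0, 1.5, 2.0))

# Example with 50% duty cycles but Q arriving early (IQ = 81 degrees by
# formula (4)): the intervals alternate between 0.45 and 0.55, giving a
# jitter of about 10% of T.
print(data_sample_jitter(0.45, 1.0, 1.45, 2.0))
```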
In the prior art, a local open-loop “coarse clean up” buffer 16 (FIG. 1) at each local clock generator is commonly employed to clean up the IQ clocks as much as possible for quadrature accuracy and duty cycle before the signals are applied to the phase rotator.
Referring to FIG. 4, a common implementation of a “coarse clean up” buffer in the prior art is shown. The buffer provides two output paths which form I+Q and Q−I from summers 40 with two current-mode-logic (CML) clock inputs I and Q, augmented with DC-blocking clock buffers 41 on the output to improve the duty cycle. This operation improves the quadrature relationship and duty cycle of the output clock signals. However, limitations in match accuracy in the “coarse clean up” buffer itself, due to variations in the devices used to build it, put a fundamental limitation on the achievable accuracy. In particular, load resistances 42, device gains 43, and buffer stage bias currents 44 are all susceptible to significant mismatch effects when realized in deep-submicron CMOS technology. These mismatch effects effectively create unwanted DC offsets on the outputs which add error to the duty cycle and quadrature relationship of the I and Q clocks.
Studies of realized integrated circuits indicate that even after a “coarse clean-up” buffer and any associated open-loop duty-cycle clean-up clock buffers, the reference clock errors arising from device mismatches can induce a data sample time jitter as predicted by formula (9) on the order of 20% of a received bit width (which is equal to the time interval T in a half-rate clocking architecture) or more. This level of degradation is typically not acceptable for applications in high data rate (5-10 Gb/s and beyond) SERDES since time jitter of 20% is not available in the jitter budgets due to large jitter from crosstalk, reflections, inter-symbol interference (ISI), and other core degradation sources including random clock jitter from the PLL. In many common data transmission applications, channel and core induced jitter will result in 15% or less remaining jitter margin within a one bit-interval sample interval even with a perfectly linear phase generation subsystem. In addition to mismatch issues, open-loop systems are unable to truly compensate I/Q separation error effects due to the basic I+Q/Q−I algorithm, which simply propagates this error from the time domain to the amplitude domain, where it still negatively affects rotator accuracy.
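The transfer of I/Q separation error from the time domain to the amplitude domain can be illustrated with a phasor sketch. The Python fragment below is a simplified model, not the circuit of FIG. 4: it assumes ideal equal-amplitude sinusoidal clocks with no device mismatch, and the function name sum_diff_buffer is hypothetical.

```python
import cmath
import math

def sum_diff_buffer(iq_error_deg):
    """Model the I+Q and Q-I outputs for unit-amplitude sinusoidal inputs.

    Returns (output phase separation in degrees, amplitude ratio |Q-I|/|I+Q|).
    Assumption: I is at 0 degrees and Q at (90 + iq_error_deg) degrees.
    """
    i = 1.0 + 0.0j
    q = cmath.exp(1j * math.radians(90.0 + iq_error_deg))
    s = i + q   # first output path
    d = q - i   # second output path
    separation_deg = math.degrees(cmath.phase(d / s))
    return separation_deg, abs(d) / abs(s)

# With no input error, the outputs are in quadrature with equal amplitude.
print(sum_diff_buffer(0.0))
# With a 10 degree input error, the outputs remain 90 degrees apart,
# but their amplitudes no longer match.
print(sum_diff_buffer(10.0))
```

Under these assumptions the output amplitude ratio equals tan((90 deg + error)/2), so the input separation error is not removed but reappears as an amplitude mismatch between the two outputs.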