Clock-data-recovery (CDR) loops in serial data transmission links measure the position of the transitions between logic states (one-to-zero, zero-to-one, or multilevel transitions) and use a control loop to dynamically adjust the phase of the data sampling clock to the optimum position for minimum errors. This requires high-speed sampling latches in the receiver front end, a high-speed phase detection logic function, the appropriate clock generation building blocks, and a means to adjust the phase of the clock. Phase adjustment loops according to the above description are depicted in Gu-Yeon Wei, Jaeha Kim, Dean Liu, Stefanos Sidiropoulos, and Mark A. Horowitz, “A variable-frequency parallel I/O interface with adaptive power-supply regulation”, IEEE Journal of Solid-State Circuits, vol. 35, no. 11, November 2000, pp. 1600-1610, and in Stefanos Sidiropoulos, Mark A. Horowitz, “A Semidigital Dual Delay-Locked Loop”, IEEE Journal of Solid-State Circuits, vol. 32, no. 11, November 1997, pp. 1683-1692. US2004/0218705A1 gives a detailed description of a CDR system with digital phase adjustment.
It should be noted that the clock phase adjustment outlined above is controlled by a digital signal. This digital signal inherently carries information about the dynamic phase difference between the reference signal and the data signal. The reason is that the CDR control loop operates to minimize the phase difference between clock and data as detected by the high-speed phase detection logic function. The error signal in the loop is therefore an equivalent representation of the phase difference between clock and data, and the digital block controlling the phase of the clock holds the phase value.
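The behavior described above can be illustrated with a minimal behavioral sketch in Python. It assumes a first-order bang-bang loop and a hypothetical phase adjuster with 64 steps per unit interval (UI); the function name, the step count, and the early/late decision rule are illustrative assumptions and are not taken from the cited references.

```python
def track_phase(data_phase_ui, steps=64, n_iter=200):
    """Behavioral model of a first-order bang-bang CDR loop.

    data_phase_ui: phase of the data transitions, as a fraction of one UI.
    steps: resolution of the (hypothetical) digital phase adjuster.
    Returns the final phase code and the history of codes.
    """
    code = 0  # digital phase value held by the phase position logic
    history = []
    for _ in range(n_iter):
        clock_phase = code / steps
        # Phase error wrapped into the range [-0.5, 0.5) UI
        error = (data_phase_ui - clock_phase + 0.5) % 1.0 - 0.5
        # Early/late decision: move the code one step toward the data phase
        if error > 0:
            code = (code + 1) % steps
        elif error < 0:
            code = (code - 1) % steps
        history.append(code)
    return code, history


final_code, codes = track_phase(0.30)
```

Once such a loop has locked, the held code dithers by one step around the data phase, so the code itself is a quantized representation of the phase difference between the reference signal and the data signal.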
There are systems where only one serial link is used to connect transmitter and receiver. These systems can be extended by bundling two or more serial links into a serial port to achieve higher aggregate data throughput. One advantage of this approach is that the instantaneous phases of all data lanes in a port are highly correlated, because the same clock signal is used in the transmitter to launch all the data signals. An example of such a system is described in the IEEE 802.3ae XAUI standard.
To further extend the concept of a serial port, an additional signal containing information about the clock can be forwarded from the transmitter to the receiver in the form of a dedicated clock lane. This signal is called the ‘forwarded clock signal’ and its instantaneous phase is again highly correlated to the instantaneous phases of all data lanes and vice versa. Examples of such a system with a forwarded clock signal are described in the HyperTransport standard and the OIF SFI5 standard. It should be noted that those applications feature a full-duplex signaling scheme where two serial ports are connected between two chips in opposite directions, one south-bound and one north-bound.
The phase correlation can be exploited in the receiver by sharing phase update information between the individual channels, in particular by sharing the phase information of the forwarded clock signal with the individual data lanes of a serial port. The phase information inherently present in the forwarded clock signal can therefore be used to replace a standalone receiver clock generator. The base signal for the data sampling clocks is then generated by means of a clock buffer driven by the forwarded clock signal, a Phase Locked Loop (PLL) with a reference frequency derived from the forwarded clock signal, or a Delay Locked Loop (DLL) with its input signal derived from the forwarded clock signal.
A benefit of using the forwarded clock signal as the clock generation unit in the receiver is that the individual CDR loops of the data lanes can be implemented with reduced bandwidth, and therefore reduced power and/or chip area, because most of the phase uncertainty is eliminated by leveraging the phase correlation between data and clock. In particular, the individual CDR loops could operate only periodically to save power.
The phase of the recovered data from each lane in the serial port is referenced to the recovered clock signal. In general, the recovered data is transferred to a digital core where the information contained in the data signals is processed. For this, the phase of the data signals has to be aligned with the clock of the digital core. This is achieved by means of a synchronizer, typically consisting of a First-In-First-Out (FIFO) buffer with separate write and read addresses. The separate write and read addresses enable the FIFO to accept data on one side, written at the position given by the write address, and to deliver data to the output of the FIFO from the position given by the read address. The distance between the write and read addresses has to be larger than the maximum phase difference between the clock of the recovered data and the clock of the digital core to avoid non-causal behavior, e.g. reading data before it is written. A control loop can be added to ensure the appropriate update of the read and write addresses and thereby improve performance.
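The role of the separate write and read addresses can be sketched with a small behavioral model in Python. The class name and the shared fill counter are assumptions made for illustration only; a hardware synchronizer would not share a counter across the two clock domains in this way.

```python
class ElasticFifo:
    """Behavioral model of a circular FIFO with separate write/read addresses."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.wr_addr = 0  # advanced by the recovered clock
        self.rd_addr = 0  # advanced by the core clock
        self.fill = 0     # current distance between write and read address

    def write(self, word):
        if self.fill == self.depth:
            raise OverflowError("write address caught up with read address")
        self.slots[self.wr_addr] = word
        self.wr_addr = (self.wr_addr + 1) % self.depth
        self.fill += 1

    def read(self):
        if self.fill == 0:
            # Non-causal access: the word has not been written yet
            raise LookupError("read before write")
        word = self.slots[self.rd_addr]
        self.rd_addr = (self.rd_addr + 1) % self.depth
        self.fill -= 1
        return word


fifo = ElasticFifo(depth=4)
fifo.write("d0")
fifo.write("d1")
first = fifo.read()
```

In this model the fill level plays the part of the address distance: as long as it stays between zero and the FIFO depth, the writing and reading sides may drift in phase relative to each other without data being lost or read prematurely.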
Three problems are associated with this traditional approach. First, the recovered data on the receiver chip has to be passed to the digital core functionality of that chip where it is actually processed. However, because the phase of the recovered data is dynamically adjusted in the CDR loop to achieve minimum errors, there is a dynamic phase difference between the recovered data and the core clock of the chip. To compensate for this difference, a synchronizer circuit is used to add phase elasticity between the output of the CDR and the input of the core logic as described above. The larger the dynamic phase uncertainty between the recovered data clock and the core clock, the more stages have to be added to the FIFO in the synchronizer. This increases the latency of the transmission system, which is particularly problematic for interconnect systems over short distances and for latency-critical links as found in memory applications. Second, the use of the forwarded clock signal as the receiver clock generator has the disadvantage of introducing a single point of failure. When the clock signal does not arrive at the receiver, for example because a connector in the transmission channel has oxidized, the entire receiver will stop functioning for lack of a proper clock. This is particularly detrimental in systems with high reliability requirements. A spare lane is typically added for systems with such high reliability requirements, and a high-frequency analog switch is added to the front end to route the clock to the spare lane. High-frequency switching is a demanding task because of the associated parasitic capacitive loading of the lanes and the jitter penalty associated with the programmable routing. Third, because the forwarded clock signal is required to be a continuous signal, the transmission of any additional information via that reserved lane, other than the phase information, is precluded.
In particular, it is not possible to use that lane for out-of-band signaling of status information or for signaling of equalization settings. It is also very difficult to assess the quality of the clock signal, and not knowing the clock quality makes unscheduled down-time of the system highly probable. To ensure high quality clock signals, special care can be applied to the routing of the clock signal, for example to avoid unwanted crosstalk signal injections. However, this complicates the board design and, in the case of a failover mode when the clock is transmitted over a data lane, the specific layout is no longer present.
US Patent Publication number 2004/0208270A1 describes a method to generate, distribute and share the phase update information of one or many CDR loops with one or many other CDR loops, and it describes a method for a clock generator whose phase is controlled via said shared phase information. In contrast to the setup described in the previous section, where the correlation information was distributed in the form of an analog signal, systems such as the one detailed in 2004/0208270A1 distribute the correlation information in digital signal form. The advantage of this approach is that the digital signals are not prone to noise or drift due to their quantized nature. In addition, the digital phase information can be distributed at a fraction of the frequency of the high-speed forwarded clock signals, thereby reducing complexity, power and chip area.
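The idea of distributing phase information in digital form can be illustrated with a short Python sketch. It is not the method of 2004/0208270A1; the 64-step phase resolution, the function name, and the per-lane static offsets are assumptions chosen purely for illustration.

```python
STEPS = 64  # hypothetical number of phase steps per unit interval


def apply_shared_update(shared_code, shared_update, lane_offsets):
    """Apply one digital phase update from a tracking loop to all lanes.

    shared_code: phase code currently held by the tracking loop.
    shared_update: signed step (+1, -1, ...) produced by that loop.
    lane_offsets: static per-lane offsets relative to the shared phase.
    Returns the new shared code and the resulting per-lane phase codes.
    """
    shared_code = (shared_code + shared_update) % STEPS
    lane_codes = [(shared_code + offset) % STEPS for offset in lane_offsets]
    return shared_code, lane_codes


new_shared, lanes = apply_shared_update(62, 3, [0, 2, -1])
```

Because the update is a small integer rather than a high-speed analog clock, it can in principle be passed between lanes at a much lower rate than the forwarded clock itself.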
FIG. 1a illustrates a serial link system according to the prior art formed by a full-duplex configuration of two serial link ports between a first chip 101 having a core 103 and a second chip 102 having a core 104. The second chip 102 includes a serial transmitter port 105 connected via a plurality of data lanes 107 and a clock lane 108 to the receiver serial link port 106 on the first chip 101. Similarly, the first chip 101 includes a serial transmitter port 105 connected via a plurality of data lanes 107 and a clock lane 108 to the receiver serial link port 106 on the second chip 102. The first and second chips are synchronized by distribution of a first reference clock signal 110 to the first chip 101 and a second reference clock signal 111 to the second chip 102. The first and second reference clock signals 110, 111 originate from a reference clock generator 109. The reference signals 110, 111 may have the same frequency, but with variable instantaneous phases. They may have a rational number ratio between their frequencies, and they may be generated by different independent sources. Such a configuration might be found for example in multi-processor applications where two processors exchange information. U.S. Pat. Nos. 6,334,163 and 5,832,047 describe such serial link systems and the HyperTransport consortium has specified such a serial link system in the HyperTransport standards.
The serial links as depicted in FIG. 1a can be modified to implement a unidirectional serial link system or a daisy-chained serial link system as depicted in FIG. 1b and FIG. 1c. These and other similar configurations are outlined in the HyperTransport standards as published by the HyperTransport consortium. FIG. 1b, for example, illustrates a block diagram of such a unidirectional serial data transmission system from a first chip 101 to a second chip 102. The first chip 101 includes a serial transmitter port 105 connected via a plurality of data lanes 107 and a clock lane 108 to the receiver serial link port 106 on the second chip 102. The two chips are synchronized by distribution of a first reference clock signal 110 to the first chip 101 and a second reference clock signal 111 to the second chip 102, both reference signals originating from a reference clock generator 109. Such a configuration might be found for example in a switch chip where information is flowing through a chip in one direction only. FIG. 1c illustrates a system similar to that depicted in FIG. 1a, but enhanced according to the prior art to form a daisy chain serial port between a first chip 101 having a core 103 and a second chip 102 having a core 104. A receiver serial link port 106 on the first chip 101 receives data on a plurality of data lanes 107 and information on a clock lane 108 from an upstream device, not shown. A transmitter serial link port 105 is connected via a plurality of data lanes 107 and a clock lane 108 to the respective receiver serial link port 106 on the second chip 102. A transmitter serial link port 105 on the second chip 102 transmits data and clock information to a downstream device, not shown. The two chips are synchronized by distribution of a first reference clock signal 110 to the first chip 101 and a second reference clock signal 111 to the second chip 102, both reference signals originating from a reference clock generator 109.
The reference signals 110, 111 may have the same frequency, but with variable instantaneous phases. They may have a rational number ratio between their frequencies, and they can be generated by different independent sources. Such a configuration might be found for example in memory buffer applications where one memory buffer is transferring data from upstream memory buffers to memory buffers downstream. An example of such a system is shown in the JEDEC Fully Buffered DIMM standard.
FIG. 2 illustrates a receiver according to the prior art. A plurality of data lanes 107 and a forwarded clock signal lane 108 are incident to the receiver of the serial port. A clock failover mode switch logic 201 controls a multiplexer 202 via a control signal 232 and gets its trigger signal 208 from an analog clock quality analysis block 207. The output signal 204 of the multiplexer 202 is fed to a clock buffer 205, which can comprise a PLL. The analog clock quality analysis block 207 analyzes the quality of the clock signal 204 and can also be fed by the buffered clock signal 225 via a second path 206. The output of the clock buffer 205 is fed into a plurality of CDR loops 212, one connected to each of the incoming data lanes 107. Each CDR loop 212 comprises a data receiver frontend 211, a CDR logic 213, a phase position logic 210 and a phase adjuster 209 to adjust the phase of the buffered clock signal 225. The CDR logic 213 analyzes the phase difference between the input signals of the data receiver frontend 211, and minimizes this phase difference by updating the phase position logic 210 accordingly. The CDR logic 213 feeds the recovered data signal 214 and the recovered clock signal 215 from each CDR loop 212 to a synchronizer block 216 where the dynamic phase difference between the recovered clock signals 215 and the core clock signal 224 is compensated. Each synchronizer 216 comprises a circular FIFO 217, a write/read address logic 218 and a synchronization flip-flop 219. The clock of the synchronization flip-flop 219 is derived from the core clock signal 224, which is generated in the core clock generator 222 and thereby referenced back to the external reference clock signal 223.
There may be a significant amount of dynamic phase difference between two clock paths. The first path extends from the reference clock generator 109 via the reference clock distribution 110 to the transmitter 105, over the clock signal 108, through multiplexer 202, clock buffer 205 and CDR loop 212 to the recovered clock signal 215. The second path extends from the reference generator 109 via the reference clock distribution 111 and core clock generator 222 to the core clock output signal 224. There are many sources for the dynamic change of the phases in the above described paths, which can include for example variations in the supply voltage affecting delays in electronic circuits, variations in the temperature affecting delays in electronic circuits, variations in the humidity affecting board impedances, and noise effects affecting delays in electronic circuits. The synchronizer must be able to compensate for the worst case combination of all potential phase variations to guarantee a low error probability in the serial link transmission system. This drives the requirement to add many stages to the FIFO in the synchronizer, and the number of stages is directly proportional to the latency introduced to the serial link transmission system. Latency prevents data from being processed quickly, and so such effects are unwanted in most serial link transmission systems. Also, more FIFO stages necessitate higher power consumption and more chip area to implement the synchronizer.
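The relationship between the worst case phase budget, the required FIFO address distance and the resulting latency can be made concrete with a short Python sketch. All numeric values below are hypothetical placeholders, the function names are illustrative, and the model simply sums the individual contributions as a worst case.

```python
import math


def min_address_distance(phase_budget_periods):
    """Smallest integer write/read distance strictly larger than the
    worst case sum of all phase variations (given in clock periods)."""
    worst_case = sum(phase_budget_periods.values())
    return math.floor(worst_case) + 1


def added_latency_ns(distance, clock_period_ns):
    """Latency contributed by the FIFO, proportional to the distance."""
    return distance * clock_period_ns


# Hypothetical phase variation contributions, in clock periods
budget = {
    "supply_voltage": 0.75,
    "temperature": 0.5,
    "humidity": 0.25,
    "noise": 0.5,
}
distance = min_address_distance(budget)     # worst case sum is 2.0 periods
latency = added_latency_ns(distance, 0.5)   # assumed 0.5 ns clock period
```

With these assumed numbers the worst case sum is 2.0 clock periods, so an address distance of 3 is needed, which adds 1.5 ns of latency; every additional source of phase variation pushes both figures upward.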
FIG. 5 illustrates a synchronizer according to the prior art. The data 214 from a clock-data-recovery loop is fed into a receive FIFO 217 where the data is stored in the sequence of arrival. The recovered clock 215 from the clock-data-recovery loop is fed to the load address logic 501 where the address of the storage place in the FIFO is generated. An unload address logic 504 selects one of the outputs of a multiplexer 502 and feeds this output signal 503 to a synchronization flip-flop 219. The clock signal 224 from the core is connected to both the unload address logic 504 and the synchronization flip-flop 219. The load address logic 501 and the unload address logic 504 are combined in a single write/read logic block 218. Similar synchronizers between two clock domains are disclosed in the following: Application Note 130, “CDR in Mercury Devices”, Altera Corporation, February 2001; Technical Note, “Introduction to the sysHSI Block ispXPGA and ispGDX2”, Lattice Semiconductor Corporation, April 2003; Ajanta Chakraborty, “Efficient Self-Timed Interfaces for Crossing Clock Domains”, a thesis submitted to the Department of Computer Science, The University of British Columbia, August 2003; and Ingemar Söderquist, “Globally Updated Mesochronous Design Style GUM-design-style”, Proceedings of the 28th European Solid-State Circuits Conference, 24-26 Sep. 2002, Florence, Italy.
As noted above, the systems according to the state of the art require that the FIFO in the synchronizer has a sufficient number of stages to handle the maximum dynamic phase variation between the recovered data clock signal and the core clock signal. Furthermore, the systems are prone to a single point of failure in that a failure in the clock path will result in failure of the entire serial port. Routing the clock lane so that unwanted signal injections are avoided may be difficult. Finally, the forwarded clock signal is required to be a continuous signal, without the possibility of adding out-of-band information transfer to this lane.