1. Field of the Invention
The present invention generally relates to controlling clocks in a circuit, and particularly to controlling distributed clocks so that the different clocks are synchronized to within a predetermined timing delay of one another.
2. Background Description
A growing number of standards require use of High-Speed Serial Links (HSSL) to send large amounts of data between chips in a system (e.g., a computer-based control system, a communication system, or the like) while minimizing the number of chip I/O pins and circuit board connections. Some of these standards require sufficiently large bandwidth that multiple HSSL channels are required to implement the physical layer of the interface.
When multiple HSSL channels are used, data at the transmission source is distributed between the channels using an arbitrary ordering algorithm, and then must be reconstructed at the receiver. To create reasonable design limits on the receiver, and to ensure interoperability of the transmitter and receiver, constraints may be required on the transmitter regarding this distribution between channels. For example, if data is distributed between channels by bit striping (or another technique) across the available channels (i.e., bit 1 is transmitted on channel 1, bit 2 on channel 2, bit N on channel N, bit N+1 on channel 1, etc.), then the propagation delay of the various HSSL channels must be controlled such that the receiver can reliably demultiplex the data and reconstruct the original message content.
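The bit striping described above can be illustrated with a minimal sketch. The function names `stripe_bits` and `destripe_bits` are hypothetical, channels are indexed from 0 rather than 1, and the receiver side assumes perfectly aligned channels (zero skew), which is exactly the condition the following paragraphs show to be difficult to guarantee.

```python
def stripe_bits(bits, num_channels):
    """Distribute bits round-robin: bit i goes to channel i % num_channels.

    Illustrative only; channel indices start at 0 here, whereas the text
    counts channels from 1.
    """
    channels = [[] for _ in range(num_channels)]
    for i, bit in enumerate(bits):
        channels[i % num_channels].append(bit)
    return channels


def destripe_bits(channels):
    """Rebuild the original bit order.

    Assumes every channel is aligned (no skew between channels), so bit i
    is found at position i // N of channel i % N.
    """
    n = len(channels)
    total = sum(len(ch) for ch in channels)
    return [channels[i % n][i // n] for i in range(total)]


message = [1, 0, 1, 1, 0, 0, 1]
lanes = stripe_bits(message, 3)
recovered = destripe_bits(lanes)
```

If one channel were delayed by even a single bit relative to the others, `destripe_bits` would interleave the wrong bits, which is why the propagation delay between channels must be controlled.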
However, a complex problem arises when the propagation delay difference between the channels can exceed one bit time of the serial data. The receiver must then employ a “deskew” algorithm to determine, and correct for, the difference in propagation delay between the HSSL channels. The deskew range is a design parameter that can significantly affect the complexity of the receiver design and that, if insufficient, impairs interoperability of the system. The definition of the interface standard therefore may allocate a skew budget to the various system components (i.e., the amount of propagation delay difference between channels that may be introduced by each system component). At the transmitter, this requires tight control of the skew between channels of the transmitted data.
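As a rough illustration of what a deskew step involves, the sketch below aligns channels on a known marker pattern and rejects the link when the measured skew exceeds the deskew range. The text does not specify any particular deskew algorithm; the use of an alignment marker, and the names `find_marker` and `deskew`, are assumptions made only for this example.

```python
def find_marker(stream, marker):
    """Return the index of the first occurrence of marker in stream, or -1."""
    m = len(marker)
    for i in range(len(stream) - m + 1):
        if stream[i:i + m] == marker:
            return i
    return -1


def deskew(streams, marker, deskew_range):
    """Align all channel streams on the marker (an assumed mechanism).

    Returns (aligned_streams, skew_in_bits). Raises ValueError if the marker
    is missing on any channel or the skew exceeds the receiver's deskew range.
    """
    offsets = [find_marker(s, marker) for s in streams]
    if any(o < 0 for o in offsets):
        raise ValueError("marker not found on every channel")
    skew = max(offsets) - min(offsets)
    if skew > deskew_range:
        raise ValueError("skew exceeds deskew range")
    # Drop the leading bits so that every stream starts at its marker.
    return [s[o:] for s, o in zip(streams, offsets)], skew


marker = [1, 1, 0]
streams = [
    [0, 0, 1, 1, 0, 1, 0, 1],  # marker arrives after 2 bit times
    [0, 1, 1, 0, 0, 1, 1],     # marker arrives after 1 bit time
]
aligned, skew = deskew(streams, marker, deskew_range=4)
```

A larger `deskew_range` tolerates more skew but implies deeper buffering in the receiver, which is the complexity trade-off noted above.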
In the architecture of typical HSSL transmit macros, each HSSL channel provides a W-to-1 multiplexer function, where W-bit parallel data using a 1/W clock rate is multiplexed onto a high-speed serial channel that is 1 bit wide and runs at the full baud rate. Typically, the HSSL macro has a lower-frequency clock input (for example, with a frequency of ¼ the baud rate), and contains a phase-locked loop (PLL) which performs clock frequency multiplication. This provides a stable high-frequency clock locally within the macro, and avoids the problems associated with distributing a high-speed clock over a large area within the chip. The PLL clock is then divided down within the macro to provide the 1/W clock rate to logic that drives the data input to the HSSL transmit macro.
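The W-to-1 multiplexing and the 1/W clock relationship can be sketched as follows. The names `serialize` and `parallel_clock_hz`, the MSB-first bit order, and the numeric values are illustrative assumptions, not details taken from any particular HSSL macro.

```python
def serialize(words, w):
    """W-to-1 multiplexer: emit each W-bit word one bit at a time.

    MSB-first ordering is an assumption made for this sketch.
    """
    out = []
    for word in words:
        for k in range(w - 1, -1, -1):
            out.append((word >> k) & 1)
    return out


def parallel_clock_hz(baud_rate_hz, w):
    """Frequency of the 1/W clock that paces the W-bit parallel data."""
    return baud_rate_hz / w


serial_bits = serialize([0b1010, 0b0011], 4)
# Hypothetical numbers: a 2 Gbaud link with W = 8 needs a 250 MHz parallel clock.
word_clock = parallel_clock_hz(2.0e9, 8)
```

One period of the 1/W clock therefore spans W serial bit times, a relationship that matters when start-up slips between channels are considered.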
In applications that require multiple HSSL channels, however, the HSSL channel architecture is duplicated for the number of channels required. Thus, an N-channel implementation requires N independent HSSL channels, each of which has its own independent 1/W clock supplied to upstream logic. These 1/W clocks may be supplied from a common PLL or may be supplied from independent PLLs, depending on the architecture of the available HSSL macros. Because of the use of multiple parallel PLLs, and because of routing delay variance between channels (which applies even when a common PLL is used), the N 1/W clocks will have significant phase variation. Additionally, this phase variation may drift over time as the chip undergoes changes in temperature or supply voltage.
The HSSL channel architecture described above (and specifically the necessity of having N independent 1/W clocks with individual phase variation) conflicts with the ability to tightly control the skew introduced between channels in the transmission system. A typical chip architecture will use a common clock domain to supply parallel data to the interface transmit logic associated with the HSSL macros. This data must be distributed between HSSL channels using an appropriate algorithm, and must be re-timed independently to each of the N independent 1/W channel clocks. This is typically done using one of several techniques for retiming data across clock domains; one example is a FIFO which is written based on the chip's primary clock domain, and which is read based on the local channel 1/W clock. However, the skew requirements of the interface standard prevent the N channel clocks from being treated entirely independently. It is necessary to ensure that the reading of data out of the N FIFOs associated with the N HSSL channels starts, in each of the N clock domains, within a tolerance window. A one-clock-cycle slip in the startup of one FIFO relative to another FIFO would introduce W bits of skew into the transmit system, which likely exceeds the specified skew budget for the transmit component.
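The arithmetic behind the last statement can be made concrete with a small sketch. Each FIFO is modeled only by the 1/W clock cycle on which its read side starts; the names `channel_skew_bits` and `within_budget`, and the example values for W and the skew budget, are hypothetical.

```python
def channel_skew_bits(start_cycles, w):
    """Skew between channels, in serial bit times.

    start_cycles[i] is the 1/W clock cycle on which channel i's FIFO read
    begins. Each cycle carries W serial bits, so a slip of one cycle equals
    W bits of skew. Phase offsets within a cycle are ignored in this model.
    """
    return (max(start_cycles) - min(start_cycles)) * w


def within_budget(start_cycles, w, skew_budget_bits):
    """True if the start-up skew fits the transmitter's skew budget."""
    return channel_skew_bits(start_cycles, w) <= skew_budget_bits


# Hypothetical example: W = 10 and a transmit skew budget of 4 bit times.
aligned_start = [3, 3, 3, 3]
slipped_start = [3, 3, 4, 3]  # channel 2 starts one cycle late
skew_aligned = channel_skew_bits(aligned_start, 10)
skew_slipped = channel_skew_bits(slipped_start, 10)
```

Under these assumed values, a single-cycle slip on one channel yields 10 bits of skew and fails a 4-bit budget, which is why the N FIFO reads cannot be started independently.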
Synchronous start-up of N parallel FIFOs using N independent clocks requires only a trivial logic implementation if N is sufficiently small that the chip area over which the logic is implemented is very confined, or if the frequency of the 1/W clocks is sufficiently low relative to the propagation delay of the clock trees. However, if N is sufficiently large to force distribution of the FIFOs into physically separated areas, and if the clock tree propagation delay requires a significant portion of the cycle time for the 1/W clock, then the logical/physical implementation must consider achievable propagation delays given the necessary wire length and clock tree delays. In this case, achieving an implementation that meets both the skew requirements of the interface and the chip timing requirements is non-trivial. The present invention provides a solution to the above-described problem.