The field of the present invention pertains to data communications between digital systems. More particularly, the present invention relates to a method and system for high performance source synchronous data communication between integrated circuit devices.
The field of data communications represents one of the most rapidly evolving technologies in widespread use today. Data communications and data processing have become important to virtually every segment of the nation's economy. Whole new industries and companies have organized around the need for, and the provision of, data communications. Through the use of specialized semiconductors for signal processing and data compression, various multimedia applications are evolving which orient data communications to the transport of voice, data, and video information, the types of information desired by the everyday consumer.
Recently, the computer and data processing industries have seen a large expansion in the requirements for high performance, high speed data communications between multiple integrated circuits on, for example, printed circuit boards. It is becoming increasingly common to implement high-performance digital systems using multiple integrated circuit modules, or chips, interconnected on a high-speed printed circuit board. The multiple chips are typically highly integrated, having several million transistors per chip, and operating at very high speeds (e.g., 500 MHz or above). With such technology, the speed and integrity of the data communications between the chips become critical.
Data is commonly transferred between computer systems and terminals by changes in the current or voltage on a metal wire, or channel, between the systems. These interconnections are typically etched into the material of the printed circuit board itself. A data transmission in which a group of bits moves over several channels simultaneously is referred to as a parallel transmission. A data transmission where the bits move over a single channel, one after the other, is referred to as a serial transmission. Computers and other data processing systems which are located on a single printed circuit board normally use parallel transmission because it is much faster.
As the level of integration and the operating speeds of the multiple chips on printed circuit boards increase, the transmission of data between and among the multiple chips via the channels of the printed circuit board suffers from a number of limitations. One such limitation is due to the fact that most digital systems are designed to operate synchronously with respect to the individual integrated circuits which comprise the system. For example, the multiple chips coupled to a printed circuit board are typically designed to operate synchronously with respect to one another, using well-defined clock frequency and phase relationships. However, as the operating speeds of the multiple chips increase, the tolerance of the system for "timing skew" among the multiple data channels decreases. The timing relationship between, for example, the clock signal shared among the multiple chips and the corresponding data signals conveyed across the channels becomes increasingly critical. Prior Art FIG. 1 below illustrates this problem.
Prior Art FIG. 1 shows a typical high-speed multichip device 10. Device 10 includes a first chip 11 (e.g., chip 1) and a second chip 12 (e.g., chip 2). Chips 11 and 12 are communicatively coupled via data channels of a printed circuit board. One such data channel 14 is shown. Chips 11 and 12 share a common clock signal 16 and operate synchronously with respect to clock signal 16.
As described above, as the operating speed of a multichip device (e.g., system 10) increases, the tolerance of the system for timing skew among the data channels and with respect to the clock signal decreases. FIG. 1 depicts this problem. The high level of integration of chips 11 and 12 causes a clock insertion delay, depicted as clock insertion delay 15, as the clock signal 16 propagates among the millions of transistors comprising chips 11 and 12. As the clock signal reaches logic elements 17 and 18 deep within chips 11 and 12, the phase relationship of the output of logic element 17 and the input of logic element 18 can vary significantly. For the device 10 to remain synchronous, the output of logic element 17 needs to be received at the input of logic element 18 prior to the next cycle of clock 16. The propagation delay, Tpd 13, from the output pin of chip 11 to the input pin of chip 12 constitutes a significant portion of this delay.
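The timing requirement described above can be sketched as a simple slack calculation. The following Python fragment is only an illustration under stated assumptions: the function name, the parameter names, and every numeric value are hypothetical and do not come from FIG. 1, and the model ignores jitter and hold-time effects. All times are integer picoseconds.

```python
def setup_slack_ps(period_ps, insertion_tx_ps, insertion_rx_ps,
                   clk_to_out_ps, tpd_ps, setup_ps):
    """Return the setup slack at the receiving logic element.

    Hypothetical model: the sending element launches data after its own
    clock insertion delay, the data then takes clk_to_out_ps to reach the
    output pin and tpd_ps to cross the board. The receiving element
    samples on the next clock edge, delayed by its own insertion delay.
    A negative result means the data arrives too late for that edge.
    """
    arrival = insertion_tx_ps + clk_to_out_ps + tpd_ps
    capture_edge = period_ps + insertion_rx_ps
    return capture_edge - setup_ps - arrival


# Assumed values: a 500 MHz clock (2000 ps period) and matched insertion delays.
print(setup_slack_ps(2000, 500, 500, 300, 1000, 200))   # 500

# Same board, but the insertion delays have drifted apart by 600 ps.
print(setup_slack_ps(2000, 900, 300, 300, 1000, 200))   # -100
```

In this sketch the board-level terms are identical in both calls; only the on-chip insertion delays differ, which is enough to turn a positive margin into a violation.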
As system 10 is designed, engineers account for the various delay factors so that system 10 can operate at its maximum speed. For example, the delay incurred as clock 16 propagates to each chip is accounted for by precisely defining the length of the channels transmitting clock 16 to each chip. Similarly, the length of the data channels, such as data channel 14, between the chips is precisely defined. However, the clock insertion delay incurred in each chip as clock 16 propagates among the millions of transistors comprising the chips cannot be as precisely controlled. Numerous variables (e.g., fabrication process variation, temperature, voltage fluctuation, etc.) affect the propagation delay, and unfortunately, many of these variables cannot be precisely ascertained or controlled. The variables affect the "setup-and-hold" timing tolerances of the overall device.
Prior Art FIGS. 2A-2C illustrate the setup-and-hold timing tolerance problem. FIG. 2A shows a typical logic element 21 as contained in chips 11 and 12. Element 21 is depicted as an edge-triggered flip-flop having a data input, a data output, and a clock input as shown. FIG. 2B shows a diagram of the proper timing relationship between data 22 and the clock signal 23. As depicted in FIG. 2B, ideally, the rising edge of the clock signal 23 is placed such that it properly corresponds to the setup time 24 and the hold time 25 of the data input 22. This provides the maximum likelihood that the correct value of the data input is clocked into logic element 21. FIG. 2C shows a diagram of an improper timing relationship between data 22 and the clock signal 23. In this case, the phase relationship between the clock signal 23 and the data 22 has deteriorated such that the setup and hold times 24 and 25 are not properly placed with respect to the phase of the data signal 22. The rising edge of clock signal 23 therefore does not correspond to the correct value of the data input 22, leading to "indeterminate" operation of the logic element 21. This deterioration is typically due to the uncontrollable variables described above (e.g., fabrication process variation, temperature, voltage fluctuation, etc.).
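The difference between the proper relationship of FIG. 2B and the improper relationship of FIG. 2C can be expressed as a window check. The Python sketch below is illustrative only: the function name, the parameter names, and the example values are assumptions made for this illustration and are not taken from the figures. Times are integer picoseconds measured from an arbitrary reference.

```python
def samples_cleanly(edge_ps, valid_start_ps, valid_end_ps, setup_ps, hold_ps):
    """Check whether a rising clock edge samples stable data.

    The data is assumed stable on [valid_start_ps, valid_end_ps]. The
    flip-flop needs the data stable from setup_ps before the edge until
    hold_ps after the edge. If either bound falls outside the stable
    interval, the sampled value is treated as indeterminate.
    """
    meets_setup = edge_ps - setup_ps >= valid_start_ps
    meets_hold = edge_ps + hold_ps <= valid_end_ps
    return meets_setup and meets_hold


# Assumed data-valid interval of 0..2000 ps, 200 ps setup, 150 ps hold.
print(samples_cleanly(1000, 0, 2000, 200, 150))  # True: edge well inside the window
print(samples_cleanly(1900, 0, 2000, 200, 150))  # False: hold time violated
print(samples_cleanly(100, 0, 2000, 200, 150))   # False: setup time violated
```

The first call corresponds to the well-placed edge of FIG. 2B; the other two show an edge that has drifted late or early, as in FIG. 2C.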
Hence, a significant amount of uncertainty exists regarding the maximum possible speed of the multichip device, which leads to extensive testing to determine "safe" operating margins, device malfunctions, and/or less than optimal device configurations. Device 10 must be engineered such that it retains enough margin to ensure proper operation taking into account performance variables such as process variation, temperature, and the like.
One attempted solution creates individual serial data bit streams out of each channel. This scheme encodes the clock signal directly into the bit stream, recovering the clock signal at the receiver and reconstructing the data word through signal processing techniques. This system requires complex (e.g., expensive) signal processing at the transmitting chip and the receiving chip and is thus generally impractical for printed circuit board type devices.
Another attempted solution performs a complex set of analyses on test signal patterns on each of the channels between the multiple chips. The results of the analysis are used to reconfigure compensation or filter circuits between the chips to account for the respective skew distortion in each channel. One such technique used for multichip devices involves custom configuring the length of the clock signal channel with respect to the data channels. The problem with this solution, in addition to its excessive expense, is that the propagation delays causing the skew are dynamic. As described above, a number of the variables that affect the propagation delay are not constant from device to device (e.g., process variation, temperature, voltage, etc.) and the variables themselves are constantly changing.
Thus, what is required is a method and system that overcome the limitations of prior art source synchronous multichip device implementations. The required solution should accurately and reliably compensate for skew distortion caused by propagation delay (e.g., the clock signal skewing from the proper phase relationship with the data signals). The required solution should realize higher clock speeds for a given multichip implementation than is possible with prior art systems. The required system should minimize the effects of process variation on skew distortion. The required system should not require extensive and complex testing to characterize the propagation delay of the clock signal or excessively interrupt data transmission for channel testing. The present invention provides a novel solution to the above requirements.
The present invention provides a method and system that overcomes the clock skew distortion limitations of prior art source synchronous multichip device implementations. The present invention accurately and reliably compensates for clock signal skew distortion caused by propagation delay. The present invention realizes higher clock speeds for a given multichip implementation than possible with prior art systems. Additionally, the present invention is able to minimize the effects of process variation on skew distortion, does not require extensive and complex testing to characterize the propagation delay of the clock signal, and does not excessively interrupt data transmission for channel testing.
In one embodiment, the present invention comprises a clock edge placement circuit for implementing source synchronous communication between integrated circuit devices. The clock edge placement circuit includes a delay line having an input to receive a clock signal from an external clock source. A corresponding output is included to provide the clock signal to external logic elements. The delay line is adapted to add a propagation delay to the input, wherein the propagation delay is sized such that the phase of the clock signal is adjusted to control synchronous sampling by the external logic elements. The delay line is configured to allow the dynamic adjustment of the propagation delay such that the phase of the clock signal at the output remains adjusted to control synchronous sampling by the external logic elements as variables affecting the phase of the clock signal change over time. A plurality of taps are included within the delay line, wherein each tap is configured to add an incremental delay to the input to generate the variable delay.
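The behavior of a tapped delay line of this kind can be modeled in a few lines of Python. This is a behavioral sketch only, not the circuit of the invention: the class name, the method names, the uniform tap spacing, and the nearest-tap selection rule are all assumptions introduced for illustration, and how the desired delay is measured is outside the scope of the sketch.

```python
class TappedDelayLine:
    """Hypothetical model of a delay line whose taps each add a fixed increment."""

    def __init__(self, num_taps, tap_delay_ps):
        if num_taps < 1 or tap_delay_ps <= 0:
            raise ValueError("need at least one tap and a positive tap delay")
        self.num_taps = num_taps
        self.tap_delay_ps = tap_delay_ps
        self.selected = 0  # index of the tap currently driving the output

    def delay_ps(self):
        """Propagation delay added between input and output."""
        return self.selected * self.tap_delay_ps

    def retune(self, target_delay_ps):
        """Select the tap whose delay is nearest the target, clamped to range."""
        index = (target_delay_ps + self.tap_delay_ps // 2) // self.tap_delay_ps
        self.selected = max(0, min(self.num_taps - 1, index))
        return self.selected


# Assumed geometry: 32 taps of 50 ps each.
line = TappedDelayLine(num_taps=32, tap_delay_ps=50)
line.retune(430)
print(line.delay_ps())   # 450
# The required delay shifts (e.g., with temperature), so the line is retuned.
line.retune(610)
print(line.delay_ps())   # 600
```

Calling retune again whenever the required delay changes stands in for the dynamic adjustment described above; the resolution of the adjustment is limited by the per-tap increment.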
By dynamically adjusting the propagation delay between the input and the output using the taps, the clock edge placement circuit of the present invention overcomes the clock skew distortion limitations of prior art source synchronous multichip device implementations. The present invention accurately and reliably compensates for skew distortion caused by propagation delay within the chip. For example, instead of distributing a common clock signal in parallel among the multiple chips of the multichip device, the clock signal can be transmitted directly, along with the data, from chip to chip with the assurance that the clock edge placement circuit will adjust the phase of the clock signal in accordance with the setup-and-hold times required to maintain reliable synchronous sampling.
In so doing, the clock edge placement circuit allows the integrated circuit devices to realize higher clock speeds for a given multichip implementation than is possible with prior art systems. The dynamic edge placement process of the clock edge placement circuit is able to minimize the effects of process variation on clock skew distortion, does not require extensive and complex testing to characterize the propagation delay of the clock signal, and does not excessively interrupt data transmission for channel testing.