1. Technical Field
The present invention relates to high speed digital communication systems, and more specifically to a high speed serial interface for ASIC to ASIC data transmissions.
2. Background Art
In high speed digital systems, it is often necessary to send digital information between discrete Integrated Circuit (IC) packages. Often, this package to package transfer becomes the critical bottleneck in overall system performance. This is readily apparent in communications systems, where the overall throughput of the system (also referred to as the bandwidth of the system) is a fundamental limitation in determining how successfully the system can perform its intended application. One solution for increasing bandwidth is to design systems with more input/output (I/O) hardware. However, increasing the hardware requirements has a direct impact on cost: the lower the overall bandwidth of a given IC package, the more IC packages become necessary to fulfill the total bandwidth requirement of the system. Thus, a preferred solution is to provide ICs with a greater package I/O bandwidth.
Unfortunately, there is a limit on the available bandwidth of any given IC package I/O because, as data transfer rates (i.e., frequency) between ICs increase, the reliability of the transferred data decreases. Reliability becomes particularly problematic in high speed transfers, where a predictable phase relationship between the transmitted data, the received data, and the clocks on both the transmit and receive sides must be maintained in order to provide a synchronous timebase for the two discrete packages.
With older process technologies, it was fortunate that the package I/O bandwidth was typically greater than the computational bandwidth, i.e., the inherent bandwidth limitation imposed by the operating performance of the process technology. Typical I/O bandwidths approaching 50 million bits per second (Mbs) per package pin were achievable, and were adequate to handle the slower computational bandwidths. Unfortunately, today's modern sub-micron process technologies provide internal ASIC (Application Specific Integrated Circuit) operating performance that results in computational bandwidths that far surpass the package I/O bandwidths available in today's systems. Techniques such as transmitting data on both edges of the clock push achievable bandwidths to 200 Mbs per package pin, but this is still far from what is desired.
There have been several previous attempts at solving the above problem. The most obvious is to increase the number of package I/O connections. The problem with this solution is that the number of package I/O pins needed to match the package to package bandwidth to the internal processing capabilities can lead to a very large pin count, and hence to very expensive packages. For the rare application, this may be acceptable, but for most applications this solution is prohibitively expensive. Other related solutions have centered around developing better package and/or ASIC I/O technology that provides a higher per pin package bandwidth. An example of this technology is the Low Voltage Differential Signaling (LVDS) I/O, which uses, as the name implies, a pair of package pins that switch differentially within a limited peak to peak voltage range. Even though the LVDS I/O requires two package pins per I/O, the currently achievable per package pin bandwidth is 400 Mbs, with 600 Mbs on the near horizon (and 1000 to 1200 Mbs desired within the next couple of years).
The LVDS I/O (and others like it) would seem to solve the package I/O to package I/O bottleneck. However, as the package pin bandwidth is increased, the time duration of each data bit decreases accordingly, since the ideal bit time is one half of the transmitting clock period if data is transmitted on both edges of the clock. Therefore, the time it takes for data to travel from one package to the next becomes a critical factor in the realistically achievable per pin bandwidth. Since the transmitted data is synchronous to its clock domain, the receiving ASIC must be able to acquire the data synchronous to the same clock domain and within a predictable phase-time shift in order to reliably retrieve the data without inserting errors or extracting unnecessary information. At high bit rates (>300 Mbs), one of the major issues to contend with is the overall time it takes to transmit data from one ASIC to another, since this time can become a significant portion of, if not greater than, the ideal duration (in time) of the data bit.
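The relationship between bit rate, bit time, and package to package travel time can be sketched numerically. The following is a minimal illustration only; the function names and the 2000 ps travel time are hypothetical values chosen for the example and do not come from any particular system.

```python
def bit_time_ps(bit_rate_mbs: float) -> float:
    """Ideal duration of one data bit in picoseconds.

    1 Mbs corresponds to one bit per microsecond, i.e. 1e6 ps per bit.
    """
    return 1e6 / bit_rate_mbs


def ddr_clock_mhz(bit_rate_mbs: float) -> float:
    """Clock frequency when data is sent on both clock edges.

    Two bits are sent per clock period, so the bit time is half the period.
    """
    return bit_rate_mbs / 2.0


def flight_time_in_bits(flight_time_ps: float, bit_rate_mbs: float) -> float:
    """Package to package travel time expressed in bit times."""
    return flight_time_ps / bit_time_ps(bit_rate_mbs)


if __name__ == "__main__":
    FLIGHT_PS = 2000.0  # hypothetical travel time, for illustration only
    for rate in (50, 200, 400, 1000):
        print(rate, "Mbs:",
              bit_time_ps(rate), "ps per bit,",
              flight_time_in_bits(FLIGHT_PS, rate), "bit times in flight")
```

Under these assumed numbers, the same travel time that is a small fraction of a bit at 50 Mbs spans more than one bit at 1000 Mbs, which is why the phase of the received data can no longer be ignored.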
There have also been several attempts at solving this problem of maintaining phase coherence between the data transmitted from one ASIC and the same data received at another ASIC when each ASIC has its own clock domain. One solution is to transmit the clock along with the data from the transmitter ASIC to the receiver ASIC, which requires another pair of package pins for the transmitted clock. This results in an achievable per bit bandwidth half that desired unless multiple data bits are transmitted along with one clock. In this manner, the per pin bandwidth consumed by the LVDS I/O pins used for the clock is averaged over the total per pin bandwidth of the associated data to minimize the adverse effect the clock has on the achievable per package pin bandwidth. The problem with this solution is that tight phase coherence must be maintained over all of the LVDS I/O to LVDS I/O connections for the data and the clock. Maintaining this phase coherence for a significant ratio of data connections to clock connections (e.g., >8) is extremely difficult for high bandwidth transmission (e.g., >500 Mbs per data bit=250 Mbs per package pin), since everything in the connection path must have exactly matched lengths.
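The averaging of the clock pins over the data pins can be illustrated with a short calculation. This is a sketch under the assumption that every data bit and every forwarded clock occupies one differential pair (two package pins); the function name and parameters are hypothetical.

```python
def effective_per_pin_mbs(pair_rate_mbs: float,
                          data_pairs: int,
                          clock_pairs: int = 1) -> float:
    """Payload bandwidth per package pin when clock pairs carry no data.

    Each differential pair uses two package pins. Only the data pairs
    carry payload, so the clock pins dilute the per pin figure.
    """
    total_pins = 2 * (data_pairs + clock_pairs)
    return pair_rate_mbs * data_pairs / total_pins


if __name__ == "__main__":
    # 500 Mbs per data bit, as in the example above.
    for n in (1, 2, 4, 8, 16):
        print(n, "data pairs per clock:", effective_per_pin_mbs(500.0, n))
```

With one data pair per clock, the result is half of the 250 Mbs per pin available without a forwarded clock. Adding data pairs moves the figure back toward 250 Mbs, at the cost of more connections that must stay phase matched.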
Another solution has been to “embed” the clock in the data stream by guaranteeing enough data transitions to enable the use of a Phase Locked Loop (PLL) to recover the original clock, and hence the phase of the data received by the receiver ASIC. The problem with this technique is that a unique PLL is required for every data input at the receiver ASIC. Since there is a limit on the number of PLLs allowed per ASIC (typically around four), the number of connections between the transmitter and receiver ASICs is limited to three (one PLL is typically required for the ASIC system clock).
Another type of solution is to pass the data received at the receiver ASIC through a delay chain composed of many delay elements, each having the minimum amount of delay available in the process technology in which the receiver ASIC is implemented. A special sequence of data (called a “training sequence”) is transmitted from the transmitter ASIC. As the receiver ASIC receives the training sequence, the output of each delay element is captured into a storage element. By analyzing this captured output, the receiver ASIC can pick out the delay element output that establishes a relative phase match as close to the center of the data “eye” (i.e., most centered between bit transitions) as possible. This technique has the advantage of eliminating any phase coherency requirements between data inputs at the receiver (since each input is individually phase aligned), but it has a couple of significant drawbacks that limit its usefulness.
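The tap selection described above can be modeled in software as follows. This is a simplified sketch, not a description of any specific prior implementation: it assumes the captured delay element outputs are available as a list of 0/1 values ordered along the chain, and that the training sequence places at least two bit transitions inside the chain. The function names are hypothetical.

```python
from typing import List


def find_transitions(taps: List[int]) -> List[int]:
    """Return each index i where tap i differs from tap i - 1."""
    return [i for i in range(1, len(taps)) if taps[i] != taps[i - 1]]


def pick_center_tap(taps: List[int]) -> int:
    """Pick the tap nearest the middle of the first complete bit.

    A complete bit is bounded by two transitions. With transitions at
    indices a and b, the bit occupies taps a .. b - 1, and the tap
    closest to the center of the data eye is the midpoint of that span.
    """
    edges = find_transitions(taps)
    if len(edges) < 2:
        raise ValueError("fewer than two transitions captured in the chain")
    first, second = edges[0], edges[1]
    return (first + second - 1) // 2


if __name__ == "__main__":
    # Hypothetical capture: one full bit of 0s spans taps 2 through 7.
    captured = [1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
    print(find_transitions(captured))
    print(pick_center_tap(captured))
```

The error case in `pick_center_tap` corresponds to a chain that does not span enough of the data stream to show both edges of a bit.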
The first of these is that the capability of canceling out the absolute phase shift of the data is inherently dependent on the delay chain. Since the per element delay in the delay chain will vary over the permissible process, voltage, and temperature range for the ASIC, the ability of the data delay chain to compensate for absolute phase shift will vary as these three factors change. A second problem is that the solution is inherently frequency dependent. Since more than 2 bits of information must be stored in the delay chain to facilitate detection of bit transitions (two transitions are required), the number of delay elements in the delay chain must be large enough to ensure this even under the conditions which produce the largest per element delay. If a wide range of operational frequency is desired, then the delay chain must be designed to contain these necessary >2 bits at the slowest frequency (where the per bit time is greatest). Once the design is implemented (i.e., fabricated), it cannot be guaranteed to operate below the lower limit of the frequency range for which it was designed. To provide a reasonably large range of operation, the delay chain must therefore contain a very large number of elements.
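The frequency dependence of the chain length can be made concrete with a rough sizing calculation. The sketch below is illustrative only: the function name, the choice of 3 stored bits, and the 50 ps per element delay are hypothetical, and the per element delay passed in should be the value at whichever process, voltage, and temperature corner governs the sizing.

```python
import math


def chain_length(bits_to_hold: float,
                 slowest_rate_mbs: float,
                 element_delay_ps: float) -> int:
    """Number of delay elements needed to hold the given number of bits.

    The chain must span bits_to_hold bit times at the slowest bit rate,
    where each bit time is longest (1e6 / rate picoseconds).
    """
    bit_time_ps = 1e6 / slowest_rate_mbs
    return math.ceil(bits_to_hold * bit_time_ps / element_delay_ps)


if __name__ == "__main__":
    ELEMENT_PS = 50.0  # hypothetical per element delay
    for rate in (1000, 400, 200):
        print(rate, "Mbs lower limit:",
              chain_length(3, rate, ELEMENT_PS), "elements")
```

Under these assumed values, lowering the bottom of the supported frequency range by a factor of five multiplies the required number of elements by the same factor.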
Yet a third problem is that this type of implementation has an inherent limitation in its ability to track transmitter to receiver phase delay drift over time without significantly increasing the number of delay elements. Phase delay drift compensation can only be accomplished by first providing a mechanism for monitoring the movement of data transitions within the delay chain. Subsequent phase delay drift compensation is accomplished by changing the delay element output used to return data to the system (thereby remaining in the center of the data eye). The continued monitoring of phase delay drift requires data transition detection from this new delay element output. Any appreciable amount of phase delay compensation therefore requires significantly more delay elements in the data delay chain. Of course, all of the process, temperature, voltage, and frequency issues discussed above still apply.
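The drift limitation can be seen in a small model of the re-centering step. This is a hypothetical sketch, assuming the captured delay element outputs are a list of 0/1 values and that the receiver re-centers by locating the transitions on either side of the currently selected tap; the function name is illustrative.

```python
from typing import List, Optional


def recenter(taps: List[int], current: int) -> Optional[int]:
    """Return the tap at the center of the bit containing `current`.

    Returns None when a bounding transition is no longer visible in the
    chain, i.e. the drift has exceeded what the chain can monitor.
    """
    # Nearest transition at or before the selected tap (start of the bit).
    left = next((i for i in range(current, 0, -1)
                 if taps[i] != taps[i - 1]), None)
    # Nearest transition after the selected tap (start of the next bit).
    right = next((i for i in range(current + 1, len(taps))
                  if taps[i] != taps[i - 1]), None)
    if left is None or right is None:
        return None
    return (left + right - 1) // 2


if __name__ == "__main__":
    print(recenter([1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1], 4))  # no drift yet
    print(recenter([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1], 4))  # drifted by 2 taps
    print(recenter([1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0], 8))  # edge left the chain
```

In the last call the trailing transition has moved past the end of the chain, so the bit can no longer be bracketed. Keeping both edges visible over a larger drift range requires additional delay elements beyond those needed for the initial alignment.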