As the computer, network, and software industries advance, demand is increasing for high-speed, high-bandwidth inter-chip and inter-chassis interconnections. The bandwidth and latency of data flow are increasingly the limiting factors in system performance. Traditional I/O technologies, such as TTL, GTL, and HSTL, cannot provide the data transfer rates that emerging systems require. Serial interconnects such as fiber-channel do provide Gigabit data rates, but over only one pair of differential wires. The emerging system requirements demand Gigabit data rates for each bit of a multiple-bit parallel channel. High-speed serial channels with clock forwarding operate with the benefit of a dedicated Delay Lock Loop (DLL), or Phase Lock Loop (PLL), for the channel's single bit. For parallel multiple-bit channels, a dedicated DLL per bit would be very expensive. Thus fiber-channel, per se, cannot meet the new bandwidth demands, and it additionally has problematic power consumption and latency.
A typical emerging requirement calls for transfers of the equivalent of 80 bits' worth of data at 200 MHz. Even when the clock is forwarded in parallel with the data (clock forwarding), it is very difficult to predict and match clock timing across all incoming data for such high-speed parallel interconnects. In present systems with only a single clock for multiple parallel data bits, skew between the data bits cannot be tolerated. In prior art systems, the data rate has been limited by transmission error rate requirements to those rates at which the skew inherent in the channel configuration is negligible compared to the width of a bit-time.
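The relationship between skew and bit-time described above can be illustrated with a small Python sketch. The 500 ps skew value and the function name are hypothetical, chosen only for illustration; the text does not state the skew of any particular channel.

```python
def skew_fraction(skew_ps: float, bit_rate_bps: int) -> float:
    """Return inter-bit skew expressed as a fraction of one bit-time."""
    bit_time_ps = 1e12 / bit_rate_bps  # one bit-time in picoseconds
    return skew_ps / bit_time_ps


ASSUMED_SKEW_PS = 500  # hypothetical fixed channel skew, not from the text

# The same physical skew consumes a larger share of the bit-time as the
# per-wire data rate rises.
slow = skew_fraction(ASSUMED_SKEW_PS, 200_000_000)    # 200 Mbit/s per wire
fast = skew_fraction(ASSUMED_SKEW_PS, 1_600_000_000)  # 1.6 Gbit/s per wire
```

With this assumed skew, the skew is one tenth of a bit-time at 200 Mbit/s but four fifths of a bit-time at 1.6 Gbit/s, which is why a single shared sampling clock stops being workable as the rate rises.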
The present invention optimizes clock timing for each received channel bit while using only a single DLL for all received bits. It thus provides the benefits of a dedicated DLL per received channel bit, without the associated cost. (In a preferred embodiment each group of transmitter bits also employs an additional associated PLL.) The present invention permits data rates as high as 1.6 Gbit/s on each pair of differential wires of a multiple-bit parallel channel. This rate is achieved even with skew between data bits as large as two bit-times. In a preferred embodiment, the invention uses an 8-to-1 data serialization circuit in the transmitter to convert 80-bit parallel 200 MHz data to 10-bit parallel 1.6 Gbit/s data. The data is signaled over a multiple-bit parallel channel that uses 10 bits transmit, 10 bits receive, and a forwarded clock in each of two directions. On the receiver side, the serialized data are captured using a forwarded clock and de-serialized. A DLL generates 16 master phases without reference to the word boundaries of the data being transmitted. These 16 unreferenced phases are input to a phase rotator that, via a series of calibration steps, maps the unreferenced phases into named phases, and in doing so references the phases to the word boundary of the data being transmitted over the slowest data line of the parallel channel. The named phases are then input to a data interpolator in each receiver, which generates 16 local phases. The 16 local phases correspond to the data-bit centers and data-bit edges for each of the 8 bits transferred per major channel clock period. In a bit-centering calibration step, a training pattern is evaluated by each receiver, and each data interpolator dynamically adjusts a delay applied to the 16 local phases to establish the local center-data phases in the center of the bits received by the corresponding receiver.
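The rate and phase figures of the preferred embodiment follow from simple arithmetic, sketched here in Python as an illustration. The function and constant names are hypothetical; only the numeric inputs (80 bits, 200 MHz, 8-to-1 serialization, 16 phases) come from the text. The sketch assumes the 16 phases are evenly spaced across one 200 MHz clock period, which is consistent with their description as the centers and edges of the 8 bits per period.

```python
from fractions import Fraction

# Figures taken from the text.
PARALLEL_WIDTH_BITS = 80     # width of the parallel data word
WORD_CLOCK_HZ = 200_000_000  # 200 MHz major channel clock
SERIALIZATION_RATIO = 8      # 8-to-1 serializer
PHASES_PER_PERIOD = 16       # phases per major clock period


def channel_timing() -> dict:
    """Derive per-wire rate and phase spacing (assumes evenly spaced phases)."""
    wires = PARALLEL_WIDTH_BITS // SERIALIZATION_RATIO
    per_wire_bps = WORD_CLOCK_HZ * SERIALIZATION_RATIO
    clock_period_ps = Fraction(10**12, WORD_CLOCK_HZ)
    return {
        "wires": wires,
        "per_wire_bps": per_wire_bps,
        "aggregate_bps": per_wire_bps * wires,
        "clock_period_ps": clock_period_ps,
        "bit_time_ps": clock_period_ps / SERIALIZATION_RATIO,
        "phase_step_ps": clock_period_ps / PHASES_PER_PERIOD,
        # One center phase and one edge phase per serialized bit.
        "phases_per_bit": PHASES_PER_PERIOD // SERIALIZATION_RATIO,
    }


timing = channel_timing()
```

Under these assumptions each of the 10 wires carries 1.6 Gbit/s, one bit-time is 625 ps, and adjacent phases are 312.5 ps apart, so a skew of two bit-times corresponds to 1250 ps.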
In an additional calibration step, on a per-wire basis, 8 contiguous bits are selected as the data outputs from a 10-bit window. The local center-data phases are used to serialize and de-serialize the channel data for the receiver.
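The per-wire window selection can be sketched as follows. This is a minimal illustration, not the circuit described here: the text does not say how the offset is chosen, so the sketch assumes it is found by comparing the three possible 8-bit slices of the 10-bit window against a known training word. The function names and the training word are hypothetical.

```python
WINDOW_BITS = 10  # captured window per wire (from the text)
WORD_BITS = 8     # contiguous bits selected as data outputs (from the text)


def find_window_offset(window, training_word):
    """Return the single offset (0, 1, or 2) at which the training word
    appears in the window, or None if the match is absent or ambiguous.

    Assumption: calibration compares against a known training word.
    """
    if len(window) != WINDOW_BITS or len(training_word) != WORD_BITS:
        raise ValueError("expected a 10-bit window and an 8-bit training word")
    max_offset = WINDOW_BITS - WORD_BITS  # 2, matching two bit-times of skew
    matches = [
        off for off in range(max_offset + 1)
        if window[off:off + WORD_BITS] == training_word
    ]
    return matches[0] if len(matches) == 1 else None


def select_word(window, offset):
    """Pick the 8 contiguous data bits at the calibrated offset."""
    return window[offset:offset + WORD_BITS]


# Hypothetical training word, repeated back to back on a wire that
# arrives one bit-time late relative to the word boundary.
training = [1, 0, 0, 0, 0, 0, 0, 0]
late_window = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]
offset = find_window_offset(late_window, training)
word = select_word(late_window, offset)
```

A 10-bit window leaves exactly three candidate positions for an 8-bit word, which lines up with the stated tolerance of up to two bit-times of skew between wires.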
The present invention finds particular application in the design of the channel interface circuitry for contemporary high-speed multiprocessor systems, such as those disclosed in the applications previously incorporated by reference above.