As computer and other digital systems become more complex and more capable, methods and hardware to enhance the transfer of data between system components or elements continually evolve. Data to be transferred include signals representing data, commands, or any other signals. System components or elements can include different functional hardware blocks on a single integrated circuit (IC), or on different ICs. The different ICs may or may not be on the same printed circuit board (PCB). System components typically include an input/output (I/O) interface specifically designed to receive data from other system components and to transmit data to other system components. Generally speaking, existing I/O interfaces can be categorized into serial and parallel “links”. Regardless of the type of I/O interface, transferred data must be synchronized between system components for proper operation. Synchronization includes accounting for or compensating for several phenomena that potentially cause errors, including signal jitter and signal skew. Sources of these phenomena include differences between component clocks, and physical attributes of the data paths that create noise and degrade the integrity of the transferred signal. Current approaches to serial I/O and parallel I/O interfaces address these data synchronization issues, but have limitations.
A typical serial link embeds clock information within the data stream and extracts the clock information at the receiver using a clock recovery scheme. Such schemes are also known as per-line closed-loop timing. Clock recovery depends on the data stream containing enough signal transitions, and guaranteeing this transition density requires encoding the data, typically using 8B/10B codes. A disadvantage of this approach is that the encoding adds bandwidth overhead (an 8B/10B code transmits 10 bits for every 8 bits of data) and increases complexity, which hurts performance and increases cost.
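The trade-off between transition density and bandwidth overhead can be illustrated with a minimal sketch. The function names `longest_run` and `payload_fraction` are hypothetical and chosen only for illustration; the sketch does not implement an 8B/10B encoder, it only measures the two quantities discussed above.

```python
from itertools import groupby

def longest_run(bits):
    """Length of the longest run of identical consecutive bits.

    A long run means no transitions for the receiver's clock
    recovery circuit to lock onto.
    """
    return max((len(list(group)) for _, group in groupby(bits)), default=0)

def payload_fraction(data_bits, line_bits):
    """Fraction of the line rate that carries payload data."""
    return data_bits / line_bits

# Two unencoded all-zero bytes: 16 bit times with no transition at all.
raw = [0] * 16
print(longest_run(raw))          # 16

# An 8B/10B code sends 10 line bits per 8 data bits.
print(payload_fraction(8, 10))   # 0.8
```

Under these assumptions, an unencoded stream can go arbitrarily long without a transition, while an encoded stream bounds the run length at the price of spending one fifth of the line rate on coding.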
A typical parallel link sends a clock signal, or strobe, with a group of N data signals (for example, N may be 8 in a double data rate dynamic random access memory (DDR DRAM)). Depending on the data rate and the level of sophistication required, one of the following “source-synchronous timing” methods is used: if the transmitter has already shifted the strobe by half a bit time relative to the data, the receiver simply samples the data with the strobe directly; or, if the strobe is aligned with the edge of the data sent by the transmitter, the receiver delays the strobe by the same fixed amount across the group of data to sample the data eye at the nominal center.
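The two source-synchronous methods differ only in where the half-bit-time shift is applied. The following is a minimal sketch of the receiver-side delay, assuming an ideal link with no skew; the function names and the 1600 Mb/s per-pin data rate are illustrative assumptions, not values taken from any particular device.

```python
def bit_time_ps(data_rate_mbps):
    """Duration of one bit in picoseconds for a per-pin data rate in Mb/s."""
    return 1e6 / data_rate_mbps

def receiver_strobe_delay_ps(data_rate_mbps, strobe_pre_shifted):
    """Delay the receiver applies to the strobe before sampling.

    If the transmitter already shifted the strobe by half a bit time,
    the receiver samples with it directly (zero added delay). If the
    strobe is edge-aligned with the data, the receiver delays it by
    half a bit time to reach the nominal center of the data eye.
    """
    if strobe_pre_shifted:
        return 0.0
    return bit_time_ps(data_rate_mbps) / 2

print(receiver_strobe_delay_ps(1600, strobe_pre_shifted=True))   # 0.0
print(receiver_strobe_delay_ps(1600, strobe_pre_shifted=False))  # 312.5
```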
Each of the two parallel link approaches requires very tight matching of the trace impedance and trace length across the group of data and the strobe to achieve high data rates. To relax this matching requirement, each bit receiver can delay the strobe by a different amount to place its own clock at the center of its own data. This is sometimes called per-bit deskew. A disadvantage of this parallel scheme is that the strobe (which is usually sent across a circuit board and distributed to the group of data) is noisy, thus reducing the system timing budget. In addition, the receiver simply uses or delays the strobe, which adds jitter rather than filtering jitter. In some implementations, a strobe is transmitted for each data bit rather than for a group of bits, which increases pin counts and cost.
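The benefit of per-bit deskew over a single shared strobe delay can be seen in a simple timing model. This is a sketch under stated assumptions: the strobe edge arrives at time zero, each data bit's edge arrives at its own skew relative to the strobe, and noise and jitter are ignored. The function name and the skew values are hypothetical.

```python
def sampling_errors_ps(bit_skews_ps, strobe_delays_ps, bit_time_ps):
    """Offset of each sampling instant from the center of its own data eye.

    Bit i's eye is centered at skew_i + bit_time / 2, and it is sampled
    at strobe_delays_ps[i]. Zero means the sample lands at eye center.
    """
    return [
        delay - (skew + bit_time_ps / 2)
        for skew, delay in zip(bit_skews_ps, strobe_delays_ps)
    ]

bit_time = 625.0                    # illustrative bit time in ps
skews = [-40.0, 0.0, 25.0, 60.0]    # illustrative per-bit trace skews in ps

# Shared fixed delay: every bit receiver uses half a bit time.
shared = [bit_time / 2] * len(skews)
print(sampling_errors_ps(skews, shared, bit_time))    # [40.0, 0.0, -25.0, -60.0]

# Per-bit deskew: each bit receiver adds its own skew to the delay.
per_bit = [skew + bit_time / 2 for skew in skews]
print(sampling_errors_ps(skews, per_bit, bit_time))   # [0.0, 0.0, 0.0, 0.0]
```

In this model the shared delay leaves each bit off-center by exactly its trace skew, which is why tight trace matching is needed, whereas per-bit delays remove the static skew. The model does not capture the strobe noise and jitter described above, which per-bit deskew does not remove.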