Modern data processing systems require the rapid transfer of data between integrated circuits (“chips”). For example, a central processing unit (CPU) transfers data to the memory system, which may include a memory controller and off-chip cache. In a multi-CPU system, data may be transferred between CPUs. As CPU speeds increase, the speed of the interface between chips (bus cycle time) becomes a limiting constraint because latencies across the interfaces may exceed the system clock period.
When data is launched from one chip to another, it can be launched simultaneously within numerous clock/data groups. Each clock/data group consists of multiple data bits and a clock signal, each of which travels over an individual conductor. Due to process variations and varying conductor lengths, the individual bits within a clock/data group may arrive at the receiving chip at different times. Therefore, the individual data bits and the clock within a clock/data group must be realigned upon arrival at the receiving chip. At the receiving end, the clock/data signals can be delayed to align them with respect to a sampling edge of the received clock. Although the individual data bits within a clock/data group must be aligned at the receiving end, such delays can cause jitter and other forms of distortion. Delaying data signals can also require extensive administrative overhead and additional circuitry.
In order to process a plurality of skewed data bits, some systems employ an elastic interface. Some elastic interfaces incorporate a per-bit de-skew mechanism in which the slowest (latest-arriving) bit in a clock group is identified and delay is added to all earlier bits so that they become just as late as the latest bit. The sampling clock is then delayed so that it is centered on the de-skewed data bits' “data eye” or “data window.” This de-skew method therefore requires extensive state machine-based overhead to identify the latest bit, and also requires data delay lines that are long enough to cover the maximum skew between the earliest and latest bits in the clock group. It then requires more state machine-based overhead to identify the edges of the data eye, along with overhead for calculating a clock delay value that centers the clock on that eye. Furthermore, the clock edge that launches the data at the driver is also the clock edge that captures the data at the receiver, and as such, tight control of the relationship between the clock and data paths is required to achieve optimal performance.
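The per-bit de-skew steps described above can be illustrated with a minimal sketch. It is a simplified software model, not the circuitry of any particular elastic interface: the function name `deskew`, the representation of arrival times as picosecond offsets from the undelayed received clock edge, and the assumption that the data eye spans exactly one bit time after alignment are all hypothetical choices made for illustration.

```python
def deskew(arrival_ps, bit_time_ps):
    """Model per-bit de-skew for one clock/data group.

    arrival_ps  -- hypothetical arrival offset of each data bit, in picoseconds,
                   measured from the undelayed received clock edge
    bit_time_ps -- assumed width of the data eye after alignment
    """
    # Step 1: identify the slowest (latest-arriving) bit.
    latest = max(arrival_ps)

    # Step 2: add delay to every earlier bit so it becomes as late as the latest bit.
    added_delay = [latest - t for t in arrival_ps]

    # The delay lines must cover the worst-case skew between earliest and latest bit.
    required_delay_line = latest - min(arrival_ps)

    # Step 3: delay the sampling clock to the center of the aligned data eye,
    # assumed here to span [latest, latest + bit_time_ps].
    clock_delay = latest + bit_time_ps / 2

    return added_delay, required_delay_line, clock_delay


added, line_len, clk = deskew([120, 45, 80, 10], bit_time_ps=400)
print(added, line_len, clk)
```

In this model the delay line length grows with the maximum skew in the group, which mirrors the overhead concern raised above.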
With many elastic interface designs, a double data rate (DDR) signaling method is employed, with the “even” data beats launched on the rising clock edge and the “odd” data beats launched on the falling clock edge. This scheme gives rise to different alignment situations. When the data arrives at the receiver, each bit may have a different amount of delay, and if a bit is de-skewed to the closest clock edge, its even data beats may be sampled on either a rising or a falling edge (whichever is closest). Similarly, odd data beats might be de-skewed and sampled on either rising or falling clock edges. These de-skew/sampling situations give rise to different methods of fully aligning all the bits on the bus to optimize different performance aspects.
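As a rough illustration of why a beat can land on either edge polarity, the following sketch picks the clock edge closest to a delayed beat. It assumes an idealized clock with a 50% duty cycle, so that edges occur every half period; the function name `nearest_edge` and its parameters are hypothetical and not taken from any specific design.

```python
import math


def nearest_edge(beat, delay_ps, clock_period_ps):
    """Return the polarity of the clock edge closest to a delayed data beat.

    beat            -- "even" (launched on a rising edge) or "odd" (falling edge)
    delay_ps        -- hypothetical delay of the bit relative to its launch edge
    clock_period_ps -- full clock period; edges are assumed to occur every half period
    """
    half_period = clock_period_ps / 2

    # Number of half periods between the launch edge and the closest edge
    # (floor(x + 0.5) rounds ties upward and avoids banker's rounding).
    steps = math.floor(delay_ps / half_period + 0.5)

    # An even number of half periods lands on an edge of the launch polarity;
    # an odd number lands on the opposite polarity.
    launched_on_rising = beat == "even"
    same_polarity = steps % 2 == 0
    return "rising" if launched_on_rising == same_polarity else "falling"


for beat, delay in [("even", 100), ("even", 600), ("odd", 100), ("odd", 600)]:
    print(beat, delay, nearest_edge(beat, delay, clock_period_ps=1000))
```

With a 1000 ps period, an even beat delayed by 100 ps is closest to a rising edge, while the same beat delayed by 600 ps is closest to a falling edge; odd beats show the mirrored behavior.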
Thus, there is a need in the art for methods and apparatuses that enable choosing from among multiple alignment modes in elastic interface systems.