Modern data processing systems require the rapid transfer of data between integrated circuits (“chips”). For example, a central processing unit (CPU) transfers data to the memory system, which may include a memory controller and off-chip cache. In a multi-CPU system, data may be transferred between CPUs. As CPU speeds increase, the speed of the interface between chips (bus cycle time) becomes a limiting constraint because latencies across the interfaces may exceed the system clock period.
When data is launched from one chip to another chip, it can be launched simultaneously within numerous clock/data groups. Each clock/data group consists of multiple data bits and a clock signal, each of which travels over an individual conductor. Due to process variations and varying conductor lengths, the individual bits within a clock/data group may arrive at the receiving chip at different instants. Therefore, the individual bits of data and the clock within a clock/data group must be realigned upon arrival at the receiving chip. At the receiving end, the clock/data signals can be delayed to align them with respect to a sampling edge of the received clock. Although aligning the individual data bits within a clock/data group at the receiving end is necessary, such delays can cause jitter and other forms of distortion. Delaying data signals can also require extensive administrative overhead and additional circuitry.
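The receive-side alignment described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each signal's arrival time is known in picoseconds and that every signal is delayed to match the latest arrival; the function name `deskew_delays`, the signal names, and the timing values are hypothetical and are not taken from any particular interface.

```python
def deskew_delays(arrival_ps):
    """Return the delay (in ps) to add to each signal so that all
    signals line up with the latest-arriving one (an assumed policy)."""
    latest = max(arrival_ps.values())
    return {name: latest - t for name, t in arrival_ps.items()}


# Hypothetical arrival times for one clock/data group, in picoseconds.
arrivals = {"bit0": 120, "bit1": 95, "bit2": 140, "clk": 110}
delays = deskew_delays(arrivals)

# After the delays are applied, every signal lands at the same instant.
aligned = {name: arrivals[name] + delays[name] for name in arrivals}
print(delays)
print(aligned)
```

In this sketch the earliest bit receives the largest added delay, which suggests why skewed groups need more delay circuitry and why that added delay is a potential source of jitter.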
On many high-speed communication interfaces, the transmitting entity must send a “training” pattern to the receiving entity so that the receiver can properly align and synchronize with the driver. This training pattern may be a repeating multi-beat pattern consisting of a single “one” followed by n “zeroes”, where n is a function of the receiver’s FIFO depth (typically 3 or 7 for a 4-bit deep FIFO, and 7 or 15 for an 8-bit deep FIFO). For many elastic interfaces, such training patterns are used to align (de-skew) the interface and to estimate the driver-to-receiver latency (also commonly referred to as the “target time” or “target cycle”). Although simple, this method of sending training patterns has some disadvantages. First, the patterns have very few data transitions, even though those transitions, which denote the edges of the data eyes, are what is used to align the bus. Second, because there is only a single “one” in a field of “zeroes”, various circuit and transmission-line effects can distort and narrow the lone “one” pulse.
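The scarcity of transitions in such a pattern can be made concrete with a short sketch. It builds a pattern of a single one followed by n zeroes, repeats it, and counts the bit-to-bit transitions, comparing the result against a simple alternating pattern. The helper names `training_pattern` and `count_transitions`, and the choice of four repeats, are assumptions made for illustration only.

```python
def training_pattern(n_zeros, repeats):
    """Return `repeats` copies of a single 1 followed by n_zeros 0s."""
    return ([1] + [0] * n_zeros) * repeats


def count_transitions(bits):
    """Count positions where a bit differs from the bit before it."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


# n = 3, as might be used with a 4-bit deep FIFO.
pattern = training_pattern(3, 4)
alternating = [1, 0] * (len(pattern) // 2)

print(pattern)
print(count_transitions(pattern), "transitions in", len(pattern), "beats")
print(count_transitions(alternating), "transitions in an alternating pattern")
```

Each repetition contributes only one rising and one falling edge regardless of n, so a larger n yields proportionally fewer edges per beat for the receiver to use in alignment.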
Some elastic interface designs require the bus to first be quiesced so that a training pattern can be transmitted while the bus is idled from its functional state. Such quiescing of the bus can require a substantial amount of administrative overhead and thereby cause a loss of performance. Further, such systems often require dedicated, non-elastic, between-chip wires that tend to introduce many bugs related to stopping and restarting interfaces. Therefore, there is a need for methods and apparatus that overcome such noise problems associated with training patterns at high speeds.