Source-synchronous communication standards are important for enabling high-speed data transfer between devices. Board skew and delay variation make it challenging to complete a synchronous transfer with a single central board clock, or even with a single clock forwarded along with a large number of data bits. Consequently, a large data bus is typically divided into small groups of bits, and a clock or strobe associated with each group is forwarded along with that group's data. The underlying assumption is that any board skew or delay variation affects the clock or strobe and the data bits of a group alike, so that the clock or strobe can reliably be used to capture the corresponding data.
One issue with this approach is that data synchronized to many different clocks or strobes must often be resynchronized to a single clock in the receiving device so that all of the received data can be processed together. A few approaches to this are known for programmable logic devices (PLDs) and, more specifically, for field-programmable gate arrays (FPGAs).
Run-time controllable delay chains may be used on the input data paths to delay the data as needed so that it can be captured reliably by a single clock in the receiving device. This requires determining the phase relationship between the incoming data and the receiving device's clock. The relationship can be determined on a per-group basis (data bits and their associated clock/strobe) by sampling different delayed versions of the clock/strobe with the receiving device's clock. Using that information, the data can be delayed appropriately for reliable capture. The disadvantage of this approach is the complexity of the hardware needed to support dynamic delay calibration, which adjusts the delays to track process, voltage, and temperature (PVT) variations. The controller logic can also become more complex, because it must keep data capture reliable and ensure that all the data remains aligned.
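The calibration step described above can be sketched as a behavioral model. The Python code below is a minimal sketch under stated assumptions, not the circuit of any particular device: the strobe and the receiving clock are assumed to share one period, the strobe is assumed to have a 50% duty cycle with data transitioning on its rising edges, and the delay chain is assumed to have uniform taps that together span at least one period. All names (`Link`, `sample_taps`, `estimate_phase`, `data_delay_taps`) and numbers are hypothetical. Each tap's delayed strobe is sampled at the receiving clock edge; the first 1-to-0 step in the sample vector locates the strobe's rising edge to within one tap, and the data delay is then chosen to move data transitions roughly half a period away from the sampling edge.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical timing model of one data group (all times in integer picoseconds)."""
    period_ps: int        # shared period T of the strobe and the receiving clock
    strobe_phase_ps: int  # strobe rising-edge offset phi in [0, T); unknown to the receiver


def strobe_level(link: Link, t_ps: int) -> int:
    """Undelayed strobe level at time t: high for the first half of each cycle after phi."""
    return 1 if (t_ps - link.strobe_phase_ps) % link.period_ps < link.period_ps // 2 else 0


def sample_taps(link: Link, tap_ps: int, num_taps: int) -> list[int]:
    """Sample every delay-chain tap with the receiving clock edge at t = 0.

    Tap k delays the strobe by k * tap_ps, so its level at t = 0 equals the
    undelayed strobe level at t = -k * tap_ps.
    """
    return [strobe_level(link, -k * tap_ps) for k in range(num_taps)]


def estimate_phase(samples: list[int], tap_ps: int, period_ps: int) -> int:
    """Estimate phi from the first 1 -> 0 step in the tap samples.

    A 1 -> 0 step between taps k-1 and k means a strobe rising edge, delayed by
    between (k-1)*tap_ps and k*tap_ps, lines up with the receiving clock edge,
    so phi lies in the one-tap interval just above (-k * tap_ps) mod T.
    """
    for k in range(1, len(samples)):
        if samples[k - 1] == 1 and samples[k] == 0:
            return (-k * tap_ps) % period_ps
    raise ValueError("delay chain does not span a strobe rising edge")


def data_delay_taps(phase_ps: int, tap_ps: int, period_ps: int) -> int:
    """Taps of data delay that move data transitions from phi to about T/2,
    centering the data eye on the receiving clock edge at t = 0."""
    delay_ps = (period_ps // 2 - phase_ps) % period_ps
    return round(delay_ps / tap_ps)


if __name__ == "__main__":
    link = Link(period_ps=1000, strobe_phase_ps=300)
    samples = sample_taps(link, tap_ps=25, num_taps=48)
    phi = estimate_phase(samples, tap_ps=25, period_ps=1000)
    print(phi, data_delay_taps(phi, tap_ps=25, period_ps=1000))  # prints "275 9"
```

Because the estimate can be up to one tap below the true phase, a finer tap size tightens the centering; the hardware cost the paragraph above mentions comes from re-running this search as PVT conditions drift.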
In another approach, the clock within the receiving device is adjusted so that data can be transferred reliably and directly from the clock/strobe domains into the receiving device's clock domain. This approach may be combined with circuitry in the I/O periphery of an FPGA that captures the data using the strobe and deserializes it, so that the data remains synchronous to the strobe but toggles at a more manageable frequency (which is desirable because FPGA core logic typically runs slower than the logic of comparable ASICs). The lower-frequency data is then resynchronized to the receiving device's clock domain. A disadvantage of this approach is that, at high speeds, it can be difficult, if not impossible, to find a single clock phase within the receiving device that suits all the clock/strobe domains.
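The difficulty of finding a single receive-clock phase can be illustrated with a small model. The sketch below assumes, hypothetically, that each group's deserialized data offers a valid capture window of fixed width that starts at that group's skew, measured as a phase within the receiving clock period, and that a phase is usable only if it lies inside every group's window. The names `windows_for` and `common_phases`, the skew values, and the lumped 400 ps timing uncertainty are illustrative assumptions, not figures from any device. With the same absolute skews, a common window that exists at a long period disappears when the period shrinks.

```python
def windows_for(skews_ps: list[int], eye_ps: int) -> list[tuple[int, int]]:
    """One circular (start, width) capture window per group; start = group skew."""
    return [(skew, eye_ps) for skew in skews_ps]


def common_phases(period_ps: int, windows: list[tuple[int, int]]) -> list[int]:
    """Receive-clock phases (1 ps steps, modulo the period) inside every group's window."""
    def inside(phase: int, start: int, width: int) -> bool:
        # Circular membership test: a window may wrap past the end of the period.
        return (phase - start) % period_ps < width

    return [p for p in range(period_ps)
            if all(inside(p, start, width) for start, width in windows)]


if __name__ == "__main__":
    SKEWS_PS = [0, 300, 700]  # hypothetical per-group skews
    UNCERTAINTY_PS = 400      # hypothetical lumped setup/hold/jitter budget

    for period in (4000, 1000):
        ok = common_phases(period, windows_for(SKEWS_PS, period - UNCERTAINTY_PS))
        if ok:
            print(f"T={period} ps: {len(ok)} usable phases, {ok[0]}..{ok[-1]} ps")
        else:
            print(f"T={period} ps: no single phase suits all groups")
```

With these assumed numbers, the 4000 ps case leaves phases 700 through 3599 ps usable, while the 1000 ps case leaves none, because the fixed skew spread now exceeds the shrunken per-group window.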