In source-synchronous designs, a receiving device receives from a transmitting device both data and a clock characterizing the timing of the data. Examples of such devices include double data rate (“DDR”) synchronous dynamic random access memory (“SDRAM”), quad data rate (“QDR”) static random access memory (“SRAM”), reduced latency DRAM (“RLDRAM”), and so forth. Memory controllers designed to interface with such memories must likewise implement a source-synchronous protocol in order to properly read and write data. Various non-memory devices may also implement source-synchronous designs.
In a typical read scenario for a source-synchronous memory, a read clock that accompanies read data from the memory needs to be forwarded to the data sampling flip-flops of the memory controller's physical interface (e.g., a transceiver or “PHY”). In the case of DDR3 and DDR4 SDRAMs, there may be one clock (“DQS”) per 4 bits of data (“DQ”) or per 8 bits of data. In the case of QDRII SRAMs, there may be 36 bits of data (“Q”) per echo clock (“CQ”, “CQ#”) pair. As interfaces grow wider, prior clock forwarding solutions suffer from excessive intrinsic delay on the read clock path. Significant power-noise-induced jitter can also accrue on the clock path, effectively reducing the maximum achievable data rate for wide interfaces. Prior solutions also suffer from skew across the wider interfaces. Thus, as the width-to-clock ratio of a memory interface increases, the maximum data rate of the interface generally drops.
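By way of illustration only, the width-to-clock ratios mentioned above can be tabulated with a short sketch. The dictionary labels, the helper name `clocks_needed`, and the 72-bit interface width in the usage loop are assumptions introduced for this example and do not come from any memory standard or from a particular controller design.

```python
# Data bits captured per forwarded read clock (or clock pair), as stated in the text.
# The labels are illustrative names chosen for this sketch.
DATA_BITS_PER_CLOCK = {
    "DDR3/DDR4, 4 DQ per DQS": 4,
    "DDR3/DDR4, 8 DQ per DQS": 8,
    "QDRII, 36 Q per CQ/CQ# pair": 36,
}


def clocks_needed(interface_width_bits: int, data_bits_per_clock: int) -> int:
    """Return how many read clocks must be forwarded for the given data width.

    Uses ceiling division so that a partially filled group still gets a clock.
    """
    if interface_width_bits <= 0 or data_bits_per_clock <= 0:
        raise ValueError("widths must be positive")
    return -(-interface_width_bits // data_bits_per_clock)


if __name__ == "__main__":
    width = 72  # hypothetical interface width, chosen only for illustration
    for label, ratio in DATA_BITS_PER_CLOCK.items():
        print(f"{label}: {clocks_needed(width, ratio)} clock(s) for {width} data bits")
```

A larger number of data bits per clock means that each forwarded clock must reach more sampling flip-flops, which is the condition under which the delay, jitter, and skew concerns described above become more pronounced.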