In source synchronous systems, the data source transmits the data over a data path along with the clock over a clock path to an endpoint. If the clock and data paths are electrically matched (e.g., same wire lengths on paths with the same impedance), whatever delay is incurred by the data signal as it propagates on the data path is matched by the delay incurred by the clock signal as it propagates on the clock path. In this fashion, the clock and data are synchronous at the destination despite any path timing distortion during propagation. Compared to distributed clock-tree techniques, source synchronous transmission typically supports much higher data speeds. Source synchronous data transmission is thus a popular technique for data transmission on a wide variety of microprocessors and systems-on-a-chip (SOCs).
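By way of illustration only, the cancellation described above may be sketched numerically. The function names and the picosecond delay values below are hypothetical and are chosen merely to show that equal propagation delays on matched paths leave no residual skew at the endpoint, whereas a mismatch appears directly as skew.

```python
def arrival_times(launch_ps, data_delay_ps, clock_delay_ps):
    """Return (data_arrival, clock_arrival) in picoseconds.

    Both signals are assumed to be launched at the same instant by the source.
    """
    return launch_ps + data_delay_ps, launch_ps + clock_delay_ps


def skew_ps(data_delay_ps, clock_delay_ps, launch_ps=0.0):
    """Skew seen at the endpoint: data arrival minus clock arrival."""
    data_arrival, clock_arrival = arrival_times(
        launch_ps, data_delay_ps, clock_delay_ps
    )
    return data_arrival - clock_arrival


if __name__ == "__main__":
    # Assumed values: matched paths share the same 350 ps propagation delay.
    print("matched paths skew (ps):", skew_ps(350.0, 350.0))
    # Assumed values: the data path is 30 ps slower than the clock path.
    print("mismatched paths skew (ps):", skew_ps(380.0, 350.0))
```

In this simplified model the absolute path delay is irrelevant to the endpoint; only the difference between the data delay and the clock delay matters.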
As the complexity of SOCs increases, more and more sub-components are included that may all be source-synchronous endpoints. The synchronization of so many endpoints becomes very challenging, particularly as data transmission rates are increased. FIG. 1 illustrates a conventional SOC 100 including a memory controller 105 for transmitting source synchronous data to a plurality of endpoints, which may include physical interface modules for communicating with an external memory. For illustration clarity, only a first endpoint 115 and a final nth endpoint 120 are shown in FIG. 1. Each of the n endpoints receives a data signal and clock signal pair over respective data and clock paths from memory controller 105. In that regard, memory controller 105 interfaces with or includes a clock source such as a PLL 110 that clocks data registers within memory controller 105 and corresponding data registers in the endpoints. The clock and data paths between memory controller 105 and the endpoints are all electrically matched to each other. For example, the path lengths are the same, the path impedances match, and so on. Such matching will in general keep the received clock and data pair synchronous with each other across all the endpoints. However, the distance between memory controller 105 and the endpoints may be relatively large, such as several millimeters. To maintain the signal strength over such relatively long propagation paths, each of the data and clock paths may include a plurality of buffers. The buffering and the relatively long propagation paths, together with the inevitable temperature, voltage, and process variations across the die, may cause the clock and data signals in each pair to become skewed relative to each other and thus no longer phase aligned when received at the various endpoints, despite the electrical matching of the clock and data paths. The skew between the data and clock signals then becomes worse as they are launched from the endpoints to an external memory.
At relatively low data rates, the skew may be tolerable in that it is small compared to the relatively long clock period at such data rates. However, as the data rate goes ever higher, the clock period shrinks while the skew does not, and the source synchronous transmission shown in FIG. 1 may become untenable due to the errors caused by skew, jitter, and duty cycle distortion.
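As a rough illustration of this scaling, the following sketch expresses a fixed skew as a fraction of the clock period at several clock frequencies. The 50 ps skew and the listed frequencies are assumed values chosen for the example and are not taken from FIG. 1; the function names are likewise hypothetical.

```python
def clock_period_ps(freq_mhz):
    """Clock period in picoseconds for a clock frequency given in MHz."""
    return 1e6 / freq_mhz


def skew_fraction(skew_ps, freq_mhz):
    """Fraction of one clock period that is consumed by the given skew."""
    return skew_ps / clock_period_ps(freq_mhz)


if __name__ == "__main__":
    assumed_skew_ps = 50.0  # hypothetical clock-to-data skew at an endpoint
    for freq_mhz in (200.0, 800.0, 2000.0):  # hypothetical clock frequencies
        fraction = skew_fraction(assumed_skew_ps, freq_mhz)
        print(
            f"{freq_mhz:6.0f} MHz: period {clock_period_ps(freq_mhz):7.1f} ps, "
            f"skew uses {fraction:.1%} of the period"
        )
```

Under these assumptions the same skew that occupies a small portion of the period at the lowest frequency occupies a tenfold larger portion at the highest, before any allowance is made for jitter or duty cycle distortion.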
Accordingly, there is a need in the art for improved clock distribution architectures for systems with multiple source-synchronous endpoints.