Enabled by sub-micrometer semiconductor device fabrication technologies, multiprocessor systems-on-chip (MPSoCs) have become a key component of modern communication and computation systems. Lately, the trend to integrate more and more processing cores on a single silicon die has gained momentum, driven by promised benefits in mechanical footprint, computational performance, energy efficiency and cost efficiency. Increasing the number of cores translates into higher performance through parallel processing and into higher efficiency compared to single-core solutions.
Nowadays, hundreds of thousands of cores are integrated on a single chip. To ensure a stable and well-defined system, a common synchronization strategy is to clock the processing blocks separately. Globally asynchronous locally synchronous (GALS) clocking yields a simplified clock tree and allows on-chip clock generation, which minimizes the number of required I/O pins. Hence, the clock frequencies and supply voltages within a heterogeneous MPSoC can be adjusted dynamically per core. However, the flexibility, scalability and other benefits of the GALS clocking technique come with performance penalties caused by additional communication latencies between disjoint clock domains. These latencies are the bottleneck of the GALS approach.
In contrast, high performance microprocessors use a globally synchronous design as shown in FIG. 1, in which all cores (11) of a clocking network (13) share one master clock (12). The communication latencies between cores are drastically reduced compared to GALS clocking. In next generation MPSoCs, however, a very large chip area has to be clocked synchronously. With a master clock based clock tree as in FIG. 1, the clock signals within an MPSoC have to be transmitted over distances of several millimeters, which is a well-known bottleneck for speed, power and reliability. Furthermore, traditional globally synchronous clocking has become too difficult to implement for large MPSoCs with many cores, constantly growing chip size and wire-induced delays. In addition, clock trees consume a significant amount of power, which is critical for mobile communication systems.
Both clocking techniques, GALS and the globally synchronous design, reach their limits in large-scale networks such as massive Multiple-Input Multiple-Output (MIMO) systems and MPSoCs.
Another strategy for network synchronization and clock distribution relates to self-organized synchronization of distributed network nodes in the absence of an entraining master clock.
“Mutually connected phase-locked loop networks: dynamical models and design parameters” by F. M. Orsatti, R. Carareto, J. R. C. Piqueira, IET Circuit Devices Syst., 2008, Vol. 2, No. 6, pp. 495-508 relates to distributing clock signals by using mutually connected architectures instead of master-slave type architectures. A mathematical model of mutually connected digital PLL networks is studied numerically, with the class of phase detectors restricted to JK flip flop phase detectors and charge-pump phase detectors. With the setup described in Orsatti et al., it is impossible to build a mutually connected network with three or more nodes by using XOR phase detectors (PDs). Additionally, signal transmission times are explicitly neglected. Conditions for the existence of synchronized states are derived, depending on individual node parameters and network connectivity, considering that the nodes are nonlinear oscillators with nonlinear coupling conditions.
“Multiple synchronous states in static delay-free mutually connected PLL networks” by F. M. Orsatti, R. Carareto, J. R. C. Piqueira, Signal Processing 90 (2010) 2072-2082 relates to mutually connected networks of digital phase-locked loops. A mathematical model of mutually connected digital PLL networks is studied numerically, with the class of phase detectors restricted to JK flip flop phase detectors. Even for static networks without delays, different synchronous states may exist for the network.
However, these papers deal with networks for which a time delay between oscillators is not present or negligible. Moreover, in both papers, the class of phase detectors is restricted to JK flip flop and/or charge-pump phase detectors. Hence, the solution presented there does not include networks with different types of phase detectors and cannot be applied to networks exhibiting a significant time delay between network nodes.
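Why a non-negligible time delay matters can be illustrated with a minimal numerical sketch. The sketch below assumes a generic Kuramoto-type phase model of two identical, mutually coupled oscillators with a transmission delay; it is not the digital PLL model of the cited papers, and the function name `simulate_pair` as well as all parameter values (natural frequency, coupling strength, initial phase offset, step size) are illustrative assumptions.

```python
import math


def simulate_pair(tau, omega=2.0 * math.pi, k=0.5, dphi0=0.3,
                  t_end=50.0, dt=0.001):
    """Integrate two delay-coupled phase oscillators with explicit Euler steps.

    Assumed model (illustrative, not taken from the cited papers):
        d(theta_i)/dt = omega + k * sin(theta_j(t - tau) - theta_i(t))
    Returns the final phase difference wrapped to the interval [0, pi].
    """
    d = int(round(tau / dt))      # transmission delay in integration steps
    n = int(round(t_end / dt))    # number of integration steps
    # History for t in [-tau, 0]: both oscillators run freely at omega,
    # with the second one offset by dphi0.
    th1 = [omega * (i - d) * dt for i in range(d + 1)]
    th2 = [omega * (i - d) * dt + dphi0 for i in range(d + 1)]
    for _ in range(n):
        a_now, b_now = th1[-1], th2[-1]
        a_del, b_del = th1[-1 - d], th2[-1 - d]   # delayed phases
        th1.append(a_now + dt * (omega + k * math.sin(b_del - a_now)))
        th2.append(b_now + dt * (omega + k * math.sin(a_del - b_now)))
    diff = (th2[-1] - th1[-1]) % (2.0 * math.pi)
    return min(diff, 2.0 * math.pi - diff)


if __name__ == "__main__":
    period = 1.0  # oscillator period for omega = 2*pi
    print("phase difference, no delay:        ", simulate_pair(0.0))
    print("phase difference, half-period delay:", simulate_pair(0.5 * period))
```

With these illustrative parameters, the delay-free pair settles in phase, whereas a delay of half an oscillation period drives the same pair to an anti-phase state. A delay-free model therefore cannot predict the synchronized state of a network in which the delay is a significant fraction of the period.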
WO 2013/178237 A1 relates to a communication network of interconnected communication nodes, each node comprising an oscillator that is mutually coupled to oscillators of other communication nodes. The oscillator generates periodic synchronization pulses. The communication node further comprises a transmitter for transmitting the synchronization pulses to other communication nodes; a receiver for receiving synchronization pulses from other communication nodes; and a synchronization unit for synchronizing the phase of the synchronization pulses generated by the oscillator with the phase of the synchronization pulses received from other communication nodes by adjusting the phase of the synchronization pulses generated by the oscillator upon receipt of synchronization pulses from other communication nodes. The synchronization unit adjusts the phase of the synchronization pulses generated by the oscillator in such a way that a guaranteed network-wide synchronization is achieved for all communication nodes of the communication network.
However, WO 2013/178237 A1 explicitly limits a transmission time delay of the synchronization pulses between the communication nodes to one eighth of the period of the oscillator. Hence, this disclosure does not provide a suitable solution for networks exhibiting a transmission time delay exceeding one eighth of the period of the oscillator, e.g., highly integrated chip networks. Moreover, this solution assumes pulse coupling. Stochastic synchronization pulse emission is required to guarantee synchronization. Hence, this solution is not suitable for clock distributions with time-continuous coupling.
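The one-eighth-period limit can be made concrete with a short calculation. The following sketch only evaluates the stated bound; the helper name `delay_within_limit` and the example clock frequency and delay values are hypothetical and are not taken from WO 2013/178237 A1.

```python
def delay_within_limit(delay_s, clock_hz):
    """Return True if a transmission delay is at most one eighth of the period.

    delay_s:  transmission delay between two nodes, in seconds
    clock_hz: oscillator frequency, in hertz
    """
    period_s = 1.0 / clock_hz
    return delay_s <= period_s / 8.0


if __name__ == "__main__":
    # Illustrative values only: a 1 GHz oscillator has a period of 1 ns,
    # so the bound is 125 ps.
    print(delay_within_limit(100e-12, 1e9))  # delay below the bound
    print(delay_within_limit(200e-12, 1e9))  # delay above the bound
```

Because the bound shrinks inversely with the oscillator frequency, a fixed transmission delay that is acceptable at a low frequency can violate the limit at a higher one.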
US 2009/183019 A1 relates to a system with multiple clock islands, each clock island being clocked by a common clock generator. A predetermined amount of clock skew may be introduced by programmable delay elements to smear out, over time, the instantaneous power supply current demands of the respective logic. Moreover, additional delay elements are used to compensate for the clock skew between different clock islands for the purpose of information transmission.
Hence, US 2009/183019 A1 aims at establishing a clock skew in a system with a single clock generator using programmable delay elements.
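The effect of deliberately skewing clock islands on the instantaneous current demand can be sketched with a toy model. The model below assumes that each island draws one unit of current in the time slot in which its clock edge arrives; the function name `peak_demand`, the slot discretization and the unit current are illustrative assumptions and do not reproduce the circuit of US 2009/183019 A1.

```python
def peak_demand(skew_slots, slots_per_period):
    """Return the peak number of islands switching in the same time slot.

    skew_slots:       programmed skew of each clock island, in time slots
    slots_per_period: number of time slots per clock period
    Each island is assumed to draw one unit of current at its clock edge.
    """
    demand = [0] * slots_per_period
    for skew in skew_slots:
        demand[skew % slots_per_period] += 1
    return max(demand)


if __name__ == "__main__":
    # Four islands, clock period divided into four slots (illustrative).
    print(peak_demand([0, 0, 0, 0], 4))  # all edges coincide
    print(peak_demand([0, 1, 2, 3], 4))  # edges staggered by one slot each
```

In this toy model, staggering the four clock edges lowers the peak demand from four units to one unit, at the cost of a known skew that has to be compensated whenever data crosses between islands.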