One valuable feature of modern networks-on-chip (NoCs) is the ability to run part of the network at a first clock frequency and part of the network at a second clock frequency. This allows a part of the network with high data transfer bandwidth requirements to run fast while other parts run more slowly, easing the timing closure challenge for engineers and electronic design automation (EDA) tools. Separate clock domains also allow parts of a chip to run at different frequencies depending on the data processing requirements, using a slower clock frequency to save dynamic power while data processing requirements are relatively low. Separate clock domains are also useful when parts of the network are a significant distance apart on the chip, because clock tree insertion and balancing across significant distances is difficult. Allowing parts of the NoC to run with only localized clocks avoids that difficulty.
A network-on-chip employs a unit of logic known as a clock domain adapter to transfer data correctly between logic in two different clock domains. In particular, a clock domain adapter for transferring data between asynchronous clock domains is known as an asynchronous clock domain adapter. The logic of an asynchronous clock domain adapter generally comprises two portions: a sender that sends data and a receiver that receives data.
FIG. 1 is a simplified block diagram illustrating an example asynchronous clock domain adapter unit 100. The adapter unit 100 includes circular buffer 102, multiplexer 104, write control unit 106, and read control unit 108. Data sender logic is clocked in a sending clock domain while data receiver logic is clocked in a receiving clock domain. In adapter unit 100, the signals that pass between the sender and receiver include data elements in circular buffer 102, also known as a bisynchronous first-in-first-out (FIFO) buffer. The circular buffer 102 can be coupled to the multiplexer 104. The multiplexer 104 outputs data elements from the circular buffer 102 based on a read pointer (RdPtr) value. The write control unit 106 is configured to control a Gray coded write data counter for generating a write count (WrCnt). The read control unit 108 is configured to control a Gray coded read data counter for generating a read count (RdCnt) and the read pointer (RdPtr).
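The role of the Gray coded counters can be illustrated with a minimal behavioral sketch in Python. It is not the circuit of FIG. 1: it has no clocks and no synchronizer latency, and the class name BisyncFifoModel, the depth of four entries, and the choice to let the counters wrap at twice the depth are all assumptions made for illustration. The sketch shows that successive Gray codes differ in a single bit, which is why such counts are suited to being sampled from another clock domain, and how WrCnt, RdCnt, and RdPtr relate to the buffer contents.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to Gray code; successive codes differ in one bit."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Recover the binary count from a Gray code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class BisyncFifoModel:
    """Behavioral model only (hypothetical): no clocks, no synchronizer delay."""

    def __init__(self, depth: int = 4):  # depth is assumed to be a power of two
        self.depth = depth
        self.buf = [None] * depth
        self.wr_bin = 0  # binary write count, wraps at 2 * depth
        self.rd_bin = 0  # binary read count, wraps at 2 * depth

    @property
    def wr_cnt(self) -> int:
        """Gray coded write count (WrCnt) as seen by the receiver."""
        return bin_to_gray(self.wr_bin)

    @property
    def rd_cnt(self) -> int:
        """Gray coded read count (RdCnt) as seen by the sender."""
        return bin_to_gray(self.rd_bin)

    @property
    def rd_ptr(self) -> int:
        """Read pointer (RdPtr) that selects the multiplexer input."""
        return self.rd_bin % self.depth

    def occupancy(self) -> int:
        """Number of stored elements, derived from the two Gray coded counts."""
        diff = gray_to_bin(self.wr_cnt) - gray_to_bin(self.rd_cnt)
        return diff % (2 * self.depth)

    def write(self, item) -> bool:
        """Store one element; return False if the buffer is full."""
        if self.occupancy() == self.depth:
            return False
        self.buf[self.wr_bin % self.depth] = item
        self.wr_bin = (self.wr_bin + 1) % (2 * self.depth)
        return True

    def read(self):
        """Return the oldest element, or None if the buffer is empty."""
        if self.occupancy() == 0:
            return None
        item = self.buf[self.rd_ptr]
        self.rd_bin = (self.rd_bin + 1) % (2 * self.depth)
        return item
```

Wrapping the counts at twice the buffer depth is one common way to distinguish a full buffer from an empty one; the source does not state which scheme adapter unit 100 uses.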
Another valuable feature of an asynchronous clock domain adapter is the ability to power off part of the network-on-chip without causing functional failure or data loss or corruption for the rest of the chip. The ability to power off part of a chip is useful for saving power. Power-off is typically used when a processing function is not required. For example, a video codec intellectual property (IP) block might be powered off in a mobile device application processor when no video is being played.
The set of logic that is powered on or off together is known as a power domain. Within a wake-up sequence, some (usually most) of the logic in the power domain is reset to a known state. This enables the engagement of an appropriate data transfer protocol from a predictable state of operation. In conventional power disconnect units, the logic on both sides (the already-awake side and the waking-up side) runs on a common clock. This ensures that one side does not take on an unpredictable state while the other side is beginning to engage the communication protocol.
In conventional NoCs, an asynchronous clock domain adapter sender and an asynchronous clock domain adapter receiver can reside in the same power domain. If one is running while the other is powered off, the adapter unit can take on an unpredictable state, leading to data loss or instability. Specifically, when an asynchronous clock domain adapter unit is powered on and reset, the state of WrCnt in the sender matches the state of WrCnt in the synchronization registers of the receiver, the state of RdCnt in the receiver matches the state of RdCnt in the synchronization registers of the sender, and RdPtr is known to the write control unit 106. If one side wakes up and is reset while the other is still running, the two sides would tend to resume with unsynchronized pointers, leading to data being sent twice, data being lost, or other unpredictable behavior.
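The effect of resetting only one side can be seen in a small numeric sketch. The buffer depth of four, the counter wrap at twice the depth, and the use of plain binary counts in place of Gray coded ones are assumptions made to keep the arithmetic readable; the function names are hypothetical.

```python
DEPTH = 4        # assumed buffer depth (hypothetical value)
MOD = 2 * DEPTH  # assumed counter range: counts wrap at twice the depth


def occupancy(wr_cnt: int, rd_cnt: int) -> int:
    """Entries the receiver believes are waiting in the buffer."""
    return (wr_cnt - rd_cnt) % MOD


def run(reset_sender_alone: bool) -> int:
    """Return the occupancy the receiver computes after a short scenario."""
    wr_cnt = rd_cnt = 0          # both sides reset together: counts agree
    wr_cnt = (wr_cnt + 3) % MOD  # sender writes three elements
    rd_cnt = (rd_cnt + 1) % MOD  # receiver reads one element
    if reset_sender_alone:
        wr_cnt = 0               # sender is power-cycled; receiver keeps RdCnt
    return occupancy(wr_cnt, rd_cnt)


print(run(reset_sender_alone=False))  # 2
print(run(reset_sender_alone=True))   # 7
```

With both sides reset together the receiver sees two pending elements, which is correct. When only the sender is reset, the receiver computes seven pending elements in a four-entry buffer, so it would go on to read entries that were never validly written. This is one concrete instance of the unsynchronized pointers described above.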
In other words, an asynchronous clock domain adapter unit and a power disconnect unit may operate correctly in series if there is no asynchronous clock domain adapter between the master and slave sides of a power disconnect unit and there is no power disconnect unit between the asynchronous clock domain adapter sender and the asynchronous clock domain adapter receiver.
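The two ordering conditions above can be expressed as a simple check on the sequence of units along one direction of a data link. The following Python sketch is illustrative only: the function name series_is_safe and the short unit labels SE, RE, MA, and SL (for the adapter sender, adapter receiver, disconnect master side, and disconnect slave side) are assumptions, and any other label is treated as a unit that does not affect the rule.

```python
# Hypothetical labels: which paired unit each label belongs to.
KIND = {"SE": "adapter", "RE": "adapter", "MA": "disconnect", "SL": "disconnect"}


def series_is_safe(path: list[str]) -> bool:
    """Return True if no adapter half lies between the two sides of a power
    disconnect unit and no disconnect side lies between the two adapter halves."""
    open_kind = None  # kind of the paired unit currently "open", if any
    for unit in path:
        kind = KIND.get(unit)
        if kind is None:
            continue              # initiator, target, or other logic
        if open_kind is None:
            open_kind = kind      # first half of a pair
        elif open_kind == kind:
            open_kind = None      # matching second half closes the pair
        else:
            return False          # a different unit sits inside an open pair
    return True


print(series_is_safe(["IN", "SE", "RE", "MA", "SL", "TA"]))  # True
print(series_is_safe(["IN", "SE", "MA", "SL", "RE", "TA"]))  # False
```

Because the check only tracks which pair is currently open, it applies equally to a response path, where the slave side precedes the master side in the direction of data flow.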
When laying out a chip, it is often desirable to have logic within a single clock domain localized within a common region. This is because it is difficult to insert and balance a clock tree when the clock nets extend over significant distances. It is also often desirable to have logic within a single power domain localized within a common region. This is because it is difficult to commingle the wiring carrying power from many different power supplies within a shared region of the chip. Localization is therefore valuable, and increasingly so as modern chips are designed with increasingly many power domains and clock domains.
Furthermore, it is valuable to have a small number of networks-on-chip. This is because the complexity of transferring data through a network-on-chip occurs at the edges of the network, where packets are encoded and decoded, whereas the interconnecting logic is relatively simple. The fewer networks-on-chip there are, the less logic overhead is required for encoding and decoding packets. Such logic is expensive in silicon die area, logic path delay, and clock cycles of latency for transferring data. As a result, networks-on-chip must span multiple, and usually many, clock domains and power domains.
FIG. 2 illustrates a network-on-chip with a physical distance between sides of a power disconnect unit downstream of an asynchronous clock domain adapter unit. An impediment to localization arises when an asynchronous clock domain adapter and a power disconnect unit are placed in series within a data link in a network-on-chip. FIG. 2 shows an initiator IN that requests a write transaction to send data to a target TA through an asynchronous clock domain adapter sender SE REQ in the request path, an asynchronous clock domain adapter receiver RE REQ in the request path, a downstream disconnect unit master side manager MA, and a disconnect unit slave side manager SL. Response data is returned through SL and MA, an asynchronous clock domain adapter sender SE RSP, and an asynchronous clock domain adapter receiver RE RSP. The physical placement of the units is such that MA and SL are far apart (indicated by the dashed line). In this configuration the units have good localization in the power domains but poor localization in the clock domains. The clock signal of clock domain Y spans the significant distance between the logic of MA and SL. This configuration challenges clock tree insertion.
FIG. 3 illustrates a network-on-chip with a physical distance between senders and receivers of asynchronous clock domain adapter units upstream of a power disconnect unit. More particularly, FIG. 3 shows a configuration of the same components but with MA and SL close together, with SE REQ separated from RE REQ by a significant distance, and with SE RSP separated from RE RSP by a significant distance. This configuration is preferable to that of FIG. 2 for clock tree insertion because no clock signal spans a significant distance. However, this configuration requires the power supply of power domain A to span the significant distance, which challenges supply rail routing.
FIG. 4 illustrates a network-on-chip with a distance between senders and receivers of asynchronous clock domain adapter units downstream of a power disconnect unit. More particularly, FIG. 4 shows a configuration in which the power disconnect unit is upstream of the asynchronous clock domain adapters in the request data flow. The asynchronous clock domain adapter unit senders and receivers are placed at a significant distance. This configuration suffers a supply rail routing challenge.
FIG. 5 illustrates a network-on-chip with a distance between sides of a power disconnect unit upstream of an asynchronous clock domain adapter unit. More particularly, FIG. 5 shows a configuration with the same components but with the power disconnect unit master side manager and slave side manager placed at a significant distance. This configuration suffers a clock tree insertion challenge.
Some networks-on-chip do not include a response path. For such configurations FIGS. 2-5 are applicable, but without the SE RSP and RE RSP components and without a response data path.
FIG. 6 illustrates an example power disconnect unit. Request data from master to slave and response data from slave to master are connected as in any system of a single power domain, except that they are separated by power isolation cells. The SocketConn signal indicates to the slave that the master is connected and can send traffic. The SlvRdy signal indicates to the master that the slave can be safely powered off without the loss of transactions in flight. SocketConn and SlvRdy are also connected between master and slave through power isolation cells. The clock signal is generated in the power-on domain and connected to the power-off domain through an isolation cell.
The disclosed invention pertains, particularly, to networks of clocked logic. A unit with an ability to correctly transfer data between logic in a first clock domain and logic in a second clock domain is known as an asynchronous clock domain adapter. This is because the two clocks have no synchronized relationship to each other. The invention does not pertain to networks of asynchronous logic, also known as self-timed logic. Such networks transfer data without a corresponding clock signal.