Conventional integrated circuits (ICs) use timing signals to control sequences of events across a chip, and this is known as synchronous design. A clock signal is sent to each module on a chip and data signals are sent on separate lines. If a first module sends data to a second module following a clock transition Tn, then the second module will capture the data at the next clock transition, Tn+1. The use of synchronous design is a crucial factor in constraining the complexity of problems in integrated circuit design.
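The capture rule described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the function name `simulate_transfer` and the register variables are hypothetical, and the model ignores all real timing (setup, hold, clock skew). It shows only that a value launched after transition Tn is captured at transition Tn+1.

```python
def simulate_transfer(values):
    """Model two modules sharing one clock.

    Returns a list of (sent_at, captured_at, value) tuples, where the
    numbers are clock transition indices. Hypothetical illustration only.
    """
    sender_reg = None   # output register of the first module
    sent_at = None      # transition at which sender_reg was last loaded
    log = []
    # One extra transition is needed to capture the final value.
    for n in range(len(values) + 1):
        # At transition n the second module samples what the first
        # module launched at the previous transition.
        if sender_reg is not None:
            log.append((sent_at, n, sender_reg))
        # At the same transition the first module launches its next value.
        if n < len(values):
            sender_reg = values[n]
            sent_at = n
        else:
            sender_reg = None
    return log


if __name__ == "__main__":
    for sent, captured, value in simulate_transfer(["a", "b", "c"]):
        print(f"value {value!r}: sent after T{sent}, captured at T{captured}")
```

In this model every value is captured exactly one transition after it was launched, which is the property that makes the timing of a single clock environment easy to reason about.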
A typical system on a chip (SoC) may have a large number of clocks. All of the logic in all of the modules clocked by a single clock, together with all of the logic and data connections between such modules, is known as a single clock environment, and this conforms to synchronous design. Other design techniques are employed for data signals that cross between logic or modules that are clocked by different clocks.
Whilst synchronous design between modules generally limits IC design problems, the approach has drawbacks of its own. Firstly, for a synchronous chip to function correctly, a reliable clock signal has to be available across every part of the design. Tracks are generally used to transport the clock signal on a chip. High drive strengths will be required in order to overcome capacitance on these tracks. Lines with high drive strengths risk being cross-talk aggressors, meaning that they interfere with other lines on the chip. The solution is generally to limit the maximum length of any one track, and to use repeaters where the track length is longer than this maximum length. However, in order to distribute the clock, there may well not be one long thin track, but a tree branch fan-out to a number of destinations. Driving a lot of circuit track has a number of undesirable effects including inducing cross-talk and transistor lifetime degradation. Furthermore, there are often design problems in driving many buffers to all destinations on a chip.
The problems with synchronous designs may be partially overcome in asynchronous designs, in which modules on a chip may operate at their own independent speeds, and no clock signal is transmitted between modules. However, an entirely asynchronous design is an extremely difficult proposition in practice, due to the uncertainty of when signals in the circuit are valid.
An approach has been proposed which is globally asynchronous, locally synchronous (GALS). This means that the logic in each module on a chip is synchronous while the connections between modules are asynchronous. This approach promises to solve timing problems and reduce power consumption, all without designers needing to learn fundamentally new skills or abandoning any of the existing huge investment in predefined, synchronous IP (Intellectual Property) circuit blocks. However, for asynchronous communication between modules on a chip, there are two basic requirements: the receiving unit has to know when to read the data line; and the sending unit has to know when it can send a new value. In synchronous designs these issues are controlled by the system clock, and by knowing the timing characteristics of the link, timing can be controlled such that these requirements are met.
Asynchronous design is significantly more difficult both to analyse manually and to automate, because of its computational complexity. In synchronous design, it is only the final, settled output of each logic cone that needs to be analysed in terms of its logic value and timing; that is, what is the longest path that a transition could take to propagate through the logic cone and also what is the shortest path (used to determine how long the result will remain stable after a subsequent clock). This synchronous design analysis need only be performed at two process extremes: the slowest PVT (process, voltage, temperature) corner for the longest path and the fastest PVT corner for the shortest path. In synchronous design, it does not matter if the output of any cone of logic changes any number of times or glitches prior to the final settled output time, because the resultant data is captured only once, coincident with the following clock edge. However, in asynchronous design multiple output changes and glitches need to be avoided, and the analysis for such needs to be performed across all variations of input timings and all combinations of timing paths through the logic cone.
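The two-corner analysis described for synchronous design can be sketched as a longest-path and shortest-path computation over a logic cone. The following is a minimal illustration, not a description of any particular timing tool: the function name `path_extremes`, the gate names, and the delay figures (in picoseconds) are all assumptions invented for the example, and wire delays are ignored.

```python
from functools import lru_cache


def path_extremes(fanin, delay_slow, delay_fast, output):
    """Return (longest, shortest) propagation delay to `output`.

    fanin      -- maps each gate to the gates driving its inputs
                  (an empty list means it is driven by cone inputs)
    delay_slow -- per-gate delay at the slowest PVT corner
    delay_fast -- per-gate delay at the fastest PVT corner
    The cone must be acyclic. All names and numbers are hypothetical.
    """

    @lru_cache(maxsize=None)
    def longest(gate):
        preds = fanin.get(gate, [])
        # Latest arrival: slowest corner, worst-case input.
        return delay_slow[gate] + (max(longest(p) for p in preds) if preds else 0)

    @lru_cache(maxsize=None)
    def shortest(gate):
        preds = fanin.get(gate, [])
        # Earliest arrival: fastest corner, best-case input.
        return delay_fast[gate] + (min(shortest(p) for p in preds) if preds else 0)

    return longest(output), shortest(output)


if __name__ == "__main__":
    # Illustrative three-gate cone: g1 feeds g2 and g3, g2 feeds g3.
    fanin = {"g1": [], "g2": ["g1"], "g3": ["g1", "g2"]}
    slow = {"g1": 200, "g2": 300, "g3": 100}
    fast = {"g1": 100, "g2": 100, "g3": 50}
    print(path_extremes(fanin, slow, fast, "g3"))
```

Only these two numbers per cone are needed in the synchronous case. The asynchronous case has no such shortcut, since every intermediate transition on the output matters, which is the source of the additional complexity noted above.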
In asynchronous design, the simplest approach is to use two wires—one for ‘1’s and the other for ‘0’s. When both wires are low, then no data is transmitted and the receiver knows that there is no data value on the wire. When either wire is set to high, the receiver knows that there is data on the wire, and depending on which wire is set to high, the data will be a 0 or a 1. Another approach is to have one wire designated as a clock or strobe, and the second wire (or collection of wires) carrying the data.
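The two-wire scheme can be sketched as a pair of encode and decode functions. This is an illustrative model only. The function names are hypothetical, and two details are assumptions not stated above: that the wires return to the both-low state between successive bits so that repeated values can be told apart, and that both wires high at once is treated as an error.

```python
def encode(bit):
    """Map a bit (or None for 'no data') to (one_wire, zero_wire) levels."""
    if bit is None:
        return (0, 0)          # both wires low: no data
    if bit == 1:
        return (1, 0)          # the '1' wire is high
    if bit == 0:
        return (0, 1)          # the '0' wire is high
    raise ValueError("bit must be 0, 1 or None")


def decode(one_wire, zero_wire):
    """Recover the bit from the wire levels, or None if there is no data."""
    if one_wire and zero_wire:
        # Assumption: both wires high is not a valid state.
        raise ValueError("both wires high is not a valid code")
    if one_wire:
        return 1
    if zero_wire:
        return 0
    return None


def encode_stream(bits):
    """Encode bits, returning to the both-low state after each one."""
    samples = []
    for bit in bits:
        samples.append(encode(bit))
        samples.append(encode(None))   # assumed spacer between bits
    return samples


def decode_stream(samples):
    """Emit one bit each time the wires leave the both-low state."""
    bits = []
    idle = True
    for one_wire, zero_wire in samples:
        value = decode(one_wire, zero_wire)
        if value is None:
            idle = True
        elif idle:
            bits.append(value)
            idle = False
    return bits


if __name__ == "__main__":
    print(decode_stream(encode_stream([1, 0, 0, 1])))
```

Because the receiver reacts to the wires leaving the idle state rather than to a clock, it does not matter how long each sample is held, which is why no separate timing signal is needed in this scheme.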
Particular problems arise when switching multi-wire asynchronous on-chip communications. It is possible to consider switching such communications, but this often involves a complex feedback path to communicate a handshake of each bit of data, or is limited by the number of switching elements that the separate wires can pass through before the difference in path delays for each wire and each gate within the switch becomes too large.
Consider normal serial data in a synchronous system. As explained above, either the clock is considered global, or the clock is routed alongside the data wire. At each switching node the potential misalignment between the data and the clock increases and eventually becomes too large. The normal procedure is to limit the impact of switching by either limiting the clock frequency or by limiting the “size” (the number of cascaded switch elements) of the switch. The most common solution to this problem is to “retime” the signal at each switch step. Retiming involves capturing the data in a flip-flop and passing the output of that flip-flop, along with the clock, onto the next switch step. Thus one clock cycle of latency is added for each switch step. This is expensive, both in terms of latency and in terms of the area and power consumption of the flip-flop. Thus, the integrity of the data is restored by resynchronizing it with the clock at each switch step.
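The trade-off between the two options can be made concrete with a small model. The sketch below rests on assumptions made purely for illustration: that the worst-case misalignment grows linearly by a fixed amount per switch step, that there is a single tolerable limit, and that each retiming flip-flop adds exactly one clock cycle. The function names and the figures in picoseconds are hypothetical.

```python
def max_unretimed_steps(skew_per_step_ps, tolerable_skew_ps):
    """Largest number of cascaded switch steps before the accumulated
    data-to-clock misalignment exceeds the tolerable limit.

    Assumes worst-case skew adds linearly per step (an illustrative model).
    """
    return tolerable_skew_ps // skew_per_step_ps


def retimed_latency_ps(steps, clock_period_ps):
    """Latency added by retiming: one clock cycle per switch step."""
    return steps * clock_period_ps


def retimed_pipeline(values, steps):
    """Cycle-by-cycle output of `steps` retiming flip-flops in series.

    Returns the output seen on each clock cycle; None means no data yet.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    regs = [None] * steps              # one flip-flop per switch step
    outputs = []
    for value in list(values) + [None] * steps:
        outputs.append(regs[-1])       # output of the last flip-flop
        regs = [value] + regs[:-1]     # all flip-flops clock together
    return outputs


if __name__ == "__main__":
    print(max_unretimed_steps(40, 250))       # steps possible without retiming
    print(retimed_latency_ps(8, 1000))        # added latency with retiming
    print(retimed_pipeline(["a", "b"], 2))    # data emerges `steps` cycles late
```

Under these assumptions the unretimed switch is limited in size, whereas the retimed switch can be arbitrarily deep but pays one clock period of latency, plus the area and power of a flip-flop, at every step.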