A digital circuit is composed of two types of components: combinational and sequential. As shown in FIG. 1A, the combinational components 1A10, 1A20, 1A30, and 1A40 implement Boolean functions, whereas the sequential components 1A50, 1A60, 1A70, and 1A80 act as memory elements that store the state of the circuit. The sequential components are usually implemented with flip-flops 1A50, latches 1A60 and 1A70, or sometimes combinations of latches 1A80 in a master/slave arrangement. Most digital circuits use one or more clocks to synchronize the events produced in their components. Flip-flops are activated by one of the edges of a clock (rising or falling). Latches are activated by one of the levels of the clock (high or low). A register is a group of flip-flops or latches.
In conventional synchronous design, and as shown in FIG. 1B at 1B00, a clock 1B10 is a periodic signal with a period P 1B20 that is longer than the longest delay D 1B30 between pairs of sequential components 1B40, 1B50 separated by combinational logic 1B60. The clock signal, and hence its frequency, is generated externally to the circuit. For correct operation in real systems, the clock signals must be designed in such a way that the clock pulses arrive at the sequential components within close tolerances. In real systems, the sequential components require some finite duration tS (setup time) 1B80 during which the signal to be stored is stable (unchanging). Only after the signal has been stable for this duration tS can it be reliably stored in the sequential component. Similarly, a finite duration tH (hold time) 1B90 is required for a stored value to propagate to the sequential element output after a clock edge. That is, for the sequential components to operate correctly, the setup and hold constraints must be satisfied before and after the corresponding active edges of the clock.
As shown in the graph of FIG. 1C, the logic propagation delay of the components of the circuit may vary from one component to another due to process variations, and may vary dynamically due to environmental conditions (e.g. temperature and power supply variations). The transition from logic 1 to logic 0 under nominal conditions is shown as trace 1C10. Under conditions of low voltage, the transition requires more time to complete, as shown in traces 1C20 and 1C30. Under conditions of high temperature, even at nominal voltage, the transition requires more time to complete, as shown in trace 1C40. Also, the propagation delay through the combinational components may vary depending on the data involved in the computations.
In a synchronous system using a global clock signal, the frequency of the clock signal must be defined in such a way that it can accommodate the worst-case delays under any process, environmental and operational conditions. For these reasons, designers are conservative in their design of synchronous circuits and, in the actual circuitry, the clock often runs at a frequency slower than the one it could run at if it could dynamically adapt to any combination of process variations, data variations, and changing environmental operating conditions.
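The worst-case reasoning above can be illustrated with a minimal Python sketch. The corner names, the delay scaling factors, and the rule that the period must cover the worst-case delay plus the setup time are assumptions chosen for illustration only; they are not taken from the figures, and a real timing analysis would also account for effects such as clock skew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    """One hypothetical operating condition and its delay scaling factor."""
    name: str
    delay_scale: float  # multiplier applied to the nominal combinational delay


def min_clock_period(nominal_delays_ns, corners, t_setup_ns):
    """Smallest period covering the slowest path at the slowest corner.

    Simplifying assumption: period >= worst-case delay D + setup time tS.
    """
    worst_scale = max(c.delay_scale for c in corners)
    worst_delay = max(nominal_delays_ns) * worst_scale
    return worst_delay + t_setup_ns


# Illustrative numbers only (not measured data).
corners = [
    Corner("nominal", 1.0),
    Corner("low_voltage", 1.5),
    Corner("high_temperature", 1.25),
]
path_delays_ns = [2.0, 3.0, 4.0]

worst_case_period = min_clock_period(path_delays_ns, corners, t_setup_ns=0.5)
nominal_period = min_clock_period(path_delays_ns, [corners[0]], t_setup_ns=0.5)
```

With these assumed numbers, a global clock sized for the worst corner runs at a longer period than the nominal corner alone would require, which is the conservatism described above.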
Referring to FIG. 1D, synchronous circuits such as those depicted in 1D00 include an external global clock, and thus their operation is subject to the full range of variations and conditions explained above. In contrast, asynchronous circuits 1D50 do not use a global clock for the synchronization of the sequential components. Instead, in asynchronous circuits 1D50, the clocking scheme is based on local handshakes between communicating components (1D60, 1D61, 1D70, and 1D71). Such schemes are typically implemented by a pair of signals called Request (Req) 1D80 and Acknowledge (Ack) 1D90. The events of these signals are used to perform data transfers between a sender and a receiver through a communication channel. Each event indicates a specific state of the channel and the data associated with it.
Mathematical techniques involving Petri Nets, specifically a type of Petri Net known as Marked Graphs (MGs), have been used in formally describing and analyzing systems with states and events. FIG. 1E at 1E00 shows a schematic representation of a pipeline with memory elements 1E01, 1E02, 1E03, and 1E04. The Marked Graph representation showing the same memory elements (states) 1E01, 1E02, 1E03, and 1E04 is depicted at 1E10, followed by alternate notations 1E30 and 1E50 (each using a slightly different MG notation style) of the event transitions possible in the marked graph of 1E10.
More specifically, the Marked Graph of 1E10 shows events abstracted as A, B, C, and D. The technique for creation of these events is not depicted in the Marked Graph of 1E10, and of course the Marked Graph is intended to be an abstraction that is unconcerned about the realization techniques for those events.
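The firing rule of a Marked Graph can be sketched in a few lines of Python. The event names A, B, C, and D follow the text above, but the ring topology and the initial token placement are hypothetical and are not a transcription of 1E10.

```python
class MarkedGraph:
    """Token game on a marked graph; each arc is a place between two events."""

    def __init__(self, arcs):
        # arcs maps (source_event, target_event) -> number of tokens on that arc
        self.marking = dict(arcs)

    def enabled(self, event):
        """An event is enabled when every incoming arc holds a token."""
        incoming = [arc for arc in self.marking if arc[1] == event]
        return bool(incoming) and all(self.marking[arc] > 0 for arc in incoming)

    def fire(self, event):
        """Consume one token per incoming arc, produce one per outgoing arc."""
        if not self.enabled(event):
            raise ValueError(f"event {event} is not enabled")
        for arc in self.marking:
            if arc[1] == event:
                self.marking[arc] -= 1
            if arc[0] == event:
                self.marking[arc] += 1


# Hypothetical ring A -> B -> C -> D -> A with a single token before A.
initial = {("A", "B"): 0, ("B", "C"): 0, ("C", "D"): 0, ("D", "A"): 1}
graph = MarkedGraph(initial)

trace = []
for event in "ABCD":
    # Record which events are enabled just before each firing.
    trace.append([e for e in "ABCD" if graph.enabled(e)])
    graph.fire(event)
```

In this assumed ring, exactly one event is enabled at each step, and firing A, B, C, and D in turn returns the graph to its initial marking. The sketch says nothing about how the events are realized, consistent with the abstraction described above.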
The paragraphs above have introduced external clocks and handshake signals; both techniques are able to create events. As regards the use of Request and Acknowledge signals for creating events, two families of protocols have traditionally been proposed for real system realization of the Request and Acknowledge signals, namely (a) four-phase protocols, and (b) two-phase protocols. In four-phase protocols, only one of the edges of each of the signals is ever active (i.e. able to raise an event). The other edge is used only to return to the state prior to raising the event. Every data transfer involves four events (e.g. rising and falling edges of each of the Request and Acknowledge signals). In two-phase protocols, every data transfer involves two events, one for each signal. The logic value of the signal is irrelevant with respect to creating an event; the transition from one logic value to another is what creates an event, thus providing a perfect symmetry between rising and falling edges. Various embodiments of the present invention are based on two-phase protocols; however, similar embodiments might be implemented with four-phase protocols.
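The difference in event count between the two protocol families described above can be made concrete with a short Python sketch. The function names are hypothetical, and the ordering Req rises, Ack rises, Req falls, Ack falls is one common four-phase ordering assumed here for illustration.

```python
def four_phase_transfers(n):
    """Signal transitions for n transfers; both signals return to zero each time."""
    events = []
    for _ in range(n):
        events += [("Req", 1), ("Ack", 1), ("Req", 0), ("Ack", 0)]
    return events


def two_phase_transfers(n):
    """Signal transitions for n transfers; every transition is an event."""
    req = ack = 0
    events = []
    for _ in range(n):
        req ^= 1  # a toggle on Req raises the request event
        events.append(("Req", req))
        ack ^= 1  # a toggle on Ack raises the acknowledge event
        events.append(("Ack", ack))
    return events


four = four_phase_transfers(3)
two = two_phase_transfers(3)
```

For three transfers, the four-phase sketch produces twelve transitions and the two-phase sketch produces six. In the two-phase trace, the second transfer is signaled by falling edges, which reflects the symmetry between rising and falling edges.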
Abstractions for graphical presentations of synchronization logic proposed in the present disclosure use the C-element shown in FIG. 1F, at 1F10. The C-element is found in the relevant literature and is known as a Muller C-element. A C-element is an abstraction of logic that can synchronize the events at the inputs. When the inputs have the same value, the output propagates the value at its inputs. When the inputs differ, the output remains unchanged. The symbol for a C-element and a possible implementation of a C-element using combinational gates are depicted at 1F10 and 1F20, respectively. Also shown in FIG. 1F at 1F30 is a C-element including a reset signal. The reset signal Reset, when asserted (logic 1), has the effect of producing a logic 0 at output Z.
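The behavior of the C-element described above can be captured in a small behavioral model. The Python sketch below is illustrative only: the class and function names are hypothetical, and the gate-level form Z = AB + Z(A + B) is a commonly cited majority-gate realization assumed here, not necessarily the one drawn at 1F20.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element with reset."""

    def __init__(self, initial=0):
        self.z = initial

    def update(self, a, b, reset=0):
        if reset:
            self.z = 0   # asserted reset forces the output Z to logic 0
        elif a == b:
            self.z = a   # equal inputs propagate to the output
        # differing inputs leave the output unchanged
        return self.z


def majority_form(a, b, z):
    """Assumed gate-level form with feedback: Z = AB + Z(A + B)."""
    return (a & b) | (z & (a | b))


c = CElement()
outputs = [
    c.update(1, 0),           # inputs differ: output holds 0
    c.update(1, 1),           # inputs agree on 1: output becomes 1
    c.update(0, 1),           # inputs differ: output holds 1
    c.update(0, 0),           # inputs agree on 0: output becomes 0
    c.update(1, 1),           # inputs agree on 1: output becomes 1
    c.update(1, 1, reset=1),  # reset overrides the inputs
]
```

The gate-level expression agrees with the behavioral model for all eight combinations of A, B, and the previous output, which can be checked by exhaustive enumeration.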
C-elements are the basis of an architectural construction known as Muller's pipeline. FIG. 1G includes a depiction of a Muller pipeline 1G10. The Muller pipeline 1G10 shows the logic, including C-elements, that synchronizes the latches of a linear pipeline. The datapath contains blocks of combinational logic (CL) and transparent latches (L). The C-elements have one of the inputs complemented. The protocol implemented by the Muller pipeline shown at 1G10 belongs to the family of four-phase protocols.
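A behavioral sketch of such a control chain is given below in Python. It assumes the textbook wiring, in which each C-element takes the output of the previous stage and the complemented output of the next stage as its inputs; the exact wiring of 1G10 may differ, and the function names are hypothetical. The datapath latches and combinational logic are omitted.

```python
def c_element(a, b, z):
    """Output follows the inputs when they agree, otherwise holds z."""
    return a if a == b else z


def settle(stages, req_in, ack_in):
    """Sweep the chain left to right until no C-element output changes.

    stages : current C-element outputs, leftmost stage first
    req_in : request driven by the left environment
    ack_in : acknowledge driven by the right environment
    """
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = req_in if i == 0 else stages[i - 1]
            right = ack_in if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])  # right input complemented
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages


after_request = settle([0, 0, 0], req_in=1, ack_in=0)
after_release = settle(after_request, req_in=0, ack_in=0)
after_ack = settle(after_release, req_in=0, ack_in=1)
```

In this sketch, a rising request propagates through all three stages. When the request returns to zero, the last stage holds its value until the right environment acknowledges, which illustrates the return-to-zero phase of a four-phase protocol.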
C-elements are also used in Sutherland's micropipelines. A Sutherland micropipeline is a variation of Muller's pipeline adapted to operate using a two-phase protocol. The datapath of such a micropipeline requires special registers with two input control signals (capture and pass). The events on these signals indicate that the register must become transparent (pass) or opaque (capture).
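The capture and pass behavior can be modeled with a short Python sketch. It assumes the usual event-controlled convention in which the register is transparent when the two control signals have the same logic value and opaque when they differ; the class and attribute names are hypothetical.

```python
class CapturePassRegister:
    """Behavioral model of a register controlled by capture and pass events."""

    def __init__(self):
        self.capture = 0   # logic level of the capture control signal
        self.pass_ = 0     # logic level of the pass control signal
        self.stored = None

    @property
    def transparent(self):
        # Assumed convention: equal control levels mean transparent.
        return self.capture == self.pass_

    def capture_event(self, data_in):
        """Any transition on capture stores the input and makes the register opaque."""
        if not self.transparent:
            raise ValueError("capture event while already opaque")
        self.stored = data_in
        self.capture ^= 1

    def pass_event(self):
        """Any transition on pass makes the register transparent again."""
        if self.transparent:
            raise ValueError("pass event while already transparent")
        self.pass_ ^= 1

    def output(self, data_in):
        return data_in if self.transparent else self.stored


reg = CapturePassRegister()
seen = [reg.output(5)]       # transparent: input flows through
reg.capture_event(5)         # rising edge on capture
seen.append(reg.output(9))   # opaque: holds the captured value
reg.pass_event()             # rising edge on pass
seen.append(reg.output(9))   # transparent again
reg.capture_event(9)         # falling edge on capture is also an event
seen.append(reg.output(3))   # opaque: holds the new captured value
```

Because only transitions matter, the second capture in this sketch is signaled by a falling edge, in keeping with a two-phase protocol.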
Desynchronization
Desynchronization is a paradigm that can be implemented as an automatic conversion of a synchronous circuit into an asynchronous circuit. The underlying idea of the desynchronization paradigm consists of substituting the clock-generated synchronization events of the synchronous circuit with synchronization events generated by sets of local controllers. This paradigm is illustrated in FIG. 1H. Specifically shown are the contrasts between the synchronous, global-clock-driven pipeline of 1H10 and the asynchronous, two-phase, controller-driven Sutherland micropipeline of 1H50.
Automatic desynchronization of a synchronized circuit seeks to preserve the behavior of the sequential elements of the circuit while substituting asynchronous controller-based synchronization. That is, instead of using a global clock to trigger the storage of state in the storage elements of the micropipeline, a distributed scheme based on local controller handshake signals is used. Every storage element has an associated local controller that determines when the incoming data is available and when the outgoing data has already been captured by the receivers. The local controller associated with every storage element communicates through the previously introduced pairs of handshake signals, usually called request and acknowledge.
Several schemes for desynchronization have been proposed, using different types of handshake protocols and logic in the datapath. Each scheme has its specific features regarding the complexity of the logic, the timing overhead introduced by the control, the power consumption and the robustness of the circuit to variability. Muller's pipeline and Sutherland's micropipelines can be considered as particular cases of desynchronization schemes.
Among the various schemes for desynchronization, one of those proposed is a transformation method from synchronous to asynchronous circuits in the context of the design of processor arrays. The method includes replacing the flip-flops with master-slave latches and creating a synchronization stratum with local controllers implementing a handshake protocol for event creation.
Embodiments of the present invention provide novel building blocks for a fully automated design flow that generates provably correct asynchronous circuits from synchronous specifications, especially using variability-aware local controllers 1H60, 1H61, 1H70, 1H71, 1H80, 1H81, etc., each implementing two-phase protocols with its neighbors.
The elasticity in the data transmission requires extra storage to implement those registers that receive new incoming data but have not been able to deliver the previously stored data. Without the extra storage, the synchronization is only possible by means of global signals (i.e., synchronous clocks). One way to provide this feature is to use the storage associated with the master and slave latches that implement the flip-flops. In a conventional synchronous design, it is not possible to store different data at each latch. However, the control layer of a desynchronization scheme can provide different and independently enabled signals for the master and slave latches.
Generally, and as previously indicated, two families of protocols have been proposed for implementing handshakes for local synchronization: four-phase protocols, and two-phase protocols. Originally, two-phase protocols with special latches were proposed for micropipelines. Later on, two-phase controllers using conventional latches were proposed. However, none of the previous proposals disclosed techniques adequate for a provably correct and fully automated flow covering any possible synchronous circuit.
Moreover, prior proposals included assumptions or limitations for desynchronization techniques that motivate the present disclosure. To advance the state of the art, the present invention considers methods for guaranteeing correct timing after synthesis, techniques for local controller-based timing, techniques for clock gating, techniques for dealing with matched delays, techniques for performance tuning, techniques for initializing/resetting sequential circuits, techniques for communicating between asynchronous circuits and synchronous circuits, etc., and further discloses various methods to deal with several challenging aspects of the design and synthesis of circuits used in desynchronized systems.
It is the advancement of the art and limitations of such prior proposals that motivate the present invention disclosed herein.