1. Field of the Invention
The present invention is directed to an asynchronous circuit with completion detection, and a system and method for designing the same. In particular, the invention is directed to translation of a Boolean single-rail combinational logic circuit to a multi-rail circuit with completion detection.
2. Discussion of Background Information
Asynchronous circuits are sequential digital circuits that are able to operate without clock signals. Two asynchronous logic paradigms are disclosed in U.S. Pat. No. 6,526,542 (“Multi-Rail Asynchronous Flow with Completion Detection and System and Method for Designing the Same”) and in U.S. Pat. No. 5,305,463 (“Null convention logic system”).
Multi-rail asynchronous circuits encode data and spacer values using two or more signal rails. In such encodings the data value represents actual binary data fed to the circuit, for example, a TRUE or a FALSE value, whereas the spacer value is used to initialize the circuit and prepare it for accepting the next data value. Multi-rail asynchronous circuits operate in two phases, always alternating between data and spacer values, irrespective of the encoding used for data and spacers. In the first phase, data values are applied at the circuit inputs and data values appear at the circuit outputs. The second phase, triggered by the completion of the first, applies spacer values at the circuit inputs and completes when the spacer values have propagated to the outputs and the spacer value is assigned to every internal net. In all approaches in the literature these phases are symmetrical, i.e. both phases operate by feeding the value (data or spacer) at the circuit inputs and waiting for that value to propagate through the circuit to the outputs, so their delays are almost identical.
The most common encoding type in asynchronous multi-rail logic is dual-rail encoding. In dual-rail encoding, a digital signal is represented by two binary rails, which can assume a total of four states: (0, 0), (0, 1), (1, 0) and (1, 1). The (0, 0) value commonly represents the spacer word, the (0, 1) value represents the TRUE data value, and the (1, 0) value represents the FALSE data value. The value (1, 1) is commonly unused. In other multi-rail encodings, data words can assume more than two logic values.
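The dual-rail encoding described above can be illustrated with a minimal Python sketch. The names `encode`, `decode` and `is_data` are hypothetical helpers introduced here for illustration only. The sketch also assumes that the first tuple element is the FALSE rail and the second is the TRUE rail, which is one reading consistent with the (0, 1) = TRUE and (1, 0) = FALSE convention stated above.

```python
from typing import Optional, Tuple

# Assumption: a dual-rail word is written (false_rail, true_rail).
Rail = Tuple[int, int]

SPACER: Rail = (0, 0)  # no data present
TRUE: Rail = (0, 1)    # data value TRUE
FALSE: Rail = (1, 0)   # data value FALSE


def encode(value: Optional[bool]) -> Rail:
    """Map a Boolean (or None for 'no data') to a dual-rail word."""
    if value is None:
        return SPACER
    return TRUE if value else FALSE


def decode(word: Rail) -> Optional[bool]:
    """Map a dual-rail word back to a Boolean, or None for the spacer."""
    if word == SPACER:
        return None
    if word == TRUE:
        return True
    if word == FALSE:
        return False
    # (1, 1) is commonly unused in this encoding.
    raise ValueError("(1, 1) is not a valid word in this encoding")


def is_data(word: Rail) -> bool:
    """A word carries data when exactly one of its rails is asserted."""
    return word in (TRUE, FALSE)
```

Because exactly one rail is asserted for a data word and none for the spacer, the presence of data on a signal can be observed from the signal itself, which is the property that completion detection relies on.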
A reason for encoding digital signals in multi-rail representations is to enable detection of the propagation of data values from the circuit inputs to the circuit outputs and, by incorporating a completion mechanism, to detect that the operation of the circuit has completed. Circuits designed using multi-rail representations can thus exhibit asynchronous, data-dependent input-to-output delays. These types of circuits can increase the performance of digital systems by replacing conventional synchronous circuits, the operation of which is based on an external timing reference instead of completion detection.
Detecting completion requires a specific mechanism to be added to the multi-rail circuit, the operation of which depends on the circuit implementation of the multi-rail logic. Two classes of completion schemes are “strongly-indicating” and “weakly-indicating.” “Strongly-indicating” circuits will only propagate data values at the outputs after all internal nodes have settled to their final value. “Weakly-indicating” circuits may propagate data values at the outputs even if some of the internal nodes have not yet assumed their final values. Spacer values are propagated in both types identically, from the inputs to the outputs setting every internal signal to spacer, i.e. (0, 0).
The majority of digital designs are today implemented using synchronous techniques, requiring the presence of external clock signals. The key advantage of asynchronous circuits with completion detection is the possibility to exploit the true, data-dependent input-to-output delay indicated by the circuit itself. These types of circuits have the potential for increasing performance and are immune to parametric and environmental variations, such as temperature variations, power supply voltage fluctuations and variability in the fabrication characteristics of on-chip devices.
At present, even though a set of methodologies exists for implementing asynchronous multi-rail circuits with completion detection, all approaches in the literature require a significant area increase (over 2.5×), and every circuit operation requires two phases of almost equal delay, potentially doubling (2×) the circuit delay.
Several methodologies exist in the literature for the implementation of multi-rail circuits with completion detection. In dynamic CMOS logic, approaches such as the paradigm disclosed in U.S. Pat. No. 4,686,392 (“Multi-functional Differential Cascade Voltage Switch logic”) are used for this purpose, along with dynamic precharge. However, the preferred embodiment is targeted to design automation, and focuses on static CMOS circuitry.
In static CMOS design, the literature provides three approaches to the design of multi-rail circuits with completion detection: DIMS (Delay-Insensitive Minterm Synthesis), NCL (Null Convention Logic, U.S. Pat. No. 5,305,463) and extended NCL or NCLX (U.S. Pat. No. 6,526,542). All three approaches employ symmetric data and spacer phases, but differ in implementation style.
The DIMS approach is a “strongly-indicating” approach based on C-Muller gates (sequential asynchronous gates implementing the function c=ab+bc+ac), which implements a symmetric two-phase, dual-rail circuit by transforming every output node of a Boolean circuit, f, into two logic cones: f.t, the data TRUE output, and f.f, the data FALSE output. In DIMS the two logic cones are implemented in a sum-of-minterms fashion (an OR of minterms), where each minterm is realized as a C-Muller gate, according to the truth table of the implemented dual-rail function. In DIMS logic, when a data word arrives at the input, only one minterm, i.e. one C-Muller gate, is activated, and thus only one of the two rails per output is asserted. The assertion of one of the two rails of each output signals completion for that output. The advantage of this approach is its simplicity, since each output has only one active circuit path. The disadvantages include the use of non-standard-cell gates (C-Muller) and the lack of application of logic optimization to DIMS circuits, which implies very large circuit area (from 4× or 6× to very large).
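As an illustration of the DIMS construction described above, the following Python sketch models a two-input AND function as four C-Muller gates, one per input minterm, with the FALSE cone formed as an OR of the three minterms for which AND is false. The class names `CElement` and `DimsAnd2` are hypothetical, the model is purely behavioral (no gate delays), and words are assumed to be written as (false_rail, true_rail).

```python
class CElement:
    """Behavioral two-input C-Muller gate: c = ab + bc + ac."""

    def __init__(self) -> None:
        self.out = 0  # the gate holds its previous output, so it needs state

    def update(self, a: int, b: int) -> int:
        c = self.out
        self.out = (a & b) | (b & c) | (a & c)
        return self.out


class DimsAnd2:
    """Sketch of a DIMS-style dual-rail AND: one C-element per minterm."""

    def __init__(self) -> None:
        # Key (x, y) is the minterm "a has value x and b has value y".
        self.minterms = {(x, y): CElement() for x in (0, 1) for y in (0, 1)}

    def apply(self, a, b):
        # a and b are (false_rail, true_rail), so a[x] is the rail that is
        # asserted when a carries the value x.
        fired = {
            (x, y): gate.update(a[x], b[y])
            for (x, y), gate in self.minterms.items()
        }
        f_true = fired[(1, 1)]
        f_false = fired[(0, 0)] | fired[(0, 1)] | fired[(1, 0)]
        return (f_false, f_true)


if __name__ == "__main__":
    SPACER, TRUE, FALSE = (0, 0), (0, 1), (1, 0)
    gate = DimsAnd2()
    for a, b in [(SPACER, SPACER), (TRUE, SPACER), (TRUE, FALSE),
                 (SPACER, FALSE), (SPACER, SPACER)]:
        print(a, b, "->", gate.apply(a, b))
```

In this model the output stays at the spacer until both inputs carry data, and it returns to the spacer only after both inputs have returned to the spacer, which matches the strongly-indicating behavior attributed to DIMS above.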
The NCL approach is a “strongly-indicating” approach based on TH (Threshold) gates, a special-purpose static CMOS gate family and library implementing “threshold” functions, where each gate in the NCL library has a corresponding dual gate implementing its “dual” function. In the NCL flow, each gate has the same p-type pull-up network, comprised of all dual-rail inputs. Thus, every NCL gate outputs a spacer word when all of its inputs assume the spacer value (all NCL gates contain an inverting keeper). In NCL, the “dual” of a gate is a gate which outputs the inverted value of another. In NCL, the data TRUE rail of each logic output is generated by mapping the Boolean function to the TH gates, whereas the data FALSE rail is generated by transforming each TH gate of the data TRUE rail to its dual TH gate. This approach again requires a special-purpose CMOS standard-cell library and has been shown to require very large area, as does DIMS.
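The state-holding behavior of a threshold gate can be sketched behaviorally in Python. This is a simplified model under stated assumptions, not the NCL library itself: the class name `ThresholdGate` is hypothetical, the gate is assumed to assert its output once at least `threshold` inputs are asserted, to clear it only when all inputs are deasserted, and to hold its previous output otherwise (the role played by the keeper mentioned above).

```python
from typing import Sequence


class ThresholdGate:
    """Behavioral m-of-n threshold gate with hold (hysteresis) behavior."""

    def __init__(self, threshold: int, n_inputs: int) -> None:
        if not 1 <= threshold <= n_inputs:
            raise ValueError("threshold must be between 1 and n_inputs")
        self.threshold = threshold
        self.n_inputs = n_inputs
        self.out = 0

    def update(self, inputs: Sequence[int]) -> int:
        if len(inputs) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        asserted = sum(inputs)
        if asserted >= self.threshold:
            self.out = 1   # enough inputs asserted: set the output
        elif asserted == 0:
            self.out = 0   # all inputs back to 0: clear the output
        # otherwise: hold the previous output
        return self.out


if __name__ == "__main__":
    th23 = ThresholdGate(threshold=2, n_inputs=3)
    for pattern in [(1, 0, 0), (1, 1, 0), (1, 0, 0), (0, 0, 0)]:
        print(pattern, "->", th23.update(pattern))
```

Under these assumptions, a 2-of-2 gate behaves like the two-input C-Muller gate used in DIMS.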
In contrast to the other two approaches, the NCLX approach (NCL with explicit completion) is better suited to design automation and is based on standard-cell CMOS gates only. NCLX creates a dual-rail network, based on an original Boolean network, by adding duals to every gate in the original circuit using De Morgan's duality principle and by eliminating inverted circuit nets, replacing them with the corresponding complementary rails. De Morgan's duality principle states that the inverted conjunction of n inputs is equal to the disjunction of their inverses and, similarly, that the inverted disjunction of n inputs is equal to the conjunction of their inverses. Completion in NCLX is implemented by inserting local completion detectors (OR gates) at every circuit node and implementing a “guarded” conjunction gate (a C element, which is equivalent to an AND gate with memory). The “guarded” conjunction gate outputs a single completion signal based on the conjunction of all internal nets. The completion detection output can thus only be asserted when all internal nodes have settled to their final value. NCLX requires symmetric phases of equivalent delay for data and spacers.
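The dual-rail expansion and explicit completion described above can be sketched in Python for a small example network, z = (a AND b) OR (NOT c). This is an illustrative model under stated assumptions rather than the NCLX flow itself: the function and class names are hypothetical, words are written as (false_rail, true_rail), and the set of monitored nodes is assumed to include the inputs, the internal net and the output.

```python
def and2(a, b):
    """Dual-rail AND: TRUE rail is a.t AND b.t; FALSE rail is the
    De Morgan dual, a.f OR b.f."""
    return (a[0] | b[0], a[1] & b[1])


def or2(a, b):
    """Dual-rail OR: TRUE rail is a.t OR b.t; FALSE rail is a.f AND b.f."""
    return (a[0] & b[0], a[1] | b[1])


def inv(a):
    """Inversion needs no gate: the two rails are simply swapped."""
    return (a[1], a[0])


class Completion:
    """OR detector per node, combined by an n-input C element."""

    def __init__(self) -> None:
        self.done = 0

    def update(self, nodes) -> int:
        detect = [f | t for f, t in nodes]  # 1 when a node carries data
        if all(detect):
            self.done = 1   # every node holds data
        elif not any(detect):
            self.done = 0   # every node holds the spacer
        # otherwise: hold the previous value (the "memory" of the C element)
        return self.done


def evaluate(a, b, c, completion):
    """Evaluate z = (a AND b) OR (NOT c) and the completion signal."""
    n1 = and2(a, b)
    z = or2(n1, inv(c))
    done = completion.update([a, b, c, n1, z])
    return z, done


if __name__ == "__main__":
    SPACER, TRUE, FALSE = (0, 0), (0, 1), (1, 0)
    comp = Completion()
    print(evaluate(SPACER, SPACER, SPACER, comp))
    print(evaluate(FALSE, SPACER, TRUE, comp))   # output may be early
    print(evaluate(FALSE, TRUE, TRUE, comp))     # all nodes settled
    print(evaluate(SPACER, SPACER, SPACER, comp))
```

In this model the dual-rail gates alone can produce an output data value before every input has arrived (for example, a FALSE input to the AND gate), so the separate completion signal is what indicates that all monitored nodes have settled.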