In the design of digital logic, it is a fundamental task to be able to sequence behavior in time where required. For example, an operation X must not be activated without its proper data operands A and B ready and available, and any other operations Y and Z which use the result of X must both be finished and ready to accept a new result from X. The former requirement is known as a dependency, and the latter known as an antidependency.
Synchronous logic solves the problem of sequencing behavior in digital designs by activating all operations on the occurrence of a single, global event. This global event is the rising or falling edge of a periodic clock signal.
In contrast, asynchronous logic solves the problem of sequencing behavior by activating operations based on the occurrence of many distributed, largely unrelated, and highly localized events. These events are the rising or falling edges of the output of potentially any logic gate within the design.
Unlike the synchronous design style with a global clock signal, asynchronous logic design is extremely variegated. A wide variety of styles exist as known art. Each of these styles of asynchronous logic may be classified according to several distinguishing features as here described.
Firstly, each of these styles is distinguished by the size of the operation activated by a local event. In some styles, a local event activates the processing of an entire datapath of logic. For example, the multiplication of two 32-bit operands to form a 64-bit result might be controlled by a single local event. Such would be said to be very coarse-grained asynchronous event control. In other styles of asynchronous logic, local events are identifiable as a signal to activate the logical NAND of two bits in only a single gate. Such would be said to be very fine-grained event control.
Styles of asynchronous logic which share the same granularity of event control are further distinguished by a multitude of logical communication protocols used to generate the local events based on the occurrence of other events in the asynchronous logic. In some cases simply the change in state of a signal, any edge or level change, may generate an event. This is known in the language of asynchronous logic as a two-phase signaling protocol. In other cases, both a rise and fall of a signal in series are required to generate a local event. This is known as a four-phase signaling protocol.
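The difference between the two signaling protocols can be illustrated with a minimal Python sketch. The function names and the sampled trace are hypothetical and serve only to show how the same signal yields a different number of events under each protocol.

```python
def count_events_two_phase(samples):
    """Two-phase: every transition, rising or falling, is one event."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


def count_events_four_phase(samples):
    """Four-phase: a rise followed by a fall together form one event."""
    events = 0
    rise_seen = False
    for prev, cur in zip(samples, samples[1:]):
        if prev == 0 and cur == 1:
            rise_seen = True
        elif prev == 1 and cur == 0 and rise_seen:
            events += 1
            rise_seen = False
    return events


# Hypothetical sampled values of one signal over time.
trace = [0, 1, 1, 0, 0, 1, 0]
two_phase_events = count_events_two_phase(trace)    # 4 transitions
four_phase_events = count_events_four_phase(trace)  # 2 rise/fall pairs
```

Under these assumptions the same trace carries twice as many events in the two-phase interpretation, since no return transition is spent between events.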
The safe design of any logic, whether synchronous or asynchronous, depends on assumptions made about timing. All logic in synchronous designs, for example, must take less time than the period of the global clock for proper safety. Asynchronous logic is no different in that timing assumptions put constraints on design.
Because a local event present in asynchronous logic represents a designer's intentional sequencing of overall behavior, it inevitably requires information from its dependencies and antidependencies in order to activate. Thus, each local event is generated based on a collection of occurrences of other events. All events related to dependencies for the operation must be collected to ensure the operation is guaranteed to have the correct data values available. This is known in the jargon of computer science as a join. Moreover, all events related to antidependencies must be collected to ensure that the operation may activate safely without adversely affecting another. This is known in the jargon of computer science as a fork.
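The collection of events described above can be sketched in Python as follows. The class name, the event names, and the order of arrival are assumptions made for illustration; the sketch models only the bookkeeping, not any circuit.

```python
class EventCollector:
    """Completes only once every named event has been observed."""

    def __init__(self, names):
        self.pending = set(names)

    def observe(self, name):
        # Events that this collector does not wait for are ignored.
        self.pending.discard(name)

    def complete(self):
        return not self.pending


def may_activate(join, fork):
    """An operation may activate only when its join and its fork are complete."""
    return join.complete() and fork.complete()


# Operation X depends on operands A and B (join) and must wait for
# consumers Y and Z to be finished (fork). Names are hypothetical.
join_x = EventCollector(["A_ready", "B_ready"])
fork_x = EventCollector(["Y_done", "Z_done"])

history = []
for event in ["A_ready", "Y_done", "B_ready", "Z_done"]:
    join_x.observe(event)
    fork_x.observe(event)
    history.append(may_activate(join_x, fork_x))
```

In this sketch, operation X becomes eligible to activate only after the last of the four events has arrived, regardless of the order in which they occur.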
A fork or join may also have arbitration involved with event control. Operation X, which uses operands A and B, might hypothetically take B from more than one source. Operation X would be activated when A, and either B1 or B2, were available. Similarly, once X is activated and its result is ready, this result might be delivered to Y and either Z1 or Z2, but not both. Selection of the source of B, and the selection of Z, may be either explicitly directed by another signal, or left to chance as a “first-come, first-served” policy. Arbitration is involved in event control wherever an EITHER-OR of events is required before the activation of an operation. It is not necessary wherever a simple AND of events is required to activate an operation.
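A first-come, first-served join of this kind can be sketched in Python as follows. The function name and the arrival times are hypothetical, and a simultaneous arrival is resolved here by name order purely for convenience; a physical arbiter would resolve it by other means.

```python
def activation_time(arrivals):
    """Return (time, chosen_source) for X = A AND (B1 EITHER-OR B2).

    `arrivals` maps an event name to its arrival time. A source that never
    arrives is absent from the mapping. Returns None if X cannot activate.
    """
    candidates = [(arrivals[s], s) for s in ("B1", "B2") if s in arrivals]
    if "A" not in arrivals or not candidates:
        return None
    # First come, first served; ties fall back to name order (an assumption).
    b_time, b_source = min(candidates)
    # X activates once both A and the winning B source are available.
    return max(arrivals["A"], b_time), b_source


result = activation_time({"A": 3, "B1": 5, "B2": 4})  # B2 arrives first
```

Here X activates at time 4 using B2, since A is already available when the earlier of the two B sources arrives.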
Before an operation is activated, its joins must complete and its forks must be free to accept the operation's output. This requirement is universally true in asynchronous logic design of any style, for any safe and correctly behaving design. However, asynchronous logic styles are distinguished by how this timing guarantee is made and at what cost. There is a direct relationship between making a universal guarantee and the resulting circuit size or cost. There is also a direct relationship between satisfying the constraints that result from a partial timing correctness guarantee and the implementation complexity of such logic. Implementation complexity burdens the CAD tool or human designer of the physical circuit.
Delay-insensitive asynchronous logic ensures that under all circuit conditions, the guarantee of timing correctness is inherently met, no matter the implementation. Building completely delay-insensitive asynchronous logic inevitably requires additional safety provisions, which must be implemented with more gates. In some cases this absolute guarantee deteriorates performance because of the more robust event signaling protocol which must be used. Both leakage and switching power in a CMOS transistor implementation are necessarily higher. Nevertheless, delay-insensitive asynchronous logic is extremely robust and expedient for numerous applications, and is therefore a common style.
Other styles of asynchronous logic make the guarantee of timing correctness while ignoring the delay of wires. During physical implementation of the design, each wire attached to a gate carries a hard and fast time delay constraint, beyond which the entire logic design no longer operates safely. For a design of modern proportions, containing hundreds of thousands or millions of such wires, this set of constraints is usually unmanageable.
The bundled-delay constraint is the most common trade-off between circuit cost and implementation complexity. A set of logic paths, such as those within a multiplier producing a product from two operands, is bundled together. The worst-case delay of this bundled datapath is given a timing constraint, and with this timing constraint the guarantee of timing correctness of the overall asynchronous logic design is made. Clearly, reducing the number of timing constraints by orders of magnitude relieves a great burden on implementation complexity. At the same time, avoiding the absolute guarantee of timing correctness that delay-insensitive logic makes allows for far less bulky and expensive circuitry.
In all physically implemented asynchronous logic circuits, 100% of the timing constraints derived from the correctness assumptions must be met in order to guarantee correct behavior. However, variations in the timing of individual logic paths do exist between different physical circuit embodiments of the same asynchronous logic design, each of which meets all of these timing constraints and operates safely and correctly. These logic path variations appear as symptoms of many perturbations including minute variances in manufacturing, differences in the voltage or temperature at which the circuits operate, and most importantly, different circuit implementations.
When event control involves arbitration due to an EITHER-OR condition, the logic will have correct but non-deterministic behavior because of these timing variations. Operation X involves a join which waits for either B1 or B2 to arrive, and activates based on whichever event arrives first. The outcome of the race between B1 and B2 will see-saw back and forth because of timing variations, and therefore the order of processing in the asynchronous design may change. This nondeterminism is not a fatal flaw, as the overall behavior is correct. However, nondeterminism makes testing of asynchronous logic designs extremely difficult, as the same input applied repeatedly to the same physical circuit may yield results in a different order each time.
For describing any logic design textually, a hardware description language or HDL is used. Since the advent of logic synthesis in the late 1980s, the HDL has become not only a description of the design for purposes of simulation or documentation, but also the way designs are entered and captured. For synchronous design, the HDLs Verilog and VHDL are standardized design entry languages well known in the world community of engineers. HDLs for asynchronous design entry have struggled for standardization and acceptance due to the complexities of describing the asynchronous event control.
“Micropipelines” constitute a style of asynchronous logic characterized by coarse-grain event control of a stage of bundled-delay datapath, bounded by locally clocked registers at the start and end. The structure is similar to a synchronous pipeline stage. A set of discrete building blocks for event control, well-known to those skilled in the art, is associated with this style. These building blocks allow for AND, EITHER-OR and signal-controlled OR of events, for both forking and joining. Between the set of locally clocked registers under event control, the datapath has a known worst-case bundled delay. A handshake protocol with request and acknowledge signals is set up between the controllers of all local clocks in the design.
For a single Micropipeline stage, a request signal is sent forward, in the direction of the datapath, from the start register of the pipe stage to the end register when new operands A and B are both ready to be clocked and enter the combinatorial stage. This request is derived from a join event of A and B. An acknowledge signal is sent backward, opposite to the direction of the datapath, from the end register to the start register when the output of the stage has been safely latched. When request and acknowledge correspond, registers at the start of the stage are clocked, activating the operation of the datapath with the new A and B operands.
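A generic two-phase request and acknowledge handshake can be sketched in Python as follows. This is an illustrative model under stated assumptions, not the stage controller itself: the class and method names are hypothetical, and the channel is taken to be free for a new transfer whenever the request and acknowledge levels are equal.

```python
class TwoPhaseChannel:
    """Behavioral model of a two-phase request/acknowledge handshake."""

    def __init__(self):
        self.req = 0
        self.ack = 0

    def idle(self):
        # Assumption: equal levels mean no transfer is outstanding.
        return self.req == self.ack

    def send(self):
        """Sender signals new data with a single transition on req."""
        if not self.idle():
            raise RuntimeError("previous transfer not yet acknowledged")
        self.req ^= 1

    def acknowledge(self):
        """Receiver signals safe capture with a single transition on ack."""
        if self.idle():
            raise RuntimeError("no outstanding request")
        self.ack ^= 1


channel = TwoPhaseChannel()
channel.send()          # req: 0 -> 1, transfer outstanding
busy_after_send = not channel.idle()
channel.acknowledge()   # ack: 0 -> 1, channel free again
channel.send()          # req: 1 -> 0, a falling edge is also an event
channel.acknowledge()   # ack: 1 -> 0
```

Each transfer in this model costs one transition on each wire, and both rising and falling edges carry information.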
In order to satisfy the bundled-delay timing constraint and guarantee timing correctness of a Micropipeline stage, a delay is intentionally added to the forward request signal, which causes the request to arrive at the end in the same amount of time as the worst-case delay through the datapath within the stage. This matched delay element is among the basic building blocks which characterize Micropipelines.
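The matched delay requirement can be expressed as a simple check, sketched below in Python. The function names, the path delays, and the optional margin are hypothetical. The sketch assumes that the delayed request is safe whenever it arrives no earlier than the slowest signal in the bundled datapath.

```python
def worst_case_delay_ps(path_delays_ps):
    """Longest delay among the logic paths bundled into one datapath."""
    return max(path_delays_ps)


def request_delay_is_safe(path_delays_ps, request_delay_ps, margin_ps=0):
    """True if the delayed request cannot arrive before the slowest data.

    `margin_ps` is an optional guard band (an assumption of this sketch)
    intended to absorb timing variation between physical implementations.
    """
    return request_delay_ps >= worst_case_delay_ps(path_delays_ps) + margin_ps


# Hypothetical path delays, in picoseconds, through one stage's datapath.
stage_paths_ps = [1800, 2400, 3100]
safe = request_delay_is_safe(stage_paths_ps, 3500)    # 3500 >= 3100
unsafe = request_delay_is_safe(stage_paths_ps, 3000)  # 3000 <  3100
```

Only one such constraint is needed per bundled datapath, rather than one per wire.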
What is needed is a micropipeline stage controller and control scheme that implements two-phase asynchronous handshakes.