As synchronous designs increasingly face challenges due to fundamental limitations of clocking, the VLSI design community has recently turned toward asynchronous logic to mitigate the challenges of global clock distribution in large, complex, high-speed systems. Asynchronous design offers several potential benefits, such as lower power consumption, higher performance, greater robustness, and significantly better modularity, all of which make asynchronous circuits a promising alternative to synchronous design.
When the problems of using a global synchronous clock became apparent, the VLSI community began looking to the asynchronous domain for solutions because of its inherent advantages. The main difference between the synchronous and asynchronous approaches is the way timing between modules is maintained. In a synchronous pipeline, for example, the clock provides a timing reference that dictates when each stage must complete. In an asynchronous pipeline, by contrast, timing is inferred from communication between adjacent stages. This communication is referred to as handshaking, and handshaking protocols define the control behavior of an asynchronous pipeline.
There are many areas where asynchronous circuits demonstrate clear advantages over their synchronous counterparts. Lower electromagnetic noise emissions, no clock distribution (saving area and power), no clock skew, robustness to environmental variations (e.g., temperature and power supply) and to transistor variations, better modularity, and better security are among the properties for which asynchronous designs have often shown advantages over synchronous ones.
There are many different flavors of asynchronous design. However, the most commonly used approaches differ mainly in the following design choices.
Data signaling/encoding. In dual-rail encoded data, each Boolean (i.e., two-valued signal) is implemented as two wires, typically one asserted to indicate a logic one and the other asserted to indicate a logic zero. This allows both the value and the timing information to be communicated for each data bit. Bundled data, on the other hand, uses one wire for each data bit and a separate wire to indicate timing.
Control signaling/handshaking. Level-sensitive circuits typically represent a logic one by a high voltage and a logic zero by a low voltage. Transition signaling instead uses a change in the signal level to convey information.
Timing model. A speed-independent design is tolerant to variations in gate delays but not to variations in wire propagation delays, while a delay-insensitive circuit is tolerant to variations in wire delays as well.
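To make the dual-rail idea concrete, here is a minimal Python sketch assuming the common four-phase convention in which a bit is a pair (true rail, false rail): (0, 0) is the empty "spacer", (1, 0) encodes 1, (0, 1) encodes 0, and (1, 1) is illegal. The function names are hypothetical. The sketch shows why dual-rail data needs no separate timing wire: the receiver can tell from the data wires alone when every bit has arrived, whereas a bundled-data receiver must rely on its separate request wire.

```python
from typing import List, Optional, Tuple

# Assumed four-phase dual-rail convention: each bit is (true_rail, false_rail).
DualRail = Tuple[int, int]
SPACER: DualRail = (0, 0)  # "empty": no valid data on this bit yet


def encode_bit(bit: int) -> DualRail:
    """Raise the true rail for a 1, the false rail for a 0."""
    return (1, 0) if bit else (0, 1)


def encode_word(bits: List[int]) -> List[DualRail]:
    return [encode_bit(b) for b in bits]


def decode_word(rails: List[DualRail]) -> Optional[List[int]]:
    """Return the word once every bit is valid, else None.

    Validity travels on the data wires themselves, so completion can be
    detected without a separate timing wire (unlike bundled data).
    """
    bits = []
    for t, f in rails:
        if t and f:
            raise ValueError("illegal dual-rail code (1, 1)")
        if not (t or f):
            return None  # this bit is still the spacer
        bits.append(t)
    return bits
```

For example, `decode_word(encode_word([1, 0, 1]))` returns `[1, 0, 1]`, while a word in which any bit is still `SPACER` decodes to `None`, meaning "not complete yet".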
The most popular form in recent years has been dual-rail encoding with level-sensitive signaling. Full delay insensitivity is still achieved, but each transaction requires a “return to zero” phase, so more power is dissipated than with transition signaling. The advantage of this approach over transition signaling is that the logic processing elements can be much simpler: familiar logic gates process levels, whereas the circuits required to process transitions must hold state and are generally more complex.
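The power argument can be illustrated by counting wire transitions, a rough proxy for dynamic switching energy. The sketch below is an assumption-laden model, not a circuit simulation: it compares four-phase (return-to-zero) dual-rail transfers with a simplified transition-signaled dual-rail scheme in which a bit is sent by toggling one of its two rails, and it counts a single acknowledge wire in both cases. All names are hypothetical.

```python
from typing import List


def count_transitions(trace: List[List[int]]) -> int:
    """Count wire toggles between consecutive snapshots of all wires."""
    return sum(
        a != b
        for prev, cur in zip(trace, trace[1:])
        for a, b in zip(prev, cur)
    )


def four_phase_dual_rail_trace(words: List[List[int]]) -> List[List[int]]:
    """Wire snapshots [t0, f0, t1, f1, ..., ack] for return-to-zero transfers.

    Each transfer: data valid -> ACK high -> all rails back to zero -> ACK low.
    """
    n = len(words[0])
    state = [0] * (2 * n + 1)
    trace = [state[:]]
    for word in words:
        for i, bit in enumerate(word):
            state[2 * i] = 1 if bit else 0      # true rail
            state[2 * i + 1] = 0 if bit else 1  # false rail
        trace.append(state[:])
        state[-1] = 1                   # receiver acknowledges the codeword
        trace.append(state[:])
        state[:-1] = [0] * (2 * n)      # sender returns every rail to zero
        trace.append(state[:])
        state[-1] = 0                   # receiver acknowledges the spacer
        trace.append(state[:])
    return trace


def two_phase_dual_rail_trace(words: List[List[int]]) -> List[List[int]]:
    """Wire snapshots for the simplified transition-signaled scheme.

    A 1 toggles the true rail, a 0 toggles the false rail; there is no
    return-to-zero phase, and ACK toggles once per transfer.
    """
    n = len(words[0])
    state = [0] * (2 * n + 1)
    trace = [state[:]]
    for word in words:
        for i, bit in enumerate(word):
            state[2 * i if bit else 2 * i + 1] ^= 1
        trace.append(state[:])
        state[-1] ^= 1
        trace.append(state[:])
    return trace
```

Under these assumptions an n-bit transfer costs 2n + 2 transitions with return-to-zero signaling (every rail that rises must fall again, and ACK rises and falls) but only n + 1 with transition signaling. Sending the two 4-bit words `[[1, 0, 1, 1], [0, 0, 1, 0]]` gives 20 transitions for the four-phase trace and 10 for the two-phase trace.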
FIG. 1 illustrates another conventional approach, which uses bundled data with a transition signaled handshake protocol to control data transfers. FIG. 1 shows the interface between a sender 100 and a receiver 102. Sender 100 and receiver 102 may be two stages of a multi-stage pipeline, for example. A bundle of data, such as databus 104, carries information, typically using one wire for each bit. A request signal (REQ) 106 is sent by the sender to the receiver and carries a transition when the data is valid. An acknowledge signal (ACK) 108 is sent from the receiver to the sender and carries a transition when the data has been used.
The protocol sequence is also shown in the timing diagram at the bottom of FIG. 1. At time T1, sender 100 places valid data on databus 104. At time T2, after a delay sufficient to allow the signals on databus 104 to stabilize, sender 100 causes a transition to occur on REQ 106. Receiver 102 may use the transition of REQ 106 to internally capture (e.g., latch) the values on databus 104. At time T3, after a delay sufficient for receiver 102 to guarantee that the data on databus 104 has been properly latched, receiver 102 may cause a transition to occur on ACK 108 to indicate to sender 100 that the data has been successfully received by receiver 102. After that time, sender 100 may “release” the data, meaning that sender 100 need not maintain the valid data on databus 104. In some cases, sender 100 may stop driving databus 104, which is sometimes referred to as “tri-stating” the bus.
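The T1 to T3 sequence can be modeled as a small state machine. In this hypothetical Python sketch, REQ and ACK are single bits that toggle once per transfer; a request is pending whenever they differ, and the sender may place new data on the bus only once they match again. Timing is abstracted away: the bundling delay between T1 and T2 and the latching delay before T3 are assumed to be satisfied rather than modeled. The class names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Channel:
    """Bundled-data channel: databus plus transition-signaled REQ/ACK."""
    data: int = 0
    req: int = 0  # toggles once per transfer
    ack: int = 0  # toggles once per transfer


@dataclass
class Sender:
    ch: Channel
    log: List[str]

    def can_send(self) -> bool:
        # The previous transfer is complete once ACK has caught up with REQ.
        return self.ch.req == self.ch.ack

    def send(self, value: int) -> None:
        if not self.can_send():
            raise RuntimeError("previous data not yet acknowledged")
        self.ch.data = value               # T1: drive valid data on the bus
        self.log.append(f"T1 data={value}")
        # (The bundling delay between T1 and T2 is assumed, not modeled.)
        self.ch.req ^= 1                   # T2: REQ transition = data valid
        self.log.append(f"T2 REQ->{self.ch.req}")


@dataclass
class Receiver:
    ch: Channel
    log: List[str]
    received: List[int] = field(default_factory=list)

    def poll(self) -> None:
        # A request is pending whenever REQ and ACK differ; only the
        # transition matters, not whether the wire is high or low.
        if self.ch.req != self.ch.ack:
            self.received.append(self.ch.data)  # latch the bundled data
            self.ch.ack ^= 1                    # T3: ACK transition
            self.log.append(f"T3 ACK->{self.ch.ack}")
```

Sending 5, 9 and 2 in turn (calling `send` then `poll` each time) leaves `received == [5, 9, 2]`, with REQ and ACK going 0->1, 1->0 and 0->1. Neither wire level means anything by itself; only changes carry information, which is the essence of transition signaling.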
There have been a number of implementations of asynchronous pipelines, each approach having particular drawbacks. For example, Sutherland (Sun '89) describes 2-phase micropipelines that are elegant but expensive and slow. Molnar, Sutherland et al. '97-'01 describe a pipeline that is fast but requires fine-grain transistor sizing to achieve delay equalization and then needs extensive post-layout simulation to verify complex timing constraints. Schuster et al. (ISSCC '00) describe an asynchronous pipeline that has very complex timing requirements and circuit structures. Williams '86 and Martin '97 describe dynamic pipelines that have no explicit latches and low latency but have poor cycle time (i.e., they are “throughput limited”).
FIG. 2 is a block diagram illustrating a conventional transition-signaling asynchronous pipeline implementation that supports simple forks and joins, which is disclosed in U.S. Pat. No. 6,958,627. The pipeline implementation disclosed therein is referred to as a “MOUSETRAP” pipeline. Pipeline 200 consists of multiple stages 202, two of which are shown in FIG. 2 as stageN-1 202A and stageN 202B. In one embodiment, each stage 202 includes a data latch 204 for latching incoming data 206 and a latch controller 208, which implements the latch enable logic. Latch controller 208 has two inputs, a request signal (REQ) 210 generated by the current stage and an acknowledgment signal (ACK) 212 from an adjacent stage, and outputs a latch enable signal 214. The function of latch controller 208 is to disable latch 204 when its inputs do not match, e.g., when a request has not yet been acknowledged. In one embodiment, latch controller 208 may be implemented using a simple XNOR gate 216. In one embodiment, latch 204 remains transparent while its stage 202 is waiting for data. As soon as data enters the stage, the data is captured by closing the latch behind it. The latch reopens when the data it holds has been captured by the subsequent stage. This allows requests (along with data) to flow in the forward direction and their acknowledgments to flow in the backward direction.
A simple fork receives an input, forwards it to multiple next stages, and waits for all of the next stages to acknowledge before accepting the next input data. A simple join receives input from multiple previous stages and waits for all of them to request before merging their data, latching the merged data, and forwarding it to a single next stage.
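A zero-delay behavioral sketch of MOUSETRAP-style control follows. Each latch enable is the XNOR of the stage's own REQ and the ACK from downstream, so a latch is transparent while its stage is empty and closes as soon as data is captured. To model the fork and join behavior described above, the sketch assumes Muller C-elements that combine the REQs of all successors (fork) or all predecessors (join); that choice, and the names `Stage`, `inject`, `settle`, `diamond` and the `merge` function, are assumptions of this sketch rather than details taken from the patent. It also ignores the timing constraints a real MOUSETRAP implementation must satisfy.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def xnor(a: int, b: int) -> int:
    """Latch controller: enable (transparent) when the inputs match."""
    return 1 if a == b else 0


def c_element(inputs: List[int], prev: int) -> int:
    """Muller C-element: switch to the common input value, else hold."""
    return inputs[0] if len(set(inputs)) == 1 else prev


@dataclass(eq=False)
class Stage:
    name: str
    # Hypothetical merge function applied to predecessor data at a join.
    merge: Callable[[List[int]], int] = lambda xs: xs[0]
    preds: List["Stage"] = field(default_factory=list)
    succs: List["Stage"] = field(default_factory=list)
    req: int = 0   # phase of the token held in this stage's latch
    ack: int = 0   # C-element over successor REQs (fork acknowledgment)
    data: Optional[int] = None
    captured: List[int] = field(default_factory=list)

    def enable(self) -> int:
        # Latch enable; a stage with no successors acts as an always-ready sink.
        return xnor(self.req, self.ack) if self.succs else 1

    def step(self) -> bool:
        """Update ACK, then capture new data if possible. True if anything changed."""
        changed = False
        if self.succs:
            new_ack = c_element([s.req for s in self.succs], self.ack)
            changed |= new_ack != self.ack
            self.ack = new_ack
        if self.preds and self.enable():
            # Join: data is new only once every predecessor has toggled REQ.
            in_req = c_element([p.req for p in self.preds], self.req)
            if in_req != self.req:
                self.data = self.merge([p.data for p in self.preds])
                self.req = in_req  # the latch closes behind the data
                self.captured.append(self.data)
                changed = True
        return changed


def inject(src: Stage, value: int) -> bool:
    """Environment: offer a token to the source stage if its latch is open."""
    src.step()  # refresh ACK from the fork's successors
    if not src.enable():
        return False  # previous token not yet taken by all successors
    src.data, src.req = value, src.req ^ 1
    return True


def settle(stages: List[Stage]) -> None:
    """Fire stages repeatedly until no signal changes."""
    while any([s.step() for s in stages]):
        pass


def diamond():
    """Hypothetical fork/join: src forks to a and b, which join at sink j."""
    src, a, b = Stage("src"), Stage("a"), Stage("b")
    j = Stage("j", merge=sum)
    src.succs = [a, b]
    a.preds, a.succs = [src], [j]
    b.preds, b.succs = [src], [j]
    j.preds = [a, b]
    return src, a, b, j
```

In the diamond, injecting 3 eventually produces 6 at the sink `j` (the join sums its inputs). If only branch `a` has captured the token, the fork's latch stays closed and a second `inject` is refused until branch `b` also captures it.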
However, behavior more sophisticated than a simple fork or a simple join is often desired. Accordingly, in light of these disadvantages of conventional asynchronous pipeline implementations, there exists a need for improved systems, pipeline stages, and computer-readable media for advanced asynchronous pipeline circuits using transition signaling.