As synchronous designs are increasingly facing challenges due to fundamental limitations of clocking, the VLSI design community has recently turned towards asynchronous logic to mitigate the challenges of global clock distribution in large complex high-speed systems. Asynchronous design offers several potential benefits, such as lower power consumption, higher performance, greater robustness, and significantly better modularity, all of which make asynchronous circuits a promising alternative to synchronous design.
When the problems that arise from using a global synchronous clock became apparent, the VLSI community began looking to the asynchronous domain for solutions because of its inherent advantages. The main difference between the synchronous and asynchronous approaches is the way timing between modules is maintained. In a synchronous pipeline, for example, the clock provides a timing reference that dictates the completion of each stage. In asynchronous pipelines, timing is inferred from communication between adjacent stages in the pipeline. This is referred to as handshaking. Handshaking protocols define the control behavior of an asynchronous pipeline.
There are many areas where asynchronous circuits dominate their synchronous counterparts. Lower emissions of electromagnetic noise, no clock distribution (saving area and power), no clock skew, robustness to environmental variations (e.g. temperature and power supply) or transistor variations, better modularity and better security are just some of the properties for which most asynchronous designs have shown advantages over synchronous ones.
There are many different flavors of asynchronous design. However, the most commonly used approaches differ mainly in the following design choices.
Data signaling/encoding. In dual rail encoded data, each Boolean (i.e., two-valued signal) is implemented as two wires, typically a data signal and a clock signal. This allows the value and the timing information to be communicated for each data bit. Bundled data, on the other hand, has one wire for each data bit and a separate wire to indicate the timing.
Control signaling/handshaking. Level sensitive circuits typically represent a logic one by a high voltage and a logic zero by a low voltage. Transition signaling uses a change in the signal level to convey information.
Timing model. A speed independent design is tolerant to variations in gate speeds but not to propagation delays in wires while a delay insensitive circuit is tolerant to variations in wire delays as well.
The most popular form in recent years has been dual-rail encoding with level sensitive signaling. Full delay insensitivity is still achieved, but there must be a “return to zero” phase in each transaction, and therefore more power is dissipated than with transition signaling. The advantage of this approach over transition signaling is that the logic processing elements can be much simpler; familiar logic gates process levels whereas the circuits required to process transitions require state and are generally more complex.
FIG. 1 illustrates another conventional approach, which uses bundled data with a transition signaled handshake protocol to control data transfers. FIG. 1 shows the interface between a sender 100 and a receiver 102. Sender 100 and receiver 102 may be two stages of a multi-stage pipeline, for example. A bundle of data, such as databus 104, carries information, typically using one wire for each bit. A request signal (REQ) 106 is sent by the sender to the receiver and carries a transition when the data is valid. An acknowledge signal (ACK) 108 is sent from the receiver to the sender and carries a transition when the data has been used.
The protocol sequence is also shown as the timing diagram at the bottom of FIG. 1. At time T1, sender 100 places valid data on databus 104. At time T2, after some delay sufficient to allow the signals on databus 104 to stabilize, sender 100 causes a transition to occur on REQ 106. Receiver 102 may use the transition of REQ 106 to internally capture (e.g., latch) the values on databus 104. At time T3, after some delay sufficient to allow receiver 102 to guarantee that the data on databus 104 has been properly latched, receiver 102 may cause a transition to occur on ACK 108, to indicate to sender 100 that the data has been successfully received by receiver 102, after which time sender 100 may “release” the data, meaning that sender 100 need not maintain the valid data on databus 104. In some cases, sender 100 may stop driving databus 104, sometimes referred to as “tri-stating” the bus.
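The T1/T2/T3 sequence above can be sketched as a small software model. This is only an illustrative sketch, not a circuit description: the names `Channel`, `sender_put`, `receiver_get`, and `transfer_all` are hypothetical, and the stabilization delays between steps are abstracted away. It assumes that a transfer is pending whenever the REQ and ACK levels differ, which is how transition signaling is modeled here.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Bundled-data channel: a data bus plus REQ and ACK wires (hypothetical model)."""
    data: int = 0
    req: bool = False
    ack: bool = False


def sender_put(ch: Channel, value: int) -> None:
    """T1: place valid data on the bus. T2: toggle REQ to signal validity."""
    if ch.req != ch.ack:
        raise RuntimeError("previous transfer has not been acknowledged")
    ch.data = value        # T1: data becomes valid
    ch.req = not ch.req    # T2: a transition (either edge) carries the request


def receiver_get(ch: Channel) -> int:
    """T3: latch the bus value, then toggle ACK to release the sender."""
    if ch.req == ch.ack:
        raise RuntimeError("no pending request")
    latched = ch.data      # capture while the sender still holds the data
    ch.ack = not ch.ack    # a transition on ACK acknowledges the transfer
    return latched


def transfer_all(values):
    """Run one complete handshake per value and return what the receiver latched."""
    ch = Channel()
    received = []
    for v in values:
        sender_put(ch, v)
        received.append(receiver_get(ch))
    return received
```

Because information is carried by transitions rather than levels, each transfer in this model needs only one REQ event and one ACK event, with no return-to-zero phase.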
This approach has some disadvantages, however. Existing handshake protocols dictate a unidirectional flow of information. Given two adjacent stages, these protocols define one of the stages as active and the other stage as passive. Only the active stage can initiate a communication with the passive stage. As used herein, the term “forward” refers to the direction that data is traveling as it passes through the pipeline, and the term “backward” refers to the opposite direction from forward. In conventional pipelines, initiation signals, such as REQ 106, can only travel forward, and response signals, such as ACK 108, can only travel backward. Though these protocols have enabled building complex pipelines, their unidirectional nature has become a bottleneck in implementing certain useful architectural concepts, such as speculation, preemption, and eager evaluation. These concepts will now be described with reference to a simple example, described below and illustrated in FIG. 2.
Consider the following example: a simple application consisting of an if-then-else statement.
IF <CONDITION> THEN
<IF BRANCH>
ELSE
<ELSE BRANCH>
END IF
In the conventional control-driven approach, designing an asynchronous circuit for the above application would involve first computing the condition and then taking the IF or ELSE branch accordingly. Control is returned after the whole operation is completed. If α represents the time to compute the CONDITION, β represents the time to perform the operations within the IF block, and γ represents the time to perform the operations within the ELSE block, then the cycle time is α+β if the condition is TRUE and α+γ if the condition is FALSE. Where p is the probability that the condition will be TRUE, the average cycle time T_AVG is given by the equation:
T_AVG = α + pβ + (1 − p)γ
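As a worked illustration of the average cycle time, the following sketch evaluates the expression for one set of values. The function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def average_cycle_time(alpha: float, beta: float, gamma: float, p: float) -> float:
    """Average cycle time of the control-driven IF-THEN-ELSE circuit.

    alpha: time to compute the CONDITION
    beta:  time to execute the IF branch
    gamma: time to execute the ELSE branch
    p:     probability that the CONDITION is TRUE
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    return alpha + p * beta + (1.0 - p) * gamma


# Hypothetical values: alpha = 10, beta = 20, gamma = 40, p = 0.75
# gives 10 + 0.75 * 20 + 0.25 * 40 = 35 time units.
example = average_cycle_time(10.0, 20.0, 40.0, 0.75)
```

With p = 1 the expression reduces to α+β, and with p = 0 it reduces to α+γ, matching the two cases stated above.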
As used herein, the term “speculation” refers to the execution of code or the performance of a process even though it is not known at the time whether the process is necessary or whether the results of the process will be used. Speculation may be performed by pipelines that have multiple parallel pipeline paths. Using the IF-THEN-ELSE example above, all the three operations—evaluating the condition, executing the IF branch, and executing the ELSE branch—can be performed in parallel using three separate, parallel pipelines. Since the branch outcome is not known until the condition is computed, both the IF and ELSE branches are speculatively executed and the appropriate result is selected based on the condition outcome.
As used herein, the term “preemption” refers to the cancelling of operations or a sequence of operations during execution of the operation or before the operation has been executed. In the IF-THEN-ELSE example, once the CONDITION has been evaluated, the unneeded branch may be preempted, e.g., the operations of the unneeded branch can be terminated if currently being executed or cancelled before execution has begun.
As used herein, the term “eager evaluation” refers to the evaluation of a CONDITION before all of its inputs are known. For example, if the CONDITION being evaluated includes an OR operation, and if one input to an OR operation is a logical 1, the output of the OR operation is known to be a logical 1 and thus the results of the OR operation can be forwarded to the next stage without waiting to receive the other input(s). Similarly, if one input to an AND operation is a logical 0, the output of the AND is known to be logical 0 regardless of the values of the other input(s). In either scenario, it would be unnecessary to evaluate, or wait for the completion of an ongoing evaluation of, the other terms of the OR/AND operation.
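The short-circuit rule described above can be sketched in software as follows. This is a behavioral sketch only: the function names are hypothetical, and `None` is used here as an assumed way of marking an input that has not yet arrived.

```python
from typing import Optional, Sequence


def eager_or(inputs: Sequence[Optional[bool]]) -> Optional[bool]:
    """Return the OR result as soon as it is determined, else None.

    An input of None means "not yet arrived" (an assumption of this sketch).
    """
    if any(x is True for x in inputs):
        return True            # one logical 1 decides the OR
    if all(x is False for x in inputs):
        return False           # every input has arrived and all are 0
    return None                # still undetermined; keep waiting


def eager_and(inputs: Sequence[Optional[bool]]) -> Optional[bool]:
    """Return the AND result as soon as it is determined, else None."""
    if any(x is False for x in inputs):
        return False           # one logical 0 decides the AND
    if all(x is True for x in inputs):
        return True            # every input has arrived and all are 1
    return None                # still undetermined; keep waiting
```

For example, `eager_or([None, True])` is already determined to be `True`, whereas `eager_or([False, None])` is still undetermined and must wait for the late input.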
FIG. 2 is a block diagram illustrating a conventional transition signaling asynchronous pipeline implementation that does not support counterflow anti-tokens, which is disclosed in U.S. Pat. No. 6,958,627. Pipeline 200 consists of multiple stages 202, two of which are shown in FIG. 2 as stageN−1 202A and stageN 202B. In one embodiment, each stage 202 includes a data latch 204 for latching incoming data 206, and a latch controller 208, which implements the latch enable logic. Latch controller 208 has 2 inputs, a request signal (REQ) 210 generated by the current stage and an acknowledgment signal (ACK) 212 from an adjacent stage, and outputs a latch enable signal 214. The function of latch controller 208 is to disable latch 204 when the inputs of latch controller 208 don't match, e.g., when a request has not been acknowledged. In one embodiment, latch controller 208 may be implemented using a simple XNOR gate 216.
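The XNOR behavior of the latch controller can be sketched as a truth function. The name `latch_enable` is hypothetical, and the model ignores gate delays; it only shows that the latch is enabled (transparent) when the request and acknowledge levels match and disabled when they differ.

```python
def latch_enable(req: bool, ack: bool) -> bool:
    """XNOR of REQ and ACK: True (latch transparent) only when the inputs match."""
    return req == ack


# Trace of one transfer under transition signaling (illustrative only).
trace = []
req, ack = False, False
trace.append(latch_enable(req, ack))   # idle: inputs match, latch transparent
req = not req                          # data enters: this stage toggles REQ
trace.append(latch_enable(req, ack))   # request unacknowledged: latch closed
ack = not ack                          # next stage captures data, toggles ACK
trace.append(latch_enable(req, ack))   # inputs match again: latch reopens
```

In this trace the latch is open, then closed while the request is outstanding, then open again once the acknowledgment arrives.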
In one embodiment, latch 204 remains transparent when its stage 202 is waiting for data. As soon as data enters the stage, the data is captured by closing the latch behind it. The latch reopens when the data held by the latch is captured by the subsequent stage. This allows requests (along with data) to flow in the forward direction and their acknowledgments in the backward direction.
In one embodiment, the request signal generated by one stage is also both the request signal sent to the next stage and the acknowledge signal sent to the previous stage. For example, in the embodiment illustrated in FIG. 2, REQ 210 for stageN 202B is also both ACK for stageN−1 202A and REQ for stageN+1 (not shown). In conventional pipeline architectures, the signal wires are typically named based on the type of signal they carry. All the forward flowing signals carry requests and all the reverse flowing signals carry acknowledgments.
FIG. 3 illustrates an abstract picture of such a design that implements speculation but not preemption or eager evaluation. The boxes represent operations performed by the circuit. Related operations are connected by lines, and connected boxes represent a sequence of operations. For pipelines, each operation may be performed by separate circuits. Thus, the boxes may represent separate hardware stages, such as sender 100 and receiver 102, and the lines may represent the combination of REQ 106, ACK 108, and databus 104 as shown in FIG. 1, above. For clarity, the boxes are hereinafter referred to generically as “stages”.
In the example shown in FIG. 3, stage 300 represents the detection of an IF-THEN-ELSE construct and the subsequent creation of multiple, parallel operations. A stage that creates multiple, parallel operations is referred to as a “fork”. Line 302 represents the sequence of operations required to evaluate the CONDITION. Line 304 represents the sequence of operations performed by the IF branch. Line 306 represents the sequence of operations performed by the ELSE branch. Stage 308 represents the completion of the CONDITION evaluation and resulting selection of the results of one branch or the other, e.g., the results of sequence 304 or sequence 306. A stage that coalesces the results of multiple, parallel operations is referred to as a “join”. The abstract example illustrated in FIG. 3 is intended to show that each branch may involve different, and sometimes vastly different, numbers of operations.
During execution of the simple IF-THEN-ELSE example shown above, the pipeline will perform operations from each branch in parallel. For example, the first stages in sequences 302, 304, and 306 will be performed at the same time, the second stages in sequences 302, 304, and 306 will be performed at the same time, and so on.
Thus, the throughput of such a pipeline is limited by the mismatch in depths of the two branches. Let N_IF be the number of stages in the IF branch and N_ELSE be the number of stages in the ELSE branch. Assuming N_IF ≤ N_ELSE, the cycle time of the given pipeline is given by the equation:
T_CYCLE = (N_ELSE / N_IF) × (the cycle time of a given stage)
For example, if the IF and ELSE branches are perfectly matched, i.e., having the same number of stages, a pipeline having a 100 ns cycle time will produce a new output every (N/N) × 100 ns = 100 ns, i.e., every 1 cycle. However, if the IF branch has 4 stages and the ELSE branch has 5 stages, the pipeline will produce a new output every (5/4) × 100 ns = 125 ns, i.e., every 1.25 cycles. In other words, it can be said that the pipeline will produce a valid result during only 4 out of every 5 cycles. During the remaining 1 out of every 5 cycles, the 4-stage branch waits for the 5-stage branch to finish; while it is waiting, it cannot accept the next operation as input.
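The two numeric cases above can be reproduced with a short calculation. The function name is hypothetical, and ordering the two depths so that the shallower branch is the divisor is an assumption of this sketch that generalizes the stated precondition that the IF branch is no deeper than the ELSE branch.

```python
from fractions import Fraction


def pipeline_cycle_time(n_if: int, n_else: int, stage_cycle_time_ns: int) -> Fraction:
    """Cycle time of the speculative fork/join pipeline with mismatched branches.

    Implements T_CYCLE = (deeper / shallower) x (stage cycle time).
    """
    shallow, deep = sorted((n_if, n_else))
    if shallow <= 0:
        raise ValueError("each branch must have at least one stage")
    return Fraction(deep, shallow) * stage_cycle_time_ns


matched = pipeline_cycle_time(4, 4, 100)      # matched branches: 100 ns
mismatched = pipeline_cycle_time(4, 5, 100)   # 4 vs 5 stages: 125 ns
```

Exact fractions are used so that ratios such as 5/4 do not pick up floating-point rounding.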
The main drawback of the above design is that the final stage has to wait until all branches are computed even if some are not required. In part, this is due to the unidirectional nature of conventional pipeline designs, as illustrated in FIG. 1. Even though stage 308 may quickly determine whether the CONDITION branch returns TRUE or FALSE, stage 308 cannot act on that information since all initiating signals may only travel forward, and stage 308 has no choice but to wait until it receives all REQ signals from the last blocks of stages 302, 304, and 306, respectively. Furthermore, once the CONDITION has been evaluated, the operations of the non-selected branch are no longer needed, but stage 308 has no mechanism by which it can command the pipeline stages currently dedicated to performing those operations to discontinue processing of those operations. Thus, not only must stage 308 wait longer than necessary in some circumstances, the pipeline expends power to perform unnecessary operations.
Preemption is a technique that can overcome this disadvantage of conventional pipelines. One proposed method to implement preemption is to add the ability to send commands in the “backward” direction, referred to herein as “anti-tokens”. Referring again to the example illustrated in FIG. 3, once stage 308 has evaluated the CONDITION, it knows whether to disregard the results of the IF branch 304 or the ELSE branch 306. Stage 308 could then issue an anti-token backwards along the unneeded branch. Once the anti-token is received by a stage, the stage would cancel the operation and/or discard the result.
FIG. 4 compares the operation of a conventional pipeline 400 without anti-tokens to the operation of a conventional counterflow pipeline 402 which uses anti-tokens to implement preemption. The filled circles represent tokens, which flow in the forward direction through a pipeline, and empty circles represent anti-tokens, which flow in the backward direction through a pipeline. The flow of tokens and anti-tokens is represented as a sequence of views of the pipeline at various times T1 through T6, arranged from top to bottom of FIG. 4. The sequence of views illustrating the operation of pipeline 400 is on the left side of FIG. 4, and the sequence of views illustrating the operation of counterflow pipeline 402 is on the right side of FIG. 4. Within each single view, tokens flow from left to right and anti-tokens flow from right to left. The IF branch operations are represented by stages S2 and S3, the CONDITION evaluation is represented by stage S4, and the ELSE branch operations are represented by stages S5 through S8.
At time T1, stage S1 has detected an IF-THEN-ELSE construct and prepares to perform the calculations of the CONDITION, IF branch, and ELSE branch in parallel. S1 issues tokens to the first stage of each branch, namely stage S2 of the IF branch, stage S4 of the CONDITION branch, and stage S5 of the ELSE branch. In this example, the operations of pipelines 400 and 402 are identical at time T1.
At time T2, stage S2 has completed its operation and sends a token to stage S3. Stage S4 has completed evaluation of the CONDITION, and sends a token to stage S9, indicating the result of the CONDITION. In this example, the result is TRUE, meaning that only the IF branch need be processed and the ELSE branch need not be performed. At time T2, the operations of pipelines 400 and 402 are still identical.
At time T3, the operations of pipelines 400 and 402 begin to differ significantly. Pipeline 400 simply continues to wait for the completion of the IF and ELSE branches. The last stage of the IF branch, stage S3, sends a token containing the result of the IF branch operations to stage S9, but stage S9 cannot use that result, i.e., forward that result to the next stage, until it receives a token from the ELSE branch. Thus, pipeline 400 must wait during time T4 and time T5 while the ELSE branch completes its operation. Not until time T6 can stage S9 of pipeline 400 forward the results of the IF branch on to the next stage in the process.
In contrast, at time T3, stage S9 of counterflow pipeline 402 has determined, based on the results of the CONDITION branch, that the ELSE branch is superfluous, and thus issues an anti-token into the last stage of the ELSE branch, stage S8. At time T4, stage S9 of counterflow pipeline 402 may proceed by forwarding the results of the IF branch on to the next stage in the process. Meanwhile, the anti-token passed backward from stage S8 to stage S7 meets the token passed forward from stage S7 to stage S8, cancelling the operation that would have been performed by stage S8 during time T5. Counterflow pipeline 402 thus performs the operation in less time and avoids the power that stage S8 would otherwise have consumed.
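The difference between the two pipelines in this walkthrough can be approximated with a simple timing model. This is a sketch under stated assumptions, not a description of either circuit: it assumes that tokens enter every branch at T1, that each stage takes one time step, that a branch of n stages therefore delivers its token to the join at step n + 1, and that the join forwards one step after the last token it needs has arrived. The function name is hypothetical.

```python
def join_forward_time(n_cond: int, n_if: int, n_else: int,
                      condition: bool, preemption: bool) -> int:
    """Time step at which the join stage forwards the selected result.

    Assumed model: branch tokens are issued at T1, one stage per time step,
    so a branch with n stages delivers its token to the join at step n + 1.
    """
    def arrival(n_stages: int) -> int:
        return n_stages + 1

    selected = n_if if condition else n_else
    unselected = n_else if condition else n_if

    # The join always needs the CONDITION result and the selected branch.
    needed = [arrival(n_cond), arrival(selected)]
    if not preemption:
        # Without anti-tokens the join must also wait for the unneeded branch.
        needed.append(arrival(unselected))
    return max(needed) + 1


# Depths from the example: CONDITION = 1 stage (S4), IF = 2 stages (S2, S3),
# ELSE = 4 stages (S5 through S8), and the CONDITION evaluates to TRUE.
without_anti_tokens = join_forward_time(1, 2, 4, True, preemption=False)  # T6
with_anti_tokens = join_forward_time(1, 2, 4, True, preemption=True)      # T4
```

Under these assumptions the model gives T6 for the pipeline without anti-tokens and T4 for the counterflow pipeline, consistent with the walkthrough above.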
In another scenario, each of the three branches may be split into two or more parallel sub-branches, each sub-branch calculating part of a logical CONDITION equation. In this scenario, if stage 308 has the ability to perform eager evaluation, additional time and/or power may be saved.
In this way, the use of anti-tokens provides the means by which pipelined systems can implement preemption, speculation, and eager evaluation. In general, the counterflow approach is useful for three key applications:
Preemption. An instruction that has received an exception can pass information in a counterflow manner to any subsequently issued instructions, causing them to cancel themselves prematurely before the pipeline switches to the exception handling routine.
Speculation. In general, control flow constructs, including conditional branches, switch and case statements, multiplexers with varying input delays, etc., are cases where speculation can improve throughput of an asynchronous pipeline.
Eager Evaluation. Applications range from a simple logic gate to a complex Boolean function. If the latencies of the input branches differ by a large amount, an anti-token can be propagated backward and the result validated immediately. Early output implementations allow logic to evaluate results before all inputs are presented. The results move to the next stage, but the current stage stalls while waiting for the late inputs to arrive simply to acknowledge them. This unnecessary wait can be removed by allowing backwards propagating anti-tokens to remove the late inputs. The use of anti-tokens and improved semi-decoupled latches allows the removal of many stalls due to unnecessary synchronizations, thus improving the performance of the circuit. Although the speed improvement might be sought after, the area and power consumption costs are high.
The idea of issuing an anti-token along the unwanted branch is useful in two ways. First, it aids in increasing the throughput by reducing the cycle time. Second, it aids in energy savings by preventing the unwanted requests flowing through the pipeline and hence preventing unwanted computations.
However, one disadvantage with conventional implementations of counterflow pipelines in general, and with conventional asynchronous counterflow pipelines in particular, is the problem of metastability.
As used herein, the term “metastability” refers to the transient, unstable but relatively long-lived state of a logic circuit (or any physical system). This occurs when a system that is designed to perform one action in response to one input and perform another action in response to another input is confronted with both inputs simultaneously, and must decide which action to perform. In logic circuits, metastability can cause unwanted glitches, which can lead to undesirable effects. Metastability is a characteristic of conventional asynchronous counterflow pipelines because each stage has to make a decision depending on whether it received a token or an anti-token. More specifically, the behavior of each stage in a conventional counterflow pipeline changes depending on whether the stage received a token or an anti-token. In other words, each stage must choose between two possible actions to perform.
For example, referring again to FIG. 4, pipeline 402 issues an anti-token during time T3 and at time T4, the anti-token is traveling from stage S8 to stage S7 while at the same time the token is traveling from stage S7 to stage S8. For conventional asynchronous counterflow pipeline designs, the relative timing of the token and anti-token is critical. If the token arrives at a stage before the anti-token arrives at the same stage, the data is sent forward. If the anti-token arrives at a stage before the token arrives at the same stage, incoming data from the previous stage is not accepted. If the token and anti-token arrive simultaneously, the stage must decide between two actions: whether to send the data forward or to reject the data. Furthermore, conventional asynchronous circuit implementations can glitch under certain timing scenarios, and rely on certain timing assumptions for correct behavior. Both mechanisms give rise to metastability.
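The three arrival-order cases above can be summarized in a small decision function. This is a purely behavioral sketch with hypothetical names; real hardware has no ideal comparator for arrival times, which is why the simultaneous case is a hazard rather than a third well-defined outcome.

```python
def stage_outcome(token_time: float, anti_token_time: float) -> str:
    """Classify what a conventional counterflow stage does for given arrival times."""
    if token_time < anti_token_time:
        return "forward"      # token first: the data is sent forward
    if anti_token_time < token_time:
        return "reject"       # anti-token first: incoming data is not accepted
    # Simultaneous arrival: the stage must pick one of the two actions, and
    # the circuit may linger in a metastable state while it resolves.
    return "undecided"
```

The "undecided" result marks the situation in which the stage is forced to choose between forwarding and rejecting the data.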
One approach to solving the metastability problem involves maintaining two separate pipelines, one for tokens and the other for anti-tokens. However, this approach is expensive in terms of hardware complexity, chip area, and power consumption. Additional circuitry is required to ensure that the two pipelines are synchronized.
Another approach to solving the metastability problem involves adding arbitration logic to determine which signal arrived first. This approach also is expensive in terms of hardware complexity, chip area, and power consumption due to the requirement of an arbitration circuit at every stage.
Accordingly, in light of these disadvantages associated with conventional implementations of asynchronous counterflow pipelines, there exists a need for improved systems, methods, and computer readable media for preemption in asynchronous systems using anti-tokens.