Asynchronous, or clockless, logic design provide certain advantages over synchronous designs, such as in the area of power consumption. An asynchronous interconnection network would, for instance, save on power by eliminating the need for global clock distribution. Since clock power consumption is a major portion of total chip power, this can represent significant savings. Instead, different localized timing domains can exist on a single chip, glued together by an asynchronous interconnect fabric.
While synchronous designers employ clock-gating as a method of reducing dynamic power for inactive components on a chip, the asynchronous timing domains naturally provide this functionality by only transitioning nets when there is active computation.
Asynchronous designs, since they are self-timed, are also more tolerant of on-chip variations. Communication is typically localized between neighboring modules, which are similarly affected by manufacturing process and temperature. This locality property reduces verification efforts for designers. During normal operation, asynchronous circuits are more resilient to changes in temperature and voltage conditions and, unlike synchronous implementations, do not have to operate based on worst-case assumption.
Several network-on-chip solutions have been proposed to enable structured system design. A delay-insensitive chip area interconnect, named CHAIN, by Bainbridge et al., for example, provides robust self-timed communication for system-on-chip designs, including a multiprocessor for neural simulations. An asynchronous crossbar design, called Nexus, proposed by Lines provides system-level communication and has been used in Ethernet routing chips. Along with few other recent asynchronous on-chip-network architectures and designs, these earlier works provided asynchronous node architecture and implementation for coarse-grain complex-functionality primitive nodes. Each of these earlier proposed approaches has, however, limitations that restrict its applicability, for instance, to higher-end single-chip parallel processors.
A linear, low-overhead asynchronous pipeline called MOUSETRAP proposed by Singh and Nowick can provide high-throughput operation by using a single register based on level-sensitive latches to store data. Its simple stage control consisting of only a single combinational gate also contributes to its high-throughput operation. Also, unlike most synchronous pipelines that require expensive single registers made of flip-flops or double-latches, MOUSETRAP provides high storage capacity with low area using a single latch-based register in each stage.
U.S. Pat. No. 6,958,627 (the '627 patent) to Singh and Nowick describes Asynchronous MOUSETRAP pipelines. The '627 patent provided three primitive asynchronous cell designs: a linear cell (1-input, 1-output), a fork cell (1-input, 2-outputs), and a merge cell (2-inputs, 1-output). The fork cell receives one input, and broadcasts it in parallel to both outputs. The merge cell receives two inputs, waits for both to arrive, and merges them together onto a single output stream.