1. Field of the Invention
This invention relates to asynchronous pipelines, and more particularly to asynchronous pipelines for high-speed applications which use blocks of static logic for processing data, and simple transparent latches to separate data items.
2. Background of Related Art
Several synchronous pipelines have been proposed for high-throughput applications. In wave pipelining, multiple waves of data are propagated between two latches. (See, for example, D. Wong, G. DeMicheli, and M. Flynn, “Designing High-Performance Digital Circuits Using Wave-Pipelining,” IEEE TCAD, 12(1):24-46, January 1993; W. Liu, C. T. Gray, D. Fan, W. J. Farlow, T. A. Hughes, and R. K. Cavin, “A 250-MHz Wave Pipelined Adder in 2-μm CMOS,” IEEE JSSC, 29(9):1117-1128, September 1994; and A. Mukherjee, R. Sudhakar, M. Marek-Sadowska, and S. Long, “Wave Steering in YADDs: A Novel Non-Iterative Synthesis and Layout Technique,” Proc. DAC, 1999.) However, this approach requires substantial design effort, from the architectural level down to the layout level, for accurate balancing of path delays (including data-dependent delays), and remains highly vulnerable to process, temperature and voltage variations. Other aggressive approaches include clock-delayed domino (See G. Yee and C. Sechen, “Clock-Delayed Domino For Adder and Combinational Logic Design,” Proc. ICCD, October 1996), skew-tolerant domino (See D. Harris and M. Horowitz, “Skew-Tolerant Domino Circuits,” IEEE JSSC, 32(11):1702-1711, November 1997; A. Dooply and K. Yun, “Optimal Clocking and Enhanced Testability for High-Performance Self-Resetting Domino Pipelines,” ARVLSI'99), and self-resetting circuits (See V. Narayanan, B. Chappell, and B. Fleischer, “Static Timing Analysis For Self Resetting Circuits,” Proc. ICCAD, 1996; A. Dooply and K. Yun, “Optimal Clocking and Enhanced Testability for High-Performance Self-Resetting Domino Pipelines,” ARVLSI'99). These designs impose complex timing constraints that are difficult to verify. They also lack elasticity and still require high-speed global clock distribution.
In addition, many asynchronous pipelines have been proposed. The classic asynchronous pipelines are called micropipelines (See I. E. Sutherland, “Micropipelines,” Communications of the ACM, 32(6):720-738, June 1989). This style uses elegant control, but has slow and complex capture-pass latches which hinder performance.
A number of variants using alternative control and latch structures have been proposed (See P. Day and J. V. Woods, “Investigation Into Micropipeline Latch Design Styles,” IEEE TVLSI, 3(2):264-272, June 1995; K. Yun, P. Beerel, and J. Arceo, “High-Performance Asynchronous Pipelines Circuits,” Proc. Intl. Symp. Adv. Res. Async. Circ. Syst. (ASYNC), 1996; and C. Molnar, I. Jones, W. Coates, J. Lexau, S. Fairbanks, and I. Sutherland, “Two FIFO Ring Performance Experiments,” Proceedings of the IEEE, 87(2):297-307, February 1999), but in each case the performance is limited due either to excessive control delays or to sizable latch delays.
These pipeline designs fall into two categories: (1) pipelines that use phase conversion, and (2) pipelines that do not use phase conversion. The pipelines described in Sutherland, “Micropipelines,” and Day and Woods, “Investigation into Micropipeline Latch Design Styles,” cited above, and C. Molnar and I. W. Jones, “Simple Circuits that Work For Complicated Reasons,” Proc. Intl. Symp. Adv. Res. Async. Circ. Syst. (ASYNC), pp. 138-149, April 2000, all use phase conversion. In contrast, the pipelines of S. B. Furber and P. Day, “Four-Phase Micropipeline Latch Control Circuits,” IEEE TVLSI, 4(2):247-253, June 1996, and K. Yun, P. Beerel, and J. Arceo, “High-Performance Asynchronous Pipelines Circuits,” cited above, do not use phase conversion.
The micropipelines of Sutherland (See, e.g., FIG. 14 thereof) and Day and Woods (See, e.g., FIG. 10 thereof) use phase conversion. The micropipeline stage N 10 uses transition signaling and transparent latches 12, as illustrated in FIG. 1. Data is received at data input 14 from stage N−1 (not shown in FIG. 1) and data is transmitted to stage N+1 (not shown in FIG. 1) at data output 16. Control of the latch 12 is complex, and is performed by at least three elements: a C element 18, an exclusive NOR element (XNOR) 20, and a toggle element 22. The output of C element 18 is doneN 35, which serves as an input to XNOR 20, along with ackN 32 received from stage N+1. The output En 36 of XNOR 20 enables the latch element 12. The toggle element 22 routes transitions received on its input 21 to one of two outputs 24 and 26 alternately, starting with the output 26, labeled with a dot. The output 26 is routed to stage N+1 as reqN+1 30 and to stage N−1 as ackN−1 31. A disadvantage of these designs is that the critical paths are long: (1) from request signal reqN 28 received from stage N−1 to request signal reqN+1 30 transmitted to stage N+1, there are four component delays, i.e., delays from the C-element 18, the XNOR 20, the latch 12, and the toggle 22; and (2) from acknowledgment signal ackN 32 received from stage N+1 to the input 33 of the C-element 18 (to half-enable it), there are three component delays, i.e., delays from the XNOR 20, the latch 12, and the toggle 22.
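The behavior of the three control elements described above can be illustrated with a minimal behavioral sketch. It is not taken from the cited references: the class and function names are hypothetical, gate delays are ignored, and the C element is modeled in its plain non-inverting form, which may differ from the input polarities shown in FIG. 1.

```python
class CElement:
    """Muller C-element: the output changes only when both inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:          # both inputs agree: output follows them
            self.out = a
        return self.out     # otherwise the previous output is held


def xnor(a, b):
    """XNOR gate: 1 when the inputs are equal, else 0."""
    return 1 if a == b else 0


class Toggle:
    """Routes successive input transitions alternately to two outputs,
    starting with the output marked with a dot."""

    def __init__(self):
        self.dot = 0
        self.plain = 0
        self._next_is_dot = True

    def transition(self):
        if self._next_is_dot:
            self.dot ^= 1
        else:
            self.plain ^= 1
        self._next_is_dot = not self._next_is_dot
        return self.dot, self.plain


# Component chains on the two critical paths named in the text.
FORWARD_PATH = ["C-element", "XNOR", "latch", "toggle"]   # reqN -> reqN+1
REVERSE_PATH = ["XNOR", "latch", "toggle"]                # ackN -> C-element input

if __name__ == "__main__":
    c = CElement()
    print([c.update(1, 0), c.update(1, 1), c.update(0, 1), c.update(0, 0)])
    t = Toggle()
    print([t.transition() for _ in range(4)])
    # Latch enable En = XNOR(doneN, ackN): high whenever the two agree.
    print([xnor(0, 0), xnor(1, 0), xnor(1, 1)])
    print(len(FORWARD_PATH), len(REVERSE_PATH))
```

In this model the enable is high whenever doneN and ackN agree, drops when doneN alone makes a transition, and returns high after the matching transition on ackN. The two lists simply restate the four-component and three-component critical paths counted in the text.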
The pipelines described by Molnar and Jones, “Simple Circuits That Work for Complicated Reasons,” cited above, also use phase conversion. They are referred to as “Charlie boxes,” and include simpler designs, such as the S style described therein. However, these designs generate a relatively late completion signal. Moreover, these designs do not propose extensions to handle complex pipelining, i.e., forks and joins, nor do they disclose a “waveform shaping” strategy, elimination of critical inverters through dual-rail control, or use of a clocked-CMOS style.
There are several alternative pipeline designs which do not use phase conversion. In Furber and Day, “Four-Phase Micropipeline Latch Control Circuits,” cited above, three distinct 4-phase protocols for asynchronous pipelines are proposed: (1) fully-decoupled, (2) long-hold and (3) semi-decoupled. These designs have several disadvantages. In the first two protocols, pipeline control is complex. For the best of their designs, i.e., semi-decoupled, which introduces a highly concurrent protocol, there is a minimum of four components on the critical cycle. These components are all C-elements, two of which have a stack depth of three, and additional inverters are implied for correcting polarity.
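For orientation, the difference between a 4-phase (return-to-zero) handshake and 2-phase transition signaling can be sketched as follows. This is a generic illustration under simplifying assumptions, not a model of any of the three Furber and Day protocols; the function names and the trace format are hypothetical.

```python
def four_phase(items):
    """Generic return-to-zero handshake: four wire events per data item."""
    trace = []
    req = ack = 0
    for item in items:
        req = 1
        trace.append(("req", req, item))   # sender asserts request
        ack = 1
        trace.append(("ack", ack, item))   # receiver acknowledges
        req = 0
        trace.append(("req", req, item))   # sender returns request to zero
        ack = 0
        trace.append(("ack", ack, item))   # receiver returns ack to zero
    return trace


def two_phase(items):
    """Transition signaling: every wire transition is an event,
    so each data item costs two events."""
    trace = []
    req = ack = 0
    for item in items:
        req ^= 1
        trace.append(("req", req, item))   # any transition is a request
        ack ^= 1
        trace.append(("ack", ack, item))   # any transition is an acknowledge
    return trace


if __name__ == "__main__":
    data = ["a", "b"]
    print(len(four_phase(data)), len(two_phase(data)))
```

With two data items the 4-phase trace contains eight events and the 2-phase trace contains four. The 4-phase wires also return to the same level after every item, which is why level-sensitive latches can be driven without phase conversion.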
A final alternative approach is to retain transition-signaling control, but replace the transparent latches with dual-edge-triggered D-flip-flops (DETDFF's), as cited in K. Yun, P. Beerel, and J. Arceo, “High-Performance Asynchronous Pipelines Circuits,” above. According to this design, data is latched each time the latch control is toggled. While this approach avoids the overhead of phase conversion, it incurs a heavy performance penalty because DETDFF's are significantly slower than transparent latches, and are also much larger.
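The distinction between the two storage elements can be illustrated with a small behavioral sketch. It assumes idealized, zero-delay elements and hypothetical class names, and it says nothing about the circuit-level speed or area of either element.

```python
class TransparentLatch:
    """Level-sensitive latch: passes the input while enable is high,
    holds the last value while enable is low."""

    def __init__(self):
        self.q = None

    def step(self, enable, d):
        if enable:
            self.q = d
        return self.q


class DualEdgeFlipFlop:
    """Dual-edge-triggered D flip-flop: captures the input on every
    transition (rising or falling) of its control input."""

    def __init__(self, control=0):
        self.q = None
        self._control = control

    def step(self, control, d):
        if control != self._control:   # any edge captures the data
            self.q = d
            self._control = control
        return self.q


if __name__ == "__main__":
    # A transition-signaled control wire toggles once per data item.
    control_levels = [1, 0, 1]
    data = ["a", "b", "c"]

    ff = DualEdgeFlipFlop()
    latch = TransparentLatch()
    print([ff.step(c, d) for c, d in zip(control_levels, data)])
    print([latch.step(c, d) for c, d in zip(control_levels, data)])
```

Driven directly by the toggling control wire, the flip-flop model captures all three items, whereas the latch model misses the item that arrives on the falling transition. This is the gap that phase conversion closes in the latch-based designs discussed earlier.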
A new pipeline style, referred to as “GasP,” has been proposed which obtains even higher throughputs (See I. Sutherland and S. Fairbanks, “GasP: A Minimal FIFO Control,” Proc. Intl. Symp. Adv. Res. Async. Circ. Syst. (ASYNC), pp. 46-53, IEEE Computer Society Press, March 2001; and J. Ebergen, “Squaring the FIFO in GasP,” Proc. Intl. Symp. Adv. Res. Async. Circ. Syst. (ASYNC), pp. 194-205, IEEE Computer Society Press, March 2001). However, this approach relies on fine-grain transistor sizing to achieve delay equalization for all gates in the control circuitry, and the protocol has more complex timing constraints.
It is therefore an object of the invention to provide a pipeline which has a simplified control, and reduced control delays and latch delays.
It is another object of the invention to provide a pipeline which does not require delay equalization.
It is still another object of the invention to provide a pipeline which has simple one-sided timing constraints.
It is a further object of the invention to provide a pipeline which provides extensions to handle complex pipelining such as forks and joins.
It is a still further object of the invention to provide a pipeline which provides a latch switching optimization.
It is yet another object of the invention to provide a pipeline having a very fine-grain structure that is especially suitable for producing high throughputs.