1. Field of the Invention
The present invention relates to an asynchronous design methodology known as phased logic and, in particular, to a technique called early evaluation that can be used to increase the performance of such self-timed systems.
2. Related Art
Pipeline processors are known high-speed computing machines with separate stages that can operate concurrently. They can be found in graphics processors, signal processing devices, arithmetic integrated circuit components and instruction interpretation units. In general, pipeline processors operate on data as it passes along them. Thus, the latency of a pipeline is measured in terms of the time it takes a single data value to pass through it. Further, the throughput rate of a pipeline processor is a measure of how many data values can pass through it per unit time.
Pipeline processors both store and process data (i.e., they comprise alternating storage elements and processing logic). Further, pipeline processors can be clocked (i.e., their parts act in response to an external clock) or event-driven (i.e., their parts act independently whenever local events dictate).
Pipeline processors can be characterized as either inelastic or elastic. Inelastic pipeline processors hold a fixed amount of data. Thus, the input rate and the output rate of an inelastic pipeline exactly match. An inelastic pipeline, when not considering any processing logic, acts like a shift register.
Elastic pipeline processors have a varying amount of data in them. Thus, the input rate and the output rate of an elastic pipeline may differ momentarily because of internal buffering. Not considering any processing logic, an elastic pipeline processor behaves as a flow-through, first-in-first-out (FIFO) memory.
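By way of illustration only, the distinction between inelastic and elastic pipelines can be sketched in software as follows. This is a minimal behavioral model, not a circuit description; the class names `InelasticPipeline` and `ElasticPipeline`, the `capacity` parameter, and the use of `None` for an empty slot are assumptions introduced for the example.

```python
from collections import deque


class InelasticPipeline:
    """Fixed occupancy: each shift moves one value in and one value out,
    so the pipeline behaves like a shift register."""

    def __init__(self, stages, fill=None):
        # A bounded deque always holds exactly `stages` entries.
        self.slots = deque([fill] * stages, maxlen=stages)

    def shift(self, value):
        out = self.slots[0]        # oldest entry leaves
        self.slots.append(value)   # maxlen discards the oldest entry
        return out


class ElasticPipeline:
    """Variable occupancy: input and output proceed independently,
    so the pipeline behaves like a flow-through FIFO."""

    def __init__(self, capacity):
        self.capacity = capacity   # hypothetical buffering limit
        self.slots = deque()

    def put(self, value):
        if len(self.slots) >= self.capacity:
            return False           # full: the producer must wait
        self.slots.append(value)
        return True

    def get(self):
        # Returns None when empty: the consumer must wait.
        return self.slots.popleft() if self.slots else None
```

In the inelastic model a value cannot enter without another leaving, whereas the elastic model accepts input and delivers output at momentarily different rates, limited only by its internal buffering.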
In I. E. Sutherland, “Micropipelines,” Communications of the ACM, Vol. 32, No. 6, June 1989, pp. 720–738 [hereinafter Sutherland], which is hereby incorporated by reference in its entirety, a micropipeline processor design methodology was first introduced. Sutherland defined a “micropipeline processor” (or simply, “micropipeline”) as: “[A] particularly simple form of event-driven, elastic pipeline with or without internal processing. The micro part of this name seems appropriate . . . because micropipelines contain very simple circuitry, because micropipelines are useful in very short lengths, and because micropipelines are suitable for layout in microelectronic form.”
The described micropipeline design methodology in Sutherland was offered as a solution for designing asynchronous, very large scale integration (VLSI) circuits and addressed the limitations of the clocked-logic conceptual framework commonly used in the design of digital systems. That is, there was a “need [for] a new conceptual framework because the complexity of VLSI technology ha[d] reached the point where design time and design cost often exceed[ed] fabrication time and fabrication cost.”
Micropipelines are a self-timed methodology that uses bundled data signaling and Muller C-elements for controlling data movement between pipeline stages, as described in D. E. Muller and W. S. Bartky, “A Theory of Asynchronous Circuits,” Proc. Int. Symp. on Theory of Switching, vol. 29, pp. 204–243 (1959) [hereinafter Muller], which is hereby incorporated by reference in its entirety.
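For illustration, the well-known behavior of a Muller C-element (the output follows the inputs when they all agree and otherwise holds its previous state) can be modeled as shown below. The class name `CElement` and its `update` method are hypothetical names chosen for this sketch, which models logical behavior only and ignores circuit timing.

```python
class CElement:
    """Behavioral model of a Muller C-element with any number of inputs."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, *inputs):
        if not inputs:
            raise ValueError("a C-element needs at least one input")
        if all(v == 1 for v in inputs):
            self.out = 1           # all inputs high: output goes high
        elif all(v == 0 for v in inputs):
            self.out = 0           # all inputs low: output goes low
        # Otherwise the inputs disagree and the output holds its state.
        return self.out
```

Because the output changes only after every input has changed, the element acts as a rendezvous point for events arriving from neighboring pipeline stages.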
“Bundled data signaling” refers to signaling where a group of wires represents the data, and a single control wire is used to indicate the presence of valid data. The control wire is said to be bundled with the data, hence the term “bundled data signaling.” In micropipelines, it is assumed that the delay of the control path is matched to the delay of the data path. This delay matching includes the wiring delay between micropipeline stages.
In M. E. Dean et al., “Efficient Self-Timing with Level-Encoded 2-Phase Dual-Rail (LEDR),” Advanced Research in VLSI (1991) [hereinafter Dean], which is hereby incorporated by reference in its entirety, LEDR signaling was introduced as a method for providing delay insensitive signaling for micropipelines. The term “phase” is used in Dean to distinguish successive computation cycles in the LEDR micropipeline, with the data undergoing successive even and odd phase changes.
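The phase alternation described above can be sketched as follows, assuming the commonly described LEDR encoding in which each bit is carried on a value rail `v` and a timing rail `t`, and exactly one rail changes for each new data value. The function names and the convention that equal rails denote the even phase are assumptions made for this example, not details taken from Dean.

```python
def ledr_phase(v, t):
    """Phase of a codeword (assumed convention: equal rails mean even)."""
    return "even" if v == t else "odd"


def ledr_next(v, t, new_value):
    """Return the next codeword carrying new_value.

    Exactly one rail changes per transfer, so the phase always toggles."""
    if new_value != v:
        return (new_value, t)      # data changed: flip the value rail
    return (v, 1 - t)              # data repeated: flip the timing rail


# Example: send the bit sequence 1, 1, 0 starting from codeword (0, 0).
word = (0, 0)
trace = [(word, ledr_phase(*word))]
for bit in (1, 1, 0):
    word = ledr_next(*word, bit)
    trace.append((word, ledr_phase(*word)))
```

Since a receiver can detect new data from the phase change alone, no separate control wire with a matched delay is required in this model.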
The LEDR micropipeline systems were all linear pipelined data paths, with some limited fork/join capability also demonstrated, but with no indication of how general digital systems could be mapped to these structures. This problem was solved in D. H. Linder and J. C. Harden, “Phased Logic: Supporting the Synchronous Design Paradigm with Delay Insensitive Circuitry,” IEEE Transactions on Computers, Vol 45, No 9, September 1996 [hereinafter Linder], which is hereby incorporated by reference in its entirety, via a methodology termed “Phased Logic” (or “PL”).
PL uses marked graph theory, as described in F. Commoner, A. W. Holt, S. Even, A. Pnueli, “Marked Directed Graphs,” J. Computer and System Sciences, vol. 5, pp. 511–523, 1971 [hereinafter Commoner], which is hereby incorporated by reference in its entirety, as the basis for an automated method for mapping a clocked netlist, composed of D flip-flops and combinational gates and driven by a single global clock, to a self-timed netlist of PL gates. Logically, a PL gate is simply a micropipeline block with the state of the Muller C-element known as the “gate phase,” which can be either even or odd. A PL gate is said to “fire” (i.e., the Muller C-element changes state) when the phases of all data inputs match the gate phase. This firing causes the output data to be updated with the result of the computation block of the gate.
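The firing rule described above can be sketched in software as follows. This is a simplified behavioral model under stated assumptions: the names `PLGate` and `try_fire` are hypothetical, each input is represented as a `(value, phase)` pair, and the output is assumed to be tagged with the phase on which the gate fired.

```python
EVEN, ODD = 0, 1


class PLGate:
    """Behavioral model of a phased logic gate."""

    def __init__(self, compute, phase=EVEN):
        self.compute = compute     # combinational function of the gate
        self.phase = phase         # gate phase (state of the C-element)
        self.output = None         # (value, phase) once the gate has fired

    def try_fire(self, inputs):
        """inputs: list of (value, phase) pairs. Returns True if fired."""
        if any(p != self.phase for _, p in inputs):
            return False           # some input phase does not match yet
        result = self.compute(*(v for v, _ in inputs))
        # Assumption: the output carries the phase the gate fired on.
        self.output = (result, self.phase)
        self.phase ^= 1            # the C-element changes state
        return True
```

For example, a two-input AND gate created as `PLGate(lambda a, b: a & b)` does not fire while one input is still in the odd phase, and fires only once both inputs present even-phase data. The gate therefore waits for every input, including any that arrive late.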
The term “coarse-grain” is used in the relevant art(s) to refer to a PL gate that has multiple outputs, has a compute function composed of multiple gates, and uses bundled data signaling for the inputs. The term “fine-grain” is used in the relevant art(s) to refer to a PL gate that has only one output, has a compute function composed of a single logic function, and uses LEDR signaling for data.
Notwithstanding the advances detailed above, a primary deficiency of micropipelines remains. That is, micropipelines remain slower than clocked pipelines because of the extra latency in the forward path. No system exists that allows a micropipeline block to compute a result based on the arrival of only a subset of its inputs, due to data arrival dependencies.
Given the foregoing, what is needed is a system and method for early evaluation in micropipeline processors to improve performance.