The present invention generally relates to asynchronous circuits, and more particularly to an improved asynchronous circuit having lower latency and delay.
Signals propagating through multi-stage logic circuits fall into one of two general categories: data or control. Data represents the informational content passed from one stage to another through the integrated circuit chip. Control signals manage and direct the logical operations of individual stages in the context of the overall operation of a multi-stage logic circuit. A goal in the design of most multi-stage logic circuits is to optimize the speed of the former, while minimizing overhead costs in delay and complexity of the latter.
Advances in semiconductor fabrication technology allow increasingly large numbers of logic stages to be placed on a single integrated circuit, and permit operation of such circuits at speeds greater than prior generations of circuits. Two known methodologies for the design of such circuits are the synchronous implementation and the asynchronous implementation. Synchronous designs are usually controlled by a global clock signal which causes all of the circuitry on the integrated circuit chip to operate in lockstep. While conceptually simple, such a design requires that the clock control cycle for all stages be set for the worst-case delay of data signals in any one stage.
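The worst-case constraint on a global clock can be illustrated with a minimal Python sketch. The stage delays below are hypothetical values chosen only for illustration, and the function name is likewise an assumption rather than part of any described circuit.

```python
# Hypothetical worst-case data delays per stage, in picoseconds (illustrative only).
stage_worst_case_ps = [1200, 800, 2500, 1000]


def synchronous_clock_period(worst_case_delays):
    """Return the shortest clock period that accommodates the slowest stage."""
    return max(worst_case_delays)


period_ps = synchronous_clock_period(stage_worst_case_ps)

# Every stage waits one full period, so faster stages sit idle for the difference.
idle_ps = [period_ps - delay for delay in stage_worst_case_ps]
```

In this sketch the slowest stage fixes the period for all stages, and each faster stage carries idle time equal to the gap between its own worst case and that period.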
Asynchronous designs use local control to determine when local stages operate, and the stages do not necessarily operate in synchrony with the other stages on the integrated circuit chip. As such, asynchronous designs eliminate the difficulty of distributing a clock “globally” across the integrated circuit, and also potentially offer improved speed, lower power consumption, and other benefits.
Asynchronous control schemes can be assigned to different broad categories depending on the amount of interaction between data and control. At one extreme is the case of “pure bundled” data, in which the data carries no information into the control. In circuits using pure bundled data, the delay of the control circuits must be accurately matched to the delay of the data path. A more general scheme can be called “data-dependent” timing, in which the data carries some control information that indicates when it is valid. This enables the control system to assure data validity before processing the data.
One way to minimize control overhead and improve data performance in an asynchronous circuit is with a process known as “completion detection,” in which control logic generates a “done” signal when it detects that the data output is valid. In typical prior art examples, dual-rail output is generated for every input data bit, and it is required that both rails stay “off” until the correct value of the output is known. While this approach can improve the average-case performance, it adds both delay to the execution time required by the data path and complexity to the datapath circuit because the control logic must monitor each stage to determine whether the output calculation of each stage is valid before the validity of the entire datapath operation can be signaled.
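The dual-rail convention described above can be sketched behaviorally in Python. This is a simplified software model and not a circuit description; the names `dual_rail_encode`, `NOT_READY`, `bit_valid`, and `done` are hypothetical and introduced only for illustration.

```python
NOT_READY = (0, 0)  # both rails "off": the output value is not yet known


def dual_rail_encode(bit):
    """Return (true_rail, false_rail) for a known bit; exactly one rail is on."""
    return (1, 0) if bit else (0, 1)


def bit_valid(rails):
    """A dual-rail bit is valid when exactly one of its two rails is on."""
    true_rail, false_rail = rails
    return (true_rail ^ false_rail) == 1


def done(outputs):
    """Assert done only when every output bit has become valid."""
    return all(bit_valid(rails) for rails in outputs)


partial = [dual_rail_encode(1), NOT_READY]
complete = [dual_rail_encode(1), dual_rail_encode(0)]
```

Here `done(partial)` is false and `done(complete)` is true. The model also shows where the overhead arises: the done signal depends on a validity check of every output bit.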
This invention provides a stage in a multi-stage, asynchronous datapath circuit. The stage calculates one or more data outputs as a function of one or more data inputs. In accordance with the present invention, the stage includes digital logic having multiple logical elements that calculate both internal results for use as inputs to other logical elements within the stage and final results for use as inputs to other logical elements in a next stage. An internal completion signal generator is coupled with the digital logic, and detects completion by the digital logic of the internal results or final results calculations and, in response, generates a completion signal for each calculation result detected. A done signal generator receives the completion signals and, in response to one or more preselected combinations of the completion signals, generates a done signal with a predetermined delay that is at least as long as a maximum delay until the one or more data outputs are calculated.
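The timing behavior of such a done signal generator can be sketched with a simple Python model. The sketch rests on several assumptions: times are abstract integer units, the signal names and values are hypothetical, and the "maximum delay until the data outputs are calculated" is interpreted as the worst-case time remaining after the preselected combination of completion signals has been observed.

```python
def done_time(completion_times, preselected, predetermined_delay):
    """Return the time at which the done signal asserts.

    completion_times: dict mapping a completion-signal name to its assertion time.
    preselected: names forming the preselected combination that triggers done.
    predetermined_delay: fixed delay applied after the combination is complete.
    """
    trigger = max(completion_times[name] for name in preselected)
    return trigger + predetermined_delay


# Hypothetical completion signals for two internal results and one final result.
completion_times = {"internal_a": 3, "internal_b": 5, "final_c": 9}
preselected = ("internal_a", "internal_b")

max_remaining_delay = 4  # assumed worst-case time from trigger to valid outputs
predetermined_delay = 4
assert predetermined_delay >= max_remaining_delay  # constraint on the delay

done_at = done_time(completion_times, preselected, predetermined_delay)
```

In this model the done signal is derived from internal completion signals plus a fixed delay, so it does not have to wait for a validity check on every final output.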
In accordance with another embodiment of the present invention, a stage in a multi-stage, self-timed datapath circuit includes digital logic having multiple logical elements that receive one or more data inputs and calculate both internal results for use as inputs to other logical elements within the stage, and final results for use as inputs to other logical elements in a next stage. An internal completion signal generator is coupled with the digital logic and detects completion by the digital logic of one or more of the internal results or final results calculations, and in response generates a completion signal for each calculation result detected. A done signal generator receives the completion signals, and in response to a preselected one of the completion signals, generates a done signal with a predetermined delay, wherein the predetermined delay is at least as long as a maximum delay until the one or more data outputs are calculated.
In accordance with yet another embodiment of the invention, a control circuit for a stage in a multi-stage, self-timed datapath circuit includes an internal completion signal generator that detects completion by the digital logic of an intermediate result of the multi-step calculation, and in response generates a completion signal. A done signal generator is responsive to the completion signal, and generates a done signal with a predetermined delay, where the predetermined delay is at least as long as a maximum time for the logical elements within the stage to calculate a data output.
In accordance with yet another embodiment of the present invention, a method of predicting completion of a total stage calculation includes the steps of dividing a plurality of logical elements into multiple sections, where each logical element outputs both internal results and final results, selecting at least one section, monitoring the at least one selected section for both internal results and final results, and in response to a predetermined number of results monitored, generating a completion signal with a delay. The delay is set to an estimate of a time for completion of the total stage calculation.
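The steps of the method can be sketched as a small Python model. It is an illustrative sketch under stated assumptions, not the claimed implementation: result events are represented as hypothetical `(time, section)` pairs, sections are labeled with arbitrary strings, and the function name `predict_completion` is introduced only for this example.

```python
def predict_completion(result_events, selected_sections, required_results, delay):
    """Return the time at which the completion signal would be generated.

    result_events: iterable of (time, section) pairs, one per monitored result
        (internal or final) produced by a logical element in that section.
    selected_sections: set of section labels chosen for monitoring.
    required_results: predetermined number of results that must be observed.
    delay: estimate of the remaining time to finish the total stage calculation.
    Returns None if too few results are observed in the selected sections.
    """
    seen = 0
    for time, section in sorted(result_events):
        if section in selected_sections:
            seen += 1
            if seen == required_results:
                return time + delay
    return None


# Hypothetical events: sections "A" and "B", times in abstract integer units.
events = [(1, "A"), (2, "B"), (3, "A"), (4, "A")]
signal_at = predict_completion(events, {"A"}, required_results=2, delay=5)
```

Only results from the selected section are counted, so the event from section "B" is ignored, and the completion signal is scheduled a fixed delay after the second monitored result.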