The present invention relates generally to data processing units and more particularly to a circuit and methods for implementing sequential logic functions from autonomous, i.e., clock-less, self-validating circuits.
The traditional way of implementing logic functions has been, for decades now, to insert pure combinational Boolean logic between banks of binary storage devices, i.e., latches or flip-flops, controlled from a single free-running signal, referred to as the clock, that sets the pace at which the whole logic function operates. In this standard approach, all the combinational logic required for implementing a given logic function can be decomposed into cones of logic, each cone having one output and, in the general case, several inputs, although anything from one input to many inputs is possible. Each output of a cone feeds an input of a latch, which is updated at the next occurrence of a clock transition and thus remembers the current output of the cone until it is updated again at a subsequent transition of the clock. In turn, latch outputs are possible inputs to all other cones of logic (recursive feeding is possible and is common practice, especially when a particular state must be maintained for several cycles of the clock), so that every cone input is stable for a complete clock cycle, since every latch holds its binary information for one clock cycle. As a consequence, the state of the logic may only change with the clock, and any logic function built according to these principles evolves in an orderly manner since it is under the control of a common timing device.
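The clocked scheme described above can be illustrated with a minimal behavioral sketch. It is not taken from the disclosure: the names `clock_tick`, `cones` and the 2-bit counter are hypothetical and chosen only to show cones of logic, each with one output, feeding latches that all update on the same clock transition.

```python
from typing import Callable, Dict

State = Dict[str, bool]          # current latch outputs, by latch name
Cone = Callable[[State], bool]   # a cone: many inputs, exactly one output

def clock_tick(latches: State, cones: Dict[str, Cone]) -> State:
    """Model one clock transition.

    Every cone is evaluated against the latch outputs held during the
    current cycle, then all latches are updated together.
    """
    return {name: cone(latches) for name, cone in cones.items()}

# Hypothetical example: a 2-bit counter. Both cones use recursive feeding,
# i.e. a latch output is an input of the cone that drives the same latch.
cones = {
    "q0": lambda s: not s["q0"],
    "q1": lambda s: s["q1"] != s["q0"],  # XOR of the two latch outputs
}

state = {"q0": False, "q1": False}
history = []
for _ in range(4):
    state = clock_tick(state, cones)
    history.append((state["q1"], state["q0"]))
```

In this sketch the state changes only inside `clock_tick`, which mirrors the statement that the state of the logic may only change with the clock.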
This way of implementing logic functions became a standard as their complexity increased dramatically, fueled by progress in the integration, on a single piece of silicon, of thousands, and soon millions, of transistors. Early logic circuits tended to be more sophisticated than the simple scheme described above. They were often characterized by the presence of several clocks, not necessarily running in synchronism, and even by logic supplying clocks to other pieces of logic. These approaches were very prone to errors, and unpredictable results were observed depending on the relative speed of the components. Moreover, logic boards built with such logic devices were often not even completely testable after fabrication. Some defects could only be found once the boards were installed in the machine, and sometimes only at the customer premises, which was the most expensive way of discovering bugs. Therefore, logic designs were structured so that they could more easily be synthesized, simulated and tested, leading to the simple approach to designing logic previously described.
However, this approach hinges on a crucial requirement, which is that the clock speed must be set for the worst case paths, sometimes a single worst case path, found within the logic. Moreover, worst case paths must be considered for the worst case environmental conditions of temperature and power supply in which a particular logic will have to operate. Finally, the characteristics of the worst case fabrication lot must also be considered to decide at which speed a particular device can run. Because a particular logic is designed to accomplish a given task at a given level of performance, the speed at which the clock must run is in fact the starting requirement. The designer is then faced with the problem of fitting the design into this requirement for the worst case conditions mentioned above, even though they are seldom, if ever, all encountered simultaneously. In fact, a very disturbing problem of the binary Boolean type of logic, universally used as of now to implement combinational logic, is that there is no associated notion of completeness. It is not possible, just by observing the output of a cone, to determine whether the job has been carried out or not. Often, the output of a cone glitches until the longest path of the cone has settled. As a consequence, the result must be assumed valid based on the time that has elapsed since the last transition of the clock. In practice, this means that every path in every cone of a particular design must be analyzed to make sure that all cone outputs are stable before the next clock transition occurs. Indeed, checking programs have been designed which scan all the possible paths, pinpointing the ones whose delays exceed the period of the clock, even though some of them may not be functional. It is then up to the designer to decide whether those paths are indeed functional and must be corrected through another round of physical design.
This is a cumbersome and time-consuming job, expensive in terms of computing resources, that is not always successful if the clock period is too tight for the available technology and the logic function too complex. This dead-end situation may become obvious only long after the physical design process has started and may require drastic actions, such as restarting from scratch with a brand new approach.
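The path checking described above can be sketched as follows. This is only an illustration under stated assumptions: the path names, the delays in nanoseconds and the single `derating` factor standing in for worst case temperature, power supply and fabrication lot are all hypothetical, and real checking programs are far more elaborate.

```python
from typing import Dict, List

def min_clock_period(path_delays_ns: Dict[str, float], derating: float = 1.0) -> float:
    """Shortest usable clock period: the slowest path, scaled to worst case."""
    return max(path_delays_ns.values()) * derating

def failing_paths(path_delays_ns: Dict[str, float], period_ns: float,
                  derating: float = 1.0) -> List[str]:
    """Paths whose worst case delay exceeds the clock period.

    Like the checking programs mentioned in the text, this flags every
    such path, whether or not it is functional; that decision is left
    to the designer.
    """
    return sorted(p for p, d in path_delays_ns.items() if d * derating > period_ns)

# Hypothetical nominal delays for three paths (assumed values).
paths = {"a->x": 2.0, "b->x": 3.5, "c->y": 5.0}

nominal_period = min_clock_period(paths)               # set by the slowest path
worst_case_period = min_clock_period(paths, 1.5)       # assumed worst case factor
violations = failing_paths(paths, period_ns=6.0, derating=1.5)
```

With these assumed numbers, a single path sets the clock period for the whole design, which is the situation the text refers to as the only worst case path.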
So, in an attempt to facilitate the physical design and make the actual implementation of a logic possible, the designer often pipelines the design. This means that smaller chunks of logic are inserted between banks of latches so that the paths become shorter. The price to pay for that is twofold. Firstly, the result of a pipelined logic function becomes available only after several cycles of the clock, thus increasing the latency. Secondly, latches must be inserted where the logic function has been cut, even though the intermediate results are generally of no interest and are useless for the rest of the logic. This brings up another limitation of the current way of designing logic: a clock timing has to be precisely distributed over a whole logic function, i.e., to every latch. It is particularly important that the clock distribution structure, often referred to as the clock tree, exhibits no skew between its various branches over the whole area covered by a logic function implemented on a semiconductor chip (most of the time silicon), or at least has a skew which is lower than the best case delay of the shortest path present on the chip. Otherwise a short-cut may be experienced, in which a latch feeds another one too soon because their respective clocks are skewed enough to allow propagation to occur on the same master clock transition. This is another headache for logic designers, although this part may be handled, to some extent, by the provider of the Gate Array (GA) or Field Programmable Gate Array (FPGA) generally used for Application Specific Integrated Circuits (ASIC). In such cases, considerable software and hardware resources are spent, either during the physical design phase of any part or initially, when the particular device was devised by the manufacturer, to provide extensive repowering and the possibility of load balancing between branches so as to keep skew at a minimum.
Moreover, the clock tree in itself occupies a significant portion of a chip area and dissipates much power too because it is constantly toggled at the highest frequency present on the chip.
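The two costs of clocked pipelines discussed above, added latency and sensitivity to clock skew, can be put into numbers with a small sketch. The function names and all delay values are hypothetical, and latch overheads are ignored for simplicity.

```python
from typing import List, Tuple

def pipeline_timing(stage_delays_ns: List[float]) -> Tuple[float, float]:
    """Return (clock period, total latency) for a chain of pipeline stages.

    The clock period is set by the slowest stage, and the result needs
    one clock cycle per stage to emerge.
    """
    period = max(stage_delays_ns)
    latency = period * len(stage_delays_ns)
    return period, latency

def has_short_cut(skew_ns: float, shortest_path_ns: float) -> bool:
    """True when clock skew is not lower than the best case shortest path,
    so a latch may feed the next one on the same clock transition."""
    return skew_ns >= shortest_path_ns

# Assumed example: 9 ns of logic, either in one piece or cut into 3 stages.
unpipelined = pipeline_timing([9.0])
pipelined = pipeline_timing([3.0, 4.0, 2.0])

safe = has_short_cut(skew_ns=0.2, shortest_path_ns=0.5)
unsafe = has_short_cut(skew_ns=0.6, shortest_path_ns=0.5)
```

In this assumed example the pipelined version runs with a shorter clock period but its result takes longer to appear, because the stages are unevenly cut and every stage is charged a full clock cycle.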
Still another problem of clocked designs is that everything changes upon a clock transition common to every latch. This triggers peaks of current through the power supply terminals of the chips and modules, thus disturbing, among other things, the ground. Such disturbances may jeopardize the noise immunity of the gates and latches if not contained through a careful design of the packaging at each level, i.e., chip, module and board, so as to keep the parasitic inductances as low as possible.
A mention should also be made here of the electromagnetic emissions produced by clocked designs, which may interfere with other pieces of equipment and must be strictly controlled so as to conform with the EMC (ElectroMagnetic Compatibility) directives in effect.
In spite of all these drawbacks, because of its simplicity and of all the advantages resulting from it, the synchronous (clocked) type of design has been, by far, the standard for many years. Indeed, it was possible to cope with all the above cited problems because the clock frequency was reasonably low. However, the relentless quest for performance has driven the clock frequency to values expressed in hundreds of megahertz for commercially available microprocessors and in gigahertz for their laboratory counterparts. Obviously, these are internal, on-chip, clock frequencies that cannot be sustained at the periphery of the modules, which are actually able to run only at frequencies one order of magnitude lower. To fix ideas, light travels, in a perfect medium, about 30 cm within a time period of 1 nanosecond, which corresponds to the clock period of the current laboratory parts, while microprocessor chips are commonly squares of 1.5 × 1.5 cm. Indeed, the speed limitation on the wires, a not so perfect medium, has become the limiting factor in increasing performance, forcing the manufacturers to use very sophisticated technologies, very expensive in terms of investments, with many wiring layers (5 is becoming the standard) so as to shorten the distances, and/or to use materials of better electrical characteristics (copper is replacing aluminum, although it was very difficult to accommodate its very undesirable secondary effects on silicon), in an attempt to reduce transmission delays.
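The order of magnitude quoted above can be checked with a short calculation. The sketch assumes propagation at the speed of light in vacuum, which is the perfect medium of the text; real on-chip wires are slower, so this is only an upper bound on distance.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre

def distance_cm(time_ns: float) -> float:
    """Distance travelled by light in vacuum during time_ns nanoseconds."""
    return SPEED_OF_LIGHT_M_PER_S * time_ns * 1e-9 * 100.0

def clock_period_ns(frequency_hz: float) -> float:
    """Clock period in nanoseconds for a given frequency in hertz."""
    return 1e9 / frequency_hz

period = clock_period_ns(1e9)      # 1 GHz clock, as for the laboratory parts
reach = distance_cm(period)        # just under 30 cm
chip_edges = reach / 1.5           # how many 1.5 cm chip edges that covers
```

Even in the ideal case, one clock period at 1 GHz corresponds to only about twenty chip edges of travel, which leaves little margin once the much slower propagation on real wires is taken into account.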
It is therefore strongly believed that, in spite of its simplicity and long-proven capability to cope with an ever-increasing demand for performance, the simple synchronous (clocked) approach to designing logic functions no longer fits well the performance requirements of the newest generations of microprocessors and ASICs, and that new approaches, which get rid of centralized clocking, must be considered.
Thus, it is a broad object of the invention to overcome the problems of the synchronous, clocked, logic designs.
It is a further object of the invention to allow logic functions to be capable of self-assessing readiness, i.e., to indicate when they have completed the processing of a new set of inputs.
It is another object of the invention to permit logic functions, or combinations thereof, to supply their own clocks.
It is still another object of the invention to have logic functions always operating at their maximum speed.
A logic circuit is disclosed that is operable in a DUMMY mode and in a VALID mode. It comprises a plurality of LOGIC input and output lines each having an asserted and a de-asserted state. It also comprises a plurality of MODE input lines, for turning the logic circuit into the herein above DUMMY or VALID modes along with a plurality of MODE output lines, for detecting whether the logic circuit is operating in the DUMMY or VALID mode. Logic operations are performed between the LOGIC input lines and the LOGIC output lines whenever the VALID mode is active while the DUMMY mode is permitted to propagate throughout the logic circuit regardless of whether the LOGIC input lines are in an asserted or a de-asserted state.
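The two modes described above can be pictured with a purely behavioral sketch. It does not reproduce the disclosed circuit: the `Mode` enumeration, the `stage` function and the use of `None` as a stand-in for a DUMMY output are assumptions made for illustration only.

```python
from enum import Enum
from typing import Callable, Optional, Tuple

class Mode(Enum):
    DUMMY = 0
    VALID = 1

def stage(mode_in: Mode, inputs: Tuple[bool, ...],
          fn: Callable[..., bool]) -> Tuple[Mode, Optional[bool]]:
    """Behavioral model of one logic circuit (assumed, not from the source).

    In DUMMY mode the stage reports DUMMY on its MODE output whatever
    the LOGIC inputs are; in VALID mode it performs its logic operation.
    """
    if mode_in is Mode.DUMMY:
        return Mode.DUMMY, None       # None stands in for "no valid result"
    return Mode.VALID, fn(*inputs)

def and2(a: bool, b: bool) -> bool:
    return a and b

# DUMMY propagates regardless of the state of the LOGIC inputs.
dummy_a = stage(Mode.DUMMY, (True, True), and2)
dummy_b = stage(Mode.DUMMY, (False, True), and2)

# In VALID mode the logic operation is performed.
valid = stage(Mode.VALID, (True, True), and2)
```

Because the MODE output travels with the result, a downstream observer in this model can tell a completed operation from a pending one without relying on elapsed time.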
The logic circuit further comprises a MODE Control for turning the logic circuit into either one of the two modes described above. The MODE Control includes a PACE signal that, when asserted, turns said logic circuit into the DUMMY mode; an input for a FEEDBACK signal for toggling between the VALID mode and the DUMMY mode when the PACE signal is not asserted; and an output for a FEEDFORWARD signal for reporting which mode the controlling means is currently asserting.
The logic circuit further comprises a MODE Detect for detecting in which one of the two modes described above the logic circuit is operating, including an input for the FEEDFORWARD signal just described and an output for the FEEDBACK signal reporting whether the detecting means is currently detecting the VALID or the DUMMY mode, along with a STATUS for generating a first signal for requesting the assertion of new states on the LOGIC inputs and a second signal for indicating that new states are ready to be used on the LOGIC outputs.
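The interplay of PACE, FEEDBACK and FEEDFORWARD can be sketched as a simple loop. This is a hedged behavioral model, not the disclosed implementation: the toggling rule (request the mode opposite to the one detected), the assumption that the detected mode equals the asserted mode once propagation has completed, and the polarity of the two STATUS signals are all assumptions made for illustration.

```python
from enum import Enum
from typing import List, Tuple

class Mode(Enum):
    DUMMY = 0
    VALID = 1

def mode_control(pace: bool, feedback: Mode) -> Mode:
    """Return the FEEDFORWARD value, i.e. the mode being asserted.

    PACE asserted forces DUMMY. Otherwise (assumed rule) the control
    asserts the opposite of the mode reported on FEEDBACK, so the
    circuit toggles by itself.
    """
    if pace:
        return Mode.DUMMY
    return Mode.VALID if feedback is Mode.DUMMY else Mode.DUMMY

def mode_detect(feedforward: Mode) -> Tuple[Mode, bool, bool]:
    """Return (FEEDBACK, request_inputs, outputs_ready).

    Assumes the asserted mode has fully propagated when it is detected.
    Assumed STATUS polarity: new inputs are requested in DUMMY mode and
    outputs are flagged ready in VALID mode.
    """
    detected = feedforward
    return detected, detected is Mode.DUMMY, detected is Mode.VALID

def run(pace_sequence: List[bool]) -> List[Tuple[str, bool, bool]]:
    feedback = Mode.DUMMY
    trace = []
    for pace in pace_sequence:
        feedforward = mode_control(pace, feedback)
        feedback, request_inputs, outputs_ready = mode_detect(feedforward)
        trace.append((feedforward.name, request_inputs, outputs_ready))
    return trace

trace = run([True, False, False, False, True])
```

Under these assumptions the loop alternates between DUMMY and VALID on its own while PACE is de-asserted, which is one way to picture a logic function supplying its own clock.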
Methods for assembling the above logic circuits into serial, parallel or mixed combinations that work in synchronism, in a glitch-less manner, are also disclosed.
The circuit and methods of the invention make it possible to carry out sequential logic functions that do not require a common clock to be distributed over a complete function, such as an ASIC or a processor, in order to inter-operate.