1. Field of the Invention
The present invention generally relates to digital signal processing circuits and, more particularly, to control circuits for digital signal storage devices.
2. Description of the Prior Art
Digital data processors rely on stored digital signals both for control or instruction signals and data signals upon which the control functions or instructions are to be carried out. High performance and so-called processing power thus require extremely fast access to such stored digital signals. In both processing circuits and memory circuits, the requirement for high speed of operation has led to increasing densities of integration in electronic circuit design and circuits which, once triggered, can carry out relatively complex functions autonomously. Register files and other memory structures are examples of devices which may be designed to perform such autonomous functions.
Timing constraints are of critical concern in high performance data processing and memory circuits as cycle times are reduced for higher speed. Signals require a finite amount of time to propagate through any type of electrical or electronic structure and the proper function of logic circuits requires that the intended signals be present at the inputs thereof in order to obtain the correct output. Signal propagation time is affected by many factors of circuit design such as conductor resistance and parasitic capacitances. At high densities of circuit integration, the number of circuits to which a connection is made may present severe design constraints in regard to cycle time. For example, at high integration densities, a connection such as a word or bit line of a memory will present significant RC delays and waveform distortions where the total switched capacitance, C, is dominated by the sum of device capacitances of a large number of load devices and the total resistance, R, is dominated by the resistance of the long word or bit lines of small cross-section.
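As a rough illustration of why such lines become slow, the following Python sketch estimates the delay of a word or bit line modeled as a uniform RC ladder, using the Elmore approximation. The ladder model, the function name elmore_delay, and the per-cell values are assumptions made for illustration only; they are not taken from the circuits discussed here.

```python
def elmore_delay(n_cells, r_per_cell, c_per_cell):
    """Elmore delay to the far end of a uniform RC ladder with n_cells segments.

    Each cell contributes one series resistance (the line segment) and one
    shunt capacitance (the load device). Both values are hypothetical.
    """
    # The i-th load capacitance is charged through the i segments upstream of it.
    return sum(i * r_per_cell * c_per_cell for i in range(1, n_cells + 1))


# Hypothetical values: 10 ohms and 2 fF per cell, so the result is in femtoseconds.
for n in (64, 128, 256):
    print(n, elmore_delay(n, 10, 2))
```

In this model, doubling the number of cells roughly quadruples the estimate, because both the total R and the total C grow with the length of the line.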
To obtain highest operational speed and shortest cycle time in logic circuits employing currently available MOS technology, it is common practice to employ so-called "dynamic" logic circuits in preference to static logic circuits. Generally, in dynamic circuits the goal of maximizing the speed at which a logical function is performed (e.g. "evaluation") is achieved by minimizing the number of switching devices in the evaluation path, and by employing NMOS, rather than the slower-switching PMOS, for the majority of devices in the evaluation path. This optimization of speed of the evaluation path, or "forward path", is achieved at the cost of subsequently having to "reset", or "precharge", the dynamic nodes, in preparation for the next logic cycle, to a state from which they may be switched to the other logic state most rapidly, and then only when necessary in accordance with input signals which are evaluated.
Generally, the reset, or precharge, operations in current logic chips are ultimately derived from a global clock which is distributed throughout the chip. As chip circuit densities increase, implying larger clock-loading, and as speed requirements increase, implying tighter tolerances on clock skew, it becomes increasingly difficult to effectively distribute a global clock. In response to this difficulty, so-called "self-timed" circuits have been conceived. One such class of circuits, termed "self-resetting" CMOS, "SRCMOS", has been proposed and discussed by T. I. Chappell et al. in IEEE J. Solid-State Circuits, vol. 26, no. 11, pp. 1577-1585, November 1991.
In SRCMOS circuits the resetting of each logic block (or "macro") is generated internally to the block, independently of the global clock. Thus far, SRCMOS designs have employed simple sequential timing chains implemented with a serial connection of a plurality of inverters and taking outputs from individual inverters in the serial chain to trigger individual reset operations. However, memories, such as register files, pose unique problems to the design of the reset control circuits in SRCMOS, due to the fact that the densest interior stages are unavoidably significantly slower than the peripheral stages. In such circuits, having significant non-uniformities of stage delays, conventional single sequential reset timing chains provide neither adequate precision of reset timing nor adequate pulsewidth control to allow the highest performance design possible.
To convey a more complete appreciation of the design conflicts which are presented, FIG. 1 shows a circuit 10 which may be considered as operationally representative of a dynamic SRCMOS circuit. In SRCMOS, logic data are represented as voltage pulses, rather than as the usual voltage levels of static CMOS, and during a logic cycle a dynamic node cycles through three phases: stand-by, evaluation, and reset. Specifically, during the evaluation period, a logical "1" on a circuit node is represented by a voltage pulse of either polarity (a positive pulse if stand-by is ground; a negative pulse if stand-by is Vdd, the supply voltage) and a logical "0" by the absence of a pulse, i.e., the continuation of the stand-by state. As shown in FIG. 1, a dynamic node N has at least 3 devices connected to it: an evaluation (or forward) device 12, a reset device 14, and a stand-by device 16. The dynamic node N is reset by being precharged through transistor 14 in response to a reset pulse R. An input signal representing a logical "1" as a forward path (or evaluation path) pulse having a pulse width PWF (pulse width forward) applied at terminal F, as shown in FIG. 2, will discharge dynamic node N through transistor 12. At some later time, another reset pulse R with a pulse width PWR is applied to again precharge the dynamic node N to the stand-by state which is maintained by the stand-by transistor 16. The control pulse S for the stand-by device is not shown in FIG. 2, but is approximately the complement of the dynamic node-N waveform.
The switching delays which occur during the charging and discharging of the dynamic node N are indicated by the legends DR and DF, respectively, in FIG. 2. It is considered generally necessary for reliable operation that the pulse widths PWF and PWR be some factor k larger than DF and DR, respectively. For example, in some technologies a value of k=5 is currently considered adequate for highly reliable operation. Additionally, to assure that transistors 12 and 14 are not concurrently significantly conductive (a condition referred to as contention), a gap G, which is equal to or greater than zero, must also be provided.
Thus it is seen that the minimum cycle time of the circuit of FIG. 2 is $CT = \mathrm{PWF} + \mathrm{PWR} + G$, which must be $\geq k(\mathrm{DF} + \mathrm{DR})$. Therefore, for a given cycle time for a dynamic node, if DF is small, DR can be large and if DF is large, DR must be small. The pulse width of the waveform at the dynamic node, which is the forward control pulse for the subsequent logic stage, is

$$(\mathrm{PWF} - \mathrm{DF}) + G + \mathrm{DR} \geq \mathrm{PWF} + (\mathrm{DR} - \mathrm{DF}).$$
It can therefore be appreciated that the pulse width on the dynamic node N tracks the pulse width of the forward control pulse PWF.
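The relationships just stated can be checked numerically with a short Python sketch. It assumes the minimum pulse widths PWF = k·DF and PWR = k·DR; the function names and the delay values (in picoseconds) are hypothetical and chosen only for illustration.

```python
def min_cycle_time(df, dr, k=5, gap=0):
    """Cycle time CT = PWF + PWR + G with the minimum widths PWF = k*DF, PWR = k*DR."""
    pwf = k * df
    pwr = k * dr
    return pwf + pwr + gap


def node_pulse_width(pwf, df, dr, gap=0):
    """Pulse width on dynamic node N: (PWF - DF) + G + DR."""
    return (pwf - df) + gap + dr


# Hypothetical delays in picoseconds.
DF, DR, K = 20, 30, 5
PWF = K * DF

print("cycle time:", min_cycle_time(DF, DR, K))
print("node pulse width, G = 0:", node_pulse_width(PWF, DF, DR, gap=0))
print("node pulse width, G = 10:", node_pulse_width(PWF, DF, DR, gap=10))
```

With G = 0 the node pulse width equals PWF + (DR − DF), the lower bound given above; any positive gap widens it further.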
Since functional logic is only performed during the evaluation period (forward-path) it is desirable to design the forward-path as fast as possible and let the reset be slower, thereby investing most of the chip area and power in devices along the forward-path, and minimizing the area and power required for the reset circuits. Consequently, for a given node, DR is normally greater than DF and so, according to the above formula for the node-N pulsewidth, (DR-DF) is positive and there is a natural expansion of the forward pulsewidth along the forward-path.
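The stage-to-stage expansion can be sketched by applying the node pulsewidth expression, (PWF − DF) + G + DR, repeatedly along a forward path. The following Python fragment is only an illustration: the function name, the assumption of a single common gap G for every stage, and the delay values are hypothetical.

```python
def propagate_pulse_widths(pwf_in, stages, gap=0):
    """Return the forward pulsewidth at the input and after each stage.

    stages is a list of (DF, DR) pairs, one per dynamic node. Each node's
    output pulsewidth is (PWF - DF) + G + DR and becomes the forward
    pulsewidth of the next stage.
    """
    widths = [pwf_in]
    for df, dr in stages:
        widths.append((widths[-1] - df) + gap + dr)
    return widths


# Hypothetical path: three stages, each with DF = 20 ps and DR = 30 ps.
print(propagate_pulse_widths(100, [(20, 30)] * 3))
```

Because DR exceeds DF at every stage in this example, the pulse grows by (DR − DF) per stage even with G = 0.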
To further visualize the design conflicts arising in high performance SRCMOS circuits having the characteristics summarized above, FIG. 3 is a high-level representation of a logic macro 30 fabricated in SRCMOS technology. A logic macro should be understood to comprehend any circuit having a plurality of sequential logic stages requiring a sequential series of pulses to be input thereto for proper operation or reset. The forward evaluation path comprises a mix of static circuits and n sets of resettable dynamic nodes which are reset by a block of reset pulse generation circuits 34 which generate a sequence of reset signals RS1, RS2, . . . RSn, collectively indicated by reference numeral 36. For conventional logic functions a simple timing chain, described above, and triggered by some signal 38 which occurs at an appropriate and reliably repeatable time in the forward path, is generally adequate.
However, a simple serial timing chain is substantially inflexible in that the delays provided are adjustable only in increments of the propagation delay of elements in the timing chain, such as inverters, and that the pulsewidths of the reset pulses, 36, are highly correlated. For example, if 34 is comprised of a serial chain of "balanced" inverters (equal strength PMOS and NMOS devices) the reset-pulse pulsewidths would all be equal. Nevertheless, a simple reset timing chain is generally sufficient for control of reset of logic paths in which the DF's (forward path delays) are sufficiently uniform since uniformity of DF's implies substantial uniformity of the required PWF's, PWR's and DR's for a given cycle time.
However, as pointed out above, maximization of integration density of interior device circuits in memories and register files implies that interior forward delays (DF's) will differ significantly from peripheral circuit delays. This substantial non-uniformity of DF's implies a corresponding non-uniformity in the required forward pulsewidths, PWF's. Furthermore, as mentioned previously, it is desirable to minimize the area and power overhead for the reset circuits. Hence it is desirable to make the DR's (and hence the PWR's) as large as is consistent with the logic cycle time. Thus a non-uniformity of required PWF's leads to a non-uniformity in the preferred PWR's to meet a given cycle time.
However, due to the inflexible nature of simple serial timing chains mentioned above, it is very difficult, in practice, to design a single sequential timing chain that provides for the optimum variability of the PWF's and PWR's. For example, due to the tight correlation of pulsewidths of the pulses 36 in FIG. 3, it is difficult, by means of varying the inverter balancing alone, for adjacent reset pulses, e.g. RS3 and RS4 in FIG. 3, to have very different pulsewidths. Furthermore, every change made in the timing of one reset pulse affects every other reset pulse e.g., changing RS3 affects RS4, RS5, . . . etc. to RSn.
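The coupling between reset pulses in a serial chain can be seen in a simple edge-timing model, sketched below in Python. The model is an assumption made for illustration: each inverter is described only by the delay of its rising and falling output edges, every inverter output is treated as a tap, and the function name and delay values are hypothetical.

```python
def chain_taps(t_start, pw_in, inverters):
    """Return (leading-edge time, pulsewidth) at each inverter output.

    inverters is a list of (t_rise, t_fall) output-edge delays. The input
    is a positive pulse whose leading edge rises at t_start.
    """
    lead, trail = t_start, t_start + pw_in
    lead_is_rising = True
    taps = []
    for t_rise, t_fall in inverters:
        # An inverter turns a rising input edge into a falling output edge.
        if lead_is_rising:
            lead += t_fall
            trail += t_rise
        else:
            lead += t_rise
            trail += t_fall
        lead_is_rising = not lead_is_rising
        taps.append((lead, trail - lead))
    return taps


balanced = [(10, 10)] * 4
skewed = [(10, 10), (10, 14), (10, 10), (10, 10)]  # only the second inverter changed
print(chain_taps(0, 50, balanced))
print(chain_taps(0, 50, skewed))
```

In the balanced chain every tap carries the same pulsewidth, and the leading edges are spaced by exactly one inverter delay. Unbalancing a single inverter alters the pulsewidth at that tap and at every tap downstream of it.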
As another example, consider a case where a very slow forward stage follows a fast stage having forward pulsewidth PWF. In that case it is desired that the subsequent forward pulse (e.g. the node-N pulsewidth in FIGS. 1 and 2) have a much larger pulsewidth than PWF. This can be achieved only by either (1) increasing DR (in FIGS. 1 and 2), or (2) increasing the gap, G. If one increases DR and the corresponding PWR, then, due to the pulsewidth correlation mentioned above, the next reset pulse, which controls the reset of the slow stage, will also be increased in width, thereby further increasing the cycle time of the slow stage. On the other hand, if the forward pulsewidth is increased by increasing G, the increase in G can be done only in units of 2-inverter delays, which may or may not be acceptable.
Another problem arises for fast stages which follow the slowest stage. Due to the stage-to-stage expansion of PWF's described earlier, all fast stages following the slowest stage will have PWF's larger than necessary (larger than the $k \times \mathrm{DF}$ criterion) and therefore require smaller PWR's and DR's than would otherwise be provided. This latter fact implies a larger than necessary overhead in associated reset circuit area and power.
Following this progression, the last-stage, or output-stage, will exhibit the widest forward pulsewidth, which can be substantially larger than the widest required interior forward pulsewidth. In a logic path consisting of several macros it is obviously deleterious to the overall cycle time if every macro expands the forward pulsewidths.
In summary, when PWF+PWR for the slowest stage is close to the maximum allowed circuit cycle time, it is difficult, if not impossible, to achieve the required pulse widths and reset timing for an optimal design with a conventional timing reset chain structure. The delay quantization inherent in a timing chain prevents tight adjustments to the leading edges of reset pulses. Also, the natural correlation of PWR's is contrary to the need for variable PWR's in an optimal design (e.g. after the slowest interior stage). Furthermore, allowing propagation of increasing PWF's after the slowest stage causes the output pulses to be wide, the widest in the macro, and requires faster and hence larger than optimal reset circuit elements for all stages following the slowest. All of these problems are exacerbated by RC delays and distortions in extended word and reset lines.