1. Field of the Invention
The present invention relates to amplifier circuits, and particularly to those amplifier circuits providing a tail current source coupled to the commonly-connected node of a differential transistor pair.
2. Description of Related Art
As is well known in the art, a basic differential amplifier configuration is advantageously implemented using a pair of transistors configured as a differential pair, and using a constant current source in the "tail" of the differential pair. For example, a pair of NMOS transistors form a differential transistor pair when their respective source terminals are connected at a common node, or common-source node, and their respective gate terminals are connected to respective input terminals (e.g., for receiving a differential input signal). As another example, a pair of bipolar transistors form a differential transistor pair when their respective emitter terminals are commonly connected at a common-emitter node, and their respective base terminals are connected to respective input terminals. When using NMOS transistors in the differential transistor pair, a constant current source is frequently approximated by a single N-channel transistor with a DC bias voltage on its gate having a value which is less than the nominal common-mode voltage of the two input signals connected to the gates of the differential pair. Such a configuration assumes that the current tail transistor remains saturated, which requires its drain voltage to exceed its gate voltage less a threshold. But in a traditional differential pair configuration, the drain voltage of the current tail transistor (assuming it is the same node as the common-source node of the differential pair) must be lower in voltage than the higher of the two input signals less a threshold voltage for any current to flow through either of the differential pair transistors.
As a result, it is exceedingly difficult to use a traditional NMOS differential pair, having threshold voltages optimized for operation of the remaining circuitry of the integrated circuit, to sense a signal having a very low common-mode voltage which approaches or is below the threshold voltage (relative to, for example, the lower power supply voltage). To accommodate such a low common-mode input voltage, designers may use a PMOS differential pair biased from the upper power supply. Such a PMOS differential transistor pair is usually slower than an NMOS pair, and suffers from an analogous problem when presented with an input signal having a high common-mode voltage which approaches (e.g., comes within a PMOS threshold voltage of) the upper power supply voltage. As a result, designers frequently use both NMOS and PMOS differential transistor pairs, each connected to receive the differential input signal, with the currents through the two pairs summed to produce an output signal. Each of these techniques frequently adds complexity, degrades performance, or results in unwanted amplifier characteristics.
To afford the capability of sensing an input differential signal having a low common-mode voltage when using, for example, an NMOS differential transistor pair, a current source device and a capacitor may be employed to provide, at the common node of the differential transistor pair, what appears to be a constant current source connected to a "negative voltage."
In one embodiment particularly useful when using NMOS transistors in a differential transistor pair, one terminal of a capacitor (e.g., an MOS transistor configured as a capacitor) is precharged to VDD and the other terminal is precharged to VSS. When the amplifier needs to sense its differential input signal, a control signal turns off the precharge transistors and couples the capacitor terminal previously at VSS to the common-source node of the differential transistor pair. The capacitor terminal previously precharged to VDD is then driven toward VSS, preferably using a controlled current source, which capacitively couples the common-source node of the differential transistor pair from VSS toward a voltage below VSS. As soon as the common-source node voltage is low enough for at least one side of the differential pair to conduct a current substantially equal to that of the controlled current source, the common-source node voltage is substantially clamped at that level. The actual voltage resulting on the common-source node depends on the transistor characteristics, the particular voltages present on the gates of the differential pair, and the magnitude of the controlled current source. For some operating conditions, this voltage may be above ground rather than below ground, but the "tail" current of the differential pair nonetheless remains equal to the magnitude of the current through the device driving the other end of the capacitor from VDD toward VSS.
No device configured to function as a constant current source is needed between the differential transistor pair and the capacitor because the tail current of the differential transistor pair is controlled by and substantially equal to the current discharging the other end of the capacitor, which current flows through the capacitor as a displacement current. If the discharging current is a constant current, the tail current is also a constant current. In this example, discharging a node previously precharged to VDD with a constant current toward VSS may be accomplished using straightforward circuits, such as an NMOS switch transistor in series with an NMOS current mirror transistor.
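The equal-current relationship described above follows from the displacement-current relation I = C·dV/dt: because the only path for the tail current is through the capacitor, the current drawn from the common-source node must match the current discharging the far plate. A minimal numeric sketch of this principle (the component values below are hypothetical, not taken from this disclosure):

```python
# Hypothetical illustration of the bootstrapped tail-current principle:
# the tail current of the differential pair equals the constant current
# discharging the far plate of the precharged capacitor (I = C * dV/dt).
def tail_current(c_farads, dv_volts, dt_seconds):
    """Displacement current through the capacitor for a linear voltage ramp."""
    return c_farads * dv_volts / dt_seconds

# Example: a 1 pF capacitor whose far plate is ramped from VDD (2.5 V)
# toward VSS over 5 ns sinks a constant 0.5 mA from the common-source node.
i_tail = tail_current(1e-12, 2.5, 5e-9)
```

If the discharging ramp is linear (constant current), the projected tail current is likewise constant, which is the point made above.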
In a broader embodiment of the invention useful in an integrated circuit including a differential transistor pair having a common node connecting a respective first current handling terminal of each transistor within the pair, a method of providing a tail current for the differential transistor pair includes: (1) precharging a capacitive device having a first terminal and a second terminal; (2) providing a current path from the common node of the differential transistor pair to the first terminal of the capacitive device; and (3) driving, with a first current, the voltage of the second terminal of the precharged capacitive device in a direction to capacitively couple the first terminal and, by way of the current path, the common node to respective voltages sufficient in magnitude and polarity to cause a current, substantially equal to the first current, to flow collectively through one or both transistors of the differential transistor pair; (4) wherein the first current flows as a displacement current through the capacitive device and is projected onto the common node, thereby providing a dynamic tail current for the differential transistor pair substantially equal in magnitude to the first current.
For certain values of input voltages applied to the differential transistor pair, certain transistor characteristics of the differential transistor pair, and certain magnitudes of the first current, the first terminal of the capacitive device may assume a voltage outside a range of voltages bounded by the highest power supply voltage and the lowest power supply voltage operably received by the integrated circuit. In other cases, the first terminal of the capacitive device may assume a voltage within the range of voltages bounded by the highest power supply voltage and the lowest power supply voltage. In certain embodiments, the capacitive device is disposed in close local proximity to, and uniquely associated with, the differential transistor pair. In other embodiments, the capacitive device is disposed in close local proximity to, and uniquely associated with, two separate differential transistor pairs, and each differential transistor pair may include a respective gating circuit connecting the respective common node thereof to the first terminal of the capacitive device. In certain embodiments, the current path consists of a direct wired connection between the common node and the first terminal of the capacitive device, or, in other embodiments, may include at least one transistor connecting the common node to the first terminal of the capacitive device.
In an apparatus embodiment of the present invention, an integrated circuit includes: (1) a first transistor and a second transistor configured as a differential transistor pair having a common node connecting respective first current handling terminals of each transistor within the pair; (2) capacitive means having a first terminal and a second terminal; (3) means for precharging the capacitive means to a voltage thereacross; (4) means for providing a current path from the common node of the differential transistor pair to the first terminal of the capacitive means; and (5) means for driving, with a first current, the voltage of the second terminal of the precharged capacitive means in a direction to capacitively couple the first terminal and, by way of the current path, the common node to respective voltages sufficient in magnitude and polarity to cause a current, substantially equal to the first current, to flow collectively through one or both transistors of the differential transistor pair; (6) wherein the first current flows as a displacement current through the capacitive means and is projected onto the common node, thereby providing a dynamic tail current for the differential transistor pair substantially equal in magnitude to the first current.
In another apparatus embodiment of the present invention, an integrated circuit includes: (1) a first transistor and a second transistor configured as a differential transistor pair having a common node connecting respective first current handling terminals of each transistor within the pair; (2) a capacitive device having a first terminal and a second terminal; (3) a precharge circuit for precharging, during a first time period, the capacitive device to an initial voltage thereacross, said first terminal of the capacitive device being coupled to a source of a first voltage and said second terminal of the capacitive device being coupled to a source of a second voltage; (4) a current path from the common node of the differential transistor pair to the first terminal of the capacitive device; and (5) a driver circuit responsive to a first control signal, for driving, with a first current during a second time period, the voltage of the second terminal of the capacitive device in a direction to capacitively couple the first terminal and, by way of the current path, the common node to respective voltages sufficient in magnitude and polarity to cause a current, substantially equal in magnitude to the first current, to flow collectively through one or both transistors of the differential transistor pair; (6) wherein the first current flows as a displacement current through the capacitive device and is projected onto the common node, thereby providing a dynamic tail current for the differential transistor pair substantially equal in magnitude to the first current.
The present invention may advantageously be used in amplifier circuits incorporated into a variety of useful circuits, including a linear amplifier circuit, a latching amplifier circuit having a separate latch enable input, and a latching amplifier circuit having no separate latch enable input, where latching occurs as a result of the differential output signal developed by the differential transistor pair. Advantageous embodiments may be fashioned using any of a variety of transistor structures, including field effect transistors as well as bipolar junction transistors.
As an example, the present invention may be advantageously used in a read amplifier in the read path of a dynamic memory array to develop signal on a generic I/O line before bit line sensing has occurred. The inputs for such a read amplifier may be connected to the bit lines, the sense amplifier nodes, a local I/O line serving, for example, a few bit line pairs, or a local output line similarly serving, for example, a few bit line pairs. The present invention may also be advantageously used in an external input buffer to sense external input signals having a small signal swing and a common-mode voltage near the lower power supply. One input for such an external input buffer may be coupled to receive an external input signal, with the other input coupled to receive a reference voltage or a complementary external input signal. Such an external input buffer may advantageously be strobed or enabled by a control signal generated relative to an external clock signal received by the integrated circuit.
The scope of the present invention in its many embodiments is defined in the appended claims. Nonetheless, the invention and its many features and advantages may be more fully appreciated in the context of exemplary implementations disclosed and described herein which combine one or more embodiments of the invention with other concepts, architectures, circuits, and structures to achieve significantly higher performance than previously achievable. For example, a high performance dynamic memory array architecture is disclosed in several embodiments, along with various embodiments of associated supporting circuitry, which afford performance approaching that usually associated with static memory arrays.
In an exemplary embodiment, an 18-Mbit memory array includes four banks of arrays, each including thirty-two array blocks. Each array block includes 128 horizontally-arranged row lines (i.e., word lines) and 1152 (1024×9/8) vertically-arranged columns. Most internal circuitry operates using a single positive power supply voltage, VDD, and the reference voltage VSS (i.e., "ground"). Each column is implemented as a complementary folded bit line pair. Four independent row decoders are provided respectively for the four banks, and are physically arranged in two pairs, thus forming two splines: one spline located between the left pair of memory banks, and the other located between the right pair of memory banks. Latching input buffers for address and control inputs are located within each of the splines and are connected to respective input pads by horizontally-arranged input wires running through the memory banks. Two input buffers are provided for each input pad, one located in each spline. Clock lines used to strobe the various inputs are arranged vertically, running through each spline. An R-C compensation circuit between each input wire and the latching input buffer located in the spline nearest its respective input pad provides a delay to the "upstream" buffer which compensates for the additional wiring delay in reaching the "downstream" buffer. This allows all of the latching input buffers to be driven by phase-aligned clock signals while still achieving a very narrow worst-case setup and hold time over all such inputs.
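The stated organization can be checked arithmetically: 4 banks × 32 blocks × 128 word lines × 1152 columns yields exactly 18 Mbit, with the 9/8 factor in the column count accounting for the extra bits per byte:

```python
# Sanity check of the exemplary 18-Mbit array organization described above.
banks = 4
blocks_per_bank = 32
rows_per_block = 128            # word lines per array block
cols_per_block = 1024 * 9 // 8  # 1152 columns (9/8 factor for extra bits)
total_bits = banks * blocks_per_bank * rows_per_block * cols_per_block
# 18 Mbit, where 1 Mbit = 2**20 bits
```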
The use of a separate input buffer in each spline for each address and control input requires additional interconnect wire to connect each input pad to its input buffer in the "far" spline (above and beyond the interconnect wire connecting each input pad to its input buffer in the "near" spline), and thus increases the input capacitance of each address and control input to the chip (which input capacitance, of course, must be driven by the source of the external signal). However, the complementary internal outputs of each such input buffer may be buffered immediately by self-resetting buffers, and need only drive decoder and/or control circuitry locally within the same spline. Thus, the total capacitive loading on the complementary outputs of each buffer is advantageously reduced and is more balanced among the various buffers.
The row decoder uses predecoding to reduce the total line capacitance driven during an active cycle. The final stages of the row decoder include an N-channel tree configuration driven by VDD-level (i.e., VSS-to-VDD level) predecoded address signals to select and discharge to VSS a particular decode node which was precharged to VPP. Subsequent buffering stages provide a final 1-of-4 decode and drive the selected word line to a VPP voltage that is substantially independent of VDD, rather than driving the selected word line to VDD or to a voltage which is a ratio of VDD. There are no race conditions within the decoder, even though it accomplishes level shifting from VDD-level signals to VPP-level word lines.
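The capacitance saving from predecoding comes from converting raw address lines into small one-hot groups so that each decoder stage need only combine one wire per group. A behavioral sketch for a 128-word-line block (the 3/2/2 field split and the helper below are hypothetical illustrations, not the disclosed circuit):

```python
# Hypothetical sketch of row predecoding for a 128-word-line array block:
# a 7-bit row address is split into fields, each field predecoded into a
# one-hot bus. The decoder tree then combines one wire per group instead
# of every raw address line, reducing the driven line capacitance; the
# final 2-bit field corresponds to the 1-of-4 final decode stage.
def predecode(addr, bits_per_group=(3, 2, 2)):
    """Split addr into fields and return one one-hot tuple per field."""
    groups = []
    for width in bits_per_group:
        field = addr & ((1 << width) - 1)
        groups.append(tuple(1 if i == field else 0 for i in range(1 << width)))
        addr >>= width
    return groups

g = predecode(0b0000101)  # row address 5: only the low field is nonzero
```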
The VPP voltage is internally generated by a charge-pump-type circuit, and its output is a substantially fixed voltage, independent of process and environmental corners, which is regulated with respect to VSS (i.e., ground). For typical operating voltages, the VPP voltage is somewhat higher than VDD; at low operating voltage the VPP voltage may be substantially higher than VDD, while at high operating voltage the VPP voltage may be similar in magnitude to the VDD voltage. Preferably, the VPP voltage is chosen to be near the maximum voltage that the field effect transistors (FETs) can safely tolerate. Since VPP is regulated to be substantially independent of variations in the VDD voltage, the VPP level may advantageously be set higher than would otherwise be safe, and tolerances in the VPP voltage level which would otherwise be necessary to account for variations in the VDD level are unnecessary.
If the semiconductor technology allows, transistors which are exposed to the VPP level (e.g., transistors whose gate terminal is driven at any time to the VPP level while the source or drain terminal might be at ground, such as the memory array access transistors and various array select transistors, or those transistors whose drain or source terminal is driven at any time to the VPP level while the gate terminal might be at ground) are preferably implemented using a thicker gate dielectric than the majority of the other transistors, which are never exposed to such a high differential voltage across their gate-to-drain or gate-to-source terminals. Moreover, it is also preferable to limit the voltage across any transistor using the thin gate dielectric to no more than VDD. Transistors exposed to any voltage greater than the VDD level are preferably implemented with the thick gate dielectric and are limited in voltage to the VPP level, which is a fixed voltage substantially independent of the VDD voltage. Consequently, transistors exposed to such internally "boosted" voltages need only withstand a relatively fixed, predictable voltage level (e.g., by using a bandgap reference in the circuit which regulates the VPP voltage) and do not need to withstand the even higher voltages which might otherwise be produced by a "boosted" voltage generator whose output voltage is a ratio of VDD (e.g., 1.5×VDD). The voltage across the memory cell capacitors is limited to less than one-half VDD (e.g., limited to about 1.0 volt for certain embodiments). A third dielectric material, thinner than the "thin" capacitor dielectric required for typical DRAM memory cells (which must normally support a voltage of one-half the maximum allowed VDD voltage), may advantageously be used to fabricate the memory cell capacitors to provide additional storage capacitance per unit area.
Within each memory bank, a row of sense amplifiers is implemented in the holes between each pair of array blocks. Each sense amplifier is shared between two pairs of bit lines: one pair located within the array block above the sense amplifier and the other pair located within the array block below the sense amplifier. The complementary internal nodes within each sense amplifier are connected to the true and complement bit lines above the sense amplifier by a first pair of N-channel array select transistors whose gates are driven to VSS (to isolate the sense amplifier nodes from the bit line pair) or driven to VPP (to connect the sense amplifier nodes to the bit line pair), and are further connected to the pair of bit lines below the sense amplifier by a second pair of array select transistors whose gates are likewise switchable from VSS to VPP. A row of sense amplifiers is implemented above the top array block and another row below the bottom array block of the given memory bank; these serve half of the bit lines within the top and bottom array blocks, respectively. For any particular array block, half of the bit line pairs are served by sense amplifiers located above the array block, and the remaining half by sense amplifiers located below the array block. In each case, a pair of array select transistors having a gate voltage switchable between VSS and VPP connects any given pair of bit lines to the complementary internal sense amplifier nodes within the corresponding sense amplifier.
An amplifier in the read path is used to develop signal on a generic I/O line before bit line sensing has occurred. Such a generic I/O line may include a global output line, a column line, or an I/O line. This amplifier may be connected to the bit lines, the sense amplifier nodes, a local I/O line serving, for example, a few bit line pairs, or a local output line similarly serving, for example, a few bit line pairs. If the read amplifier inputs are connected directly to the bit line sense amplifier nodes (i.e., one read amplifier per bit line sense amplifier), the column select function may be advantageously used to enable the amplifier for the selected column, while if the read amplifier inputs are connected to local output or I/O lines (i.e., one read amplifier per group of bit line sense amplifiers), the column select function may be used to couple the selected bit line sense amplifier to the local output or I/O lines. If the common mode voltage of the read amplifier input nodes is so low that current flow through the tail of an N-channel differential pair cannot be assured for all voltage or process corners, the amplifier may incorporate a coupling circuit to capacitively couple the tail of the differential pair downward, preferably using a controlled current source, to approximate a constant current source to a negative supply voltage.
In a certain embodiment, each read amplifier's inputs are connected to the internal nodes of a corresponding bit line sense amplifier. The respective outputs of a group of read amplifiers are connected in common to a horizontally-arranged differential pair of local output lines. One such amplifier is enabled at a time by column select circuitry to develop signal on the pair of local output lines. A second stage amplifier then further buffers this signal and drives a pair of vertically-arranged global output lines. The global output lines extend the full height of the memory bank, with half preferably extending beyond the memory bank to I/O circuits above the memory bank, and the remaining half extending beyond the memory bank to I/O circuits below the memory bank. In certain embodiments, the second stage amplifier may also include a multiplexer to choose between two different pairs of local output lines (e.g., a first pair of local output lines serving 8 sense amplifiers located to the left of the second stage amplifier, and a second pair serving 8 sense amplifiers located to the right of the second stage amplifier).
The word lines within the array blocks may be implemented in a polysilicon layer and strapped using a later-processed metal layer to reduce word line delays. Such word line straps are preferably implemented using two different layers of metal (preferably the two "lowest" layers, metal-1 and metal-2) in order to match the word line pitch without requiring any distributed buffers or final decode buffers. The read amplifiers used to sense a local output line and subsequently drive a global output line may advantageously be located above the word line straps, where a break in the memory cell stepping already occurs. This allows the read amplifier block to be laid out more readily in the center of a group of bit line sense amplifier and column select circuits. As such, the bit line sense amplifier pitch may be slightly less than twice the column pitch (recalling that half of the bit line sense amplifiers are above the array block and the remaining half below it).
The bit line sense amplifiers are each implemented using a full CMOS cross-coupled latch. To sense the signal on a pair of bit lines, both the cross-coupled N-channel pair of transistors (i.e., the NMOS sense amplifier) and the cross-coupled P-channel pair of transistors (i.e., the PMOS sense amplifier) which form the CMOS sense amplifier are enabled at substantially the same time. The NMOS sense amplifier drives the bit line having the lower voltage toward VSS, while the PMOS sense amplifier drives the bit line having the higher voltage toward VDD. If enabled for a sufficiently long time, the lower bit line substantially reaches VSS and the higher bit line would be driven substantially all the way to VDD. However, the PMOS sensing is terminated before the higher bit line substantially reaches the full VDD voltage. This allows the bit line to be driven quickly to a high level without having to wait for the "exponential tail" that would result if it were driven all the way to VDD. The internal sense amplifier nodes and the near end of the bit lines actually overshoot the final high bit line "restore" level (e.g., 2.0 volts for a device operating at a VDD of 2.5 volts) before the PMOS sensing is terminated, whereas the far end of the high bit line has not yet reached the final high bit line "restore" level when the PMOS sensing is terminated. Then, after the PMOS sensing is terminated, charge is shared between the near end and far end of the bit lines, thus speeding the far end's approach to the final high bit line "restore" level because the effective time constant of the resistive bit line is cut in half.
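The final restore level set by this charge sharing follows from simple charge conservation between the overshooting near end and the lagging far end of the bit line. A lumped two-segment illustration (all capacitances and voltages below are hypothetical example values):

```python
# Lumped illustration (hypothetical values) of charge sharing between the
# overshooting near end and the lagging far end of a resistive bit line
# after PMOS sensing is terminated.
def shared_voltage(c_near, v_near, c_far, v_far):
    """Final voltage after charge conservation: Q_total / C_total."""
    return (c_near * v_near + c_far * v_far) / (c_near + c_far)

# The near end overshoots the 2.0 V restore target while the far end lags;
# with equal segment capacitances the bit line settles at the midpoint.
v_restore = shared_voltage(100e-15, 2.2, 100e-15, 1.8)
```

Because both segments drive toward each other rather than one end charging the whole line, the settling is governed by roughly half the bit line's resistive time constant, as noted above.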
Since the word line and array select lines are left high for some time even after the PMOS sense amplifier is turned off, charge sharing among the sense amplifier nodes, the near and far ends of the bit lines, and the memory cell storage node itself contributes to determining the final high restore level which is "written" back into the selected memory cell. When compared to having a full VDD level on a high bit line, the relatively low final "high" bit line voltage (e.g., 2.0 volts) transfers into the selected memory cell more quickly due to the higher gate-to-source voltage of the memory cell access transistor.
The NMOS sensing is preferably continued, even after the PMOS sensing has stopped, to more adequately drive the bit line having the lower voltage (the "low-going" bit line) to a substantially full VSS level. This ensures that, if the selected memory cell happens to be coupled to the low-going bit line, a substantially full VSS level is restored into the selected memory cell. This also ensures that all the low-going bit lines (not just those having a selected memory cell connected thereto) are fully discharged before, at the end of the cycle, the high and low bit lines share their charge to set the bit line equilibrate voltage. The selected word line (which is driven when active to the VPP level) is then brought low as the NMOS sensing is terminated, after which the array block is automatically taken into precharge.
Timing circuitry is used to time the simultaneous start of both NMOS and PMOS sensing relative to the timing of the selected word line being driven high, to time the end of PMOS sensing, and to time the simultaneous end of NMOS sensing and the selected word line being brought low. The PMOS sense timing duration may be designed to decrease as the VDD voltage increases, to ensure a written high level which is substantially independent of VDD, even over process and temperature corners. For example, the timing may be set to ensure a written high level on the high bit line (and into the selected memory cell) of about 2.0 volts for a device having a VDD voltage range from 2.3 to 2.9 volts. Such a PMOS sense timing generator may be implemented by using a dummy bit line and sense amplifier structure (activated substantially before the main sense amplifiers are activated), detecting when the PMOS sensing needs to be turned off to achieve a final high voltage of about 2.0 volts on the dummy sense amplifier and bit line structure, and then buffering this timing signal to control the turn-off time of the PMOS sense enable signals for the regular sense amplifiers within the memory arrays. The PMOS timing may alternatively be accomplished using a string of inverters powered at a voltage a fixed amount below VDD, or by other techniques which achieve a timing that is a function of several variables, such as the power supply voltage VDD, a bandgap voltage, transistor threshold voltage and transconductance, temperature, or others.
In a preferred embodiment, the sense amplifier timing circuitry produces three main timing signals. The first timing signal is used to control, relative to the timing of the selected word line being driven high, the simultaneous start of both the NMOS and PMOS sensing. A second timing signal is used to control, relative to the simultaneous start of NMOS and PMOS sensing, the duration of the PMOS sensing, and a third timing signal is used to control, relative to the end of the PMOS sensing, when to simultaneously end the NMOS sensing and bring the selected word line back low. Each of these timing signals is independently generated, although the circuitry used for each may share portions with another. These three timing signals define three timing intervals. The timing interval "t1" begins with the selected word line being driven high and ends with the simultaneous start of both the NMOS and PMOS sensing (i.e., t1 is the amount of time the selected word line is high before sensing). The timing interval "t2" extends from the simultaneous start of NMOS and PMOS sensing to the end of PMOS sensing (i.e., t2 is the duration of the PMOS sensing). The timing interval "t3" extends from the end of the PMOS sensing to the simultaneous end of the NMOS sensing and discharge of the selected word line (i.e., t3 is the amount of time the word line remains high after the end of PMOS sensing).
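The three intervals defined above partition the active portion of the cycle; in particular, the selected word line remains high for t1 + t2 + t3. A small sketch of the resulting event schedule (the interval values are hypothetical placeholders, expressed in picoseconds):

```python
# Sketch of the three timing intervals defined above (hypothetical values,
# in picoseconds). Events are offsets from the word line being driven high.
def sense_schedule(t1, t2, t3):
    """Return event times: sensing start, PMOS sense end, and the
    simultaneous NMOS sense end / word line fall."""
    sense_start = t1            # word line high this long before sensing
    pmos_end = t1 + t2          # PMOS sensing lasts t2
    wordline_low = t1 + t2 + t3 # word line stays high t3 past PMOS end
    return sense_start, pmos_end, wordline_low

events = sense_schedule(1000, 2000, 3000)
```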
The timing interval t1 essentially controls how much signal from the memory cell reaches the sense amplifier before starting the NMOS and PMOS sensing. A short t1 may not provide enough time for all the charge in a selected memory cell to fully share with the charge on the bit line and sense amplifier nodes, and consequently the sense amplifier begins to sense with less signal than would be developed if, alternatively, a longer t1 were configured. A longer t1 increases operating margins at the expense of increased cycle time. Similarly, the timing interval t2 essentially controls how much charge is driven onto the high-going sense amplifier node, bit line, and memory cell during sensing. Increasing t2 increases the voltage stored into the memory cell, but also increases the bit line equilibrate voltage when charge is later shared between true and complement bit lines (and sense amplifier nodes). A short t2 may not provide enough charge to develop the desired restored high level (e.g., 2.0 volts) on the bit line and into a selected memory cell. Conversely, an excessively long t2 timing may not increase the stored high level in the memory cell as much as it increases the bit line equilibrate voltage, and thus may decrease the high level signal available for sensing, particularly at high VDD. The timing interval t3 essentially controls how much charge is shared between the sense amplifier node, the near end and far end of a high-going bit line (which typically is moderately resistive), and the memory cell. The resistance of the NMOS memory cell access transistor is much higher when restoring a high level (due to its lower gate-to-source voltage) than when restoring a low level. The t3 timing is constrained by the time needed to write a high voltage into the selected memory cell through the resistive bit line and further through the relatively high-resistance memory cell access transistor. 
A short t3 may result in a worst case memory cell (one located at the "far" end of a bit line, furthest from its bit line sense amplifier) being written to a restored high level which is too low, for a given amount of "Q" transferred into the sense amplifiers (i.e., for the bit line equilibration voltage which results from the given amount of "Q").
These timing intervals t1, t2, and t3 may be collectively optimized on a chip-by-chip basis. In a preferred embodiment, there may be sixteen different timing settings, each specifying a particular combination of the t1, t2, and t3 timing intervals, ranging from very aggressive for highest performance, to very relaxed for highest yield. For example, the timing setting "1" may provide for the most aggressive (i.e., shortest) t1 timing interval, the most aggressive (i.e., shortest) t2 timing interval, and the most aggressive (i.e., shortest) t3 timing interval. The timing setting "16" may provide for the most relaxed t1 timing interval, the most relaxed t2 timing interval, and the most relaxed t3 timing interval. Each incremental timing setting between "1" and "16" is preferably optimized to incrementally increase, by a similar amount, the signal available at the bit line sense amplifier just before sensing. To accomplish this, the timing setting "2" may increase the t1 interval by 200 ps compared to the "most aggressive" t1 value of timing setting "1," while keeping t2 and t3 unchanged (a 200 ps increase may be easily achieved by adding two inverters to the logic path setting the time interval). The timing setting "3" may increase t3 by 200 ps while keeping the same value of the t1 and t2 intervals as in timing setting "2." Each successive timing setting among the lower-numbered settings preferably increases the value of one of the three timing intervals t1, t2, and t3 relative to its value in the previous timing setting, while keeping the remaining two timing intervals unchanged.
Higher numbered timing settings may increase a given timing interval by increasingly larger amounts to maintain a similar increase in the signal available at the bit line sense amplifier just before sensing, or may increase more than one of the three timing intervals. For example, the timing setting "15" may increase t1 and t3 each by 400 ps relative to the respective intervals in timing setting "14" (compared to a 200 ps increase in only t3 between timing settings "2" and "3").
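The incremental construction of the settings table can be sketched as follows. The base interval values and the specific per-setting increments are illustrative assumptions chosen to match the 200 ps examples above, not an actual table from this description.

```python
# Hypothetical sketch: build the timing-settings table incrementally, each
# setting relaxing one (or more) interval(s) relative to the previous one.
# Base values and step sizes (in ps) are illustrative assumptions.

def build_settings(base, steps):
    """Return the settings list; steps[i] maps interval name -> added ps."""
    settings = [dict(base)]
    for step in steps:
        nxt = dict(settings[-1])
        for interval, delta_ps in step.items():
            nxt[interval] += delta_ps
        settings.append(nxt)
    return settings

# Settings "1" through "4" (increments assumed for illustration):
steps = [{"t1": 200}, {"t3": 200}, {"t2": 200}]
table = build_settings({"t1": 1000, "t2": 800, "t3": 1200}, steps)
# table[1] (setting "2") relaxes only t1; table[2] (setting "3") relaxes only t3.
```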
The timing setting "8" is preferably optimized to provide a "nominal" value for each of the three timing intervals t1, t2, and t3 which is expected to be an appropriate setting for a typical device having typical transistor characteristics, typical sense amplifier offset voltage, typical bit line resistance, etc. Note that these "nominal" values of the timing intervals t1, t2, and t3 are a function of the process corner. Higher bit line resistance, higher access transistor threshold voltage, or lower VPP, for example, raise the nominal value of each of the t1, t2, and t3 timing intervals which are called for by timing setting "8." For the preferred embodiment, the various timing settings provide a variety of t1 intervals, some shorter than nominal and others longer than nominal, and provide a variety of t3 intervals, both shorter and longer than nominal. But since the duration of the PMOS sensing is so short for the nominal case, for some embodiments the shortest t2 interval provided is the "nominal" value, and more relaxed t2 intervals are provided for in the timing settings numbered above "8."
During manufacture, this timing setting "8" is configured as the default setting. During a special test mode (for example, at wafer sort) the timing setting may be temporarily made more or less aggressive to determine the window of operation for each chip. Some of the memory devices are found to function correctly with very aggressive timing, while others require more relaxed timing. Then, during the fuse blowing sequence for redundancy, timing fuses may also be blown to permanently modify the default strobe timing. The timing setting is preferably set as aggressively as possible to enhance device performance, while maintaining adequate sense amplifier signal margins for reliability. For example, if a timing setting of "4" is the most aggressive timing for which a given device functions without error, then the device may be advantageously fuse programmed to a timing setting of "6" to ensure some additional operating margin (the signal to the bit line sense amplifiers increasing as the timing setting increases). At a later test, such as at final test of a packaged device, the test mode may still be entered, and the timing setting advanced from its then fuse programmed setting to a more aggressive setting, in order to further verify adequate sense amplifier margins on a chip-by-chip basis, independent of which actual timing setting was fuse programmed into the device.
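The selection procedure described above can be sketched as a simple search with a safety margin. The two-setting margin mirrors the "4" versus "6" example; the function names and the sweep order are hypothetical illustrations, not the actual test flow.

```python
# Hypothetical sketch: at wafer sort, sweep the settings from most
# aggressive (1) to most relaxed (16), then fuse-program a setting
# `margin` steps more relaxed than the most aggressive passing setting.

def choose_fused_setting(passes, margin=2, n_settings=16):
    """`passes(setting)` returns True if the device works at that setting."""
    for setting in range(1, n_settings + 1):
        if passes(setting):
            return min(setting + margin, n_settings)
    return None  # device fails even the most relaxed timing

# A device that first passes at setting 4 is fused to setting 6.
fused = choose_fused_setting(lambda s: s >= 4)
```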
A two-dimensional grid of power buses is preferably implemented within each memory bank, with large VDD and VSS buses arranged parallel to the bit lines and implemented in a higher layer of metal (e.g., the top layer), vertically passing above the bit lines. Filter capacitors are located at the ends of each array block as well as at the top and bottom of each memory bank to help provide additional bypass capacitance to withstand the large current spikes which occur during sensing. These filter capacitors, as well as other filter capacitors implemented elsewhere within the device, are preferably implemented using multiple, independent capacitors which are individually de-coupled and automatically switched out of the circuit if, at any time, more than a predetermined leakage current is detected automatically by the memory device as flowing through a given capacitor (i.e., a "shorted" capacitor). The large metal buses allow this stored charge to reach the two selected rows of sense amplifiers (i.e., located in the holes above and below the selected array block) with very little voltage drop, and allow the sense amplifiers to latch quickly and provide a good VSS low level.
The bit lines are equilibrated together to achieve an equilibration voltage on the bit lines, for a preferred embodiment, of approximately 1.0 volts. The bit lines are preferably equilibrated at both ends to reduce the required equilibrate time. The bit line equilibration voltage is coupled from all bit line pairs to a common node which may be sampled just after equilibration and buffered (using a sample-and-hold amplifier) to drive the memory cell plate. Since the bit line equilibration voltage is approximately one-half the written high level, the bit line equilibration voltage may also be sampled, compared to a reference voltage (for example, a 1.0 volt reference), and any voltage difference used to adjust the PMOS timing (and thereby adjust the final written high level).
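The feedback relationship between the sampled equilibration voltage and the PMOS timing can be sketched as follows. The step size, deadband, and polarity convention are assumptions: a low equilibrate voltage implies a low written high level, so PMOS sensing is lengthened.

```python
# Hypothetical sketch: compare the sampled bit line equilibration voltage
# to a 1.0 V reference and nudge the PMOS sensing duration (t2) to steer
# the final written high level.  Step size and deadband are illustrative.

def adjust_t2(t2_ps, v_equilibrate, v_ref=1.0, step_ps=50, deadband=0.02):
    """Lengthen t2 when the equilibrate voltage is low; shorten when high."""
    error = v_ref - v_equilibrate
    if abs(error) <= deadband:
        return t2_ps          # close enough to the reference: leave t2 alone
    return t2_ps + step_ps if error > 0 else t2_ps - step_ps
```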
As stated above, the exemplary memory array is automatically taken back into precharge without waiting for a control signal. In other words, one edge of a clock causes the memory array to execute a useful cycle, then to automatically reset itself in preparation for a new cycle. This precharge timing is relative to the beginning of the active cycle. Of significance, this limits the amount of potential sub-threshold leakage through memory cell access transistors by limiting the time that any bit lines are at VSS. The precharging/equilibration is accomplished by using two sets of signals: one is an automatically timed pulse, while the other stays on until the start of the next cycle. For example, the bit line sense amplifiers are preferably equilibrated using two different equilibrate signals. Both turn on automatically at the same time after NMOS sensing is complete and the selected word line is brought low. One equilibrate signal is turned off by a timed pulse just when the bit line equilibration is substantially complete (i.e., at the end of the active cycle), while the other equilibrate signal is turned off by the start of the subsequent cycle. The pulsed equilibrate signal drives much larger internal capacitive loads, such as large equilibration devices, while the non-pulsed equilibrate signal drives fewer and/or much smaller devices which assist the larger pulsed equilibrate devices in equilibrating the various nodes. However, the smaller devices are largely included as "keepers" to maintain the equilibration until the next active cycle. As such, the total capacitance of the various equilibration signal lines which must be discharged (i.e., brought low) at the start of a new cycle is greatly reduced and can be accomplished with less delay after the initiating control signal, and the performance is enhanced.
For relaxed clock cycle times, the pulsed equilibrate signal falls automatically at the end of a cycle, while the non-pulsed equilibrate signal stays high until the next cycle selecting this array block is initiated. However, for a clock cycle time which approaches the fastest possible cycle time for a given device, the non-pulsed equilibrate signal for the newly selected array block may be discharged by the initiation of the next cycle at substantially the same time as the pulsed equilibrate signal for the previously selected array block is discharged automatically at the end of the previous cycle. To save power, the non-pulsed equilibrate signal for only the selected array block and supporting circuitry is brought to VSS at the start of an active cycle, and all others remain inactive at VDD throughout the active cycle. Similarly, the pulsed equilibrate signal for only the selected array block and supporting circuitry is actually pulsed at the end of an active cycle, while all others remain inactive at VSS.
During an internal write operation, the exemplary device contains write circuitry that supplies a small differential voltage to the sense amplifier before bit line sensing, the polarity of the voltage depending on the data to be written. The circuitry furthermore "swallows" the voltage otherwise developed in the sense amplifier by the selected memory cell. Then, during their normal latching, the bit line sense amplifiers "write" the level into the memory cell. Because of an internal write queue, the data to be written is already available when the actual internal write operation is started. In preparation for the current write operation, this data is preferably driven onto the global input lines late in the previous write operation, and then coupled to the selected sense amplifier by column select circuitry fairly early in the current write operation, before latching the bit line sense amplifiers. The magnitude of the write signal coupled onto the sense amplifier nodes is kept small to reduce power consumption and to reduce disturbance to the neighboring bit lines and sense amplifiers which are not being written. Preferably, the magnitude of the write signal imparted onto any given sense amplifier node is no higher than that normally developed during a read operation, so that coupling to the neighboring bit lines and sense amplifiers is no worse than during a read operation. The global input lines serving the next word to be written are equilibrated after each write operation, preferably to the bit line equilibration voltage, and driven to the new data state for the next write operation, even if the next write operation is not the next cycle.
Moreover, the differential voltage on the global input lines serving the next word to be written is equilibrated away (in a write cycle) after bit line sensing has started and the column select lines are inactive (i.e., during the later stages of bit line sensing), and then driven to reflect the new write data for the following write cycle before the bit lines have finished equilibrating, rather than driving these data input signals during the early part of bit line sensing when such movement could disturb the bit line sensing. The global input lines then dynamically float until needed by the next write operation. To handle the possibility that the next write operation may be many cycles later, the global input lines may be refreshed periodically (e.g., every 256 external clock cycles, before any leakage current can substantially modify their voltage) by re-equilibrating and re-driving to ensure the proper magnitude of the write data signal for as long as necessary until the next write operation occurs.
By writing a dynamic memory array by "fooling" the sense amplifier and letting it actually restore the voltage levels onto the bit lines in accordance with the data to be written, rather than in accordance with the data previously in the selected memory cell, a write cycle takes the same very short time as a read cycle, rather than the longer time that would be required by first sensing old data, then modifying it. In addition, a significant amount of power is saved by not having to over-power many sense amplifiers after they have already been latched.
During power-up, all the memory cells are initialized to a low voltage under automatic internal control. Provision is made to allow every word line to simultaneously go high, to force the node to which the bit lines are equilibrated to VSS, and to ensure that the bit line equilibration and array select transistors are on. Since each sense amplifier is then coupled to a common node at VSS by precharge signals, each bit line (both true and complement) is driven to VSS and all memory cells are likewise forced to VSS, even if the word lines are no higher than a threshold voltage above VSS. At about the same time, the memory cell plate is established at a voltage near the eventual bit line equilibration voltage (preferably around 1.0 volts) by other power-up circuits, being careful to limit the current flow, which charges the cell plate, to an amount less than the output current of the substrate bias charge pump (to prevent the substrate from coupling positively and causing massive latchup from the diffused regions of each memory cell's internal node). Then, when normal cycles begin, the very first operation in the memory array occurs with memory array nodes (bit lines, cell plate) properly established, and all memory cells initialized at one of the two valid states (in this example, at VSS). The first cycles do not have to try to sense memory cells having an initialized voltage near the bit line equilibration voltage, as would likely occur without such a power-up sequence due to coupling from the memory cell plate to the memory cells themselves as the memory cell plate reaches its normal level at the bit line equilibration voltage of, for example, 1.0 volts. This prevents any bit line sense amplifiers which are not being written from spending time in a meta-stable state which, if allowed to occur, would affect the high level restored into the memory cells being written, as well as the equilibrate voltage resulting on the bit lines.
During a read operation, signal developed on the bit lines by the selected memory cell is immediately buffered by the local output line amplifier(s) before bit line sensing starts, and immediately starts to develop signal on the pair of global output lines. For certain embodiments, the differential signal propagates through lines and differential amplifiers to the output buffers, whose first stage is a latching amplifier which is then strobed to detect, amplify, and latch this signal. The timing of the strobe signal for this latching amplifier (which may be known as "t4") may be optimized on a chip-by-chip basis. There may be, for example, eight possible strobe timings, from very aggressive to very relaxed. The device may be initially configured with an intermediate default strobe timing (e.g., having a value of "4," where "1" is the most aggressive and "8" is the most relaxed), and during a special test mode (for example, at wafer sort) the strobe timing may be made more or less aggressive to determine the window of operation for each chip. Then, during the fuse blowing sequence for redundancy, timing fuses may also be blown to modify the default strobe timing. The timing is modified to be as aggressive as possible while maintaining adequate margins for reliability. For example, if in the test mode a t4 timing of "2" is the fastest timing for which a given device functions without error, then the device may be advantageously fuse programmed to a t4 timing of "3," or not altered so as to remain at "4," to ensure sufficient operating margin.
At a later test, such as at final test of a packaged device, the test mode may again be entered, and the t4 timing advanced from its then fuse programmed setting to a more aggressive setting (e.g., 1 or 2 settings faster than its new programmed timing setting without needing to know the new programmed timing setting), in order to further verify adequate operating margins on a chip-by-chip basis, independent of which actual timing setting was fuse programmed into the device.
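This margin verification can be sketched as follows. In hardware the advance is applied relative to the fused value without reading it out; in this illustrative sketch the fused setting is an explicit input for clarity, and the function names are hypothetical.

```python
# Hypothetical sketch: at final test, step the strobe timing a fixed number
# of settings more aggressive than the fused t4 setting (setting 1 is the
# most aggressive) and confirm the device still functions.

def margin_check(fused_setting, passes, advance=2):
    """True if the device works `advance` settings faster than its fused t4."""
    probe = max(fused_setting - advance, 1)   # clamp at the fastest setting
    return passes(probe)
```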
In an alternative embodiment of a memory array having a cycle time which is long compared to its read access time, a latching global output line amplifier may be strobed (at what was time t4 in the earlier embodiment) to detect and amplify the signal on the pair of global output lines, and communicate the sensed data onward through output multiplexer circuitry and ultimately (if the particular global output line is selected) to output buffer circuitry. The timing of the global output line amplifier may be selected to support both a flow-through configuration as well as a pipelined configuration. To support a fast flow-through access time specification, the latching global output amplifier is aggressively strobed as soon as a predetermined amount of signal has developed on the global output lines. In this way, the data propagates to and is available at the outputs as quickly as possible. But with this aggressive timing, some devices may fail. Conversely, when in the pipelined mode of operation, the global output latch timing is relaxed to more closely coincide with the global output signal peak, and the sensed data is provided to the output buffers for driving to the output pins during the next cycle (using a PLL or delay-locked loop). By affording additional time for even more signal to develop on the global output lines, a particular device which may be marginal or may even fail at the fast t4 timing of the flow-through mode may prove to have adequate margin at the more relaxed timing of the pipelined mode, and may be sold for use and guaranteed to operate only in the pipelined mode of operation.
Bit line crossover structures are advantageously used to achieve lower worst case coupling, during both read or write operations, onto a particular bit line pair from neighboring bit lines on either side. Because photolithographic guard cells are used at the edges of each arrayed group of memory cells, there is a layout area penalty in providing crossover structures including the required guard cells on either side of each crossover structure. To reduce this area penalty, a novel crossover arrangement is employed, for certain embodiments, which provides a significant degree of noise (i.e., coupling) reduction while requiring only one crossover. Within each array block, each complementary pair of bit lines runs vertically from the top to the bottom of the array block. The true bit line and complement bit line of a first pair run adjacent to each other from the top to the bottom of the array block without any crossovers. The true bit line and complement bit line of a second pair do not run adjacent to each other, but instead straddle the first pair (i.e., both true and complement bit lines of the first pair lie between the true and complement bit lines of the second pair), with a single crossover half-way down the second bit line pair (vertically in the middle of the array block). This crossover arrangement repeats horizontally throughout each array block in groups of two pairs of bit lines (four physical bit line wires). By using this crossover arrangement, only four groups of guard cells are required in each array block: one each at the top and bottom of the array block, and one each at the top and bottom of the single crossover structure located in the vertical center of the array block.
The address and data for a write cycle are queued to eliminate dead cycles on the system data bus. In the exemplary embodiment operated in the pipelined mode, the address for a read cycle is strobed during one cycle, and the corresponding data read from the selected memory cells is driven onto the external data pins during a subsequent cycle. If an external write cycle follows immediately after an external read cycle, the write address may be presented to the address bus and strobed into the memory device just like for a read cycle, but the external bidirectional data bus is occupied with driving the data out corresponding to an earlier external read cycle (by a number of cycles depending on the pipeline latency for a particular embodiment) and cannot be used to present the corresponding write data. Instead, the data for the external write cycle is driven onto the data bus and presented to the device during the cycle in which output data would have appeared had the cycle been an external read cycle instead of an external write cycle. In this way, the address bus and the data bus are used every cycle, with no wasted cycles for either bus. Both the write address and data are queued, and the actual write operation to physically store the write data into the selected memory cells is postponed until a subsequent write cycle, which, when executed, retires the previously received address and data from the write queue into the memory array. Read bypass circuitry is provided which allows data corresponding to the address of the read cycle to be correctly read from the write queue whenever an earlier queued write directed to that same address has not yet been retired.
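The queue-and-retire behavior, including the read bypass, can be sketched as follows. The single-entry queue and all class and method names are illustrative assumptions, not circuitry from this description.

```python
# Hypothetical sketch of a single-entry write queue with read bypass.

class WriteQueue:
    def __init__(self, memory):
        self.memory = memory          # models the array: address -> data
        self.pending = None           # (address, data) queued, not yet retired

    def write(self, address, data):
        """Queue a write; retire the previously queued write into the array."""
        if self.pending is not None:
            addr, val = self.pending
            self.memory[addr] = val
        self.pending = (address, data)

    def read(self, address):
        """Read bypass: an unretired queued write to this address wins."""
        if self.pending is not None and self.pending[0] == address:
            return self.pending[1]
        return self.memory.get(address)

mem = {}
q = WriteQueue(mem)
q.write(0x10, "A")    # queued only; nothing in the array yet
value = q.read(0x10)  # served from the queue, not the array
q.write(0x20, "B")    # retires the earlier write to 0x10 into the array
```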
In the exemplary embodiment, the internal data path is twice as wide (i.e., a "double word") as the external I/O word width (i.e., the least significant address bit selects one of the two possible 36-bit words), and a significant degree of internal power consumption is saved by merging external write cycles when sequential write addresses occur. The address of a given external write cycle is stored and compared to the address of the next external write cycle. If the selected memory cells to be written in both external write cycles correspond to the same physical word line and the same column within the same array block of the same memory bank (i.e., differ in only the least significant address bit), the internal write operation which would otherwise follow from the first external write cycle is delayed, and the data to be written is queued and merged with the data to be written in the second external write cycle. The write queue then "retires" both queued write requests by performing a single internal write operation, simultaneously writing both data words received in the first and second external write cycles. If the internal data path were wider than 72 bits, then more than two 36-bit write cycles could be merged into a single internal write operation. For example, if the internal data path were 144 bits wide, then four 36-bit write cycles could conceivably be merged into a single internal write operation.
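The merge condition and the combined internal write can be sketched as follows. The 36-bit word width follows the text above; the helper names and the data packing order are illustrative assumptions.

```python
# Hypothetical sketch of the write-merge test and the combined write.

def mergeable(addr_a, addr_b):
    """True if two word addresses differ only in the least significant bit,
    i.e., fall within the same 72-bit internal double word."""
    return (addr_a ^ addr_b) == 1

def merge(addr_a, data_a, addr_b, data_b):
    """Combine two 36-bit words into one double-word internal write,
    returning (double-word address, 72-bit data)."""
    assert mergeable(addr_a, addr_b)
    # The even word occupies the low half of the double word (assumed order).
    low, high = (data_a, data_b) if addr_a % 2 == 0 else (data_b, data_a)
    return addr_a >> 1, (high << 36) | low
```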
The exemplary embodiment includes a burst mode of operation which provides, during subsequent cycles, read or write access to sequentially addressed memory cells relative to a received (i.e., "load") address, without requiring such sequential addresses to be presented to the device. Using the 72-bit wide (double word) organization of each memory bank, two 36-bit words are retrieved from the memory array in the first cycle. The second word is saved to present to the data outputs after the first word is output. Because the exemplary device is organized into separate memory banks, a burst of four sequential words may transcend the address boundaries between memory banks. Consequently, the exemplary device includes provision for automatically initiating a load cycle in another memory bank during a burst cycle.
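The boundary-crossing test that triggers the automatic load cycle in the next bank can be sketched as follows. The words-per-bank parameter and function name are illustrative assumptions.

```python
# Hypothetical sketch: decide whether a four-word burst starting at the
# load address spills into the next memory bank, in which case a load
# cycle in that bank is initiated automatically during the burst.

def crosses_bank(load_address, words_per_bank, burst_length=4):
    """True if the burst spans two banks."""
    first_bank = load_address // words_per_bank
    last_bank = (load_address + burst_length - 1) // words_per_bank
    return first_bank != last_bank
```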
In certain embodiments, a dynamic memory array using the architecture and supporting circuits described above achieves random access cycles (each requiring a new random row access) at a sustained rate in excess of 200 MHz, even when each new row access is within the same array block of the same memory bank.
The present invention may be better understood, and its numerous objects, features, and advantages made even more apparent to those skilled in the art by referencing the detailed description and accompanying drawings of the embodiments described below.