1. Field of the Invention
The present invention relates to semiconductor memories, and particularly to those driving a selected word line in a memory array to a boosted voltage level.
2. Description of Related Art
Semiconductor random-access memory devices or sub-systems using arrays of dynamic memory cells (e.g., 1-transistor/1-capacitor (1T/1C) cells) have consistently provided greater density and lower cost per bit than those using static memory cells (e.g., 6-transistor (6T) cells, or 4-transistor/2-resistor (4T/2R) cells). However, such dynamic random-access memory arrays have historically also been lower in performance when compared to static random-access memory arrays. Consequently, system designers have typically chosen dynamic memory arrays (e.g., commercially available dynamic random access memories, or DRAMs) when high density and low cost are required, such as for CPU main memory applications. Conversely, designers have typically chosen static memory arrays when the highest possible performance is required, such as for cache memory and high speed buffer applications. Examples of static memory array devices or sub-systems include commercially available static random access memories (SRAMs) and CPU-resident on-board cache memory sub-systems.
The reasons often cited for the lower performance of dynamic memory arrays include the destructive sensing of all memory cells common to the addressed word line (encountered in virtually all dynamic memory arrays) and the consequential need to restore data back into each sensed memory cell during the active cycle, the need to equilibrate bit lines and various other differential nodes and to precharge various circuit nodes between active cycles, and the requirement for periodic refreshing of all dynamic memory cells.
Traditionally, N-channel dynamic memory arrays have provided for boosting a selected word line to a voltage above the VDD power supply voltage. Moreover, a high-going bit line is usually restored to a full VDD level by the bit line sense amplifier. If the selected word line is boosted to a voltage that is more than an NMOS threshold voltage above VDD, then a selected memory cell coupled to the high-going bit line can be restored to a full VDD voltage level. Many techniques may be employed to generate such a boosted voltage, but most rely on a capacitively-coupled method which results in a boosted voltage that is a ratio of VDD. If, at low VDD, a boosting capacitor is adequately sized to generate a boosted voltage sufficiently greater than a threshold voltage above VDD, then at high VDD the same size capacitor provides a boosted voltage that is higher than VDD by far more than a threshold voltage. The additional voltage stress placed upon devices within the integrated circuit can degrade the reliability of the integrated circuit, or alternatively would place more stringent requirements on the semiconductor process to be able to safely tolerate such higher voltages. As both the horizontal and vertical dimensions of devices continue to shrink, it is increasingly difficult, and increasingly costly in performance, to develop a semiconductor process that is capable of tolerating such variation of maximum voltage.
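As a rough numerical illustration of the difficulty just described, the following sketch compares a boosted voltage that is a ratio of VDD with a boosted voltage held at a fixed level. The 1.5 ratio, the 0.7 volt threshold, the 3.9 volt fixed level, and the function names are assumed values chosen only for illustration; they are not taken from any particular circuit.

```python
# Illustrative only: RATIO, VT and VPP_FIXED are assumed values.
RATIO = 1.5       # assumed boost ratio of a capacitively coupled booster
VT = 0.7          # assumed NMOS threshold voltage, in volts
VPP_FIXED = 3.9   # assumed fixed (regulated) boosted level, in volts


def ratio_boost(vdd):
    """Boosted voltage produced by a generator whose output is a ratio of VDD."""
    return RATIO * vdd


def headroom(boosted, vdd):
    """Amount by which the boosted voltage exceeds VDD plus one threshold."""
    return boosted - (vdd + VT)


def stress_spread(vdd_low, vdd_high):
    """Change in the ratio-type boosted voltage across the VDD range."""
    return ratio_boost(vdd_high) - ratio_boost(vdd_low)


if __name__ == "__main__":
    for vdd in (2.3, 2.6, 2.9):
        print(f"VDD={vdd:.1f}  ratio boost={ratio_boost(vdd):.2f}  "
              f"fixed boost={VPP_FIXED:.2f}")
```

Under these assumed numbers the ratio-type output moves by about 0.9 volts across the VDD range, whereas the fixed level does not move at all yet still exceeds VDD by more than a threshold at both extremes.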
To provide for an internally-generated boosted voltage without such wide voltage extremes, a VPP voltage is internally generated by a charge pump type circuit whose output is a substantially fixed voltage which is regulated with respect to VSS (i.e., ground). It is substantially independent of VDD over process and environmental variations. The VPP voltage is used by the row decoders to drive a selected word line from VSS to VPP, and preferably to boost selected array select signals from VDD to VPP, rather than driving the selected word line to VDD or to a voltage which is a ratio of VDD.
For typical operating voltages, the VPP voltage is somewhat higher than VDD, although at low operating voltage the VPP voltage may be substantially higher than VDD, while at very high operating voltage, the VPP voltage may be similar in magnitude to the VDD voltage. Preferably the VPP voltage is chosen to be near the maximum voltage that the field effect transistors (FETs) can safely tolerate. Since the VPP is regulated to be substantially independent of variations in the VDD voltage, at low operating voltage the VPP level is advantageously at a higher voltage than would otherwise be safe, and tolerances in the VPP voltage level which would otherwise be necessary to account for variations in the VDD level are unnecessary.
In a preferred embodiment, the VPP generator includes a plurality of pump circuits, each connected to the VPP output, and each controlled by a common control circuit. Each such pump circuit is enabled to pump according to the amount of charge which is needed at a particular time, based on the measured level of both VDD and VPP. A first regulator compares various fractions of the VDD voltage to an internally generated bandgap voltage, while a second regulator compares various lower fractions of the VPP voltage to the bandgap voltage. If VDD is low, then more of the pump circuits are enabled for a given cycle (or alternatively, enabling one or more pump circuits having a higher aggregate pumping capacity). As VDD increases, fewer such pump circuits are enabled. Similarly, if VPP is particularly low (such as during power-up), then all the pump circuits are enabled, while if VPP is already high enough, then none of the pump circuits are enabled. In a preferred embodiment, none of the pump circuits are enabled if VPP exceeds 4.0 volts, while all of the pump circuits are enabled if VPP is less than 3.8 volts. Between 3.8 and 4.0 volts, the measured values of both VPP and VDD determine how many and what pumping capacity of pump circuits are enabled.
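The enabling rule described above may be sketched as follows. The 3.8 and 4.0 volt limits come from the text; the number of pump circuits, the VDD range, and the linear mapping from VDD to pump count inside the 3.8 to 4.0 volt band are assumptions made only for illustration (in particular, this simplified sketch ignores the measured VPP value inside the band).

```python
VPP_ALL_ON = 3.8    # below this level all pump circuits are enabled (from the text)
VPP_ALL_OFF = 4.0   # above this level no pump circuits are enabled (from the text)


def pumps_enabled(vpp, vdd, n_pumps=4, vdd_min=2.3, vdd_max=2.9):
    """Return how many pump circuits to enable for one pump cycle.

    n_pumps, vdd_min and vdd_max are hypothetical parameters. Inside the
    regulation band the count is assumed to fall linearly as VDD rises.
    """
    if vpp > VPP_ALL_OFF:
        return 0
    if vpp < VPP_ALL_ON:
        return n_pumps
    # Inside the band: clamp VDD to its assumed range, then enable fewer
    # pump circuits as VDD increases, keeping at least one enabled.
    frac = (vdd - vdd_min) / (vdd_max - vdd_min)
    frac = min(max(frac, 0.0), 1.0)
    return max(1, n_pumps - int(frac * n_pumps))


if __name__ == "__main__":
    for vpp, vdd in ((3.7, 2.9), (3.9, 2.3), (3.9, 2.9), (4.1, 2.3)):
        print(vpp, vdd, pumps_enabled(vpp, vdd))
```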
For a given VPP and VDD voltage there are a fixed number of pump circuits enabled. As VDD increases slightly, the charge per cycle increases, even though the same number of pump circuits are enabled, because the VDD is increasing. However, as VDD further increases slightly, one less pump is enabled, so the charge per cycle abruptly decreases. Then as VDD further increases, the charge per cycle again increases because VDD is increasing. When plotted as a function against VDD, the charge per pump cycle thus appears as a sawtooth waveform, which decreases abruptly as each such pump circuit is successively disabled. The pump circuits are preferably not uniformly sized, but instead each size determined individually so that the charge per pump cycle, when plotted as a sawtooth waveform against VDD, varies from min-to-max as little as possible over the range of VDD.
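The sawtooth behavior may be illustrated with a first-order model in which the charge per pump cycle is taken to be the total enabled pump capacitance multiplied by VDD. That proportionality, the capacitance values (arbitrary units), and the VDD thresholds at which each pump circuit is disabled are all assumptions chosen for illustration.

```python
def charge_per_cycle(vdd, pump_caps, disable_thresholds):
    """Assumed first-order model: Q = (sum of enabled capacitances) * VDD.

    pump_caps[i] is disabled once vdd reaches disable_thresholds[i].
    """
    enabled = sum(c for c, t in zip(pump_caps, disable_thresholds) if vdd < t)
    return enabled * vdd


# Hypothetical example: four pump circuits, the last one never disabled.
CAPS = [1.0, 1.0, 1.0, 1.0]
THRESHOLDS = [2.5, 2.7, 2.9, float("inf")]

if __name__ == "__main__":
    for step in range(13):
        vdd = 2.3 + 0.05 * step
        print(f"VDD={vdd:.2f}  Q={charge_per_cycle(vdd, CAPS, THRESHOLDS):.2f}")
```

Within each segment the computed charge rises with VDD, then drops abruptly at each threshold. Choosing unequal capacitances and thresholds is the degree of freedom that the individual sizing described above would exploit to narrow the min-to-max variation.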
A significant amount of internal de-coupling (i.e., filtering) capacitance on the VPP node is provided by the various row decoder and array select circuits which are unselected during a given cycle. For example, the last two buffers within each row decoder provide in aggregate a large effective capacitance. Taken together, such capacitances provide a significant reservoir of charge on the VPP node without requiring separate devices or structures. A test control signal preferably causes the regulated value of VPP to decrease by a small amount (e.g., 200 mV) when in certain test modes to ensure reliable operation of the memory device when VPP is actually lower than the minimum expected VPP voltage. By using such test modes, adequate operating margins for normal operation may be more easily assured.
If the semiconductor technology allows, transistors which are exposed to the VPP level (e.g., transistors whose gate terminal is driven at any time to the VPP level while the source or drain terminal might be at ground, such as the memory array access transistors and various array select transistors, or those transistors whose drain or source terminal is driven at any time to the VPP level while the gate terminal might be at ground) are preferably implemented using a thicker gate dielectric than the majority of the other transistors which are never exposed to such a high differential voltage across gate-to-drain or gate-to-source terminals. Moreover, it is also preferable to limit the voltage across any transistor using the thin gate dielectric to no more than VDD. Transistors exposed to any voltage which is greater than the VDD level are preferably implemented with the thick gate dielectric and are limited in voltage to the VPP level, which is a fixed voltage substantially independent of the VDD voltage. Consequently, transistors exposed to such internally "boosted" voltages need only withstand a relatively fixed, predictable voltage level (e.g., by using a bandgap reference in the circuit which regulates the VPP voltage) and do not need to withstand even higher voltages which might otherwise be produced by a "boosted" voltage generator whose output voltage is a ratio of VDD (e.g., 1.5×VDD).
In one embodiment of the present invention, an integrated circuit includes: (1) a memory array including a plurality of memory cells, each memory cell coupled to an associated one of a plurality of word lines within the memory array; (2) a power supply terminal for receiving, relative to a ground potential, a power supply voltage operably coupled to the integrated circuit, by which voltage most circuits associated with the memory array are generally powered; (3) a voltage generator circuit for generating, on an output node thereof, a boosted voltage having a regulated magnitude, relative to the ground potential, that is substantially independent of the power supply voltage, over at least an expected range of possible operating values for the power supply voltage; and (4) a row decoder circuit coupled to receive the boosted voltage, for decoding a selected word line and driving the selected word line to the boosted voltage during a memory operation.
In another embodiment of the present invention, a method of operating an integrated circuit having a memory array includes: (1) receiving, relative to a ground potential, a power supply voltage operably coupled to the integrated circuit, by which voltage most circuits associated with the memory array are generally powered; (2) generating a boosted voltage having a regulated magnitude, relative to the ground potential, that is substantially independent of the power supply voltage, over at least an expected range of possible operating values for the power supply voltage; and (3) driving a selected word line to the boosted voltage during a memory operation.
In yet another embodiment of the present invention, an integrated circuit includes: (1) a memory array including a plurality of memory cells, each memory cell coupled to an associated one of a plurality of word lines within the memory array; (2) a power supply terminal for receiving, relative to a ground potential, a power supply voltage operably coupled to the integrated circuit, by which voltage most circuits associated with the memory array are generally powered; (3) means for generating a boosted voltage having a regulated magnitude, relative to the ground potential, that is substantially independent of the power supply voltage, over at least an expected range of possible operating values for the power supply voltage; and (4) means for driving a selected word line to the boosted voltage during a memory operation.
In still another embodiment of the present invention, an integrated circuit includes: (1) a memory array including a plurality of memory cells, each memory cell coupled to an associated one of a plurality of word lines within the memory array; (2) a power supply terminal for receiving, relative to a ground potential, a VDD power supply voltage operably coupled to the integrated circuit; (3) a voltage generator circuit for generating, on an output node thereof, a boosted VPP voltage that is nominally greater in magnitude than the VDD power supply voltage, over at least an expected range of possible operating values for the VDD power supply voltage; and (4) a row decoder circuit coupled to receive the VPP voltage, for decoding a selected word line and driving the selected word line to the VPP voltage. The voltage generator circuit includes a controller circuit, and a plurality of pump circuits responsive to the controller circuit for periodically coupling electronic charge onto the VPP output node of the voltage generator circuit, thereby tending to increase the magnitude of the VPP voltage. Each of the plurality of pump circuits may be independently enabled by the controller circuit to couple a corresponding amount of electronic charge per pump cycle onto the VPP output node of the voltage generator circuit. For a given value of the VPP voltage, the controller circuit is arranged to enable one or more pump circuits having, in aggregate, incrementally more total pump capacitance as the magnitude of the VDD voltage tends correspondingly further below its nominal value, and to enable one or more pump circuits having, in aggregate, incrementally less total pump capacitance as the magnitude of the VDD voltage tends correspondingly further above its nominal value, thereby tending to maintain a more uniform value of total pumped charge per cycle over an operating range of the VDD voltage. 
Also, each of the plurality of pump circuits are individually sized so that, as the VDD voltage varies over its anticipated operating range, and as the number and pumping capacity of pump circuits that are enabled varies accordingly, the peak magnitude of electronic charge pumped per cycle, just before each change in the selection of pump circuits so enabled, is substantially uniform over the anticipated operating range of the VDD voltage.
Another integrated circuit embodiment of the present invention includes a voltage generator circuit for generating, on an output node thereof, a boosted VPP voltage that is nominally greater in magnitude than a VDD power supply voltage operably coupled to the integrated circuit, over at least an expected range of possible operating values for the VDD power supply voltage. The voltage generator circuit includes a controller circuit, and a plurality of pump circuits responsive to the controller circuit for periodically coupling electronic charge onto the VPP output node of the voltage generator circuit, thereby tending to increase the magnitude of the VPP voltage. Each of the plurality of pump circuits may be independently enabled by the controller circuit to couple a corresponding amount of electronic charge per pump cycle onto the VPP output node of the voltage generator circuit. For a given value of the VPP voltage, the controller circuit is arranged to enable one or more pump circuits having, in aggregate, incrementally more total pump capacitance as the magnitude of the VDD voltage tends correspondingly further below its nominal value, and to enable one or more pump circuits having, in aggregate, incrementally less total pump capacitance as the magnitude of the VDD voltage tends correspondingly further above its nominal value, thereby tending to maintain a more uniform value of total pumped charge per cycle over an operating range of the VDD voltage. Also, each of the plurality of pump circuits are individually sized so that, as the VDD voltage varies over its anticipated operating range, and as the number and pumping capacity of pump circuits that are enabled varies accordingly, the peak magnitude of electronic charge pumped per cycle, just before each change in the selection of pump circuits so enabled, is substantially uniform over the anticipated operating range of the VDD voltage.
The scope of the present invention in its many embodiments is defined in the appended claims. Nonetheless, the invention and its many features and advantages may be more fully appreciated in the context of exemplary implementations disclosed and described herein which combine one or more embodiments of the invention with other concepts, architectures, circuits, and structures to achieve significantly higher performance than previously achievable. For example, a high performance dynamic memory array architecture is disclosed in several embodiments, along with various embodiments of associated supporting circuitry, which afford performance approaching that usually associated with static memory arrays.
In an exemplary embodiment an 18 MBit memory array includes four banks of arrays, each including thirty-two array blocks. Each array block includes 128 horizontally-arranged row lines (i.e., word lines) and 1152 (1024×9/8) vertically-arranged columns. Most internal circuitry operates using a single positive power supply voltage, VDD, and the reference voltage VSS (i.e., "ground"). Each column is implemented as a complementary folded bit line pair. Four independent row decoders are provided respectively for the four banks, and are physically arranged in two pairs, thus forming two splines, one spline located between the left pair of memory banks, and the other spline located between the right pair of memory banks. Latching input buffers for address and control inputs are located within each of the splines and are connected to respective input pads by horizontally arranged input wires running through the memory banks. Two input buffers are provided for each input pad, one located in each spline. Clock lines used to strobe the various inputs are arranged vertically, running through each spline. An R-C compensation circuit between each input wire and the corresponding latching input buffer located in the particular spline nearest its respective input pad provides a delay to the "upstream" buffer which compensates for the additional wiring delay in reaching the "downstream" buffer, and which allows all of the latching input buffers to be driven by phase-aligned clock signals, and still achieve a very narrow worst case setup and hold time over all such inputs.
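The organization figures given above can be checked with simple arithmetic. The sketch below assumes one memory cell at each intersection of a row line and a column; the constant names are chosen here for illustration.

```python
BANKS = 4
BLOCKS_PER_BANK = 32
ROWS_PER_BLOCK = 128
COLUMNS_PER_BLOCK = 1024 * 9 // 8   # 1152 columns, as stated in the text

# Assumes one memory cell per row/column intersection.
TOTAL_BITS = BANKS * BLOCKS_PER_BANK * ROWS_PER_BLOCK * COLUMNS_PER_BLOCK
MEGABITS = TOTAL_BITS / 2**20

if __name__ == "__main__":
    print(COLUMNS_PER_BLOCK, TOTAL_BITS, MEGABITS)
```

Under that assumption the product is 18,874,368 bits, which is exactly 18 × 2^20 and so agrees with the stated 18 MBit capacity.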
The use of a separate input buffer in each spline for each address and control input, requiring additional interconnect wire to connect each input pad to its input buffer in the "far" spline (above and beyond the interconnect wire to connect each input pad to its input buffer in the "near" spline), increases the input capacitance of each address and control input to the chip (which input capacitance, of course, must be driven by the source of the external signal). However, the complementary internal outputs for each such input buffer may be buffered immediately by self-resetting buffers, and need only drive decoder and/or control circuitry locally within the same spline. Thus, the total capacitive loading on the complementary outputs of each buffer is advantageously reduced and is more balanced between the various buffers.
The row decoder uses predecoding to reduce the total line capacitance driven during an active cycle. The final stages of the row decoder include an N-channel tree configuration driven by VDD-level (i.e., VSS-to-VDD level) pre-decoded address signals to select and discharge to VSS a particular decode node which was precharged to VPP. Subsequent buffering stages provide a final 1-of-4 decode and drive the selected word line to the VPP voltage that is substantially independent of VDD, rather than driving the selected word line to VDD or to a voltage which is a ratio of VDD. There are no race conditions within the decoder, even though it accomplishes a level shifting from VDD-level signals to VPP-level word lines.
If the semiconductor technology allows, transistors which are exposed to the VPP level are preferably implemented using a thicker gate dielectric than the majority of the other transistors which are never exposed to such a high differential voltage across gate-to-drain or gate-to-source terminals. Moreover, it is also preferable to limit the voltage across any transistor using the thin gate dielectric to no more than VDD. The voltage across the memory cell capacitors is limited to less than one-half VDD (e.g., limited to about 1.0 volts for certain embodiments). A third dielectric material, thinner than the "thin" capacitor dielectric required for typical DRAM memory cells (which must normally support a voltage of one-half the maximum allowed VDD voltage) may be advantageously used to fabricate the memory cell capacitors to provide additional storage capacitance per unit area.
Within each memory bank, a row of sense amplifiers is implemented in the holes between each pair of array blocks. Each sense amplifier is shared between two pairs of bit lines: one pair located within the array block above the sense amplifier and the other pair located within the array block below the sense amplifier. The complementary internal nodes within each sense amplifier are respectively connected to the true and complement bit lines above the sense amplifier by a first pair of N-channel array select transistors whose gates are driven to VSS (to isolate the sense amplifier nodes from the bit line pair) or driven to VPP (to connect the sense amplifier nodes to the bit line pair), and are further connected to the pair of bit lines below the sense amplifier by a second pair of array select transistors whose gates are likewise switchable from VSS to VPP. A row of sense amplifiers is implemented above the top array block and another row of sense amplifiers is implemented below the bottom array block of the given memory bank, which serve half of the bit lines within the top and bottom array blocks, respectively. For any particular array block, half of the bit line pairs are served by a sense amplifier located above the array block, and the remaining half are served by a sense amplifier located below the array block. A pair of array select transistors having a gate voltage switchable between VSS and VPP connects any given pair of bit lines to the complementary internal sense amplifier nodes within the corresponding sense amplifier.
An amplifier in the read path is used to develop signal on a generic I/O line before bit line sensing has occurred. Such a generic I/O line may include a global output line, a column line, or an I/O line. This amplifier may be connected to the bit lines, the sense amplifier nodes, a local I/O line serving, for example, a few bit line pairs, or a local output line similarly serving, for example, a few bit line pairs. If the read amplifier inputs are connected directly to the bit line sense amplifier nodes (i.e., one read amplifier per bit line sense amplifier), the column select function may be advantageously used to enable the amplifier for the selected column, while if the read amplifier inputs are connected to local output or I/O lines (i.e., one read amplifier per group of bit line sense amplifiers), the column select function may be used to couple the selected bit line sense amplifier to the local output or I/O lines. If the common mode voltage of the read amplifier input nodes is so low that current flow through the tail of an N-channel differential pair cannot be assured for all voltage or process corners, the amplifier may incorporate a coupling circuit to capacitively couple the tail of the differential pair downward, preferably using a controlled current source, to approximate a constant current source to a negative supply voltage.
In a certain embodiment, each read amplifier's inputs are connected to the internal nodes of a corresponding bit line sense amplifier. The respective outputs of a group of read amplifiers are connected in common to a horizontally-arranged differential pair of local output lines. One such amplifier is enabled at a time by column select circuitry to develop signal on the pair of local output lines. A second stage amplifier then further buffers this signal and drives a pair of vertically-arranged global output lines. The global output lines extend the full height of the memory bank, with half preferably extending beyond the memory bank to I/O circuits above the memory bank, with the remaining half extending beyond the memory bank to I/O circuits below the memory bank. In certain embodiments, the second stage amplifier may also include a multiplexer to choose between two different pairs of local output lines (e.g., a first pair of local output lines serving 8 sense amplifiers located to the left of the second stage amplifier, and a second pair of local output lines serving 8 sense amplifiers located to the right of the second stage amplifier).
The word lines within the array blocks may be implemented in a polysilicon layer and strapped using a later-processed metal layer to reduce word line delays. Such word line straps are preferably implemented using two different layers of metal (preferably the two "lowest" layers, metal-1 and metal-2) in order to match the word line pitch without requiring any distributed buffers or final decode buffers. The read amplifiers used to sense a local output line and subsequently drive a global output line may be advantageously located above word line straps where a break in the memory cell stepping already occurs. This allows the read amplifier block to more readily be laid out in the center of a group of bit line sense amplifier and column select circuits. As such, the bit line sense amplifier pitch may be slightly less than twice the column pitch (recalling that half of the bit line sense amplifiers are above the array block and the remaining half below the array block).
The bit line sense amplifiers are each implemented using a full CMOS cross-coupled latch. To sense the signal on a pair of bit lines, both the cross-coupled N-channel pair of transistors (i.e., the NMOS sense amplifier) and the cross-coupled P-channel pair of transistors (i.e., the PMOS sense amplifier) which form the CMOS sense amplifier are enabled at substantially the same time. The NMOS sense amplifier drives the bit line having a lower voltage toward VSS, while the PMOS sense amplifier drives the bit line having a higher voltage toward VDD. If enabled a sufficiently long time, the lower bit line substantially reaches VSS and the higher bit line would be driven substantially all the way to VDD. However, the PMOS sensing is terminated before the higher bit line substantially reaches the full VDD voltage. This allows the bit line to quickly be driven to a high level without having to wait for the "exponential tail" if it were driven all the way to VDD. The internal sense amplifier nodes and the near end of the bit lines are actually driven above and overshoot the final high bit line "restore" level (e.g., 2.0 volts for a device operating at a VDD of 2.5 volts) before the PMOS sensing is terminated, whereas the far end of the high bit lines has not yet reached the final high bit line "restore" level when the PMOS sensing is terminated. Then, after the PMOS sensing is terminated, charge is shared between the near end and far end of the bit lines, thus speeding up the far end reaching the final high bit line "restore" level because the effective time constant of the resistive bit line is cut in half.
Since the word line and array select lines are left high for some time even after the PMOS sense amplifier is turned off, charge sharing between the sense amplifier nodes, the near and far ends of the bit lines, and the memory cell storage node itself contributes to determining the final high restore level which is "written" back into the selected memory cell. When compared to having a full VDD level on a high bit line, the relatively low final "high" bit line voltage (e.g., 2.0 volts) transfers into the selected memory cell more quickly due to the higher gate-to-source voltage of the memory cell access transistor.
The NMOS sensing is preferably continued, even after the PMOS sensing has stopped, to more adequately drive the bit line having the lower voltage (the "low-going" bit line) to a substantially full VSS level. This ensures that, if the selected memory cell happens to be coupled to the low-going bit line, a substantially full VSS level is restored into the selected memory cell. This also ensures that all the low-going bit lines (not just those having a selected memory cell connected thereto) are fully discharged before, at the end of the cycle, the high and low bit lines share their charge to set the bit line equilibrate voltage. The selected word line (which is driven when active to the VPP level) is then brought low as the NMOS sensing is terminated, after which the array block is automatically taken into precharge.
Timing circuitry is used to time the simultaneous start of both NMOS and PMOS sensing relative to the timing of the selected word line being driven high, to time the end of PMOS sensing, and to time the simultaneous end of NMOS sensing and the selected word line being brought low. The PMOS sense timing duration may be designed to decrease as the VDD voltage increases to ensure a written high level which is substantially independent of VDD, even over process and temperature corners. For example, the timing may be set to ensure a written high level on the high bit line (and into the selected memory cell) of about 2.0 volts for a device having a VDD voltage range from 2.3 to 2.9 volts. Such a PMOS sense timing generator may be accomplished by using a dummy bit line and sense amplifier structure (activated substantially before the main sense amplifiers are activated), detecting when the PMOS sensing needs to be turned off to achieve a final high voltage of about 2.0 volts on the dummy sense amplifier and bit line structure, then buffering this timing signal to control the turn off time of the PMOS sense enable signals for the regular sense amplifiers within the memory arrays. The PMOS timing may alternatively be accomplished using a string of inverters powered at a voltage a fixed amount below VDD, or by other techniques to achieve a timing which is a combination of several variables, such as power supply voltage VDD, bandgap voltage, transistor threshold voltage and transconductance, temperature, or others.
In a preferred embodiment, the sense amplifier timing circuitry produces three main timing signals. The first timing signal is used to control, relative to the timing of the selected word line being driven high, the simultaneous start of both the NMOS and PMOS sensing. A second timing signal is used to control, relative to the simultaneous start of NMOS and PMOS sensing, the duration of the PMOS sensing, and a third timing signal is used to control, relative to the end of the PMOS sensing, when to simultaneously end the NMOS sensing and bring the selected word line back low. Each of these timing signals is independently generated, although the circuitry used for each may share portions with another. These three timing signals define three timing intervals. The timing interval "t1" begins with the selected word line being driven high and ends with the simultaneous start of both the NMOS and PMOS sensing (i.e., the timing interval "t1" is the amount of time the selected word line is high before sensing). The timing interval "t2" extends from the simultaneous start of NMOS and PMOS sensing to the end of PMOS sensing (i.e., the timing interval "t2" is the duration of the PMOS sensing). The timing interval "t3" extends from the end of the PMOS sensing to the simultaneous end of the NMOS sensing and discharge of the selected word line (i.e., the timing interval "t3" is the amount of time the word line remains high after the end of PMOS sensing).
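The relationship between the four events and the three intervals defined above may be expressed as a small calculation. The function name, the argument names, and the example event times (in picoseconds) are hypothetical and serve only to make the definitions concrete.

```python
def sensing_intervals(wl_high, sense_start, pmos_end, nmos_end):
    """Derive (t1, t2, t3) from four event times given in the same unit.

    wl_high:     selected word line driven high
    sense_start: simultaneous start of NMOS and PMOS sensing
    pmos_end:    end of PMOS sensing
    nmos_end:    simultaneous end of NMOS sensing and word line brought low
    """
    if not (wl_high <= sense_start <= pmos_end <= nmos_end):
        raise ValueError("events must occur in the order described")
    t1 = sense_start - wl_high   # word line high before sensing begins
    t2 = pmos_end - sense_start  # duration of PMOS sensing
    t3 = nmos_end - pmos_end     # word line high after PMOS sensing ends
    return t1, t2, t3


if __name__ == "__main__":
    # Hypothetical event times in picoseconds.
    print(sensing_intervals(0, 1000, 2500, 3500))
```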
The timing interval t1 essentially controls how much signal from the memory cell reaches the sense amplifier before starting the NMOS and PMOS sensing. A short t1 may not provide enough time for all the charge in a selected memory cell to fully share with the charge on the bit line and sense amplifier nodes, and consequently the sense amplifier begins to sense with less signal than would be developed if, alternatively, a longer t1 were configured. A longer t1 increases operating margins at the expense of increased cycle time. Similarly, the timing interval t2 essentially controls how much charge is driven onto the high-going sense amplifier node, bit line, and memory cell during sensing. Increasing t2 increases the voltage stored into the memory cell, but also increases the bit line equilibrate voltage when charge is later shared between true and complement bit lines (and sense amplifier nodes). A short t2 may not provide enough charge to develop the desired restored high level (e.g., 2.0 volts) on the bit line and into a selected memory cell. Conversely, an excessively long t2 timing may not increase the stored high level in the memory cell as much as it increases the bit line equilibrate voltage, and thus may decrease the high level signal available for sensing, particularly at high VDD. The timing interval t3 essentially controls how much charge is shared between the sense amplifier node, the near end and far end of a high-going bit line (which typically is moderately resistive), and the memory cell. The resistance of the NMOS memory cell access transistor is much higher when restoring a high level (due to its lower gate-to-source voltage) than when restoring a low level. The t3 timing is constrained by the time needed to write a high voltage into the selected memory cell through the resistive bit line and further through the relatively high-resistance memory cell access transistor.
A short t3 may result in a worst case memory cell (one located at the "far" end of a bit line, furthest from its bit line sense amplifier) being written to a restored high level which is too low, for a given amount of "Q" transferred into the sense amplifiers (i.e., for the bit line equilibration voltage which results from the given amount of "Q").
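By way of illustration only, the dependence of the pre-sensing signal on the t1 interval may be sketched with a first-order model. The model is not taken from the embodiment: it assumes ideal charge sharing between a cell capacitance and a bit line capacitance, approached exponentially with a single time constant, and the function name and all component values are hypothetical.

```python
import math


def bitline_signal(v_cell, v_eq, c_cell, c_bl, t1, tau):
    """Signal (volts) on the bit line after the word line has been high for t1.

    Assumed model: ideal charge sharing between the cell and the bit line,
    approached exponentially with one time constant tau.
    """
    full_signal = (v_cell - v_eq) * c_cell / (c_cell + c_bl)
    return full_signal * (1.0 - math.exp(-t1 / tau))


# Hypothetical values: 2.0 V stored high, 1.0 V equilibrate level,
# 30 fF cell, 120 fF bit line, 0.5 ns settling time constant.
short_t1 = bitline_signal(2.0, 1.0, 30e-15, 120e-15, t1=0.5e-9, tau=0.5e-9)
long_t1 = bitline_signal(2.0, 1.0, 30e-15, 120e-15, t1=2.0e-9, tau=0.5e-9)
```

Under these assumed values the longer t1 yields more signal than the shorter t1, and both remain below the fully shared value, which mirrors the trade-off between operating margin and cycle time described above.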
These timing intervals t1, t2, and t3 may be collectively optimized on a chip-by-chip basis. In a preferred embodiment, there may be sixteen different timing settings, each specifying a particular combination of the t1, t2, and t3 timing intervals, ranging from very aggressive for highest performance, to very relaxed for highest yield. For example, the timing setting "1" may provide for the most aggressive (i.e., shortest) t1 timing interval, the most aggressive (i.e., shortest) t2 timing interval, and the most aggressive (i.e., shortest) t3 timing interval. The timing setting "16" may provide for the most relaxed t1 timing interval, the most relaxed t2 timing interval, and the most relaxed t3 timing interval. Each incremental timing setting between "1" and "16" is preferably optimized to incrementally increase, by a similar amount, the signal available at the bit line sense amplifier just before sensing. To accomplish this, the timing setting "2" may increase the t1 interval by 200 ps compared to the "most aggressive" t1 value of timing setting "1," while keeping t2 and t3 unchanged (a 200 ps increase may be easily achieved by adding two inverters to the logic path setting the time interval). The timing setting "3" may increase t3 by 200 ps while keeping the same value of the t1 and t2 intervals as in timing setting "1." Each successive low-numbered timing setting preferably increases the value of one of the three timing intervals t1, t2, and t3 relative to their values in the previous timing setting, while keeping the remaining two timing intervals unchanged.
Higher numbered timing settings may increase a given timing interval by increasingly larger amounts to maintain a similar increase in the signal available at the bit line sense amplifier just before sensing, or may increase more than one of the three timing intervals. For example, the timing setting "15" may increase t1 and t3 each by 400 ps relative to the respective intervals in timing setting "14" (compared to a 200 ps increase in only t3 between timing setting "2" and "3").
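One possible way to tabulate sixteen such timing settings is sketched below, for illustration only. Only the 1-to-2, 2-to-3, and 14-to-15 increments come from the description above; the setting "1" base values, every other increment, and the choice of which settings relax t2 are hypothetical placeholders.

```python
# Setting-1 intervals in picoseconds. These base values are hypothetical.
BASE_PS = {"t1": 1000, "t2": 300, "t3": 1200}

# Increment applied when moving from setting n to setting n + 1.
STEPS_PS = [
    {"t1": 200},                        # 1 -> 2 (from the text)
    {"t3": 200},                        # 2 -> 3 (from the text)
    {"t1": 200},                        # 3 -> 4 (hypothetical from here on)
    {"t3": 200},                        # 4 -> 5
    {"t1": 200},                        # 5 -> 6
    {"t3": 200},                        # 6 -> 7
    {"t1": 200},                        # 7 -> 8
    {"t2": 100},                        # 8 -> 9
    {"t1": 300},                        # 9 -> 10
    {"t3": 300},                        # 10 -> 11
    {"t2": 100},                        # 11 -> 12
    {"t1": 400},                        # 12 -> 13
    {"t3": 400},                        # 13 -> 14
    {"t1": 400, "t3": 400},             # 14 -> 15 (from the text)
    {"t1": 400, "t2": 100, "t3": 400},  # 15 -> 16 (hypothetical)
]


def build_timing_table(base, steps):
    """Return {setting number: {"t1": ps, "t2": ps, "t3": ps}}."""
    table = {1: dict(base)}
    for number, step in enumerate(steps, start=2):
        entry = dict(table[number - 1])
        for interval, delta in step.items():
            entry[interval] += delta
        table[number] = entry
    return table


TIMING_TABLE = build_timing_table(BASE_PS, STEPS_PS)
```

Expressing each setting as an increment over the previous one keeps every interval non-decreasing from setting "1" to setting "16", so a higher setting number never provides less time than a lower one.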
The timing setting "8" is preferably optimized to provide a "nominal" value for each of the three timing intervals t1, t2, and t3 which is expected to be an appropriate setting for a typical device having typical transistor characteristics, typical sense amplifier offset voltage, typical bit line resistance, etc. Note that these "nominal" values of the timing intervals t1, t2, and t3 are a function of the process corner. Higher bit line resistance, higher access transistor threshold voltage, or lower VPP, for example, raise the nominal value of each of the t1, t2, and t3 timing intervals which are called for by timing setting "8." For the preferred embodiment, the various timing settings provide a variety of t1 intervals, some shorter than nominal and others longer than nominal, and provide a variety of t3 intervals, both shorter and longer than nominal. But since the duration of the PMOS sensing is so short for the nominal case, for some embodiments the shortest t2 interval provided is the "nominal" value, and more relaxed t2 intervals are provided for in the timing settings numbered above "8."
During manufacture, this timing setting "8" is configured as the default setting. During a special test mode (for example, at wafer sort) the timing setting may be temporarily made more or less aggressive to determine the window of operation for each chip. Some of the memory devices are found to function correctly with very aggressive timing, while others require more relaxed timing. Then, during the fuse blowing sequence for redundancy, timing fuses may be also blown to permanently modify the default strobe timing. The timing setting is preferably set as aggressively as possible to enhance device performance, while maintaining adequate sense amplifier signal margins for reliability. For example, if a timing setting of "4" is the most aggressive timing for which a given device functions without error, then the device may be advantageously fuse programmed to a timing setting of "6" to ensure some additional operating margin (the signal to the bit line sense amplifiers increasing as the timing setting increases). At a later test, such as at final test of a packaged device, the test mode may still be entered, and the timing setting advanced from its then fuse programmed setting to a more aggressive setting, in order to further verify adequate sense amplifier margins on a chip-by-chip basis, independent of which actual timing setting was fuse programmed into the device.
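The selection of a fuse-programmed setting from test-mode results may be sketched as follows, for illustration only. The guard band of two settings follows the example above (most aggressive passing setting "4", programmed setting "6"); the function name, its arguments, and the handling of a device with no passing setting are assumptions.

```python
def choose_fuse_setting(passes, guard_band=2, most_relaxed=16):
    """Pick the setting to fuse-program from test-mode results.

    passes maps each timing setting (1 = most aggressive) to True when the
    device functioned without error at that setting.
    """
    passing = [setting for setting, ok in passes.items() if ok]
    if not passing:
        return None  # assumed handling: no working setting was found
    fastest = min(passing)
    # Back off by the guard band, but never beyond the most relaxed setting.
    return min(fastest + guard_band, most_relaxed)


# Hypothetical device that fails at settings 1-3 and passes from 4 upward.
results = {setting: setting >= 4 for setting in range(1, 17)}
programmed = choose_fuse_setting(results)
```

For the hypothetical device shown, the function returns setting 6, which agrees with the example in the text.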
A two-dimensional grid of power buses is preferably implemented within each memory bank, with large VDD and VSS buses arranged parallel to the bit lines and implemented in a higher layer of metal (e.g., the top layer), vertically passing above the bit lines. Filter capacitors are located at the ends of each array block as well as at the top and bottom of each memory bank to help provide additional bypass capacitance to withstand the large current spikes which occur during sensing. These filter capacitors, as well as other filter capacitors implemented elsewhere within the device, are preferably implemented using multiple, independent capacitors which are individually de-coupled and automatically switched out of the circuit if, at any time, more than a predetermined leakage current is detected automatically by the memory device as flowing through a given capacitor (i.e., a "shorted" capacitor). The large metal buses allow this stored charge to reach the two selected rows of sense amplifiers (i.e., located in the holes above and below the selected array block) with very little voltage drop, and allow the sense amplifiers to latch quickly and provide a good VSS low level.
The bit lines are equilibrated together to achieve an equilibration voltage on the bit lines, for a preferred embodiment, of approximately 1.0 volts. The bit lines are preferably equilibrated at both ends to reduce the required equilibrate time. The bit line equilibration voltage is coupled from all bit line pairs to a common node which may be sampled just after equilibration and buffered (using a sample-and-hold amplifier) to drive the memory cell plate. Since the bit line equilibration voltage is approximately one-half the written high level, the bit line equilibration voltage may also be sampled, compared to a reference voltage (for example, a 1.0 volt reference), and any voltage difference used to adjust the PMOS timing (and thereby adjust the final written high level).
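The adjustment of the PMOS timing from the sampled equilibration voltage may be sketched as a simple comparison against the reference, for illustration only. The 1.0 volt reference comes from the description above; the step size, the deadband, the lower limit, and the function name are hypothetical, and the actual adjustment circuitry is not described at this level of detail.

```python
def adjust_pmos_duration(t2_ps, v_eq_sampled, v_ref=1.0,
                         step_ps=50, deadband_v=0.02):
    """Return an updated PMOS sensing duration (t2) in picoseconds.

    A low equilibration voltage implies a low written high level, so t2 is
    lengthened; a high equilibration voltage shortens it.
    """
    error = v_eq_sampled - v_ref
    if error < -deadband_v:
        return t2_ps + step_ps
    if error > deadband_v:
        # Assumed lower limit: never shorten below one step.
        return max(step_ps, t2_ps - step_ps)
    return t2_ps
```

For example, under these assumed values a sampled equilibration voltage of 0.9 volts lengthens a 300 ps duration to 350 ps, while a sample within the deadband leaves the duration unchanged.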
As stated above, the exemplary memory array is automatically taken back into precharge without waiting for a control signal. In other words, one edge of a clock causes the memory array to execute a useful cycle, then to automatically reset itself in preparation for a new cycle. This precharge timing is relative to the beginning of the active cycle. Of significance, this limits the amount of potential sub-threshold leakage through memory cell access transistors by limiting the time that any bit lines are at VSS. The precharging/equilibration is accomplished by using two sets of signals: one is an automatically timed pulse, while the other stays on until the start of the next cycle. For example, the bit line sense amplifiers are preferably equilibrated using two different equilibrate signals. Both turn on automatically at the same time after NMOS sensing is complete and the selected word line is brought low. One equilibrate signal is turned off by a timed pulse just when the bit line equilibration is substantially complete (i.e., at the end of the active cycle), while the other equilibrate signal is turned off by the start of the subsequent cycle. The pulsed equilibrate signal drives much larger internal capacitive loads, such as large equilibration devices, while the non-pulsed equilibrate signal drives fewer and/or much smaller devices which indeed assist the larger pulsed equilibrate devices in equilibrating the various nodes. However, the smaller devices are largely included as "keepers" to maintain the equilibration until the next active cycle. As such, the total capacitance of the various equilibration signal lines which must be discharged (i.e., brought low) at the start of a new cycle is greatly reduced and can be accomplished with less delay after the initiating control signal, and the performance is enhanced.
For relaxed clock cycle times, the pulsed equilibrate signal falls automatically at the end of a cycle, while the non-pulsed equilibrate signal stays high until the next cycle selecting this array block is initiated. However, for a clock cycle time which approaches the fastest possible cycle time for a given device, the non-pulsed equilibrate signal for the newly selected array block may be discharged by the initiation of the next cycle at substantially the same time as the pulsed equilibrate signal for the previously selected array block is discharged automatically at the end of the previous cycle. To save power, the non-pulsed equilibrate signal for only the selected array block and supporting circuitry is brought to VSS at the start of an active cycle, and all others remain inactive at VDD throughout the active cycle. Similarly, the pulsed equilibrate signal for only the selected array block and supporting circuitry is actually pulsed at the end of an active cycle, while all others remain inactive at VSS.
During an internal write operation, the exemplary device contains write circuitry that supplies a small differential voltage to the sense amplifier before bit line sensing, the polarity of the voltage depending on the data to be written. The circuitry furthermore "swallows" the voltage otherwise developed in the sense amplifier by the selected memory cell. Then, during their normal latching, the bit line sense amplifiers "write" the level into the memory cell. Because of an internal write queue, the data to be written is already available when the actual internal write operation is started. In preparation for the current write operation, this data is preferably driven onto the global input lines late in the previous write operation, and then coupled to the selected sense amplifier by column select circuitry fairly early in the current write operation, before latching the bit line sense amplifiers. The magnitude of the write signal coupled onto the sense amplifier nodes is kept small to reduce power consumption and to reduce disturbance to the neighboring bit lines and sense amplifiers which are not being written. Preferably, the magnitude of the write signal imparted onto any given sense amplifier node is no higher than that normally developed during a read operation, so that coupling to the neighboring bit lines and sense amplifiers is no worse than during a read operation. The global input lines serving the next word to be written are equilibrated after each write operation, preferably to the bit line equilibration voltage, and driven to the new data state for the next write operation, even if the next write operation is not the next cycle.
Moreover, the differential voltage on the global input lines serving the next word to be written is equilibrated away (in a write cycle) after bit line sensing has started and the column select lines are inactive (i.e., during the later stages of bit line sensing), and then driven to reflect the new write data for the following write cycle before the bit lines have finished equilibrating, rather than driving these data input signals during the early part of bit line sensing when such movement could disturb the bit line sensing. The global input lines then dynamically float until needed by the next write operation. To handle the possibility that the next write operation may be many cycles later, the global input lines may be refreshed periodically (e.g., every 256 external clock cycles, before any leakage current can substantially modify their voltage) by re-equilibrating and re-driving to ensure the proper magnitude of the write data signal for as long as necessary until the next write operation occurs.
By writing a dynamic memory array by "fooling" the sense amplifier and letting it actually restore the voltage levels onto the bit lines in accordance with the data to be written, rather than in accordance with the data previously in the selected memory cell, a write cycle takes the same very short time as a read cycle, rather than the longer time that would be required by first sensing old data, then modifying it. In addition, a significant amount of power is saved by not having to over-power many sense amplifiers after they have already been latched.
During power-up, all the memory cells are initialized to a low voltage under automatic internal control. Provision is made to allow every word line to simultaneously go high, to force the node to which the bit lines are equilibrated to VSS, and to ensure that the bit line equilibration and array select transistors are on. Since each sense amplifier is then coupled to a common node at VSS by precharge signals, each bit line (both true and complement) is driven to VSS and all memory cells are likewise forced to VSS, even if the word lines are no higher than a threshold voltage above VSS. At about the same time, the memory cell plate is established at a voltage near the eventual bit line equilibration voltage (preferably around 1.0 volts) by other power-up circuits, being careful to limit the current flow, which charges the cell plate, to an amount less than the output current of the substrate bias charge pump (to prevent the substrate from coupling positively and causing massive latchup from the diffused regions of each memory cell's internal node). Then, when normal cycles begin, the very first operation in the memory array occurs with memory array nodes (bit lines, cell plate) properly established, and all memory cells initialized at one of the two valid states (in this example, at VSS). The first cycles do not have to try to sense memory cells having an initialized voltage near the bit line equilibration voltage, as would likely occur without such a power-up sequence due to coupling from the memory cell plate to the memory cells themselves as the memory cell plate reaches its normal level at the bit line equilibration voltage of, for example, 1.0 volts. This prevents any bit line sense amplifiers which are not being written from spending time in a meta-stable state which, if allowed to occur, would affect the high level restored into the memory cells being written, as well as the equilibrate voltage resulting on the bit lines.
During a read operation, signal developed on the bit lines by the selected memory cell is immediately buffered by the local output line amplifier(s) before bit line sensing starts, and immediately starts to develop signal on the pair of global output lines. For certain embodiments, the differential signal propagates through lines and differential amplifiers to the output buffers, whose first stage is a latching amplifier which is then strobed to detect, amplify, and latch this signal. The timing of the strobe signal for this latching amplifier (which may be known as "t4") may be optimized on a chip-by-chip basis. There may be, for example, eight possible strobe timings, from very aggressive to very relaxed. The device may be initially configured with an intermediate default strobe timing (e.g., having a value of "4," where "1" is the most aggressive and "8" is the most relaxed), and during a special test mode (for example, at wafer sort) the strobe timing may be made more or less aggressive to determine the window of operation for each chip. Then, during the fuse blowing sequence for redundancy, timing fuses may be also blown to modify the default strobe timing. The timing is modified to be as aggressive as possible while maintaining adequate margins for reliability. For example, if in the test mode a t4 timing of "2" is the fastest timing for which a given device functions without error, then the device may be advantageously fuse programmed to a t4 timing of "3" or not altered to remain at "4" to ensure sufficient operating margin.
At a later test, such as at final test of a packaged device, the test mode may again be entered, and the t4 timing advanced from its then fuse programmed setting to a more aggressive setting (e.g., 1 or 2 settings faster than its new programmed timing setting without needing to know the new programmed timing setting), in order to further verify adequate operating margins on a chip-by-chip basis, independent of which actual timing setting was fuse programmed into the device.
In an alternative embodiment of a memory array having a cycle time which is long compared to its read access time, a latching global output line amplifier may be strobed (at what was time t4 in the earlier embodiment) to detect and amplify the signal on the pair of global output lines, and communicate the sensed data onward through output multiplexer circuitry and ultimately (if the particular global output line is selected) to output buffer circuitry. The timing of the global output line amplifier may be selected to support both a flow-through configuration as well as a pipelined configuration. To support a fast flow-through access time specification, the latching global output amplifier is aggressively strobed as soon as a predetermined amount of signal has developed on the global output lines. In this way, the data propagates to and is available at the outputs as quickly as possible. But with this aggressive timing, some devices may fail. Conversely, when in the pipelined mode of operation, the global output latch timing is relaxed to more closely coincide with the global output signal peak, and the sensed data is provided to the output buffers for driving to the output pins during the next cycle (using a PLL or delay-locked loop). By affording additional time for even more signal to develop on the global output lines, a particular device which may be marginal or may even fail at the fast t4 timing of the flow-through mode may prove to have adequate margin at the more relaxed timing of the pipelined mode, and may be sold for use and guaranteed to operate only in the pipelined mode of operation.
Bit line crossover structures are advantageously used to achieve lower worst case coupling, during both read and write operations, onto a particular bit line pair from neighboring bit lines on either side. Because photolithographic guard cells are used at the edges of each arrayed group of memory cells, there is a layout area penalty in providing crossover structures including the required guard cells on either side of each crossover structure. To reduce this area penalty, a novel crossover arrangement is employed, for certain embodiments, which provides a significant degree of noise (i.e., coupling) reduction while requiring only one crossover. Within each array block, each complementary pair of bit lines runs vertically from the top to the bottom of the array block. The true bit line and complement bit line of a first pair run adjacent to each other from the top to the bottom of the array block without any crossovers. The true bit line and complement bit line of a second pair do not run adjacent to each other, but instead straddle the first pair (i.e., both true and complement bit lines of the first pair lie between the true and complement bit lines of the second pair), with a single crossover half-way down the second bit line pair (vertically in the middle of the array block). This crossover arrangement repeats horizontally throughout each array block in groups of two pairs of bit lines (four physical bit line wires). By using this crossover arrangement, only four groups of guard cells are required in each array block: one each at the top and bottom of the array block, and one each at the top and bottom of the single crossover structure located in the vertical center of the array block.
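The noise reduction within one group of four bit line wires may be illustrated with a first-order nearest-neighbor coupling model. The sketch below is an assumption-laden illustration, not an analysis from the embodiment: it considers only coupling between directly adjacent wires within a single group, weights each half of the array block equally, and uses hypothetical wire names ("A" and "Ab" for the first pair, "B" and "Bb" for the straddling second pair).

```python
def differential_coupling(top, bottom, victim, aggressor, k=1.0):
    """Net differential noise coupled onto the victim pair.

    top and bottom list the wires, left to right, in each half of the array
    block. The aggressor true line swings +1 and its complement swings -1.
    Each half contributes k/2 per adjacent aggressor wire.
    """
    swing = {aggressor: +1.0, aggressor + "b": -1.0}
    noise = {victim: 0.0, victim + "b": 0.0}
    for half in (top, bottom):
        for i, wire in enumerate(half):
            if wire not in noise:
                continue
            for j in (i - 1, i + 1):
                if 0 <= j < len(half):
                    noise[wire] += 0.5 * k * swing.get(half[j], 0.0)
    return noise[victim] - noise[victim + "b"]


# Arrangement described above: pair B straddles pair A and crosses over
# half-way down the array block.
STRADDLE_TOP = ["B", "A", "Ab", "Bb"]
STRADDLE_BOTTOM = ["Bb", "A", "Ab", "B"]

# Reference arrangement with no crossovers, pairs side by side.
PLAIN = ["A", "Ab", "B", "Bb"]

straddle_a = differential_coupling(STRADDLE_TOP, STRADDLE_BOTTOM, "A", "B")
straddle_b = differential_coupling(STRADDLE_TOP, STRADDLE_BOTTOM, "B", "A")
plain_b = differential_coupling(PLAIN, PLAIN, "B", "A")
```

In this simplified model the differential noise between the two pairs of a group cancels in the straddling arrangement, whereas the arrangement without crossovers couples the full aggressor swing onto one wire of the neighboring pair. Coupling between adjacent groups is outside the scope of the sketch.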
The address and data for a write cycle are queued to eliminate dead cycles on the system data bus. In the exemplary embodiment operated in the pipelined mode, the address for a read cycle is strobed during one cycle, and the corresponding data read from the selected memory cells is driven onto the external data pins during a subsequent cycle. If an external write cycle follows immediately after an external read cycle, the write address may be presented to the address bus and strobed into the memory device just like for a read cycle, but the external bidirectional data bus is occupied with driving the data out corresponding to an earlier external read cycle (by a number of cycles depending on the pipeline latency for a particular embodiment) and cannot be used to present the corresponding write data. Instead, the data for the external write cycle is driven onto the data bus and presented to the device during the cycle in which output data would have appeared had the cycle been an external read cycle instead of an external write cycle. In this way, the address bus and the data bus are used every cycle, with no wasted cycles for either bus. Both the write address and data are queued, and the actual write operation to physically store the write data into the selected memory cells is postponed until a subsequent write cycle, which then, when executed, retires the previously received address and data from the write queue into the memory array. Read bypass circuitry is provided which allows data corresponding to the address of the read cycle to be correctly read from the write queue whenever an earlier queued write directed to that same address has not yet been retired.
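The behavior of the write queue and read bypass may be sketched in software, for illustration only. The toy model below assumes a queue depth of one entry and ignores pipeline latency and bus timing; the class and attribute names are hypothetical.

```python
class QueuedWriteMemory:
    """Toy model: a one-entry write queue in front of a memory array."""

    def __init__(self):
        self.array = {}      # address -> data physically stored
        self.pending = None  # (address, data) waiting to be retired

    def write(self, address, data):
        # A new write cycle retires the previously queued write, then queues
        # its own address and data.
        if self.pending is not None:
            old_address, old_data = self.pending
            self.array[old_address] = old_data
        self.pending = (address, data)

    def read(self, address):
        # Read bypass: a queued write to the same address supplies the data.
        if self.pending is not None and self.pending[0] == address:
            return self.pending[1]
        return self.array.get(address)
```

In this model a read directed to an address whose write has not yet been retired returns the queued data, while the array itself is updated only when a later write cycle arrives.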
In the exemplary embodiment, the internal data path is twice as wide (i.e., a "double word") as the external I/O word width (i.e., the least significant address bit selects one of the two possible 36-bit words), and a significant degree of internal power consumption is saved by merging external write cycles when sequential write addresses occur. The address of a given external write cycle is stored and compared to the address of the next external write cycle. If the selected memory cells to be written in both external write cycles correspond to the same physical word line and the same column within the same array block of the same memory bank (i.e., differ in only the least significant address bit), the internal write operation which would otherwise follow from the first external write cycle is delayed, and the data to be written is queued and merged with the data to be written in the second external write cycle. The write queue then "retires" both queued write requests by performing a single internal write operation, simultaneously writing both data words received in the first and second external write cycles. If the internal data path were wider than 72 bits, then more than two 36-bit write cycles could be merged into a single internal write operation. For example, if the internal data path were 144 bits wide, then four 36-bit write cycles could conceivably be merged into a single internal write operation.
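The merging of sequential external write cycles may be sketched as follows, for illustration only. The sketch assumes a 72-bit internal data path (two 36-bit words per internal operation) and treats two consecutive writes as mergeable when their addresses differ only in the least significant bit; the function name and the representation of an internal operation are hypothetical.

```python
def merge_writes(external_writes):
    """Group external 36-bit writes into internal double-word operations.

    external_writes is a list of (address, data). Two consecutive writes
    whose addresses differ only in the least significant bit are merged.
    Returns a list of (double_word_address, {word_select: data}).
    """
    internal_ops = []
    i = 0
    while i < len(external_writes):
        address, data = external_writes[i]
        words = {address & 1: data}
        if i + 1 < len(external_writes):
            next_address, next_data = external_writes[i + 1]
            if address ^ next_address == 1:
                words[next_address & 1] = next_data
                i += 1  # the next write was absorbed into this operation
        internal_ops.append((address >> 1, words))
        i += 1
    return internal_ops


ops = merge_writes([(8, "a"), (9, "b"), (12, "c"), (20, "d"), (21, "e")])
```

For the five hypothetical external writes shown, three internal write operations result: one merged operation for addresses 8 and 9, one unmerged operation for address 12, and one merged operation for addresses 20 and 21.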
The exemplary embodiment includes a burst mode of operation which provides, during subsequent cycles, read or write access to sequentially addressed memory cells relative to a received (i.e., "load") address, without requiring such sequential addresses be presented to the device. Using the 72-bit wide (double word) organization of each memory bank, two 36-bit words are retrieved from the memory array in the first cycle. The second word is saved to present to the data outputs after the first word is output. Because the exemplary device is organized into separate memory banks, a burst of four sequential words may transcend the address boundaries between memory banks. Consequently, the exemplary device includes provision for automatically initiating a load cycle in another memory bank during a burst cycle.
In certain embodiments, a dynamic memory array using the architecture and supporting circuits described above achieves random access cycles (each requiring a new random row access) at a sustained rate in excess of 200 MHz operation, even when each new row access is within the same array block of the same memory bank.
The present invention may be better understood, and its numerous objects, features, and advantages made even more apparent to those skilled in the art by referencing the detailed description and accompanying drawings of the embodiments described below.