This invention pertains to the field of semiconductor non-volatile data storage system architectures and their methods of operation, and, in particular, relates to program verify methods.
A number of architectures are used for non-volatile memories. A NOR array of one design has its memory cells connected between adjacent bit (column) lines and control gates connected to word (row) lines. The individual cells contain either one floating gate transistor, with or without a select transistor formed in series with it, or two floating gate transistors separated by a single select transistor. Examples of such arrays and their use in storage systems are given in the following U.S. patents and pending applications of SanDisk Corporation that are incorporated herein in their entirety by this reference: U.S. Pat. Nos. 5,095,344, 5,172,338, 5,602,987, 5,663,901, 5,430,859, 5,657,332, 5,712,180, 5,890,192, 6,103,573, 6,151,248, and 6,426,893 and Ser. No. 09/667,344, filed Sep. 22, 2000.
A NAND array of one design has a number of memory cells, such as 8, 16 or even 32, connected in a series string between a bit line and a reference potential through select transistors at either end. Word lines are connected to corresponding control gates of cells across multiple such series strings. Relevant examples of such arrays and their operation are given in U.S. patent application Ser. No. 09/893,277, filed Jun. 27, 2001, that is also hereby incorporated by reference, and references contained therein.
When writing multi-state per storage element data into a non-volatile memory, such as flash electrically erasable and programmable read-only memories (EEPROMs), the write, or programming operation, is typically designed to move a targeted population of storage elements progressively through a series of data states until each element reaches its desired state. This is done by incrementally changing the state of the storage elements, sensing a parameter indicative of this state in a verify process, and further changing the state of those cells that have not yet verified as being in their desired final or target state. In an EEPROM, this typically consists of increasing threshold voltage (Vth) levels (starting from the erased or 0 state), using a sequentially increasing steering voltage step (e.g. staircase) implementation for each subsequent programming pulse. As each storage element passes through its to-be-written Vth data state target, it becomes locked out during the corresponding state verify operation, terminating all subsequent writing to that storage element for the duration of that write session.
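As a rough illustration only, the pulse, verify and lockout sequence described above can be sketched in Python. The function name program_verify, the millivolt units, and the simplified cell model (each unlocked cell gains a fixed, cell-specific Vth increment per pulse) are assumptions made for this sketch and are not taken from any of the referenced designs.

```python
def program_verify(targets, verify_mv, increment_mv, max_pulses=1000):
    """Pulse, verify, and lock out until every cell reaches its target state.

    targets      -- target data state per cell (0 means stay erased)
    verify_mv    -- hypothetical verify level in mV for each non-zero state
    increment_mv -- hypothetical Vth gain per pulse for each cell, in mV
    """
    vth = [0] * len(targets)              # all cells start in the erased state
    locked = [t == 0 for t in targets]    # cells that stay erased get no pulses
    pulses = 0
    while not all(locked) and pulses < max_pulses:
        pulses += 1
        # Programming pulse: only cells not yet locked out move upward.
        for i in range(len(vth)):
            if not locked[i]:
                vth[i] += increment_mv[i]
        # Verify: lock out each cell that has reached its own target level.
        for i, target in enumerate(targets):
            if not locked[i] and vth[i] >= verify_mv[target]:
                locked[i] = True
    return vth, pulses


targets = [0, 1, 2, 3]
verify_mv = {1: 1000, 2: 2000, 3: 3000}
increment_mv = [100, 100, 100, 200]
vth, pulses = program_verify(targets, verify_mv, increment_mv)
print(vth, pulses)
```

In this toy model every cell stops at the first pulse that carries it to or past its verify level, so the overshoot of any cell is bounded by one Vth increment.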
A verify operation is a sensing or read operation where the state of the storage unit is compared to its data-associated target value. For a binary storage unit there is only one data state aside from the ground state, while the multi-state case will have additional states. For example, consider the case where each storage element or cell stores a total of 3 bits, or eight states. In a common cell array architecture, all cells in a write or sense group that are being simultaneously written or read, respectively, are tied to a common control, or steering, gate. In such an implementation, in order to read or verify cells over the range of possible states (e.g. states {0,1,2,3,4,5,6,7} for the 3-bit example) it is necessary to serially (e.g. sequentially) scan through all the steering voltage sensing conditions. In the example, there are seven such sequential sensing operations for the read operation. These are performed at the seven threshold voltage discrimination levels to simultaneously determine the stored state of the eight possible states for each cell within the full set of cells being read or verified. Using this sort of read operation as applied to the program/verify/lockout sequence, wherein each programming pulse is accompanied by a series of verify steps (along with the associated state conditional programming lockout), this verify set might also proceed sequentially through the full set of steering voltage target Vth levels (e.g. set of seven for eight state storage elements), associated with the corresponding set of programmable data states.
FIG. 1 illustrates the basic multi-state program/verify operation for the 8-state case in a flash type memory. Programming pulses, which include incrementally increasing steering or control gate program voltage levels, are interlaced with a 7-step verify sequence of increasing steering gate sensing voltage levels.
FIG. 2 expands this verify series in waveform 103 (also labeled B), labeling the 7 sequentially increasing verify pulses 1, 2, 3, 4, 5, 6, and 7. This example shows the results of such verification for a storage element in the (charged) state (i.e. sensed threshold or Vth level) between verify levels 3 and 4, as represented by dotted line 101 (also labeled A) representing either threshold voltage directly or another parameter (e.g. a current level) indicative of this state. The result of sensing at each of the verify levels of the sensing parameter (such as steering gate voltage) is captured by a sensing strobe, as represented by waveform 105 (also labeled C). The results of this strobed sensing verification are shown in waveform 107 (also labeled D). Whenever the verify level is lower than the stored charge level, this results in a “1” logic level pulse, as shown for the first three verify strobes, whereas when it is higher this results in a “0” logic level, as shown for the final four verify strobes.
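The strobed sensing just described can be mimicked with a short Python sketch. The function name verify_strobes and the numeric verify levels are hypothetical; only the rule (logic 1 when the verify level is below the stored level, logic 0 otherwise) comes from the description above.

```python
def verify_strobes(stored_level, verify_levels):
    """Return one logic value per verify strobe, scanned in ascending order.

    A strobe yields 1 when the verify level is lower than the stored level
    and 0 otherwise.
    """
    return [1 if level < stored_level else 0 for level in verify_levels]


# Hypothetical levels: verify levels 1..7 and a cell between levels 3 and 4.
levels = [1, 2, 3, 4, 5, 6, 7]
strobes = verify_strobes(3.5, levels)
print(strobes)        # [1, 1, 1, 0, 0, 0, 0]
print(sum(strobes))   # 3, i.e. the cell reads as state 3
```

The number of strobes that return 1 identifies the stored state, which is why a full read of an eight state cell takes seven sequential sensing operations.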
In practice, using this full verify set for each step in the programming is overkill and wastes time (wherein typically each verify sensing operation takes about the same time as a programming pulse), since at any point in the programming progression there will be only a limited Vth range (or range of data states) over which the population of cells can span. Present designs exploit this characteristic by providing a limited, sliding range verify set implementation, as described in the following.
Using the progressive programming approach, there is a statistically well-behaved distribution of threshold voltages within a population of cells as they progress through the ascending states, starting with state 1, then to state 2 and so on up to state 7. To help explain the limited verify set concept, it helps to first disregard the data state conditional lockout; i.e. assume no lockout. Given this, an example of one Vth distribution scenario for this progression is described in the following snapshot. Starting from the erased state, the population of cells has been successively programmed to a point where a significant fraction of that population lies within the Vth range between states 4 and 5. In this scenario there are relatively few stragglers that lie between states 3 and 4, and none with Vths below state 3. Likewise, there are relatively few cells racing ahead, with Vths between states 5 and 6 (i.e. reading as state 5) and none at states 6 and above. In such a scenario, it is pointless to perform the verify operations searching for states 1, 2, 6 or 7, since at this point the cells only exist in the Vth range spanning states 3, 4 or 5. Consequently the approach now in use reduces the range of Vth verify levels to span only that window range required to envelop the expected Vth range at that given point in the programming sequence (e.g. in the above example, at this point in the programming sequence only three verifies are performed, spanning states 3, 4 and 5, in place of the full set of seven verifies). As programming proceeds to higher threshold voltage ranges, the Vth verify window range is slid upwards accordingly. In this way, the programming operation is sped up substantially. For example, in the case for which the time for each programming pulse is comparable to that for each verify step, this approach cuts the total write time in half, from the maximum 8 steps (i.e. 1 programming pulse plus 7 verifies) to 4 steps (1 programming pulse plus 3 verifies), doubling the raw write speed.
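The arithmetic behind this factor of two can be checked with a few lines of Python. The helper name steps_per_cycle and its verify_cost parameter are assumptions of this sketch; verify_cost=1.0 encodes the stated case in which one verify takes about as long as one programming pulse.

```python
def steps_per_cycle(num_verifies, verify_cost=1.0):
    """Time for one program/verify cycle, in units of one programming pulse."""
    return 1.0 + num_verifies * verify_cost


full = steps_per_cycle(7)       # 1 programming pulse plus 7 verifies
reduced = steps_per_cycle(3)    # 1 programming pulse plus 3 verifies
print(full, reduced, full / reduced)   # 8.0 4.0 2.0
```

With a different verify_cost the ratio changes, so the doubling applies only under the equal-time assumption stated above.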
An example of this process is illustrated in FIG. 3. This is a schematic representation of which states would be checked at which stage in an exemplary programming process. This can be implemented through a look-up table maintained in the controller or other mechanism. In the table of FIG. 3, the multi-states verified after a given programming pulse are indicated by a checkmark at a corresponding point on the grid. For example, after the first two programming pulses, only the lowest state above ground (e.g. the 1 state) is checked, since it is likely none of the storage elements will have advanced to the 2 state this soon. After the third pulse, a verification of the 2 state is added, since at this point there may be cells arriving at the 2 state. The 3 state is similarly added to the verification list after the fifth pulse and so on. As any cell going to the 1 state is likely to have been programmed by the seventh pulse, the 1 state verify is dropped at this point. Similarly, the 2 state is dropped at the 11th pulse and so on.
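A look-up table of this kind might be sketched as follows. Only the entries stated in the text are encoded (states 1, 2 and 3); the remaining rows of FIG. 3 are not reproduced here. The names VERIFY_SCHEDULE and states_to_verify are hypothetical, and the sketch assumes that a state "dropped at" pulse N is no longer verified after pulse N.

```python
# state -> (first pulse after which it is verified, last such pulse).
# None means the drop point is not given in the text.
VERIFY_SCHEDULE = {
    1: (1, 6),      # checked from the first pulse, dropped at the 7th
    2: (3, 10),     # added after the 3rd pulse, dropped at the 11th
    3: (5, None),   # added after the 5th pulse
}


def states_to_verify(pulse):
    """Return the states verified after the given programming pulse."""
    selected = []
    for state, (first, last) in sorted(VERIFY_SCHEDULE.items()):
        if pulse >= first and (last is None or pulse <= last):
            selected.append(state)
    return selected


for pulse in (1, 3, 5, 7, 11):
    print(pulse, states_to_verify(pulse))
```

A controller could consult such a table once per programming pulse to decide which steering gate sensing levels to apply.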
Although this reduces the number of reads between programming pulses significantly compared to checking all the non-ground states (for example, 7 reads in the 3-bit example), there are a number of problems with using such a dead reckoning reduced scan, sliding window approach for higher write speed, while maintaining sufficient guard-band to the scan window to ensure reliable write operation. These problems mainly relate to the determination of sufficient guard-band. Namely, how soon should each new state be brought in and when is it safe to drop out each state? The verify operation, as exemplified in FIG. 3, must safely cover the operation of the memory regardless of its operating conditions, such as temperature or voltage source, device age, or manufacturing differences associated with processing and other variations. Although performance is increased by checking fewer states during the verify operation, there must be enough checking to ensure robust operation. Furthermore, although the performance improvement benefit does increase with the increasing number of states per storage element by using the above reduced scan, so does room for error, particularly considering the trend to lower operating voltages.
Returning to the cell-by-cell data state conditional lockout, essential to terminating further programming on each cell once its target data (Vth) state is achieved, this now must take place within the reduced window Vth scan. Since the remaining Vths are not checked, no lockout of their associated states is possible during that particular programming step. (e.g. In the above example, only cells with data states 3, 4 and 5 have the possibility of being locked out, whereas cells with data states 1, 2, 6, 7 cannot be so locked out during that specific programming/verify step). Therefore a critical requirement for this verify speed-up algorithm is that, at any time in the programming sequence, a sufficiently wide and properly positioned verify window range is established to cover the spread of the expected Vth distribution (excluding those cells already locked out).
In the case of an inadequate verify span window, cells at both ends of the Vth distribution (i.e. both those which program too slowly and those which program too quickly) may be missed when they in fact do achieve their proper Vth levels and require the programming lockout. This will inevitably lead to a corresponding data state error (i.e. write failure), as those cells proceed to still higher Vth levels (never having been locked out in the case of the laggards, or having locked out too late, the likely fate for the speeders). Consequently, the reduced Vth scan window algorithm (i.e. its window size and program step dependent placement) must be carefully tailored to achieve increased write speed without degrading write reliability.
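The failure mode can be made concrete with a toy Python model. Everything numeric here is hypothetical (a 1000 mV target, a verify window covering pulses 3 through 10, and fixed per-pulse Vth increments), as is the function name final_vth; the sketch only illustrates how a cell that crosses its target outside the window is missed.

```python
def final_vth(target_mv, increment_mv, window, max_pulses):
    """Program one cell; verify its target state only after pulses in `window`.

    Returns (final Vth in mV, pulse at which lockout occurred or None).
    """
    vth = 0
    for pulse in range(1, max_pulses + 1):
        vth += increment_mv
        if pulse in window and vth >= target_mv:
            return vth, pulse      # locked out at this verify
    return vth, None               # the window never caught this cell


window = range(3, 11)              # target state verified after pulses 3..10
print(final_vth(1000, 200, window, 20))   # typical cell:  (1000, 5)
print(final_vth(1000, 80, window, 20))    # laggard:       (1600, None)
print(final_vth(1000, 600, window, 20))   # speeder:       (1800, 3)
```

The typical cell locks out on target, the laggard crosses its target after the verify has been dropped and keeps climbing, and the speeder crosses before the verify is introduced and locks out well above its target.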
An alternate existing approach for reducing the number of verify operations per programming pulse has been developed for a 2-bit per cell NAND architecture (whose four states, for referencing purposes, are designated here as 0, 1, 2, 3, in ascending Vth level), as described above in relation to U.S. patent application Ser. No. 09/893,277, filed Jun. 27, 2001, that was incorporated by reference above. One optional operating mode for this NAND implementation logic treats each storage unit as having multiple sector addresses, each address storing one of the two bits of the storage unit, rather than a single storage unit storing multiple bits within one sector address. In the case in which the higher two Vth states (2, 3) are to be programmed up from the lower two Vth states (0, 1), the operation goes as follows: Cells targeted to both states 2 and 3 are first programmed and locked out to the lower Vth of those two higher states (i.e. state 2). This is accomplished using only a single verify-2 operation following each programming pulse, locking out further programming of both 2s and 3s as they pass that verify-2 level. Once all 2s and 3s have so locked out, the 3s are then automatically unlocked, and the programming sequence is restarted on those 3s, but now with the single verify operation set at the verify-3 level. A variation begins with a 2s only verification during the concurrent programming of the 2 and 3 states. The 3 state's verification is added after a predetermined number of programming pulses, with the 2s verify eventually dropped out to leave only the 3s verify from then until completion. Various aspects of this process are discussed more in U.S. Pat. No. 5,920,507, which is hereby incorporated by reference.
This approach could be extended to greater levels of multi-state storage (e.g. storing 8 states per storage element), by locking all cells targeted for a Vth equal to or greater than a target Vth level (i.e. state), using a single verify at that targeted Vth level. Once all cells are so locked out, the operation is repeated for the cells targeted at the next higher Vth state or beyond, repeating this loop until those cells targeted for the highest data state pass their corresponding verify target.
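A minimal Python sketch of this staged, single-verify loop follows. The function name staged_program, the millivolt levels and the fixed per-cell Vth increments are assumptions of the sketch, and it deliberately omits any change in programming voltage between stages.

```python
def staged_program(targets, verify_mv, increment_mv):
    """Program state by state, using one verify level per programming pulse.

    At each stage, every cell whose target is at or above the current state
    is pulsed until it passes that state's verify level, then locked out.
    """
    vth = [0] * len(targets)
    pulses = 0
    for state in sorted(verify_mv):
        active = [i for i, t in enumerate(targets) if t >= state]
        locked = {i for i in active if vth[i] >= verify_mv[state]}
        while len(locked) < len(active):
            pulses += 1                       # one pulse, then one verify
            for i in active:
                if i not in locked:
                    vth[i] += increment_mv[i]
            for i in active:
                if i not in locked and vth[i] >= verify_mv[state]:
                    locked.add(i)
    return vth, pulses


targets = [0, 1, 2, 3]
verify_mv = {1: 1000, 2: 2000, 3: 3000}
print(staged_program(targets, verify_mv, [100, 100, 100, 200]))
```

In this example the fast cell targeted for state 3 reaches each intermediate level early and then sits locked out until the slower cells catch up, which is where the extra pulses of the staged approach come from.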
Using this approach, only a single verify pulse is required with each programming pulse operation, a definite plus in systems whose verify times dominate those of programming, thereby offering an optimal write performance solution. However, for systems whose single pulse programming times are comparable to those of single verifies, typical of existing mass storage flash memories, the above approach actually reduces write performance, for two reasons: (1) The programming progress of cells targeted for states above that being verified is stopped prematurely and unnecessarily, dictating additional programming time in subsequent Vth programming phases to make up for the progress lost by this early termination; (2) The initial programming conditions (e.g. steering, or control, gate voltage staircase starting level) upon resumption at the next higher state must be dropped back to a lower value from that left off at the end of the previous programming sequence. This drop-back is essential in order to ensure that cells do not overshoot their target range, since the specific, appropriate level at which each cell of the population had previously locked out (and from which corresponding level each cell should resume programming) can no longer be applied to the cells, as a population, in a single program starting condition. At best the starting condition needs to be reduced to that associated with the fastest programming cell (i.e. the programming voltage at which the first cell in the group locked out), thereby increasing the required number of programming pulses for the remaining cells. For safety margin, the starting voltage should be reduced somewhat below that optimal level, increasing the number of programming pulses further still, degrading write performance. This approach also re-introduces the issue of coming up with a fixed (i.e. non-intelligent/adaptive) value (in this case for re-starting programming) which balances performance with reliable write. If pushed too aggressively in favor of increased write speed, this risks programming state overshoot, whereas if too conservative, write speed suffers.
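The cost of the drop-back can be estimated with a small Python sketch. The helper names, the voltages and the 100 mV staircase step are all hypothetical, and the sketch assumes, as a simplification, that a cell makes no useful progress until the staircase has climbed back to the voltage at which that cell previously locked out.

```python
def restart_voltage_mv(lockout_mv, margin_mv=0):
    """Restart level: the lowest lockout voltage in the group, less a margin."""
    return min(lockout_mv) - margin_mv


def extra_pulses(cell_lockout_mv, restart_mv, step_mv):
    """Staircase steps needed to climb from the restart level back to the
    voltage at which this cell had locked out (ceiling division)."""
    return -(-(cell_lockout_mv - restart_mv) // step_mv)


lockouts = [12000, 12500, 13000]          # hypothetical per-cell lockout voltages
restart = restart_voltage_mv(lockouts, margin_mv=200)
print(restart)                             # 11800
print([extra_pulses(v, restart, 100) for v in lockouts])   # [2, 7, 12]
```

Raising margin_mv protects against overshoot but adds pulses for every cell, which is the fixed trade-off between speed and reliability noted above.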
In view of the limitations of existing program/verify approaches, the following section discusses an improved approach which can adaptively/dynamically satisfy this combined requirement of fast write performance while ensuring write reliability.