This invention relates generally to all 2D and 3D NAND array architecture circuits. In particular, the present invention provides HiNAND array architecture circuits that include several preferred new circuits such as Segments and Groups into the NAND array along with a feature of circuit migration from conventional one-Block-one-row selection of Page Buffer, Sense Amplifiers, and Block-decoders in State-machine design to multiple Programs and Reads in Multiple-Block-Multiple-Rows in different Planes.
Nonvolatile memory (NVM) is well known in the art which provides the in-system or in-circuit repeatedly electrically programmable and erasable functions. So far, NVMs include three major standalone types such as EEPROM, NOR, and NAND Flash memory and one embedded type Flash (eFlash) memory. All above four NVMs are based on varied technologies.
The EEPROM is suitable for the Byte-alterable Data storage with the highest density below 4 Mb at 0.13 um node. The NOR flash is suitable for the block-alterable Code storage with the highest density below 8 Gb at 45 nm node. The eFlash is suitable for the page-alterable Code storages with the highest density below 64 Mb at 65 nm node. Lastly, NAND flash is suitable for the Segment-alterable Data storage with the highest density below 256 Gb at 19 nm node in MLC storage.
Currently, NAND flash memory has achieved the highest scalability and density and the smallest feature size, having reached the 1×nm node in 2012. The mainstream standalone NAND in mass production is mainly based on a 2-poly floating-gate NMOS device, which employs FN channel-erase and FN channel-program schemes that require about 20V but draw extremely low current.
The NAND flash cell array comprises a plurality of NAND Strings that are organized in a matrix as a Plane with a plurality of rows and columns. Each NAND String is further comprised of a plurality of NMOS NAND cells connected in series sandwiched by two NMOS 1-poly String-select transistors, for example, MS located on top of the String and MG on bottom of the String. The number of NAND flash cells in one String can be made of 8, 16, 32, 64, 128 or arbitrary integer number, depending on NAND density requirement and applications. Each NAND cell has several different types of storages that include SLC (1 bit per cell), MLC (2 bits per cell), TLC (3 bits per cell), XLC (4 bits per cell) and even analog storage that stores more than 4 bits per NAND cell.
Today, a typical extremely high-density, nGb, NAND flash array architecture is comprised of a plurality of NAND Planes cascaded in rows in X-direction and columns in Y-direction. The number of rows and columns of each NAND Plane can be 2, 4 or 8 or more and is optimally determined by the trade-off of the chip layout and performance.
Each NAND Plane is further comprised of a plurality of NAND Blocks that are physically cascaded one-by-one in the Y-direction, and each NAND Block is further comprised of a plurality of NAND Strings cascaded in a row in the X-direction. Each NAND String includes a plurality of NAND cells, for example, M cells connected in series and sandwiched by one top String-select transistor and one bottom String-select transistor. The value of M can be 8, 16, 32, 64, 128 or any arbitrary integer number, depending on the NAND specs and applications. The optimal numbers of Planes, Rows, Blocks and Strings are determined by the trade-off among design factors such as the optimal chip size, chip performance, design features and reliability concerns of the NAND flash memory.
In the exemplary case of 1-row and 2-plane NAND flash memory, the main NAND Plane-decoder is preferably placed in the middle of the NAND array between left and right NAND Planes. The Block-decoder can be flexibly placed in the middle of the NAND array between two horizontal NAND Planes in one row so that each Block-decoder's multiple outputs can be used to drive the multiple selected word lines (WLs) of one selected corresponding NAND Strings placed either in left or right Plane.
In optimal layout, two big independent PBs (Page-Buffers) and SA (Sense Amplifier) circuit blocks are physically placed right on top of left and right NAND Planes across whole NAND array in the X-direction. The PB may include multiple latches with inputs and outputs to store the data read from the corresponding bit lines (BLs) of NAND flash cells or from the external data lines.
For the array organization of a 2-plane, 1-row NAND flash memory with the condition that only one Plane can be selected at a time for Read, Program, Program-Verify and Erase-Verification, only one group of Blocks is selected, either from the left or from the right NAND Plane. If the array design allows two NAND Planes to be selected simultaneously, then two groups of Blocks, one in each of the left and right Planes, can be selected, giving 2-fold faster Read and Program operations.
For the array organization of a 1-plane, 1-row NAND flash memory, the Block-decoder is preferably placed at one end of the NAND array. In such a layout arrangement, the Block-decoder's multiple outputs can be used to drive the multiple selected WLs in the selected corresponding Strings of the selected Block of NAND memory.
There exist other NAND array organizations such as N×M matrix of N rows and M Planes. Nevertheless, unless each Plane has its own PB circuit, multiple Blocks in different NAND Planes in different rows cannot be selected because PB and BLs are shared by all NAND Blocks cascaded vertically in the Y-direction. The operation of the selected Blocks in the same row of the selected NAND Plane has to be done sequentially one by one to avoid the data contention in BLs and PB.
Now, key Program operation of a conventional NAND is explained below via FIG. 1 and FIG. 2. FIG. 1 shows a typical NAND array with one portion of Block and one Sense Amplifier (SA) shared by one paired NAND Strings such as one Odd String with its drain node coupled to BLo metal bit line and one Even String with its drain node coupled to another BLe metal line. The whole NAND Block memory comprises a plurality of pairs of BLe and BLo (although only one pair of BLe and BLo is shown). In this example, the SA contains one Sensing and Precharging circuit and one Latch circuit for SLC Program and Read operation.
This NAND array has one metal line (metal0) for the common source line (CSL) and another metal line (metal1) with an x-pitch size of 2λ for both BLe and BLo. The BLe and BLo act like GBLs (global bit lines): they run from the NAND array top, where they are connected to the outputs of the PB, to the array bottom without being divided into a plurality of divided-BLs such as local bit lines (LBLs). In other words, along a BL or a column in the Y-direction in the array layout, the NAND array is made of a single metal1 NAND array.
In each BLo or BLe, it directly connects to a plurality of NAND Strings. Each NAND String, in this example, comprises 32 2-poly NMOS NAND cells connected in series sandwiched by one top 1-poly NMOS String-BL-select transistor MSe in BLe or MSo in BLo, gated by a common signal of SLL, and one bottom 1-poly NMOS String-SL-select transistor, MG1 in BLe or MG2 in BLo, gated by another common signal of GSL. The 32 NAND gates of each String are connected to 32 WLs such as WL[1] to WL[32].
Besides the NAND array, FIG. 1 also shows one sensing Latch circuit per pair of BLe and BLo, with PRESET, PLOAD, and PBLCH control signals for the Program-Verify function. Since one pair of BLo and BLe shares one sensing Latch circuit comprising two inverters INV1 and INV2, only one NAND String in either BLo or BLe is selected for a Read operation in this NAND array. Therefore this conventional NAND array and sensing Latch circuit do not offer ALL-BL Read. In other words, reading a whole physical page requires two sub-steps: reading the BLe group first and then the BLo group, or vice versa.
Furthermore, in this conventional NAND array, only two Strings are shown with one pair of BLo and BLe. In fact, a full NAND array includes up to 4 KB pairs of BLo and BLe lines per WL or per physical page with an 8 KB size. Similarly, there is a plurality of NAND Strings in each BLo and BLe. The number of NAND Strings is subject to the required NAND density.
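The two-sub-step Read forced by a shared latch can be illustrated with a short Python sketch. It is only an illustration of the access order, not a model of the circuit in FIG. 1; the function name `read_physical_page`, the `sense` callback, and the bit-line indexing convention are hypothetical names introduced here.

```python
from typing import Callable, List


def read_physical_page(num_pairs: int,
                       sense: Callable[[int], int]) -> List[int]:
    """Read 2*num_pairs bit lines when each BLe/BLo pair shares one latch.

    Assumed indexing for this sketch: bit line 2*p is BLe of pair p,
    bit line 2*p + 1 is BLo of pair p.
    """
    page = [0] * (2 * num_pairs)
    # Sub-step 1: every shared latch senses its even bit line (BLe group).
    for p in range(num_pairs):
        page[2 * p] = sense(2 * p)
    # Sub-step 2: the same latches are reused for the odd bit lines (BLo group).
    for p in range(num_pairs):
        page[2 * p + 1] = sense(2 * p + 1)
    return page
```

Because each latch serves two bit lines, the whole physical page is only available after both passes complete, which is why this arrangement cannot offer an ALL-BL Read.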
The so-called All-BL Program operation means that the Program size is one physical page and the operation is performed in 1 cycle. An Odd/Even page Program operation, in contrast, is performed in units of a logic page, which is half of a whole physical page, so programming the whole physical page needs a 2-cycle operation consisting of two half-page Program operations. Programming bias conditions are summarized below:
a) The gate voltage (WL) of the selected Flash cells in the selected page is set to Vpgm, ranging from 15V to 25V, using an Incremental Step Pulse Programming (ISPP) scheme with ΔVpgm ranging from 0.15V to 0.2V for MLC-type and TLC-type storage.
b) The channel voltage of the selected Flash cells is set to 0V. This 0V is coupled from the corresponding bit data=0 in the Page Buffer to the NAND cells of the selected WL through an NMOS BL-Select transistor that is turned on in a conduction state. The advantage of Program BL=0V is that no BL precharge current is required.
c) The channel voltage of the unselected Flash cells is set to VInhibit≥7V for the Program-Inhibit operation. This VInhibit voltage is generated by the WL-gate coupling effect, which boosts the initial floating channel voltage of Vdd-Vt (bit data=1 in the Page Buffer) to 7V for the unselected NAND cells in the same selected page or WL. This is referred to as the Self-Boosting (SB) effect. The disadvantage of Program-Inhibit BL=Vdd is that multiple high BL precharge currents are required because each such BL has to be charged to Vdd.
d) NAND Program scheme: a low-current FN channel tunneling effect increases the NAND cell's Vt from the E state (erased state) to one of three program states, A, B, or C, for an MLC storage.
e) Program-Inhibit voltage generation methods include SB, LSB and EASB.
In a typical NAND Program operation, a high step-rising program voltage, Vpgm, ranging from 15V to 25V, is applied to one selected WL[m], but a Vpass(program) voltage of around 10V is applied to the rest of 31 (assuming total 32 WLs in each Block) non-selected WLs in the selected Strings along with the gate of bottom String-select transistor connected to Vss and the gate of top String-select transistor connected to Vdd.
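The bias pattern described above can be written down as a small Python helper. This is a sketch under stated assumptions: the function name `program_wl_bias`, the dictionary layout, and the 0-based WL index are hypothetical, while the voltage values (Vpgm on the selected WL, Vpass(program) of about 10V on the others, top select gate at Vdd, bottom select gate at Vss) follow the text.

```python
from typing import Dict, List, Union


def program_wl_bias(num_wls: int, selected: int, vpgm: float,
                    vpass: float = 10.0) -> Dict[str, Union[List[float], str]]:
    """Return the gate biases of one selected String during a Program pulse.

    `selected` is a 0-based WL index (an assumption of this sketch).
    """
    if not 0 <= selected < num_wls:
        raise ValueError("selected WL index out of range")
    # Selected WL gets Vpgm; every other WL gets the pass voltage.
    wl = [vpgm if i == selected else vpass for i in range(num_wls)]
    return {
        "WL": wl,
        "top_select_gate": "Vdd",     # top String-select transistor gate
        "bottom_select_gate": "Vss",  # bottom String-select transistor gate
    }
```

For a 32-WL String, `program_wl_bias(32, 5, 18.0)` yields one WL at 18V and the remaining 31 WLs at 10V.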
As a result, the 31 non-selected NAND cells in the same String are in the conduction state while the String's bit line is grounded. A plurality of electrons from the selected NAND cells' channels are injected into the floating-gate layer, Poly1, and the NAND cells' threshold voltage, Vt, is raised from an erased Vt0 at the E-state with a negative value to a desired positive value of Vt1, referred to as a first programmed state, the A-state.
More information about the programming methods can be found in U.S. Pat. No. 6,859,397, titled “Source Side Boosting Technique for Non-volatile Memory;” and U.S. Pat. No. 6,917,542, titled “Detecting Over Programmed Memory;” and U.S. Pat. No. 6,888,758, titled “Programming Non-Volatile Memory.”
In many cases, the Vpgm pulse is applied to the selected WL[m] of the NAND while several MHV pass-WL voltages, such as the Vpass(program) voltages Vpass1, Vpass2, and others, are applied to the adjacent non-selected WL[m−1] and WL[m+1] and to the remaining non-selected WLs in the selected NAND Strings of the selected Blocks.
A series of Vpgm pulses (referred to as the programming gate pulses) with increasing magnitude is applied to WL[m]. Between successive rising-step Vpgm pulses, a set of single or multiple Program-Verify pulses, similar to a Read operation, is applied to determine whether the selected NAND cell(s) in the selected page or WL have been programmed to the desired programmed Vtn values. The programmed Vtn values are determined by the type of storage, such as SLC (1-bit per cell), MLC (2-bit per cell), TLC (3-bit per cell), XLC (4-bit per cell) or analog storage (more than 4-bit per cell).
Since the Program-Verify operation is like the regular Read operation, the previously mentioned BL-precharge cycle and discharge cycle are the same. Therefore, during each Program-Verify cycle, a NAND flash memory has to precharge the large capacitance of all the long BLs from Vss to VBL as described before. As a result, a large BL precharge current occurs and a large Vpass(read) WL disturbance of 6V is induced on the NAND cells. In addition, the Program-Verify cycle has the same long latency as Read because the discharge process starts from a high value of VBL, which ranges from 0.8V to Vdd in today's NAND design.
If any of the selected NAND cells have reached their targeted programmed Vts as determined in the Program-Verify step, further programming of those NAND cells has to be stopped to avoid over-programming them into the next higher, wrong Vt state. For those NAND cells whose Vts have not reached the desired value after the Program-Verify operation, the Vpgm pulses continue to be applied in the selected page or WL, with a Vpass voltage of 10V or other HV applied to the non-selected WLs. The programming and verify pulses are applied repeatedly to those cells until the desired Vts are reached. Once all NAND cells in the selected page have been programmed successfully into the desired Vt states, the Program and Program-Verify operations of the selected page are stopped. The Program and Program-Verify operations then continue on the remaining pages in the preferred sequence from String bottom to String top in the selected Strings of the selected Blocks of the NAND memory. As the Program and Program-Verify operations repeat, the BL precharge current and the Vpass WL-induced disturbance are multiplied.
Typically, each NAND String physically comprises 16, 32, 64, or even 128 WLs. Compared with SLC, the page count (and thus the density) is doubled for MLC, tripled for TLC, and quadrupled for XLC.
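The pulse-and-verify loop described above can be sketched in Python as follows. This is a toy model, not the patented method or any vendor's algorithm: it assumes that every Program pulse shifts the Vt of each non-inhibited cell by exactly ΔVpgm, and the function name `ispp_program` and its parameters are hypothetical. The default 15V to 25V range and the 0.2V step are taken from the bias conditions stated earlier.

```python
from typing import List, Tuple


def ispp_program(vts: List[float], targets: List[float],
                 vpgm_start: float = 15.0, vpgm_max: float = 25.0,
                 dvpgm: float = 0.2) -> Tuple[List[float], int, bool]:
    """Run a toy ISPP loop; return (final Vts, pulse count, all cells passed)."""
    vts = list(vts)
    # Cells already at or above target are inhibited from the start.
    inhibited = [vt >= tgt for vt, tgt in zip(vts, targets)]
    vpgm = vpgm_start
    pulses = 0
    while not all(inhibited) and vpgm <= vpgm_max:
        # Program pulse: assumed Vt shift of dvpgm per non-inhibited cell.
        for i in range(len(vts)):
            if not inhibited[i]:
                vts[i] += dvpgm
        pulses += 1
        # Program-Verify: cells that reached their target are inhibited
        # so that later pulses cannot over-program them.
        for i in range(len(vts)):
            if vts[i] >= targets[i]:
                inhibited[i] = True
        vpgm += dvpgm  # step the gate pulse up for the next iteration
    return vts, pulses, all(inhibited)
```

Every loop iteration includes one Program-Verify step, so the BL precharge and Vpass exposure described above are incurred once per pulse in this model.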
A multi-state NAND memory device stores multiple bits of data per NAND cell by differentiating multiple distinct valid Vtn distributions separated by some preferred forbidden ranges such as ΔVtn. Each distinct Vtn has a distribution between Vtnmax and Vtnmin. Each ΔVtn is defined to be a value of Vtnmin of a higher-level state minus the Vtnmax of a lower-level Vtn state. Each Vtn is defined corresponding to a predetermined value for the set of data bits encoded in NAND device. As the number of bits of data per NAND cell is increased from SLC to MLC, TLC, and XLC, the number of valid Vtn states increases from 2 to 4, 8 and 16. As a result, the NAND data capacity is drastically increased, thus the die cost is greatly reduced.
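The relationship between bits per cell, the number of valid Vtn states, and the forbidden ranges ΔVtn can be checked with a few lines of Python. The function names and the example Vtn windows used below are hypothetical and only illustrate the definition of ΔVtn given above (Vtnmin of the higher state minus Vtnmax of the lower state).

```python
from typing import List, Tuple


def num_states(bits_per_cell: int) -> int:
    """Number of valid Vtn states for a cell storing `bits_per_cell` bits."""
    return 2 ** bits_per_cell


def forbidden_gaps(states: List[Tuple[float, float]]) -> List[float]:
    """Return the gap between each pair of adjacent states.

    `states` holds (vtn_min, vtn_max) per state, ordered from the lowest
    state to the highest state.
    """
    gaps = []
    for (_, lower_max), (higher_min, _) in zip(states, states[1:]):
        gaps.append(higher_min - lower_max)
    return gaps
```

For instance, with illustrative MLC windows `[(-3.0, -1.0), (0.5, 1.0), (2.0, 2.5), (3.5, 4.0)]`, the three gaps are 1.5V, 1.0V and 1.0V. Packing more states into the same Vt window shrinks these gaps, which is the reliability tradeoff the text goes on to describe.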
There is a tradeoff, however. As the number of bits stored per NAND cell increases, the programming time also increases and the NAND cell's data reliability degrades greatly. In some applications, the increased programming time and the lower data reliability cannot be accepted.
Below, the conventional NAND Read and Program-Verify operations are examined in terms of Read disturbance, Read cycle, Read current and Read latency. FIG. 2 shows typical timelines of the key control signals for properly operating the conventional NAND array seen in FIG. 1. These key control signals include BLSHF, PBRST, PLOAD, SO & BLe or SO & BLo, PBLCH, Node A, WL (the selected one) and WLs (the 31 unselected ones), etc., which control BL precharge and discharge, the charge-up of the one selected WL and the 31 unselected WLs, and the proper control sequences for the NAND data sensing and latch functions for SLC Read. For each Read operation, a predetermined VRD voltage is applied to the selected WL and a WL-pass voltage Vpass ranging from 5V to 7V is applied to the unselected N−1 WLs to turn the N−1 NAND cells into the conduction state so that the On state or Off state of the selected NAND cells can be accurately distinguished. A single VRD value of 0V is used for an SLC Read, three distinct VRD values of 0V, 1.5V, and 3V are used for an MLC Read, 7 distinct VRD values are used for a TLC Read, and 15 distinct VRD values are used for an XLC Read.
During each SLC Read from a NAND String, all the non-selected cells in the non-selected WLs or pages suffer one Vpass WL disturbance. An MLC Read causes 3 times the Vpass WL disturbance, a TLC Read 7 times, and an XLC Read 15 times. As a result, the Vpass WL disturbance becomes a more severe issue in NAND memory that stores more bits per cell. In addition, each Read of the NAND programmed states A, B and C consumes one high BLn precharge current.
Today, the average Read latency is 20 μs per page, while the Program latency is 200 μs for SLC Program and 600 μs for MLC Program. Both Read and Program operations can only be performed in units of a whole physical page in one cycle, or in two cycles for Odd and Even logic pages. These Read and Program specs have not changed for 25 years. But as NAND technology is scaled down below 2×nm and the density is increased above 256 Gb, the slow Read and Program latency becomes unacceptable for fast memory system applications. In addition, the high power consumption and the low P/E and Read cycle counts are becoming concerns.
As a result, it is highly desirable to reduce Read and Program latency and power consumption and to increase the NAND reliability and the P/E and Read cycles so that less sophisticated Error Correction Coding (ECC), DSP and Flash management tools of the Flash controller can be used at a lower cost. As an attempt to improve in this aspect, the present invention provides a HiNAND array adopting a multiple-level BL architecture and a Non-Self-Boosting-Program-Inhibit method (Non-SBPI), along with other circuits such as a Multiplier and an XOR-Comparator, to achieve faster multiple-WL and All-BL Program and Read operations.
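The 1, 3, 7 and 15 figures above follow from the number of VRD levels needed to separate 2^n states, which is 2^n − 1. A minimal Python sketch of that arithmetic follows; the names `STORAGE_BITS`, `read_levels` and `vpass_exposures` are hypothetical, and the model assumes one Vpass exposure of the non-selected WLs per VRD level, as the text describes.

```python
# Bits stored per cell for each storage type named in the text.
STORAGE_BITS = {"SLC": 1, "MLC": 2, "TLC": 3, "XLC": 4}


def read_levels(bits_per_cell: int) -> int:
    """Number of distinct VRD values needed to resolve all states."""
    return 2 ** bits_per_cell - 1


def vpass_exposures(bits_per_cell: int, num_reads: int) -> int:
    """Vpass disturbances seen by the non-selected WLs after `num_reads` Reads.

    Assumes one Vpass exposure per VRD level per Read.
    """
    return read_levels(bits_per_cell) * num_reads
```

Under this assumption, 1000 Reads expose the non-selected WLs to Vpass 1000 times for SLC but 15000 times for XLC.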
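To put these latencies in perspective, the per-page array throughput they imply can be estimated with simple arithmetic. The sketch below assumes the 8 KB physical page mentioned earlier (taken as 8192 bytes), one page per operation, and ignores I/O transfer time, so the results are rough upper bounds for a single Plane; the function name `page_throughput_mb_s` is hypothetical.

```python
def page_throughput_mb_s(page_bytes: int, latency_us: float) -> float:
    """Array throughput in MB/s (1 MB = 10**6 bytes) for one page per operation.

    Bytes per microsecond is numerically equal to megabytes per second.
    """
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return page_bytes / latency_us


PAGE_BYTES = 8192  # assumed 8 KB physical page

read_mb_s = page_throughput_mb_s(PAGE_BYTES, 20.0)          # about 409.6 MB/s
slc_program_mb_s = page_throughput_mb_s(PAGE_BYTES, 200.0)  # about 41 MB/s
mlc_program_mb_s = page_throughput_mb_s(PAGE_BYTES, 600.0)  # about 13.7 MB/s
```

Under these assumptions Program throughput is one to two orders of magnitude below Read throughput, which illustrates why Program latency is singled out as the bottleneck.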