Many types of non-volatile memory (NVM) are known in the art, among them EPROM, EEPROM, Flash (floating gate, SONOS, etc.), MRAM, and ReRAM. In general, flash memory is the most common commercially available type of NVM. Each memory cell in these devices is typically a single transistor with either a floating gate (e.g., polysilicon) or a charge-trapping layer (e.g., SONOS) fabricated between the control gate and the conduction channel. The amount of charge stored between the control gate and the channel determines the threshold voltage (Vt) of the transistor. In general, Vt (sometimes called the turn-on voltage) is the gate-to-source voltage at which the transistor changes from operating in the subthreshold mode (partially on, i.e., somewhat conductive) to operating in the inversion mode (fully on, i.e., highly conductive). Because Vt controls how much current the cell conducts at a given gate voltage, programming Vt to one of several levels allows one or more bits of data to be stored in the cell and read back by sensing the cell current at predetermined gate voltages.
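To illustrate how multiple Vt levels encode more than one bit, the following is a minimal Python sketch of reading a two-bit-per-cell device. The read reference voltages, the Gray-coded state-to-bits mapping, and the function name `read_cell` are all hypothetical values chosen for illustration; real devices use device-specific levels and mappings.

```python
# Hypothetical read reference voltages (in volts) separating the four Vt
# states of a two-bit-per-cell device; real values are device specific.
READ_LEVELS = (0.0, 1.5, 3.0)

# Assumed Gray-coded mapping from Vt state (lowest to highest) to two bits:
# adjacent states differ in one bit, and the lowest (erased) state reads "11".
STATE_TO_BITS = ("11", "10", "00", "01")


def read_cell(vt: float) -> str:
    """Return the two bits stored in a cell whose threshold voltage is vt.

    Each comparison models applying one read voltage to the control gate and
    sensing whether the cell conducts (it does not if vt exceeds that voltage).
    """
    state = sum(vt > level for level in READ_LEVELS)  # read levels exceeded
    return STATE_TO_BITS[state]


print([read_cell(v) for v in (-1.0, 1.0, 2.0, 4.0)])  # ['11', '10', '00', '01']
```

A Gray code is a common choice here because a small Vt shift into a neighboring state then corrupts only one of the two stored bits.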
NAND flash memory is the most common type of Flash NVM commercially available, being present in most cameras, cellphones, tablets, universal serial bus (USB) drives, and solid-state drives (SSD) to name a few. The demand for more and cheaper memory in these devices is a major driver of semiconductor technology, pushing NVM manufacturers towards ever finer geometries and the storage of more bits per memory cell to increase the density.
Unfortunately, this trend has a negative impact on the memory cells. Smaller geometries and the accompanying stochastic variations make the cells more delicate than those of earlier generations and thus more vulnerable to program and read disturbs (e.g., loss of data), to stress and wear from program and erase operations (e.g., data retention and endurance issues), and to physical damage (e.g., charge trapping in gate oxides, increased leakage current in memory cells and bit lines, and possibly complete failure of a cell). The move towards two or three bits per cell exacerbates the situation, since smaller Vt differences must then be sensed in each memory cell.
FIG. 1A shows a conventional NAND flash memory integrated circuit 100. Integrated circuit 100 comprises a memory array 102 which contains the non-volatile memory cells. Memory array 102 is divided into one or more planes 104. A plane 104 typically comprises a plurality of blocks 106, each of which in turn comprises a plurality of pages 108. A page 108 is typically the smallest unit of the array that can be read or programmed in a single operation, while a block 106 is typically the smallest erasable unit in both plane 104 and memory array 102. When multiple planes 104 are present in integrated circuit 100, they are typically used to simplify multiple simultaneous operations. In some NAND memory integrated circuits, an even larger unit called a Logical Unit (not shown in FIG. 1A), comprising multiple planes 104, can be present as described in the Open NAND Flash Interface Specification, revision 4.0, by the ONFI Workgroup, Apr. 2, 2014, page 67. Section 3, entitled Memory Organization, on pages 67 to 80 of the ONFI specification contains a good introduction to array architecture, and the entire ONFI 4.0 Specification is hereby included by reference herein in its entirety.
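The asymmetry described above (program by page, erase by block) can be made concrete with a small model. The following is a simplified Python sketch, not a description of any real device: the class `NandBlock` and its methods are hypothetical, and the model assumes that programming can only move bits from logic-1 to logic-0 while only a block erase returns them to logic-1 (the erased state).

```python
class NandBlock:
    """Simplified model of one NAND block: program by page, erase by block.

    Illustrative assumptions: programming can only clear bits (1 -> 0), and
    only erasing the whole block sets them back to 1 (0xFF per byte).
    """

    def __init__(self, pages_per_block: int, page_bytes: int) -> None:
        self.page_bytes = page_bytes
        self.pages = [bytearray(b"\xff" * page_bytes) for _ in range(pages_per_block)]

    def read_page(self, page: int) -> bytes:
        return bytes(self.pages[page])

    def program_page(self, page: int, data: bytes) -> None:
        if len(data) != self.page_bytes:
            raise ValueError("a program operation writes one whole page")
        current = self.pages[page]
        for i, new in enumerate(data):
            if new & ~current[i] & 0xFF:  # some bit would need a 0 -> 1 change
                raise ValueError("the block must be erased before reprogramming")
        current[:] = data

    def erase(self) -> None:
        # Erase is the only operation that returns bits to logic-1.
        for page in self.pages:
            page[:] = b"\xff" * self.page_bytes
```

Because reprogramming a page requires erasing its whole block, the block rather than the page is the natural unit for bookkeeping such as program/erase counts.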
NAND flash memory integrated circuit 100 also comprises other circuitry. Each plane 104 will have one or more page buffers 110 associated with it. Current commercial NAND flash offerings often have two, one called the page buffer (or page register) and the other called the cache buffer (or cache register), which allow two simultaneous operations either within a plane (or chip if there is only one plane) or between planes. Each page buffer 110 is divided into a main area 112 and a spare area 114 (illustrated as separated with a dashed line). Each page 108 is also divided into a main area 116 and a spare area 118 (also illustrated as separated with a dashed line). The page buffer 110, its main area 112 and its spare area 114 typically contain substantially the same number of bits as page 108, main area 116 and spare area 118, respectively. This allows the page buffer to be conveniently used as a staging area for either program or read operations. Thus data can be staged in page buffer 110 for writing to a page 108 in memory array 102 during a program operation, or staged in page buffer 110 for transfer to circuitry external to integrated circuit 100 through multiplexer 120 and Data I/O Interface 122 during a read operation. Multiplexer 120 selects between the targeted plane 104 and any non-targeted planes. Data I/O Interface 122 comprises the input and output buffers that allow integrated circuit 100 to communicate data to and from external circuitry. Details of the use of page and cache buffers as well as multiple planes can be found in the application note Improving NAND Throughput with Two-Plane and Cache Operations, Macronix International, AN0268, Rev. 1, Nov. 15, 2013, which is hereby included by reference herein in its entirety.
NAND flash memory integrated circuit 100 further comprises control logic 124 and control I/O interface 126. Control logic 124 is typically coupled to most of the internal circuitry with hundreds or thousands of signal lines (not shown in FIG. 1A) and typically comprises a large and complex state machine implemented in ASIC-type standard cells. Instructions to control logic 124 are input through input buffers in control I/O interface 126, and status information is output through output buffers also contained in control I/O interface 126.
Memory array 102 sometimes also contains special one-time programmable (OTP) blocks for use by either the manufacturer or the end user, for example user OTP block 128 and factory OTP block 130. These can be used for a variety of purposes including storing serial numbers, design revisions, process and analog calibration information, and design data. Once programmed, these blocks can be locked to prevent tampering with the OTP data.
NAND flash memory integrated circuit 100 further comprises error correction code (ECC) circuit 132 which is capable of encoding external write data in page buffer 110 for programming into a selected page 108 of array 102. ECC circuit 132 is also capable of decoding data read from page 108 into the page buffer 110 to detect and correct any errors that may have occurred since page 108 was programmed. The ECC code implemented is chosen by the manufacturer of integrated circuit 100.
Many other circuits (and thousands of related signal lines) are present in integrated circuit 100, but are not shown in FIG. 1A to avoid overly complicating the diagram. In addition to the usual memory access circuits like row and column address decoders, sense amplifiers, and the like, there are typically many analog circuits such as band gap references, operational amplifiers, digital-to-analog and analog-to-digital converters, and charge pumps present that control logic 124 uses to provide the needed voltages to memory array 102 to allow read, program (write), and erase operations.
FIG. 1B illustrates a NAND string 132 suitable for use in NAND memory integrated circuit 100. NAND string 132 comprises a series connection of N non-volatile transistors 134-0 through 134-(N−1), where N is an integer. At each end of the string are access transistors 136 and 138. Typically N is a power of two such as 32, 64 or 128, but it need not be. For example, a two-bit-per-cell NAND device might, for reliability reasons, store only one bit per cell in the two outermost devices at each end of the string, since device mismatches are typically greater at the edge of an array. To store 128 bits, such a string would then require N=66 (i.e., 62*2 bits+4*1 bit=128 bits, so N=62 devices+4 devices=66 devices).
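The string-length arithmetic above can be written as a small helper. This is a minimal sketch; the function `string_length` and its parameter names are hypothetical, and it assumes that a cell only partly used for data still occupies a full position in the string.

```python
def string_length(bits_needed: int, bits_per_cell: int,
                  reduced_cells: int, reduced_bits_per_cell: int = 1) -> int:
    """Return N, the number of non-volatile transistors in a NAND string.

    reduced_cells devices (e.g., the outermost ones) store only
    reduced_bits_per_cell bits each; all others store bits_per_cell bits.
    """
    remaining = bits_needed - reduced_cells * reduced_bits_per_cell
    full_cells, leftover = divmod(remaining, bits_per_cell)
    if leftover:
        full_cells += 1  # a partly used cell still occupies a position
    return full_cells + reduced_cells


print(string_length(128, 2, 4))  # 66: 62 two-bit cells plus 4 one-bit cells
```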
Sometimes one or more dummy cells (physically present for process matching but not used to store data) are placed at the ends of the string between non-volatile transistor 134-0 and access transistor 136 and non-volatile transistor 134-(N−1) and access transistor 138. These dummy cells are not shown in FIG. 1B to avoid overly complicating the diagram.
Coupled to the gate of each of the non-volatile transistors 134-0 through 134-(N−1) are word lines 140-0 through 140-(N−1) respectively. These are used to address individual rows of non-volatile transistors 134-i across at least a portion of memory array 102. Running parallel to the word lines 140-0 through 140-(N−1) are select lines 142 and 144 as well as source line 146. Select lines 142 and 144 are coupled to the gates of access transistors 136 and 138 respectively and are used to access NAND string 132 for program and read operations by means of source line 146 and bit line 148. Bit line 148 runs perpendicular to word lines 140-0 through 140-(N−1), select lines 142 and 144 and source line 146.
FIG. 1C is a block diagram illustrating more details of an exemplary block 106 from a plane 104 in memory array 102. Block 106 comprises a plurality of NAND strings 132 arranged in rows and columns. The NAND strings 132 in a column are coupled together by sharing a single bit line 148 and each column of NAND strings 132 has its own bit line 148. All of the bit lines 148 are coupled to block read/program circuit 150 which contains the sense amplifiers, write drivers, column selectors and various other circuits which are well known in the art and will not be discussed to avoid unnecessarily complicating the disclosure. Further information can be found in the Micron Technology Technical Note TN-29-19, NAND Flash 101: An Introduction to NAND Flash and How to Design It In to Your Next Product, Rev. B, April 2010, which is hereby included by reference herein in its entirety.
Each row of NAND strings 132 is coupled together by all of the horizontal lines detailed in the discussion of FIG. 1B above. Thus each row shares word lines 140-0 through 140-(N−1) (combined as the bus 140-[N−1:0] in the figure) as well as select lines 142 and 144 and source line 146. Various other circuits such as block select logic, row decoders, etc., which are well known in the art, are not shown in FIG. 1C to avoid overcomplicating the diagram.
Typically, an entire page 108 is stored in a single row of non-volatile transistors 134. Thus N pages 108 are stored in each row of NAND strings 132. Addressing a particular page 108 requires selecting a particular row of NAND strings 132 by means of select lines 142 and 144 and then choosing the word line from word line bus 140-[N−1:0] that corresponds to the desired page. The block 106 includes both main area 116 and spare area 118 (illustrated as separated with a dashed line), as previously discussed in conjunction with FIG. 1A.
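The two-step page selection described above can be sketched as a simple index split. This Python sketch assumes a hypothetical ordering in which pages are numbered word line by word line within one row of NAND strings before moving on to the next row; real devices may order pages differently (for example, to interleave multi-level-cell pages).

```python
from __future__ import annotations


def locate_page(page_in_block: int, n_word_lines: int, string_rows: int) -> tuple[int, int]:
    """Map a page index within a block to (row of NAND strings, word line).

    Assumes pages are numbered word line by word line within one row of
    strings before moving on to the next row (a hypothetical ordering).
    """
    if not 0 <= page_in_block < n_word_lines * string_rows:
        raise ValueError("page index is outside the block")
    # The row is chosen with select lines 142/144, the word line from 140-[N-1:0].
    return divmod(page_in_block, n_word_lines)


print(locate_page(70, 64, 4))  # (1, 6): second row of strings, word line 140-6
```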
As geometries shrink, error correction coding (ECC) of increasing strength is typically used to mitigate the various problems associated with the smaller transistors. Redundant information in the form of additional bits (ECC overhead bits) is added to the data to allow reconstruction of the original data in the case of errors. There is a trade-off between the number of bad bits the ECC can correct and the amount of additional ECC overhead bits added.
Many different ECC codes have been used in NAND flash memories (e.g., Hamming codes, Reed-Solomon codes, and Bose-Chaudhuri-Hocquenghem (BCH) codes), though the current trend appears to be towards using BCH codes due to their higher efficiency relative to Reed-Solomon codes and ability to correct an arbitrarily large number of incorrect bits with predictable overhead. A good introduction to the subject can be found in a Micron Technology white paper entitled ECC Options for Improving NAND Flash Memory Reliability, Rev. C, January 2012, by Marina Mariano (henceforth Mariano), hereby included by reference herein in its entirety.
BCH codes are linear block codes and can be systematically encoded. A block code comprises code words of equal length, and for a linear code the encoding and decoding can be calculated using binary polynomials and linear algebra. Systematic encoding means that the resulting code word can be arranged so that the original data word appears unchanged within it. This allows any characteristics of the original data to be exploited in other advantageous ways while retaining the benefits of error correction. A further discussion of the characteristics of BCH codes can be found in Error Detection and Correction Using the BCH Code, Hank Wallace, Atlantic Quality Design, Inc., 2001, which is hereby included by reference herein in its entirety.
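Systematic encoding can be illustrated with a small textbook code rather than the much longer codes used in NAND devices. The following Python sketch uses the binary BCH(15, 7) double-error-correcting code with generator polynomial g(x) = x^8 + x^7 + x^6 + x^4 + 1; the choice of this code and the function names are illustrative assumptions, and decoding is omitted. The data word is shifted up by the number of parity bits and the remainder of division by g(x) is appended, so the data bits appear unchanged in the upper part of the code word.

```python
# Generator polynomial of the binary BCH(15, 7) code (t = 2):
# g(x) = (x^4 + x + 1)(x^4 + x^3 + x^2 + x + 1) = x^8 + x^7 + x^6 + x^4 + 1
G = 0b111010001
N, K = 15, 7
R = N - K  # number of parity (ECC overhead) bits


def poly_mod(value: int, divisor: int) -> int:
    """Remainder of GF(2) polynomial division (bit i is the coefficient of x^i)."""
    dlen = divisor.bit_length()
    while value.bit_length() >= dlen:
        value ^= divisor << (value.bit_length() - dlen)
    return value


def bch_encode(data: int) -> int:
    """Systematically encode a 7-bit data word into a 15-bit code word."""
    if not 0 <= data < (1 << K):
        raise ValueError("data must fit in K bits")
    shifted = data << R                    # m(x) * x^(n-k)
    return shifted | poly_mod(shifted, G)  # append the parity remainder


cw = bch_encode(0b1011001)
print(f"{cw:015b}")  # the top 7 bits are the original data word
```

Every code word produced this way is divisible by g(x), which is the property a decoder checks (via syndromes) to detect errors.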
Table 200 in FIG. 2A shows exemplary data from Mariano illustrating some of the tradeoffs in using BCH codes of varying strength (i.e., how many single-bit errors can be corrected in a data word). In general, the stronger the code, the greater the cost in additional ECC overhead bits.
As discussed above, NAND flash memory devices typically have their memory arrays separated into two parts: a main area comprising a number of bytes of user data (typically 512, 2,048 or 4,096 bytes in a page) and a spare area containing management information such as the ECC overhead bits, program and erase history to enable wear leveling, bad block data if a failure occurs, user metadata, etc.
The data in table 200 assumes that a page has a 2,048-byte main area with a spare area of either 64 bytes or 112 bytes, for a total of 2,112 or 2,160 bytes per page respectively, though other values can be used. The ECC encoding is assumed to be implemented in 512-byte sectors (i.e., four sectors per page).
In table 200, column 202 indicates the number of bits to be corrected in each 512-byte sector, column 204 shows the BCH code overhead in ECC bits for that sector, and column 206 shows the number of bytes needed to contain that number of ECC bits. Columns 208 and 210 show the percentage of 64-byte and 112-byte spare areas, respectively, used by the needed ECC bits for all four sectors. For example, if four-bit ECC is desired, the overhead is 52 bits per sector, which can be packed into 7 bytes (with four bits unused). Since there are four sectors per page, a total of 28 bytes is required in the spare area. This amounts to 44% of a 64-byte spare area and 25% of a 112-byte spare area. A usage greater than 100% in either column 208 or column 210 means that the combination of that error correction strength and that spare area size cannot be implemented in the available hardware. For example, 10-bit ECC would require 17 bytes per sector, or a total of 68 bytes per page, which cannot be accommodated in a 64-byte spare area but would fit in a 112-byte spare area.
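The arithmetic behind table 200 can be reproduced with a few lines of Python. This sketch assumes that each corrected bit costs 13 BCH parity bits (a code over GF(2^13), which is consistent with the 52 bits quoted for 4-bit correction and the 17 bytes quoted for 10-bit correction of a 512-byte sector); the function name `ecc_spare_usage` is hypothetical.

```python
from __future__ import annotations

import math

# Assumption: BCH over GF(2^13), so each corrected bit costs 13 parity bits.
# This matches the 52 bits quoted for 4-bit correction of a 512-byte sector.
GF_DEGREE = 13
SECTORS_PER_PAGE = 4  # 2,048-byte main area split into 512-byte sectors


def ecc_spare_usage(t: int, spare_bytes: int) -> tuple[int, int, float]:
    """Return (ECC bytes per sector, ECC bytes per page, % of spare area used)."""
    bits_per_sector = t * GF_DEGREE
    bytes_per_sector = math.ceil(bits_per_sector / 8)  # pack each sector separately
    bytes_per_page = bytes_per_sector * SECTORS_PER_PAGE
    return bytes_per_sector, bytes_per_page, 100.0 * bytes_per_page / spare_bytes


print(ecc_spare_usage(4, 64))   # (7, 28, 43.75)
print(ecc_spare_usage(10, 64))  # (17, 68, 106.25): does not fit in 64 bytes
print(ecc_spare_usage(10, 112))
```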
Table 220 in FIG. 2B illustrates an exemplary address map of page 108 comprising a main area of 2,048 bytes (=2 kilobytes or 2 KB) and a spare area of 64 bytes. Each row shows the relevant data for a particular field. For example, by far the largest field is the User Data shown on line 236, since it fills the entire main area. The 64 bytes of the spare area are shown broken into five different fields on lines 238 through 246. This is merely an example; many different address maps could be used in which the size, order and presence of fields may differ.
In table 220, column 222 shows the reference numbers (236 through 246) for each line in the table, while columns 224, 226, and 228 show the first byte of the field's address within the page 108, the last byte, and the size of the field, respectively. Column 230 indicates whether or not the field is ECC protected. Column 232 indicates whether the field is in the main or spare area, and column 234 describes the contents of each field.
The largest field is the user data on line 236 which in this example occupies the entire main area. The second largest field is on line 238 and comprises the 32 bytes assigned for ECC overhead bits. In the example from Mariano in FIG. 2A with 4-bit ECC correction, 28 bytes would be needed to store the BCH overhead bits for 2 KB of data. Since a portion of the spare area may also be ECC protected, an additional four bytes are allocated for those additional ECC overhead bits raising the total number of bytes to 32.
Metadata is data that describes something about other data. Thus a card catalogue (or an electronic equivalent) contains metadata about (the data in) the books in a library. The front matter in each book such as the copyright notice, publication history, table of contents, etc. (and arguably even its title, author's name, and cover art) are also metadata with respect to the book's contents.
In the context of non-volatile memories, metadata is typically information stored in the spare area that is relevant to the primary data stored in the main area, though it may also be used for other things. For example, if the primary data is a continuous stream of measurements from a sensor, then the associated metadata could contain such information as time stamps for the start of the measurements, the duration of the interval between measurements, information about how to read or interpret the data (e.g., the length of a data word), and so on. This can be thought of as user metadata because it pertains to data stored by or for the user of the memory.
Another type of metadata is factory metadata. This is typically data in the spare area that relates to the function of the NVM itself rather than to the specific data in the main area of the associated page or block. Examples of this type of data include bad block data (e.g., warning the user that the block or page has been physically damaged and is unreliable) and wear data (e.g., recording the number of program/erase cycles a block, and by implication its pages, has undergone to allow wear leveling).
Metadata can be protected by ECC or not as a matter of design choice. In the example illustrated in table 220, line 240 shows that 16 bytes are used for user metadata that is to be ECC protected, and line 242 shows that eight bytes are used for user metadata that is not ECC protected. The ECC overhead bits for the ECC-protected metadata on line 240 are included in the 32 bytes allotted for ECC overhead bits on line 238.
Lines 244 and 246 in the example in table 220 show four bytes each allocated for Factory Metadata and for Reserved bytes (e.g., unused or unallocated in case an additional field is needed at some time in the future). Reserved bytes are typically kept in the erased state, which in flash technology is defined as every bit of every byte set to logic-1.
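The first-byte and last-byte columns of an address map such as table 220 follow directly from the field sizes. This Python sketch uses the field sizes given in the text; laying the fields out contiguously in the order of lines 236 through 246 is an assumption for illustration, and `build_address_map` is a hypothetical helper.

```python
from __future__ import annotations

# (field, size in bytes, area) from table 220. Laying the fields out
# contiguously in this order is an assumption made for illustration.
FIELDS = [
    ("User Data", 2048, "main"),
    ("ECC overhead bits", 32, "spare"),
    ("User Metadata (ECC protected)", 16, "spare"),
    ("User Metadata (not ECC protected)", 8, "spare"),
    ("Factory Metadata", 4, "spare"),
    ("Reserved", 4, "spare"),
]


def build_address_map(fields: list[tuple[str, int, str]]) -> list[tuple[str, int, int, int]]:
    """Return (field, first byte, last byte, size) for each field in order."""
    layout, address = [], 0
    for name, size, _area in fields:
        layout.append((name, address, address + size - 1, size))
        address += size
    return layout


for name, first, last, size in build_address_map(FIELDS):
    print(f"{name:36} {first:5} {last:5} {size:5}")
```

A quick consistency check is that the spare-area fields sum to 64 bytes and the last byte of the page is at address 2,111.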
Preferred state encoding (PSE) is known in the art. The technique is used to encode data to statistically favor either logic-1 (typically the logic state with a higher voltage) or logic-0 (typically the logic state with a lower voltage) to achieve an advantage that is a function of the sensitivity of underlying hardware to the effects of the higher or lower voltages used to define the binary data. Typically the cost is the addition of an inversion bit per data word encoded to track whether the data in that particular word is inverted or not.
Referring to FIG. 3, table 300 illustrates an example of encoding a four-bit initial data word into a five-bit encoded data word favoring logic-0. The four columns 302 contain the 16 possible binary values for the four-bit initial data word. Column 304 shows the weight of the initial data word (i.e., the number of logic-1 bits present) which is obtained by adding the bits in the four columns 302 in each row. The average weight of all 16 four-bit data words is 2.00 as shown at the bottom of column 304.
The five columns 306 show the resulting five-bit encoded data word and column 308 shows the weights of the encoded data words with the average (1.56) shown at the bottom. Note that the values in columns E<3:0> are identical to columns D<3:0> when E<4>=logic-0 and are inverted when E<4>=logic-1. Thus bit E<4> is known as the inversion bit for the PSE encoded data E<4:0>. The ratio of the average weight of the encoded data words to the weight of the initial data words is shown at the bottom of column 310 to be 0.78 or 78%. Alternatively this can be thought of as a 22% statistical improvement in whatever condition motivates the preferred state encoding. Note that the average weight per word drops even though there are 25% more bits stored in each word relative to the initial data.
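The encoding of table 300 can be expressed as a short encoder and decoder. This is a minimal Python sketch favoring logic-0, with the inversion bit placed above the data bits as E<4> is in table 300; the function names are hypothetical. A word is inverted only when doing so strictly lowers the weight once the inversion bit itself is counted, which leaves weight-2 four-bit words unchanged.

```python
def pse_encode(data: int, width: int) -> int:
    """Encode a width-bit word favoring logic-0; bit `width` is the inversion bit."""
    mask = (1 << width) - 1
    data &= mask
    ones = bin(data).count("1")
    # Inverting leaves (width - ones) data bits at logic-1 plus the inversion
    # bit itself, so invert only when that strictly lowers the weight.
    if (width - ones) + 1 < ones:
        return (1 << width) | (~data & mask)
    return data


def pse_decode(code: int, width: int) -> int:
    """Recover the original word, undoing the inversion if bit `width` is set."""
    mask = (1 << width) - 1
    data = code & mask
    return (~data & mask) if (code >> width) & 1 else data


weights = [bin(pse_encode(d, 4)).count("1") for d in range(16)]
print(sum(weights) / 16)  # 1.5625 (table 300 rounds this to 1.56)
```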
Referring to FIG. 4, table 400 shows the efficiency of preferred state encoding for initial data words of 4, 8, 16, 32, 64 and 128 bits. While a data word of arbitrary length could be encoded, word lengths that are powers of two are typically of the most interest in the memory arts. Column 402 shows the number of initial bits per word, column 404 shows the improvement in the average weight of the preferred state encoded words, column 406 shows the percentage improvement, and column 408 shows the percentage area cost of adding the inversion bit. In general, the shorter the initial data word, the higher the percentage cost of the additional inversion bit and the more effective the PSE encoding.
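The trend summarized above can be computed exactly rather than by enumerating every word. This Python sketch counts words of each weight with binomial coefficients and applies the same invert-only-if-it-helps rule as table 300 (favoring logic-0). The function name is hypothetical, and the sketch is not claimed to reproduce the exact figures of table 400, though for 4-bit words it gives the 0.78 ratio of table 300.

```python
from fractions import Fraction
from math import comb


def pse_weight_ratio(width: int) -> Fraction:
    """Exact ratio of average encoded weight to average initial weight (width/2)."""
    total = 0
    for w in range(width + 1):
        inverted = (width - w) + 1  # weight after inversion, counting the extra bit
        total += comb(width, w) * (inverted if inverted < w else w)
    avg_encoded = Fraction(total, 2 ** width)
    return avg_encoded / Fraction(width, 2)


for width in (4, 8, 16, 32, 64, 128):
    ratio = pse_weight_ratio(width)
    print(f"{width:4} bits: ratio {float(ratio):.3f}, area cost {100 / width:.2f}%")
```

The output shows both halves of the trade-off: the ratio creeps towards 1 (less benefit) as words get longer, while the area cost of the inversion bit falls.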
In a well-known example from the DRAM art, the JEDEC standard for DDR4 SDRAM (JESD79-4, September 2012) gives the user the option of programming the data mask bits to act instead as the inversion bit of a preferred state encoding scheme favoring logic-1. DDR4 data bits are implemented in a pseudo-open drain fashion. Thus a data bit in the logic-1 state will draw no power after transitioning high, while a logic-0 state will still draw power after transitioning low. This allows the user to save power at the cost of losing the masking function.
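A byte-lane version of this logic-1-favoring scheme can be sketched as follows. This Python sketch reflects the DDR4 data bus inversion behavior as commonly described (invert the byte when more than four of its eight bits are logic-0, signaled by driving the active-low DBI_n line low); it is an illustration only, and JESD79-4 remains the authority on the exact rules. The function names are hypothetical.

```python
from __future__ import annotations


def dbi_encode(byte: int) -> tuple[int, int]:
    """Return (value driven on DQ[7:0], DBI_n) for one byte lane.

    Favors logic-1: if more than four data bits are 0, the byte is inverted
    and DBI_n is driven low (active low) to flag the inversion.
    """
    byte &= 0xFF
    zeros = 8 - bin(byte).count("1")
    if zeros > 4:
        return ~byte & 0xFF, 0
    return byte, 1


def dbi_decode(dq: int, dbi_n: int) -> int:
    """Undo the inversion at the receiver when DBI_n is low."""
    return dq if dbi_n else ~dq & 0xFF


print(dbi_encode(0x00))  # (255, 0): all-zero data is sent as all ones
```

Counting the DBI_n line itself, at most four of the nine lines are ever driven low, which is where the power saving comes from.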
In another DRAM example, U.S. Pat. No. 9,007,866 to Fisch et al. teaches using preferred state encoding favoring logic-0 inside a DRAM array to minimize array leakage current, reduce refresh frequency, and improve reliability, since logic-1 bits (higher voltage) lose more charge to leakage than logic-0 bits (lower voltage). The cost is an extra data bit per encoded word in the memory array.
It will be appreciated by one skilled in the art that the preference for logic-1 or logic-0 depends on the underlying circuitry and that it may be advantageous to use different preferred state encoding in different parts of a system. It will also be appreciated that the advantages are statistical when amortized over the entirety of the data stored.
Preferred state encoding is also known in the non-volatile memory (NVM) art. In U.S. Pat. Nos. 7,525,864 and 7,990,796 to Brown, preferred state encoding is used to reduce the sense current required by preferentially storing data in the state requiring the least sense current during a read.
In U.S. Pat. No. 7,518,922 to Maejima et al. (henceforth Maejima), the preferred state technique is used to minimize the current necessary to charge the bit lines in various operations in a NAND flash part using both single level cells (SLC, i.e., one bit stored per cell) and multi-level cells (MLC, i.e., two bits stored per cell). Maejima further teaches that the encoding function can be performed in the NAND flash integrated circuit itself or in a memory controller.
U.S. Pat. No. 8,014,196 to Graef teaches that NVM devices employing MLCs can reduce the total programming energy by increasing the number of bits used to hold the data and only partially programming the MLCs.
In U.S. Pat. No. 8,756,464 to Kim et al., preferred state encoding is used for wear leveling the memory array by choosing the preferred logic state based on the stress to the memory cell during programming. This has the effect of extending the life of the memory by statistically increasing the number of program/erase cycles before the NVM wears out.
Unfortunately, each of these approaches addresses only a single issue and thus lacks flexibility. Further, they represent design choices for a particular use of preferred state encoding made by the NVM manufacturer, based on assumptions about the best way for customers to use their NVM products. In practice, such a choice may or may not be optimal for any particular system designed by a user of the NVM integrated circuit or module.