Solid-state memory capable of nonvolatile storage of charge, particularly in the form of EEPROM and flash EEPROM packaged as a small form factor card, has become the storage of choice in a variety of mobile and handheld devices, notably information appliances and consumer electronics products. Unlike RAM (random access memory), which is also solid-state memory, flash memory is non-volatile, retaining its stored data even after power is turned off. Also, unlike ROM (read only memory), flash memory is rewritable, similar to a disk storage device. In spite of the higher cost, flash memory is increasingly being used in mass storage applications. More recently, flash memory in the form of solid-state disks (“SSD”) is beginning to replace hard disks in portable computers as well as in fixed location installations. Conventional mass storage, based on rotating magnetic media such as hard drives and floppy disks, is unsuitable for the mobile and handheld environment. This is because disk drives tend to be bulky, are prone to mechanical failure and have high latency and high power requirements. These undesirable attributes make disk-based storage impractical in most mobile and portable applications. On the other hand, flash memory, whether embedded or in the form of a removable card or SSD, is ideally suited to the mobile and handheld environment because of its small size, low power consumption, high speed and high reliability.
Flash EEPROM is similar to EEPROM (electrically erasable and programmable read-only memory) in that it is a non-volatile memory that can be erased and have new data written or “programmed” into its memory cells. Both utilize a floating (unconnected) conductive gate, in a field effect transistor structure, positioned over a channel region in a semiconductor substrate, between source and drain regions. A control gate is then provided over the floating gate. The threshold voltage characteristic of the transistor is controlled by the amount of charge that is retained on the floating gate. That is, for a given level of charge on the floating gate, there is a corresponding voltage (threshold) that must be applied to the control gate before the transistor is turned “on” to permit conduction between its source and drain regions. In particular, flash memory such as Flash EEPROM allows entire blocks of memory cells to be erased at the same time.
The floating gate can hold a range of charges and therefore can be programmed to any threshold voltage level within a threshold voltage window. The size of the threshold voltage window is delimited by the minimum and maximum threshold levels of the device, which in turn correspond to the range of the charges that can be programmed onto the floating gate. The threshold window generally depends on the memory device's characteristics, operating conditions and history. Each distinct, resolvable threshold voltage level range within the window may, in principle, be used to designate a definite memory state of the cell.
Current commercial products configure each storage element of a flash EEPROM array to store either a single bit of data or more than a single bit of data. A single-level-cell (SLC) memory has each cell storing a single bit of data by operating in a binary mode, where a single reference level differentiates between two ranges of threshold levels of each storage element.
The threshold levels of transistors correspond to ranges of charge levels stored on their storage elements. In addition to shrinking the size of the memory arrays, the trend is to further increase the density of data storage of such memory arrays by storing more than one bit of data in each storage element transistor. A multi-level-cell (MLC) memory has each cell storing more than a single bit of data by operating in a multi-level mode, where two or more reference levels differentiate between more than two ranges of threshold levels of each storage element. For example, commercial flash memory products now operate in four states (2 bits of data per storage element), eight states (3 bits of data per storage element) or 16 states (4 bits of data per storage element). Each storage element memory transistor has a certain total range (window) of threshold voltages in which it may practically be operated, and that range is divided into the number of states defined for it plus margins between the states to allow for them to be clearly differentiated from one another. The more bits a memory cell is configured to store, the smaller the margin of error within which it has to operate.
The transistor serving as a memory cell is typically programmed to a “programmed” state by one of two mechanisms. In “hot electron injection,” a high voltage applied to the drain accelerates electrons across the substrate channel region. At the same time a high voltage applied to the control gate pulls the hot electrons through a thin gate dielectric onto the floating gate. In “tunneling injection,” a high voltage is applied to the control gate relative to the substrate. In this way, electrons are pulled from the substrate to the intervening floating gate. While the term “program” has been used historically to describe writing to a memory by injecting electrons to an initially erased charge storage unit of the memory cell so as to alter the memory state, it is now used interchangeably with more common terms such as “write” or “record.”
The memory device may be erased by a number of mechanisms. For EEPROM, a memory cell is electrically erasable by applying a high voltage to the substrate relative to the control gate so as to induce electrons in the floating gate to tunnel through a thin oxide to the substrate channel region (i.e., Fowler-Nordheim tunneling). Typically, the EEPROM is erasable byte by byte. For flash EEPROM, the memory is electrically erasable either all at once or one or more minimum erasable blocks at a time, where a minimum erasable block may consist of one or more sectors and each sector may store 512 bytes or more of data.
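The relationship between bits per cell, number of states and reference levels can be illustrated with a minimal sketch. The 0 V to 6 V window, the evenly spaced reference levels and the function names are assumptions made for illustration only; real devices place levels unevenly and reserve margins between states.

```python
from bisect import bisect_right

def reference_levels(bits_per_cell, v_min, v_max):
    """Evenly spaced reference levels splitting [v_min, v_max] into 2**bits ranges.

    Hypothetical model: n states need n - 1 reference levels.
    """
    states = 2 ** bits_per_cell
    width = (v_max - v_min) / states
    return [v_min + width * i for i in range(1, states)]

def read_state(threshold_voltage, levels):
    """Return the index of the threshold range containing the cell's threshold."""
    return bisect_right(levels, threshold_voltage)

def range_width(bits_per_cell, v_min, v_max):
    """Width of each threshold range; it halves with every added bit."""
    return (v_max - v_min) / (2 ** bits_per_cell)
```

Under these assumptions, a 1-bit cell uses one reference level and 3 V per range, while a 4-bit cell uses fifteen reference levels and only 0.375 V per range, which is the shrinking margin of error described above.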
The memory device typically comprises one or more memory chips that may be mounted on a card. Each memory chip comprises an array of memory cells supported by peripheral circuits such as decoders and erase, write and read circuits. The more sophisticated memory devices also come with a controller that performs intelligent and higher level memory operations and interfacing. More recently, the memory devices in the form of SSD are being offered commercially in the form factor of a standard hard drive.
There are many commercially successful non-volatile solid-state memory devices being used today. These memory devices may be flash EEPROM or may employ other types of nonvolatile memory cells. Examples of flash memory and systems and methods of manufacturing them are given in U.S. Pat. Nos. 5,070,032, 5,095,344, 5,315,541, 5,343,063, 5,661,053, 5,313,421 and 6,222,762. In particular, flash memory devices with NAND string structures are described in U.S. Pat. Nos. 5,570,315, 5,903,495 and 6,046,935.
Nonvolatile memory devices are also manufactured from memory cells with a dielectric layer for storing charge. Instead of the conductive floating gate elements described earlier, a dielectric layer is used. Such memory devices utilizing dielectric storage element have been described by Eitan et al., “NROM: A Novel Localized Trapping, 2-Bit Nonvolatile Memory Cell,” IEEE Electron Device Letters, vol. 21, no. 11, Nov. 2000, pp. 543-545. An ONO dielectric layer extends across the channel between source and drain diffusions. The charge for one data bit is localized in the dielectric layer adjacent to the drain, and the charge for the other data bit is localized in the dielectric layer adjacent to the source. For example, U.S. Pat. Nos. 5,768,192 and 6,011,725 disclose a nonvolatile memory cell having a trapping dielectric sandwiched between two silicon dioxide layers. Multi-state data storage is implemented by separately reading the binary states of the spatially separated charge storage regions within the dielectric.
Flash Memory Characteristics and Trends
Flash memory behaves quite differently from traditional disk storage or RAM. First, existing data stored in the flash memory cannot be updated by simply being overwritten. Each cell must first be erased before a new write can take place on it. Consequently the update is always written to a new free location. To improve performance, a group of cells are operated on in parallel to access data page by page. When a page of data is updated by having the updated page written to a new location, the superseded page is rendered invalid and obsolete and becomes garbage cluttering the storage and will eventually be cleaned out to free up the space it is occupying.
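The out-of-place update behavior can be sketched as a toy model. The class name `PageStore` and its fields are hypothetical; the sketch assumes a flat array of physical pages that are written once and never overwritten in place.

```python
class PageStore:
    """Toy model: a physical page is written once; updates go to a free page."""

    def __init__(self, num_pages):
        self.data = [None] * num_pages     # physical page contents
        self.valid = [False] * num_pages   # True if the page holds current data
        self.next_free = 0                 # next never-written physical page
        self.location = {}                 # logical page -> physical page

    def write(self, logical_page, payload):
        if self.next_free >= len(self.data):
            raise RuntimeError("no free page; space must be reclaimed first")
        old = self.location.get(logical_page)
        if old is not None:
            self.valid[old] = False        # superseded copy becomes garbage
        new = self.next_free
        self.data[new] = payload
        self.valid[new] = True
        self.location[logical_page] = new
        self.next_free += 1
        return new

    def read(self, logical_page):
        return self.data[self.location[logical_page]]
```

Writing the same logical page twice leaves the first physical copy in place but marked invalid, which is the garbage that must eventually be cleaned out.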
Managing the updates and discarding the invalid ones is complicated by the block structure of flash memory. It is relatively time consuming to erase flash memory, so to improve erase performance the memory is organized into erase blocks, where a whole block of memory cells is erased together simultaneously. A block generally contains a number of pages. As data is stored in a block page by page, eventually some of that data becomes obsolete. This means the block will contain much garbage data taking up space. However, the block can only be erased as a unit, so before the garbage data can be erased with the block, the valid data in the block must first be salvaged and copied into another block. This operation is commonly referred to as garbage collection and is an overhead of the block structure of the flash memory. The larger the block, the more time is required for the garbage collection. Similarly, the more frequently the data in the block is updated, the more frequently the block will need to be garbage collected. Garbage collection is preferably performed in the foreground, such as during a write operation. This obviously will degrade the write speed.
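The garbage collection step described above can be sketched as follows. This is a simplified illustration, not a controller implementation: the slot layout `(logical_page, payload, valid)` and the function name are assumptions, and the spare block is assumed to be fully erased and at least as large as the victim.

```python
def garbage_collect(victim, spare):
    """Salvage valid pages from `victim` into the erased `spare`, then erase `victim`.

    Each block is a list of slots; a slot is None (erased) or a tuple
    (logical_page, payload, valid). Returns the number of pages copied,
    which is the extra write work caused by the block structure.
    """
    copied = 0
    for slot in victim:
        if slot is not None and slot[2]:
            spare[copied] = slot           # copy valid data into the next free slot
            copied += 1
    for i in range(len(victim)):
        victim[i] = None                   # the block is erased only as a unit
    return copied
```

The cost returned by the function grows with the number of valid pages left in the victim block, which is why larger blocks take longer to collect.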
Early applications of flash memory have been mainly for storing media files such as music and video files for portable hosts. These files tend to be long runs of data with sequential logical addresses, which fill up the memory block by block. These data are archival in nature and not subject to much updating. Thus, the block structure works well for this type of data and there is little performance hit during writing since there is seldom need for garbage collection. The orderly sequential-address nature of the data allows the logical address range to be partitioned into logical groups, with each logical group aligned with an erase block in the sense that the data of a logical group will fit neatly in a block. In this way, the addressing granularity is mainly at the block level, as a page with a given logical address can be located by identifying which block stores the logical group it belongs to. Since the logical group is stored in the block in a self-indexed manner with its logical addresses in sequential order, the page can be quickly located.
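Block-level addressing with self-indexed logical groups can be illustrated with a short sketch. The value of `PAGES_PER_BLOCK` and the mapping `group_to_block` are hypothetical; the sketch assumes every logical group is stored intact and in sequential order in exactly one block.

```python
PAGES_PER_BLOCK = 64  # hypothetical block size, in pages

def locate(logical_page, group_to_block):
    """Find a page using only a per-group table entry.

    The logical group number selects the block; the position inside the
    block follows from the sequential ordering, so no per-page entry is needed.
    """
    group, offset = divmod(logical_page, PAGES_PER_BLOCK)
    return group_to_block[group], offset
```

For example, with groups 0, 1 and 2 stored in blocks 5, 9 and 3, logical page 130 belongs to group 2 and is found at offset 2 of block 3.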
The block management system implementing logical groups typically deals with updates and non-sequential writes by tracking them at the page level. It budgets a predetermined amount of resource for the page level tracking, which manifests as a limit on the number of logical groups having non-sequential or obsolete data. Generally, when subject to updates, some of the orderly blocks will contain obsolete data and keeping track of them will also consume part of the resource. When over the budget, a selected block with non-sequential or obsolete data is restored back to an orderly block in sequential order. This is accomplished by rewriting into a new block in sequential order with the latest updates. However, the relocation will exact a performance hit. Such a system will work well if a host writes data that are conducive to maintaining mostly such orderly blocks being tracked at the block level, with only some random writes being tracked at the page level. Thus, by implementing logical groups aligned to block boundaries, the address table is greatly simplified and reduced.
However, the block management system implementing logical groups will begin to be less optimized if the host writes mostly short and non-sequential data. This type of write pattern is prevalent in applications from a personal computer or smart mobile device. Solid-state disk (SSD) using flash memory is an attractive replacement for disk storage due to its low power, speed and ruggedness. Instead of long sequential writes, the flash memory must now deal mostly with short random writes. Initially, the performance will not suffer since, as long as free space can be found, the data can be written there. However, with constant use and frequent updates, the predetermined resource for page tracking will eventually be exhausted. At that point, performance can take a big hit as the next write may have to be accompanied by a relocation of a block. The larger the block, the longer it will take to perform the relocation. Also, a large block combined with short and non-sequential data will cause the logical group in the block to contain invalid data more frequently and consume the page addressing resource faster, and therefore cause relocation to take place more frequently.
The problem with the large block size cannot be easily solved by simply reducing the block size, as the block size tends to increase geometrically with each new generation of memory technology. With higher integration of circuits, more memory cells are being fitted in the same die. The block size, measured in columns and rows, increases geometrically. This is especially the case for memory of the NAND type. The memory is an array of NAND strings where each string is a daisy chain of memory cells and a minimum erase block must be formed by a row of such NAND strings. If the NAND string has 32 cells, a block will contain 32 rows of cells. The number of memory cells in a NAND string also increases with each generation, so the block size increases column-wise and row-wise.
The block size, which is dictated by the physical memory structure, is in the present generation as large as 4 MB. On the other hand, the operating system of personal computers typically allocates logical sectors of 512 bytes in size and often writes a page as a cluster of logical sectors in 4 kB units. Thus, there is a great mismatch in the addressing granularity of a logical group corresponding to a block and a page. In the scheme of logical groups, the ideal situation for a block is that either nothing is written or the block is filled up sequentially with the entire logical group of valid data. In either case there is no fragmentation and there is no need for garbage collection or relocation. In the case of short random writes into a large block, the block becomes non-ideal very quickly and eventually will need relocation. This amounts to inefficient writes since the same page may have to be written and then re-copied one or more times (also referred to as “write amplification”).
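The granularity mismatch and the resulting write amplification can be put into numbers with a brief sketch. Binary units (1 kB = 1024 bytes) are assumed, and the worst case shown, where one host page triggers the copy of every other page in the block, is an illustrative assumption rather than a measured figure.

```python
BLOCK_BYTES = 4 * 1024 * 1024   # 4 MB block, binary units assumed
PAGE_BYTES = 4 * 1024           # 4 kB host page (cluster)

pages_per_block = BLOCK_BYTES // PAGE_BYTES

def write_amplification(host_pages, copied_pages):
    """Ratio of pages physically written to pages the host asked to write."""
    return (host_pages + copied_pages) / host_pages
```

Here `pages_per_block` is 1024. If a single updated page forces relocation of the other 1023 valid pages, `write_amplification(1, 1023)` gives 1024.0, whereas a block filled sequentially with no relocation gives 1.0.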
An alternative, conventional addressing approach suitable for short random writes is to not use logical groups, but to track every page independently as it is being written to a block. Instead of maintaining the stored data as an orderly logical group in a block, each page is tracked as to which block it is stored in and at what offset in the block. Thus, in this page addressing scheme, there is no burden of storing or maintaining data in groups in order of sequential logical addresses. However, the page addressing scheme will have an address table much larger than that for the logical group address scheme. For example, if there are 1000 pages in a block, then the address table for the page addressing scheme will be approximately 2 to 3 orders of magnitude larger.
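A page addressing table and its size relative to a logical group table can be sketched as follows. The class `PageMap` and the function `table_entries` are hypothetical names, and the entry counts ignore any per-entry overhead.

```python
class PageMap:
    """Per-page tracking: every logical page has its own (block, offset) entry."""

    def __init__(self):
        self.table = {}                    # logical page -> (block, offset)

    def record_write(self, logical_page, block, offset):
        self.table[logical_page] = (block, offset)

    def locate(self, logical_page):
        return self.table[logical_page]

def table_entries(total_pages, pages_per_block):
    """Entry counts for (logical group scheme, page addressing scheme)."""
    group_entries = total_pages // pages_per_block   # one entry per block-sized group
    page_entries = total_pages                       # one entry per page
    return group_entries, page_entries
```

With 1,000,000 pages and 1000 pages per block, the group table needs 1000 entries and the page table 1,000,000, a factor of 1000.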
The page addressing scheme exacts a penalty in terms of a much larger address table. In practice, it will require more system resources and a relatively large RAM to work with the memory controller. This is because the address table is usually maintained in flash memory but is cached to the controller RAM during operation to provide faster access. Current technology allows at most 2 to 4 MB of RAM to be fabricated on the controller chip. This is insufficient for systems using a page addressing scheme, and additional external RAM chips will be required. The additional pinouts and interface circuits to support external RAM chips would add significantly to the cost.
Another problem with addressing granularity having very small units, such as 4 kB, is that it creates fragmented data, which is scattered among the blocks so much that maximum parallelism during read and data copy (due to update) is not achievable. Also, the amount of copying increases, as a small update can still trigger the copying of one or more entire blocks.
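The RAM pressure can be estimated with a rough calculation. The 64 GB device capacity and the 4-byte table entry are assumptions chosen only for illustration; binary units are used throughout.

```python
def table_bytes(capacity_bytes, unit_bytes, entry_bytes=4):
    """Address table size when one entry of `entry_bytes` maps each `unit_bytes`."""
    return (capacity_bytes // unit_bytes) * entry_bytes

CAPACITY = 64 * 2**30                               # hypothetical 64 GB device
page_table = table_bytes(CAPACITY, 4 * 2**10)       # one entry per 4 kB page
group_table = table_bytes(CAPACITY, 4 * 2**20)      # one entry per 4 MB logical group
ON_CHIP_RAM = 4 * 2**20                             # upper figure of 4 MB from the text
```

Under these assumptions the page table occupies 64 MB, sixteen times the 4 MB of on-chip RAM, while the logical group table occupies only 64 kB.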
Thus, there is a need to provide a nonvolatile memory that can efficiently handle data access characterized by short random writes into large blocks without suffering from the disadvantages and problems mentioned above.