The following references illustrate the state of the art:
[1] J. Brewer, M. Gill, “Nonvolatile memory technologies with emphasis on flash”, IEEE Press Series on Microelectronic Systems, 2008.
[2] B. Lee, E. Ipek, O. Mutlu, and D. Burger, “Architecting phase change memory as a scalable DRAM alternative”, in ISCA-36, 2009.
[3] Laura M. Grupp et al., “The Bleak Future of NAND Flash Memory”, 10th USENIX Conf. on File and Storage Technologies (FAST), 2012.
[4] Laura M. Grupp, Adrian M. Caulfield, Joel Coburn, Steven Swanson, “Characterizing Flash Memory: Anomalies, Observations, and Applications”, MICRO'09.
[5] Samsung Electronics, “K9NBG08U5M 4 Gb*8 Bit NAND Flash Memory Data Sheet”.
[6] Samsung Electronics, “K9GAG08U0M 2 Gb*8 Bit NAND Flash Memory Data Sheet”.
[7] S. Lee, K. Ha, K. Zhang, J. Kim, and J. Kim, “FlexFS: A Flexible Flash File System for MLC NAND Flash Memory”, USENIX Annual Technical Conference, 2009.
[8] K. Takeuchi et al., “A multipage cell architecture for high-speed programming multilevel NAND flash memories”, Journal of Solid-State Circuits (JSSC), 1998.
[9] R. L. Rivest and A. Shamir, “How to reuse a write-once memory”, Information and Control, vol. 55, nos. 1-3, pp. 1-19, 1982.
[10] A. Jiang, R. Mateescu, M. Schwartz and J. Bruck, “Rank Modulation for Flash Memories”, IEEE Transactions on Information Theory, vol. 55, no. 6, pp. 2659-2673, June 2009.
[11] K. D. Suh et al., “A 3.3V 32 Mb NAND flash memory with incremental step pulse programming scheme”, ISSCC, pp. 128-129, 1995.
[12] PCMARK-VANTAGE, “White paper v1.0”, http://www.futuremark.com/benchmarks/pcmarkvantage/support/
[13] F. Bedeschi, R. Fackenthal, C. Resta, E. Donze et al., “A bipolar-selected phase change memory featuring multi-level cell storage”, IEEE Journal of Solid-State Circuits, vol. 44, no. 1, pp. 217-227, 2009.
[14] M. Joshi, Wangyuan Zhang, Tao Li, “Mercury: A fast and energy-efficient multi-level cell based Phase Change Memory system”, IEEE High Performance Computer Architecture (HPCA'11), 2011.
[15] J. Hu et al., “Write Activity Minimization for Nonvolatile Main Memory Via Scheduling and Recomputation”, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 30, 2011.
[16] T. Nirschl et al., “Write strategies for 2 and 4-bit multi-level phase-change memory”, in IEDM '07: Proceedings of the 2007 IEEE International Electron Devices Meeting, 2007.
[17] J.-T. Lin, Y.-B. Liao, M.-H. Chiang, and W.-C. Hsu, “Operation of multi-level phase change memory using various programming techniques”, in Proc. IEEE Int. Conf. on IC Design and Technology, May 2009, pp. 199-202.
[18] HanBin Yoon, Naveen Muralimanohar, Justin Meza, Onur Mutlu, Norman P. Jouppi, “Data Mapping for Higher Performance and Energy Efficiency in Multi-Level Phase Change Memory”, NVMW'12.
[19] Moinuddin K. et al., “Improving Read Performance of Phase Change Memories via Write Cancellation and Write Pausing”, IEEE High Performance Computer Architecture (HPCA'10), 2010.
[20] G. Hemink et al., “Fast and accurate programming method for multi-level NAND EEPROMs”, Symp. on VLSI Tech., pp. 129-130, 1995.
[21] K. D. Suh et al., “A 3.3V 32 Mb NAND flash memory with incremental step pulse programming scheme”, ISSCC, pp. 128-129, 1995.
[22] M. Grossi et al., “Program Schemes for Multilevel Flash Memories”, Proceedings of the IEEE, vol. 91, no. 4, 2003.
[23] H. Kim et al., “A 159 mm² 32 nm 32 Gb MLC NAND-Flash Memory with 200 MB/s Asynchronous DDR Interface”, in IEEE International Solid-State Circuits Conference (ISSCC), 2010.
[24] T. Tanaka et al., “A quick intelligent page-programming architecture and a shielded bitline sensing method for 3V-only NAND flash memory”, IEEE J. Solid-State Circuits, vol. 29, no. 11, Nov. 1994.
[25] T. Hara et al., “A 146 mm² 8 Gb NAND flash memory with 70 nm CMOS technology”, Intl. Solid-State Circuits Conf. (ISSCC), pp. 44-45, 2006.
[26] S. Chang et al., “A 48 nm 32 Gb 8-level NAND flash memory with 5.5 MB/s program throughput”, in IEEE International Solid-State Circuits Conference (ISSCC), 2009.
[27] A. Berman, Y. Birk, “Constrained Flash Memory Programming”, Intl. Symp. on Information Theory (ISIT), 2011.
[28] K. Takeuchi et al., “A 56 nm CMOS 99 mm² 8 Gb multi-level NAND flash memory with 10 Mbyte/sec program throughput”, ISSCC, 2006.
[29] C. Trinh et al., “A 5.6 MB/s 64 Gb 4 b/Cell NAND Flash Memory in 43 nm CMOS”, in IEEE International Solid-State Circuits Conference (ISSCC), 2009.
[30] Joowon Hwang et al., “A middle-1× nm NAND flash memory cell (M1×-NAND) with highly manufacturable integration technologies”, IEEE International Electron Device Meeting (IEDM), 2011.
[31] K. Imamiya et al., “A 130 mm 256 Mb NAND flash with shallow trench isolation technology”, ISSCC, pp. 112-113, 1999.
[32] T. Futatsuyama et al., “A 113 mm² 32 Gb 3 b/cell NAND Flash memory”, in IEEE International Solid-State Circuits Conference (ISSCC), 2009.
[33] Yan Li et al., “128 Gb 3 b/Cell NAND Flash Memory in 19 nm Technology with 18 MB/s Write Rate and 400 Mb/s Toggle Mode”, in IEEE International Solid-State Circuits Conference (ISSCC'12), 2012.
NAND Flash is currently the most prominent non-volatile semiconductor memory technology, used mostly for storage [1]. Phase-Change Memory (PCM) is viewed by some as a possible replacement for DRAM [2]. Both Flash and PCM employ multi-level cells (MLC) [1,2], and designers strive to increase density by reducing cell size and increasing the number of levels. (Single-level cells (SLC), namely cells with an “erased” level and a single non-erased level, capable of holding a single bit of information, are also used.)
Performance implications of MLC
Flash MLC programming (writing) entails several steps: first, a data page is transferred from the host to an on-chip memory buffer; next, a high-voltage pulse (program pulse) is applied to the cells being programmed. The impact of a program pulse may vary from cell to cell due to manufacturing variations. Also, decreasing a cell's level entails applying voltage to the bulk, so it cannot be performed on individual cells. Consequently, over-programming of a cell must be avoided, or at least kept to a minimum so that the resulting errors can be corrected by error correction codes at reasonable cost. Programming is therefore carried out via a sequence of small pulses, each followed by a read that verifies the cell's level. The program-verify cycle is repeated until the desired levels are reached [1].
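The program-verify cycle described above can be sketched as a small simulation. This is only an illustrative model, not the programming algorithm of any actual device: the threshold scale, the per-pulse step range, and the names `program_verify`, `step_range` and `level_width` are all assumptions introduced here.

```python
import random

def program_verify(target_level, step_range=(0.05, 0.15), level_width=1.0, seed=0):
    """Raise a simulated cell to target_level with small pulses, verifying after each.

    All units and parameters are hypothetical; the random step models the
    cell-to-cell variation in the effect of a program pulse.
    """
    rng = random.Random(seed)
    threshold = 0.0                               # erased cell (arbitrary units)
    verify_voltage = target_level * level_width   # assumed verify reference for the target
    pulses = 0
    # Verify read: a single reference comparison against the target's reference.
    while threshold < verify_voltage:
        threshold += rng.uniform(*step_range)     # one small program pulse
        pulses += 1
    return threshold, pulses

if __name__ == "__main__":
    for level in range(4):
        final, pulses = program_verify(level)
        print(f"target level {level}: {pulses} pulses, final threshold {final:.3f}")
```

Because each pulse is small, the overshoot past the verify reference is bounded by the largest single step, while the number of pulses (and hence the write time) grows with the target level.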
TABLE 1
                        PCM       NAND Flash
Read latency     SLC    10 ns     25 μs
                 MLC    44 ns     50 μs
Write latency    SLC    100 ns    200 μs
                 MLC    395 ns    900 μs
Table 1 lists the read and write latencies of SLC and 4-level MLC in PCM and Flash memories [3,4,5,6].
Write latency increases with the number of levels. As seen in Table 1, it grows faster than the number of levels: for Flash, doubling the number of levels from 2 to 4 raises the write latency from 200 μs to 900 μs, a 4.5× increase.
A cell's level is determined by applying a reference voltage to the cell and comparing the cell's threshold voltage against it. While each verify read during a write entails a single reference comparison, determining a cell's level during a read requires multiple reference comparisons, each with a different reference voltage. Therefore, read latency also increases with the number of levels [3] (Table 1).
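As a rough illustration of why reads slow down as levels are added, the following sketch counts the reference comparisons needed to resolve one cell's level. It assumes a binary search over sorted reference voltages, which is one possible sensing order and not necessarily what a given device does; the function name `read_level` and the voltage values are hypothetical.

```python
def read_level(threshold, references):
    """Return (level, comparisons) for one cell.

    `references` is a sorted list of hypothetical reference voltages; a cell
    with n reference voltages can be at any of n + 1 levels.
    """
    lo, hi = 0, len(references)      # the level lies in the range [lo, hi]
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1             # one reference comparison
        if threshold >= references[mid]:
            lo = mid + 1             # cell is above this reference
        else:
            hi = mid                 # cell is below this reference
    return lo, comparisons

if __name__ == "__main__":
    print(read_level(0.8, [0.5]))            # SLC: a single comparison
    print(read_level(1.7, [0.5, 1.5, 2.5]))  # 4-level MLC: two comparisons
```

Under this assumption an SLC read needs one comparison and a 4-level read needs two, which is at least consistent with the doubling of Flash read latency in Table 1.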
The move to MLC, while beneficial in terms of storage capacity and cost per bit, comes at a performance penalty. Moreover, because capacity increases while performance decreases, the drop in “normalized” performance (performance relative to capacity) is dramatic. There is therefore a real need for schemes that can mitigate the performance drop.
Another problem with MLC is endurance, namely the permissible number of erasure cycles that a cell may undergo before it degrades. Endurance can be 10× lower for 4-level cells than for 2-level cells. This invention does not directly address endurance.
The key to all schemes for mitigating the performance drop, specifically the increase in read and/or write latency, is the following observation: if the maximum, over the cells being accessed, of the current cell level (for a read) or of the target cell level (for a write) is known, then time can be saved. For example, if the maximum target level is 2, then no time need be spent on reaching level 3 or above. Similarly, if it is known when reading that all cells are at one of the first two levels, the number of reference comparisons can be reduced accordingly.
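The read-side saving can be illustrated with a simple model in which each reference voltage is applied to all cells of a page in one sensing step. The model, the function name `read_page`, and the voltage values are assumptions made for illustration only.

```python
def read_page(thresholds, references, max_level=None):
    """Return (levels, sensing_steps) for a page of cells.

    Every usable reference is applied once to the whole page. If the maximum
    level present in the page is known, references above it are skipped.
    """
    usable = references if max_level is None else references[:max_level]
    # A cell's level is the number of references its threshold meets or exceeds.
    levels = [sum(t >= r for r in usable) for t in thresholds]
    return levels, len(usable)

if __name__ == "__main__":
    refs = [0.5, 1.5, 2.5]        # hypothetical references of a 4-level cell
    page = [0.1, 1.2, 0.9]        # all cells are at level 0 or level 1
    print(read_page(page, refs))               # full read: three sensing steps
    print(read_page(page, refs, max_level=1))  # bounded read: one sensing step
```

Both calls return the same levels, but the bounded read uses a single sensing step because it is known in advance that no cell exceeds level 1.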
In FlexFS [7], the file system dynamically decides whether to use any given physical page as SLC or MLC. Use in SLC mode increases endurance and accelerates access. In all modes, any given cell contains data belonging to a single data page. The number of cells per data page varies with the number of levels being used, reflecting the change in cell capacity and keeping a fixed logical (data) page size. In any case, a page (and, in fact, the entire physical block of cells containing it) must be erased when switching its mode.
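The relation between the number of levels and the number of cells per logical page can be made concrete with a short calculation. This is a sketch under the assumption that the number of levels is a power of two and that a page is simply spread over as many cells as its bits require; `cells_per_page` is a hypothetical helper and is not part of FlexFS.

```python
def cells_per_page(page_bits, levels):
    """Cells needed to hold a fixed-size logical page at a given number of levels."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two, at least 2")
    bits_per_cell = levels.bit_length() - 1   # log2(levels) for a power of two
    return -(-page_bits // bits_per_cell)     # ceiling division

if __name__ == "__main__":
    page_bits = 4096 * 8                      # an assumed 4 KiB logical page
    print(cells_per_page(page_bits, 2))       # SLC mode
    print(cells_per_page(page_bits, 4))       # 4-level MLC mode
```

With a fixed logical page size, SLC mode occupies twice as many cells per page as 4-level mode.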
In Multipage Programming (MP) [8], each 4-level cell is shared between two pages. A physical page's capacity equals twice that of a logical page. The two logical pages sharing a physical page are typically written one at a time. The content of the first page written determines one bit of a cell's level number, and the content of the second page determines the other bit. When writing the second page, one must first read the cell to determine its current level, as the cell's final level is determined by both pages' bits. MP has several salient features: 1) when writing the first of the two “partner” pages, only the two lower levels are used, so writing is as fast as for SLC; 2) as long as the second page has not been written, reading the first one is also fast; 3) no erasure is required when switching from SLC to MLC; and 4) once the second page has been written, reading of both pages slows down, as one must determine the exact level of the cell, which may be any of the four levels.
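The two-step write of MP can be sketched as follows. The mapping used here (first page supplies the low bit of the level number, second page supplies the high bit) is one simple assignment that satisfies the description above; actual devices may use a different mapping, and the function names are hypothetical.

```python
def write_first_page(bits):
    """Write the first partner page: each bit selects level 0 or level 1."""
    return list(bits)

def write_second_page(levels, bits):
    """Write the second partner page on top of the first.

    The current level of each cell is read first; a 1 bit raises the cell by
    two levels, so cell levels only ever increase and no erasure is needed.
    """
    return [level + 2 * bit for level, bit in zip(levels, bits)]

def read_pages(levels):
    """Recover both logical pages from the final cell levels."""
    first = [level & 1 for level in levels]
    second = [level >> 1 for level in levels]
    return first, second

if __name__ == "__main__":
    first_page = [1, 0, 1, 0]
    second_page = [0, 0, 1, 1]
    cells = write_first_page(first_page)            # uses levels 0 and 1 only
    cells = write_second_page(cells, second_page)   # may use all four levels
    print(cells)
    print(read_pages(cells))
```

After the first write the cells occupy only the two lower levels, so they can be written and read as quickly as SLC; after the second write, recovering either page requires resolving the exact level among four.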
It is important to note that both MP and our new scheme, MMLP, are fundamentally different from the various coding schemes used to permit multiple writes to MLC pages between erasures. (Examples of the latter include WOM codes [9] and Rank Modulation [10].) In those coding schemes the old content is lost, whereas both MP and MMLP add information without destroying the old content.