In computer architecture, the memory hierarchy is a concept used in organizing storage and in discussing performance issues related to architectural design, algorithm predictions, and lower-level programming. The memory hierarchy distinguishes each level by response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies.
In the context of this disclosure, the memory hierarchy of interest consists of (a) processor registers, (b) caches (L1, L2, L3, etc.), (c) main memory, and (d) disk storage. For memory devices that are “farther” from the CPU (or “lower” in the memory hierarchy), the capacity is larger and the response time is longer. The capacity of these memory devices is on the order of (a) less than Kilobytes, (b) Megabytes to tens of Megabytes, (c) tens of Gigabytes, and (d) Terabytes, respectively. The response time of these memory devices is on the order of (a) sub-nanoseconds, (b) nanoseconds, (c) tens of nanoseconds, and (d) tens of milliseconds for random access of Hard Disk Drives (HDDs), respectively. In recent years, one of the major technology advancements in the memory hierarchy has been the wider adoption of solid-state disks (SSDs), built with NAND flash, which improve the disk response time to tens of microseconds.
Even with SSDs, there is still a big gap in response time between (c) and (d). On top of that, (a)-(c) are “byte-addressable” (although in practice, memory hierarchy levels (b)-(c) are often addressed in 64-byte units), while memory hierarchy level (d) is “block-addressable” with a minimum block size of 4K bytes. In computer terms, the former is a “memory access” while the latter is a “storage access” or “I/O (Input/Output) access”. The different access semantics and block transfer size also increase the overhead of accessing the disk.
One approach to avoiding disk access as much as possible, in order to improve performance, is to increase the main memory capacity. However, for cost and power reasons, there is a bound to this investment, especially as Moore's Law scaling for DRAM will no longer be able to reduce cost and power much further. Furthermore, given the overwhelming trend toward cloud computing and big data applications, the data size of interest keeps growing, and hence simply increasing main memory capacity will lose this foot race.
Besides the differences in response time and capacity, there is another significant difference between memory and disk. Memory is volatile, while disks (SSDs or HDDs) are non-volatile. When power is lost, the memory content is lost, while the disk content is kept. It is very important for online transaction processing (OLTP) to write the results to some non-volatile storage to formally complete the transaction, as a safeguard against unexpected power loss. This is another reason why disk operations are necessary. How to interact efficiently with disks without slowing down operation performance has been an active topic of research and development by computer scientists for decades.
It is therefore intuitively obvious that it would be ideal to have a memory device that has the response time and byte-addressable property of memory, and the capacity and non-volatile property of disks. This kind of memory is generally referred to as Storage Class Memory (SCM) (G. Burr et al., “Overview of candidate device technologies for storage-class memory”, IBM Journal of Research and Development 52(4/5): pp. 449-464, June 2008). Over the past many years, there have been numerous attempts by different companies and research groups to develop SCMs using different materials, processes, and circuit technologies. Some of the most prominent examples of SCMs to date include Phase Change Random Access Memory (PCRAM), Resistive Random Access Memory (RRAM), and Spin-transfer torque Magnetic Random Access Memory (STT-MRAM). Recently, Intel™ and Micron™ announced an advanced SCM that is claimed to be “1000 times faster than NAND flash and 10 times denser than DRAM”. If SCMs become available, many believe that a natural place for them in the memory hierarchy will be between memory hierarchy level (c) and memory hierarchy level (d) mentioned above, to bridge the gap in between.
One common characteristic of SCMs, which is also shared by NAND flash, is that these memory devices have finite write endurance. Since the function of a memory device is to support writing and reading data, finite write endurance means that the device cannot be written indefinitely. The number of times a location can be written varies among the different kinds of SCMs and NAND flash. Recent TLC (Triple-Level Cell) 3D NAND flash may endure as few as several thousand writes. SCMs usually can endure a few orders of magnitude more writes than NAND flash, but are still orders of magnitude worse than conventional DRAM (whose write endurance is usually quoted at around 10^15).
One important (and arguably necessary) technique that needs to be developed for any memory device with finite write endurance is called wear leveling. If a particular memory location is written more times than the write endurance allows, then that memory location cannot be used reliably for subsequent memory operations. Hence, to prolong the lifetime of such memory devices, it is best to write to every memory location about the same number of times (hence the “wear” is “leveled”). But since the addressing pattern is application dependent and cannot be steered to observe the equal-wear constraint, it is up to the memory subsystem to perform wear leveling without the cooperation or awareness of the host application.
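The benefit of leveling can be illustrated with a rough estimate. The following Python sketch is not part of the disclosed technique; the function name `seconds_to_wear_out` and all device parameters are hypothetical, and it assumes the two extremes of a single hot location (no leveling) and perfectly even spreading (ideal leveling).

```python
def seconds_to_wear_out(endurance, writes_per_second, num_locations, leveled):
    """Estimate time until some location exceeds its write endurance.

    Assumes the worst case without leveling (every write hits one location)
    and ideal leveling otherwise (writes spread perfectly evenly).
    """
    tolerated_writes = endurance * num_locations if leveled else endurance
    return tolerated_writes / writes_per_second


# Hypothetical device: 2**20 locations, endurance of 10**6 writes each,
# hammered by an application issuing 10**6 writes per second.
unleveled = seconds_to_wear_out(10**6, 10**6, 2**20, leveled=False)
leveled = seconds_to_wear_out(10**6, 10**6, 2**20, leveled=True)
print(f"without leveling: {unleveled:.0f} s")
print(f"with ideal leveling: {leveled / 86400:.1f} days")
```

Under these assumed numbers, the hot location wears out after one second without leveling, while ideal leveling stretches the same write stream to roughly twelve days. Real devices fall somewhere between the two extremes.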
Simply put, a wear-leveling technique needs to be able to write to a different memory location than the one requested by the application program. Otherwise, the wear is determined by the addressing pattern of the application program and the memory device has no defense. NAND flash in general uses a table to map the logical address (which the application program wants to write to) to the physical address (which the NAND flash actually writes to). The same logical address can be mapped to different physical addresses at different times, and hence the wear is leveled. Recall that NAND flash is a block-addressable device with a minimum addressing unit of 4K bytes; hence the mapping table can be constructed at a size of about 0.1% of the storage size (e.g., for 1 Terabyte of SSD, the table is about 1 Gigabyte). Since this table lookup is on the critical path of NAND flash memory access, the table is usually implemented in a faster technology such as DRAM.
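The table-mapping idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the scheme of any particular NAND flash controller: the class name `TableWearLeveler`, the pool of spare blocks, and the "least-worn free block" policy are all hypothetical choices made for illustration, and erase and garbage-collection behavior is ignored.

```python
class TableWearLeveler:
    """Toy logical-to-physical mapping table with least-worn allocation."""

    def __init__(self, num_logical, num_physical):
        if num_physical <= num_logical:
            raise ValueError("need at least one spare physical block")
        # Initially, logical block i maps to physical block i.
        self.l2p = list(range(num_logical))
        # The remaining physical blocks form the free (spare) pool.
        self.free = set(range(num_logical, num_physical))
        self.wear = [0] * num_physical    # write count per physical block
        self.data = [None] * num_physical

    def write(self, logical, value):
        old = self.l2p[logical]
        # Redirect the write to the least-worn free block (ties: lowest index).
        new = min(self.free, key=lambda p: (self.wear[p], p))
        self.free.remove(new)
        self.free.add(old)                # the old block returns to the pool
        self.l2p[logical] = new           # update the mapping table
        self.data[new] = value
        self.wear[new] += 1

    def read(self, logical):
        return self.data[self.l2p[logical]]


# An application that rewrites the same logical block 80 times.
wl = TableWearLeveler(num_logical=4, num_physical=8)
for i in range(80):
    wl.write(0, i)
print(wl.read(0), wl.wear)
```

In this run the 80 writes to one logical address are spread over five physical blocks (the original block plus four spares), so no single block absorbs all of the wear. Blocks holding data that is never rewritten are not moved in this sketch; a real scheme would also need to relocate such static data to level wear across the whole device.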
Unfortunately, this general table-mapping wear-leveling technique cannot be used for SCMs. Because SCMs are byte-addressable, with an addressing unit much smaller than the 4K-byte block, a table-mapping wear-leveling technique would require a table on the same order of size as the storage itself. This defeats the purpose and negates the advantage of SCMs.
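The scaling problem can be checked with simple arithmetic. The Python sketch below is an illustration only: the function name `mapping_table_bytes` is hypothetical, and it assumes one table entry per addressing unit, with each entry just wide enough (rounded up to whole bytes) to hold a physical unit index.

```python
def mapping_table_bytes(storage_bytes, unit_bytes):
    """Size of a flat logical-to-physical table, one entry per unit."""
    entries = storage_bytes // unit_bytes
    # Bits needed to index any unit, rounded up to whole bytes per entry.
    bits = max(1, (entries - 1).bit_length())
    entry_bytes = (bits + 7) // 8
    return entries * entry_bytes


STORAGE = 2**40  # 1 Terabyte (binary), as in the SSD example
for unit in (4096, 64, 1):
    table = mapping_table_bytes(STORAGE, unit)
    print(f"unit {unit:>4} B: table is {table / STORAGE:.4%} of storage")
```

With 4K-byte units the table is 1 Gigabyte, about 0.1% of the storage, matching the SSD example. With 64-byte units it grows to several percent, and with 1-byte units the table under these assumptions is larger than the storage it maps.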
In this disclosure, I describe a wear-leveling invention that is designed for SCMs. To summarize the motivation of this invention, SCMs are being sought as the ideal new addition to the memory hierarchy, one that has the response time and byte-addressable property of memory and the capacity and non-volatile property of disks. The present invention provides an innovative wear-leveling technique to address the write endurance issue for SCMs.