1. Field of the Invention
The present invention relates to flash memory storage devices, and, in particular, to garbage collection routines for solid state disks (SSDs).
2. Description of the Related Art
Flash memory is a type of non-volatile memory that is electrically erasable and re-programmable. Flash memory is primarily used in memory cards and USB flash drives for general storage and transfer of data between computers and other digital products. Flash memory is a specific type of electrically erasable programmable read-only memory (EEPROM) that is programmed and erased in large blocks. One commonly employed type of flash memory technology is NAND flash memory. NAND flash memory forms the core of the flash memory available today, especially for removable universal serial bus (USB) storage devices known as USB flash drives, as well as most memory cards. NAND flash memory exhibits fast erase and write times, requires small chip area per cell, and has high endurance. However, the I/O interface of NAND flash memory does not provide full address and data bus capability and, thus, generally does not allow random access to memory locations.
There are three basic operations for NAND devices: read, write and erase. The read and write operations are performed on a page by page basis. Page sizes are generally 2^N bytes, where N is an integer, with typical page sizes of, for example, 2,048 bytes (2 KB), 4,096 bytes (4 KB), 8,192 bytes (8 KB) or more per page. Pages are typically arranged in blocks, and an erase operation is performed on a block by block basis. Typical block sizes are, for example, 64 or 128 pages per block. Pages must be written sequentially, usually from a low address to a high address. Lower addresses cannot be rewritten until the block is erased.
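By way of illustration only, the page and block constraints described above might be modeled as in the following sketch. The class name NandBlock, its methods, and the default of 64 pages per block are hypothetical choices made for this example and do not describe any particular device.

```python
class NandBlock:
    """Toy model of one NAND block: pages are programmed in order and erased together."""

    def __init__(self, pages_per_block=64):
        self.pages = [None] * pages_per_block  # None stands for an erased page
        self.next_page = 0                     # lowest page that may still be programmed
        self.erase_count = 0

    def program(self, page, data):
        # Pages are written from low to high addresses; a lower address
        # cannot be rewritten until the whole block is erased.
        if page < self.next_page:
            raise ValueError("page cannot be rewritten until the block is erased")
        self.pages[page] = data
        self.next_page = page + 1

    def read(self, page):
        return self.pages[page]

    def erase(self):
        # Erase operates on the whole block, never on a single page.
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1
```

In this sketch, an attempt to program page 0 a second time raises an error until erase() has been called on the block.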
A hard disk is addressed linearly by logical block address (LBA). A hard disk write operation provides new data to be written to a given LBA. Old data is over-written by new data at the same physical LBA. NAND flash memories are accessed analogously to block devices, such as hard disks. NAND devices address memory linearly by page number. However, each page might generally be written only once since a NAND device requires that a block of data be erased before new data is written to the block. Thus, for a NAND device to write new data to a given LBA, the new data is written to an erased page that is a different physical page from the page previously used for that LBA. Therefore, NAND devices require device driver software, or a separate controller chip with firmware, to maintain a record of mappings of each LBA to the current page number where its data is stored. This record mapping is typically managed by a flash translation layer (FTL) in software that might generate a logical to physical translation table. The flash translation layer corresponds to the media layer of software and/or firmware controlling an HDD.
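The out-of-place write behavior described above might be sketched as follows. This is a minimal illustration, not a description of any actual flash translation layer; the class SimpleFtl, its fields, and the append-only allocation policy are assumptions made for the example.

```python
class SimpleFtl:
    """Minimal logical-to-physical map with out-of-place writes (illustrative only)."""

    def __init__(self, total_pages):
        self.flash = [None] * total_pages  # physical pages; None stands for erased
        self.l2p = {}                      # LBA -> current physical page number
        self.stale = set()                 # physical pages holding out-of-date data
        self.next_free = 0                 # next erased page (assumed append-only)

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no erased pages left; blocks must be reclaimed first")
        old = self.l2p.get(lba)
        if old is not None:
            self.stale.add(old)            # the old copy is left in place, not overwritten
        self.flash[self.next_free] = data
        self.l2p[lba] = self.next_free     # remap the LBA to the new physical page
        self.next_free += 1

    def read(self, lba):
        return self.flash[self.l2p[lba]]
```

Writing the same LBA twice leaves two physical copies in flash; only the mapping record determines which copy is current.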
Associated with each page is a spare area (typically 100-500 bytes) generally used for storage of error correction code (ECC) information and for storage of metadata used for memory management. The ECC is generally needed for detecting and correcting errors in the user data stored in the page, and the metadata is used for mapping logical to and from physical addresses. As such, the additional bytes of memory are “hidden” from the user and are not available for storing data. The first block (block 0) of a flash die is generally provided from the manufacturer error-free, and is commonly used by designers to include program code and associated metadata for block management.
Typically, for high capacity solid state disks (SSDs), several design tradeoffs might be considered when implementing a method to maintain a logical to physical translation table. These tradeoffs typically include: efficient random access memory (RAM) usage; efficient flash usage; fast address lookup for both read operations and write operations; and fast reconstruction of the translation table on device startup.
Several techniques are known in the art for maintaining the logical to physical translation table. One such approach is known as direct page mapping, an example of which is described in the paper by Andrew Birrell & Michael Isard, et al., A DESIGN FOR HIGH-PERFORMANCE FLASH DISKS, ACM SIGOPS Operating Systems Review, Vol. 41, Issue 2, pp. 88-93, (April 2007), which is incorporated herein by reference in its entirety (hereinafter “Birrell”). Direct page mapping maintains a lookup table in RAM having an entry for each flash page, and a summary page for metadata at the end of each block, from which the logical to physical translation table may be reconstructed at startup. For example, a direct page mapped translation table might contain, for every LBA, a logical sector number corresponding to a physical block number and a physical page number. Thus, direct page mapping comprises a single-level logical-to-physical translation. The summary page for each block might contain the LBA and valid bits for each page in the block so that the translation table can be reconstructed at startup. Thus, the direct page mapping scheme requires a large amount of RAM (on the order of 1-2 MB per GB of user storage) to store the translation table, which can become burdensome for higher capacity SSDs.
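The RAM requirement and the startup reconstruction described above might be illustrated as follows. The 4-byte table entry and the layout of the summary page, a list of (LBA, valid) pairs per block, are assumptions made for this sketch and are not taken from Birrell.

```python
def direct_map_ram_bytes(capacity_bytes, page_bytes, entry_bytes=4):
    """RAM needed for one table entry per flash page (entry size is an assumption)."""
    return (capacity_bytes // page_bytes) * entry_bytes


def rebuild_table(summary_pages):
    """Rebuild the LBA -> (block, page) table from per-block summary pages.

    summary_pages maps a physical block number to a list with one
    (lba, valid) pair per page in that block (hypothetical layout).
    """
    table = {}
    for block, entries in summary_pages.items():
        for page, (lba, valid) in enumerate(entries):
            if valid:
                table[lba] = (block, page)
    return table


GIB = 1 << 30
MIB = 1 << 20
for page_bytes in (2048, 4096):
    print(page_bytes, direct_map_ram_bytes(GIB, page_bytes) / MIB, "MiB per GiB")
```

Under these assumptions, 4,096-byte pages yield 1 MiB of table per GiB of storage and 2,048-byte pages yield 2 MiB, which is consistent with the range stated above.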
Another approach is known as block mapping. Block mapping generally classifies blocks as either data blocks (D-blocks) or update blocks (U-blocks). The total size of the D-blocks is the effective storage space for user data. U-blocks are invisible to users. Generally, when a write command cannot be accommodated in the D-block corresponding to the LBA, a U-block is allocated to receive the new data and the old data in the D-block is invalidated. Subsequent writes to that D-block will be received by the allocated U-block. When the U-block becomes full, another U-block might be allocated, or the U-block might be merged with the original D-block. Thus, block mapping maintains a lookup table in RAM that maps a logical block to a physical block. Block mapping lacks a page-level map, relying instead on the typical case that data is stored in sequential order within the block. For example, a block mapped translation table might contain a logical sector number corresponding to a logical block number and a logical page number. The logical block number can be translated into a physical block number and the logical page number might correspond to a physical offset within the physical block. Thus, block mapping comprises a two-level logical-to-physical translation. The size of the translation table is proportional to the number of blocks in the flash memory, thus requiring less RAM than a page mapped translation table.
However, because block mapping does not have a page-level map, the flash media may be inefficiently utilized when the data access workload is non-sequential. For non-sequential data access workloads, block mapping might require data to be copied and re-written numerous times to maintain the correct mapping. An example of block mapping is described in the paper by Jeong-Uk Kang & Heeseung Jo, et al., A SUPERBLOCK-BASED FLASH TRANSLATION LAYER FOR NAND FLASH MEMORY, Proceedings of the 6th ACM & IEEE International Conference On Embedded Software, pp. 161-170, (Oct. 22-25, 2006), which is incorporated herein by reference in its entirety (hereinafter “Kang”).
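The two-level block-mapped translation described above might be sketched as follows. The constant PAGES_PER_BLOCK, the function name, and the assumption that one logical sector occupies one page are illustrative only.

```python
PAGES_PER_BLOCK = 64  # assumed geometry


def block_map_lookup(logical_sector, block_table):
    """Translate a logical sector number using a block-level table only."""
    # Level 1: split the sector number into a logical block and a page offset.
    logical_block, page_offset = divmod(logical_sector, PAGES_PER_BLOCK)
    # Level 2: only the block number is translated; the offset is reused as-is.
    physical_block = block_table[logical_block]
    return physical_block, page_offset
```

Because the page offset is fixed by the logical address, updating a single page out of order cannot be expressed in the table, which is why non-sequential workloads might force data to be copied and rewritten.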
A third approach for maintaining the logical-to-physical translation table is known as superblock mapping. Superblock mapping groups together a set number of adjacent logical blocks into a superblock. Superblock mapping maintains a page global directory (PGD) in RAM for each superblock. Page middle directories (PMDs) and page tables (PTs) are maintained in flash. Each LBA can be divided into a logical block number and a logical page number, with the logical block number comprising a superblock number and a PGD index offset. The logical page number comprises a PMD index offset and a PT index offset. Each entry of the PGD points to a corresponding PMD. Each entry of the PMD points to a corresponding PT. The PT contains the physical block number and the physical page number of the data. Superblock mapping, thus, comprises a four-level logical-to-physical translation and provides page-mapping.
The PMDs and PTs are stored in the spare areas of the flash pages to provide page-mapping without using an excessive amount of RAM. However, because the spare area is used to store page-level mapping information, less memory is available for error correction codes (ECC). Further, the limited amount of memory available in the spare area precludes storing complicated mapping information. Finally, reconstruction of the translation table at startup can be time-intensive. An example of a superblock mapping scheme is described in Kang.
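The division of an LBA into a superblock number, a PGD index, a PMD index, and a PT index might be sketched as follows. The field widths and the use of nested in-memory dictionaries to stand in for the directories are assumptions made for this illustration and are not taken from Kang.

```python
# Assumed field widths in bits (illustrative only).
PT_BITS = 3    # PT index offset
PMD_BITS = 3   # PMD index offset
PGD_BITS = 2   # PGD index offset (selects a logical block within the superblock)


def split_lba(lba):
    """Split an LBA into (superblock, pgd_index, pmd_index, pt_index)."""
    pt_index = lba & ((1 << PT_BITS) - 1)
    lba >>= PT_BITS
    pmd_index = lba & ((1 << PMD_BITS) - 1)
    lba >>= PMD_BITS
    pgd_index = lba & ((1 << PGD_BITS) - 1)
    superblock = lba >> PGD_BITS
    return superblock, pgd_index, pmd_index, pt_index


def translate(lba, pgds):
    """Walk superblock -> PGD -> PMD -> PT to reach (physical block, physical page)."""
    superblock, pgd_index, pmd_index, pt_index = split_lba(lba)
    pmd = pgds[superblock][pgd_index]  # PGD entry points to a PMD
    pt = pmd[pmd_index]                # PMD entry points to a PT
    return pt[pt_index]                # PT entry holds the physical location
```

In this sketch the PMD and PT indices together form the logical page number, and the superblock number and PGD index together form the logical block number.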
As described above, for write operations NAND devices store the new data for the LBA on a new page, unlike hard disk drives (HDDs) that can rewrite individual physical sectors. Thus, a NAND device generally requires that a block be erased before new data can be written to the block. Further, as described above, often a NAND device will write new data for a given LBA to an erased page that is a different physical page from the page previously used for that LBA. Thus, NAND devices also generally require that the device driver software or the separate controller chip periodically initiate a process to erase data that is “stale” or out-of-date. As would be apparent to one of skill in the art, without periodically erasing out-of-date data, the flash memory would fill up with data that is mostly out-of-date. This inefficiency would reduce the realized flash memory capacity because less current data could be stored. Therefore, device driver software or controller chips generally periodically run a “garbage collection” routine adapted to provide efficient flash memory utilization by erasing out-of-date blocks. An example of a garbage collection routine is described in Kang. Garbage collection routines impact performance of the flash memory system by utilizing processor resources and potentially delaying write operations to the flash media.
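A garbage collection pass of the general kind described above might be sketched as follows. The greedy choice of the block with the most stale pages is an assumption made for this example and is not the routine described in Kang; the data layout, in which each page is either None (erased) or an (LBA, valid) pair, is likewise hypothetical.

```python
def stale_pages(block):
    """Count programmed pages whose data is out-of-date."""
    return sum(1 for entry in block if entry is not None and not entry[1])


def garbage_collect(blocks, l2p, spare_id):
    """Reclaim one block: copy its valid pages to an erased spare block, then erase it.

    blocks maps block id -> list of pages; l2p maps LBA -> (block id, page).
    Returns the id of the block that was erased.
    """
    # Greedy victim selection (an assumption): the most stale pages wins.
    candidates = [b for b in blocks if b != spare_id]
    victim = max(candidates, key=lambda b: stale_pages(blocks[b]))
    dest = 0
    for entry in blocks[victim]:
        if entry is not None and entry[1]:
            blocks[spare_id][dest] = entry       # relocate still-valid data
            l2p[entry[0]] = (spare_id, dest)     # update the translation record
            dest += 1
    blocks[victim] = [None] * len(blocks[victim])  # whole-block erase
    return victim
```

The copy step is the source of the performance impact noted above: valid pages must be rewritten before the block can be erased, which consumes processor time and flash bandwidth.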
However, NAND device blocks can be erased a limited number of times before device failure (typically on the order of 100,000 erasures). Therefore, over the operational life of an SSD, blocks of flash memory will fail and become unusable. Thus, the device driver software or the separate controller chip should minimize the number of erasures, and must also maintain a record of bad blocks. For example, device driver software or controller chips might implement wear leveling to spread the erasing and writing of blocks over the entire flash memory evenly to avoid repeatedly erasing and writing a given subset of blocks.
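Wear leveling and bad block tracking of the kind described above might be sketched as follows. The policy of allocating the erased block with the lowest erase count, and of retiring a block once its count reaches a fixed limit, are simplifying assumptions for this illustration; all names are hypothetical.

```python
ERASE_LIMIT = 100_000  # order of magnitude stated above; actual limits vary by device


def allocate_block(erase_counts, free_blocks, bad_blocks):
    """Pick the erased, usable block that has been erased the fewest times."""
    usable = [b for b in free_blocks if b not in bad_blocks]
    if not usable:
        raise RuntimeError("no usable erased blocks")
    return min(usable, key=lambda b: erase_counts[b])


def record_erase(block, erase_counts, bad_blocks):
    """Count an erasure and retire the block once it reaches the assumed limit."""
    erase_counts[block] += 1
    if erase_counts[block] >= ERASE_LIMIT:
        bad_blocks.add(block)
```

Always choosing the least-erased block spreads erasures across the flash memory rather than concentrating them on a small subset of blocks.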