1. Field of the Invention
The present invention relates to flash memory storage devices, and, in particular, to skip operations for solid state disks (SSDs).
2. Description of the Related Art
Flash memory is a type of non-volatile memory that is electrically erasable and re-programmable. Flash memory is primarily used in memory cards and USB flash drives for general storage and transfer of data between computers and other digital products. Flash memory is a specific type of electrically erasable programmable read-only memory (EEPROM) that is programmed and erased in large blocks. One commonly employed type of flash memory technology is NAND flash memory. NAND flash memory forms the core of the flash memory available today, especially for removable universal serial bus (USB) storage devices known as USB flash drives, as well as most memory cards. NAND flash memory exhibits fast erase and write times, requires small chip area per cell, and has high endurance. However, the I/O interface of NAND flash memory does not provide full address and data bus capability and, thus, generally does not allow random access to memory locations.
There are three basic operations for NAND devices: read, write and erase. The read and write operations are performed on a page by page basis. Page sizes are generally 2N bytes, where N is an integer, with typical page sizes of, for example, 2,048 bytes (2 kb), 4,096 bytes (4 kb), 8,192 bytes (8 kb) or more per page. Pages are typically arranged in blocks, and an erase operation is performed on a block by block basis. Typical block sizes are, for example, 64 or 128 pages per block. Pages must be written sequentially, usually from a low address to a high address. Lower addresses cannot be rewritten until the block is erased.
Other storage devices, such as conventional hard disk drives (HDDs), support additional disk-access operations, such skip-write and skip-read. A skip operation is used for reading or writing relatively closely located, but non-contiguous, blocks on an HDD. The device requesting the skip-read or skip-write provides a starting logical block address (LBA), a length count of the number of blocks to read/write, and a skip mask. The skip mask comprises a number of bits where each bit in the mask corresponds to a block offset from the starting block address. A logic ‘1’ bit in the skip mask signifies that the block corresponding to that bit position will be read/written. A logic ‘0’ bit in the skip mask signifies that the block corresponding to that bit position will not be read/written and will be skipped. The length count comprises the total number of blocks to transfer, not the span of the request. Thus, the length count matches the total number of logic ‘1’ bits in the skip mask. HDDs process skip commands at a media layer of the network, for example corresponding to a layer in the OSI (“Open Systems Interconnection”) model. A skip operation is useful for reading or writing several non-contiguous memory locations without issuing separate requests and requiring additional revolutions of the HDD. Further, only the requested data is transferred to or from the HDD.
A hard disk is addressed linearly by logical block address (LBA). A hard disk write operation provides new data to be written to a given LBA. Old data is over-written by new data at the actual LBA. NAND flash memories are accessed analogously to block devices, such as hard disks. NAND devices address memory linearly by page number. However, each page might generally be written only once since a NAND device requires that a block of data be erased before new data is written to the block. Thus, for a NAND device to write new data to a given LBA, the new data is written to an erased page that is a different physical page from the page previously used for that LBA. Therefore, NAND devices require device driver software, or a separate controller chip, to maintain a record of mappings of each LBA to the current page number where its data is stored. This record mapping is typically performed by a flash translation layer (FTL) in software that might generate a logical to physical translation table. The flash translation layer corresponds to the media layer of software controlling an HDD.
Associated with each page is a spare area (typically 100-500 bytes) generally used for storage of error correction code (ECC) information and for storage of metadata used for memory management. The ECC is for detecting and correcting errors in the user data stored in the page, and the metadata is used for mapping logical to physical addresses and vice-versa. As such, the additional bytes of memory are “hidden” from the user and are not available for storing data. The first block (block 0) of a flash die is generally provided from the manufacturer error-free, and is commonly used by designers to include program code and associated metadata for block management.
Typically, for high capacity solid state disks (SSDs), several design tradeoffs might be considered when implementing a method to maintain a logical to physical translation table. These tradeoffs typically include: efficient random access memory (RAM) usage; efficient flash usage; fast address lookup for both read operations and write operations; and fast reconstruction of the translation table on device startup.
Several techniques are known in the art for maintaining the logical to physical translation table. One such approach is known as direct page mapping, an example of which is described in the paper by Andrew Birrell & Michael Isard, et al., A DESIGN FOR HIGH-PERFORMANCE FLASH DISKS, ACM SIGOPS Operating Systems Review,Vol. 41,Issue 2,pp. 88-93,(April 2007), which is incorporated herein by reference in its entirety (hereinafter “Birrell”). Direct page mapping maintains a lookup table in RAM having an entry for each flash page, and a summary page for metadata at the end of each block, from which the logical to physical translation table may be reconstructed at startup. For example, a direct page mapped translation table might contain, for every LBA, a logical sector number corresponding to a physical block number and a physical page number. Thus, direct page mapping comprises a single-level logical to physical translation. The summary page for each block might contain the LBA and valid bits for each page in the block so that the translation table can be reconstructed at startup. Thus, the direct page mapping scheme requires a large amount of RAM (on the order of 1-2 MB per GB of user storage) to store the translation table, which can become burdensome for higher capacity SSDs.
Another approach is known as block mapping. Block mapping generally classifies blocks as either data blocks (D-blocks) or update blocks (U-blocks). The total size of the D-blocks is the effective storage space for user data while U-blocks are invisible to users. Generally, when a write command cannot be accommodated in the D-block corresponding to the LBA, a U-block is allocated to receive the new data and the old data in the D-block is invalidated. Subsequent writes to that D-block will be received by the allocated U-block. When the U-block becomes full, another U-block might be allocated, or the U-block might be merged with the original D-block. Thus, block mapping maintains a lookup table in RAM that maps a logical block to a physical block. Block mapping lacks a page-level map, instead relying on the typical case that data is stored in sequential order within the block. For example, a block mapped translation table might contain a logical sector number corresponding to a logical block number and a logical page number. The logical block number can be translated into a physical block number and the logical page number might correspond to a physical offset within the physical block. Thus, block mapping comprises a two-level logical to physical translation. The size of the translation table is proportional to the number of blocks in the flash memory, thus requiring less RAM than a page mapped translation table.
However, because block mapping does not have a page-level map, the flash media may be inefficiently utilized when the data access workload is non-sequential. For non-sequential data access workloads, block mapping might require data to be copied and re-written numerous times to maintain the correct mapping. An example of block mapping is described in the paper by Jeong-Uk Kang & Heeseung Jo, et al., A SUPERBLOCK-BASED FLASH TRANSLATION LAYER FOR NAND FLASH MEMORY, Proceedings of the 6th ACM & IEEE International Conference On Embedded Software,pp. 161-170,(Oct. 22-25, 2006), which is incorporated herein by reference in its entirety (hereinafter “Kang”).
A third approach for maintaining the logical to physical translation table is known as a superblock mapping scheme. Superblock mapping groups together a set number of adjacent logical blocks into a superblock. The superblock mapping scheme maintains a page global directory (PGD) in RAM for each superblock. Page middle directories (PMDs) and page tables (PTs) are maintained in flash. Each LBA can be divided into a logical block number and a logical page number, with the logical block number comprising a superblock number and a PGD index offset. The logical page number comprises a PMD index offset and a PT index offset. Each entry of the PGD points to a corresponding PMD. Each entry of the PMD points to a corresponding PT. The PT contains the physical block number and the physical page number of the data. Super-block mapping, thus, comprises a four-level logical to physical translation and provides page-mapping.
The PMD's and PT's are stored in the spare areas of the flash pages to provide page-mapping without using an excessive amount of RAM. However, because the spare area is being used to store page-level mapping information, less memory is available for error correction codes (ECC). Further, the limited amount of memory available in the spare area precludes storing complicated mapping information. Finally, reconstruction of the translation table at startup can be time-intensive. An example of a superblock mapping scheme is described in Kang.
As described above, for write operations NAND devices store the new data for the LBA on a new page. Thus, NAND devices also generally require that the device driver software or the separate controller chip periodically initiate a process to erase data that is out-of-date. As would be apparent to one of skill in the art, without periodically erasing out-of-date data, the flash memory would fill up with data that is mostly out-of-date. This inefficiency would reduce the realized flash memory capacity because less current data could be stored. Therefore, device driver software or controller chips generally periodically run a “garbage collection” routine adapted to provide efficient flash memory utilization by erasing out-of-date blocks. An example of a garbage collection routine is described in Kang.
However, NAND device blocks can be erased relatively few times before device failure (typically on the order of 100,000 erasures). Therefore, over the operational life of an SSD, blocks of flash memory will fail and become unusable. Thus, the device driver software or the separate controller chip should minimize the number of erasures, and must also maintain a record of bad blocks. For example, device driver software or controller chips might implement wear leveling to spread the erasing and writing of blocks over the entire flash memory to avoid repeatedly erasing and writing a given subset of blocks.