This invention relates generally to data management in solid state storage devices (SSDs), and more particularly to methods and apparatus for controlling data storage and erasing procedures in SSDs.
Solid-state storage is non-volatile memory which uses electronic circuitry, typically in integrated circuits (ICs), for storing data rather than conventional magnetic or optical media like disks and tapes. SSDs such as flash memory devices are currently revolutionizing the data storage landscape. These devices are more rugged than conventional storage devices due to the absence of moving parts, and offer exceptional bandwidth, significant savings in power consumption, and random I/O (input/output) performance that is orders of magnitude better than hard disk drives (HDDs).
In SSDs, the storage is organized into storage areas, or “blocks”, each of which contains a set of storage locations to which data can be written. In the following, various operational characteristics of SSDs will be described with particular reference to NAND-based flash memory devices. It will be understood, however, that similar principles apply to other types of SSD. Flash memory, for example, is organized into storage blocks containing data write locations known as “pages”. A typical flash page is 4 kB in size, and a typical flash block is made up of 64 flash pages (thus 256 kB). Read and write operations can be performed on a page basis, while erase operations can only be performed on a block basis. Data can only be written to a flash block after it has been successfully erased. It typically takes 15 to 25 μs to read a page from flash cells to a data buffer inside a flash die. Writing a page to flash cells takes about 200 μs, while erasing a flash block normally takes 2 ms or so. Since erasing a block takes much longer than a page read or write, a write scheme known as “write-out-of-place” is used to improve write throughput and latency. With this scheme, a stored data page is not updated in-place in the flash storage. Instead, the updated page is written to another free flash page, and the associated old flash page is marked as invalid by setting a validity flag in the metadata stored as part of each page.
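The motivation for write-out-of-place can be made concrete with the figures quoted above. The following is a minimal sketch only. It assumes a hypothetical in-place update that must save the other 63 pages of the block, erase the block, and then reprogram all 64 pages. That procedure is an illustrative assumption and is not described in the text. The read time uses the upper end of the 15 to 25 μs range, and the cost of setting the validity flag is ignored.

```python
# Figures quoted in the text (read time uses the upper end of 15-25 us).
PAGES_PER_BLOCK = 64
PAGE_SIZE_KB = 4
READ_US = 25
WRITE_US = 200
ERASE_US = 2000


def in_place_update_us():
    """Hypothetical in-place update: save the other pages, erase, rewrite all."""
    read_others = (PAGES_PER_BLOCK - 1) * READ_US
    rewrite_all = PAGES_PER_BLOCK * WRITE_US
    return read_others + ERASE_US + rewrite_all


def out_of_place_update_us():
    """Write-out-of-place: program one free page; the old page is only flagged."""
    return WRITE_US


block_size_kb = PAGES_PER_BLOCK * PAGE_SIZE_KB
print(block_size_kb)             # 256
print(in_place_update_us())      # 16375
print(out_of_place_update_us())  # 200
```

Under these assumptions a single-page update costs roughly 16 ms in place against 0.2 ms out of place, which is why the erase is deferred to a later housekeeping step.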
The write-out-of-place scheme, as well as other flash storage characteristics, requires certain “housekeeping” operations to be performed for internal management of the flash storage. For example, as pages are updated and old pages invalidated, a follow-up process is required to eliminate invalid data and release storage locations for new input data. This internal management process is commonly known as “garbage collection”. The garbage collection process involves selecting an occupied flash block and recovering all still-valid data from that block. The valid data pages are copied to another place in the flash storage, and the block is then erased. Blocks are typically selected for garbage collection based on the number of invalid pages they contain. However, garbage collection and block erasures can also be performed as part of other internal management processes which involve, in effect, moving data within the solid state storage. Wear-leveling is one example of such an internal management process. This process addresses the wear-out characteristics of flash memory. In particular, flash memory has a finite number of write-erase cycles before the storage integrity begins to deteriorate. Wear-leveling procedures aim to distribute write-erase cycles evenly among all available flash blocks to avoid uneven wear, thereby lengthening overall lifespan. In particular, wear-leveling functionality governs the selection, according to write-erase cycle counts, of blocks to which new data should be written, and also the movement of stored data within the flash memory to release blocks with low cycle counts and even out wear.
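The garbage collection steps described above can be sketched as follows. This is an illustrative model only, not a particular controller's implementation. The `Block` class, the four-page block size, the greedy "most invalid pages" victim policy, and the "lowest erase count" destination policy are all simplifying assumptions chosen to mirror the text.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # tiny block, for illustration only


@dataclass
class Block:
    # Written pages as (data, valid) tuples; unwritten pages are simply absent.
    pages: list = field(default_factory=list)
    erase_count: int = 0

    def invalid_count(self):
        return sum(1 for _, valid in self.pages if not valid)

    def valid_data(self):
        return [data for data, valid in self.pages if valid]

    def erase(self):
        self.pages = []
        self.erase_count += 1


def pick_victim(blocks):
    """Greedy policy: the occupied block holding the most invalid pages."""
    occupied = [b for b in blocks if b.pages]
    return max(occupied, key=Block.invalid_count)


def pick_destination(blocks, exclude, needed):
    """Wear-aware choice: among blocks with enough room, the lowest erase count."""
    candidates = [
        b for b in blocks
        if b is not exclude and PAGES_PER_BLOCK - len(b.pages) >= needed
    ]
    return min(candidates, key=lambda b: b.erase_count)


def garbage_collect(blocks):
    """Relocate still-valid pages from the victim, then erase it."""
    victim = pick_victim(blocks)
    survivors = victim.valid_data()
    dest = pick_destination(blocks, victim, len(survivors))
    dest.pages.extend((data, True) for data in survivors)
    victim.erase()
    return len(survivors)  # number of internal page writes performed


blocks = [
    Block(pages=[("a", False), ("b", True), ("c", False), ("d", False)]),
    Block(pages=[("e", True), ("f", True), ("g", False), ("h", True)]),
    Block(erase_count=5),
    Block(erase_count=2),
]
moved = garbage_collect(blocks)
print(moved)                  # 1
print(blocks[3].pages)        # [('b', True)]
print(blocks[0].erase_count)  # 1
```

In this example the first block is chosen because it holds three invalid pages, its single valid page is copied to the free block with the lower erase count, and only then is the victim erased.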
Data placement and internal management operations are typically performed by dedicated control apparatus, known as a flash controller, which accompanies the flash storage. The flash controller manages data in the flash memory generally, controlling all internal management operations, and maintains address metadata in controller memory to track the location of data in the flash storage. In particular, the flash controller runs an intermediate software level called “LBA-PBA (logical block address to physical block address) mapping”, also known as the “flash translation layer” (FTL) or “LPN-FPN (logical page number to flash page number) address mapping”. This maintains metadata in the form of an address map which maps the logical addresses associated with input datablocks from upper layers, e.g. a file system or host in a storage system, to physical addresses (flash page numbers) on the flash. This software layer hides the erase-before-write intricacy of flash and supports transparent data writes and updates without intervening erase operations.
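The role of the address map can be illustrated with a toy page-level translation layer. This is a sketch under stated assumptions. The class name `SimpleFTL`, its fields, and the strictly sequential allocation of free pages are hypothetical and chosen for brevity. Garbage collection is omitted.

```python
class SimpleFTL:
    """Toy page-level address map; names and structure are illustrative."""

    def __init__(self, num_pages):
        self.flash = [None] * num_pages    # physical pages: stored data
        self.valid = [False] * num_pages   # per-page validity flag
        self.lba_to_pba = {}               # the address map
        self.next_free = 0                 # next never-written page

    def write(self, lba, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; garbage collection would run here")
        old = self.lba_to_pba.get(lba)
        if old is not None:
            self.valid[old] = False        # invalidate the old page, do not erase
        pba = self.next_free
        self.next_free += 1
        self.flash[pba] = data
        self.valid[pba] = True
        self.lba_to_pba[lba] = pba         # remap the logical address
        return pba

    def read(self, lba):
        return self.flash[self.lba_to_pba[lba]]


ftl = SimpleFTL(num_pages=8)
ftl.write(10, "v1")
ftl.write(11, "x")
ftl.write(10, "v2")        # update of LBA 10 lands on a new physical page

print(ftl.read(10))        # v2
print(ftl.lba_to_pba)      # {10: 2, 11: 1}
print(ftl.valid[:3])       # [False, True, True]
```

The caller only ever uses logical addresses. The stale copy of LBA 10 remains in physical page 0, flagged invalid, until a later garbage collection pass reclaims it.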
The internal management functions performed in SSDs lead to so-called “write amplification”. This arises because data is moved internally in the storage, so the total number of data write operations is amplified in comparison with the original number of data write requests received by the SSD. Write amplification is one of the most critical issues limiting the random write performance and write endurance lifespan in solid-state storage devices. Another key issue is error performance. Error correction (EC) coding is performed in SSDs by adding redundancy at the write-unit level. Specifically, an EC code is computed for the input data written to each page, or each sector within a page, and this EC code is recorded in that page, or sector, with the input data. This coding allows recovery from errors within individual data pages. However, solid state storage systems can employ additional EC coding to protect against failures at the device level. This coding is performed by managing a collection of devices in the manner of a RAID (redundant array of independent devices) array as commonly employed in HDD storage systems. SSD systems employing RAID-like protection are discussed in US Patent Application publication number US 2008/0320214A1, and “SSD Controllers by Start-Up Sandforce”. In one scenario, a storage system can employ multiple SSDs, each operating as described above with a controller managing its own local storage. The collection of SSDs can then be managed at a higher level like a RAID array. The basic operating principles of such a system will be illustrated below with reference to FIG. 1 of the accompanying drawings.
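The write amplification described at the start of the preceding paragraph is commonly expressed as a ratio of total physical page writes to host-requested page writes. The short sketch below assumes that definition. The function name and the sample counts are illustrative only and are not taken from any measurement.

```python
def write_amplification(host_page_writes, internal_page_writes):
    """Total physical page writes divided by host-requested page writes."""
    if host_page_writes <= 0:
        raise ValueError("host_page_writes must be positive")
    return (host_page_writes + internal_page_writes) / host_page_writes


# Hypothetical counts: 1000 host writes plus 500 relocations by housekeeping.
wa = write_amplification(host_page_writes=1000, internal_page_writes=500)
print(wa)  # 1.5
```

A value of 1.0 would mean that no data is moved internally. Any relocation performed by garbage collection or wear-leveling raises the ratio above 1.0.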
FIG. 1 is a schematic block diagram of an exemplary RAID-like SSD-based storage system 1. In this system, multiple SSDs 2 operate under storage controller 3 which services read/write requests received from hosts. Each SSD 2 operates as already described to manage data in its internal storage 4. In general, storage 4 may consist of one or more storage channels each having one or more chips or packages of chips, where each chip may contain one or more solid state storage dies. The host LBA (logical block address) space is logically partitioned in storage controller 3 and one segment of each logical block is allocated to a respective SSD 2. Redundancy is added at this stage to allow addition of RAID parity. Specifically, storage controller 3 EC codes each input host datablock (corresponding to a given host (“global”) LBA), and the resulting RAID parity is added to the host datablock. The parity-coded block is then partitioned by controller 3 into “unit datablocks”. Each unit datablock is supplied under an assigned unit LBA (uLBA) to a respective SSD 2 for storage. The mapping of global LBAs (gLBAs) to uLBAs in the set of SSDs is recorded by controller 3 in a gLBA-uLBA mapping table. Each SSD stores its respective unit datablock and records the physical storage location in a uLBA-PBA mapping table as usual. As a result of this process, RAID codewords are distributed across the array of SSDs 2 as illustrated schematically by the shaded section in the figure. This provides an additional level of EC coding which protects against failures at the SSD level. Within each SSD 2, a local controller performs internal management of storage 4 as described above, but this functionality, and the consequent remapping of uLBAs to PBAs, is transparent to storage controller 3 in this architecture.
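The outer RAID data path of FIG. 1 can be sketched as follows. The text does not specify the RAID code, so single XOR parity (in the manner of RAID-5) is assumed here purely for illustration. The sketch also partitions the host datablock before computing parity, which for XOR yields the same units as coding first. The four-SSD array, the dictionaries standing in for SSDs, and the reuse of the gLBA as the uLBA are all hypothetical simplifications.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equally sized byte strings."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def stripe(host_block, num_data_ssds):
    """Split a host datablock into unit datablocks and append one parity unit."""
    if len(host_block) % num_data_ssds:
        raise ValueError("host block must divide evenly across the data SSDs")
    size = len(host_block) // num_data_ssds
    units = [host_block[i * size:(i + 1) * size] for i in range(num_data_ssds)]
    return units + [xor_blocks(units)]


ssds = [dict() for _ in range(4)]  # each dict stands in for one SSD: uLBA -> data
glba_to_ulba = {}                  # gLBA -> list of (SSD index, uLBA)


def write_host_block(glba, host_block):
    units = stripe(host_block, num_data_ssds=len(ssds) - 1)
    # Hypothetical allocation: reuse the gLBA as the uLBA on every SSD.
    glba_to_ulba[glba] = [(i, glba) for i in range(len(ssds))]
    for (ssd_index, ulba), unit in zip(glba_to_ulba[glba], units):
        ssds[ssd_index][ulba] = unit


def read_host_block(glba, failed_ssd=None):
    units = []
    for ssd_index, ulba in glba_to_ulba[glba]:
        units.append(None if ssd_index == failed_ssd else ssds[ssd_index][ulba])
    if failed_ssd is not None:
        survivors = [u for u in units if u is not None]
        units[failed_ssd] = xor_blocks(survivors)  # rebuild the missing unit
    return b"".join(units[:-1])                    # drop the parity unit


write_host_block(7, b"ABCDEFGHIJKL")
print(read_host_block(7))                # b'ABCDEFGHIJKL'
print(read_host_block(7, failed_ssd=1))  # b'ABCDEFGHIJKL'
```

Because the codeword spans all four SSDs, the loss of any single SSD leaves enough units to rebuild the host datablock, which is the device-level protection the paragraph describes.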
FIG. 1 illustrates a so-called “outer RAID” configuration where the RAID codewords span multiple SSDs and hence multiple controllers. An “inner RAID” system can also be employed, in addition or alternatively to outer RAID. Inner RAID is implemented within an SSD controller. In the controller, the LBA space is logically partitioned and one segment of each logical block is assigned to a different sub-unit of the overall storage space. Redundancy is again added to allow addition of inner RAID parity, so that inner RAID codewords are partitioned and distributed among the set of sub-units. Specifically, an inner RAID codeword is partitioned into sub-unit datablocks, and each sub-unit datablock is assigned a sub-unit LBA (suLBA) in the address space of a respective sub-unit. The mapping of LBAs to suLBAs is recorded by the controller in an LBA-suLBA address map. Each sub-unit datablock is then stored in the respective storage sub-unit at a physical storage location which is recorded in a suLBA-PBA mapping table for that sub-unit. This process provides EC coding which protects against failures at the sub-unit level in an SSD. As in outer RAID systems, the controller performs internal management (garbage collection, wear-leveling, etc.) independently within each storage sub-unit so that this functionality, and the consequent remapping of suLBAs to PBAs, operates at a lower logical level than the RAID coding at the logical block level.
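The two mapping levels used in inner RAID, and the fact that internal management touches only the lower one, can be sketched as below. The table contents, the three-sub-unit layout, and the function names are hypothetical values chosen for illustration.

```python
# LBA -> list of (sub-unit index, suLBA), one entry per sub-unit datablock.
lba_to_sulba = {42: [(0, 5), (1, 5), (2, 5)]}

# One suLBA -> PBA table per storage sub-unit.
sulba_to_pba = [{5: 100}, {5: 230}, {5: 17}]


def resolve(lba):
    """Follow both mapping levels to the physical locations of a codeword."""
    return [(unit, sulba_to_pba[unit][sulba]) for unit, sulba in lba_to_sulba[lba]]


def relocate(unit, sulba, new_pba):
    """Model garbage collection moving a page inside one sub-unit."""
    sulba_to_pba[unit][sulba] = new_pba


before = resolve(42)
upper_map_snapshot = {k: list(v) for k, v in lba_to_sulba.items()}
relocate(unit=1, sulba=5, new_pba=231)
after = resolve(42)

print(before)  # [(0, 100), (1, 230), (2, 17)]
print(after)   # [(0, 100), (1, 231), (2, 17)]
print(lba_to_sulba == upper_map_snapshot)  # True
```

The relocation changes the physical address in sub-unit 1 only. The LBA-suLBA map, and hence the RAID layout at the logical block level, is unaffected.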