The present invention relates to flash memory, and more particularly, this invention relates to optimized data packaging in flash memory.
Performance characteristics of NAND flash-based solid-state drives (SSDs) are fundamentally different from those of traditional hard disk drives (HDDs). Data is organized in pages, each page typically being 4, 8, or 16 KB in size. Page read operations are typically one order of magnitude faster than write operations, and, unlike in HDDs, latency depends on neither the current nor the previous location of operations. However, memory locations must be erased prior to writing to the same memory location. The size of an erase block unit is typically 256 pages, and the erase operation takes approximately one order of magnitude more time than a page program operation. Due to these properties of NAND flash memory, SSDs write data out-of-place, and maintain a mapping table that maps logical to physical addresses, called a logical-to-physical table (LPT).
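By way of illustration only, the out-of-place write behavior and the role of the LPT may be sketched as follows. The class name `OutOfPlaceFlash`, its fields, and the append-only write pointer are hypothetical and chosen for clarity; they are not taken from any particular controller.

```python
class OutOfPlaceFlash:
    """Toy model of out-of-place writes (hypothetical, for illustration)."""

    def __init__(self, num_physical_pages):
        self.lpt = {}                                # logical page -> physical page
        self.data = [None] * num_physical_pages      # physical page contents
        self.valid = [False] * num_physical_pages    # validity flag per physical page
        self.next_free = 0                           # assumed append-only write pointer

    def write(self, lpa, payload):
        if self.next_free >= len(self.data):
            raise RuntimeError("no free pages; garbage collection required")
        old = self.lpt.get(lpa)
        if old is not None:
            self.valid[old] = False   # the old copy is invalidated, not overwritten
        ppa = self.next_free
        self.next_free += 1
        self.data[ppa] = payload
        self.valid[ppa] = True
        self.lpt[lpa] = ppa           # remap the logical page to its new location
        return ppa

    def read(self, lpa):
        return self.data[self.lpt[lpa]]
```

In this sketch, rewriting the same logical page consumes a fresh physical page each time and leaves an invalid page behind, which is the space that must later be reclaimed by erasure.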
Since flash chips/blocks/pages may exhibit errors or fail completely due to limited endurance or other reasons, additional redundancy has to be applied within flash pages, using error correction code (ECC) such as Bose-Chaudhuri-Hocquenghem (BCH), as well as across flash chips, using redundant array of independent disks or drives (RAID) configurations including RAID-5, RAID-6 and other similar schemes.
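As a minimal sketch of redundancy across flash chips, the single-parity XOR used by RAID-5-like schemes may be illustrated as follows. The function names `xor_pages` and `rebuild` are hypothetical, pages are assumed to be of equal size, and the within-page ECC (e.g., BCH) is not modeled.

```python
from functools import reduce


def xor_pages(pages):
    """Bytewise XOR of equally sized pages; yields the parity page."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def rebuild(surviving_pages, parity):
    """Recover one lost page from the surviving pages and the parity page."""
    return xor_pages(list(surviving_pages) + [parity])
```

For example, given three data pages stored on different chips and their parity page, any single lost page equals the XOR of the two surviving data pages and the parity page.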
Garbage Collection (GC) in the context of flash SSD controllers refers to the process of identifying blocks of pages (or block-stripes, depending on the specific controller and the respective GC unit of operation) to be reclaimed for future usage and relocating all still valid pages therein. A GC unit of operation is referred to herein as a Logical Erase Block (LEB). Note that a LEB may include any multiple of the physical flash block, which is the unit of physical erasure. For example, in a RAID scheme, multiple flash blocks from different lanes (i.e., channels) may be grouped together in a block stripe. Since the RAID parity is computed against the data in all the participating blocks, these blocks cannot be reclaimed individually. Rather, the full stripe has to be garbage-collected as a single unit. Garbage collecting a LEB requires relocation of any valid logical pages within the LEB to new physical pages to allow for erasing the entire LEB and subsequently making it (or the flash blocks it includes, in case it was formed out of multiple flash blocks as a block-stripe) ready to be populated with new logical pages. The amount of data relocated due to GC relocation of valid pages constitutes garbage-collection induced write amplification, which is undesirable.
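The relocation step and its cost may be sketched as follows, for illustration only. The LEB is modeled here as a list of `(logical_page, payload, valid)` tuples, and `relocate` is a hypothetical callback that writes a page to a new physical location; both are assumptions. The ratio computed by `write_amplification` is the conventional definition (total pages written to flash divided by pages written by the host) and is likewise supplied as an assumption.

```python
def garbage_collect(leb, relocate):
    """Relocate every still-valid page of a LEB, then erase it.

    Returns the number of pages that had to be relocated.
    """
    relocated = 0
    for logical_page, payload, valid in leb:
        if valid:
            relocate(logical_page, payload)   # extra write caused by GC
            relocated += 1
    leb.clear()   # models erasing all flash blocks that form the LEB
    return relocated


def write_amplification(host_writes, relocated_writes):
    """Total pages written to flash divided by pages written by the host."""
    return (host_writes + relocated_writes) / host_writes
```

In this sketch, a LEB holding two valid pages out of four incurs two relocation writes; if the host wrote four pages over the same interval, the resulting ratio is (4 + 2) / 4 = 1.5.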
In addition, the “heat” of data refers to the rate (frequency) at which the data is read. “Hot” data tends to be read very frequently, whereas “cold” data is only rarely read. Tracking the heat of a logical page involves, for instance, allocating a certain number of bits in the LPT mapping entry for the page to keep track of how many read operations the page has seen in a certain time period or window.
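One possible way of keeping such a read counter inside the LPT mapping entry is sketched below. The 3-bit counter width, the placement of the counter in the low-order bits of the entry, the saturating increment, and the reset at the start of each window are all assumptions made for illustration.

```python
HEAT_BITS = 3                      # assumed width of the per-page read counter
HEAT_MAX = (1 << HEAT_BITS) - 1    # counter saturates at this value


def make_entry(physical_page, heat=0):
    """Pack a physical page address and a heat counter into one integer."""
    return (physical_page << HEAT_BITS) | heat


def physical_page_of(entry):
    return entry >> HEAT_BITS


def heat_of(entry):
    return entry & HEAT_MAX


def record_read(entry):
    """Count one read; the counter saturates instead of overflowing."""
    if heat_of(entry) < HEAT_MAX:
        entry += 1   # safe: the low bits cannot carry into the address
    return entry


def start_new_window(entry):
    """Reset the counter at the start of a new tracking window."""
    return make_entry(physical_page_of(entry), 0)
```

Under these assumptions, a page read many times within a window reaches the saturation value and may be treated as hot, whereas a page whose counter remains at or near zero may be treated as cold.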