The present invention relates generally to non-volatile memory, such as NAND flash memory, and more particularly to data deduplication within a non-volatile memory.
Data deduplication is known. Data deduplication is a specialized data compression technique for eliminating duplicate copies of repeating data. Data deduplication typically involves identifying unique chunks of data and storing them during an analysis process. As the process continues, other chunks are compared to the stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Data deduplication is commonly used to improve storage utilization.
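By way of illustration only, the chunk-and-reference process described above may be sketched as follows. The sketch assumes fixed-size chunks and uses a SHA-256 fingerprint to detect matching chunks; the description above does not mandate either choice, and the names `deduplicate` and `reconstruct` are hypothetical.

```python
import hashlib


def deduplicate(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and keep one stored copy per unique chunk.

    Assumption: chunks are matched by SHA-256 fingerprint rather than by a
    byte-for-byte comparison against the stored copy.
    """
    store = {}       # fingerprint -> the single stored copy of that chunk
    references = []  # one small reference (fingerprint) per input chunk, in order
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in store:
            store[fingerprint] = chunk   # first occurrence: store the chunk
        references.append(fingerprint)   # every occurrence: store only a reference
    return store, references


def reconstruct(store, references) -> bytes:
    """Rebuild the original data by following each reference to its stored chunk."""
    return b"".join(store[fingerprint] for fingerprint in references)
```

In this sketch, an input made of three chunks of which two are identical yields two stored chunks and three references, which is the source of the improved storage utilization.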
Non-volatile memory, or memory that retains data without power, is known. NAND-based flash memory is one type of non-volatile memory. The performance characteristics of conventional NAND-based solid state drives (SSDs) are fundamentally different from those of traditional hard disk drives (HDDs). Data in conventional SSDs is typically organized in pages of 4, 8, or 16 KB. Moreover, page read operations in SSDs are typically one order of magnitude faster than write operations, and latency depends on neither the current nor the previous location of operations.
In flash-based SSDs, memory locations are erased in blocks prior to being written to. The size of an erase block unit is typically 256 pages, and an erase operation takes approximately one order of magnitude more time than a page program operation. Due to the intrinsic properties of NAND-based flash, flash-based SSDs write data out-of-place, whereby a mapping table maps logical addresses of the written data to physical ones. This mapping table is typically referred to as the Logical-to-Physical Table (LPT). In flash-based SSDs, an invalidated data location cannot be reused until the entire block it belongs to has been erased. Before erasing, the block undergoes garbage collection, whereby any valid data in the block is relocated to a new block. Garbage collection of a block is typically deferred for as long as possible to maximize the amount of invalidated data in the block and thus reduce the number of valid pages that must be relocated, because relocating data causes additional write operations and thereby increases write amplification.
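By way of illustration only, out-of-place writing, the LPT, and garbage collection may be sketched as follows. The class `FlashSketch`, its method names, and the small block geometry are hypothetical and chosen for readability; the sketch also assumes that a free block is always available when the open block fills, which a real controller would have to guarantee.

```python
class FlashSketch:
    """Toy model of out-of-place writes with a Logical-to-Physical Table (LPT)."""

    def __init__(self, num_blocks=3, pages_per_block=4):
        self.pages_per_block = pages_per_block
        # Each page holds None (erased) or a (logical address, data) pair.
        self.blocks = [[None] * pages_per_block for _ in range(num_blocks)]
        self.valid = [[False] * pages_per_block for _ in range(num_blocks)]
        self.lpt = {}                      # logical address -> (block, page)
        self.open_block = 0                # block currently being filled
        self.next_page = 0                 # next unwritten page in the open block
        self.free_blocks = list(range(1, num_blocks))
        self.host_writes = 0               # writes requested by the host
        self.physical_writes = 0           # page programs actually performed

    def _program(self, lba, data):
        """Program the next erased page and point the LPT at it."""
        if self.next_page == self.pages_per_block:
            self.open_block = self.free_blocks.pop(0)  # assumes a free block exists
            self.next_page = 0
        block, page = self.open_block, self.next_page
        self.blocks[block][page] = (lba, data)
        self.valid[block][page] = True
        self.lpt[lba] = (block, page)
        self.next_page += 1
        self.physical_writes += 1

    def write(self, lba, data):
        """Host write: invalidate the old location, then write out-of-place."""
        old = self.lpt.get(lba)
        if old is not None:
            self.valid[old[0]][old[1]] = False
        self.host_writes += 1
        self._program(lba, data)

    def read(self, lba):
        block, page = self.lpt[lba]
        return self.blocks[block][page][1]

    def garbage_collect(self, block):
        """Relocate the valid pages of a block, then erase the whole block."""
        if block == self.open_block:
            raise ValueError("cannot collect the block currently being written")
        for page in range(self.pages_per_block):
            if self.valid[block][page]:
                lba, data = self.blocks[block][page]
                self.valid[block][page] = False
                self._program(lba, data)   # relocation costs an extra page program
        self.blocks[block] = [None] * self.pages_per_block
        self.free_blocks.append(block)
```

In this sketch, the ratio `physical_writes / host_writes` grows with every valid page that garbage collection has to relocate, which is why deferring collection until a block holds mostly invalidated data reduces write amplification.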
As flash-based memory cells exhibit read errors and/or failures due to wear or other reasons, additional redundancy may be used within memory pages as well as across memory chips (e.g., RAID-5-like and RAID-6-like schemes). The additional redundancy within memory pages may include error correction codes (ECC) which, for example, may include BCH codes. While the addition of ECC in pages is relatively straightforward, the organization of memory blocks into RAID-like stripes is more complex. For instance, individual blocks are retired over time, which requires either reorganization of the stripes or a reduction in stripe capacity. As the organization of stripes together with the LPT defines the placement of data, SSDs typically utilize a Log-Structured Array (LSA) architecture, which combines these two methods.
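By way of illustration only, the redundancy across memory chips may be sketched with single XOR parity in the manner of RAID-5. The function names are hypothetical, every page in a stripe is assumed to have the same size, and the sketch omits the rotation of parity across chips and the per-page ECC (such as BCH codes) mentioned above.

```python
from functools import reduce


def xor_pages(pages):
    """Return the bytewise XOR of a sequence of equally sized pages."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*pages))


def build_stripe(data_pages):
    """Form a stripe: the data pages followed by one parity page."""
    return list(data_pages) + [xor_pages(data_pages)]


def recover(stripe, lost_index):
    """Rebuild the single page at lost_index from the surviving pages.

    XOR of all pages in a stripe is zero, so the XOR of the survivors
    equals the missing page, whether it held data or parity.
    """
    survivors = [page for i, page in enumerate(stripe) if i != lost_index]
    return xor_pages(survivors)
```

Such a scheme tolerates the loss of one page per stripe. When a block is retired, the stripe it belonged to must either be rebuilt with a replacement block or continue with fewer data pages, which corresponds to the two options noted above.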