The flash translation layer (FTL) is a software or firmware layer implemented in a flash-based solid-state drive (SSD) device that enables a flash memory to emulate certain aspects of a hard disk drive. The FTL maintains a mapping table in a memory (e.g., SRAM) of the SSD device that maps a logical page address (LPA) of an input/output (I/O) request received from a host computer to a physical page address (PPA) of the SSD device. The FTL can evenly distribute erasure requests to multiple flash blocks by wear leveling and garbage collection to improve the performance and lengthen the lifetime of the SSD.
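The LPA-to-PPA mapping described above can be illustrated with a minimal sketch. The class name `PageMap`, its fields, and the first-free allocation policy are hypothetical and chosen only for illustration; a real FTL also handles wear leveling and garbage collection, which are omitted here.

```python
class PageMap:
    """Toy page-level mapping table (illustrative names, not from the disclosure)."""

    def __init__(self, num_physical_pages):
        self.table = {}                               # LPA -> PPA
        self.free = list(range(num_physical_pages))   # PPAs not yet programmed
        self.invalid = set()                          # PPAs holding stale data

    def write(self, lpa):
        """Out-of-place update: program a fresh page and invalidate the old one."""
        if not self.free:
            raise RuntimeError("no free pages; garbage collection would be needed")
        ppa = self.free.pop(0)        # assumption: simple first-free allocation
        old = self.table.get(lpa)
        if old is not None:
            self.invalid.add(old)     # the previous copy becomes an invalid page
        self.table[lpa] = ppa
        return ppa

    def read(self, lpa):
        """Translate a logical page address to its current physical page address."""
        return self.table[lpa]
```

In this sketch, writing the same LPA twice leaves one invalid physical page behind, which is the kind of page that garbage collection later reclaims.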
I/O traffic from a host computer to an SSD device can be categorized as either random or sequential. Most workloads contain a combination of random and sequential streams, and the mix depends on the time window over which the workload is observed. A workload whose I/O operations are largely sequential and whose accessed blocks have high spatial locality is classified as a sequential workload or a sequential stream.
In a typical flash-based storage use case, an operating system of the host computer sends mixed random and sequential streams to an SSD device because memory transaction requests are generated from multiple tenants or multiple applications. The SSD device usually has no information about the incoming interleaved sequential and random streams. The mixed random and sequential streams may trigger many Full Merges (FMs) in a log-structure-based FTL of an SSD, thereby increasing both the cost of operation and the write amplification.
Table 1 summarizes the symbols and abbreviations used in the present disclosure.
TABLE 1
List of Symbols

Abbreviation    Description
FM              Full Merge
PM              Partial Merge
SM              Switch Merge
SLB             Sequential Log Block
RLB             Random Log Block
LBA             Logical Block Address
PBA             Physical Block Address
LPA             Logical Page Address
PPA             Physical Page Address
SZ              Sequential Buffer Zone
KZ              K-associative Sequential Buffer Zone
RZ              Random Buffer Zone
NL              Number of Log Blocks
NK              Upper Bound of K
NP              Number of Pages in a Block
L               Block “L”
|L|             Number of Valid Pages in Block “L”
K(L)            Associative Degree of Block “L”
X               An Unfixed Value
BA              Block-Associative
FA              Fully-Associative
KA              K-Associative
Data updates in a flash memory may incur invalid pages in data blocks and eventually invoke a garbage collection in the FTL. The garbage collection may lead to merge operations to reclaim the invalid pages. Since the merge operations heavily affect the FTL performance, reducing the number of merge operations is one of the main design concerns for an FTL scheme.
In a log-structure-based FTL scheme, physical blocks of an SSD device are logically partitioned into two groups: data blocks and log blocks. When a write request arrives, the FTL first writes the new data to a log block and invalidates the data in the corresponding data block. Block-mapping information for data blocks and page-mapping information for log blocks are kept in the memory (e.g., RAM) of the SSD device for performance purposes. When the log blocks are full, the data in the log blocks are immediately flushed into the data blocks, and the log blocks are erased to free them up. More specifically, the valid data in a data block and the valid data in the corresponding log block are merged and written to a new clean data block. This process is referred to as a merge operation. Merge operations can be classified into three types: Full Merge (FM), Switch Merge (SM), and Partial Merge (PM).
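The write path described above can be sketched as follows. This is a simplified illustration under stated assumptions: each data block has one dedicated log block, the data block is assumed to be fully written beforehand, and the names `LogStructuredFTL`, `data_valid`, and `log_pages` are hypothetical.

```python
class LogStructuredFTL:
    """Toy log-block write path (assumption: one log block per data block)."""

    def __init__(self, pages_per_block=4):
        self.np = pages_per_block
        self.data_valid = {}   # LBA -> page offsets still valid in the data block
        self.log_pages = {}    # LBA -> page offsets in the order they were logged

    def write(self, lpa):
        # Split the logical page address into block number and in-block offset.
        lba, offset = divmod(lpa, self.np)
        log = self.log_pages.setdefault(lba, [])
        if len(log) == self.np:
            return "merge_needed"            # log block is full
        log.append(offset)                   # new data goes to the log block
        # Assumption: the data block started with every page valid.
        valid = self.data_valid.setdefault(lba, set(range(self.np)))
        valid.discard(offset)                # old copy in the data block is invalid
        return "logged"
```

The block-level dictionary keyed by LBA stands in for the block-mapping information, and the per-block offset list stands in for the page-mapping information of the log blocks.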
A switch merge is triggered when all pages of a victim block (inside the log blocks) are sequentially updated from the first logical page (header) to the last logical page (tail). The FTL erases a data block filled with invalid pages and switches the corresponding log block into the data block. A victim block refers to a log block in a log block area of the SSD that is selected to be merged with its corresponding data block in a data block area.
To perform a switch merge operation, all pages in a victim block are required to be entirely filled and written in a sequential order as the data block (i.e., in-place written from the header to the tail of the block). When a new page comes to the same log block, the FTL triggers the log block to switch (replace) with the corresponding data block and erases the data block. Switch Merge does not involve any data copy, so the copy time for a switch merge is 0. The erase time for a switch merge is 1. Switch Merge is the cheapest merge operation among Switch Merge, Partial Merge, and Full Merge.
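The switch-merge precondition above (every page written, in order, from header to tail) can be expressed as a short check. The function name and the representation of a log block as a list of page offsets in write order are assumptions made for illustration.

```python
def can_switch_merge(log_offsets, pages_per_block):
    """Return True if the log block was filled in place from header to tail.

    log_offsets: in-block page offsets in the order they were written
    (hypothetical representation of a victim log block).
    """
    return list(log_offsets) == list(range(pages_per_block))
```

A block written as offsets 0, 1, 2, 3 qualifies; a block that is only partly filled, or filled out of order, does not.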
A partial merge is similar to a switch merge except that it requires copying one or more valid pages from a data block to a log block (victim block) in a log block area. After the one or more valid pages are copied to the log block, the FTL erases the data block, marks the log block as the data block, and assigns a new empty block from an empty block list (EmptyBlockList) to the log block area. The FTL performs a partial merge when the log block is written from the header to a middle page that is not the tail (i.e., in-place written, but the block is not filled from the header to the tail).
To perform a partial merge operation, a new incoming page should belong to the same log block (i.e., the owner of the new incoming page) and be the header of the log block. The FTL copies all the remaining valid pages from the corresponding data block to this log block and erases the data block. The FTL does not write the new incoming header page to the log block because the programmed header page in that log block cannot be rewritten. Instead, the FTL marks this log block as the data block, similar to the switch merge, and assigns a new empty block from the EmptyBlockList to the log block area. Lastly, the FTL writes the new incoming header page to the newly assigned block. The copy time for a partial merge is determined by the difference between the number of pages in a block and the number of valid pages already in the log block (NP−|L|), and the erase time for a partial merge is 1, which is the same as the erase time for a switch merge. Although a block erase takes far longer than a page copy, the page-copy cost of a partial merge cannot be ignored, because the accumulated time of many page copies can be significant.
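The partial-merge steps above can be sketched as a small function. Blocks are modeled as Python lists with `None` marking an unwritten page; this representation, the function name, and the assumption that the data block holds a valid page at every offset the log block lacks are illustrative only.

```python
def partial_merge(log_block, data_block, empty_blocks, new_header):
    """Toy partial merge; returns (new data block, new log block, page copies)."""
    n_pages = len(log_block)
    # Number of pages already written in place from the header.
    filled = next((i for i, p in enumerate(log_block) if p is None), n_pages)
    if filled == 0 or any(p is not None for p in log_block[filled:]):
        raise ValueError("log block is not written in place from the header")

    copies = 0
    for i in range(filled, n_pages):
        log_block[i] = data_block[i]   # copy a remaining valid page
        copies += 1                    # total copies: NP - |L|

    data_block[:] = [None] * n_pages   # one block erase
    new_log = empty_blocks.pop()       # new block taken from the empty block list
    new_log[0] = new_header            # incoming header goes to the new block
    return log_block, new_log, copies  # the old log block now acts as the data block
```

For a four-page block with two pages already logged, the sketch performs two page copies and one erase, matching the NP−|L| copy count stated above.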
A full merge requires the largest overhead among the three merge operations. The FTL allocates a clean block from the EmptyBlockList and copies all the valid pages from either the data block or the log block into the clean block. After all the valid pages are copied, the clean block becomes the data block, both the former data block and the log block are erased, and a new empty block will be assigned to the log block area from the EmptyBlockList. Therefore, a single full merge operation requires as many read and write operations as the number of valid pages in a block, plus two erase operations.
In a full merge, if a log block of the SSD is not written sequentially from the first page to the last page, the FTL copies valid pages from the log block and its corresponding data block to a newly allocated data block and erases the log block and its corresponding data blocks. The copy time for a full merge is determined by the product of the associativity of block L and the number of pages in the block, i.e., K(L)×NP, and the erase time for a full merge is determined by K(L)+1, where K(L) is for external associated data blocks, and 1 is for the victim log block.
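The copy and erase counts stated for the three merge types can be collected into one small cost function. The function name and parameter names are hypothetical; the formulas are taken directly from the text (SM: 0 copies, 1 erase; PM: NP−|L| copies, 1 erase; FM: K(L)×NP copies, K(L)+1 erases).

```python
def merge_cost(kind, n_pages, valid_in_log=0, assoc=1):
    """Return (page copies, block erases) for a merge.

    kind:          "SM", "PM", or "FM"
    n_pages:       NP, number of pages in a block
    valid_in_log:  |L|, valid pages in the victim log block (used for PM)
    assoc:         K(L), associative degree of the victim block (used for FM)
    """
    if kind == "SM":
        return (0, 1)
    if kind == "PM":
        return (n_pages - valid_in_log, 1)
    if kind == "FM":
        return (assoc * n_pages, assoc + 1)
    raise ValueError(f"unknown merge type: {kind}")
```

With K(L)=1, the full-merge erase count is 2, which agrees with the two erase operations described for a single full merge of one data block and one log block.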
There exist several FTL mapping schemes. Examples of such FTL mapping schemes include, but are not limited to, Block Associative Sector Translation (BAST), Fully-Associative Sector Translation (FAST), Locality-Aware Sector Translation (LAST), and K-associative Sector Translation (KAST). Each of these FTL mapping schemes has advantages and disadvantages compared to other FTL mapping schemes.
BAST uses multiple log blocks to cache incoming write requests. Once every page in a log block is written, the log block replaces the corresponding data block. In this sense, BAST is referred to as a block-associative, dedicated translation scheme. While FAST, LAST, and KAST can support all of the full merge, the partial merge, and the switch merge, BAST can support only the full merge and the switch merge. BAST may save cost for the switch merge. However, BAST can result in increased write operations when intensive non-sequential overwrites hit one hot block, or when more blocks than there are log blocks in the log block area are updated during a given time window (e.g., cross-block thrashing).
FAST allocates a log block to more than one data block to increase the utilization of log blocks. To capture sequential writing streams from a mixed stream of requests, FAST separates the log blocks into a sequential log block (SLB), which is block-associative, and random log blocks (RLBs), which are fully associative. The separation of the log blocks into the SLB and RLBs may help to resolve the thrashing issues. FAST optimizes the merge operations and introduces the partial merge. However, FAST cannot handle more than one sequential stream of requests. In addition, FAST determines sequentiality by checking whether or not a page is the header of a block. Although this is a necessary condition for an SM or a PM, it cannot cover all sequential cases. It is noted that “whether the workload is sequential or not” and “whether the write stream is starting from a block header” are two different conditions. FAST simply uses the latter condition to decide the former. Finally, in terms of cost evaluation, a full merge in FAST needs to search more than one data block, which slows down the merge operation.
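The gap between "sequential" and "starts at a block header" can be illustrated with a toy router. This is not FAST's actual implementation: the function name is hypothetical, and the rule that a page also goes to the SLB when it directly continues the current SLB stream is an assumption added so the example is self-contained.

```python
def route_fast_style(lpas, pages_per_block=4):
    """Route each logical page to "SLB" or "RLB" using a header-based test."""
    routes = []
    expected = None  # next LPA that would extend the current SLB stream (assumed rule)
    for lpa in lpas:
        is_header = lpa % pages_per_block == 0
        if is_header or lpa == expected:
            routes.append("SLB")
            expected = lpa + 1
        else:
            routes.append("RLB")   # not a header and not a continuation
    return routes
```

For the perfectly sequential stream 6, 7, 8, 9, 10, 11 with four pages per block, the first two pages are routed to the random log blocks because the stream does not begin at a block header; only from page 8 onward is the stream recognized as sequential.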
LAST employs an access detector to detect whether a request is sequential or random, based on the write request size (e.g., threshold=4 KB). The underlying assumption is that a large write has a relatively high sequential locality, although this is not always the case. LAST uses multiple SLBs and RLBs, which suits multiple streams, and can trigger a switch merge for a log block. LAST also separates random log blocks into hot and cold regions to reduce the cost of a full merge. However, dynamically changing request streams may severely restrict the utility of this scheme and its ability to adapt efficiently to various workload patterns.
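A size-based access detector of the kind described above can be sketched in a few lines. The function and constant names are hypothetical, and treating a request exactly at the threshold as sequential is an assumption; the text only gives 4 KB as an example threshold.

```python
SEQ_THRESHOLD_BYTES = 4 * 1024  # example threshold from the text (4 KB)


def classify_request(size_bytes, threshold=SEQ_THRESHOLD_BYTES):
    """Classify a write request by size alone.

    Assumption: requests at or above the threshold are treated as sequential.
    """
    return "sequential" if size_bytes >= threshold else "random"
```

Because the decision depends only on request size, a large write with poor locality is still labeled sequential, which is the limitation noted above.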
KAST controls the maximum log block associativity to control the worst-case blocking time and increase the performance. In KAST, write requests are distributed among different log blocks, including multiple sequential log blocks. KAST automatically partitions between sequential and random log blocks. However, KAST requires the user to configure the K-associativity, which makes the scheme less stable and reliable.
Table 2 shows the comparison of block numbers of different associative degrees (e.g., block-associative BA, fully-associative FA, and K-associative KA) and types of merges supported by the FTL mapping schemes. These FTL mapping schemes use one or two associativities as indicated in Table 2. For example, BAST uses only block-associative, FAST and LAST use only block-associative and fully-associative, and KAST uses only fully-associative and K-associative. Table 2 also shows the number of different associative blocks in different FTL schemes.
TABLE 2
Comparison of FTL schemes

                Block Number of Different Associative
FTL Scheme      BA          FA          KA          Merges
BAST            NL          0           0           FM, SM
FAST            1           NL − 1      0           FM, PM, SM
LAST            X           NL − 1      0           FM, PM, SM
KAST            0           X           NL − X      FM, PM, SM