Computing environments are frequently supported by block-based storage. Such block-based storage is increasingly provided by solid state drives (SSDs). SSDs provide a block-style interface, making it easy to integrate these drives into systems that have traditionally relied on hard drives and other block storage devices. SSD manufacturers incorporate a controller which provides the block-style interface and which manages data storage and mapping. For example, when a read or write request is received, it may include a logical block address (LBA) associated with the request. The controller may determine the physical location on the SSD from which the data should be read, or to which the data should be written. The controller may also manage data storage on the SSD to improve the longevity of the device and manage other flash-specific functions. However, while the drive manufacturer may provide a controller that is suitable for the average customer, such controllers may not provide sufficient flexibility or customizability for all users and applications for which such drives may be deployed.
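The address translation described above can be illustrated with a minimal sketch. The class name `SimpleController`, its methods, and the `PAGES_PER_BLOCK` geometry are hypothetical and chosen only for illustration; an actual controller is implemented in firmware and is considerably more involved.

```python
PAGES_PER_BLOCK = 4  # assumed geometry, for illustration only


class SimpleController:
    """Hypothetical controller mapping logical block addresses (LBAs)
    to physical (block, page) locations."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks
        self.mapping = {}   # LBA -> (block, page)
        self.flash = {}     # (block, page) -> data
        self.next_free = 0  # index of the next unwritten physical page

    def write(self, lba, data):
        # Pick the next unwritten physical page; the host only ever sees the LBA.
        if self.next_free >= self.num_blocks * PAGES_PER_BLOCK:
            raise RuntimeError("no free pages; garbage collection would be needed")
        location = divmod(self.next_free, PAGES_PER_BLOCK)
        self.next_free += 1
        self.flash[location] = data
        self.mapping[lba] = location  # any older copy of this LBA is now stale
        return location

    def read(self, lba):
        # Translate the LBA to its current physical location, then read it.
        return self.flash[self.mapping[lba]]
```

In this sketch, writing the same LBA twice places the data at two different physical locations, and a subsequent read returns only the most recent copy.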
Data on flash-based drives, unlike data on magnetic and other media, cannot be overwritten in place. Instead, a part of the drive that has previously been written must be erased before data can be written to it again. Typically, the smallest unit of an SSD that can be erased is a physical block, which is larger than the smallest unit that can be written, which is typically a page. Because an entire block is erased at once, a determination must be made, before the block is erased, as to whether all data in the block can be deleted. For example, if all data in the block is invalid (that is, newer instances of the same LBAs exist elsewhere), then the block can be erased. However, if some of the data is still valid, it must be relocated before the block is erased, or it will be lost. Garbage collection, or recycling, refers to this process of identifying valid and invalid data and relocating data as needed so that blocks can be erased.
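The garbage collection process described above can be sketched as follows. This is a simplified illustration under stated assumptions, not any particular controller's method: the names `Block`, `valid_pages`, and `garbage_collect` are hypothetical, each page stores only the LBA it holds, and the spare block is assumed to be erased and large enough to receive the relocated pages.

```python
from dataclasses import dataclass, field

PAGES_PER_BLOCK = 4  # assumed geometry, for illustration only


@dataclass
class Block:
    # Each written page records the LBA it holds; None marks an unwritten page.
    pages: list = field(default_factory=lambda: [None] * PAGES_PER_BLOCK)

    def erase(self):
        # Erasure applies to the whole block, never to a single page.
        self.pages = [None] * PAGES_PER_BLOCK


def valid_pages(blocks, mapping, block_id):
    """Return (page, lba) pairs in block_id that the mapping still points to."""
    return [
        (page, lba)
        for page, lba in enumerate(blocks[block_id].pages)
        if lba is not None and mapping.get(lba) == (block_id, page)
    ]


def garbage_collect(blocks, mapping, victim_id, spare_id):
    """Relocate valid pages from the victim to the spare block, then erase
    the victim. Returns the number of pages relocated."""
    spare = blocks[spare_id]
    moved = 0
    for page, lba in valid_pages(blocks, mapping, victim_id):
        dest = spare.pages.index(None)  # first unwritten page (spare assumed to have room)
        spare.pages[dest] = lba
        mapping[lba] = (spare_id, dest)  # the mapping now points at the new copy
        moved += 1
    blocks[victim_id].erase()
    return moved
```

A page is treated as valid only if the mapping still points at it; a page holding an LBA that has since been rewritten elsewhere is invalid and is simply discarded when the block is erased.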
Garbage collection traditionally has been performed by reading the entire block being garbage collected to determine what data needs to be relocated. This can reduce the performance experienced by an end user, because the garbage collector increases the number of reads and writes to the drive. It may also reduce the longevity of the drive by contributing to write amplification.
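The cost described above can be made concrete with a small calculation. The sketch below assumes the commonly used definition of write amplification as the ratio of pages written to flash to pages written by the host; the function names and the example figures are hypothetical.

```python
def gc_cost_full_block_read(pages_per_block, valid_count):
    """Flash operations needed to collect one block when every page is read
    in order to find the valid ones (the traditional approach)."""
    if not 0 <= valid_count <= pages_per_block:
        raise ValueError("valid_count must be between 0 and pages_per_block")
    reads = pages_per_block  # every page is read, valid or not
    writes = valid_count     # only the valid pages are rewritten elsewhere
    return reads, writes


def write_amplification(host_writes, gc_writes):
    """Ratio of pages written to flash to pages written by the host."""
    if host_writes <= 0:
        raise ValueError("host_writes must be positive")
    return (host_writes + gc_writes) / host_writes
```

For instance, under these assumptions, collecting a 256-page block in which 64 pages are still valid costs 256 reads and 64 relocation writes, and if the host wrote 192 pages over the same period the write amplification is (192 + 64) / 192, or roughly 1.33.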