Many computer data storage systems have an integrated key component, referred to as a garbage collector, which is configured to track the validity of data written to the underlying non-volatile memory (NVM) storage media, such as hard-disk drives (HDDs) and solid-state drives (SSDs), and overwrite data that is no longer valid with new, valid data.
For example, when a host computer initially stores a data-object on an NVM storage device, the data-object is marked as valid. When the data-object is updated or deleted from memory by a user or an application being executed on the host computer, the originally stored data-object becomes invalid and the space it occupies needs to be “garbage collected”.
Most storage media units, including HDDs and SSDs, work in fixed block sizes, meaning an entire block of data needs to be written every time. Many storage systems impose an additional, larger block size. These larger blocks, composed of many media-level blocks, are again written in their entirety, and may be referred to as “write units” (WUs).
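The effect of fixed block sizes on the amount of data actually written can be illustrated with a short calculation. The following is a minimal sketch only: the 4096-byte media block, the 256 blocks per WU, and the helper names `blocks_needed` and `bytes_written` are assumptions chosen for illustration and are not taken from any particular device or storage system.

```python
# Hypothetical sizes, chosen only for illustration.
MEDIA_BLOCK = 4096                    # assumed media-level block size, in bytes
BLOCKS_PER_WU = 256                   # assumed number of media blocks per write unit
WU_SIZE = MEDIA_BLOCK * BLOCKS_PER_WU


def blocks_needed(nbytes: int, block_size: int = MEDIA_BLOCK) -> int:
    """Number of whole blocks required to hold nbytes (ceiling division)."""
    return -(-nbytes // block_size)


def bytes_written(nbytes: int, block_size: int = MEDIA_BLOCK) -> int:
    """Bytes actually written when only whole blocks can be written."""
    return blocks_needed(nbytes, block_size) * block_size
```

Under these assumed sizes, a 5000-byte payload occupies two media-level blocks (8192 bytes written), and the same payload written at WU granularity occupies one full WU.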
When performing a process of garbage collection from WUs, parts of the WU are usually valid, and parts are invalid. The valid parts need to be copied and aggregated into a new WU and the invalid parts need to be discarded.
At its simplest, garbage collection can be described as the following process: (a) a central processing unit (CPU) reads some WUs that are partly valid and partly stale from the underlying data-storage media into memory; (b) the CPU locates the parts that are marked as valid within these WUs, and copies them into a new WU in a memory module associated with the CPU; (c) the CPU writes the new WU into the underlying media units; and (d) the CPU updates the metadata related to the garbage collection (GC) operation, e.g. updating the validity and location of stored data-objects in the NVM storage media.
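Steps (a) through (d) above can be sketched in a few lines of code. This is an illustrative model under stated assumptions, not an implementation of any specific storage system: the `WriteUnit` class, the `collect` function, the four-slot WU capacity, the dictionary standing in for the NVM media, and the `location` map standing in for the GC metadata are all hypothetical.

```python
from dataclasses import dataclass

WU_SLOTS = 4  # hypothetical WU capacity, in data-object slots


@dataclass
class WriteUnit:
    """A write unit: one data-object (or None) per slot, plus a validity flag per slot."""
    slots: list
    valid: list


def collect(media: dict, victim_ids: list, new_id: int, location: dict) -> None:
    """Garbage-collect the WUs named in victim_ids into one new WU stored under new_id."""
    # (a) Read the partly valid, partly stale WUs from the media into memory.
    victims = [media[wu_id] for wu_id in victim_ids]

    # (b) Locate the parts marked as valid and copy them into a new WU in memory.
    survivors = [obj for wu in victims
                 for obj, is_valid in zip(wu.slots, wu.valid) if is_valid]
    if len(survivors) > WU_SLOTS:
        raise ValueError("valid data does not fit into a single new WU")
    padding = WU_SLOTS - len(survivors)
    new_wu = WriteUnit(slots=survivors + [None] * padding,
                       valid=[True] * len(survivors) + [False] * padding)

    # (c) Write the new WU, in its entirety, to the underlying media.
    media[new_id] = new_wu

    # (d) Update the metadata: record the new location of each surviving
    #     data-object and release the collected WUs for reuse.
    for slot, obj in enumerate(survivors):
        location[obj] = (new_id, slot)
    for wu_id in victim_ids:
        del media[wu_id]
```

In this sketch every step runs on the host: the victim WUs are held in host memory and the valid data-objects are copied by the host CPU, which is the resource cost discussed in this section.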
As known to persons having ordinary skill in the art, the process elaborated above may consume considerable resources of the host computer (e.g. CPU computation cycles and memory allocation), because it involves copying data-objects from the underlying media into a memory space associated with the CPU, identifying valid data-objects, and copying the data from the CPU memory back onto the underlying NVM media.
There is therefore a need for a system and a method for garbage collection that do not require CPU and memory resources from the host computer.