In a deduplicating system, data is broken up into segments. If a segment is already stored on the system, a reference to the already-stored segment is stored instead of storing the segment again. Segments may be stored in containers, which serve as the unit of storage in the system, and may be stored immutably because each segment is unique in the deduplicating system.
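By way of illustration only, the segment-and-reference behavior described above may be sketched as follows. The sketch is a toy under stated assumptions and is not the implementation of any particular system: fixed-size segmentation, SHA-256 fingerprints, an in-memory index, and the names `DedupStore`, `segment_size`, and `container_capacity` are all hypothetical choices made for the example.

```python
import hashlib


class DedupStore:
    """Toy deduplicating store (illustrative assumptions throughout)."""

    def __init__(self, segment_size=4, container_capacity=2):
        self.segment_size = segment_size              # assumed fixed-size segmentation
        self.container_capacity = container_capacity  # assumed segments per container
        self.containers = [[]]   # each container is a list of segment fingerprints
        self.index = {}          # fingerprint -> (container id, segment bytes)

    def write(self, data: bytes):
        """Store data and return the list of references (fingerprints) describing it."""
        refs = []
        for i in range(0, len(data), self.segment_size):
            segment = data[i:i + self.segment_size]
            fp = hashlib.sha256(segment).hexdigest()
            if fp not in self.index:
                # New segment: place it in the open container, starting a
                # fresh container when the current one is full.
                if len(self.containers[-1]) >= self.container_capacity:
                    self.containers.append([])
                self.containers[-1].append(fp)
                self.index[fp] = (len(self.containers) - 1, segment)
            # Whether the segment is new or already stored, only a
            # reference to it is recorded for this write.
            refs.append(fp)
        return refs

    def read(self, refs):
        """Reassemble the original data from its references."""
        return b"".join(self.index[fp][1] for fp in refs)
```

In this sketch, writing data whose first and third segments are identical yields three references but only two stored segments, and a stored segment is never modified after it is placed in a container.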
Garbage collection in a deduplicating system comprises determining and/or reorganizing containers that have few or no references to alive segments, in order to reclaim disk space for the deduplicating system. Throughout this specification, “alive” data refers to data being actively used/stored by a user, system, and/or administrator. Deleted data refers to data no longer being referenced/wanted by said user, system, and/or administrator.
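By way of illustration only, one possible way of determining and reorganizing such containers may be sketched as follows. The sketch assumes a copy-forward policy, in which alive segments from sparsely referenced containers are rewritten into new containers so the old containers can be reclaimed. The function name `collect_garbage`, the `threshold` and `capacity` parameters, and the copy-forward policy itself are assumptions made for the example and are not asserted to be the method of any particular system.

```python
def collect_garbage(containers, alive, threshold=0.5, capacity=4):
    """Reclaim containers with few or no alive segments (illustrative sketch).

    containers: dict mapping container id -> list of segment ids
    alive:      set of segment ids that are still referenced
    threshold:  assumed minimum fraction of alive segments for a container
                to be left in place
    capacity:   assumed maximum number of segments per new container

    Returns (new_containers, reclaimed_ids).
    """
    kept = {}
    survivors = []   # alive segments that must be copied forward
    reclaimed = []   # ids of containers whose space is reclaimed

    for cid, segments in containers.items():
        live = [s for s in segments if s in alive]
        ratio = len(live) / len(segments) if segments else 0.0
        if live and ratio >= threshold:
            # Mostly alive: the container is left untouched.
            kept[cid] = segments
        else:
            # Few or no alive segments: save the alive ones, reclaim the rest.
            survivors.extend(live)
            reclaimed.append(cid)

    # Pack the surviving segments into newly numbered containers.
    next_id = max(containers, default=-1) + 1
    for i in range(0, len(survivors), capacity):
        kept[next_id] = survivors[i:i + capacity]
        next_id += 1

    return kept, reclaimed
```

The threshold reflects a trade-off in this sketch: a higher value reclaims more space per pass but copies more alive segments, while a lower value copies less but leaves more dead segments on disk.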
There exists a need to reclaim the disk space through garbage collection efficiently.