1. Technical Field
The present invention generally relates to the field of data storage and, more particularly, to reclaiming data storage space.
2. Background Information
As the volume of data stored each year grows, the costs associated with maintaining this data also grow. The cost of purchasing and powering storage devices is just a fraction of the total cost of ownership. To achieve the reliability, dataset sizes, and performance demanded by modern big data applications, thousands of such devices must be interconnected and managed by complex data storage systems. The costs to purchase, install, and maintain such systems dominate the overall cost of storing any given unit of data. Reclaiming the space used to store obsolete, unreferenced data, a process known as garbage collection, is an important technique for controlling the growth of storage costs.
Traditional data storage systems eagerly delete data from the underlying backing store in response to a user-level delete operation. Relaxing this requirement can lead to improved performance and simplified design across a wide range of data storage systems, from individual hard disks and solid-state drives (SSDs) to storage arrays and distributed file systems. However, data storage systems that defer deletion can accumulate garbage, that is, data that is no longer referenced. Such garbage consumes storage capacity and decreases throughput. If left unchecked, the resulting cost overhead and performance degradation can become substantial.
Thus, improved garbage collection techniques that bound the total amount of garbage while incurring minimal maintenance overhead are important to data storage systems.