Content Addressed Storage (CAS) systems store data objects referred to as Binary Large Objects (BLOBs), which are referenced by data objects of a different type, referred to as C-Clip Descriptor Files (CDFs). In general, with CAS, a hash computed over a data object's content provides the address for accessing that content, and the address is stored in a directory or the like for subsequent access.
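The content-addressing idea can be illustrated with a minimal sketch. This is not the ECS™ implementation: the class name `ContentAddressedStore`, the in-memory dictionary standing in for the directory, the choice of SHA-256 as the hash, and the JSON layout of the descriptor are all assumptions made for illustration only.

```python
import hashlib
import json


class ContentAddressedStore:
    """Toy in-memory store (hypothetical): an object's address is the hash of its content."""

    def __init__(self):
        # Stands in for the directory: address (hex digest) -> raw bytes.
        self._objects = {}

    def put(self, content: bytes) -> str:
        # SHA-256 is an assumed hash; the text does not name one.
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]


store = ContentAddressedStore()

# Store a BLOB; its address is derived from its content.
blob_address = store.put(b"example payload")

# A CDF-like descriptor (assumed JSON layout) that references the BLOB by address.
descriptor = json.dumps({"blobs": [blob_address]}, sort_keys=True).encode()
cdf_address = store.put(descriptor)
```

Because the address depends only on the content, storing the same bytes twice yields the same address, and the descriptor is the only record of which BLOBs a given application object refers to.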
One cloud-based storage system that supports CAS is Dell EMC® Elastic Cloud Storage (ECS™). ECS™ is a cloud storage technology that assures strong consistency of user data, including by supporting geographically distributed setups (sometimes referred to as “GEO” for short) comprising two or more zones, in which each zone is normally an ECS™ cluster. Each data object (e.g., a BLOB or a CDF) is owned by exactly one zone. User data, the related user metadata, and some system metadata are synchronized between zones by means of asynchronous low-level (that is, not object-level) replication. This allows access to an object even when the zone that owns the object is not available because of a temporary zone outage.
There are problems with garbage collection of BLOBs in a geographically distributed storage system. These include garbage collection issues related to temporary zone outages, and garbage collection issues related to ensuring that data loss cannot occur.