Enterprise storage systems currently available are proprietary storage appliances that integrate the storage controller functions and the storage media into the same physical unit. This centralized model makes it harder to scale a storage system's capacity, performance, and cost independently of one another. Users can get tied to one expensive appliance without the flexibility to adapt it to application requirements that may change over time. For small and medium-scale enterprises, this may require a large upfront capital cost. For larger enterprise datacenters, new storage appliances are added as the storage capacity and performance requirements increase. These appliances operate in silos and impose significant management overhead.
These storage systems are built either as in-place filesystems (where data is overwritten in place), as log-structured systems (where data being written is redirected to a new location), or as copy-on-write systems (where the data is written in place, but a copy of the original data is written to a new location). In all of these approaches, cleaning up data to reclaim space poses a challenging problem, whether the stale data was generated by new writes invalidating old data or by user-triggered deletes.
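To make the space-reclamation problem concrete, the following is a minimal Python sketch of the log-structured case only, under simplifying assumptions: the log is an in-memory list, and the names `LogStructuredStore`, `write`, `delete`, and `garbage_positions` are hypothetical and chosen for illustration. It is not drawn from any particular storage system.

```python
class LogStructuredStore:
    """Toy log-structured store (illustrative assumption, not a real system)."""

    def __init__(self):
        self.log = []    # append-only list of (key, data) records
        self.index = {}  # key -> position in the log of the live version

    def write(self, key, data):
        # New data is redirected to a new location (the tail of the log);
        # any previous version of the key stays in the log as stale data.
        self.log.append((key, data))
        self.index[key] = len(self.log) - 1

    def delete(self, key):
        # A user-triggered delete only drops the index entry;
        # the record itself still occupies space in the log.
        self.index.pop(key, None)

    def read(self, key):
        return self.log[self.index[key]][1]

    def garbage_positions(self):
        # Log positions no longer reachable through the index.
        live = set(self.index.values())
        return [pos for pos in range(len(self.log)) if pos not in live]


store = LogStructuredStore()
store.write("a", "v1")
store.write("b", "v1")
store.write("a", "v2")  # invalidates the record at position 0
store.delete("b")       # invalidates the record at position 1
print(store.garbage_positions())
```

In this sketch both an overwrite and a delete leave dead records behind, and a separate cleaning pass would be needed to recover the space they occupy.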
In addition, storage systems build a reference counting mechanism to track data accessible by the user. Whenever a data block or segment reaches a reference count of 0, it becomes a viable candidate for reclamation. That approach is efficient on a single node, where there is no requirement to coordinate the reference count on a data block. However, this mechanism becomes a challenge in a distributed multi-node environment, where updates to a block's reference count must be coordinated across nodes.
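The single-node reference counting described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions: the class `RefCountedStore` and its methods `add_ref`, `drop_ref`, `reclaim_candidates`, and `reclaim` are hypothetical names, and the counts live in one process's memory, which is exactly the condition that does not hold in a multi-node system.

```python
class RefCountedStore:
    """Toy single-node block store with per-block reference counts
    (hypothetical names, for illustration only)."""

    def __init__(self):
        self.blocks = {}    # block_id -> data
        self.refcount = {}  # block_id -> number of live references

    def add_ref(self, block_id, data=None):
        # The first reference stores the block; later ones only bump the count.
        if block_id not in self.blocks:
            self.blocks[block_id] = data
            self.refcount[block_id] = 0
        self.refcount[block_id] += 1

    def drop_ref(self, block_id):
        if self.refcount.get(block_id, 0) == 0:
            raise ValueError(f"no live reference to block {block_id!r}")
        self.refcount[block_id] -= 1

    def reclaim_candidates(self):
        # A block whose count has reached 0 is a candidate for reclamation.
        return sorted(b for b, n in self.refcount.items() if n == 0)

    def reclaim(self):
        freed = self.reclaim_candidates()
        for block_id in freed:
            del self.blocks[block_id]
            del self.refcount[block_id]
        return freed


store = RefCountedStore()
store.add_ref("b1", b"x")
store.add_ref("b1")        # second reference to the same block
store.add_ref("b2", b"y")
store.drop_ref("b2")       # count for b2 reaches 0
store.drop_ref("b1")       # b1 still has one reference
print(store.reclaim())
```

On a single node every `add_ref` and `drop_ref` sees the same dictionary, so no coordination is needed. With several nodes holding references to the same block, each increment and decrement would have to be agreed upon across nodes before a count of 0 could be trusted.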