In data storage systems, space is allocated for storing a primary set of user data. Additional storage space is allocated for providing data protection for the primary set of data. For example, data protection can include mirroring to generate a backup copy of the primary data. The backup copy provides protection against data loss in the event of primary data failure.
In geographically distributed data storage systems, data protection can also include replication, which generates copies of the primary and backup data that are stored independently to provide additional protection.
The amount of additional storage space needed for data protection varies over time. Allocating too little risks data loss, while allocating too much leads to inefficient storage utilization and an increase in the cost of storage. Because providing data protection can be costly in terms of storage capacity and processing requirements, large-scale data protection for distributed data storage systems requires complex software architecture and development to achieve outstanding availability, capacity-use efficiency, and performance.
The Dell EMC® Elastic Cloud Storage (ECS™) distributed data storage solutions employ data protection methodologies that minimize capacity overhead while providing robust data protection. Among other innovations, rather than relying on a conventional file system, ECS™ partitions disk space into a set of fixed-size blocks called chunks, ranging in size from 64 MB to 128 MB, to help manage disk capacity. All user data is stored in these chunks, and the chunks are shared: one chunk may (and, in most cases, does) contain fragments of several user objects. Chunk content is modified in append-only mode. When a chunk becomes full enough, it is sealed, and the content of a sealed chunk is immutable.
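The chunk life cycle described above (shared chunks, append-only writes, sealing when "full enough") can be illustrated with a minimal Python sketch. It is not ECS™ code: the class and function names, the single 128 MB chunk size, and the 95% seal threshold are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field

# Illustrative assumptions, not ECS parameters: one fixed size picked from the
# 64 MB to 128 MB range, and a hypothetical "full enough" seal threshold.
CHUNK_SIZE = 128 * 1024 * 1024
SEAL_THRESHOLD = 0.95


class ChunkSealedError(Exception):
    """Raised when a write targets a sealed (immutable) chunk."""


@dataclass
class Chunk:
    chunk_id: int
    capacity: int = CHUNK_SIZE
    data: bytearray = field(default_factory=bytearray)
    # object id -> list of (offset, length) fragments held in this chunk
    fragments: dict = field(default_factory=dict)
    sealed: bool = False

    def append(self, object_id: str, payload: bytes) -> int:
        """Append as much of payload as fits; return the number of bytes written."""
        if self.sealed:
            raise ChunkSealedError(f"chunk {self.chunk_id} is sealed")
        offset = len(self.data)
        n = min(len(payload), self.capacity - offset)
        self.data.extend(payload[:n])  # append-only: existing bytes never change
        self.fragments.setdefault(object_id, []).append((offset, n))
        if len(self.data) >= SEAL_THRESHOLD * self.capacity:
            self.sealed = True  # "full enough": no further appends allowed
        return n


def write_object(chunks: list, object_id: str, payload: bytes,
                 capacity: int = CHUNK_SIZE) -> list:
    """Write one user object, opening a new chunk whenever the current one seals."""
    written = 0
    while written < len(payload):
        if not chunks or chunks[-1].sealed:
            chunks.append(Chunk(chunk_id=len(chunks), capacity=capacity))
        written += chunks[-1].append(object_id, payload[written:])
    return chunks
```

With a toy capacity of 100 bytes, writing a 60-byte object followed by a second 60-byte object leaves chunk 0 sealed and holding fragments of both objects, while the remaining 20 bytes of the second object go to a new chunk 1. Here immutability is enforced only by refusing appends to a sealed chunk; a real system would also protect sealed content at the storage layer.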
Storing user data in chunks allows the use of other techniques to minimize capacity overhead while providing robust data protection. For example, for geographically distributed storage, ECS™ provides additional protection of user data with geo-replication, also referred to as GEO data protection, in which replication is performed at the chunk level and the replicas are geographically distributed. Among other techniques, to minimize the storage capacity overhead associated with GEO data protection, ECS™ uses an exclusive or (XOR) operation to reduce the capacity consumed by replicated chunks.
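The general idea behind XOR-based geo-protection can be shown with a small Python sketch. It assumes a hypothetical three-zone layout in which zone 1 and zone 2 each hold a primary chunk of the same fixed size, and zone 3 stores the XOR of the two instead of two full replicas. This illustrates the textbook XOR technique only; it does not describe ECS™'s actual placement or recovery logic, and the function names are hypothetical.

```python
def xor_chunks(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length chunks."""
    if len(a) != len(b):
        raise ValueError("chunks must be the same size")
    return bytes(x ^ y for x, y in zip(a, b))


def protect(chunk_zone1: bytes, chunk_zone2: bytes) -> bytes:
    """What the third zone stores: one XOR chunk instead of two full replicas."""
    return xor_chunks(chunk_zone1, chunk_zone2)


def recover(xor_chunk: bytes, surviving_chunk: bytes) -> bytes:
    """Rebuild a lost chunk from the XOR chunk and the chunk that survived."""
    # XOR is its own inverse: (A ^ B) ^ B == A
    return xor_chunks(xor_chunk, surviving_chunk)


# Toy 4-byte "chunks" standing in for 64 MB to 128 MB ones.
chunk_a = bytes([0x0F, 0xF0, 0xAA, 0x01])  # primary in zone 1
chunk_b = bytes([0xFF, 0x00, 0x55, 0x10])  # primary in zone 2
xor_copy = protect(chunk_a, chunk_b)        # stored in zone 3

# Zone 1 fails: rebuild chunk A from zone 3's XOR chunk and zone 2's chunk B.
assert recover(xor_copy, chunk_b) == chunk_a
# prints: 8 bytes for full replicas vs 4 bytes for XOR
print(len(chunk_a) + len(chunk_b), "bytes for full replicas vs", len(xor_copy), "bytes for XOR")
```

The trade-off is that zone 3 halves its replica footprint, but recovering a lost chunk now requires reading the surviving chunk from another zone and performing the XOR, rather than simply copying a stored replica.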
Notwithstanding the advancements achieved using ECS™ data protection for geographically distributed storage, large cloud-scale data storage systems continue to present new challenges, including reducing the capacity overhead associated with data protection and improving the ability to recover from complex failures of storage infrastructure.