The disclosure generally relates to the field of data processing, and more particularly to data storage and recovery.
In distributed data storage systems, various methods can be used to store data in a distributed manner, e.g., to improve data availability, reliability, and protection. Erasure coding is one such method of data protection in which a data object is broken into fragments, encoded with parity information, and stored across a set of storage nodes in the distributed data storage system. When a data object is erasure coded, the distributed data storage system records storage information for the object in metadata. This metadata can include the identities of the storage nodes that store each fragment of the encoded data object. The metadata may be maintained in a distributed database that is stored across storage nodes in the distributed data storage system.
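As an illustration of the kind of metadata described above, the following minimal Python sketch maps each fragment of an erasure coded object to the storage node that holds it. The names ObjectMetadata, place_fragments, and the node identifiers are hypothetical and chosen only for illustration. The sketch also assumes one fragment per node and an in-memory record, whereas an actual system may place fragments differently and would keep this record in the distributed database.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectMetadata:
    """Hypothetical per-object record of where each fragment is stored."""
    object_id: str
    k: int  # number of data fragments
    m: int  # number of parity fragments
    # Maps fragment index (0 .. k+m-1) to the ID of the node storing it.
    fragment_locations: dict[int, str] = field(default_factory=dict)

    def fragments_on(self, node_id: str) -> list[int]:
        """Return the indices of the fragments stored on the given node."""
        return sorted(
            index
            for index, node in self.fragment_locations.items()
            if node == node_id
        )


def place_fragments(object_id: str, k: int, m: int, nodes: list[str]) -> ObjectMetadata:
    """Assign the k+m fragments to distinct nodes (an assumed placement policy)."""
    n = k + m
    if len(nodes) < n:
        raise ValueError("need at least k+m storage nodes for one fragment per node")
    locations = {index: nodes[index] for index in range(n)}
    return ObjectMetadata(object_id, k, m, locations)
```

With such a record, the system can look up which fragments are affected when a given storage node becomes unavailable.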
Erasure coding involves transforming a set of k fragments of a data object into n erasure coded fragments by using the k fragments to generate m parity fragments, where n=k+m (often referred to as a k+m erasure coding scheme). Some examples of k+m erasure coding schemes include 2+1, 6+3, and 8+2 erasure coding schemes. A data object can be rebuilt using any k of the n erasure coded fragments. If the number of available fragments is less than k, then the object cannot be recovered.
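The k+m relationship can be illustrated with a minimal Python sketch of the simplest case, m=1, in which the single parity fragment is the bytewise XOR of the k data fragments (for example, a 2+1 scheme). This is a simplified stand-in: schemes with m greater than 1, such as 6+3 or 8+2, typically rely on codes such as Reed-Solomon, which are not shown here. The function names and the zero-padding convention are assumptions made for illustration.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Return n = k + 1 fragments: k data fragments followed by one XOR parity fragment."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\x00")  # zero-pad so all fragments are equal length
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def rebuild(available: dict[int, bytes], k: int, original_len: int) -> bytes:
    """Rebuild the object from any k of the k+1 fragments.

    `available` maps fragment index to fragment bytes; index k is the parity fragment.
    """
    if len(available) < k:
        raise ValueError("fewer than k fragments available; object cannot be recovered")
    fragments = dict(available)
    missing = [i for i in range(k) if i not in fragments]
    if missing:
        # At most one data fragment can be missing here, and the parity fragment
        # is present, so XOR-ing all available fragments yields the missing one.
        fragments[missing[0]] = reduce(xor_bytes, available.values())
    return b"".join(fragments[i] for i in range(k))[:original_len]
```

In this sketch, encoding an 11-byte object with k=2 produces three 6-byte fragments, and any two of them are sufficient for rebuild to return the original bytes, while a single fragment is not.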