Several of the disclosed embodiments relate to distributed data storage services, and more particularly, to storing data in a distributed data storage system using virtual chunk services.
In distributed data storage systems, various methods can be used to store data in a distributed manner, e.g., to improve data reliability, protection. Erasure coding is one such method of data protection in which a data object is broken into fragments, encoded with parity information and stored across a set of different storage nodes in the distributed data storage system. When a data object is erasure coded, the distributed data storage system has to typically store the storage information in its metadata. This metadata can include identities of the storage nodes that store each fragment of the encoded data object. When a storage node in the distributed data storage system fails, all the objects that were stored in that storage node have to be discovered and repaired, so that the reliability is not compromised.
For recovering the lost data, the distributed data storage system may have to go through the metadata of all the data objects to identify the data objects impacted by the failed node. Then alternate nodes are selected to move the fragments. After the fragments are moved, the metadata of each moved object should be updated to reflect the new set of storage nodes that the fragments of the objects are stored in. This approach can be resource intensive and can have the following performance bottlenecks: (a) metadata query for each object to find if it is impacted and (b) metadata update for each impacted object after repair due to node or volume loss. This can be a resource intensive process as the distributed data storage system can have a significantly large number of data objects, e.g., billions of data objects. Further, reading such significantly large number of data objects to identify a subset of them that are stored on the failed node, which can be a small the fraction of entire number of data objects is inefficient. In a system with billions of data objects, with each node storing millions of fragments, both these can cause serious performance issues for the recovery process.