The present disclosure relates to a distributed data storage system. In particular, the present disclosure relates to distributing data in the distributed data storage system using a hierarchy rule that is generated from a spreading policy and a set of tolerable failures, where a user specifies the set of tolerable failures without knowledge of how the distributed data storage system is deployed.
Many storage systems store data reliably, for example, by using redundancy. Some data distribution algorithms used in such storage systems also allow a user to define a protection level by describing which failure scenarios can be tolerated, such that data can still be recovered after such a failure occurs. However, such a description depends on the layout of the storage system and may cause the storage system to malfunction when entities (e.g., data storage devices, storage nodes, racks, data centers) are added to or removed from the system. For example, if a new entity is added to the system, every write of new data and every reconstruction of old data (i.e., data already stored in the system) would cause the data to be spread out over all entities of the system, including the newly added entity. This is problematic because the new entity and the old entities are not the same size and therefore do not provide the same benefit. This is further problematic because the old data may have a different protection guarantee than the new data.
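The effect of a layout-dependent placement on the protection guarantee can be illustrated with a minimal sketch. The round-robin placement, the rack names, and the block count of 12 below are hypothetical and chosen only for illustration; they are not taken from any particular storage system. The sketch places the blocks of one object across all racks present at write time and reports the worst-case number of blocks lost when a single rack fails.

```python
from collections import Counter


def spread_round_robin(num_blocks, entities):
    """Assign each block to an entity in round-robin order.

    Hypothetical stand-in for a placement that depends on the current layout.
    """
    return [entities[i % len(entities)] for i in range(num_blocks)]


def max_blocks_lost_on_single_failure(placement):
    """Return the worst-case number of blocks lost if any one entity fails."""
    return max(Counter(placement).values())


# Layout before and after one rack is added (names are illustrative).
old_racks = ["rack-1", "rack-2", "rack-3"]
new_racks = old_racks + ["rack-4"]

# Old data was written when only three racks existed.
old_placement = spread_round_robin(12, old_racks)
# New data is spread over all four racks.
new_placement = spread_round_robin(12, new_racks)

old_loss = max_blocks_lost_on_single_failure(old_placement)
new_loss = max_blocks_lost_on_single_failure(new_placement)

print(f"old data: up to {old_loss} of 12 blocks lost on a single rack failure")
print(f"new data: up to {new_loss} of 12 blocks lost on a single rack failure")
```

Under these assumptions, a single rack failure costs an object written before the expansion up to 4 of its 12 blocks, but costs an object written afterwards only up to 3, so the two objects do not share the same protection guarantee even though the same description was applied to both.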