The use of virtual machines (VMs) to improve the use of computing resources continues to increase. Such VMs can be characterized as software-based computing “machines” implemented in a virtualization environment comprising various hardware resources (e.g., CPU, memory, etc.). The VMs can operate based at least in part on the computer architecture and/or functions (e.g., operating system) of a real or hypothetical computer. Multiple VMs can operate on one physical machine (e.g., computer), with each VM sharing the resources of that physical computer across multiple environments. Various VMs can run multiple operating systems and/or multiple applications on the physical computer. Such flexibility can be facilitated at least in part by a hypervisor, which hypervisor allocates hardware resources dynamically and transparently.
The high storage I/O demand of VMs has precipitated an increase in distributed storage systems implemented in the virtualization environments. Specifically, such distributed storage systems can aggregate various physical storage facilities to create a logical storage pool throughout which certain data may be efficiently distributed according to various metrics and/or objectives. Metadata describing the storage pool and/or its virtualized representations may also be distributed any number of times among various nodes in the distributed storage system. Users of distributed storage systems have a data consistency expectation (e.g., “strictly consistent”), namely that a distributed storage platform will provide consistent and predictable storage behavior (e.g., availability, accuracy, etc.) for data and/or metadata. Distributed storage platforms can address such expectations by implementing a replication policy to facilitate data redundancy and/or availability in case of a node and/or a disk failure. For example, a given replication policy might be described at least in part by a numeric replication factor (RF) such as “RF=3”, indicating that three replicas of certain data (e.g., metadata, user data, etc.) may be distributed among various available nodes in the network topology.
Unfortunately, legacy techniques for implementing replication policies in distributed storage platforms can be limited at least in their ability to be aware of availability domains. A replication policy implementation that is availability domain aware, also referred to as block aware or rack aware, is one that remains compliant upon failure of any one availability domain, which availability domain might be defined by a boundary that includes a certain set of physical and/or virtual components (e.g., one or more nodes, blocks, hosts, sites, appliances, racks, data centers, etc.). If the replication policy is violated upon failure of an availability domain, the implementation is availability domain unaware or block unaware. For example, if an RF of three (e.g., RF=3) is specified for a given replication policy and an availability domain failure causes two of the three replication nodes to fail, the replication policy will be violated.
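The distinction between an availability domain aware placement and an unaware placement can be illustrated with a minimal sketch. The function name `is_domain_aware`, the node and rack labels, and the `max_lost` threshold are hypothetical and introduced only for illustration. The sketch assumes, consistent with the RF=3 example above, that a policy is violated when the failure of a single availability domain removes more than one replica.

```python
from collections import Counter


def is_domain_aware(replica_nodes, node_to_domain, max_lost=1):
    """Return True if no single availability domain holds more than
    `max_lost` of the replicas.

    Assumption (not from the source): losing more than `max_lost`
    replicas to one domain failure counts as a policy violation.
    """
    per_domain = Counter(node_to_domain[node] for node in replica_nodes)
    return all(count <= max_lost for count in per_domain.values())


# Hypothetical topology: four nodes spread over three racks.
node_to_domain = {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackC"}

# RF=3 placement with two replicas in rackA: a rackA failure removes two replicas.
unaware_placement = ["n1", "n2", "n3"]

# RF=3 placement with one replica per rack: any single rack failure removes one replica.
aware_placement = ["n1", "n3", "n4"]

print(is_domain_aware(unaware_placement, node_to_domain))
print(is_domain_aware(aware_placement, node_to_domain))
```

Under these assumptions, the first placement is availability domain unaware and the second is availability domain aware, even though both satisfy RF=3 while all racks are healthy.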
Some legacy replication policy implementation techniques, for example, might select the replication nodes randomly or might otherwise be agnostic to availability domain associations. In such cases, various availability domain failures can precipitate replication policy violations. The frequency of such violations can increase as the number of nodes and/or the RF increases. Other legacy techniques might decrease the RF in the replication policy at the risk of data inconsistency and/or data loss. Further, more availability domains (e.g., hardware appliances, hosts, racks, sites, data centers, etc.) might be added to reduce replication policy violations, imposing a significant hardware, facility, and/or implementation expense. For highly scalable and active distributed computing and storage systems having dynamic node topologies (e.g., node count, node allocation, etc.), the foregoing legacy techniques can be limited at least as pertains to ongoing maintenance of compliance to an availability domain aware replication policy.
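The effect of domain-agnostic selection can be illustrated by enumerating every possible placement in a small topology and counting those that concentrate replicas in one availability domain. The following is a sketch only: the function name `violating_fraction`, the six-node, three-rack topology, and the `max_lost` threshold are hypothetical, and the sketch assumes that a uniformly random selection is equally likely to pick any combination of nodes.

```python
from collections import Counter
from itertools import combinations


def violating_fraction(node_to_domain, rf, max_lost=1):
    """Fraction of all RF-sized node selections that place more than
    `max_lost` replicas in a single availability domain.

    Assumption (not from the source): each combination of nodes is
    equally likely under domain-agnostic random selection.
    """
    nodes = sorted(node_to_domain)
    total = 0
    violating = 0
    for placement in combinations(nodes, rf):
        total += 1
        per_domain = Counter(node_to_domain[node] for node in placement)
        if max(per_domain.values()) > max_lost:
            violating += 1
    return violating / total


# Hypothetical topology: three racks with two nodes each.
topology = {
    "n1": "rackA", "n2": "rackA",
    "n3": "rackB", "n4": "rackB",
    "n5": "rackC", "n6": "rackC",
}

for rf in (2, 3):
    print(f"RF={rf}: {violating_fraction(topology, rf):.2f} of placements at risk")
```

In this hypothetical topology, 3 of the 15 possible RF=2 placements and 12 of the 20 possible RF=3 placements put two replicas in the same rack, which is consistent with the observation that violations can become more frequent as the RF increases.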
What is needed is a technique or techniques to improve over legacy and/or over other considered approaches. Some of the approaches described in this background section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.