Distributed systems allow multiple clients in a network to access a pool of shared resources. For example, a distributed storage system allows a cluster of host computers to aggregate local disks (e.g., SSD, PCI-based flash storage, SATA, or SAS magnetic disks) located in or attached to each host computer to create a single, shared pool of storage. This pool of storage (sometimes referred to herein as a “datastore” or “store”) is accessible by all host computers in the cluster and may be presented as a single namespace of storage entities (such as a hierarchical file system namespace in the case of files, a flat namespace of unique identifiers in the case of objects, etc.). Storage clients, such as virtual machines spawned on the host computers, may in turn use the datastore, for example, to store virtual disks that the virtual machines access during their operation. Because the shared local disks that make up the datastore may have different performance characteristics (e.g., capacity, input/output operations per second (IOPS) capabilities, etc.), usage of such shared local disks to store virtual disks or portions thereof may be distributed among the virtual machines based on the needs of each given virtual machine.
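The pooling and needs-based placement described above can be illustrated with a minimal sketch. The class names (`Disk`, `Datastore`), the `place` method, the host and disk names, and the selection rule (pick the slowest disk that still meets the request, so faster disks remain available) are all hypothetical and chosen only for illustration; they are not taken from any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """A local disk contributed by one host (hypothetical model)."""
    host: str
    name: str
    capacity_gb: int
    iops: int
    used_gb: int = 0

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.used_gb


@dataclass
class Datastore:
    """A single shared pool built from the local disks of every host."""
    disks: list = field(default_factory=list)
    placements: dict = field(default_factory=dict)  # object id -> Disk

    def add_host_disks(self, disks) -> None:
        self.disks.extend(disks)

    @property
    def total_capacity_gb(self) -> int:
        return sum(d.capacity_gb for d in self.disks)

    def place(self, object_id: str, size_gb: int, min_iops: int) -> Disk:
        """Place an object on a disk that satisfies the client's needs."""
        candidates = [
            d for d in self.disks
            if d.free_gb >= size_gb and d.iops >= min_iops
        ]
        if not candidates:
            raise RuntimeError(f"no disk can satisfy {object_id}")
        # Assumed policy: prefer the slowest qualifying disk, then the one
        # with the most free space, so fast disks are kept for demanding VMs.
        chosen = min(candidates, key=lambda d: (d.iops, -d.free_gb))
        chosen.used_gb += size_gb
        self.placements[object_id] = chosen
        return chosen


store = Datastore()
store.add_host_disks([Disk("host-a", "ssd0", 400, 50000),
                      Disk("host-a", "sas0", 2000, 200)])
store.add_host_disks([Disk("host-b", "sata0", 4000, 150)])

# A latency-sensitive virtual disk and a large, low-IOPS virtual disk.
db_disk = store.place("vm1-disk", size_gb=100, min_iops=10000)
archive_disk = store.place("vm2-disk", size_gb=1500, min_iops=100)
print(store.total_capacity_gb, db_disk.name, archive_disk.name)
```

In this sketch the virtual machine on any host sees one pool, while each virtual disk lands on a disk whose capacity and IOPS match what that virtual machine asked for.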
This approach provides enterprises with cost-effective performance. For instance, distributed storage using pooled local disks is inexpensive, highly scalable, and relatively simple to manage. Because such distributed storage can use commodity disks in the cluster, enterprises do not need to invest in additional storage infrastructure. However, one issue with such a distributed system is failure recovery for nodes that return to the cluster after being offline for a period. For example, if a cluster node goes offline (e.g., due to a power outage), the active and visible nodes in the cluster continue to perform regular transactions as designed. As a consequence, when the offline node returns to the cluster, the node and its corresponding resource component objects are not up-to-date with the current state of the cluster or with the operations performed on those component objects while the node was offline. In that state, the previously offline node is unusable in the cluster, which is ultimately inefficient because the distributed resources system is not using all of the resources available in the cluster.
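The staleness problem described above can be made concrete with a small sketch. It assumes, purely for illustration, that each write to a component object carries a monotonically increasing sequence number and that each replica records the last sequence number it applied. The names `ComponentReplica`, `ComponentObject`, `applied_seq`, and `stale_replicas` are hypothetical; the sketch shows only how a returning node falls behind, not how it would be brought up to date.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentReplica:
    """One node's copy of a component object (hypothetical model)."""
    node: str
    online: bool = True
    applied_seq: int = 0          # last write this replica applied
    data: dict = field(default_factory=dict)


class ComponentObject:
    """A component object replicated across several cluster nodes."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.latest_seq = 0       # assumed cluster-wide write counter

    def write(self, key, value) -> None:
        # Active nodes keep processing transactions; offline nodes miss them.
        self.latest_seq += 1
        for replica in self.replicas:
            if replica.online:
                replica.data[key] = value
                replica.applied_seq = self.latest_seq

    def stale_replicas(self):
        """Return the nodes whose copy lags behind the cluster state."""
        return [r.node for r in self.replicas
                if r.applied_seq < self.latest_seq]


a, b, c = (ComponentReplica(n) for n in ("node-a", "node-b", "node-c"))
obj = ComponentObject([a, b, c])

obj.write("block0", "v1")     # all three replicas apply this write
b.online = False              # node-b loses power
obj.write("block0", "v2")     # applied only by node-a and node-c
obj.write("block1", "v3")
b.online = True               # node-b returns to the cluster

missed = obj.latest_seq - b.applied_seq
print(obj.stale_replicas(), missed, b.data)
```

Here `node-b` rejoins with an old value for `block0` and no copy of `block1`, so reads served from it would be incorrect until the missed operations are applied.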