Virtualization provides datacenters with highly efficient and available resource, networking, and storage management to reduce infrastructure costs such as capital, power, space, cooling, labor, and the like. In particular, virtual datacenters can have numerous host machines each executing thousands of virtual machines (VMs) or other guest operating systems. In such virtual datacenters or other shared storage systems, multiple hosts may share the same set of storage devices. Each storage device may have one or more arrays of disks. When one of the disks in one of the arrays experiences a failure (e.g., a hardware failure), numerous hosts and VMs may be affected. In such instances, some of the existing systems fail over the entire array (including VMs and datastores) to a backup or redundant array.
Further, hardware failures often cascade, such that a single disk failure in a single array may spawn multiple additional failure events related to the original disk failure. The existing recovery systems therefore have to process numerous failure events around the same time. However, the existing systems lack a mechanism for recognizing that some of the failure events may be related to an original failure event. Consequently, to preserve data consistency and reduce disruption to end users, the existing systems process the numerous failure events serially or otherwise end-to-end, such that recovery for one of the affected arrays begins only after completion of recovery for another one of the affected arrays. As a result, with the existing systems, the recovery time resulting from hardware failures can be excessive.
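The cost of serial processing can be illustrated with a minimal sketch. It is not taken from any existing recovery system: the `FailureEvent` type, the `serial_recovery_schedule` function, the array names, and the recovery durations are all hypothetical and chosen only for illustration. The sketch assumes each failure event is recovered end-to-end before the next one starts, so the total recovery time is the sum of the individual recovery times.

```python
from dataclasses import dataclass


@dataclass
class FailureEvent:
    """Hypothetical failure event; fields are illustrative assumptions."""
    array_id: str
    recovery_minutes: float  # assumed time to recover this array


def serial_recovery_schedule(events):
    """Return (array_id, start, end) tuples for strictly serial recovery.

    Recovery for each array begins only after recovery for the
    previous array has completed.
    """
    schedule = []
    clock = 0.0
    for event in events:
        start = clock
        end = start + event.recovery_minutes
        schedule.append((event.array_id, start, end))
        clock = end  # the next recovery cannot start before this one ends
    return schedule


# Illustrative values only: three failure events arriving around the same time.
events = [
    FailureEvent("array-A", 30.0),
    FailureEvent("array-B", 20.0),
    FailureEvent("array-C", 25.0),
]
schedule = serial_recovery_schedule(events)
total_minutes = schedule[-1][2] if schedule else 0.0
```

With these assumed durations, recovery of the last array does not begin until 50 minutes after the first failure event is picked up, and the total recovery time is 75 minutes, even if some of the events stem from the same original failure.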