Existing systems use virtualization to share the resources of a modern datacenter. The datacenter may have a wide range of hardware components such as servers, storage devices, communication equipment, and the like, organized into clusters. Virtualization of the datacenter allows multiple guest operating systems to run in virtual machines (VMs) on a single host, sharing the underlying physical hardware of the host as well as sharing access to a datastore accessible to the host.
Some existing systems monitor some of the clusters for host-level failures and for storage component failures, such as All Paths Down (APD) or Permanent Device Loss (PDL). When such a failure occurs, remediation may be performed to restore functionality.
However, the existing systems lack a reliable and fast mechanism for detecting failures in host-level and VM-level networking components. For example, VMs are typically configured to use virtual networks that span multiple hosts. If connectivity over a virtual network to a certain gateway or to a specific Internet Protocol (IP) address fails, the VMs on that virtual network and the applications running in those VMs experience network outages. Such a failure may result from hardware issues, software networking configuration issues, or outages on a router or switch connecting the hosts. The existing systems also lack reliable and fast remediation workflows for network component failures that occur within a cluster for which high availability is desired.