The present invention relates to topology discovery, and more specifically to topology discovery for fault finding in virtual computing environments.
There is now a plethora of cloud-based software offerings. The underlying technology for these cloud-based software offerings is a virtual environment such as VMware, Kernel-based Virtual Machine (KVM), Xen or Microsoft Hyper-V (VMware is a trademark of VMware, Inc. and Microsoft and Hyper-V are trademarks of Microsoft Corporation). Standard operating systems may be run on Virtual Machines (VMs). The standard operating systems are in turn used to run applications that implement a range of services. Each VM directly replicates a physical computer but is run under a hypervisor on a physical host machine that can host several VMs. Running VMs on a cluster of host machines in this way maximizes host machine utilization and increases fault tolerance. If one host machine fails, then the VMs can be moved, or migrated, to run on another host machine in the cluster. Higher level software is used to automatically provision VMs in order to provide scalable services on demand. However, these higher level tools increasingly rely on the fault tolerance, load balancing and scalability of the underlying virtual environment, which must adapt to meet user demands.
Large scale virtual environments often change rapidly. Typical causes of such changes are VM migration between hosts for load balancing, fault tolerance, maintenance, host standby to save power, VM High Availability and the addition or removal of VMs. In a similar way to migration between hosts, VMs can also be migrated between the shared data stores they use, although this typically occurs more slowly and less frequently than migration between hosts. These functions are controlled centrally, and tools that automatically adapt to problem conditions communicate with a control center, as do traditional fault management systems.
By the time a fault is looked at by an operator, the virtual environment may have already adapted to compensate for the problem, making it difficult to find what caused the problem and to fix it. Further, the relationship between VM faults and the underlying physical problem is often complex and intermittent, making it very difficult to find and encode this relationship heuristically. It is not possible to rely solely on a host producing the same type of error as a VM, such that the physical error can be designated the root cause and the VM error the symptom. Whilst the virtual environment can adapt to faults, if the root cause is not found and fixed, then the overall performance of the system will continue to be affected each time a problematic component is utilized in a way that will cause it to fail.
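The difficulty described above can be illustrated with a minimal sketch. It assumes a hypothetical, time-ordered log of VM placements; the class name `PlacementHistory`, its methods, and the VM and host names are illustrative assumptions and do not describe any particular virtual environment or the claimed method. The sketch shows only that the host on which a VM is running when an operator investigates may differ from the host on which the VM was running when the fault occurred.

```python
from bisect import bisect_right
from collections import defaultdict


class PlacementHistory:
    """Hypothetical log of which host each VM was running on over time."""

    def __init__(self):
        # VM name -> parallel, time-ordered lists of timestamps and hosts.
        self._times = defaultdict(list)
        self._hosts = defaultdict(list)

    def record(self, vm, host, timestamp):
        """Record that `vm` started running on `host` at `timestamp`."""
        times = self._times[vm]
        if times and timestamp < times[-1]:
            raise ValueError("placements must be recorded in time order")
        times.append(timestamp)
        self._hosts[vm].append(host)

    def host_at(self, vm, timestamp):
        """Return the host `vm` was on at `timestamp`, or None if unknown."""
        times = self._times.get(vm, [])
        # Index of the first placement strictly after `timestamp`.
        index = bisect_right(times, timestamp)
        return self._hosts[vm][index - 1] if index else None


# Illustrative data (assumed): vm-1 is migrated from host-a to host-b.
history = PlacementHistory()
history.record("vm-1", "host-a", 100)
history.record("vm-1", "host-b", 200)

fault_time = 150       # time at which the VM fault was raised
inspection_time = 300  # time at which the operator looks at the fault

host_at_fault = history.host_at("vm-1", fault_time)
host_at_inspection = history.host_at("vm-1", inspection_time)
```

In this sketch the fault was raised while the VM was on `host-a`, but an operator inspecting the current placement would see `host-b`, so the current topology alone does not identify the physical component involved.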