Management of a data center involves incident management. As the capacity of a data center increases, the physical machines on which various applications, services, operating systems and the like reside may be distributed across different geographical locations. In this case, remote incident management also becomes an important part of data center management. When a fault incident occurs in the data center, the fault incident is diagnosed so that a fault solution can be provided.
In an existing method for diagnosing fault incidents, an administrator of the data center employs a “test check” methodology, checking each application, service and the like in the data center against log files to find the cause of the fault incident. However, as a user of the data center, the administrator cannot be fully aware of the deployment in the data center or of the dependency relationships between the applications and between the services. As a result, fault incident diagnosis is inefficient and time-consuming, and the cause of the fault may not be determined accurately at all. Further, in a distributed data center, the physical machines may be located at different geographical locations, which lengthens the time required for fault incident diagnosis.
Additionally, if the topology of the data center has changed by the time a fault incident occurs, for example because a new virtual machine has been created, the diagnosis is also prone to error.
Therefore, there is a need for a technical solution for automatically, rapidly and accurately diagnosing fault incidents in the data center.