The present invention relates to diagnostic problem resolution in computer systems.
Generally, information technology (IT) products and service offerings, such as cloud computing offerings, increasingly depend on complex and integrated backend systems. For example, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS) and Data as a Service (DaaS) are forms of cloud computing that may integrate multiple internal and/or external distributed computing systems and applications from different vendors to deliver more dynamic and content-rich services. Traditionally, end users and/or monitoring systems may report system or application problems associated with the integrated IT systems, and system administrators may be notified of the problems. The system administrators may then launch an investigation to identify and resolve the system or application problems. If the system administrators cannot find the cause or resolve the problems, they may engage the vendors of the affected systems and/or applications to find solutions.
Although diagnostic tools developed by IT companies can be used to automatically collect and visualize diagnostic information, many stages of the above-mentioned troubleshooting process need to be performed manually in a heterogeneous distributed IT system, partly because diagnostic tools developed by different vendors may not be able to interact with each other. This leads to a long and tedious problem investigation process. Distributed systems with a significant number of interdependent subsystems and components, such as those found in a cloud computing environment, exacerbate this problem.
Some diagnostic tools with features such as a symptom database and a diagnostic rule engine have been developed. However, because the product vendors manually create only a limited number of rules, few of these tools have been widely used for troubleshooting issues in complex enterprise IT systems.
Call stack comparison between a newly reported problem and known issues has been proposed to detect whether the new problem matches an existing one. However, the algorithm used to conduct the comparison is simple and flawed, and the approach is limited to a stand-alone system or a product in a homogeneous environment.
There is a need in the art for improved methods and techniques to troubleshoot technical issues in a complex and distributed cloud computing environment.