1. Field of the Invention
Embodiments of the invention generally relate to computer systems. More specifically, embodiments of the invention provide techniques for troubleshooting computer systems using fault tree analysis.
2. Description of the Related Art
Modern computing systems are complex and may include multiple software applications working as an integrated system. For example, a commercial website may be provided by multiple servers, with each server including an operating system (OS), a web server, and other software applications (e.g., a database and/or an application server).
A computing system may suffer fault events, that is, errors that affect the functioning of the computing system. Such fault events may be resolved by troubleshooting, which refers to the process of identifying and resolving the causes of an error. However, troubleshooting fault events in a complex computing system may be a difficult and time-consuming administrative task. Further, if a fault causes the computing system to be unavailable, the time required to correct the fault may interrupt critical functions of an organization (e.g., a temporary shutdown of a business, government agency, etc.). In such cases, a system administrator may seek to rapidly restore the computing system by applying a tactical solution that addresses only the symptoms of the fault. However, if the root causes of the fault are not addressed, the fault may cause the system to fail again. One technique for identifying root causes of system faults is fault tree analysis (FTA). As is known in the art, FTA applies expert knowledge in successive refinement, narrowing an observed fault to progressively more specific candidate causes until a root cause is determined.
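By way of illustration only, the successive refinement performed in FTA may be sketched as a walk down a tree of candidate causes. The following Python sketch is a simplified assumption and not a description of any particular embodiment: the `FaultNode` class, the `find_root_causes` function, and the example fault names are all hypothetical, and every parent-child link is treated as a simple "may be caused by" relation rather than the AND/OR gates of a full fault tree.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class FaultNode:
    """One fault in a hypothetical fault tree; children are candidate causes."""
    name: str
    children: List["FaultNode"] = field(default_factory=list)


def find_root_causes(node: FaultNode, observed: Set[str]) -> List[str]:
    """Refine an observed fault into the most specific observed causes below it."""
    if node.name not in observed:
        # This fault was not observed, so nothing beneath it is explored.
        return []
    causes: List[str] = []
    for child in node.children:
        causes.extend(find_root_causes(child, observed))
    # If no child explains the fault, this node is the most specific known cause.
    return causes or [node.name]


# Hypothetical tree for the commercial website example (names are illustrative).
tree = FaultNode("site_unavailable", [
    FaultNode("web_server_down"),
    FaultNode("database_unreachable", [
        FaultNode("disk_full"),
        FaultNode("db_process_crashed"),
    ]),
])

observed_faults = {"site_unavailable", "database_unreachable", "disk_full"}
print(find_root_causes(tree, observed_faults))
```

In this sketch, the expert knowledge is encoded in the shape of the tree, while the set of observed faults stands in for whatever diagnostic checks an administrator would actually perform at each level.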