1. Technical Field
The present invention is generally directed to an improved data processing system. More specifically, the present invention is directed to a system and method for using root cause analysis to determine the dependencies of resources in a complex data processing system to thereby generate a representation of the resource dependencies.
2. Description of Related Art
Enterprises employ large, complex computing environments that include a number of enterprise components, e.g., servers, routers, databases, mainframes, personal computers, intelligent agents, business applications, etc. Systems that monitor complex enterprise computing environments are generally known in the art and may rely on enterprise components generating and reporting events when they experience problems, e.g., a disk crash, server crash, network congestion, database access failure, etc. However, when a first enterprise component experiences a problem, the problem may have a ripple effect that causes other enterprise components to experience problems. Therefore, a conventional monitoring system receives enterprise events from enterprise components where many of the events are symptomatic events, i.e., events generated and/or reported as a result of other, more fundamental problem events, rather than root cause events, i.e., the fundamental problem events themselves. Distinguishing between symptomatic events and root cause events has historically been difficult, requiring skilled operators and significant time commitments.
Relationships and dependencies existing between hardware and software components within an enterprise computing environment can cause a single root cause to produce multiple symptomatic events, which may confuse operators and delay the identification, and therefore the resolution, of the root problem. For example, a software component like a database management program depends on at least two hardware components, like a processor and a disk, to perform database management functions. Therefore, if either the disk or the processor experiences a problem, in addition to the disk and/or processor generating and reporting enterprise events, e.g., a disk write failure, the database management program is likely to generate and report enterprise events when database access attempts fail, e.g., a database write failure.
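The ripple effect in the database example above can be sketched as a small dependency map. This is only an illustrative sketch: the names `DEPENDS_ON` and `affected_components` and the component labels are hypothetical and do not come from any particular monitoring system.

```python
# Hypothetical dependency map: each component maps to the set of
# components it depends on (taken from the database example above).
DEPENDS_ON = {
    "database_manager": {"processor", "disk"},
}


def affected_components(failed, depends_on):
    """Return components likely to report symptomatic events.

    A component is considered affected if it depends, directly or
    transitively, on a component in `failed`. `failed` must be a set.
    """
    affected = set()
    changed = True
    while changed:
        changed = False
        for component, deps in depends_on.items():
            if component in failed or component in affected:
                continue
            # Affected if any dependency has failed or is itself affected.
            if deps & (failed | affected):
                affected.add(component)
                changed = True
    return affected


# A disk problem makes the database manager a likely source of
# symptomatic events such as database write failures.
symptomatic_sources = affected_components({"disk"}, DEPENDS_ON)
```

Under these assumptions, a disk failure marks the database management program as a likely source of symptomatic events, while the disk itself remains the root cause candidate.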
Systems and methodologies have been devised for determining the root cause of problems in complex data processing systems. For example, International Patent Publication No. WO 03/005200 A1 to Howell et al., entitled “Method and System for Correlating and Determining Root Causes of System and Enterprise Events,” published Jan. 16, 2003, describes a method and system for correlating and determining root causes of enterprise events. The system and method described therein distinguish between symptomatic events and root cause events based on the system's ability to establish a set of correlation rules. The system includes a root cause determiner that receives events and initializes a timer that determines a period of time during which related events will be collected. Once the period of time has expired, a root cause determination can be made based on the set of collected events and the correlation rules affected by such events.
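The timer-based collection step described above can be illustrated with a simplified sketch. It is not the implementation of the cited publication: the function name `collect_windows`, the use of explicit timestamps in place of a live timer, and the fixed window length are all assumptions made for illustration, and the sketch does not model how "related" events are identified.

```python
def collect_windows(timed_events, window_seconds):
    """Group (timestamp, event) pairs into collection windows.

    The first event received opens a window that expires after
    `window_seconds`. Events arriving before expiry are collected
    into that window; an event arriving at or after expiry opens
    a new window.
    """
    windows = []
    current = []
    expiry = None
    # Sort on the timestamp only, so events never need to be comparable.
    for timestamp, event in sorted(timed_events, key=lambda pair: pair[0]):
        if expiry is None or timestamp >= expiry:
            if current:
                windows.append(current)
            current = []
            expiry = timestamp + window_seconds
        current.append(event)
    if current:
        windows.append(current)
    return windows


batches = collect_windows(
    [
        (0.0, "disk write failure"),
        (2.5, "database write failure"),
        (30.0, "network congestion"),
    ],
    window_seconds=10.0,
)
```

Each resulting batch is the set of events on which a root cause determination would then be made once its window has expired.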
As the above example illustrates, known event management systems require that correlation rules be defined by the user so that root cause analysis can be performed. For example, with known event management systems, one event (E1) may be generated that states that DB2 has a problem on system ABC, and another event (E2) may be generated that states that system ABC is unavailable. The event management system will receive both events and may execute a pre-established, user-generated correlation rule that states that a system failure event is the root cause for any application failure events on the same system. As a result, root cause analysis will determine that system ABC being unavailable is the root cause of the problem, while DB2 having a problem is a symptom of system ABC being unavailable. This is but a simple example of the types of correlation rules that may be utilized. Much more complex rules may be created as the complexity of the system being managed increases.
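The correlation rule in the E1/E2 example can be expressed as a short sketch. The `Event` structure, its field names, and the `classify` function are hypothetical and serve only to illustrate the single rule described above; they are not drawn from any actual event management product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str
    system: str   # system on which the event occurred
    source: str   # component reporting the event
    kind: str     # "system_failure" or "application_failure"


def classify(events):
    """Apply one user-defined correlation rule.

    Rule: a system failure event is the root cause for any
    application failure events on the same system.
    Returns (root_cause_events, symptomatic_events).
    """
    failed_systems = {e.system for e in events if e.kind == "system_failure"}
    root_causes, symptomatic = [], []
    for e in events:
        if e.kind == "application_failure" and e.system in failed_systems:
            symptomatic.append(e)
        else:
            # Events not explained by the rule remain root cause candidates.
            root_causes.append(e)
    return root_causes, symptomatic


e1 = Event("E1", system="ABC", source="DB2", kind="application_failure")
e2 = Event("E2", system="ABC", source="ABC", kind="system_failure")
root_causes, symptomatic = classify([e1, e2])
```

Under this rule, E2 is reported as the root cause and E1 as symptomatic. Each additional dependency in the environment would require another hand-written rule of this kind, which is why rule sets grow with the complexity of the managed system.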
Understanding the interaction of enterprise components is important in other management applications in addition to root cause analysis. By understanding the relationships between enterprise components, i.e. resources, the management system can better portray the existing computing environment to the user. Therefore, it would be beneficial to have a system and method for obtaining enterprise component relationship information for use by other management applications.