Businesses and organizations use Information Technology (IT) systems in the operation and management of the business. The delivery of IT business services often involves software applications, middleware, storage, system infrastructure, and other managed objects that are closely connected. A problem in one domain can cause multiple failures in other domains, producing many events that may trigger uncoordinated actions by multiple teams. Businesses therefore attempt to quickly identify the root cause of system failures and ensure that the right team starts fixing the problem as soon as possible.
IT management applications often use event correlation technologies to filter and process incoming events and assist in the identification of relevant events. Root cause analysis can be a competitive differentiator between management software vendors. However, many existing correlation systems are limited to using the information that is contained in the event attributes to identify relevant events. As a result, current correlation systems cannot easily detect or identify causal relationships between events that originate from different infrastructure elements. Detecting and identifying such causal relationships in current systems often involves hard-coding an IT topology into correlation rules to represent how particular instances of managed objects are related to one another. For example, a rule in such a correlation system may define that a quota problem on logical volume instance “LV-ESS-1” may cause an extension problem on tablespace “TS-ESS-1”. The rule thus includes information on the related infrastructure elements of the managed environment, and the inclusion of this information can lead to significant costs in maintaining the rules. Further, such an approach is not flexible enough to allow efficient handling of infrastructure changes that occur in the domain. Some systems involve virtualized IT environments, which may undergo significant and/or frequent changes. Building correlation rules for such environments can be time consuming and may be difficult to accomplish with current systems. Businesses often rely on event correlation specialists to address these correlation issues, which further increases the cost.
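By way of illustration only, the hard-coded approach described above can be sketched as follows. This is a minimal sketch and not the implementation of any particular correlation product: the `Event` type, the `HARD_CODED_RULES` table, and the `find_root_causes` function are hypothetical names introduced here, and only the instance names “LV-ESS-1” and “TS-ESS-1” come from the example in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event carrying only the attributes the rule inspects."""
    managed_object: str  # instance name, e.g. "LV-ESS-1"
    problem: str         # problem type, e.g. "quota"


# Hypothetical rule table. The topology is embedded in the rule itself:
# (cause object, cause problem) -> (symptom object, symptom problem).
HARD_CODED_RULES = {
    ("LV-ESS-1", "quota"): ("TS-ESS-1", "extension"),
}


def find_root_causes(events):
    """Return (cause, symptom) event pairs matched by the hard-coded rules."""
    present = {(e.managed_object, e.problem): e for e in events}
    pairs = []
    for cause_key, symptom_key in HARD_CODED_RULES.items():
        # A rule fires only when both named instances reported the problems.
        if cause_key in present and symptom_key in present:
            pairs.append((present[cause_key], present[symptom_key]))
    return pairs


# The rule matches while the topology is exactly as it was written down.
matched = find_root_causes(
    [Event("TS-ESS-1", "extension"), Event("LV-ESS-1", "quota")]
)

# If the tablespace is moved to a different volume (assumed here to be
# named "LV-ESS-2"), the same failure no longer matches until someone
# edits the rule table by hand.
unmatched = find_root_causes(
    [Event("TS-ESS-1", "extension"), Event("LV-ESS-2", "quota")]
)
```

In this sketch, every change to the infrastructure requires a matching change to the rule table, which is the maintenance burden the paragraph above describes.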
Accordingly, businesses desire the ability to quickly identify root causes of system failures so that they can promptly begin addressing the failures, while reducing the time, cost, and complexity of causal analysis systems. There is a desire for a system capable of correlating events in a way that is more easily adaptable to a dynamic IT environment and that can be maintained with minimal effort by domain experts rather than correlation specialists.