A networked computer environment typically comprises multiple interconnected elements, such as UNIX® and Windows NT® platforms. As real or potential problems occur with these networked computer elements, alarm messages are generated and sent to one or more centralized operations systems within the network for analysis. Alarm messages can be, for example, generated autonomously by the affected element or generated in response to queries.
Upon receipt, the alarm messages typically are displayed to at least one system operator. The system operator then interprets the alarm messages, isolates the corresponding event causing the alarm within the environment, and resolves the event, all within the shortest time possible. The system operator can then consider the next alarm message.
Some types of networked computing problems, however, can generate multiple alarm messages for a single event. In such situations, the system operator may not be able to determine which alarm messages are associated with a single networked computing event, and can be overwhelmed by the high number of seemingly unrelated alarm messages. Because operators vary in their experience with networked computing problems, the problems can go undiagnosed or be improperly diagnosed, and solving them takes more time than is necessary. Such delays can result in a costly waste of networked computing resources and loss of availability.
Efficiency can be improved by automating the process by which related alarm messages are correlated to identify particular networked computing problems. For example, U.S. Pat. No. 5,388,189 to Kung, issued on Feb. 7, 1995, uses an expert system having a flow-chart-based knowledge representation scheme with a user interface and inference engine. The Kung system, however, suffers the shortcoming that it is very complex and expensive to implement.