Computer systems, such as servers and desktop personal computers, are expected to operate without constant monitoring. These computer systems typically perform various tasks without the user's knowledge. When performing these tasks, the computer system often encounters events that require a particular action (such as logging the event, generating an alert for a particular system or application, or performing an action in response to the event). Various mechanisms are available to handle these events.
A computing enterprise typically includes one or more networks, services, and systems that exchange data and other information with one another. The enterprise may include one or more security mechanisms to safeguard data and authenticate users and may utilize one or more different data transmission protocols. At any particular time, one or more networks, services, or systems may be down (e.g., powered down or disconnected from one or more networks). Networks, services, or systems can be down because of scheduled maintenance, upgrades, overload, or failure. Application programs attempting to obtain event data must contend with the possibility that any of the networks, services, and systems in the enterprise may be down. Additionally, application programs must contend with the security and network topology limitations of the enterprise as well as the various protocols used in the enterprise.
Operating system components, services, and applications generate a variety of different events. A particular component or application may request to be informed of a particular event (e.g., when a server crashes or when a user logs on to the system). Other components or applications may want to be notified when a particular series of events occurs within a particular time period. For example, a network administrator may want to know when a server crashes within three seconds of a user logging into the system. Server crashes alone may be relatively common, and user logins may also be common, so the network administrator is not particularly interested in either event by itself. However, when these two events occur within a few seconds of one another, there may be a relationship between them (e.g., the user login was at least partially responsible for the server crash).
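The login-then-crash example can be illustrated with a minimal sketch. It is not the system described herein; the Event class, the correlate_within function, the event kind strings, and the sample timestamps are all hypothetical names and values chosen for illustration. The sketch assumes each event carries a timestamp in seconds and reports every pair in which an event of the second kind follows an event of the first kind within the given window.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a kind, a time in seconds, and a detail."""
    kind: str
    timestamp: float
    detail: str = ""


def correlate_within(
    events: Iterable[Event],
    first_kind: str,
    second_kind: str,
    window_seconds: float,
) -> List[Tuple[Event, Event]]:
    """Return (first, second) pairs where an event of second_kind occurs
    no more than window_seconds after an event of first_kind."""
    matches: List[Tuple[Event, Event]] = []
    recent_first: List[Event] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # Forget earlier first_kind events that have fallen out of the window.
        recent_first = [
            f for f in recent_first
            if event.timestamp - f.timestamp <= window_seconds
        ]
        if event.kind == first_kind:
            recent_first.append(event)
        elif event.kind == second_kind:
            matches.extend((f, event) for f in recent_first)
    return matches


# Illustrative data only: one crash shortly after a login, one unrelated crash.
sample_events = [
    Event("user_login", 10.0, "alice"),
    Event("server_crash", 12.5, "server-1"),
    Event("user_login", 100.0, "bob"),
    Event("server_crash", 200.0, "server-1"),
]

for login, crash in correlate_within(
    sample_events, "user_login", "server_crash", window_seconds=3.0
):
    print(f"login by {login.detail} preceded crash of {crash.detail} "
          f"by {crash.timestamp - login.timestamp:.1f} s")
```

In this sketch, only the crash at 12.5 seconds is reported, because it is the only one that follows a login within three seconds. The crash at 200 seconds and the login at 100 seconds are each ignored, which mirrors the administrator's lack of interest in either event by itself.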
Existing systems provide predefined functions that allow a network administrator or other user to create a relationship between two events. This relationship is commonly referred to as a “correlation” between the two events. Existing systems require the user to select one of these predefined correlation functions. If the correlation function desired by the user has not already been created, the user must request that the developer or supplier of the functions create a new function to meet the user's needs. Even if the developer or supplier is willing to create a new correlation function, this custom development work may be very expensive. Depending on the expected demand for the new correlation function, the developer or supplier may not be willing to create the requested function at all.
If the developer is unwilling to create a new correlation function or the cost is too high, the user can attempt to use an existing correlation function that is “closest” to the user's requirements. Such a solution may result in a significant number of unwanted event notifications or may result in a failure to notify the user of a desired sequence of events.
The system and method described herein address these limitations by providing a flexible correlation system and method that allows a user to correlate multiple events, data, or both.