1. Field of the Invention
The present invention relates to systems and methods for data processing, and in particular to a location-independent service for monitoring and alerting on an event log.
2. Description of the Related Art
Computer systems typically write messages describing errors and state changes to system event logs. However, a number of problems exist with system event logs.
These logs are often lengthy and contain very little information that is truly of interest to those who manage such systems. For example, to detect that a host system is behaving in a manner requiring the attention of an administrator, a large quantity of warning and error messages may have to be reviewed to find conditions truly worthy of attention.
In another example, while many events of either type A or type B may occur in a specified time period, it may be of interest only when events (or a specified number of events) of both type A and type B occur in the same period, and prior art systems cannot identify such situations. Moreover, users are not automatically alerted to important events as they appear in the log.
Furthermore, users may not have access to the host computer on which the system event log is stored. Additionally, systems administrators may want to minimize work executed on the host computer.
Thus, there is a need in the art for the improved processing of system event logs that overcomes these and other problems.
To address the requirements described above, the present invention discloses a method, apparatus, and article of manufacture for monitoring and alerting on an event log. One or more alert policies are accessed, wherein each of the alert policies comprises one or more rules stored on a computer. An event log stored on a computer is accessed in a location-independent manner to gather one or more event messages stored therein. The event messages are filtered by comparing them to the rules of the alert policies to raise an alert and determine whether an alert action should be invoked.
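The flow summarized above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the names EventLogSource, InMemoryLog, AlertPolicy, and raised_alerts are hypothetical, and each rule is reduced to a simple predicate over the gathered messages. Location independence is modeled only by hiding the log behind an interface, so that the filtering code does not depend on where the log is stored.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Protocol, Tuple


class EventLogSource(Protocol):
    """Hypothetical interface: any object that can yield event messages."""

    def read(self) -> Iterable[str]: ...


class InMemoryLog:
    """Stand-in source; a remote reader could expose the same read() method."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = list(lines)

    def read(self) -> Iterable[str]:
        return iter(self._lines)


# A rule is simplified here to a predicate over all gathered messages.
RulePredicate = Callable[[List[str]], bool]


class AlertPolicy(NamedTuple):
    name: str
    rules: Dict[str, RulePredicate]  # rule name -> predicate


def raised_alerts(
    source: EventLogSource, policies: Iterable[AlertPolicy]
) -> List[Tuple[str, str]]:
    """Return (policy name, rule name) for every rule the messages satisfy."""
    messages = list(source.read())
    alerts = []
    for policy in policies:
        for rule_name, predicate in policy.rules.items():
            if predicate(messages):
                alerts.append((policy.name, rule_name))
    return alerts
```

Because raised_alerts depends only on the read() method, the same filtering code could be run on a machine other than the host that stores the log, which is consistent with the stated goal of minimizing work executed on the host.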
Each rule includes one or more defined criteria selected from a group comprising one or more Event IDs, an Event Period, an Event Count, an Alert Any flag, and a Search Phrase. Event IDs are identifiers that semantically identify the event message. The Event Period indicates a duration within which the event messages must occur in the event log for an alert to be raised and an alert action to be invoked. The Event Count indicates a count of the event messages that must occur within the event log within the Event Period and corresponding to the Event IDs to raise an alert and trigger an alert action. The optional Search Phrase allows the user to specify a word or phrase that must be included within the text of a matching event message (where the match is based on Event ID) in order for that message to be counted.
The Alert Any flag determines whether or not there must be at least one occurrence of each and every Event ID that is specified by the rule (there may be multiple Event IDs). For example, the Alert Any flag may indicate that an alert is to be raised when the count of event messages in the event log equals or exceeds the Event Count within the Event Period for any combination of qualified Event IDs. Alternatively, the Alert Any flag may indicate that an alert is to be raised when the count of event messages in the event log equals or exceeds the Event Count within the Event Period and there has been at least one occurrence of each Event ID in the event messages.
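The rule criteria described above can be sketched in Python as follows. This is an illustrative reading of the text, not the disclosed implementation. The class and function names (EventMessage, Rule, qualifies, should_alert) are hypothetical, timestamps and the Event Period are assumed to be in seconds, the Search Phrase is assumed to be a case-sensitive substring match, and the Event Period is interpreted as a sliding window over the log.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class EventMessage:
    timestamp: float  # seconds (assumed unit)
    event_id: int
    text: str


@dataclass(frozen=True)
class Rule:
    event_ids: FrozenSet[int]
    event_period: float  # seconds (assumed unit)
    event_count: int
    alert_any: bool
    search_phrase: Optional[str] = None


def qualifies(rule: Rule, msg: EventMessage) -> bool:
    """A message counts if its Event ID matches and it contains the phrase."""
    if msg.event_id not in rule.event_ids:
        return False
    return rule.search_phrase is None or rule.search_phrase in msg.text


def should_alert(rule: Rule, messages: Iterable[EventMessage]) -> bool:
    """True if some window of length event_period satisfies the rule."""
    matched = sorted(
        (m for m in messages if qualifies(rule, m)),
        key=lambda m: m.timestamp,
    )
    start = 0
    for end, msg in enumerate(matched):
        # Shrink the window until it spans no more than the Event Period.
        while msg.timestamp - matched[start].timestamp > rule.event_period:
            start += 1
        window = matched[start:end + 1]
        if len(window) < rule.event_count:
            continue
        # Alert Any set: any combination of qualified Event IDs suffices.
        # Alert Any clear: every Event ID in the rule must appear at least once.
        seen_ids = {m.event_id for m in window}
        if rule.alert_any or seen_ids == set(rule.event_ids):
            return True
    return False
```

Under these assumptions, a rule with Event IDs 100 and 200, an Event Count of 3, and the Alert Any flag set would fire on three messages with ID 100 inside the period, while the same rule with the flag cleared would wait until a message with ID 200 also appears in that window.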
Each rule of an alert policy also specifies one or more Alert Actions. The Alert Action specifies what is to be done when the count of matching event messages reaches the specified limit. For example, an Alert Action may comprise: sending an email to a user-defined address, sending a page with a user-defined message to a user-defined paging service, generating a trap, running a user-specified program, writing a message to a log, or executing a script of database commands. Each Alert Action may comprise a single defined action, or may comprise a plurality of individual actions. In addition, the Alert Action may specify a period that must expire before the Alert Action is repeated for the same event. These aspects of the alert policies are defined using an alert policy editor.
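The Alert Action behavior described above, including the period that must expire before an action is repeated for the same event, can be sketched as follows. The AlertDispatcher class, its fire method, and the event_key parameter are hypothetical names introduced for illustration. Each individual action is modeled as a plain callable, so real actions such as sending email or generating a trap are deliberately left out.

```python
from typing import Callable, Dict, Hashable, Iterable, List


class AlertDispatcher:
    """Runs a group of actions, suppressing repeats within repeat_period."""

    def __init__(
        self,
        actions: Iterable[Callable[[str], None]],
        repeat_period: float,  # seconds (assumed unit)
    ) -> None:
        self.actions: List[Callable[[str], None]] = list(actions)
        self.repeat_period = repeat_period
        self._last_fired: Dict[Hashable, float] = {}

    def fire(self, event_key: Hashable, now: float, message: str) -> bool:
        """Invoke every action unless this event fired too recently.

        Returns True if the actions ran, False if the alert was suppressed.
        """
        last = self._last_fired.get(event_key)
        if last is not None and now - last < self.repeat_period:
            return False
        for action in self.actions:
            action(message)
        self._last_fired[event_key] = now
        return True
```

Passing the current time in as an argument, rather than reading a clock inside fire, keeps the sketch deterministic and easy to exercise. A real service would likely supply the time at which the alert was raised.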