1. Technical Field
The present invention relates generally to handling events in an information technology system, and relates more specifically to a system and method for generating throttling parameters from historical event log data.
2. Background Art
To ensure that an information technology (IT) infrastructure is operating efficiently, systems must be utilized that detect and report problems or potential problems arising from IT resources. These problems are reported generically as “events,” each of which generally comprises a message issued by an IT resource in accordance with some predefined protocol. For instance, an event can occur whenever file system utilization exceeds some predetermined threshold value, e.g., 85%.
One of the most dramatic issues facing implementers of IT infrastructure management systems is event volume. Many managers of IT systems report that the volume of events reaching their event management software exceeds one million events per day. One may assume that daily “mega-event” volumes are a normal characteristic of many users' IT operating environments. These high volumes of events require significant system resources to process. More importantly, they significantly impact the response time and efficiency, and therefore the value, of users' event management systems. Accordingly, reducing event volumes is a high priority requirement for IT infrastructure managers.
One of the critical issues faced is the frequency with which individual events or sets of events may be reissued as a result of infrastructure failures. Many IT resources are notorious for repetitively emitting the same event or sets of events tens, hundreds, or even thousands of times within very short time frames. It is not unusual for an IT resource to reissue the same event many times per second, flooding networks, systems, and event management software with a cascade of redundant and therefore unnecessary information. Accordingly, a key to reducing the volume of events flooding the system, and therefore enhancing the efficiency of the system, is addressing and reducing the numbers of redundant events reaching the management platforms.
The term “throttling” refers to the practice of recognizing and filtering redundant events from the event stream. A significant body of throttling logic must be designed and deployed to handle the issue of redundant events. Throttling logic for event correlation engines is notoriously difficult to design. This puts an enormous burden upon the event management design and maintenance process.
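As a purely illustrative sketch of what throttling means in practice, the following Python fragment forwards the first event for a given resource and event type and drops repeats that arrive within a fixed time window. The names Event, throttle, and window_seconds, the sample resource and event names, and the choice of a simple time-window rule are all assumptions made for this example; they are not taken from any particular event management product, and events are assumed to arrive in timestamp order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical minimal event record (fields are assumptions)."""
    timestamp: float  # seconds since some reference point
    source: str       # IT resource that issued the event
    event_type: str   # identifier of the kind of event


def throttle(events, window_seconds=60.0):
    """Return the events that survive a simple time-window throttle.

    An event is forwarded if no event with the same (source, event_type)
    key has been forwarded within the preceding window_seconds.
    Assumes `events` is ordered by timestamp.
    """
    last_forwarded = {}  # key -> timestamp of the last forwarded event
    forwarded = []
    for ev in events:
        key = (ev.source, ev.event_type)
        last = last_forwarded.get(key)
        if last is None or ev.timestamp - last >= window_seconds:
            forwarded.append(ev)
            last_forwarded[key] = ev.timestamp
        # otherwise the event is treated as redundant and dropped
    return forwarded


# Usage: a burst of identical events plus one unrelated event.
burst = [
    Event(0.0, "fs01", "FS_UTIL_HIGH"),
    Event(0.5, "fs01", "FS_UTIL_HIGH"),
    Event(1.0, "fs01", "FS_UTIL_HIGH"),
    Event(2.0, "db01", "CONN_LOST"),
    Event(70.0, "fs01", "FS_UTIL_HIGH"),
]
kept = throttle(burst, window_seconds=60.0)
print(len(kept))  # 3
```

Even in this reduced form, the window length is a parameter that must be chosen for each event type, which hints at why designing such logic by hand for thousands of event types is burdensome.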
Moreover, it is well understood that the IT industry is guilty of forcing upon the user a broad range of proprietary and standardized event protocols, log file formats, and (even within a single protocol) syntax. The variety of formats adopted by event messages adds considerable complexity to the user's event environment and therefore adds to the effort required for “manual” analysis and determination of rules for throttling of redundant events.
To further exacerbate the challenge, the torrent of events generated across the user's IT environment is composed of thousands of unique event types, each requiring unique throttling logic and actions.
To summarize, many IT managers contend with more than a million events per day. Their event streams contain a multitude of differing data protocols and formats. The individual events within these event streams represent thousands of unique event types. Many of these event types are likely to be issued in high volume bursts of repetitive patterns.
The scale and complexity of this environment present an enormous obstacle to the user when considering the effort required for manual analysis of event throttling parameters. Labor-intensive approaches to the analysis of this mass of event data over any meaningful analytical time frame will not produce a significant reduction in event volumes. This environment dictates that the event throttling analysis be supported with intelligent, automated facilities for gigabyte-scale data reduction, repetitive pattern recognition, and throttling parameter analysis.