The present invention relates generally to management of distributed systems and more specifically to visualizing and analyzing events with disparate formats and with patterns on different time scales.
As networked systems and applications become increasingly critical to the success of a business, effectively managing them becomes extremely important. In order to monitor networked systems and applications, a system manager (or a user) needs to monitor critical activities of systems and applications. One commonly used approach is to set up the monitored system or application to generate an event message when an important activity happens. This message, which can be generated by a device, an application or a system, is usually saved in a log file. For example, a router keeps a router log to track the status of each port and slot in the router. A networked system records important network activities, such as “cold start,” “router port up,” “link up,” etc., in a system log file. Other examples include DHCP (Dynamic Host Configuration Protocol) servers, Lotus Notes Servers, and Web Servers.
Because logs record critical activities, they are very important for managing systems, devices and applications. However, extracting information from a log file is often difficult. First, log files are often quite large, containing thousands of events per day. Indeed, a web server can serve millions of requests every day, and for each, there may be a log file entry.
A second difficulty with processing log files is that entries have different formats. For example, an HTTP (HyperText Transfer Protocol) server log contains byte counts requested by a user. A router log reports events for each port and slot. As a result, an unstructured textual format is used by most log files. Although the textual format is flexible in length, format and meaning, it hinders the use of many analysis tools which can only analyze structured numerical data. To deal with this difficulty, a parsing mechanism is typically introduced to translate a textual message into structured data for analysis. However, there remains the problem of determining the parsing rules. Because a textual message contains a variety of information, the process of defining parsing rules has an interactive and iterative nature. That is, a user may want to see additional information as he analyzes the data. Conventionally, this iterative process is done manually in an ad-hoc fashion.
Third, log entries have varied information content. Some entries may include information about thresholds (e.g., if the event reported is a threshold violation). Other entries may report information specific to events, such as the port on which a disconnect occurred. This situation creates the following quandary. To provide uniformity in the analysis of log data, one needs to have common information for all events. Yet, to extract the full scope of information from event logs, variability in information content must be allowed.
A consequence of the foregoing observations is that event data needs to be viewed in many ways so as to account for its diversity of formats and content.
A first viewing approach, and the most commonly used approach, is to view the raw log file via a text editor. In this way, a user reads event messages line by line, and can read each message in detail. Clearly, this approach places emphasis on each individual event message. Although this is an important step for understanding event messages and diagnosing a problem, this approach can hardly be used to analyze the relationship among events. For example, an event pattern may be that a group of events happens periodically within a one-hour period. In addition, this approach is neither efficient nor effective for analyzing a large volume of events, which may easily consume several megabytes per day.
A second class of viewing approaches is to aggregate events and analyze summary information. Summarization is one popular technique in this class. That is, event counts are calculated and reported by defined categories, such as event counts by hostname, by event type and by time. Clearly, through summarization and categorization, a large volume of original textual data is reduced to a small set of summarized numbers for each defined category. This greatly improves efficiency and eases the scalability issue of the first approach. However, summarization loses details of the original event messages. This is because the summarization depends on the defined categories. The information which is not categorized is therefore invisible. In addition, information about event patterns (e.g., event A happens periodically) and relationships among events (e.g., a group of events tends to happen together) is lost because of aggregation.
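By way of illustration only, the summarization approach described above may be sketched in Python as follows. The sample events, the field order (host, event type, time in seconds) and the variable names are assumptions made for this sketch and are not taken from any particular product.

```python
from collections import Counter

# Hypothetical parsed events: (host, event type, time in seconds).
events = [
    ("hostA", "link up", 0),
    ("hostA", "link up", 3600),
    ("hostB", "cold start", 3700),
    ("hostA", "link up", 7200),
]

# Summarization: reduce the events to counts per defined category.
by_host = Counter(host for host, _, _ in events)
by_type = Counter(event_type for _, event_type, _ in events)

print(by_host)
print(by_type)
```

The counts show that hostA reported three "link up" events, but the fact that they recur at one-hour intervals can no longer be recovered from the summary, which is the loss of pattern information noted above.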
A third viewing approach is to use graphical displays, referred to as event plots. One example of a two-dimensional event plot may be a plot in which an event message is represented by a point whose horizontal coordinate corresponds to the time of the event and whose vertical coordinate corresponds to the host ID of the event. In this way, thousands of events can be displayed on one screen, and the relationship among events can be visually apparent. However, this approach cannot reveal detailed information which was not parsed from a message.
Thus, there are three conventional, but mutually exclusive, different ways to analyze event logs. Each of them has its own advantages. Directly reading the textual messages provides the most detailed information of event messages. The aggregated event analysis provides a nice scaling property and shows summarization. The event plot can reveal event patterns and relationship among events.
Most available products for analyzing a log file specialize in one type of log file. For example, there are many products on the market aiming to analyze HTTP log files (see Web Trend: http://www.webtrends.com; Hit List: http://www.marketwave.com; ARIA: http://www.ANDROMEDIA.com; and Web Tracker: http://www.cqminc.com). All of these special log analyzers only support summarization analysis. None of them can be used to visualize event messages and/or see original messages.
At the other end of the spectrum, there are many general graphical tools, such as Diamond, Data explorer, SAS, PowerPlay, etc. These tools aim to support either graphical analysis of numerical data, such as Diamond, Data explorer, SAS, etc., or aggregated level summarization, such as PowerPlay and other OLAP (On-Line Analytical Processing) products. However, none of them provides both types of analysis. In addition, these tools usually only take structured data as inputs and cannot handle textual data directly.
Therefore, it would be highly desirable to provide systems and methods which integrate two or more of these different analysis approaches, thus providing a user with the capability and flexibility to perform multiple types of analysis on raw data for event management purposes.
The present invention provides systems and methods for providing exploratory analysis of data for event management. In one aspect, the invention provides for a methodology and related system referred to hereinafter as an “event browser” that provides an integrated environment for analysis of large volumes of semi-structured or non-structured data, such as event logs.
In an illustrative embodiment of the invention, the event browser advantageously provides: (1) scalable analysis of large volumes of unstructured data with diverse content and data formats; (2) an architecture to support multiple types of views and analyses of such data; (3) mechanisms to support the iterative refinement of the information in the raw data that is included in the visualization and analysis environment; (4) several specific viewers for analysis of event data.
An event browser of the invention is implemented in a form which includes certain functional components. These components, as will be explained, may be implemented as one or more software modules on one or more computer systems. To deal with textual messages directly, the event browser of the invention integrates a parsing mechanism or engine and an analysis tool in one package. The role of the parsing engine is to translate an event message into a set of attribute values defined by parsing rules. For example, if parsing rules define information about host name, event type and time stamp, an event message is translated into a tuple of {host name, event type, time} through the parsing. In other words, the parsing engine translates semi-structured or non-structured textual data into structured data. The analysis tool, therefore, does not need to worry about the detailed message format, and can focus on analysis and the GUI (graphical user interface) to an end-user. As a result, the detailed textual format of a log file is preferably hidden from an end-user, until he wants to see it. This allows users to analyze log files with different formats in a unified and simple way.
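By way of illustration only, the translation performed by the parsing engine may be sketched in Python as follows. The message layout (an ISO timestamp, then a host name, then the event type), the regular expression `RULE`, and the names `Event` and `parse_event` are assumptions made for this sketch; the specification does not prescribe a particular rule syntax.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Event(NamedTuple):
    """Structured form of one event message: {host name, event type, time}."""
    host: str
    event_type: str
    time: datetime


# Hypothetical parsing rule: "<ISO timestamp> <host> <event type words>".
RULE = re.compile(r"^(?P<time>\S+)\s+(?P<host>\S+)\s+(?P<event_type>.+)$")


def parse_event(line: str) -> Optional[Event]:
    """Translate one textual message into a tuple, or None if it does not match."""
    match = RULE.match(line.strip())
    if match is None:
        return None
    try:
        timestamp = datetime.fromisoformat(match.group("time"))
    except ValueError:
        return None  # the first field is not a timestamp
    return Event(match.group("host"), match.group("event_type"), timestamp)


event = parse_event("1999-03-01T08:15:00 routerA link up")
print(event)
```

Because the analysis code sees only `Event` tuples, a log with a different textual layout would require only a different rule, not a different analysis tool.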
Graphical visualization is a powerful tool for analyzing data in general. Although graphical representation of numerical data has been explored intensively, how to visualize a large volume of textual event messages has not been addressed in any conventional products. Advantageously, according to the invention, an event graph is provided to visualize a large number of event messages. To understand what an event graph is, we recall that since a textual event message is translated into a set of attribute values through a parsing engine, an event can be represented by a tuple such as {host name, event type, time stamp}. Here, host name and event type are strings for host and event type identification. Because the analysis is usually more efficient in handling numbers rather than strings, we further assign a unique number called an ID to a host name and an event type, respectively. We maintain the correspondence between a name and its assigned ID in an attribute table. For example, host A may be assigned host ID 1. Event type B may be assigned event type ID 3. A host attribute table thus is used to maintain the correspondence between host A and its ID 1. Likewise, the event attribute table maintains a link between event type B and its ID 3. Therefore, equivalently, the tuple {host name, event type name, time} for an event can be represented by {host ID, event type ID, time}. An event graph is therefore a plot to show these tuples for events. For example, events can be plotted on a two-dimensional graph using host ID and time as two axes. Clearly, the advantage of an event graph is that several days' events can be shown in one plot. Furthermore, through standard visualization techniques, a user can browse events from coarse to fine time scales and can zoom into interesting areas by rubber-band techniques. It is to be understood that rubber-banding may include dragging a rectangle over points of interest in order to then perform operations in accordance therewith.
This greatly improves the navigation capability of a user.
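By way of illustration only, the attribute tables and the points of an event graph may be sketched in Python as follows. The class `AttributeTable`, the function `rubber_band`, the sample events and the convention that IDs are assigned from 1 in order of first appearance are all assumptions made for this sketch.

```python
class AttributeTable:
    """Keeps the correspondence between attribute names and integer IDs."""

    def __init__(self):
        self._ids = {}
        self._names = []

    def id_for(self, name):
        """Return the ID of a name, assigning the next free ID on first use."""
        if name not in self._ids:
            self._names.append(name)
            self._ids[name] = len(self._names)  # IDs start at 1
        return self._ids[name]

    def name_for(self, attribute_id):
        """Map an ID back to its name, e.g. for axis labels."""
        return self._names[attribute_id - 1]


def rubber_band(points, t_min, t_max, host_min, host_max):
    """Select the points inside a rectangle dragged over the time/host ID plane."""
    return [
        (host_id, type_id, t)
        for host_id, type_id, t in points
        if t_min <= t <= t_max and host_min <= host_id <= host_max
    ]


hosts = AttributeTable()
event_types = AttributeTable()

# Hypothetical parsed events: {host name, event type name, time}.
parsed = [
    ("hostA", "link up", 100),
    ("hostB", "cold start", 160),
    ("hostA", "cold start", 220),
]

# Equivalent numeric tuples {host ID, event type ID, time}, ready for plotting.
points = [(hosts.id_for(h), event_types.id_for(e), t) for h, e, t in parsed]
print(points)
print(rubber_band(points, 90, 170, 1, 1))
```

Each numeric tuple corresponds to one point of the event graph, with time on one axis and host ID on the other; the attribute tables allow a selected point to be mapped back to its host name and event type name.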
To take advantage of different analysis techniques, an event browser of the invention provides an extensible architecture to preferably integrate event graphs (“plot viewer”), event summarization (“attribute viewer”), detailed messages (“message viewer”) and other possible viewers. Further, the event browser provides an infrastructure for exchanging information amongst the viewers. Each of these three viewers has its own advantages for viewing and manipulating data. The attribute viewer of the invention is good at summarization and query-type operations. From the attribute viewer, a user can conveniently select all events associated with a set of hosts and event types. It also summarizes events by their host types and event types and thus provides summarization of a log to a user. The plot viewer of the invention can display a large number of individual events in one window. Therefore, a user can view event relationships and discover event patterns. In addition, through the use of standard visualization techniques, a user can zoom in for details and zoom out for a larger view, and rubber-band to select “interesting” events. The message viewer of the invention provides the capability to view a set of raw event messages. This enables a user to further see detailed and application specific information which may be very difficult to parse out or not worth parsing out, but may be needed for diagnosis.
The event browser of the invention not only preferably provides these three individual viewers, but also combines and coordinates these viewers for analyzing events. For example, a user can very easily select a set of interesting events for a set of hosts and event types from the attribute viewer by highlighting these hosts and event types, then use the plot viewer to see the relationship among the selected events. From the plot viewer, he can further select a small set of suspicious events by dragging a rubber-band, and display the original textual messages related to the selected events in the message viewer. Further, by highlighting, coloring, or otherwise selecting events in one viewer, he can cause these events to be presented in a similarly modified manner in the other viewers. We refer to this capability as “coordinated views.” Accordingly, the event browser provides a novel event visualization and analysis platform and can assist a user in discovering “useful” information which cannot be revealed by any conventional tool.
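By way of illustration only, the coordination among viewers may be sketched in Python as a simple publish/subscribe arrangement. The classes `SelectionHub` and `Viewer`, their methods, and the use of integer event identifiers are assumptions made for this sketch and are not a description of the actual implementation.

```python
class SelectionHub:
    """Relays a selection made in one viewer to all other registered viewers."""

    def __init__(self):
        self._viewers = []

    def register(self, viewer):
        self._viewers.append(viewer)

    def select(self, source, event_ids):
        selection = frozenset(event_ids)
        for viewer in self._viewers:
            if viewer is not source:
                viewer.on_selection(selection)


class Viewer:
    """Stand-in for a plot, attribute or message viewer."""

    def __init__(self, name, hub):
        self.name = name
        self.highlighted = frozenset()
        self._hub = hub
        hub.register(self)

    def user_selects(self, event_ids):
        """Called when the user highlights events in this viewer."""
        self.highlighted = frozenset(event_ids)
        self._hub.select(self, event_ids)

    def on_selection(self, event_ids):
        """Called when another viewer changed the selection."""
        self.highlighted = event_ids


hub = SelectionHub()
attribute_viewer = Viewer("attribute", hub)
plot_viewer = Viewer("plot", hub)
message_viewer = Viewer("message", hub)

# Rubber-banding events 2 and 5 in the plot viewer highlights them everywhere.
plot_viewer.user_selects([2, 5])
print(sorted(message_viewer.highlighted))
```

Routing every selection through one hub means that an additional viewer only needs to register itself in order to take part in the coordinated views.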
In another aspect of the invention, the event browser provides interactive and iterative refinement of parsing rules. As previously mentioned, the role of a parsing engine is to pick out the important information from textual messages and translate the unstructured data into structured data for analysis. Therefore, the capability of the analysis tools depends heavily on what is parsed. In practice, finding the right information to parse out from a message is not an easy task, because a raw message contains various levels of detail and a user certainly does not want to be flooded by this information. Usually, at first, a user is only interested in the most important information, such as host name and event type. But as the user analyzes the data, the user may want more details or other types of information, such as destination or severity level. Therefore, the parsing rules need to be redefined to include additional information. Conventionally, the process of defining parsing rules is done in an isolated way. That is, if a user needs more information, the user has to use a separate tool to edit the parsing rule file, and then rerun the parsing and the analysis. Since the event browser integrates the parsing engine and the analysis tool, it provides a feedback loop for a user to modify parsing rules in an integrated environment.
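By way of illustration only, the feedback loop for refining parsing rules may be sketched in Python as follows. The key=value message layout, the representation of rules as a dictionary of regular expressions, and the function name `parse` are assumptions made for this sketch.

```python
import re


def parse(messages, rules):
    """Apply every rule (attribute name -> regex with one group) to every message."""
    rows = []
    for message in messages:
        row = {}
        for name, pattern in rules.items():
            match = re.search(pattern, message)
            row[name] = match.group(1) if match else None
        rows.append(row)
    return rows


# Hypothetical raw messages.
messages = [
    "host=routerA type=link_down severity=major",
    "host=routerB type=cold_start severity=minor",
]

# First pass: only the most important attributes are parsed out.
rules = {"host": r"host=(\S+)", "event_type": r"type=(\S+)"}
first_pass = parse(messages, rules)

# Refinement: the user adds a rule and the same messages are parsed again.
rules["severity"] = r"severity=(\S+)"
second_pass = parse(messages, rules)

print(first_pass[0])
print(second_pass[0])
```

Because the raw messages are retained, adding a rule requires only a re-parse within the same environment rather than a separate editing tool and a fresh analysis run.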
Several extensions are possible to the systems and methods of the invention. For example, the plot viewer may provide not only a two-dimensional plot, but also a three-dimensional plot. Also, the attribute viewer may support not only a flat attribute structure, but also a hierarchical structure. As attributes can be organized in a more efficient hierarchical structure, an additional viewer can be designed based on the attribute viewer. The additional viewer to support a hierarchical attribute structure may include an additional interface to reflect the hierarchical structure of attributes and additional methods to support OLAP operations (e.g., drill-down/up/through) to take advantage of the hierarchical structure. Further, common text editor functions, such as string searching, can be incorporated into the message viewer. Still further, other types of analyses and analysis interfaces, such as association discovery and clustering algorithms in data mining, can be implemented as an additional viewer and work together with the plot, message and attribute viewers. Even further, because of the architecture of an event browser, functional components of the event browser such as, for example, a parsing engine, selection and control engine (SCE) and viewers, can run on physically different computers. To achieve this, the communication interface among these parts may be extended to support an unreliable networked communication environment.
These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.