System logs, such as Windows System logs or Linux system logs, are an important source of information for computer system management. These logs hold text messages emitted from various sources in the computer system during its day-to-day operation. Emitted messages may be informational, or they may indicate a problem in the system, whether trivial or more serious.
Types of system logs include security logs, application logs and system logs. Security logs track information such as user login attempts and completions. Application logs track when an application (e.g., an antivirus program) started, the operations performed by the application and when the application finished. System logs store operating system events, including notification of a component failure. If desired, different system logs can be combined to create a merged log. Logs are generally structured as a first-in, first-out (FIFO) queue capable of storing thousands of messages. The queue structure prevents the log from growing to an unreasonable size, since the oldest entry is dropped when a new entry is added to a full log.
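The FIFO behavior described above can be illustrated with a minimal Python sketch. The class name BoundedLog, the capacity of three and the sample messages are hypothetical and chosen only for illustration; actual system logs implement this behavior in their own ways.

```python
from collections import deque


class BoundedLog:
    """Hypothetical fixed-capacity log: appending to a full log drops the oldest entry."""

    def __init__(self, capacity):
        # A deque with maxlen discards items from the opposite end when full.
        self._entries = deque(maxlen=capacity)

    def append(self, message):
        self._entries.append(message)

    def entries(self):
        # Oldest entry first, newest entry last.
        return list(self._entries)


log = BoundedLog(capacity=3)
for msg in ["boot", "login", "disk warning", "shutdown"]:
    log.append(msg)

print(log.entries())  # ['login', 'disk warning', 'shutdown']
```

Because the capacity is fixed, the storage used by the log stays bounded regardless of how many messages are emitted.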
An example prior art merged log is shown in FIG. 1. The system log, generally referenced 10, comprises a table of log entries. Each log entry comprises a time stamp 12 indicating when the event occurred, a log name 14 indicating what type of event occurred and a message 16 which provides further detail on the event. Note that FIG. 1 shows a single screen displaying nine entries from the merged log, and that this particular log contains additional fields that are not displayed (e.g., message source). Since logs can contain thousands of entries, navigating the log can be a cumbersome task.
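As a rough sketch of the entry layout and of log merging, the following Python example models an entry with the three displayed fields and merges two logs by time stamp. The names LogEntry and merge_logs, and all sample entries, are hypothetical; the sketch also assumes that each input log is already sorted by time stamp.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime  # when the event occurred
    log_name: str        # which log the event came from
    message: str         # further detail on the event


def merge_logs(*logs):
    """Merge logs into one list ordered by time stamp.

    Assumes each input log is already sorted by time stamp.
    """
    return list(heapq.merge(*logs, key=lambda entry: entry.timestamp))


# Hypothetical sample data, for illustration only.
security = [
    LogEntry(datetime(2024, 1, 1, 9, 0), "Security", "Logon attempt"),
    LogEntry(datetime(2024, 1, 1, 9, 5), "Security", "Logon succeeded"),
]
system = [
    LogEntry(datetime(2024, 1, 1, 9, 2), "System", "Service started"),
]

merged = merge_logs(security, system)
for entry in merged:
    print(entry.timestamp.isoformat(), entry.log_name, entry.message)
```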
Periodic monitoring of system logs by system administrators allows the identification of anomalies and security breaches in the system. In addition, the information in system logs is vital for problem diagnosis. In practice, however, system logs hold a large number of messages, most of which are of no interest to the user. It is time-consuming and sometimes impossible to manually find the key messages in this abundance of information. For example, if a problem arises, a user would call a help desk and send the merged system log for analysis. A technician working at the help desk would then analyze the system log and try to pinpoint the problem. This can be a difficult and time-consuming task since logs typically contain thousands of entries.
Various approaches have been taken to parsing these system logs effectively. One approach is to have a human expert define a set of message patterns to find, along with desired actions to be taken when encountering them. However, the effort invested in writing and maintaining these rules is proportional to the number of message types and the rate at which they change. Another approach to log analysis focuses on summarizing the log data in a meaningful way, for example by showing a succinct representation of the log data, by graphically showing patterns in the data or by presenting time statistics of messages.
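The expert-defined rule approach can be sketched as a table pairing message patterns with actions. The patterns, the action names and the function match_actions below are hypothetical and serve only to illustrate the idea; they are not taken from any particular tool.

```python
import re

# Hypothetical rule table: each rule pairs a message pattern with the name
# of an action to take when a message matches.
RULES = [
    (re.compile(r"disk .* failed", re.IGNORECASE), "page_operator"),
    (re.compile(r"login failure", re.IGNORECASE), "record_security_event"),
]


def match_actions(message):
    """Return the actions of every rule whose pattern occurs in the message."""
    return [action for pattern, action in RULES if pattern.search(message)]


print(match_actions("Disk sda1 FAILED self-test"))  # ['page_operator']
print(match_actions("Service started"))             # []
```

Each new message type requires another hand-written entry in the rule table, which is why the maintenance effort grows with the number of message types and the rate at which they change.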
Other previous approaches to log file analysis include log data pattern detection, message frequency analysis, the grouping of time-correlated messages and the use of text analysis algorithms to categorize messages. A limitation of these tactics is that their analysis is based solely on the log data of the inspected computer system and is therefore limited to analyzing that specific system. While these previous approaches to system log monitoring could be used to monitor a server farm, they require that the server farm consist of homogeneous computers, all performing the same tasks by running the same software on the same hardware.
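As one illustration, message frequency analysis on a single system can be sketched as counting how often each message occurs and surfacing the rare ones. The function names, the threshold and the sample messages are hypothetical. Note that the counts are derived only from the inspected system's own log, which is the limitation described above.

```python
from collections import Counter


def message_frequencies(messages):
    """Count how often each distinct message occurs in one system's log."""
    return Counter(messages)


def rare_messages(messages, max_count=1):
    """Return the messages seen at most max_count times, sorted alphabetically."""
    counts = message_frequencies(messages)
    return sorted(msg for msg, count in counts.items() if count <= max_count)


# Hypothetical sample log from a single system.
sample = [
    "Service started",
    "Service started",
    "Time synchronized",
    "Time synchronized",
    "Time synchronized",
    "Disk controller error",
]

print(rare_messages(sample))  # ['Disk controller error']
```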
System log monitoring is becoming more time-consuming as the number of systems grows. Aside from desktop computers, large-scale computer networks and server farms include computers such as file servers, web servers, email servers, database servers, etc. In addition, the increasing use of virtualization enables multiple operating systems (e.g., Windows and Linux) to run simultaneously as virtual machines on a single computer, with each virtual machine generating its own system logs.
Therefore, there is a need for a system log analysis mechanism that is able to automatically analyze system logs and detect events that may indicate potential problems. The mechanism should be fully autonomous, be operating system independent and provide a useful targeted summary of key events taking place on all of the monitored systems. In addition, the mechanism should allow new computers to be monitored automatically as they are installed on the network, without the need for a supervised step of appropriately categorizing system log messages for each computer. By automatically monitoring systems, the mechanism should be able to detect problems at an early stage and be capable of detecting systems that are not configured correctly.