Almost every piece of computer software writes human-readable, textual event messages, or simply “events” into event logs. Computer systems composed of many application (hardware and/or software) components, such as web services, complex enterprise applications and even complex printing presses and storage systems, collect such events from their many components into system event log files, or simply “logs”. These logs, which are typically stored on networked servers, can be used in system development and for debugging and understanding the behaviour of a system.
While logs hold a vast amount of information describing the behaviour of systems, finding relevant information within the logs can be a very challenging task; even modest systems can log thousands of messages per second. While it is possible to use traditional Unix “grep” for finding potentially relevant messages in event logs, existing commercial tools, such as those from Splunk, LogLogic and Xpolog, can collect and join logs from different sources and provide a more convenient search through the logs. However, the indexing provided by these tools still does not lead to automation in leveraging the logs for tasks such as automated problem debugging, process identification or visualization of the information in the logs.
Existing research in the area of automated log analysis focuses on discovery of temporal patterns, or correlation of event statistics, within the events. Such techniques are typically based on knowledge of which event messages can occur, or require access to the source code of software that generates the event messages in order to determine which event messages can occur. In general, the research does not accommodate the complexities of real world systems, in which logs may be generated by various different components in a complex system, leading to, for example, interleaving of sequences of events, asynchronous events and high dimensionality.