Many types of computer programs create and maintain log files. Log files provide a record of activities that have occurred with respect to the program that creates the log file. For instance, an electronic mail (“e-mail”) server application might maintain a log file that contains data describing the connections made to and from the server, the success or failure of messages sent to and from the server, and other information regarding the operation of the server. Other types of programs might also maintain log files containing relevant information regarding their operation.
Large distributed computing systems may maintain many different log files. For instance, a large distributed computing system might include multiple server computers running multiple different processes for performing various functions. Each of the processes might maintain a log file that has a format that is different from the format of the log files maintained by the other processes. Consequently, a large distributed computing system might maintain many different log files having many different formats. These log files might also be distributed across many different server computers.
In order to analyze the contents of log files, scripts are typically created to mine interesting data from the various log files. The scripts are generally configured to parse a log file to retrieve desired data and then to perform an analysis on the parsed data to answer a question. For instance, a script might be created to determine the number of connections made to a server during a particular time period. Creating and using scripts to analyze log files in this manner, however, can be challenging due to the difficulty of creating and maintaining a large number of scripts, the complexity of reusing script code spread across a large number of scripts, and the lack of a central component for configuration and management of the scripts.
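By way of illustration only, a script of the kind described above might resemble the following minimal sketch. The log line layout (an ISO timestamp, an event name, and a detail field), the sample data, and the names SAMPLE_LOG and count_connections are assumptions made for this example and are not taken from any particular server application.

```python
from datetime import datetime

# Hypothetical log format (an assumption for illustration):
#   "<ISO timestamp> <EVENT> <detail>"
SAMPLE_LOG = """\
2024-01-15T09:59:58 CONNECT 10.0.0.1
2024-01-15T10:00:05 CONNECT 10.0.0.2
2024-01-15T10:00:09 SEND_OK msg-17
2024-01-15T10:30:00 CONNECT 10.0.0.3
2024-01-15T11:00:00 CONNECT 10.0.0.4
"""


def count_connections(lines, start, end):
    """Count CONNECT events whose timestamp falls in [start, end)."""
    count = 0
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) < 2:
            continue  # skip blank or truncated lines
        try:
            timestamp = datetime.fromisoformat(parts[0])
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if parts[1] == "CONNECT" and start <= timestamp < end:
            count += 1
    return count


window_start = datetime(2024, 1, 15, 10, 0, 0)
window_end = datetime(2024, 1, 15, 11, 0, 0)
connections_in_window = count_connections(
    SAMPLE_LOG.splitlines(), window_start, window_end
)
```

Even in a sketch this small, the parsing logic and the analysis logic are tied to a single log format and a single question, which suggests why a large collection of such scripts might be difficult to maintain and reuse.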
Another mechanism for analyzing log files involves executing a parser that extracts relevant information from a log file by executing queries, similar to relational database queries, against the contents of the log file. This mechanism is appropriate for certain types of log files. It is not appropriate, however, for certain types of complex log files, such as those in which the format of each line varies significantly based on the event type to which the line belongs. A state machine is typically required to analyze these types of complex log files. As a result, previous parsers may be unable to analyze these types of complex log files.
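As a hedged illustration of why a state machine might be needed, the following minimal sketch parses a hypothetical log in which a header line names an event type and the lines that follow are formatted differently depending on that type. The log layout, the event names CONNECT and DELIVERY, and the function parse_log are assumptions made for this example only and do not represent any particular parser.

```python
# Hypothetical log (an assumption for illustration): an "EVENT <type>" header
# selects how the following lines are formatted, until an "END" line.
SAMPLE_LOG = """\
EVENT CONNECT
10.0.0.1 25
10.0.0.2 587
END
EVENT DELIVERY
alice@example.com=delivered
bob@example.com=failed
END
"""


def parse_log(lines):
    """Parse lines with a small state machine keyed on the current event type.

    Malformed lines inside a known event block raise ValueError in this sketch.
    """
    state = "IDLE"
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if state == "IDLE":
            # Only a header line can move the machine out of IDLE.
            if line.startswith("EVENT "):
                event = line.split(maxsplit=1)[1]
                state = event if event in ("CONNECT", "DELIVERY") else "SKIP"
        elif line == "END":
            state = "IDLE"
        elif state == "CONNECT":
            # In this state a line is "<host> <port>".
            host, port = line.split()
            records.append(("CONNECT", host, int(port)))
        elif state == "DELIVERY":
            # In this state a line is "<recipient>=<status>".
            recipient, status = line.split("=", 1)
            records.append(("DELIVERY", recipient, status))
        # In the SKIP state, lines of unknown event types are ignored until END.
    return records


parsed_records = parse_log(SAMPLE_LOG.splitlines())
```

In this sketch the same line cannot be interpreted without knowing the current state, which is the property that a query applied uniformly to every line might be unable to accommodate.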
It is with respect to these and other considerations that the disclosure made herein is presented.