Many types of computing systems and applications generate vast amounts of data pertaining to or resulting from their operation. This data is stored in collection locations, such as log files/records, which can be reviewed at a later time if there is a need to analyze the behavior or operation of the system or application.
Server administrators and application administrators can benefit from analyzing the contents of the system log records. However, collecting and analyzing these records can be very challenging, for several reasons.
One significant issue is that many modern organizations possess a very large number of computing systems, each running numerous applications. In a large environment, it can be very difficult to configure, collect, and analyze log records given the large number of disparate systems and applications involved. Furthermore, some of those applications may run on and across multiple computing systems, making the task of coordinating log configuration and collection even more problematic.
Conventional log analytics tools provide rudimentary abilities to collect and analyze log records. However, conventional systems cannot scale efficiently when faced with massive environments involving large numbers of computing systems, each running large numbers of applications. This is because conventional systems often work on a per-host basis, where set-up and configuration activities need to be performed each time a new host is added or newly configured in the system, or even when new log collection/configuration activities need to be performed for existing hosts. This approach is highly inefficient given the extensive number of hosts that exist in modern systems. Furthermore, the conventional approaches, particularly on-premises solutions, fail to adequately permit sharing of resources and analysis components. This causes significant amounts of redundant processing and resource usage.
Conventional log analytics tools are also very inefficient when it comes to the construction of the log parsers they use. A log parser is a tool that understands how to parse the entries within a log. Conventionally, a log parser must be manually constructed by a person who is both knowledgeable about the exact format of the log file to be analyzed and skilled in the specific programming infrastructure that would be used to implement the parser.
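To make the conventional approach concrete, the following is a minimal sketch of a manually constructed parser. The log format, the field names, and the function `parse_line` are hypothetical and chosen only for illustration; they are not taken from any particular product. The point is that the author must already know the exact layout of each entry and encode it by hand.

```python
import re
from typing import Optional

# Hypothetical log format, assumed for illustration only:
#   "2015-03-01 12:00:05 ERROR disk full"
# Every detail of the layout is hard-coded by the parser's author.
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log entry, or None if it does not fit."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None
```

Under these assumptions, an entry such as `2015-03-01 12:00:05 ERROR disk full` is split into `date`, `time`, `level`, and `message` fields, while an entry whose date is written as `01/03/2015` is rejected until someone edits the pattern by hand.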
One problem with the conventional approach of manually constructing log parsers is that building the parser requires significant amounts of both time and resources from skilled technology personnel. In addition, this approach requires considerable manual effort to maintain the parsers whenever the format of a log file changes. Moreover, this manual approach necessarily requires a priori knowledge of the log file formats.
Some embodiments of the invention solve these problems by providing an approach to automatically construct a log parser. Instead of requiring a person to manually create the contents of the log parser, the log contents themselves are used to construct the parser. Other objects, features, and advantages of the invention are described in the detailed description, figures, and claims.
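As a rough illustration of the general idea of deriving a parser from the log contents, the sketch below compares sample lines token by token: positions whose token is identical in every sample are treated as constant text, and positions that vary are treated as fields. This is a simplified stand-in under stated assumptions (whitespace-delimited entries, at least two representative samples), not the specific technique of any embodiment; the function name `infer_pattern` and the generated field names are hypothetical.

```python
import re
from typing import List


def infer_pattern(sample_lines: List[str]) -> "re.Pattern":
    """Build a regular expression from sample log lines (illustrative only)."""
    if not sample_lines:
        raise ValueError("at least one sample line is required")

    token_rows = [line.split() for line in sample_lines]
    width = min(len(row) for row in token_rows)

    parts = []
    for i in range(width):
        column = {row[i] for row in token_rows}
        if len(column) == 1:
            # Same token in every sample: treat it as constant text.
            parts.append(re.escape(column.pop()))
        else:
            # Token varies across samples: treat it as a variable field.
            parts.append(rf"(?P<field{i}>\S+)")

    pattern = r"\s+".join(parts)
    if any(len(row) > width for row in token_rows):
        # Some samples have extra trailing tokens; capture them loosely.
        pattern += r"(?:\s+(?P<rest>.*))?"
    return re.compile("^" + pattern + "$")


samples = [
    "host1 sshd login user=alice",
    "host2 sshd login user=bob",
]
parser = infer_pattern(samples)
fields = parser.match("host9 sshd login user=carol").groupdict()
# fields == {"field0": "host9", "field3": "user=carol"}
```

No format description is written by hand here; the constant and variable parts are inferred from the samples alone. A practical system would need to handle delimiters other than whitespace, data typing of fields, and entries that span multiple lines, none of which this sketch attempts.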