Many types of computing systems and applications generate vast amounts of data pertaining to or resulting from their operation. These data are stored in collected locations, such as log files/records, which can be reviewed at a later time if there is a need to analyze the behavior or operation of the system or application.
Server administrators and application administrators can benefit from learning about and analyzing the contents of system log records. However, mining knowledge from log files can be a very challenging task for many reasons. One challenge is that the log data can be very large, making it inefficient and difficult to search the large number of records for the specific information items of interest. This is particularly the case if the interesting entries in the log data are relatively sparse within the larger set of data, which is often the situation since severe problems are usually rare. When such problems do appear, however, they tend to require immediate action, which emphasizes the importance of being able to efficiently identify the rare or sparse records of interest within the overall large set of log data. Moreover, interesting trends may be hidden in sequences of events. The raw evidence needed to discover these specific sequences of events may exist in the log files, but combining the individual pieces of information from among the vast set of log data to draw a meaningful conclusion can be a non-trivial task.
The aforementioned problems become even more pronounced in large and complex ecosystems, such as enterprise-class database management systems. Such systems may produce very large volumes of data stored in hardware logs, operating system logs, application logs, application server logs, database server logs, and any other type of log that monitors the behavior of a large production system. A similar situation also exists in a cloud environment, where multiple customers share the same physical resources in a virtualized fashion. Mining knowledge from such log files may be comparable to looking for a needle in a haystack.
Therefore, there is a need for an improved approach to analyzing vast quantities of data, such as the log data generated by computing systems and applications.
Other objects, features, and advantages of the invention are described in the detailed description, figures, and claims.