Almost all software systems, such as those present in data-processing systems, are programmed to generate textual logs. The messages written into these logs report on a wide variety of phenomena, such as execution speed, input/output faults, process initiation and termination, and so forth. Based on their formats, textual logs can be classified into two categories: i) structured logs and ii) unstructured logs. Structured logs follow a well-defined syntactic format, while unstructured logs have only a partial structure or no structure at all. Web access logs, transaction logs, and error logs are some examples of structured logs; in typical product development, they are incorporated into the software while accounting for the related system requirements and in many cases conform to standard programming practices. Trace and debugging logs, in contrast, are examples of unstructured logs; they are typically incorporated into the software not as an outcome of any formalized system requirements, but rather as a debugging aid for individual programmers. Consequently, the messages in such logs do not follow strict patterns.
The data-processing systems that generate the message logs often comprise large, complex software systems with millions of lines of instructions that have evolved over many years of development. Examples of such data-processing systems in a telecommunications context are routers, switches, servers, and so forth. In turn, each data-processing system is often part of a larger system such as a telecommunications system, within which multiple, networked data-processing systems must work together to provide services to users of devices such as telecommunications terminals. It is important to understand the system behavior of these data-processing systems, in order to maintain or improve their reliability—particularly with respect to a failure condition, in which a software component, or the data-processing system itself, fails to perform as intended. Thus, it is not surprising that some efforts have been made to analyze message logs for the purpose of understanding the behavior of a system.
Most of the previous work on log analysis is based on searching for, and mapping, a set of pre-defined patterns in a structured log. As an example, web-log mining relies on searching a structured log file for well-known document retrieval patterns. In particular, user search patterns and user navigation behavior have been studied to improve web site usage and also to provide users with targeted product advertising. Similarly, with pre-defined patterns in mind, structured error logs or call logs in some telecommunications systems have been used to analyze and understand the failure process. Additionally, system anomalies that are detectable through pre-defined patterns in the logs have been used to detect intrusions in deployed systems.
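As an illustration only, the pre-defined pattern approach described above can be sketched as follows. The sketch assumes a web access log in the widely used Common Log Format; the regular expression, the function name count_status_codes, and the sample lines are hypothetical and are not taken from any particular prior-art system.

```python
import re
from collections import Counter

# Pre-defined pattern for one access-log line. The Common Log Format is an
# assumption made for this sketch; a real deployment may use another format.
ACCESS_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def count_status_codes(lines):
    """Count HTTP status codes in lines that match the pre-defined pattern.

    Returns a (Counter, list) pair: the per-status counts, and the lines
    that did not match the pattern and therefore could not be analyzed.
    """
    statuses = Counter()
    unmatched = []
    for line in lines:
        match = ACCESS_PATTERN.match(line.strip())
        if match:
            statuses[match.group("status")] += 1
        else:
            unmatched.append(line)
    return statuses, unmatched


# Hypothetical sample input: two structured access-log lines followed by
# one free-form trace message of the kind a programmer might add.
SAMPLE_LINES = [
    '192.0.2.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.11 - - [10/Oct/2023:13:55:40 +0000] "GET /missing.html HTTP/1.1" 404 512',
    'entering call setup, trying route 3 again (state=7)',
]

if __name__ == "__main__":
    statuses, unmatched = count_status_codes(SAMPLE_LINES)
    print(dict(statuses))
    print(len(unmatched))
```

In this sketch, the two structured lines are parsed and counted, while the free-form trace message matches no pre-defined pattern and is simply set aside, which is why a technique of this kind depends on the log having a known format.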
In contrast, unstructured message logs have been largely unused in attempting to understand system behavior, mainly because they do not easily lend themselves to automated analysis, owing to both the unstructured nature of the messages and the volume of messages that can be generated. For example, in an enterprise Voice over Internet Protocol (VoIP) environment, a data-processing system that provides the call control can generate a million or more status messages per hour as part of the message logs, among which there can be over 100,000 distinct messages. Moreover, in many systems, a large number of the log files generated are, in fact, unstructured. Consequently, a sizeable portion of log messages overall do not have any pre-defined tags that can be monitored, so the pre-defined pattern techniques in the prior art are of little use for such logs.
What is needed is a technique for leveraging unstructured logs, including partially structured logs, in order to understand and characterize the behavior of a processing system, specifically with respect to the failure behavior of the system, without some of the disadvantages of the prior art.