System logs contain messages emitted from several modules within a computing system and comprise valuable information about the status of various tasks, execution paths, error conditions, system states, device changes, device drivers, system changes, events, system operations, and more. Often, system logs comprise messages, for example, unstructured text messages, logged by several modules of a system. Users of system logs may include administrators, customer support engineers, and developers. In practice, log users analyze information provided by the logged messages and use that information for proactive repair, alerting, problem forecasting, and diagnosis of the system. For example, messages of the system log such as “disk failure” or “network interconnect failure” messages may cause an administrator to perform urgent system intervention. Further, messages such as “SCSI adapter encountered an unexpected bus phase” may help developers determine a bug location and remedy the bug.
Log users may be local or remote to the system. For example, storage enterprises such as NetApp® offer live and remote customer support by periodically collecting system logs from live deployments at customer locations. For customers with remote support, the customer system collects system logs from buffers in the client system and transmits the logs to a central enterprise repository, where the logs are monitored for alerts, problem forecasting, and troubleshooting. As systems grow in complexity, system logs grow in size. A study disclosed in Xu, Wei, Ling Huang, Armando Fox, David Patterson, and Michael Jordan, Experience Mining Google's Production Console Logs, In Proc. of SLAML (2010), showed that the number of log messages in a product increased by hundreds per month at all stages of product development. The ever-increasing size of system logs is increasing the costs of system log collection, monitoring, transmission, and archival.
Large system logs impact several resources. For example, large system logs overload log users with too much information. Often, log users manually inspect system logs when performing troubleshooting and other log related activities. For an enterprise with thousands of deployments, the cost of human resources to manually inspect system logs may be burdensome. Thus, a reduction in the number of messages a log user reads in order to render their services would be desirable.
In another example, system logs are stored on buffers of finite memory space. As a result, systems have had to place strict upper limits on the size of their system logs in order to prevent buffer overflow, which leads to unintentional message dropping. Buffer memory is costly and ultimately limited in size. As such, it would be desirable to reduce the size of the system logs to accommodate the buffers' finite size and prevent buffer overflow.
Another example of the impact that large system logs have on resources involves bandwidth. As explained above, for customers with remote support, the customer system collects system logs and transmits the logs to a central repository. The transmission of system logs from customer sites to enterprise repositories consumes substantial network bandwidth, affecting the customers, the remote support enterprise, the network provider, and unaffiliated third parties who happen to share the network bandwidth. Further, receiving large system logs directly increases the storage costs for the remote enterprise providing the support. Therefore, a reduction in the amount of data transmitted from the customer's system to the enterprise's system would be advantageous.
The desire to reduce resource drain caused by large log sizes has been recognized in the industry; however, the industry has thus far focused primarily on reducing the expense caused by information overload on log users who manually inspect system logs. For Cisco router logs, Tim Kramer, Effective Log Reduction and Analysis Using Linux and Open Source Tools, describes a solution that reduces the amount of information that a log user views at any one time. The solution provides several tools and utilities for fast parsing and searching through log files. The tools are effective for sorting through large numbers of log messages and presenting the log user with a subset of the system log's messages, which is easier to view and more manageable to work with. A similar approach to dealing with log sizes relies on visualization, as described in Tetsuji Takada and Hideki Koike, MieLog: A Highly Interactive Visual Log Browser Using Information Visualization and Statistical Analysis, In Proc. USENIX Conf. on System Administration, 2002, which helps reduce the recognition load and pinpoint unusual log messages in an interactive and visual log browser.
While the proposed solutions may present the system log messages in a manner that is easier for a human user to digest, the proposed solutions do not delete messages or reduce the overall size of the system log. Rather, the proposed solutions merely change the manner in which the messages are viewed by a human user. As such, the proposed solutions fail to address many of the resource costs created by large system logs.
Another proposed solution is described in Yinglung Liang, Yanyong Zhang, Hui Xiong and Ramendra Sahoo, An Adaptive Semantic Filter for Blue Gene/L Failure Log Analysis, In Proceedings of the Third International Workshop on System Management Techniques, Processes, and Services (SMTPS), 2007. This proposed solution filters the centralized system log of a supercomputer or of a multi-node computing system based on message redundancy. It targets and eliminates redundant log records that identify the same unit of information (e.g., the same event) but differ with reference to location (which node of a plurality of nodes sent the message) or time (the time that the message originated from a particular node).
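The redundancy-based filtering described above can be illustrated with a minimal sketch. This is not the adaptive semantic filter of Liang et al.; it is a simplified stand-in that assumes two records describe the same event when their message text is identical, and that treats a repeat as redundant when it arrives within a fixed time window of the last kept record, regardless of the reporting node. The names `LogRecord`, `filter_redundant`, and the `window` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    timestamp: float  # seconds since some reference point (assumed unit)
    node: str         # which node reported the message
    event: str        # message text, used here as the event identity


def filter_redundant(records, window=60.0):
    """Drop records that repeat an already-kept event within `window` seconds.

    Records are redundant if they carry the same event text, even when
    they differ in location (node) or time (timestamp).
    """
    last_kept = {}  # event text -> timestamp of the most recent kept record
    kept = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        prev = last_kept.get(rec.event)
        if prev is not None and rec.timestamp - prev <= window:
            continue  # redundant: same event, only node or time differs
        last_kept[rec.event] = rec.timestamp
        kept.append(rec)
    return kept


if __name__ == "__main__":
    log = [
        LogRecord(0.0, "node1", "disk failure"),
        LogRecord(1.0, "node2", "disk failure"),   # same event, other node
        LogRecord(2.0, "node1", "network interconnect failure"),
        LogRecord(100.0, "node3", "disk failure"), # outside the window
    ]
    for rec in filter_redundant(log):
        print(rec.timestamp, rec.node, rec.event)
```

In this sketch the decision for each record is binary: a record is dropped only if it repeats an event already kept, and every other record is retained unchanged.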
However, the system proposed by Yinglung Liang et al. has limited log size reduction capability in contexts outside centralized logging systems. In centralized logging systems, there is a high likelihood that redundant messages will be received from the large number of different nodes. Therefore, limiting message removal to only redundant messages in such a system may yield a large log size reduction. However, in other systems, such as a system log which locally logs messages for a single-node system, there is a low likelihood that the technique will find many duplicate messages. As such, the techniques of Yinglung Liang et al. do not yield much log size reduction because they are limited to removing only the few duplicative messages that may be found in a local log. For example, if the message of interest is redundant, then it is deleted; if the message of interest is not redundant, then the message is not deleted. The system's algorithm is equivalent to a rudimentary if/then solution and is not designed to make nuanced filtering decisions regarding the value of one non-redundant message as compared to another non-redundant message. Thus, when the message of interest is non-redundant, the solution is unable to intelligently determine the informational value of that message. Accordingly, the solution is unable to determine whether to delete a non-redundant message based on the value of the information in the non-redundant message.