Real-time log analysis (“RTLA”) can allow an organization to monitor the service and error logs of a number of host computers and/or devices in real or near-real time in order to identify trends in service performance as well as to troubleshoot potential problems. An RTLA system can collect log data from the host computers and/or devices, process and collate the collected data, and analyze the collated data to generate service metrics. These metrics and/or the log data itself can then be published to host management systems, alarming and alerting services, reporting and graphing services, and support services.
The generated metrics can include fatal error counts/rates, page views, service availability, host access rates, hardware performance measures and the like. Management and technical support personnel can utilize the published metrics and the processed and collated log data to receive alerts regarding potential problems or failures, troubleshoot host or service problems, determine what additional resources need to be made available to meet growing demand, spot trends in service or product demand and the like.
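By way of illustration only, and without limitation, the collect, collate and analyze steps described above might be sketched as follows. The log line layout (`<timestamp> <host> <level> <message>`), the `FATAL` level name, and the names `HostMetrics`, `parse_line` and `collate_and_analyze` are assumptions made for this sketch and are not taken from any particular RTLA system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Per-host counters from which simple service metrics are derived."""
    total: int = 0
    fatal: int = 0

    @property
    def fatal_rate(self) -> float:
        # Fraction of collected log lines for this host that are fatal errors.
        return self.fatal / self.total if self.total else 0.0


def parse_line(line):
    """Parse a line of the assumed form '<timestamp> <host> <level> <message>'."""
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        return None  # malformed line; the caller skips it
    message = parts[3] if len(parts) > 3 else ""
    return parts[0], parts[1], parts[2], message


def collate_and_analyze(lines):
    """Collate log lines by host and count total and fatal entries."""
    metrics = defaultdict(HostMetrics)
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        _, host, level, _ = record
        metrics[host].total += 1
        if level == "FATAL":  # assumed name of the fatal severity level
            metrics[host].fatal += 1
    return dict(metrics)
```

In such a sketch, the returned per-host counters could then be published to an alarming or graphing service; the publication mechanism is left out because it depends on the particular services in use.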
Log data that describes a problem or failure with a host computer or device does not, however, typically provide significant insight into the root cause of the problem or failure. For example, and without limitation, a host computer might experience errors immediately following the deployment of a software update to the host. In a scenario such as this, it can be very difficult and time-consuming to determine that the software deployment was the root cause of the errors appearing in the log files on the host computer.
The disclosure made herein is presented with respect to these and other considerations.