1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for administering correlated error logs in a computer system.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
The combination of hardware and software components in computer systems today has progressed to the point that computer systems can be highly reliable. Reliability in computer systems may be provided by using redundant components in the computer system. In some computer systems, for example, components such as node controllers that manage hardware error requests in nodes of the computer system are provided in redundant pairs—one primary node controller, one redundant node controller. When such a primary node controller fails, the redundant node controller takes over the primary node controller's operations.
From time to time a single point of failure may cause a loss of communications between many devices in the computer system. Each device typically generates an error log corresponding to the failure the device experienced. The error logs are used by a system administrator to identify and debug the single point of failure that caused the loss of communications. Current processes of debugging a failure require manual parsing of the error logs and manual arrangement of the error logs in the order in which the actual events occurred. Error logs, however, may not be received at or near the same time, nor is there any guarantee that the error logs include time-stamps indicating the same time of the failure. Moreover, many distinct failures may occur in a computer system during a short period of time, resulting in a collection of many error logs for each distinct failure. Using the disparate collection of error logs to debug and identify the distinct failures under current processes is time consuming and inefficient, requiring a manual and inexact correlation of the error logs to a particular underlying failure. Readers of skill in the art will recognize therefore that there exists room for improvement in administering correlated error logs in a computer system.