1. Field of the Invention
The field of the invention is data processing, or, more specifically, methods, apparatus, and products for administering a system dump on a redundant node controller in a computer system.
2. Description of Related Art
The development of the EDVAC computer system of 1948 is often cited as the beginning of the computer era. Since that time, computer systems have evolved into extremely complicated devices. Today's computers are much more sophisticated than early systems such as the EDVAC. Computer systems typically include a combination of hardware and software components, application programs, operating systems, processors, buses, memory, input/output devices, and so on. As advances in semiconductor processing and computer architecture push the performance of the computer higher and higher, more sophisticated computer software has evolved to take advantage of the higher performance of the hardware, resulting in computer systems today that are much more powerful than just a few years ago.
The combination of hardware and software components in computer systems today has progressed to the point that computer systems can be highly reliable. Reliability in computer systems may be provided by using redundant components in the computer system. In some computer systems, for example, components such as node controllers that manage hardware error requests in nodes of the computer system are provided in redundant pairs—one primary node controller, one redundant node controller. When such a primary node controller fails, the redundant node controller takes over the primary node controller's operations.
From time to time, a redundant node controller loses communication with other components in the computer system. After losing communication, the redundant node controller typically generates a system dump and reboots. The devices with which the redundant node controller loses communication each generate an error log. System administrators attempt to correlate the system dump from the redundant node controller with the error logs from the other devices to identify and debug the underlying error that caused the loss of communication. Typically, however, many error logs and system dumps are created in a computer system during any given period of time, and it is often difficult to correlate a system dump with the correct corresponding error logs. Moreover, in situations where an application running on a redundant node controller is unaware of the communication loss, the redundant node controller is typically incapable of creating a system dump before being forced to reboot. Readers of skill in the art will therefore recognize that there exists room for improvement in administering a system dump on a redundant node controller in a computer system.