1. Field of the Invention
This invention relates to the field of computer systems management and, in particular, to methods, systems, and computer program products for providing context-sensitive detection of failing I/O devices.
2. Description of Background
Large computing systems typically include a plurality of processor nodes and I/O devices. The nodes are capable of executing an operating system. A subset of these nodes is designated to act as server nodes. The remaining nodes, designated as non-server nodes, may perform input/output (I/O) operations on an I/O device, such as a data storage device or disk drive, through a server node or over a local path. The operating system is provided with a function to detect when an I/O request to a device has not completed within a reasonable amount of time. This approach is problematic because what constitutes a “reasonable” amount of time may vary from situation to situation, and the user does not have sufficient information from which to determine an appropriate waiting time. Oftentimes, the actual length of time that a user waits for an I/O device to respond is much too long. For example, the wait may be caused by an I/O device performing its local recovery. If the local recovery is successful, then the I/O device is usable, but if the recovery is not successful or takes an excessive period of time, then the I/O device is unable to perform the necessary function. As a result, non-functional I/O devices are left in the configuration longer than is needed, and work is stalled longer than necessary waiting for the I/O request to complete. Accordingly, what is needed is an improved technique for detecting missing I/O interrupts and failures in I/O devices.