1. Field of the Invention
The present invention relates generally to data processing, and in particular to a computer implemented method, apparatus, and computer usable program code for evaluating system health using an I/O device.
2. Description of the Related Art
Reliance on data processing systems has grown dramatically in recent years because computing devices are now used in nearly every aspect of business and society. Because of their importance, data processing systems are expected to be operational at all times. In the real world, however, data processing systems frequently experience failures caused by hardware or software errors. In some cases, these errors cause the system to hang or to crash.
System downtime is especially damaging to real-time applications that rely on the data processing system, such as a dedicated server that performs business transactions over the Internet. When a system hangs or fails, the condition needs to be detected as soon as possible so that the system can be recovered or a backup system activated. A hang may be especially hard to detect because the whole system stops, leaving no running process that could detect the problem.
Some current systems use a heart beat mechanism to monitor the health of a system over a network connection. System recovery actions are undertaken when the heart beat stops. The use of a heart beat has several limitations. For example, the heart beat may stop either because the monitored system has hung or because the network has malfunctioned, and the monitor cannot readily tell these cases apart. Additionally, if the failure occurs in the monitoring network itself, the standby server may take over while the primary server is still running, so that both servers are active simultaneously. As a result, data integrity problems may occur if, for example, both servers respond to client requests. In some cases, the heart beat mechanism may also take a long time to detect a system hang or failure.