1. Field of the Invention
This invention relates in general to computer networks, and in particular to managing computer networks.
2. Description of Related Art
In distributed computing environments comprising interconnected mainframes, minicomputers, servers and workstations, there is currently no way to detect that a process has terminated abnormally and then automatically initiate a failure recovery mechanism. There is a need in the art for an efficient mechanism for monitoring processes and executing a failure recovery scenario in response to the state of a process.
The present invention discloses a method, apparatus, and article of manufacture for failure recovery in a computer network. The invention provides a system monitor that allows a process to register itself for monitoring and dictate a failure recovery mechanism if the process terminates abnormally. The system monitor according to the present invention continuously monitors the process and detects when the process abnormally terminates. A failure recovery mechanism is then executed.
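The register, monitor, detect and recover sequence described above can be illustrated with a minimal sketch. It is not the disclosed implementation: the names `SystemMonitor`, `Registration`, `register` and `poll_once` are hypothetical, the launching program registers a child process on its behalf rather than the process registering itself, and "abnormal termination" is assumed to mean a nonzero exit code.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Registration:
    """One monitored process plus the recovery action dictated for it."""
    process: subprocess.Popen
    on_failure: Callable[[int], None]  # called with the exit code


class SystemMonitor:
    """Hypothetical monitor that polls registered processes for abnormal exit."""

    def __init__(self) -> None:
        self._registry: Dict[int, Registration] = {}

    def register(self, process: subprocess.Popen,
                 on_failure: Callable[[int], None]) -> None:
        """Register a process for monitoring together with its recovery action."""
        self._registry[process.pid] = Registration(process, on_failure)

    def poll_once(self) -> List[int]:
        """Check every registered process once.

        Returns the PIDs that terminated abnormally (assumed here to mean a
        nonzero exit code) after running their recovery actions.
        """
        failed: List[int] = []
        for pid, reg in list(self._registry.items()):
            code = reg.process.poll()
            if code is None:
                continue  # still running, keep monitoring
            del self._registry[pid]  # terminated either way, stop monitoring
            if code != 0:
                reg.on_failure(code)  # execute the failure recovery mechanism
                failed.append(pid)
        return failed
```

In this sketch a caller would invoke `poll_once` repeatedly, for example from a timer loop, to approximate continuous monitoring. A process that exits with code 0 is simply dropped from the registry without triggering recovery.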
The failure recovery mechanism can include executing an executable, running a command line script, or starting an operating system service. These actions support virtually any failure recovery scenario, such as re-starting the failed process, cleaning up after the failed process, notifying other processes of the failure, or sending a notification of the failure.
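The three kinds of recovery action named above can be sketched as a small dispatcher. This is an illustration under stated assumptions, not the disclosed mechanism: `ActionKind`, `RecoveryAction`, `build_command` and `run_recovery` are hypothetical names, scripts are assumed to run through `cmd /c` on Windows and `/bin/sh -c` elsewhere, and services are assumed to be started with `sc start` on Windows and `systemctl start` on other platforms.

```python
import subprocess
import sys
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class ActionKind(Enum):
    EXECUTABLE = "executable"  # run a program directly
    SCRIPT = "script"          # run a command line through the shell
    SERVICE = "service"        # start an operating system service


@dataclass(frozen=True)
class RecoveryAction:
    kind: ActionKind
    target: str                 # program path, script command line, or service name
    args: Tuple[str, ...] = ()  # extra arguments, used only for EXECUTABLE


def build_command(action: RecoveryAction,
                  platform: str = sys.platform) -> List[str]:
    """Translate a recovery action into an argument vector (assumed mapping)."""
    is_windows = platform.startswith("win")
    if action.kind is ActionKind.EXECUTABLE:
        return [action.target, *action.args]
    if action.kind is ActionKind.SCRIPT:
        shell = ["cmd", "/c"] if is_windows else ["/bin/sh", "-c"]
        return [*shell, action.target]
    if action.kind is ActionKind.SERVICE:
        if is_windows:
            return ["sc", "start", action.target]
        return ["systemctl", "start", action.target]
    raise ValueError(f"unsupported action kind: {action.kind!r}")


def run_recovery(action: RecoveryAction) -> int:
    """Execute the recovery action and return its exit code."""
    return subprocess.run(build_command(action), check=False).returncode
```

Keeping command construction separate from execution lets the mapping be inspected without launching anything. A re-start scenario, for instance, would use an `EXECUTABLE` action whose target is the failed program itself.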
An application programming interface executed by the computer manages the system monitor of this invention to perform the monitoring and failure recovery functions.