1. Field of the Invention
The present invention relates to software monitoring processes. More particularly, it relates to watchdog processes, which monitor the operation of other processes and restart the other processes, as necessary, to maintain proper operation.
2. Discussion of Related Art
Computer processes are known to experience occasional operating problems. Errors in operation can cause a process to fail or cease to execute. A process may enter a loop from which it never exits, or may lose data and cease operation. In order to maintain proper operation, monitoring processes, called watchdog processes, have been used to track the operation of another process. When the watchdog process determines that there is an operating problem, it interrupts the watched process and restarts it. In this manner, the main process is kept in operation.
Known watchdog processes have been implemented using circuits separate from those implementing the main process. Typically, these circuits include a counter, acting as a timer, which is periodically reset by the main process. If the main process fails, the timer is not reset. Once the timer expires, the watchdog process determines that the main process has failed and operates to restart it. Such watchdog processes are implemented using a hardware circuit or a separate processor with appropriate software. While these processes assist in preventing the total loss of the main process, they lack the ability to adequately determine or resolve various processing problems. For example, a main process could hang in a loop which continues to reset the timer. Thus, even though the main process has failed, the watchdog process would not be triggered. Therefore, a need exists for a watchdog process which can monitor a main process independent of the type of error.
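By way of illustration only, the timer-reset scheme described above, and the failure it cannot detect, may be sketched in software as follows. The names `WatchdogTimer`, `kick`, `tick`, and `first_expiry`, and the three-tick timeout, are hypothetical and chosen for this sketch; they do not describe any particular known circuit or implementation.

```python
class WatchdogTimer:
    """Countdown timer that the main process must reset ("kick") periodically.

    Hypothetical illustration of a conventional watchdog timer.
    """

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks

    def kick(self):
        # Called by the main process to signal that it is still running.
        self.remaining = self.timeout_ticks

    def tick(self):
        # Called once per clock tick; returns True when the timer has expired.
        self.remaining -= 1
        return self.remaining <= 0


def first_expiry(kicks_at, total_ticks, timeout_ticks=3):
    """Return the tick at which the timer first expires, or None if it never does.

    kicks_at(t) models whether the main process resets the timer at tick t.
    """
    wd = WatchdogTimer(timeout_ticks)
    for t in range(total_ticks):
        if kicks_at(t):
            wd.kick()
        if wd.tick():
            return t  # the watchdog would restart the main process here
    return None


# A main process that stops at tick 2 no longer resets the timer: detected.
stopped = first_expiry(lambda t: t < 2, total_ticks=10)

# A main process hung in a loop that still resets the timer: never detected.
hung_but_kicking = first_expiry(lambda t: True, total_ticks=10)
```

In this sketch `stopped` is the tick at which the failure is detected, while `hung_but_kicking` is `None`: the timer alone cannot distinguish a healthy main process from one that is hung in a loop containing the reset.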
Furthermore, known watchdog processes cannot determine or correct the error which caused the problem. As a result, the main process may fail again after it is restarted. Therefore, a need exists for a watchdog process which can identify and correct the errors which cause the main process to fail.
Furthermore, for known watchdog processes, the main program must reset the timer of the watchdog process. Thus, the main program must be specifically designed to operate with the watchdog process, and the watchdog process cannot monitor programs that were not so designed. Also, each watchdog timer can be used to monitor only a single program. Therefore, a need exists for a watchdog process which can monitor any program, and which can monitor multiple programs.
Finally, the watchdog process itself may fail. If the watchdog process fails, a subsequent failure of the main process would go undetected. Therefore, a need exists for a watchdog process which can itself be monitored.