During the startup and operation of software systems, faults such as hardware hangs, operating system crashes, task exceptions, tasks stuck in infinite loops, and excessively frequent interrupts often occur and prevent the system from working. For software in a communication system, it is essential that, when a fault occurs, the software system can automatically identify the abnormal task state, raise a corresponding alarm, record the fault, and recover the system according to a user-configured policy. This holds especially for voice-service systems with strict real-time requirements: if a fault occurs at any stage of operation, the system must identify the abnormality completely and accurately, record the related information, and perform automatic recovery.
Existing methods of fault detection and automatic recovery in software systems generally employ a hardware watchdog or a software watchdog. A hardware watchdog is a simple timer-driven reset device that the software must kick (feed) at regular intervals by generating a pulse signal. If the timing threshold (usually 1-2 seconds) elapses without a kick pulse, the watchdog automatically generates a hardware reset signal that resets the system. The software watchdog addresses the short timing window of the hardware watchdog: it effectively extends the hardware watchdog's reset time by means of simple heartbeat messages or a synchronous monitoring mechanism. These methods are simple to implement and relatively reliable, but they have drawbacks. For example, not all abnormalities arising in the system can be detected, some special applications in the system cannot be monitored, and the types of system faults cannot be classified in the log record.
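The interplay between the two watchdogs described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the implementation of any particular system: the class names, the task names "voice" and "signalling", the 2-second hardware timeout, the 10-second heartbeat timeout, and the one-second polling interval are all hypothetical, and time is passed in explicitly so that the run is deterministic.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class HardwareWatchdog:
    """Simulated timer reset device: expires unless kicked within timeout_s."""
    timeout_s: float
    last_kick: float = 0.0

    def kick(self, now: float) -> None:
        self.last_kick = now

    def expired(self, now: float) -> bool:
        return now - self.last_kick > self.timeout_s


@dataclass
class SoftwareWatchdog:
    """Kicks the hardware watchdog only while every task's heartbeat is fresh."""
    hw: HardwareWatchdog
    task_timeout_s: float
    heartbeats: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, task: str, now: float) -> None:
        # Also used to register a task for the first time.
        self.heartbeats[task] = now

    def poll(self, now: float) -> List[str]:
        # A task is stale when its last heartbeat is older than the timeout.
        stale = sorted(t for t, seen in self.heartbeats.items()
                       if now - seen > self.task_timeout_s)
        if not stale:
            self.hw.kick(now)  # all tasks healthy: keep the hardware quiet
        return stale


def simulate(signalling_stops_after: int = 5) -> Tuple[Optional[float], List[str]]:
    """Run 30 simulated seconds; return (reset time, stale tasks) or (None, [])."""
    hw = HardwareWatchdog(timeout_s=2.0)
    sw = SoftwareWatchdog(hw, task_timeout_s=10.0)
    sw.heartbeat("voice", 0.0)
    sw.heartbeat("signalling", 0.0)
    for t in range(1, 31):
        now = float(t)
        if t % 5 == 0:
            sw.heartbeat("voice", now)
            if t <= signalling_stops_after:
                sw.heartbeat("signalling", now)  # hangs after this point
        stale = sw.poll(now)
        if hw.expired(now):
            return now, stale  # hardware reset would fire here
    return None, []


if __name__ == "__main__":
    print(simulate())
```

In this sketch the hardware timeout stays at 2 seconds, yet a task only has to report once every 10 seconds, which is the sense in which the software watchdog extends the effective reset time. It also shows the limitation noted above: the monitor learns only that a task stopped sending heartbeats, not what kind of fault caused it.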