The present invention relates generally to fault detection and recovery and, more particularly, relates to a system and method for automatically detecting and recovering from software and/or hardware faults in microprocessor-based systems.
Microprocessor-based systems are used in an increasing number of applications, in part because present-day microprocessors are inexpensive and extremely powerful. Many of these systems are sophisticated and have complex software for driving the operation of the microprocessor and other hardware components. Since many of these systems, such as a router in a computer network, must operate continuously and unattended, they must be designed to operate in the presence of faults. These faults can be hardware faults or software faults resulting from hardware or software malfunctions.
In most microprocessor-based systems, fault detection and recovery is not implemented. In those rare cases where it is implemented, the implementation is relatively primitive and informal. Specifically, it is typically left to the discretion of the hardware and software developers to design fault detection and recovery into their software processes, which creates many problems. For example, any fault detection and recovery that does exist is tightly coupled to and intertwined with the software process, so re-use is difficult or impossible. This is especially true since software and hardware faults are typically handled by separate modules and not by one integrated module. Additional problems arise since many software processes are designed to exit when a fault occurs, requiring the system to be manually restarted or rebooted to resume operation.
From the foregoing, it will be appreciated that a need exists for a more formal and comprehensive approach to hardware and software fault detection and recovery. There is also a need for a fault detection and recovery method that can be easily re-used by any process or module in a system product. Finally, there is a need for fault recovery that is automatic in the sense that manual intervention is not required to recover from the fault.