The term “high availability system” is used in the telecommunications industry to describe a system that meets an availability requirement of 99.999%. Frequently, this requirement is met through the use of multiple redundant components, which may include multiple processors. Typically, a processor has one of three activation states: active, standby, or reset. A processor in the active state is fully functioning and processes all of its assigned operations. A processor in the standby state is usually a redundant processor that is available to replicate the activities of an active processor if needed, and it performs only a small percentage of the operations of which it is capable. For example, both an active processor and a standby processor may receive the same information from a system component, but ordinarily only the active processor acts on the information while the standby processor merely monitors it. There are, however, exceptions to the general rule that only the active processor acts on received information. A processor in the reset state is frozen such that the other system components treat it as functionally removed from the system. If one of the processors in the system fails, the failure must be detected and the processor placed in the reset state so that it cannot cause further failures in the system.
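To put the 99.999% figure in perspective, the following minimal sketch converts an availability percentage into the downtime it permits per year. The function name `allowed_downtime_minutes` and the 365-day year are assumptions made for illustration; they do not come from any standard or from the system described here. Under those assumptions, 99.999% availability corresponds to roughly 5.26 minutes of downtime per year.

```python
# Illustrative arithmetic only: converts an availability percentage into
# the downtime it permits. The 365-day year is an assumption.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum downtime per year, in minutes, for a given availability."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.999)
    print(f"99.999% availability allows about {budget:.2f} minutes of downtime per year")
```

A budget this small helps explain why a failed processor must be detected and placed in the reset state quickly rather than left to disturb the rest of the system.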
A number of techniques have been devised to detect and reset failed processors in a high availability system. In one technique, a hardware watchdog timer periodically verifies that a processor is capable of performing a defined task in a predetermined amount of time. If the watchdog timer determines that the processor has failed to perform the defined task in the predetermined amount of time, the timer generates a reset signal to the processor. This technique is deficient for two reasons. First, the defined task is usually a very simple one, such as performing a simple math operation in a given amount of time, and is not indicative of the demands actually placed on the processor. Second, the watchdog timer is built into the hardware, which limits its adaptability to different tests. A second technique used in high availability systems employs a second processor to determine whether a first processor has failed. If the second processor determines that the first processor has failed, the second processor generates a reset signal to the first processor. The problem with this technique is that it raises the question of who is watching the watcher. In other words, if the second processor fails rather than the first, the second processor's failure may go undetected and the first processor may be reset erroneously.
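The watchdog technique described above can be modeled in software as a rough sketch. This is a simulation under stated assumptions, not an actual hardware design: the names `State`, `Processor`, and `watchdog_check` are hypothetical, and the time the processor needs for the defined task is passed in as a number (or `None` if the task never completes) rather than measured on real hardware.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class State(Enum):
    """The three activation states named in the text."""
    ACTIVE = "active"
    STANDBY = "standby"
    RESET = "reset"


@dataclass
class Processor:
    """Hypothetical stand-in for a monitored processor."""
    name: str
    state: State = State.ACTIVE

    def reset(self) -> None:
        # Models receipt of the reset signal: the processor is frozen.
        self.state = State.RESET


def watchdog_check(processor: Processor,
                   task_duration_s: Optional[float],
                   deadline_s: float) -> bool:
    """Simulate one watchdog verification.

    task_duration_s is the simulated time the processor took to finish the
    defined task, or None if the task never finished. Returns True if the
    processor met the deadline; otherwise resets it and returns False.
    """
    passed = task_duration_s is not None and task_duration_s <= deadline_s
    if not passed:
        processor.reset()
    return passed


if __name__ == "__main__":
    healthy = Processor("cpu-a")
    hung = Processor("cpu-b")
    print(watchdog_check(healthy, 0.002, 0.010), healthy.state)
    print(watchdog_check(hung, None, 0.010), hung.state)
```

The sketch also reflects the first deficiency noted above: the check says only that one trivial task finished on time, not that the processor can handle its real workload.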