1. Field
The present invention relates to the field of fault diagnostics in computing systems and, more particularly, to fault diagnostics using detectors and policies.
2. Description of Related Art
Comprehensive fault management plays an important role in keeping critical computing systems in a continuous, highly available mode of operation. These systems must incur minimal downtime, typically on the order of seconds or minutes per year. To meet this goal, every critical component (that is, a component whose failure causes the entire corresponding system to fail) must be closely monitored both for faults as they occur and for faults that may potentially occur. In addition, it is important that these faults be handled in real time and within the system itself, rather than remotely as is done in many monitoring systems today. An example of a remote monitoring system is one that follows the Simple Network Management Protocol (SNMP). For the foregoing reasons, there is a need for a fast, small-footprint, real-time system to detect and diagnose problems. In addition, it is preferred that this system also be cross-platform, extensible, and modular.