1. Field of the Invention
The present invention generally relates to testing and verification of a system. More particularly, the present invention relates to verifying a monitoring and responsive infrastructure of a system.
2. Related Art
Monitoring and responding to fluctuations in the physical environment (e.g., temperature, fan speed, voltage levels, etc.) of a system, such as a server, is complex but essential to maintaining a high level of reliability. As a result, the Intelligent Platform Management Interface (IPMI) specification was developed to define a monitoring and responsive infrastructure for a system, as well as other capabilities. In general, a monitoring and responsive infrastructure compliant with the IPMI specification accumulates information about the system. This information represents system health and system status information. Sensors are utilized to monitor the various system voltages, temperatures, fan speeds, bus errors, power supplies, physical security, etc. The sensors are polled periodically to obtain their outputs. Various thresholds (e.g., critical, non-critical, warning, non-recoverable, etc.) and ranges can be set for each sensor to distinguish normal conditions of the system from abnormal conditions. Moreover, events can be defined, where an event represents the occurrence of a condition of interest (e.g., triggering a threshold) that necessitates the performance of a particular responsive action.
When an event is identified, the corresponding response is invoked. For example, if the temperature of the processor passes a certain warning threshold, the response could be to increase the fan speed, possibly illuminate an appropriate LED (light-emitting diode), and properly log the event in a system event log for later examination and diagnosis by support resources.
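By way of illustration only, the polling, threshold, event, and response sequence described above may be sketched as follows. The sketch is not taken from the IPMI specification and does not use the IPMI command set. The names Sensor, Event, classify, and poll_once, the threshold values, and the response actions are hypothetical and were chosen solely to mirror the processor temperature example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Sensor:
    name: str
    read: Callable[[], float]            # returns the current sensor output
    upper_thresholds: Dict[str, float]   # severity -> upper limit (illustrative values)


@dataclass
class Event:
    sensor: str
    severity: str
    reading: float


def classify(sensor: Sensor, reading: float) -> Optional[str]:
    """Return the severity of the highest threshold the reading has reached, or None."""
    crossed = [(limit, severity)
               for severity, limit in sensor.upper_thresholds.items()
               if reading >= limit]
    return max(crossed)[1] if crossed else None


def poll_once(sensors: List[Sensor],
              responses: Dict[Tuple[str, str], List[Callable[[Event], None]]],
              event_log: List[Event]) -> None:
    """Poll each sensor once, invoke the responses for any event, and log the event."""
    for sensor in sensors:
        reading = sensor.read()
        severity = classify(sensor, reading)
        if severity is None:
            continue  # normal condition, no event
        event = Event(sensor.name, severity, reading)
        for action in responses.get((sensor.name, severity), []):
            action(event)
        event_log.append(event)  # kept for later examination and diagnosis


# Hypothetical processor temperature sensor reporting 82.0 degrees.
actions: List[str] = []
cpu_temp = Sensor("cpu_temp", read=lambda: 82.0,
                  upper_thresholds={"warning": 80.0, "critical": 95.0})
responses = {
    ("cpu_temp", "warning"): [
        lambda e: actions.append("increase fan speed"),
        lambda e: actions.append("illuminate LED"),
    ],
}
event_log: List[Event] = []
poll_once([cpu_temp], responses, event_log)
```

In this sketch the sensor output is supplied by a callable, so the reading that triggers an event can be varied without altering the rest of the sequence.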
While the IPMI specification defines the features of the monitoring and responsive infrastructure, the IPMI specification fails to address testing and verification of the monitoring and responsive infrastructure. As a result, testing and verification are performed manually, a task that is time-consuming, labor-intensive, and sometimes difficult to perform. Manual testing requires the use of hardware (e.g., temperature guns to increase or decrease temperature at specific locations monitored by sensors, special power supplies to vary the voltages supplied to the voltage rails or the processor, special push buttons to initiate certain actions, etc.) to physically change the physical environment of the system. As described above, various thresholds can be set for each sensor. However, some of these thresholds are triggered only by abnormal conditions that may harm, damage, or destroy the system. In sum, the current manner of testing and verifying the monitoring and responsive infrastructure is costly, time-consuming, inefficient, and could jeopardize the health of the system.