The electronic and electromagnetic components of a computer system require a stable environment to ensure proper operation. The components within a computer system generate a great deal of heat during use. Absent proper environmental control, including active heat dissipation, components can and will overheat, causing erratic behavior, malfunctions, or total component failure.
The computer system market demands that state-of-the-art systems have extremely high reliability and availability. Thus, systems are typically designed with one or more cooling components. The cooling components can consist of passive heat sinks and/or fans or blowers designed to move air over the components. Simple active cooling can be accomplished by placing a single fan at an opening to an apparatus enclosure and blowing air into or out of the enclosure on a continuous basis. Naturally, the failure of such a fan will result in overheating, leading to component failure.
More sophisticated cooling systems feature various sensors for detecting environmental and power supply problems and providing appropriate error messages to inform users of problems upon occurrence. In addition, more sophisticated systems will include redundant components, for example redundant fans or power supplies, so that the failure of a single component does not necessarily result in unacceptable environmental conditions within an enclosure.
For example, Walker, U.S. Pat. No. 6,418,539, CONTINUOUSLY AVAILABLE COMPUTER MEMORY SYSTEMS, teaches a memory storage system having a logical controller subsystem interfaced with a power supply subsystem and a fan subsystem. In addition, each of the subsystems is duplicated. Thus, if the primary fan subsystem fails, a secondary fan subsystem takes over cooling operations. Similarly, if the primary controller fails, the duplicate backup controller takes over operations previously performed by the primary controller.
Simple redundancy systems typically do not have the autonomic capability to compensate for the elective withdrawal of a selected component from the overall system. For example, a simple redundancy system may not provide for backup control of a cooling system when a primary controller is electively removed for routine maintenance, replacement, or another purpose. Also, a simple redundancy system may not detect the primary controller and return control to it when it is returned to the system after the elective withdrawal.
In addition, simple redundancy systems such as that described in Walker require that the backup subsystems be functional upon failure of the primary subsystem. This reliance can be problematic because dormant backup subsystems are typically not in use throughout the period of time prior to failure of the primary subsystem. Thus, there is no guarantee to the system operator that the backup will perform as required at the time it is called upon.
In addition, in some system configurations it is not possible to implement designs where the failure of a component can be completely compensated for by simply providing a redundant component that has its own independent control functions. Certain components are required to share common control functions. In such a setting, it is required that neither a failure in one control function nor the removal of a second control function negatively impact the shared functionality.
The present invention is directed to overcoming one or more of the problems discussed above.