High-availability computing systems use redundancy to maintain continuous operation even when a component fails. When a component failure is detected, a system administrator can be notified and the component replaced. However, until the failed component is replaced, the computer system is subject to an increased risk of failure due to the reduction in redundancy.
If the failed component is a data-handling component, e.g., a data processor, a data storage device, or a communication device, a workload that had been using that component can be reconfigured or migrated to use alternative hardware. If the failure is of a non-data component, e.g., a power supply or a cooling component such as a fan, 1) the system can continue to run at increased risk of interruption due to a second failure, 2) the system can be shut down immediately, 3) components can be monitored for problems, and the system can be shut down if problems (e.g., excessive heating) are detected, or 4) performance can be throttled to within the capability of the remaining non-data components.
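By way of illustration only, the four responses to a non-data component failure enumerated above can be sketched as a simple policy selection. The names `Policy` and `respond_to_failure`, the 85 degree Celsius threshold, and the assumption that sustainable performance scales linearly with the fraction of working fans are all hypothetical and are not taken from any particular system.

```python
from enum import Enum, auto


class Policy(Enum):
    """Hypothetical labels for the four responses described in the text."""
    CONTINUE_AT_RISK = auto()        # 1) keep running, accept the added risk
    SHUT_DOWN_NOW = auto()           # 2) shut down immediately
    MONITOR_THEN_SHUT_DOWN = auto()  # 3) shut down only if a problem appears
    THROTTLE = auto()                # 4) reduce performance to what remains


def respond_to_failure(policy, temperature_c, working_fans, total_fans,
                       max_temperature_c=85.0):
    """Return (action, performance_fraction) after a cooling-fan failure.

    The threshold and the linear scaling below are illustrative assumptions.
    """
    if policy is Policy.CONTINUE_AT_RISK:
        return ("run", 1.0)
    if policy is Policy.SHUT_DOWN_NOW:
        return ("shut_down", 0.0)
    if policy is Policy.MONITOR_THEN_SHUT_DOWN:
        # Excessive heating is the example problem given in the text.
        if temperature_c > max_temperature_c:
            return ("shut_down", 0.0)
        return ("run", 1.0)
    if policy is Policy.THROTTLE:
        # Assumed: capability is proportional to the remaining fans.
        return ("run", working_fans / total_fans)
    raise ValueError(f"unknown policy: {policy!r}")


if __name__ == "__main__":
    # One of four fans has failed; the measured temperature is 60 C.
    for policy in Policy:
        print(policy.name, respond_to_failure(policy, 60.0, 3, 4))
```

In this sketch, only the throttling response returns a performance level between full operation and shutdown; the other three either run unrestricted or stop.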
Herein, related art is described to facilitate understanding of the invention. Related art labeled “prior art” is admitted prior art; related art not labeled “prior art” is not admitted prior art.