1. Technical Field
The present invention relates in general to improved hardware redundancy and in particular to a method and system for improving the reliability of hardware or data processing systems that employ redundant units. Still more particularly, the present invention relates to a method and system for improving the reliability of such systems by managing the redundant units.
2. Description of the Related Art
As the demand for reliability of electronic equipment and other hardware increases, the use of hardware redundancy has become more common. Hardware redundancy may take the form of (1) complete redundancy, which doubles the hardware, or (2) N+1 redundancy, where several units share the load and one unit beyond what is required to service the maximum load is provided. Thus, if one unit fails, the remaining N units can normally handle the load or demand until a repair or replacement of the failed unit can be made. These units may be identical in nature or may have differing capabilities and features.
One drawback of redundant systems is an increase in system failure rate due to the additional hardware present within the system: N+1 units are present instead of N units. As a result, more units are available to fail, and more repairs are often required to maintain a redundant hardware system. Another drawback in systems which attempt to guarantee redundancy is that N must be large enough to handle the worst-case loading or usage within the system. Consequently, an excess of hardware may be supplied for normal-case loading or usage, especially in a system with selectable features that is not normally configured to utilize its maximum capability.
For example, in a power regulator system implemented in a data processing system, parallel power supplies may share an output current to support an N+1 environment in the data processing system. In designing the power regulator configuration, the maximum load that may be required by the data processing system is utilized to determine the number of power regulators required for an N+1 environment. Depending on the various features and devices that may be selected for utilization in conjunction with the data processing system, the actual power required by the data processing system may vary, resulting in an inefficient use of the power regulators.
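The sizing described above can be illustrated with a minimal sketch. It assumes identical regulators that share the load evenly; the function name and the current figures (25 A per regulator, 90 A maximum load, 40 A actual load) are hypothetical values chosen only for illustration and do not come from any particular system.

```python
import math


def regulators_for_n_plus_1(load_amps: float, per_unit_amps: float) -> int:
    """Return the unit count for N+1 redundancy at the given load.

    N is the smallest number of identical units able to carry the load;
    one additional unit is the redundant spare. (Hypothetical helper.)
    """
    if load_amps < 0 or per_unit_amps <= 0:
        raise ValueError("load must be >= 0 and unit capacity must be > 0")
    n = math.ceil(load_amps / per_unit_amps)
    return n + 1


# Assumed example figures, not taken from a real design.
PER_UNIT_AMPS = 25.0
MAX_LOAD_AMPS = 90.0      # worst-case load used at design time
ACTUAL_LOAD_AMPS = 40.0   # load drawn by the features actually selected

installed = regulators_for_n_plus_1(MAX_LOAD_AMPS, PER_UNIT_AMPS)
needed = regulators_for_n_plus_1(ACTUAL_LOAD_AMPS, PER_UNIT_AMPS)
print(f"installed for worst case: {installed}, needed for actual load: {needed}")
```

Under these assumed figures, the worst-case design installs more regulators than the actual configuration requires for N+1 operation, which is the inefficiency noted above.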
In an N+1 design, a failure of a power regulator may result in an N environment, requiring replacement of the failed power regulator. In many cases, however, the power regulators still functioning are sufficient to support an N+1 environment for the currently selected features. Known redundant hardware designs, however, do not take such a situation into account.
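The situation described above can be made concrete with a small sketch. It is an illustration only, under the assumption of identical regulators and a known present load; the function names and numbers are hypothetical and are not drawn from any existing design.

```python
import math


def redundancy_margin(working_units: int, load_amps: float, per_unit_amps: float) -> int:
    """Return the number of working units beyond the N needed for the load.

    A margin of 1 or more means an N+1 environment still exists for this load.
    (Hypothetical helper; assumes identical units.)
    """
    if working_units < 0 or load_amps < 0 or per_unit_amps <= 0:
        raise ValueError("invalid unit count, load, or unit capacity")
    n = math.ceil(load_amps / per_unit_amps)
    return working_units - n


def needs_replacement(working_units: int, load_amps: float, per_unit_amps: float) -> bool:
    """True when the working units no longer provide N+1 for the given load."""
    return redundancy_margin(working_units, load_amps, per_unit_amps) < 1


# Assumed example: five 25 A regulators installed, one has failed.
working = 5 - 1
print(needs_replacement(working, load_amps=90.0, per_unit_amps=25.0))  # worst-case load
print(needs_replacement(working, load_amps=40.0, per_unit_amps=25.0))  # currently selected features
```

With these assumed values, the four surviving regulators leave no spare at the worst-case load but still leave two spares at the load of the currently selected features, so a check based on actual load would not call for an immediate replacement.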
Therefore, it would be desirable to have a method and system for managing hardware redundancy within a hardware or data processing system to accurately determine when redundant units should be replaced or added.