1. Field
Implementations of the invention relate to selection of status data from synchronous redundant devices.
2. Description of the Related Art
Some computing systems may be connected to redundant devices. Redundant devices refer to multiple (i.e., two or more) units of a same device (e.g., multiple power supplies). With redundant devices, if one of the redundant devices fails, the computing system may be able to rely on one or more other redundant devices that have not failed. For example, a computing system may be connected to two power supplies. If one power supply fails, the computing system continues to function using power from the other power supply.
The term “redundant path” or “redundant view” may be used to describe a snapshot of status data from a device, obtained through a given means (e.g., over a given communication path between each redundant device and the computing system to which the redundant device is connected, or by accessing global status data).
Each redundant device may provide status data to the computing system via status registers. When the redundant devices are functioning correctly, the status data received from the redundant devices should be the same. When a computing system simultaneously receives status data from the status registers of two or more redundant devices, there is the possibility that status data in one of the status registers (i.e., one redundant view) is different from status data in another status register (i.e., another redundant view), which signals that at least one of the redundant views has incorrect status data.
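The comparison of redundant views described above can be sketched as follows. This is a minimal illustration only: it assumes each status register is read as an integer bit field, and the function names `views_agree` and `differing_bits` are hypothetical, not taken from any particular system.

```python
from typing import Sequence


def views_agree(views: Sequence[int]) -> bool:
    """Return True when every redundant view reports the same register value."""
    return len(set(views)) <= 1


def differing_bits(view_a: int, view_b: int) -> int:
    """Return a mask of the status bits on which two views disagree (bitwise XOR)."""
    return view_a ^ view_b


# Hypothetical register snapshots from two redundant power supplies.
view_a = 0b1010
view_b = 0b1000

if not views_agree([view_a, view_b]):
    mask = differing_bits(view_a, view_b)
    print(f"views disagree on bits: {mask:#06b}")
```

A mismatch found this way shows only that at least one view is wrong; it does not by itself indicate which one.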
For certain computing systems, a Longitudinal Redundancy Check (LRC) or Cyclic Redundancy Check (CRC) value may be added by the redundant device to status data before transfer and then checked upon receipt by the computing system. Nevertheless, LRC is an expensive solution for simple or inexpensive devices, involves overhead on both send and receive, and does nothing for cases where bad data is encoded with a good LRC value. Furthermore, because of the move to use more off-the-shelf devices (e.g., power supply devices), it is becoming more desirable to have devices that do not require LRC or CRC encoding capability.
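As a rough illustration of the check described above, the following sketch appends an LRC byte to status data and verifies it on receipt. It assumes the simple XOR-of-all-bytes variant of LRC (other definitions exist), and the names `lrc`, `encode`, and `check` are hypothetical.

```python
from functools import reduce


def lrc(data: bytes) -> int:
    """XOR-based longitudinal redundancy check over all bytes (one common variant)."""
    return reduce(lambda acc, b: acc ^ b, data, 0)


def encode(status: bytes) -> bytes:
    """Device side: append the LRC byte to the status data before transfer."""
    return status + bytes([lrc(status)])


def check(frame: bytes) -> bool:
    """Receiver side: recompute the LRC over the payload and compare."""
    if not frame:
        return False
    payload, received = frame[:-1], frame[-1]
    return lrc(payload) == received


# A bit flipped in transit is detected.
frame = encode(b"\x01\x02")
corrupted = bytes([frame[0] ^ 0x04]) + frame[1:]
assert check(frame) and not check(corrupted)

# Bad data that the device itself encodes still passes the check.
wrong_at_source = encode(b"\xff\x02")
assert check(wrong_at_source)
```

The last assertion shows the limitation noted in the text: the check guards the transfer, not the correctness of the status data that the device encoded.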
Some other computing systems arbitrarily select status data from one of the redundant views when the status data reported from redundant devices is different. Although this technique is used often, even on some enterprise-class systems, it is not sufficiently intelligent for a highly available system. A highly available system is one that remains available when one device fails by relying on redundant devices.
For example, consider two power supply devices reporting battery status for a large disk system where the battery status disagrees across redundant views. In this example, one power supply device reports a battery status showing that battery power is high (e.g., power is available), while the other power supply device reports a battery status showing that battery power is low. In this case, selecting the battery status that shows that battery power is high when this is incorrect (i.e., battery power is actually low) risks leaving volatile data unprotected, while selecting the battery status that shows that battery power is low when this is incorrect (i.e., battery power is actually high) risks shutting down the computing system unnecessarily.
Yet other computing systems implement a third technique in which a single characteristic or a small group of characteristics is used to determine which redundant view to use in selecting status data when redundant devices report different status data. For example, one redundant view may be selected based on which redundant view has more active interrupts (i.e., status changes), or which redundant view shows status data whose values correspond to more severe or critical conditions (e.g., a power supply on fire is more critical than a power supply that is low on battery power).
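The single-characteristic technique might look like the sketch below. The `View` type, its fields, and the integer severity scale are assumptions made purely for illustration and do not describe any specific system.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class View:
    name: str
    active_interrupts: int  # number of pending status changes (hypothetical field)
    severity: int           # hypothetical scale: higher means more critical


def select_by_interrupts(views: Sequence[View]) -> View:
    """Pick the view with the most active interrupts."""
    return max(views, key=lambda v: v.active_interrupts)


def select_by_severity(views: Sequence[View]) -> View:
    """Pick the view reporting the most severe condition."""
    return max(views, key=lambda v: v.severity)


views = [
    View("path_a", active_interrupts=3, severity=1),
    View("path_b", active_interrupts=1, severity=5),
]

# The two criteria can disagree about which view to trust.
assert select_by_interrupts(views).name == "path_a"
assert select_by_severity(views).name == "path_b"
```

Note that `max` returns the first view on a tie, so when the chosen characteristic is equal across views the selection falls back to ordering, which is effectively arbitrary.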
While single-characteristic decisions provide improved accuracy over an arbitrary selection, rarely do just one or two criteria correctly define a “good” or “preferred” redundant path for all cases. Furthermore, such techniques put too much weight on one or two characteristics and no weight on other characteristics. Thus, there is a continued need in the art for improved selection techniques.