1. Field of the Invention
This invention relates to data storage and, more particularly, to techniques for failure recovery in storage systems.
2. Description of the Related Art
Many business organizations and governmental entities rely upon mission-critical applications that access large amounts of data, often exceeding a terabyte. Often such data is stored on many different storage devices, which may be centrally located or distributed throughout an enterprise. Such storage devices may be heterogeneous in nature, including many different types of devices from many different manufacturers.
In some systems, storage devices may be arranged in a storage area network (SAN) that includes a SAN switch configured to interconnect storage devices with host systems that may be configured to execute applications dependent on the storage devices. SAN switches may generally provide flexibility in the design of a storage system, as they may be configured to provide many hosts with access to many different storage devices without requiring direct coupling of each of the hosts and storage devices. However, in the event of a failure of a SAN switch port, data inconsistency may occur, which may result in incorrect application execution.
For example, to provide a degree of security against loss of critical data, such data may be stored on several storage devices connected to a SAN switch, where one storage device is configured to mirror another. One of the mirrored devices may thus provide a backup source of data in case another one of the mirrored devices fails. However, if a failure of a SAN switch port occurs during system operation, mirrored storage devices may not reflect the same data (i.e., may become inconsistent), for example if data is written to one mirrored device but not another as a consequence of the failure. In some cases, consistency may need to be restored to inconsistent storage devices following a failure in order for applications to continue operating properly. However, such consistency recovery may be a time-consuming task, for example if recovery is performed with respect to entire devices. Often, applications must be prevented from using inconsistent devices until consistency has been restored, which may result in lengthy application downtime or unacceptable performance degradation.
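By way of illustration only, the inconsistency described above may be sketched as a small simulation. The names used here (MirrorDevice, PortFailure, mirrored_write, inconsistent_blocks) are hypothetical and do not correspond to any particular SAN switch or storage product. The sketch assumes that a mirrored write is issued to each device in turn and that a port failure prevents the write from reaching the second device.

```python
class PortFailure(Exception):
    """Raised when the simulated switch port to a device is down."""


class MirrorDevice:
    """Hypothetical block device reached through one switch port."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.port_up = True

    def write(self, block, data):
        if not self.port_up:
            raise PortFailure(f"port down, block {block} not written")
        self.blocks[block] = data


def mirrored_write(devices, block, data):
    """Write the same data to every mirror, one device after another."""
    for dev in devices:
        dev.write(block, data)


def inconsistent_blocks(a, b):
    """Compare two entire devices block by block (cost grows with device size)."""
    return [i for i, (x, y) in enumerate(zip(a.blocks, b.blocks)) if x != y]


primary = MirrorDevice(8)
secondary = MirrorDevice(8)

# Both mirrors receive the first write and remain consistent.
mirrored_write([primary, secondary], 0, b"old")

# The port serving the second mirror fails before the next write completes.
secondary.port_up = False
try:
    mirrored_write([primary, secondary], 0, b"new")
except PortFailure:
    pass  # the primary already holds the new data; the secondary does not

diverged = inconsistent_blocks(primary, secondary)
```

In this sketch, locating the divergent block requires comparing every block of both devices, which reflects why recovery performed with respect to entire devices may be time-consuming.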