The present invention relates generally to data consistency in data storage systems, and more specifically, to a method for grouping logical units to provide proper ordering of I/O operations and failure consistency in a controller-based remote data replication system.
It is desirable to provide the ability for rapid recovery of user data from a disaster or significant error event at a data processing facility. This type of capability is often termed ‘disaster tolerance’. In a data storage environment, disaster tolerance requirements include providing for replicated data and redundant storage to support recovery after the event. Unlike local environments, where a single host system is associated with the storage, disaster tolerant environments often use a completely replicated system for error recovery. As such, data stored on all of the volumes must be consistent at a point in time in order for an application to start, identify where the application left off, and continue after an error event. In order to provide a safe physical distance between the original data and the data to be backed up, the data must be migrated from one storage subsystem or physical site to another subsystem or site. It is also desirable for user applications to continue to run while data replication proceeds in the background. Data warehousing, ‘continuous computing’, and Enterprise applications all require remote copy capabilities.
Storage controllers are commonly utilized in computer systems to off-load from the host computer certain lower level processing functions relating to I/O operations, and to serve as an interface between the host computer and the physical storage media. Given the critical role played by the storage controller with respect to computer system I/O performance, it is desirable to minimize the potential for interrupted I/O service due to storage controller malfunction. Thus, prior workers in the art have developed various system design approaches in an attempt to achieve some degree of fault tolerance in the storage control function. One such prior approach requires that all system functions be “mirrored”. While this type of approach is most effective in reducing interruption of I/O operations and lends itself to value-added fault isolation techniques, it has previously been costly to implement and heretofore has placed a heavy processing burden on the host computer.
One prior method of providing storage system fault tolerance accomplishes failover through the use of two controllers coupled in an active/passive configuration. During failover, the passive controller takes over for the active (failing) controller. A drawback to this type of dual configuration is that it cannot support load balancing to increase overall system performance, as only one controller is active, and thus utilized, at any given time. Furthermore, the passive controller represents an inefficient use of system resources.
Another approach to storage controller fault tolerance is based on a process called ‘failover’. Failover is known in the art as a process by which a first storage controller, coupled to a second controller, assumes the responsibilities of the second controller when the second controller fails. ‘Failback’ is the reverse operation, wherein the second controller, having been either repaired or replaced, recovers control over its originally-attached storage devices. Since each controller is capable of accessing the storage devices attached to the other controller as a result of the failover, there is no need to store and maintain a duplicate copy of the data, i.e., one set stored on the first controller's attached devices and a second (redundant) copy on the second controller's devices.
U.S. Pat. No. 5,274,645 (Dec. 28, 1993), to Idleman et al. discloses a dual-active configuration of storage controllers capable of performing failover without the direct involvement of the host. However, the direction taken by Idleman requires a multi-level storage controller implementation. Each controller in the dual-redundant pair includes a two-level hierarchy of controllers. When the first level or host-interface controller of the first controller detects the failure of the second level or device interface controller of the second controller, it re-configures the data path such that the data is directed to the functioning second level controller of the second controller. In conjunction, a switching circuit re-configures the controller-device interconnections, thereby permitting the host to access the storage devices originally connected to the failed second level controller through the operating second level controller of the second controller. Thus, the presence of the first level controllers serves to isolate the host computer from the failover operation, but this isolation is obtained at added controller cost and complexity.
Other known failover techniques are based on proprietary buses. These techniques utilize existing host interconnect “hand-shaking” protocols, whereby the host and controller act in cooperative effort to effect a failover operation. Unfortunately, the “hooks” for this and other types of host-assisted failover mechanisms are not compatible with more recently developed, industry-standard interconnection protocols, such as SCSI, which were not developed with failover capability in mind. Consequently, support for dual-active failover in these proprietary bus techniques must be built into the host firmware via the host device drivers. Because SCSI, for example, is a popular industry standard interconnect, and there is a commercial need to support platforms not using proprietary buses, compatibility with industry standards such as SCSI is essential. Therefore, a vendor-unique device driver in the host is not a desirable option.
U.S. patent application Ser. No. 08/071,710 to Sicola et al. describes a dual-active, redundant storage controller configuration in which each storage controller communicates directly with the host and its own attached devices, the access of which is shared with the other controller. Thus, a failover operation may be executed by one of the storage controllers without the assistance of an intermediary controller and without the physical reconfiguration of the data path at the device interface.
However, none of the above references disclose a system having a remote backup site connected to a host site via a dual fabric link, where the system provides a mechanism for grouping logical units for logging and failover purposes. Furthermore, the prior technology does not provide for proper ordering of I/O operations during logging across multiple volumes.
Therefore, there is a clearly felt need in the art for a disaster tolerant data storage system capable of associating a group of logical units so that they share a set of properties which provides in-order operations during transaction logging and merge-back as well as failure consistency across the associated units.
Accordingly, the above problems are solved, and an advance in the field is accomplished by the system of the present invention which provides a completely redundant configuration including dual Fibre Channel fabric links interconnecting each of the components of two data storage sites, wherein each site comprises a host computer and associated data storage array, with redundant array controllers and adapters. The present system is unique in that each array controller is capable of performing all of the data replication functions, and each host ‘sees’ remote data as if it were local. The array controllers also perform a command and data logging function which stores all host write commands and data ‘missed’ by the backup storage array during a situation wherein the links between the sites are down, the remote site is down, or where a site failover to the remote site has occurred.
The present system includes an additional novel aspect of grouping logical units into ‘association sets’ for logging and failover purposes. The concept of association sets allows the present system to provide for proper ordering of I/O operations during logging across multiple volumes. In addition, association sets are employed by the present invention to provide failure consistency by causing the group of logical units/volumes to all fail at the same time, ensuring point-in-time consistency on the remote site.
The ‘mirroring’ of data for backup purposes is the basis for RAID (‘Redundant Array of Independent [or Inexpensive] Disks’) Level 1 systems, wherein all data is replicated on N separate disks, with N usually having a value of 2. Although the concept of storing copies of data at a long distance from each other (i.e., long distance mirroring) is known, the use of a switched, dual-fabric, Fibre Channel configuration as described herein is a novel approach to disaster tolerant storage systems. Mirroring requires that the data be consistent across all volumes. In prior art systems which use host-based mirroring (where each host computer sees multiple units), the host maintains consistency across the units. For those systems which employ controller-based mirroring (where the host computer sees only a single unit), the host is not signaled completion of a command until the controller has updated all pertinent volumes. The present invention is, in one aspect, distinguished over the previous two types of systems in that the host computer may associate multiple volumes, but the data replication function is performed by the controller. Therefore, a mechanism is required to communicate the host-required association between volumes to the controller. To maintain this consistency between volumes, the system of the present invention provides a mechanism of associating a set of volumes to synchronize the logging to the set of volumes, so that the log is consistent when it is “played back” to the remote site.
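The synchronized logging described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that an association set keeps one shared log and stamps every write to any member volume with a single set-wide sequence number; the class names, field names, and volume names are hypothetical and are not taken from the described system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LogEntry:
    seq: int      # position in the shared, set-wide write order
    volume: str   # hypothetical member volume name
    block: int    # block address within that volume
    data: bytes


@dataclass
class AssociationSetLog:
    """Hypothetical shared write log for one association set."""
    members: Tuple[str, ...]
    entries: List[LogEntry] = field(default_factory=list)
    next_seq: int = 0

    def log_write(self, volume: str, block: int, data: bytes) -> int:
        """Record a write missed by the remote site; return its sequence number."""
        if volume not in self.members:
            raise ValueError(f"{volume} is not a member of this association set")
        seq = self.next_seq
        self.entries.append(LogEntry(seq, volume, block, data))
        self.next_seq += 1
        return seq

    def play_back(self, remote: Dict[str, Dict[int, bytes]]) -> None:
        """Apply logged writes to the remote copy in their original order."""
        for entry in sorted(self.entries, key=lambda e: e.seq):
            remote.setdefault(entry.volume, {})[entry.block] = entry.data
        self.entries.clear()


# Writes to two associated volumes are interleaved while the link is down.
log = AssociationSetLog(members=("db_data", "db_journal"))
log.log_write("db_journal", 0, b"begin txn 1")
log.log_write("db_data", 7, b"row v2")
log.log_write("db_journal", 1, b"commit txn 1")

remote: Dict[str, Dict[int, bytes]] = {}
log.play_back(remote)
```

Because the sequence number is shared by the whole set rather than kept per volume, play-back cannot apply the commit record on one volume before the data write on another that preceded it.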
Each array controller in the present system has a dedicated link via a fabric to a partner on the remote side of the long-distance link between fabric elements. Each dedicated link does not appear to any host as an available link for data access; however, it is visible to the partner array controllers involved in data replication operations. These links are managed by each partner array controller as if being ‘clustered’ with a reliable data link between them.
The fabrics comprise two components, a local element and a remote element. An important aspect of the present invention is the fact that the fabrics are ‘extended’ by standard e-ports (extension ports). The use of e-ports allows standard Fibre Channel cable to be run between the fabric elements, or a conversion box to be used to convert the data to a form such as telco ATM or IP. The extended fabric allows the entire system to be viewable by both the hosts and storage.
The dual fabrics, as well as the dual array controllers, dual adapters in hosts, and dual links between fabrics, provide high availability and present no single point of failure. A distinction here over the prior art is that previous systems typically use other kinds of links to provide the data replication, resulting in the storage not being readily exposed to hosts on both sides of a link. The present configuration allows for extended clustering where local and remote site hosts are actually sharing data across the link from one or more storage subsystems with dual array controllers within each subsystem.
The present system is further distinguished over the prior art by other additional features, including independent discovery of initiator to target system and automatic rediscovery after link failure. In addition, device failures, such as controller and link failures, are detected by ‘heartbeat’ monitoring by each array controller. Furthermore, no special host software is required to implement the above features because all replication functionality is totally self-contained within each array controller and automatically done without user intervention.
An additional aspect of the present system is the ability to function over two links simultaneously with data replication traffic. If failure of a link occurs, as detected by the ‘initiator’ array controller, that array controller will automatically ‘failover’, or move the base of data replication operations to its partner controller. At this time, all transfers in flight are discarded, and are therefore also discarded as far as the host is concerned. The host simply sees a controller failover at the host OS (operating system) level, causing the OS to retry the operations to the partner controller. The array controller partner continues all ‘initiator’ operations from that point forward. The array controller whose link failed will continuously watch the status of its link to the same controller on the other ‘far’ side of the link. That status changes to a ‘good’ link when the array controllers have established reliable communications between each other. When this occurs, the array controller ‘initiator’ partner will ‘failback’ the link, moving operations back to the newly reliable link. This procedure re-establishes load balance for data replication operations automatically, without requiring additional features in the array controller or host beyond what is minimally required to allow controller failover.
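The link failover and failback sequence above can be sketched as a small state model. This is an illustrative simplification under stated assumptions: it tracks only one controller's share of the replication traffic, the controller labels 'A' and 'B' are hypothetical, and the in-flight transfer list is a stand-in for whatever the controller actually tracks.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReplicationPair:
    """Hypothetical model of two partner array controllers, 'A' and 'B'."""
    home: str = "A"                 # controller that normally initiates this traffic
    partner: str = "B"              # its partner in the same subsystem
    initiator: str = "A"            # controller currently acting as initiator
    in_flight: List[str] = field(default_factory=list)  # assumed transfer IDs

    def on_link_status(self, good: bool) -> str:
        """React to a status change on the home controller's link."""
        if not good and self.initiator == self.home:
            # Failover: in-flight transfers are discarded; the host OS
            # is expected to retry them against the partner controller.
            self.in_flight.clear()
            self.initiator = self.partner
            return "failover"
        if good and self.initiator == self.partner:
            # Failback: move operations back to the newly reliable link.
            self.initiator = self.home
            return "failback"
        return "no change"


pair = ReplicationPair(in_flight=["xfer-1", "xfer-2"])
events = [
    pair.on_link_status(False),  # link fails
    pair.on_link_status(False),  # link still down
    pair.on_link_status(True),   # reliable communication re-established
]
```

The sketch needs no state beyond what controller failover already requires, which mirrors the point made above that load balance is restored without additional features in the controller or host.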
Because the present system provides for grouping logical units into ‘association sets’, the system provides for proper ordering of I/O operations during logging across multiple volumes. A further benefit of association sets is providing failure consistency across the logical unit group to ensure that all the volumes fail if one member fails, so that the remote site will have a consistent view of the data up to the point of failure.
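The failure-consistency property can be illustrated with a brief sketch, assuming a hypothetical per-set state table in which a failure reported against any one member marks every member as failed. The class, method, and volume names are illustrative only.

```python
from typing import Dict, Iterable


class AssociationSet:
    """Hypothetical group of logical units that fail together."""

    def __init__(self, members: Iterable[str]) -> None:
        self.state: Dict[str, str] = {name: "normal" for name in members}

    def report_failure(self, volume: str) -> None:
        """A failure of one member fails every member of the set."""
        if volume not in self.state:
            raise KeyError(volume)
        for name in self.state:
            self.state[name] = "failed"

    def can_replicate(self, volume: str) -> bool:
        """Replication to the remote site proceeds only while the set is intact."""
        return self.state[volume] == "normal"


volumes = AssociationSet(["lun0", "lun1", "lun2"])
before = [volumes.can_replicate(v) for v in ("lun0", "lun1", "lun2")]
volumes.report_failure("lun1")
after = [volumes.can_replicate(v) for v in ("lun0", "lun1", "lun2")]
```

Stopping replication for all members at once means that no volume at the remote site advances past the point at which another member stopped, which is the consistent view described above.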