Data replication involves a variety of techniques and mechanisms that operate to copy or replicate data between locations in a distributed computing system. By creating multiple copies of data, the data remains available in the event of a disaster at one of the locations. This is typically referred to as “fault tolerance” and is very important to databases. Moreover, in systems where each copy of the data can support data access operations such as read, write, or both, data can be accessed more quickly and by more users at the same time, thereby improving performance. The improved performance is useful in a variety of information technology applications such as file serving, application serving, and the like.
Data replication management generally involves systems and methods for creating storage devices to contain data, organizing the storage devices into replication groups, and determining when and how data will be copied between the devices. This includes replication protocols, mechanisms for ensuring timely synchronization, failover, data access request redirection, and the like. In many systems, data replication management may be performed by a storage controller that offloads the operations related to processing data access operations and data replication operations from host processors that use the data.
From the perspective of a host computer using stored data, it is desirable that the replication mechanism be as invisible as possible. To this end, storage controllers present a single logical unit (LUN) of storage even though the storage is physically implemented in more than one location. The host will conduct operational data transfers by addressing a read or write operation to the desired LUN, and the controller implements processes that execute the read or write operation appropriately. A data replication management (DRM) system typically designates one controller as “active,” meaning that it handles the read/write request in the first instance, and a second controller as “passive” in that it acts as a backup to the active controller, but otherwise does not participate in operational data transactions. This “active-passive” architecture simplifies implementation and implies an order for executing every operation so that data at the active and passive locations remain consistent. Upon failure of an active controller, the passive controller is placed in the active role, and handles subsequent access requests in the first instance until a second controller can be brought online and data replication completed.
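The active-passive arrangement described above can be illustrated with a minimal sketch. It is a toy model under stated assumptions, not the implementation of any particular product: the names `Controller`, `ActivePassivePair`, and `failover` are hypothetical, and storage is modeled as an in-memory dictionary.

```python
class Controller:
    """Hypothetical storage controller holding one replica of a LUN."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}      # block number -> data (stand-in for real storage)
        self.online = True


class ActivePassivePair:
    """Presents a single logical unit backed by an active and a passive controller."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def failover(self):
        # Promote the passive controller; no backup exists until a new one is added.
        if self.passive is None or not self.passive.online:
            raise RuntimeError("no controller available to take the active role")
        self.active, self.passive = self.passive, None

    def write(self, block, data):
        if not self.active.online:
            self.failover()
        # The active controller executes the write first, then the copy is made,
        # which fixes the order of operations at both locations.
        self.active.blocks[block] = data
        if self.passive is not None and self.passive.online:
            self.passive.blocks[block] = data

    def read(self, block):
        if not self.active.online:
            self.failover()
        return self.active.blocks.get(block)


pair = ActivePassivePair(Controller("A"), Controller("B"))
pair.write(0, b"payload")
pair.active.online = False          # simulate failure of the active controller
recovered = pair.read(0)            # served by the former passive controller
```

In this sketch the host only ever addresses `pair`, so the role change is invisible to it, which is the property the paragraph above describes.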
Compaq Corporation introduced a data replication management product called the HSG80, described in U.S. patent application Ser. No. 09/539,745 and U.S. patent application Ser. No. 09/538,680 assigned to the assignee of the present application and incorporated herein by reference, that implemented an architecture with redundant storage controllers. While effective, each of the controllers could only interact with a single other controller. With respect to a given data set, each controller was in either a primary role or a secondary role, and switching from a secondary role to a primary role was a non-trivial event that took place at failover. Controllers were set up as primary or secondary when initially configured, and changing that configuration at failover involved several manual tasks at the controller and at the host level. This switchover typically required rebooting the host, and sometimes rebooting the secondary controller to change its role, a disruptive process. Because of this rigid role assignment, a primary controller could not operate with multiple secondary controllers, and a secondary controller could not, in turn, act as a primary controller with respect to other controllers.
The rigid role assignment made it difficult to have two controllers that were active with respect to a given copy set. While the Ser. No. 09/538,680 application describes a configuration that is nominally active-active, only one controller was active with respect to a given host for a copy set at any instant in time, hence only one controller would process that host's write requests. This is useful in that a given storage controller could be active for a first host and another storage controller active for a second host, thereby efficiently using the storage controllers' resources.
However, in this system, a given host could not see more than one active controller for a given data set. Each data set included one or more LUNs, some of which were primary LUNs and others of which were secondary LUNs from the perspective of each controller. Each LUN had a unique identification called a world wide LUN identifier (WWLID) and controllers were configured such that one WWLID would identify the initiator (primary) LUN, and another WWLID would identify the target (secondary) LUN. The controller only presented the WWLID of the initiator LUN to the host. Hence, a given host was unaware, until failover, that the target LUN existed. At failover, the controllers would be altered such that the source and destination LUN WWLIDs were the same (i.e., taking on the value of the non-failing LUN).
While this architecture allowed both controllers to handle operational data access requests from hosts, it retained a paradigm in which, for a given data transaction from a host, a single specified LUN was in a rigid role of an initiator and another specific LUN was in a rigid role of the target. A host could not see all of the LUNs involved in a particular copy set, only the single LUN designated as an initiator for that host. A host had to direct a request to the initiator LUN until a failure condition occurred. In practice, the architecture did not allow scaling to copy sets at more than two locations. Extending a bi-directional system to perform multi-directional replication increases complexity significantly. Hence, the protocols for data replication operations are not directly applicable to a system where more than one replica exists in a copy set.
A particular operation that has been difficult to manage in conventional systems involves reservations, such as SCSI reservations, that manage exclusive access to a LUN or a portion of a LUN. Reservations are used to enable multiple hosts to share access to a LUN while maintaining integrity of the data in the event that two hosts attempt to write to the same block of data substantially simultaneously. SCSI provides two methods for managing reservations. A conventional reservation is managed by an initiator device that places a reservation or lock on other LUNs, then releases that reservation when it is no longer needed. A persistent reservation effects similar data protection, but is intended to survive failure of the initiator device. Hence, in a persistent reservation the reservation must be maintained in a persistent database that can be accessed in the event of a device failure.
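The difference between a conventional and a persistent reservation can be sketched as follows. This is a simplified model for illustration only and does not reproduce actual SCSI command semantics: the `Lun` class, its methods, and the dictionary used as a persistent reservation database are all hypothetical.

```python
class ReservationConflict(Exception):
    """Raised when a host is blocked by another host's reservation."""


class Lun:
    def __init__(self, persistent_store=None):
        self.blocks = {}
        self.volatile_holder = None   # conventional reservation, lost on failure
        # Assumed stand-in for a persistent reservation database.
        self.persistent_store = persistent_store if persistent_store is not None else {}

    def holder(self):
        # A persistent reservation, if present, takes precedence.
        return self.persistent_store.get("holder") or self.volatile_holder

    def reserve(self, host, persistent=False):
        current = self.holder()
        if current not in (None, host):
            raise ReservationConflict(f"LUN is reserved by {current}")
        if persistent:
            self.persistent_store["holder"] = host
        else:
            self.volatile_holder = host

    def release(self, host):
        if self.volatile_holder == host:
            self.volatile_holder = None
        if self.persistent_store.get("holder") == host:
            del self.persistent_store["holder"]

    def write(self, host, block, data):
        current = self.holder()
        if current not in (None, host):
            raise ReservationConflict(f"LUN is reserved by {current}")
        self.blocks[block] = data

    def simulate_failure(self):
        # Volatile state is lost; the persistent store is left untouched.
        self.volatile_holder = None
```

After `simulate_failure()`, a conventional reservation is gone and any host may write, whereas a reservation made with `persistent=True` is still reported by `holder()`.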
The SCSI reservation mechanism was designed for storage systems with multiple hosts accessing a single shared storage resource; hence, persistent reservations could be implemented by appropriate communication between hosts that shared the LUN, or by a data structure storing a persistent reservation database that was independent of a controller. However, in a data replication system a significantly different environment exists, namely, there are multiple LUNs and multiple hosts. Conventional systems would allow only one LUN to be active in a copy set at any time, therefore solving the reservation issue by ensuring that reservations would be handled by a particular controller until failover. However, this solution does not extend to an environment where any LUN in a copy set may be active, and therefore a reservation received by any LUN must be propagated to all replicas to ensure exclusive access performance expected by the hosts. It is desirable to relieve the hosts of responsibility for ensuring that reservations are performed against all replicated LUNs. Moreover, it is desirable that the storage system handle persistent reservations between the various replicas such that if one controller fails or becomes unavailable, the reservation is properly implemented.
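The requirement that a reservation received by any LUN reach every replica can be pictured with a short sketch. The text above states only the requirement, so the all-or-nothing approach shown here is one possible illustration and an assumption, not a mechanism taken from the source. `Replica` and `reserve_copy_set` are hypothetical names, and skipping unavailable replicas is likewise an assumed policy.

```python
class Replica:
    """Hypothetical member LUN of a copy set."""

    def __init__(self, name):
        self.name = name
        self.holder = None
        self.online = True

    def try_reserve(self, host):
        if self.holder in (None, host):
            self.holder = host
            return True
        return False

    def release(self, host):
        if self.holder == host:
            self.holder = None


def reserve_copy_set(replicas, host):
    """Reserve every reachable replica for `host`, or none of them."""
    newly_granted = []
    for replica in replicas:
        if not replica.online:
            continue  # assumption: unavailable replicas are skipped, not fatal
        already_held = replica.holder == host
        if replica.try_reserve(host):
            if not already_held:
                newly_granted.append(replica)
        else:
            # Conflict: undo only the reservations made by this call.
            for granted in newly_granted:
                granted.release(host)
            return False
    return True


copy_set = [Replica("site1"), Replica("site2"), Replica("site3")]
first = reserve_copy_set(copy_set, "hostA")    # succeeds on all replicas
second = reserve_copy_set(copy_set, "hostB")   # refused, hostA keeps the reservation
```

Because the storage side performs the propagation, the host issues a single reservation and does not need to know how many replicas exist.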
The term ‘site failover’ is used in the lexicon of disaster tolerant storage systems to describe operations executed by the storage network that permit the network to remain operational to a user in the event of a failure or unplanned downtime of a primary storage site. Existing storage network systems require manual intervention to implement a site failover, which may be unacceptable for users that require little or no downtime.
Therefore, there remains a need in the art for a data storage system capable of providing flexible data replication services without the direct involvement of the host computer. Moreover, a data storage system is needed that is readily extensible to provide multiple replication, load balancing, and failover to support disaster tolerance without limitations imposed by designating rigid roles for the system components.