Recent years have seen a proliferation of computers and storage subsystems. Early computer systems relied heavily on direct-attached storage (DAS) consisting of one or more disk drives coupled to a system bus. More recently, network-attached storage (NAS) and storage area network (SAN) technologies have been used to provide storage with greater capacity, higher reliability, and higher availability.
Mass data storage systems are implemented in networks or fabrics that provide means for communicating data between the systems that use data and the storage systems that implement the physical storage. In many cases, host computers act as storage servers; they are coupled to the network and configured with several disk drives that cumulatively provide more storage capacity or different storage functions (e.g., data protection) than could be implemented by a DAS system. For example, a server dedicated to data storage can provide various degrees of redundancy and mirroring to improve access performance, availability, and reliability of stored data. A large storage system can be formed by collecting storage subsystems, where a separate server manages each subsystem.
More recently, virtualized storage systems, such as the StorageWorks Enterprise Virtual Array announced by Compaq Corporation in October 2001, provide storage controllers within a fabric or network. These controllers present virtualized storage to hosts that require data storage, in a manner that enables the host to be uninvolved in the physical configuration, allocation, and management of the storage devices. In this system, hosts simply access logical units of storage that appear to the host as a range of logical address space. Virtualization improves performance and utilization of storage.
SAN systems make it possible to store multiple copies or “replicas” of data at various physical locations throughout the system. Data replication across multiple sites is desirable for a variety of reasons. To provide disaster tolerance, copies of data stored at different physical locations are desired. When one copy becomes unavailable due to equipment failure, a local network outage, natural disaster, or the like, a replica located at an alternate site can allow access to the data. Replicated data can also theoretically improve access in normal operation in that replicas can be accessed in parallel, avoiding bottlenecks associated with accessing a single copy of data from multiple systems.
However, prior storage systems were organized such that one site had a primary role and another site was a replica. Access requests were handled by the primary site until failure, at which time the replica became active. In such an architecture, the replica provided little benefit until failure. Similarly, the resources allocated to creating and managing replicas provided minimal load-balancing benefit, because data access requests could not be directed intelligently to replicas so that resources were used more efficiently. Moreover, when multiple replicas are distributed throughout a network topology, it would be beneficial if network delays associated with accessing a topologically remote storage subsystem could be lessened.
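The fixed-role arrangement described above can be sketched as follows. This is a minimal illustration only, not the implementation of any particular product; the names `Site`, `route_request`, and `simulate` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Site:
    """A storage site holding one copy of the data (hypothetical model)."""
    name: str
    available: bool = True
    requests_served: int = 0


def route_request(primary: Site, replica: Site) -> Site:
    """Fixed-role routing: the primary serves every request until it fails.

    The replica is consulted only after the primary becomes unavailable.
    """
    if primary.available:
        target = primary
    elif replica.available:
        target = replica
    else:
        raise RuntimeError("no site available")
    target.requests_served += 1
    return target


def simulate(total_requests: int, fail_after: int) -> Tuple[int, int]:
    """Serve total_requests; the primary fails once fail_after have been served.

    Returns (requests served by primary, requests served by replica).
    """
    primary = Site("primary")
    replica = Site("replica")
    for i in range(total_requests):
        if i == fail_after:
            primary.available = False
        route_request(primary, replica)
    return primary.requests_served, replica.requests_served
```

In this sketch, `simulate(10, 10)` (no failure during the run) leaves the replica with zero requests served, which is the idle-replica limitation noted above; `simulate(10, 6)` shifts the last four requests to the replica only after the primary fails.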
It is desirable to provide the ability for rapid recovery of user data from a disaster or significant error event at a data processing facility. This type of capability is often termed ‘disaster tolerance’. In a data storage environment, disaster tolerance requirements include providing for replicated data and redundant storage to support recovery after the event. In order to provide a safe physical distance between the original data and its backup copy, the data is migrated from one storage subsystem or physical site to another subsystem or site. It is also desirable for user applications to continue to run while data replication proceeds in the background. Data warehousing, ‘continuous computing’, and enterprise applications all benefit from remote copy capabilities.
Compaq Corporation introduced an array controller, referred to as the HSG80, that implemented Data Replication Management (DRM) features using an architecture of redundant storage controllers, as described in U.S. Pat. No. 6,601,187, assigned to the assignee of the present application and incorporated herein by reference. While effective, each of the controllers comprised one port that was dedicated to user data and a separate port that was dedicated to data replication functions. In general, the HSG80 architecture defined relatively constrained roles for each network element. That is to say, data replication was managed between a defined pair of sites, where one element of the pair was designated in a primary role and the other element of the pair was designated in a replica role. Although each controller had two ports for communicating with other controllers, one of the ports was constrained to handling user data, and the other port was constrained to handling data replication. While easing implementation, these designated roles limited the flexibility and functionality with which the data replication could be performed.
Similarly, prior data replication management solutions simplified the management problems by assigning fixed roles to storage locations. A particular storage site would be designated as a primary when it handled operational data traffic, and another site would be designated only as a secondary or backup site. Such architectures were unidirectional in that the backup site was not available for operational data transactions until the failure of the primary site. Such rigidly assigned roles limited the ability to share storage resources across diverse users and applications. Moreover, configuration of such systems was complex because it was necessary to access and program storage controllers at both the primary and secondary sites specifically for their designated roles. This complexity made it impractical to expand data replication to more than two sites.
Therefore, there remains a need in the art for a data storage system capable of providing flexible data replication services without the direct involvement of the host computer. Moreover, a data storage system is needed that is readily extensible to provide multiple replication, load balancing, and disaster tolerance without limitations imposed by designating rigid roles for the system components.