Recent years have seen a proliferation of computers and storage subsystems. Demand for storage capacity grows by over seventy-five percent each year. Early computer systems relied heavily on direct-attached storage (DAS) consisting of one or more disk drives coupled to a system bus. More recently, network-attached storage (NAS) and storage area network (SAN) technologies have been used to provide storage with greater capacity, higher reliability, and higher availability. The present invention is directed primarily at SAN systems designed to provide shared data storage beyond the ability of a single host computer to manage efficiently.
Mass data storage systems are implemented in networks or fabrics that communicate data between the systems that use the data and the storage systems that implement the physical storage. In many cases, host computers act as storage servers. They are coupled to the network and configured with several disk drives that together provide more storage capacity or different storage functions (e.g., data protection) than a DAS system could. For example, a server dedicated to data storage can provide various degrees of redundancy and mirroring to improve the access performance, availability, and reliability of stored data. A large storage system can be formed by collecting storage sub-systems, each managed by a separate server. More recently, virtualized storage systems such as the StorageWorks® Enterprise Virtual Array, announced by COMPAQ Corporation in October 2001, place storage controllers within a fabric or network and present virtualized storage to hosts, so that the hosts need not be involved in the physical configuration, allocation, and management of the storage devices. StorageWorks is a registered trademark of COMPAQ Computer Corporation. In this system, hosts simply access logical units of storage that appear to the host as a range of logical address space. Virtualization improves storage performance and utilization.
SAN systems make it possible to store multiple copies, or "replicas", of data at various physical locations throughout the system. Data replication across multiple sites is desirable for a variety of reasons. Disaster tolerance requires copies of data stored at different physical locations: when one copy becomes unavailable because of equipment failure, a local network outage, a natural disaster, or the like, a replica at an alternate site can still provide access to the data. Replicated data can also, in principle, improve access during normal operation, because replicas can be accessed in parallel, avoiding the bottlenecks that arise when multiple systems access a single copy of the data. However, prior systems were organized so that one site had the primary role and another site held a replica. The primary site handled all access requests until it failed, at which point the replica became active. In such an architecture, the replica provided little benefit until a failure occurred. Similarly, the resources devoted to creating and managing replicas yielded little load-balancing benefit, because data access requests could not be directed intelligently to replicas to use resources more efficiently. Moreover, when multiple replicas are distributed throughout a network topology, it would be beneficial to reduce the network delays associated with accessing a topologically remote storage subsystem.
In the past, managing a data replication system required significant time and expense, much of it spent on setting up and configuring data replication on a SAN. Physical storage devices at the original and replica locations had to be closely matched, which could require spindle-level knowledge to set up a storage site to hold a replica. Similarly detailed knowledge of the physical devices at a storage site was required to set up logging of replication operations. Moreover, the logical structures used to represent, access, and manage the stored data had to be reproduced substantially identically at each storage site. Many of these operations required significant manual intervention, because prior data replication architectures were difficult to automate. This complexity made it difficult, if not impossible, to expand a replicated volume of storage, because changes at one site had to be replicated precisely at the other site. A need exists for SAN data replication systems that allow the setup and configuration of a replication system to be automated and allow the configuration to be readily expanded.
It is desirable to be able to recover user data rapidly after a disaster or significant error event at a data processing facility. This capability is often termed "disaster tolerance". In a data storage environment, disaster tolerance requires replicated data and redundant storage to support recovery after the event. To provide a safe physical distance between the original data and the replicated data, the data is migrated from one storage subsystem or physical site to another. It is also desirable for user applications to continue running while data replication proceeds in the background. Data warehousing, "continuous computing", and enterprise applications all benefit from remote copy capabilities.
Originally, data replication involved pairs of physical storage devices at a source location and a destination location. The source location operated as a primary data store to handle operational data transactions, whereas the destination location operated as a secondary data store to store copies of data from the source location. The destination location was configured with storage devices of exactly the same capacity and configuration as the source location so that a write transaction to the source location could be duplicated at the destination location.
The need for similarly sized and configured physical devices imposed significant constraints. For example, the physical storage at the destination was dedicated to the corresponding source device and could not readily be allocated to another source location. Changing the size of the source device required changing the size of the destination device. The destination devices had to be large enough to hold the entire capacity of the source device, so in most cases large portions of the destination storage went unused. Moreover, copying data from the source to the destination involved byte-by-byte copying of the entire source volume, even when the source volume was sparsely populated. For large volumes in the gigabyte range, this process could easily take hours (or days), during which the source volume remained unavailable for operational data transactions.
COMPAQ Corporation introduced a data replication management product in its Array Controller Software (ACS) running on an HSG80 storage controller, described in U.S. Pat. No. 6,601,187, which is assigned to the assignee of the present application and incorporated herein by reference. This system implemented an architecture with redundant storage controllers at each site, and two sites could be paired to enable data replication. While effective, the HSG80 architecture did not virtualize storage at the controller level. Storage virtualization is the transparent abstraction of storage at the block level: it separates logical data access from physical per-disk data access. Virtualization can occur at any level of a SAN, including the server, fabric, and storage system levels. Because prior data replication systems could not virtualize storage at the controller level, they were inflexible and inefficient in several respects.
For example, it was prohibitively difficult to increase the size of a replicated volume of storage, because the increase had to be implemented precisely at each site, and increasing the size of a logical unit of storage could not be fully automated. Moreover, as in earlier systems, copying data from the source to the destination still required byte-by-byte copying of the entire source volume, even when it was sparsely populated, which could take hours (or days) for volumes in the gigabyte range.
In such systems, storage capacity was rigidly allocated when a data replication set was created, and data was copied from source to destination before the source was allowed to continue operation. Hence, such systems inherited many of the limitations of non-virtual storage. For example, destination volumes were fully allocated, so unused storage on the source disks was replicated on the destination disks. Also, the source and destination disks required identical configurations, so that, for example, the data protection level had to match between the source and destination devices.
Therefore, there remains a need in the art for a data storage system that provides data replication services quickly and with little operational downtime. Moreover, a data storage system is needed that copies data efficiently between locations and allows different members of a data replication set to implement different levels of data protection to meet the needs of a particular application.