The present invention relates to a storage system and, more particularly, to a storage system distributed across multiple sites.
Data is the underlying resource on which all computing processes are based. With the recent explosive growth of the Internet and e-business, the demand on data storage systems has increased tremendously. Types of storage systems include network-attached storage (NAS) and storage area networks (SANs). A NAS uses IP over Ethernet to transport data in file formats between storage servers and their clients. In NAS, an integrated storage system, such as a disk array or tape device, connects directly to a messaging network through a local area network (LAN) interface, such as Ethernet, using messaging communications protocols like TCP/IP. The storage system functions as a server in a client-server system.
Generally, a SAN is a dedicated high-performance network used to move data between heterogeneous servers and storage resources. Unlike NAS, a separate dedicated network is provided to avoid any traffic conflicts between clients and servers on the traditional messaging network. A SAN permits establishment of direct connections between storage resources and processors or servers. A SAN can be shared between servers or dedicated to a particular server. It can be concentrated in a single locality or extended over geographical distances. SAN interfaces can use various protocols, such as Fibre Channel (FC), Enterprise Systems Connection (ESCON), Small Computer Systems Interface (SCSI), Serial Storage Architecture (SSA), High Performance Parallel Interface (HIPPI), or other protocols as they emerge in the future. For example, the Internet Engineering Task Force (IETF) is developing a new protocol or standard, iSCSI, that would enable block storage over TCP/IP, while some companies are working to offload the iSCSI-TCP/IP protocol stack from the host processor to make iSCSI a dominant standard for SANs.
A SAN is commonly used with distributed storage systems having storage sites distributed in a plurality of locations. These sites or data centers may be provided in relatively close proximity, e.g., within 10 miles, or far apart, e.g., 100 miles or more apart. The distributed storage system may be used to store redundant data for data security or to place the data centers in close proximity to the distributed business centers of an enterprise. The distributed or clustered systems are also used to provide high-speed data access to users of online services.
Currently, two operational modes are commonly used by storage systems to transfer data from one storage system (primary system) to another storage system (secondary system): synchronous mode and asynchronous mode. In synchronous mode, a write request from a host to the primary storage system completes only after the write data are copied to the secondary storage system and acknowledgement thereof has been made. Accordingly, this mode guarantees no loss of data at the secondary system since the write data from the host is stored in the cache of the primary system until the acknowledgement has been received from the secondary system. In addition, the primary volume (PVOL) in the primary storage system and the secondary volume (SVOL) in the secondary storage system are identically maintained, so that the SVOL can be promptly used to replace the PVOL if the PVOL experiences failure. However, the primary and secondary storage systems cannot be placed too far apart, e.g., over 100 miles, under this mode. Otherwise, the storage system may not efficiently execute write requests from the host.
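The synchronous write path described above may be illustrated by the following minimal sketch. It is an illustration only, not the method of any particular storage system: the names `Volume`, `SyncPrimary`, and `host_write` are hypothetical, the volumes are modeled as in-memory dictionaries, and the remote copy is modeled as a direct function call that returns an acknowledgement.

```python
class Volume:
    """Toy in-memory volume (hypothetical): maps block address -> data."""

    def __init__(self):
        self.blocks = {}

    def write(self, addr, data):
        self.blocks[addr] = data
        return True  # stands in for the acknowledgement


class SyncPrimary:
    """Sketch of a primary system operating in synchronous mode."""

    def __init__(self, pvol, svol):
        self.pvol = pvol   # primary volume (PVOL)
        self.svol = svol   # secondary volume (SVOL), assumed remote
        self.cache = {}    # write data held until the secondary acknowledges

    def host_write(self, addr, data):
        # Hold the write data in the primary's cache.
        self.cache[addr] = data
        self.pvol.write(addr, data)
        # Copy to the secondary and wait for its acknowledgement.
        # In a real system this is a round trip over the remote link.
        acked = self.svol.write(addr, data)
        if not acked:
            # The data stays in the cache; the host write has not completed.
            raise ConnectionError("no acknowledgement from secondary")
        # Only after the acknowledgement is the cached copy released
        # and completion reported to the host.
        del self.cache[addr]
        return "complete"
```

In this sketch, `host_write` does not return until the secondary has acknowledged, which is why the write latency seen by the host grows with the distance between the two systems.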
In asynchronous mode, a write request from a host to the primary storage system completes upon storing the write data only to the primary system. The write data is then copied to the secondary storage system. That is, the data write to the primary storage system is an independent process from the data copy to the secondary storage system. Accordingly, the primary and secondary systems may be placed far apart from each other, e.g., 100 miles or greater. However, data may be lost if the primary system goes down since the PVOL and SVOL are not identically maintained. Accordingly, it would be desirable to provide a data storage system or remote copy system that provides the benefits of both the synchronous and asynchronous modes, i.e., enables the primary and secondary systems to be placed far apart while guaranteeing no data loss. An exemplary asynchronous remote copy method is disclosed in U.S. Pat. No. 6,408,370, to Yamamoto et al., which is incorporated by reference.
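The asynchronous write path, and the window in which data may be lost, may be illustrated by the following minimal sketch. It is not the method of the referenced patent; the names `AsyncPrimary`, `host_write`, `copy_pending`, and `unprotected_writes` are hypothetical, and the pending-copy queue is an assumed mechanism chosen for illustration.

```python
from collections import deque


class AsyncPrimary:
    """Sketch of a primary system operating in asynchronous mode."""

    def __init__(self):
        self.pvol = {}           # primary volume (PVOL)
        self.svol = {}           # secondary volume (SVOL), assumed remote
        self.pending = deque()   # writes not yet copied to the secondary

    def host_write(self, addr, data):
        # The host write completes as soon as the primary has the data.
        self.pvol[addr] = data
        self.pending.append((addr, data))
        return "complete"

    def copy_pending(self, max_items=None):
        """Independent copy process: drain queued writes to the secondary."""
        copied = 0
        while self.pending and (max_items is None or copied < max_items):
            addr, data = self.pending.popleft()
            self.svol[addr] = data
            copied += 1
        return copied

    def unprotected_writes(self):
        # Writes that would be lost if the primary failed at this moment.
        return len(self.pending)
```

Between a call to `host_write` and the next call to `copy_pending`, the PVOL and SVOL differ; any writes still in the queue at the time of a primary failure are the data that may be lost.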
In order to manage the distributed system above, it is important to know whether the sites are functioning properly or experiencing problems. This information is obtained using heartbeat signals. Accordingly, reliable transmission of heartbeat signals is needed. However, most storage units at this time communicate the heartbeat signals using TCP/IP links. TCP/IP links, although widely used for their scalability and flexibility, are considered to be less reliable than other types of communication links, e.g., Fibre Channel or ESCON. The heartbeat signals or information preferably should be detailed, so that appropriate actions may be taken with minimal delay.
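The role of heartbeat signals may be illustrated by the following minimal sketch of a monitor that tracks the most recent heartbeat from each site. The names `Heartbeat`, `HeartbeatMonitor`, and `site_state`, the status strings, and the timeout rule are all assumptions made for illustration; the transport carrying the heartbeats is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical heartbeat message carrying detailed status."""
    site: str
    timestamp: float      # time the heartbeat was sent, in seconds
    status: str = "ok"    # e.g. "ok" or "degraded" (assumed values)
    detail: str = ""      # free-form description of any problem


class HeartbeatMonitor:
    """Tracks the latest heartbeat per site and classifies each site."""

    def __init__(self, timeout):
        self.timeout = timeout   # seconds without a heartbeat before suspicion
        self.last = {}           # site name -> most recent Heartbeat

    def receive(self, hb):
        self.last[hb.site] = hb

    def site_state(self, site, now):
        hb = self.last.get(site)
        if hb is None:
            return "unknown"     # no heartbeat has ever arrived
        if now - hb.timestamp > self.timeout:
            # A missed heartbeat cannot by itself distinguish a failed
            # site from a failed link.
            return "suspect"
        return hb.status
```

A site whose heartbeat is overdue is reported as "suspect" rather than failed, since on a less reliable link the heartbeat itself may have been lost; the `status` and `detail` fields show how a more detailed heartbeat could let the receiver choose an action without further queries.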