1. Field of the Invention
The present invention generally relates to data backup systems. More particularly, the invention concerns a data storage system with primary and redundant backup storage, where the system automatically switches to the mirroring backup storage when an error occurs at the primary storage, and any reservation of the primary storage to a particular host is honored by the secondary storage.
2. Description of the Related Art
Many data processing systems require a large amount of data storage, for use in efficiently accessing, modifying, and re-storing data. Data storage is typically separated into several different levels, each level exhibiting a different data access time or data storage cost. A first, or highest level of data storage involves electronic memory, usually dynamic or static random access memory (DRAM or SRAM). Electronic memories take the form of semiconductor integrated circuits where millions of bytes of data can be stored on each circuit, with access to such bytes of data measured in nanoseconds. The electronic memory provides the fastest access to data since access is entirely electronic.
A second level of data storage usually involves direct access storage devices (DASD). DASD storage, for example, includes magnetic and/or optical disks. Data bits are stored as micrometer-sized magnetically or optically altered spots on a disk surface, representing the "ones" and "zeros" that comprise the binary value of the data bits. Magnetic DASD includes one or more disks that are coated with remanent magnetic material. The disks are rotatably mounted within a protected environment. Each disk is divided into many concentric tracks, or closely spaced circles. The data is stored serially, bit by bit, along each track. An access mechanism, known as a head disk assembly (HDA), typically includes one or more read/write heads, and is provided in each DASD for moving across the tracks to transfer the data to and from the surface of the disks as the disks are rotated past the read/write heads. DASDs can store gigabytes of data, and the access to such data is typically measured in milliseconds (orders of magnitude slower than electronic memory). Access to data stored on DASD is slower than electronic memory due to the need to physically position the disk and HDA to the desired data storage location.
A third or lower level of data storage includes tapes, tape libraries, and optical disk libraries. Access to library data is much slower than electronic or DASD storage because a robot is necessary to select and load the needed data storage medium. An advantage of these storage systems is the reduced cost for very large data storage capabilities, on the order of terabytes of data. Tape storage is often used for backup purposes. That is, data stored at the higher levels of data storage hierarchy is reproduced for safe keeping on magnetic tape. Access to data stored on tape and/or in a library is presently on the order of seconds.
Having a backup data copy is mandatory for many businesses for which data loss would be catastrophic. The time required to recover lost data is also an important recovery consideration. With tape or library backup, primary data is periodically backed up by making a copy on tape or library storage. One improvement over this arrangement is "dual copy," which mirrors contents of a primary device with a nearly identical secondary device. An example of dual copy involves providing additional DASDs so that data is written to the additional DASDs substantially in real time along with the primary DASDs. Then, if the primary DASDs fail, the secondary DASDs can be used to provide otherwise lost data. A drawback to this approach is that the number of required DASDs is doubled.
A different data backup alternative that avoids the need to provide double the storage devices involves writing data to a redundant array of inexpensive disks (RAID). In this configuration, the data is apportioned among many DASDs. If a single DASD fails, then the lost data can be recovered by applying error correction procedures to the remaining data. Several different RAID configurations are available.
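As an illustration of recovering a failed device from the remaining data, the following minimal sketch uses byte-wise XOR parity, which is one common RAID error correction scheme. The text above does not say which scheme is meant, so the choice of XOR parity and the function names (xor_blocks, make_parity, rebuild_missing) are assumptions made only for this example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Compute one parity block protecting all data blocks (assumed scheme)."""
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus parity."""
    return xor_blocks(list(surviving_blocks) + [parity])

# Three hypothetical DASDs each hold one block; a fourth holds parity.
data = [b"DASD", b"RAID", b"COPY"]
parity = make_parity(data)

# Simulate losing the second device and rebuilding its contents.
recovered = rebuild_missing([data[0], data[2]], parity)
```

Only one extra device is needed for parity in this sketch, which is how such an arrangement avoids doubling the number of DASDs.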
The foregoing backup solutions are generally sufficient to recover data in the event that a storage device or medium fails. These backup methods are useful only for device failures since the secondary data is a mirror of the primary data, that is, the secondary data has the same volume serial numbers (VOLSERs) and DASD addresses as the primary data. Data recovery due to system failures or storage controller failures, on the other hand, is not available using mirrored secondary data. Hence still further protection is required for recovering data if the entire system or even the site is destroyed by a disaster such as an earthquake, fire, explosion, hurricane, etc. Disaster recovery requires that the secondary copy of data be stored at a location remote from the primary data. A known method of providing disaster protection is to periodically back up data to tape, such as on a daily or weekly basis. The tape is then picked up by a vehicle and taken to a secure storage area usually located kilometers from the primary data location. Nonetheless, this backup plan has its problems. First, it may take days to retrieve the backup data, and additional data is lost waiting for the backup data to be recovered. Furthermore, the same disaster may also destroy the storage location. A slightly improved backup method transmits data to a backup location each night. This allows the data to be stored at a more remote location. Again, some data may be lost between backups since backups do not occur continuously, as in the dual copy solution. Hence, a substantial amount of data may still be lost and this may be unacceptable to some users.
More recently introduced data disaster recovery solutions include "remote dual copy," where data is backed up not only remotely, but also continuously (either synchronously or asynchronously). In order to communicate duplexed data from one host processor to another host processor, or from one storage controller to another storage controller, or some combination thereof, a substantial amount of control data is required for realizing the process. A high overhead, however, can interfere with a secondary site's ability to keep up with a primary site's processing, thus threatening the ability of the secondary site to recover the primary in the event a disaster occurs.
Disaster recovery protection for the typical data processing system requires that primary data stored on primary DASDs be backed up at a secondary or remote location. The physical distance separating the primary and secondary locations can be set depending upon the level of risk acceptable to the user, and can vary from several kilometers to thousands of kilometers. The secondary or remote location, in addition to providing a backup data copy, must also have enough system information to take over processing for the primary system should the primary system become disabled. This is due in part to the fact that a single storage controller does not write data to both primary and secondary DASD strings at the primary and secondary sites. Instead, the primary data is stored on a primary DASD string attached to a primary storage controller while the secondary data is stored on a secondary DASD string attached to a secondary storage controller.
The secondary site must not only be sufficiently remote from the primary site, but must also be able to back up primary data in real time, that is, as the primary data is updated, with some minimal delay. Additionally, the secondary site has to back up the primary data regardless of the application program (e.g., IMS, DB2) running at the primary site and generating the data and/or updates. A difficult task required of the secondary site is that the secondary data must be "order consistent," that is, secondary data is copied in the same sequential order as the primary data (sequential consistency), which requires substantial system considerations. Sequential consistency is complicated by the existence of multiple storage controllers each controlling multiple DASDs in a data processing system. Without sequential consistency, secondary data inconsistent with primary data would result, thus corrupting disaster recovery.
Remote data duplexing falls into two general categories, synchronous and asynchronous. Synchronous remote copy involves sending primary data to the secondary location and confirming the reception of such data before ending a primary DASD input/output (I/O) operation (e.g., providing a channel end (CE) and device end (DE) to the primary host). Synchronous copy, therefore, slows the primary DASD I/O response time while waiting for secondary confirmation. Primary I/O response delay is increased proportionately with the distance between the primary and secondary systems, a factor that limits the remote distance to tens of kilometers. Synchronous copy, however, provides sequentially consistent data at the secondary site with relatively little system overhead.
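The synchronous behavior described above can be sketched as follows. This is a simplified model, not an actual storage controller interface: the class names (SecondarySite, SyncPrimary), the "CE/DE" return string, and the assumed signal propagation speed of about 200 km per millisecond in optical fiber are illustrative assumptions.

```python
class SecondarySite:
    """Hypothetical secondary storage; acknowledges every update it stores."""
    def __init__(self):
        self.volume = {}

    def receive(self, track, data):
        self.volume[track] = data
        return True  # confirmation sent back to the primary

class SyncPrimary:
    """Hypothetical primary storage using synchronous remote copy."""
    def __init__(self, secondary):
        self.volume = {}
        self.secondary = secondary

    def write(self, track, data):
        self.volume[track] = data
        # The I/O does not end until the secondary confirms reception.
        if not self.secondary.receive(track, data):
            raise IOError("secondary did not confirm the update")
        return "CE/DE"  # channel end and device end presented to the host

def round_trip_delay_ms(distance_km, km_per_ms=200.0):
    """Added I/O delay from propagation alone (assumed fiber speed)."""
    return 2 * distance_km / km_per_ms
```

Because every write waits for one round trip, the added delay in this model grows linearly with distance, which is the proportionality noted above.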
Synchronous remote copy for disaster recovery also requires that paired DASD volumes form a set. The DASD volumes at the secondary site essentially form a "duplex pair" with the corresponding DASD volumes at the primary site. Forming such a set further requires that a sufficient amount of system information be provided to the secondary site for identifying those DASD volumes (VOLSERs) that pair with DASD volumes at the primary site. The secondary site must also recognize when a DASD volume is "failed duplex," i.e., when a DASD at the secondary site is no longer synchronized with its primary site counterpart. The primary site can suspend remote copy to allow the primary site to continue locally implementing data updates while these updates are queued for the secondary site. The primary site marks these updates to show the secondary site is no longer synchronized.
Synchronous remote copy disaster recovery systems have the desired ability to suspend the remote copy pair and queue the updates to be subsequently transferred to the secondary site because of their synchronous design. The host application at the primary site cannot start the next I/O transfer to the primary storage controller until the previous I/O transfer has been synchronized at the secondary site. If the previous I/O was not successfully transmitted to the secondary site, the remote copy pair must be suspended before the subsequent I/O transfer is started. Subsequent I/O transfers to this remote copy pair are queued for later transmittal to the secondary site once the remote copy pair is reestablished.
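The suspend-and-queue behavior can be illustrated with a small model. The DuplexPair class, its link_up parameter, and the reestablish method are hypothetical names introduced for this sketch; real systems track pending updates in controller-specific ways that are not described here.

```python
class DuplexPair:
    """Hypothetical model of a synchronous remote copy duplex pair."""
    def __init__(self):
        self.primary = {}
        self.secondary = {}
        self.suspended = False
        self.pending = []  # updates queued while the pair is suspended

    def write(self, track, data, link_up=True):
        self.primary[track] = data  # the primary always applies the update
        if self.suspended or not link_up:
            # Transmission failed or the pair is already suspended:
            # suspend before the next I/O starts and queue the update.
            self.suspended = True
            self.pending.append((track, data))
        else:
            self.secondary[track] = data

    def reestablish(self):
        """Send queued updates in order, then resume normal duplexing."""
        for track, data in self.pending:
            self.secondary[track] = data
        self.pending.clear()
        self.suspended = False
```

For example, if the second of three writes fails to reach the secondary, the second and third writes are both queued, and calling reestablish() brings the secondary back in step with the primary.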
In contrast to synchronous remote copy, asynchronous remote copy provides better primary application system performance because the primary DASD I/O operation is completed (providing a channel end (CE) and device end (DE) to the primary host) without waiting for data to be confirmed at the secondary site. Therefore, the primary DASD I/O response time is not dependent upon the distance to the secondary site and the secondary site can be thousands of kilometers remote from the primary site. A greater amount of system overhead is required, however, to ensure data sequence consistency since data received at the secondary site can be out of order with respect to the primary updates. Also, a failure at the primary site can result in some data being lost that was in transit between the primary and secondary locations.
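One way to picture the extra overhead needed for sequence consistency is a secondary that holds back updates arriving out of order until all earlier ones have been applied. The sketch below assumes the primary tags each update with a consecutive sequence number starting at zero; that tagging scheme and the AsyncSecondary class are assumptions for illustration, not a mechanism described in the text.

```python
class AsyncSecondary:
    """Hypothetical secondary that applies updates strictly in sequence."""
    def __init__(self):
        self.volume = {}
        self.next_seq = 0   # next sequence number that may be applied
        self.held = {}      # updates that arrived ahead of their turn

    def receive(self, seq, track, data):
        self.held[seq] = (track, data)
        # Apply every update whose predecessors have all been applied.
        while self.next_seq in self.held:
            held_track, held_data = self.held.pop(self.next_seq)
            self.volume[held_track] = held_data
            self.next_seq += 1
```

If update 1 arrives before update 0 for the same track, it is held until update 0 has been applied, so the secondary ends with the newer value rather than the older one.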
Further, certain errors in the data processing system at the primary site, either in the host application or in the storage subsystem, can cause the termination of the remote copy function. Unlike synchronous remote copy designs, most asynchronous remote copy systems cannot suspend the remote copy duplex pair. Once remote copy has been terminated, resumption of the remote copy function requires all data from the primary DASDs to be copied to the secondary DASDs to ensure re-synchronization of the two sites.
One recent development in the area of remote data duplexing has been seamless "switching" (also called "swapping") of host directed I/O operations from a primary storage device to a secondary storage device when a failure occurs on the primary storage controller or a primary storage device. This development was made by IBM engineers, and is known as peer-to-peer dynamic address switching (PDAS). PDAS operates in a "peer-to-peer environment" where the primary storage site transfers its received updates directly to a mirroring backup storage site (the primary's peer). The peer-to-peer environment contrasts with backup environments that use an independent processor, called a "data mover," to retrieve and transfer data between the primary and secondary sites.
PDAS operates by first quiescing all I/O operations and record updates targeted to the primary data storage device from application programs of a primary host processor. This technique further verifies that the primary and secondary data storage devices form a remote copy duplex pair in full duplex mode, ensuring data integrity in that the secondary data storage device is an exact replica of the primary data storage device. Next, the secondary data storage device is swapped with the primary data storage device by terminating the remote copy duplex pair, establishing an opposite direction remote copy duplex pair such that the secondary data storage device is a primary device of the remote copy duplex pair and the primary data storage device is a shadowing device, and then updating the application programs running in the primary host processor with a device address of the secondary data storage device substituted as a device address of the primary data storage device. Finally, PDAS resumes all I/O operations and record updates from the application programs running in the primary host processor such that all subsequent I/O operations and record updates targeted for the primary data storage device are directed through a secondary storage controller to the secondary data storage device. PDAS is more thoroughly discussed in U.S. application Ser. No. 08/614,588, entitled "Concurrent Switch to Shadowed Device for Storage Controller and Device Errors," which was filed on Mar. 13, 1996, in the names of Robert Kern et al., and assigned to IBM. Contents of the foregoing application are hereby incorporated by reference into the present application.
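The sequence of PDAS steps described above can be summarized in a short sketch. This is only a schematic model of the ordering of the steps: the Device and Host types, the pdas_swap function, and the comparison of device contents as a stand-in for the full duplex check are all hypothetical and do not reflect the actual implementation in the referenced application.

```python
from dataclasses import dataclass, field

@dataclass
class Device:
    address: str
    data: dict = field(default_factory=dict)

@dataclass
class Host:
    target_address: str   # device address used by the application programs
    quiesced: bool = False

def pdas_swap(host, primary, secondary):
    """Swap roles of a duplex pair; returns (new_primary, new_shadow)."""
    host.quiesced = True                          # 1. quiesce host I/O
    if primary.data != secondary.data:            # 2. verify full duplex
        host.quiesced = False
        raise RuntimeError("pair is not in full duplex; swap refused")
    # 3. terminate the pair and establish it in the opposite direction
    new_primary, new_shadow = secondary, primary
    host.target_address = new_primary.address     # 4. substitute the address
    host.quiesced = False                         # 5. resume host I/O
    return new_primary, new_shadow
```

After a successful call, the host's subsequent I/O is directed to the former secondary device, and the former primary acts as the shadowing device.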
Peer-to-peer dynamic address switching (PDAS) has proven to be a useful addition to peer-to-peer remote copy systems, assisting with the smooth and error-free transition between a failed primary storage site and its mirroring secondary storage site. Even though this development represents a significant advance and enjoys some commercial success today, IBM continually strives to improve the performance and efficiency of its products, including the IBM backup storage systems. In this respect, one possible area of focus concerns the operation of PDAS when the primary storage device is subject to a "reserve" state. Generally, hosts issue reserve commands to logical devices to exclude other hosts from writing to the reserved device. By using reserve commands, the host can protect its ability to update the reserved storage device "atomically" (i.e., without any intervening reads or writes by other hosts). However, the seamless transition between the failed (reserved) primary storage device and its backup counterpart is difficult or impossible when a failure occurs and the primary device is reserved. In some cases where the failed device is reserved, the PDAS operation may even fail. Even if the PDAS operation succeeds, the backup device (now operating as the primary device) will fail to honor any reserves that were active on the primary device upon failure, possibly causing uncompleted operations of the reserving host to fail. Consequently, due to certain unsolved problems, peer-to-peer dynamic address switching (PDAS) may not be completely satisfactory for some particular applications where device reservations are involved.
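The reserve problem can be made concrete with a small model. The LogicalDevice class and its methods are hypothetical, and for brevity the sketch models only the exclusion of writes by other hosts; it is meant to show the gap described above, not how any actual device implements reserves.

```python
class LogicalDevice:
    """Hypothetical logical device supporting a single-host reserve."""
    def __init__(self):
        self.reserved_by = None
        self.data = {}

    def reserve(self, host):
        if self.reserved_by not in (None, host):
            return False  # already reserved to another host
        self.reserved_by = host
        return True

    def release(self, host):
        if self.reserved_by == host:
            self.reserved_by = None

    def write(self, host, track, value):
        if self.reserved_by not in (None, host):
            return False  # rejected: device is reserved to another host
        self.data[track] = value
        return True

primary = LogicalDevice()
backup = LogicalDevice()   # mirrors data, but not the reserve state

primary.reserve("host_A")
blocked_on_primary = primary.write("host_B", 1, "intruding update")

# After a swap, host_B's write reaches the backup, which knows of no reserve.
allowed_on_backup = backup.write("host_B", 1, "intruding update")
```

In this model the primary rejects host_B's write while the backup accepts it, so the update that host_A intended to be atomic can be interrupted after the swap.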