Data processing systems typically require a large amount of data storage. Effective data processing systems efficiently access, modify, and re-store data within the data storage. Data storage is typically separated into several levels depending on the time required to access the data or the cost of storing it. A first, or highest, level of data storage involves electronic memory, usually dynamic or static random access memory (DRAM or SRAM). Electronic memories take the form of semiconductor integrated circuits in which millions of bytes of data can be stored on each circuit, with access to such data measured in nanoseconds. Electronic memory provides the fastest access to data because access is entirely electronic.
A second level of data storage usually involves direct access storage devices (DASD). DASD storage can comprise, for example, magnetic and/or optical disks. Data bits are stored as micrometer-sized, magnetically or optically altered spots on a disk surface, which represent the "ones" and "zeros" that make up the binary value of the data. Magnetic DASD includes one or more disks coated with remanent magnetic material. The disks are rotatably mounted within a protected environment. Each disk is divided into many concentric tracks, or closely spaced circles, and data is stored serially, bit by bit, along each track. Each DASD has an access mechanism known as a head disk assembly (HDA), which typically includes one or more read/write heads. The HDA moves across the tracks to transfer data to and from the disk surfaces as the disks rotate past the read/write heads. DASDs can store gigabytes of data, with access to such data typically measured in milliseconds (orders of magnitude slower than electronic memory). Access to DASD data is slower for two reasons: the HDA must be physically positioned over the desired track, and the disk must rotate the desired data under the head.
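The millisecond access times follow from mechanical positioning. As a rough illustration, the sketch below uses a simple model that is an assumption and not part of the text above: access time equals average seek time plus average rotational latency, with transfer time ignored. On average the desired sector is half a revolution away from the head. The function names and the 8 ms seek figure in the usage note are hypothetical.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds.

    On average the target sector is half a revolution away from the head,
    so latency = 0.5 * (60 seconds / revolutions per minute).
    """
    seconds_per_revolution = 60.0 / rpm
    return 0.5 * seconds_per_revolution * 1000.0


def avg_access_time_ms(seek_ms: float, rpm: float) -> float:
    """Simplified (assumed) model: average seek plus average rotational latency.

    Data transfer time and controller overhead are deliberately ignored.
    """
    return seek_ms + avg_rotational_latency_ms(rpm)
```

For a 7,200 RPM disk the average rotational latency is about 4.17 ms. With a hypothetical 8 ms average seek, the estimated access time is about 12.17 ms. DRAM access, measured in nanoseconds, is roughly six orders of magnitude faster.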
A third, or lower, level of data storage includes tapes, tape libraries, and optical disk libraries. Access to data in a library is much slower because a robot is needed to select and load the required storage medium. An advantage of these storage systems is their reduced cost for very large storage capacities, on the order of terabytes. Tape storage is often used for back-up purposes. That is, data stored at the second level of the storage hierarchy is reproduced on magnetic tape for safekeeping. Access to data stored on tape and/or in a library is presently on the order of seconds.
Having a back-up copy of data is mandatory for many businesses, as data loss could be catastrophic. The time required to recover data lost at the primary storage level is also an important consideration. One improvement in speed over tape or library back-up is dual copy. In one example of dual copy, additional DASDs are provided and data is written to them as well as to the primary DASDs (sometimes referred to as mirroring). If the primary DASDs fail, the secondary DASDs can then be relied upon for the data. A drawback of this approach is that the number of required DASDs is doubled.
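A minimal sketch of dual copy (mirroring) follows. It assumes each device can be modeled as an in-memory dict from block number to bytes. The class and method names are hypothetical. Every write goes to both devices, and reads fall back to the secondary once the primary has failed. The sketch also shows the drawback noted above: both devices hold a full copy of the data.

```python
class MirroredVolume:
    """Hypothetical dual-copy (mirrored) volume.

    Devices are modeled as dicts mapping block number -> data. Every write
    is applied to both the primary and the secondary device.
    """

    def __init__(self) -> None:
        self.primary: dict[int, bytes] = {}
        self.secondary: dict[int, bytes] = {}
        self.primary_failed = False

    def write(self, block: int, data: bytes) -> None:
        if not self.primary_failed:
            self.primary[block] = data
        self.secondary[block] = data  # mirror copy: doubles storage use

    def read(self, block: int) -> bytes:
        # Depend on the secondary only when the primary is unavailable.
        if self.primary_failed:
            return self.secondary[block]
        return self.primary[block]

    def fail_primary(self) -> None:
        """Simulate a primary device failure that loses its contents."""
        self.primary_failed = True
        self.primary.clear()
```

Usage: write some blocks, call `fail_primary()`, and the same blocks remain readable from the secondary copy.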
Another back-up alternative, which avoids doubling the number of storage devices, writes data to a redundant array of inexpensive devices (RAID). In this configuration, the data is apportioned among many DASDs. If a single DASD fails, the lost data can be recovered from the remaining data using error correction procedures. Several different RAID configurations are currently available.
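The text does not specify which error correction procedure is used. A common technique in parity-based RAID configurations is bytewise XOR parity, sketched below as an illustration rather than as the method of any particular array. The parity block is the XOR of the data blocks. Because XOR is its own inverse, XOR-ing the parity with the surviving blocks regenerates a single missing block. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    # zip(*blocks) groups the i-th byte of every block together.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Parity block stored on an additional device."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover the one lost data block from the survivors plus parity."""
    return xor_blocks(surviving + [parity])
```

For example, with data blocks `b"ABCD"`, `b"EFGH"` and `b"IJKL"` striped across three devices plus one parity device, losing the device holding `b"EFGH"` is recoverable with `rebuild_missing([b"ABCD", b"IJKL"], parity)`. This needs one extra device rather than a full second set.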
The aforementioned back-up solutions are generally sufficient to recover data in the event that a storage device or medium fails. These back-up methods are useful only for device failures, since the secondary data is a mirror of the primary data. That is, the secondary data has the same volume serial numbers (VOLSERs) and DASD addresses as the primary data. Recovery from system failures or storage controller failures, on the other hand, is not possible using mirrored secondary data. Still further protection is therefore required to recover data if a disaster, such as an earthquake, fire, explosion, or hurricane, destroys the entire system or even the site. Disaster recovery requires that the secondary copy of data be stored at a location remote from the primary data. A known method of providing disaster protection is to periodically back up data to tape, for example on a daily or weekly basis. The tape is then picked up by a vehicle and taken to a secure storage area, usually located kilometers from the primary data location. This back-up plan has problems: it could take days to retrieve the back-up data, additional data is lost while waiting for the back-up data to be recovered, or the same disaster could also destroy the storage location. A slightly improved back-up method transmits data to a back-up location each night, which allows the data to be stored at a more remote location. Again, some data may be lost between back-ups, since back-up does not occur continuously as it does in the dual copy solution. A substantial amount of data could therefore still be lost, which may be unacceptable to some users.
More recently introduced disaster recovery solutions include remote dual copy, in which data is backed up not only remotely but also continuously (either synchronously or asynchronously). A substantial amount of control data is required to communicate duplexed data from one host processor to another, from one storage controller to another, or some combination thereof. High overhead, however, can interfere with the secondary site's ability to keep up with the primary site's processing. This threatens the secondary site's ability to recover the primary site's data in the event of a disaster.
Disaster recovery protection for a typical data processing system requires that primary data stored on primary DASDs be backed up at a secondary, or remote, location. The distance separating the primary and secondary locations depends on the level of risk acceptable to the user and can vary from several kilometers to thousands of kilometers. In addition to providing a back-up copy of the data, the secondary location must have enough system information to take over processing should the primary system become disabled. This is due in part to the fact that a single storage controller does not write data to both the primary and secondary DASD strings at the primary and secondary sites. Instead, the primary data is stored on a primary DASD string attached to a primary storage controller, while the secondary data is stored on a secondary DASD string attached to a secondary storage controller.
The secondary site must not only be sufficiently remote from the primary site but must also be able to back up primary data in real time, that is, as the primary data is updated, with minimal delay. Additionally, the secondary site has to back up the primary data regardless of the application program (e.g., IMS, DB2) that is running at the primary site and generating the data and/or updates. A difficult requirement is that the secondary data be order consistent. It must be copied in the same sequential order as the primary data (sequential consistency), which requires substantial system considerations. Sequential consistency is further complicated when a data processing system contains multiple storage controllers, each controlling multiple DASDs. Without sequential consistency, the secondary data would be inconsistent with the primary data, corrupting disaster recovery.
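The example below shows why update order matters. It is a minimal sketch that assumes each primary update carries a global sequence number; the tuple layout and function names are hypothetical. Replaying the same two writes in a different order leaves a different final value. A secondary is order consistent when it has applied exactly updates 1 through n, because its records then match the primary's records as they stood at some earlier moment.

```python
from typing import Iterable

# (sequence number assigned at the primary, record key, new value).
# The global sequence numbering is an assumption made for illustration.
Update = tuple[int, str, str]


def apply_updates(updates: Iterable[Update]) -> dict[str, str]:
    """Replay updates in the order given and return the resulting records."""
    state: dict[str, str] = {}
    for _seq, key, value in updates:
        state[key] = value
    return state


def is_order_consistent(applied_seqs: list[int]) -> bool:
    """True if the secondary applied updates 1..n in order with no gaps."""
    return applied_seqs == list(range(1, len(applied_seqs) + 1))
```

With updates `[(1, "balance", "100"), (2, "balance", "250")]`, in-order replay yields a balance of `"250"` as on the primary. Reversed replay yields `"100"`, a state the primary never ended in.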
Remote data duplexing falls into two general categories: synchronous and asynchronous. Synchronous remote copy sends primary data to the secondary location and confirms its reception before ending a primary DASD input/output (I/O) operation, that is, before providing a channel end (CE) and device end (DE) to the primary host. Synchronous copy therefore slows the primary DASD I/O response time while the primary waits for secondary confirmation. The primary I/O response delay increases in proportion to the distance between the primary and secondary systems, a factor that limits the remote distance to tens of kilometers. Synchronous copy does, however, provide sequentially consistent data at the secondary site with relatively little system overhead.
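The proportional growth of delay with distance can be estimated from signal propagation alone. The sketch below assumes a signal speed in optical fiber of about 200,000 km/s, roughly two thirds of the speed of light in vacuum. It also assumes one round trip per synchronous write; the parameter name and default are illustrative. Switching, protocol handling and the secondary's own write time all add further delay.

```python
# Assumption: signal speed in optical fiber is roughly 2/3 of c.
SPEED_IN_FIBER_KM_PER_S = 200_000.0


def added_sync_latency_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay added to each synchronous primary write.

    The primary must wait for the data to reach the secondary and for the
    confirmation to come back, so each round trip costs twice the one-way
    delay. Equipment and protocol delays are not modeled.
    """
    one_way_s = distance_km / SPEED_IN_FIBER_KM_PER_S
    return 2.0 * one_way_s * round_trips * 1000.0
```

Under these assumptions, 10 km adds about 0.1 ms per write, 100 km about 1 ms, and 1,000 km about 10 ms. This linear growth is what restricts synchronous copy to shorter distances.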
Synchronous remote copy for disaster recovery also requires that paired DASD volumes form a set. The DASD volumes at the secondary site essentially form a "duplex pair" with the corresponding DASD volumes at the primary site. Forming such a set further requires that sufficient system information be provided to the secondary site to identify the DASD volumes (VOLSERs) that pair with DASD volumes at the primary site. The secondary site must also recognize when a DASD volume is "failed duplex", that is, when a DASD at the secondary site is no longer synchronized with its primary site counterpart. The primary site can suspend remote copy so that it can continue transferring data updates while those updates are queued for the secondary site. The primary site marks these updates to show that the secondary site is no longer synchronized.
Because of their synchronous design, synchronous remote copy disaster recovery systems have the desired ability to suspend the remote copy pair and queue updates for later transfer to the secondary site. The host application at the primary site cannot start the next I/O transfer to the primary storage controller until the previous I/O transfer has been synchronized at the secondary site. If the previous I/O was not successfully transmitted to the secondary site, the remote copy pair is suspended before the subsequent I/O transfer is started. Subsequent I/O transfers to this remote copy pair can thus be queued for later transmittal to the secondary site once the remote copy pair is re-established.
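The suspend-and-queue behavior can be modeled as a small state machine. The sketch below is a hypothetical model, not an actual controller interface. `send_to_secondary` is a caller-supplied stand-in for the link to the secondary controller and returns `True` when the secondary confirms. While the pair is duplex, each write completes only after confirmation. If a transfer fails, the pair becomes suspended ("failed duplex") and that write and all later writes are queued. Re-establishing the pair drains the queue in its original order.

```python
from collections import deque
from enum import Enum


class PairState(Enum):
    DUPLEX = "duplex"
    SUSPENDED = "suspended"  # "failed duplex": secondary not synchronized


class SyncRemoteCopyPair:
    """Hypothetical model of a synchronous remote copy duplex pair."""

    def __init__(self, send_to_secondary) -> None:
        # send_to_secondary(block, data) -> bool (True once confirmed).
        self.send = send_to_secondary
        self.primary: dict[int, bytes] = {}
        self.pending: deque = deque()  # queued (block, data) updates
        self.state = PairState.DUPLEX

    def write(self, block: int, data: bytes) -> None:
        self.primary[block] = data
        # While duplex, wait for secondary confirmation before the I/O ends.
        if self.state is PairState.DUPLEX and self.send(block, data):
            return
        # Transfer failed, or the pair is already suspended: queue the update.
        # Queuing even when the link is back up keeps updates in order.
        self.state = PairState.SUSPENDED
        self.pending.append((block, data))

    def reestablish(self) -> bool:
        """Send queued updates oldest-first; return to DUPLEX if all succeed."""
        while self.pending:
            block, data = self.pending[0]
            if not self.send(block, data):
                return False  # still suspended; remaining updates stay queued
            self.pending.popleft()
        self.state = PairState.DUPLEX
        return True
```

One design choice is worth noting. While suspended, new writes are queued even if the link has recovered. Sending them directly would let a newer update overtake older queued ones.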
Asynchronous remote copy provides better primary application system performance because the primary DASD I/O operation is completed before the data is confirmed at the secondary site. That is, a channel end (CE) and device end (DE) are provided to the primary host first. The primary DASD I/O response time is therefore independent of the distance to the secondary site, which could be thousands of kilometers from the primary site. More system overhead is required, however, to ensure data sequence consistency, since data received at the secondary site will often not be in the order of the primary updates. In addition, a failure at the primary site could result in the loss of data that was in transit between the primary and secondary locations.
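One way to restore sequence consistency at the secondary is a reorder buffer. The sketch below is an illustrative technique and not necessarily how any given remote copy product works. It assumes each update carries a sequence number assigned at the primary, and the class name is hypothetical. Updates that arrive early are held in a min-heap until every earlier update has arrived. The applied records therefore always reflect a gap-free prefix of the primary's updates. Updates still held, or still in transit, when the primary fails are exactly the data that could be lost.

```python
import heapq


class SecondaryApplier:
    """Hypothetical reorder buffer for asynchronous remote copy."""

    def __init__(self) -> None:
        self.next_seq = 1  # next sequence number that may be applied
        self.held: list[tuple[int, str, str]] = []  # min-heap by sequence
        self.records: dict[str, str] = {}

    def receive(self, seq: int, key: str, value: str) -> None:
        if seq < self.next_seq:
            return  # duplicate of an update that was already applied
        heapq.heappush(self.held, (seq, key, value))
        # Apply every held update that is now contiguous with those applied.
        while self.held and self.held[0][0] == self.next_seq:
            _, k, v = heapq.heappop(self.held)
            self.records[k] = v
            self.next_seq += 1
```

For example, if update 2 arrives before update 1, nothing is applied until update 1 arrives. Both are then applied in order, leaving the same final value as on the primary.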
Further, certain errors in the data processing system at the primary site, either in the host application or in the storage subsystem, can cause the remote copy function to terminate. Unlike synchronous remote copy designs, current asynchronous remote copy systems typically cannot suspend the remote copy duplex pair. Once remote copy has been terminated, resuming it requires that all data on the primary DASDs be copied to the secondary DASDs to re-synchronize the two sites.
While remote data duplexing provides sufficient solutions for disaster recovery, it does not provide an efficient means of recovery when the storage controller at the primary site becomes inaccessible. Two common reasons for the primary storage controller becoming inaccessible are an error in the storage controller and the controller being temporarily disabled for planned maintenance. This applies in both dual copy and remote data duplexing environments. Planned maintenance or a storage controller error causes the applications running in the attached host processors to terminate their I/O to all data storage devices connected to that controller. In addition, all duplex pairs associated with the primary data storage devices connected to the primary storage controller are failed and reported to the associated host processor applications as failed duplex. Thus, when the storage controller has recovered from its error or completed its planned maintenance, all primary data storage devices must be re-synchronized with their corresponding shadow data storage devices at the secondary site. Only then can the data processing system continue as a synchronized disaster recovery system across the two sites.
Accordingly, a method is needed in a remote data duplexing system for providing an application running in the primary host processor with access to data stored on a primary storage device when an error occurs at the primary storage controller. Such a method may also give an application running in the primary host processor more direct access to data on a failed primary storage device.