Authorization Pursuant to 37 C.F.R. §1.17(e)
A portion of the disclosure of this patent document contains command formats and other computer language listings all of which are subject to copyright protection. The copyright owner, EMC Corporation, has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
This invention relates to data storage, and more particularly, to a system and method for automatically providing and maintaining a copy or mirror of data stored at a location geographically remote from the main or primary data storage device.
Nearly all data processing system users are concerned with maintaining back-up data in order to insure continued data processing operations should their data become lost, damaged, or otherwise unavailable.
Large institutional users of data processing systems which maintain large volumes of data such as banks, insurance companies, and stock market traders must and do take tremendous steps to insure back up data availability in case of a major disaster. These institutions recently have developed a heightened awareness of the importance of data recovery and back-up in view of the many natural disasters and other world events including the bombing of the World Trade Center in New York City.
Currently, data processing system users often maintain copies of their valuable data on site on either removable storage media, or in a secondary “mirrored” storage device located on or within the same physical confines of the main storage device. Should a disaster such as fire, flood, or inaccessibility to a building occur, however, both the primary as well as the secondary or backed up data will be unavailable to the user. Accordingly, more data processing system users are requiring the remote storage of back up data.
One prior art approach at data back-up involves taking the processor out of service while back-up tapes are made. These tapes are then carried off premises for storage purposes. Should access to the backed up data be required, the proper tape must be located, loaded onto a tape drive, and restored to the host system requiring access to the data. This process is very time consuming and cost intensive, both in maintaining an accurate catalog of the data stored on each individual tape, as well as storing the large number of tapes required to store the large amounts of data required by these institutions. Additionally and most importantly, it often takes twenty-four hours before a back-up tape reaches its storage destination during which time the back-up data is unavailable to the user.
Additionally, today's systems require a significant amount of planning and testing in order to design a data recovery procedure and assign data recovery responsibilities. Typically, a disaster recovery team must travel to the test site carrying a large number of data tapes. The team then loads the data onto disks, makes the required network connections, and then restores the data to the “test” point of failure so processing can begin. Such testing may take days or even weeks and always involves significant human resources in a disaster recovery center or back-up site.
Some providers of prior art data storage systems have proposed a method of data mirroring whereby one host Central Processing Unit (CPU) or processor writes data to both a primary, as well as a secondary, data storage device or system. Such a proposed method, however, overly burdens the host CPU with the task of writing the data to a secondary storage system and thus dramatically impacts and reduces system performance.
Accordingly, what is required is a data processing system which automatically and asynchronously, with respect to a first host system, generates and maintains a back-up or “mirrored” copy of a primary storage device at a location physically remote from the primary storage device, without intervention from the host, since such intervention seriously degrades the performance of the data transfer link between the primary host computer and the primary storage device.
This invention features a system which controls storing of primary data received from a primary host computer on a primary data storage system, and additionally controls the copying of the primary data to a secondary data storage system controller which forms part of a secondary data storage system, for providing a back-up copy of the primary data on the secondary data storage system which is located in a geographically remote location from the primary data storage system. For remote copying of data from one storage system to the other without host involvement, the primary and secondary data storage system controllers are coupled via at least one high speed communication link such as a fiber optic link driven by LEDs or laser. The high speed communication link also permits one data storage system to read or write data to or from the other data storage system.
At least one of the primary and secondary data storage system controllers coordinates the copying of primary data to the secondary data storage system and at least one of the primary and secondary data storage system controllers maintains at least a list of primary data which is to be copied to the secondary data storage device.
Additionally, the secondary data storage system controller provides an indication or acknowledgement to the primary data storage system controller that the primary data to be copied to the secondary data storage system in identical form as secondary data has been received or, in another embodiment, has actually been written to a secondary data storage device.
Accordingly, data may be transferred between the primary and secondary data storage system controllers synchronously, when a primary host computer requests writing of data to a primary data storage device, or asynchronously with the primary host computer requesting the writing of data to the primary data storage system, in which case the remote data copying or mirroring is completely independent of and transparent to the host computer system.
At least one of the primary data storage system controller and the secondary data storage system controller maintains a list of primary data which is to be written to the secondary data storage system. Once the primary data has been at least received or optionally stored on the secondary data storage system, the secondary data storage system controller provides an indication or acknowledgement of receipt or completed write operation to the primary data storage system.
At such time, the primary and/or secondary data storage system controller maintaining the list of primary data to be copied updates this list to reflect that the given primary data has been received by and/or copied to the secondary data storage system. The primary or secondary data storage system controllers and/or the primary and secondary data storage devices may also maintain additional lists for use in concluding which individual storage locations, such as tracks on a disk drive, are invalid on any given data storage device, which data storage locations are pending a format operation, which data storage device is ready to receive data, and whether or not any of the primary or secondary data storage devices are disabled for write operations.
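The bookkeeping described above can be illustrated with a minimal sketch. It assumes that the list of primary data to be copied is kept per track and that the secondary controller acknowledges one track at a time; the class and method names (`PendingCopyList`, `mark_written`, `acknowledge`) are hypothetical and do not appear in the disclosure.

```python
class PendingCopyList:
    """Hypothetical per-track list of primary data still to be copied to the secondary system."""

    def __init__(self):
        self._pending = set()

    def mark_written(self, track):
        # The primary host wrote this track, so it must be copied to the secondary system.
        self._pending.add(track)

    def acknowledge(self, track):
        # The secondary controller reported receipt (or a completed write) for this track.
        self._pending.discard(track)

    def pending(self):
        return sorted(self._pending)


copies = PendingCopyList()
copies.mark_written(7)
copies.mark_written(3)
copies.acknowledge(7)
remaining = copies.pending()  # track 3 is still waiting to be copied
```

A set is used here because a track that is written twice before it is acknowledged needs only one entry in the list.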
In accordance with one aspect of the invention, the remote mirroring facility can operate in a specified one of a number of different remote mirroring operating modes for each volume. The operating modes include a synchronous mode, a semi-synchronous mode, an adaptive copy-write pending mode, and an adaptive copy-disk mode. The operating mode for each logical volume can be specified to best suit the purposes of the desired remote mirroring, the particular application using the volume, and the particular use of the data stored on the volume.
In the synchronous mode, data on the primary (R1) and secondary (R2) volumes are always fully synchronized at the completion of an I/O sequence. The data storage system containing the primary (R1) volume informs the host that an I/O sequence has successfully completed only after the data storage system containing the secondary (R2) volume acknowledges that it has received and checked the data. All accesses (reads and writes) to the remotely mirrored volume to which a write has been performed are suspended until the write to the secondary (R2) volume has been acknowledged.
In the semi-synchronous mode, the remotely mirrored volumes (R1, R2) are always synchronized between the primary (R1) and the secondary (R2) prior to initiating the next write operation to these volumes. The data storage system containing the primary (R1) volume informs the host that an I/O sequence has successfully completed without waiting for the data storage system containing the secondary (R2) volume to acknowledge that it has received and checked the data. Thus, a single secondary (R2) volume may lag its respective primary volume (R1) by only one write. Read access to the volume to which a write has been performed is allowed while the write is in transit to the data storage system containing the secondary (R2) volume.
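The difference between the synchronous and semi-synchronous modes lies in when the host is told that an I/O sequence has completed. The following sketch records the order of events for each mode. It is a single-threaded model under simplifying assumptions: the acknowledgement from the secondary (R2) side is simulated rather than received over a link, and the names `Mode`, `MirroredVolume`, and the event strings are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SYNCHRONOUS = "synchronous"
    SEMI_SYNCHRONOUS = "semi-synchronous"


class MirroredVolume:
    """Hypothetical model of one remotely mirrored volume pair (R1, R2)."""

    def __init__(self, mode):
        self.mode = mode
        self.events = []
        self.in_flight = None  # write sent toward R2 but not yet acknowledged

    def _await_ack(self):
        # Simulates waiting for the R2 side to acknowledge the outstanding write.
        if self.in_flight is not None:
            self.events.append(f"r2_ack:{self.in_flight}")
            self.in_flight = None

    def write(self, tag):
        # In both modes the pair is synchronized before the next write starts.
        self._await_ack()
        self.events.append(f"r1_write:{tag}")
        self.in_flight = tag
        if self.mode is Mode.SYNCHRONOUS:
            # Synchronous: the host hears "complete" only after the R2 acknowledgement.
            self._await_ack()
        self.events.append(f"host_complete:{tag}")

    def lag(self):
        # Number of writes by which R2 trails R1 (never more than one in this model).
        return 0 if self.in_flight is None else 1


sync_vol = MirroredVolume(Mode.SYNCHRONOUS)
sync_vol.write("a")

semi_vol = MirroredVolume(Mode.SEMI_SYNCHRONOUS)
semi_vol.write("a")
semi_vol.write("b")
```

In the semi-synchronous trace the acknowledgement for write "a" appears only when write "b" is initiated, which is why the secondary (R2) volume can lag by one write but no more.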
The adaptive copy modes transfer data from the primary (R1) volume to the secondary (R2) volume and do not wait for receipt acknowledgment or synchronization to occur. The adaptive copy modes are responsive to a user-configurable skew parameter specifying a maximum allowable write pending tracks. When the maximum allowable write pending tracks is reached, then write operations are suspended, and in a preferred arrangement, write operations are suspended by defaulting to a predetermined one of the synchronous or asynchronous modes. In the adaptive copy-write pending mode, the write pending tracks accumulate in cache. In the adaptive copy-disk mode, the write pending tracks accumulate in disk memory.
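The role of the skew parameter can be sketched as follows. The sketch assumes that a write which would push the number of write pending tracks past the configured maximum is handled in the predetermined fallback mode instead of being queued; the names `AdaptiveCopyVolume`, `max_skew`, and `fallback_mode`, and the returned mode strings, are hypothetical.

```python
class AdaptiveCopyVolume:
    """Hypothetical model of an adaptive copy volume with a skew limit."""

    def __init__(self, max_skew, fallback_mode="synchronous"):
        self.max_skew = max_skew            # maximum allowable write pending tracks
        self.fallback_mode = fallback_mode  # predetermined mode used once the limit is reached
        self.write_pending = set()          # tracks written on R1 but not yet copied to R2

    def write(self, track):
        """Return the name of the mode in which this write is handled."""
        is_new = track not in self.write_pending
        if is_new and len(self.write_pending) >= self.max_skew:
            # Limit reached: do not accumulate further write pending tracks.
            return self.fallback_mode
        self.write_pending.add(track)
        return "adaptive-copy"

    def drain(self, track):
        # The track has been copied to the secondary (R2) volume.
        self.write_pending.discard(track)


vol = AdaptiveCopyVolume(max_skew=2)
handled = [vol.write(t) for t in (1, 2, 3)]
vol.drain(1)
after_drain = vol.write(3)
```

Whether the write pending tracks live in cache or on disk distinguishes the adaptive copy-write pending mode from the adaptive copy-disk mode; the sketch does not model that difference.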
In accordance with another aspect of the invention, there are provided a number of automatic and non-automatic recovery mechanisms. The recovery mechanism can be also selected on a logical volume basis for a desired level of data integrity and degree of operator or application program involvement. The invention also provides various options that provide a tradeoff between the degree of data integrity, cache loading, processing speed, and link traffic.
In one embodiment, cache loading and processing speed are enhanced by queuing pointers to data in cache for transmission to the link, and permitting pending write data to be overwritten in cache. Link traffic can also be reduced in this case, since obsolete write pending data need not be transmitted over the link. However, unless the remote mirroring is operated in the synchronous mode, data integrity is subject to the possibility of a “rolling disaster.” In the rolling disaster, a remote mirroring relationship exists between the two data storage systems. All links break between the sites, and application processing continues using the primary (R1) volumes. The links are restored, and resynchronization commences by copying data from the primary (R1) volumes to the secondary (R2) volumes. Before resynchronization is finished, however, the primary volumes are destroyed, and the attempt at resynchronization has further corrupted the secondary volumes, due to the cache overwrite option.
The invention provides options other than the synchronous and semi-synchronous operating modes to avoid the “rolling disaster” possibility when performing automatic recovery. One option is to suspend processing whenever the host requests a write to write-pending data in cache. Another option is to log multiple versions of tracks containing remote pending data.
Another aspect of the present invention provides mechanisms for selectively inhibiting automatic or manual recovery when automatic or manual recovery would be inappropriate. In one embodiment, each write request transmitted over the link between the data storage systems includes not only the data for at least one track in the secondary (R2) volume to be updated but also the current “invalid track” count for the secondary (R2) volume as computed by the data storage system containing the corresponding primary (R1) volume. Therefore, once a disaster occurs that destroys the data storage system containing the primary (R1) volume, the data storage system containing the secondary (R2) volume has an indication of the degree of consistency of the secondary (R2) volume. The “invalid tracks” count can be used to determine an appropriate recovery operation for the volume, and can be used to selectively restrict read/write access to the volume when the user decides that synchronization should be required for a write access.
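One way to picture the “invalid track” count travelling with each write request is sketched below. The sketch assumes that the count sent with a request already reflects the state of the secondary (R2) volume once that request has been applied; the disclosure does not state this detail. The names `RemoteWrite`, `SecondaryVolume`, and `last_invalid_count` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RemoteWrite:
    track: int
    data: bytes
    invalid_tracks: int  # R2 tracks still invalid, as computed on the R1 side (assumed post-write)


class SecondaryVolume:
    """Hypothetical R2 volume that remembers the last invalid track count it was sent."""

    def __init__(self):
        self.tracks = {}
        self.last_invalid_count = None  # unknown until the first write request arrives

    def apply(self, request):
        self.tracks[request.track] = request.data
        self.last_invalid_count = request.invalid_tracks

    def is_synchronized(self):
        # Usable after the R1 side is lost: zero means no known missing updates.
        return self.last_invalid_count == 0


r2 = SecondaryVolume()
r2.apply(RemoteWrite(track=4, data=b"x", invalid_tracks=2))
behind = r2.is_synchronized()
r2.apply(RemoteWrite(track=9, data=b"y", invalid_tracks=0))
caught_up = r2.is_synchronized()
```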
In a preferred embodiment, direct write access to a secondary (R2) volume is denied if remote mirroring is not suspended. When remote mirroring is suspended, direct write access to the secondary volume is still denied if a “sync required” attribute is set for the volume and the volume is not synchronized.
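The two conditions of the preferred embodiment reduce to a small predicate, sketched here with a hypothetical function name and hypothetical boolean parameters.

```python
def r2_direct_write_allowed(mirroring_suspended, sync_required, synchronized):
    """Decide whether a direct write to a secondary (R2) volume is permitted."""
    if not mirroring_suspended:
        # Remote mirroring is active, so direct writes to R2 are denied.
        return False
    if sync_required and not synchronized:
        # Mirroring is suspended, but the volume requires synchronization and lacks it.
        return False
    return True
```

For example, `r2_direct_write_allowed(True, True, False)` is denied, while the same call with the “sync required” attribute cleared is allowed.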
In accordance with another aspect of the invention, automatic recovery is selectively inhibited by domino modes. If a “volume domino mode” is enabled for a remotely mirrored volume pair, access to a volume of the remotely mirrored volume pair is denied when the other volume is inaccessible. In a “links domino mode,” access to all remotely mirrored volumes is denied when remote mirroring is disrupted by an all-links failure.
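The effect of the two domino modes on access to a remotely mirrored volume can be sketched as a predicate. The function name and parameters are hypothetical, and the sketch assumes that the two modes are independent settings that may be enabled together.

```python
def access_allowed(volume_domino, links_domino, partner_accessible, all_links_down):
    """Decide whether access to a remotely mirrored volume is permitted."""
    if volume_domino and not partner_accessible:
        # Volume domino mode: the other volume of the pair is inaccessible.
        return False
    if links_domino and all_links_down:
        # Links domino mode: remote mirroring is disrupted by an all-links failure.
        return False
    return True
```

With both modes disabled the predicate always returns `True`, which corresponds to leaving recovery to the automatic mechanisms.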
The domino modes can be used to initiate application-based recovery in lieu of automatic recovery. In one application-based recovery scheme, an application program maintains a log file of all writes (“before” or “after” images) to a data file. To ensure recovery, the application program always writes data to the primary (R1) copy of the log file before it is written to the primary (R1) copy of the data file. The degree of synchronization between the secondary (R2) and primary (R1) copies is selected so that the remote mirroring facility always writes data to the secondary (R2) copy of the log file before it is written to the secondary (R2) copy of the data file.
Therefore, in the case of an all-links failure where host processing continues so far beyond the failure that all versions of the following updates are not retained, the secondary (R2) copy of the data file can be recovered if the primary (R1) copies are destroyed. In this case, if the secondary (R2) copy of the data file is corrupted, it is recovered using the changes recorded in the secondary (R2) copy of the log file.
In accordance with another aspect of the invention, the remote mirroring facility is provided with a migration mode which is active during host processing of a primary (R1) volume and iteratively copies updates from the primary (R1) volume to a secondary (R2) volume. Initially all data elements (tracks or records) of the secondary (R2) volume are marked as invalid. During each iteration, the data elements of the volume, such as tracks or records, are scanned for data elements that are invalid on the secondary (R2) volume. The next iteration copies from the primary (R1) volume to the secondary (R2) volume data elements having been invalidated by writes from the host during the previous iteration. A count of the number of data elements transferred during each iteration, or a count of the invalid data elements in the secondary volume, is kept in order to monitor convergence toward synchronization of the primary (R1) and secondary (R2) volumes. Host processing of the primary volume is suspended for a last iteration to obtain complete synchronization.
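The iterative behavior of the migration mode can be sketched as follows. The sketch makes a simplifying assumption: host writes that arrive during an iteration are applied after that iteration's copy pass, so they are picked up by the next iteration. Volumes are modeled as dictionaries from track number to contents, and the function name `migrate` and its parameters are hypothetical.

```python
def migrate(primary, secondary, host_writes_per_iteration):
    """Copy primary (R1) to secondary (R2) iteratively; return tracks transferred per iteration."""
    invalid = set(primary)  # initially every track is marked invalid on R2
    transferred = []
    for host_writes in host_writes_per_iteration:
        to_copy, invalid = invalid, set()
        for track in to_copy:
            secondary[track] = primary[track]
        transferred.append(len(to_copy))
        # Host processing continues: each write invalidates the track on R2 again.
        for track, value in host_writes.items():
            primary[track] = value
            invalid.add(track)
    # Last iteration: host processing is suspended, so no new invalidations occur.
    for track in invalid:
        secondary[track] = primary[track]
    transferred.append(len(invalid))
    return transferred


r1 = {0: "a", 1: "b", 2: "c"}
r2 = {}
counts = migrate(r1, r2, [{1: "B", 2: "C"}, {2: "CC"}])
```

The shrinking per-iteration counts are the convergence signal mentioned above; a count that stops shrinking would suggest that the host is invalidating tracks as fast as they are copied.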
In accordance with another aspect of the invention, the host processor sends chains of channel commands to the data storage system containing a primary (R1) volume of a remotely mirrored volume pair. The data storage system containing the primary (R1) volume bundles the write data for all write commands in the chain into a single write command for transmission over a link to the secondary data storage system containing the secondary (R2) volume. The data storage system containing the primary (R1) volume decodes the channel commands to determine when it has received the last channel command in the chain, and once the last channel command in the chain is received, it transmits the bundle of write data for the chain over the link to the data storage system containing the secondary (R2) volume.
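The bundling of a chain of channel commands can be sketched as follows. How the end of a chain is recognized is not detailed here, so the sketch assumes a hypothetical `chained` flag on each command that is set whenever another command follows in the same chain. The names `ChannelCommand` and `ChainBundler`, and the `send` callback standing in for the link, are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChannelCommand:
    op: str                # e.g. "write" or "read"
    data: bytes = b""
    chained: bool = False  # assumption: True when another command follows in the chain


class ChainBundler:
    """Hypothetical R1-side helper that sends one bundle of write data per chain."""

    def __init__(self, send):
        self._send = send  # callable that transmits a bundle over the link
        self._bundle = []

    def receive(self, command):
        if command.op == "write":
            self._bundle.append(command.data)
        if not command.chained:
            # Last command in the chain: transmit all collected write data at once.
            if self._bundle:
                self._send(list(self._bundle))
            self._bundle.clear()


sent = []
bundler = ChainBundler(sent.append)
bundler.receive(ChannelCommand("write", b"a", chained=True))
bundler.receive(ChannelCommand("read", chained=True))
bundler.receive(ChannelCommand("write", b"b", chained=False))
```

After the three commands above, the link has carried a single bundle containing the data of both write commands, rather than one transmission per write.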
In accordance with yet another aspect of the invention, there is provided host remote mirroring software for permitting a system operator or host application program to monitor and control remote mirroring, migration, and recovery operations. The host remote mirroring software provides the capability of changing the configuration of the remotely mirrored volumes in the data processing system, suspending and resuming remote mirroring for specified remotely mirrored volume pairs, synchronizing specified remotely mirrored volume pairs and notifying the system operator or host application program when synchronization is achieved, invalidating or validating specified remotely mirrored volume pairs, and controlling or limiting the direction of data transfer between the volumes in a specified remotely mirrored pair.
The present invention therefore provides a data storage system which achieves nearly 100 percent data integrity by assuring that all data is copied to a geographically remote site, and in those cases when a back-up copy is not made due to an error of any sort, an indication is stored that the data has not been copied, but instead must be updated at a future time. The system operator or application programmer is free to choose a variety of remote mirroring and recovery operations best suited for a desired processing speed and level of data integrity.
Such a system is provided which is generally lower in cost and requires substantially less manpower and facilities to achieve than the prior art devices.