1. Field of the Invention
This invention generally relates to backup systems for computer storage devices and more particularly to a method and apparatus for performing concurrent backups in a computer system with geographically remote redundant computer storage devices.
2. Description of Related Art
Maintaining the integrity of data in computer storage devices has been and continues to be an important area of computer development. Systems today generally maintain integrity by using redundant storage devices or by using periodic backup procedures that transfer data onto removable media. Many systems incorporate both redundancy and periodic backup procedures to benefit from the known advantages of each and to minimize the effect of the disadvantages of each.
There are several ways to implement redundancy that have a variety of names. Generally, however, the popular methods are known as RAID (Redundant Array of Independent Disks) methods that are further defined by different levels. These levels extend from a RAID-1 level in which one data storage device mirrors the data in another data storage device to striping in accordance with RAID-0 procedures and to variants of redundant storage of data and parity information in accordance with RAID-3 through RAID-5 procedures. These systems are all characterized by performing the corresponding redundant operation concurrently with the execution of application programs in the main system.
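As an illustration of redundant storage of data and parity information, the following is a minimal sketch of XOR parity of the general kind used in parity-based RAID levels. The block contents and the helper name xor_blocks are hypothetical and chosen only for illustration; actual RAID implementations differ in block size, layout, and parity placement.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per (hypothetical) data storage device.
data = [b"ABCD", b"EFGH", b"IJKL"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate the failure of the second device: XOR of the surviving
# data blocks and the parity block yields the missing block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, any single missing block can be rebuilt from the remaining blocks and the parity block. The sketch recovers current contents only; it keeps no historical copy of the data.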
RAID procedures are particularly useful in preventing the loss of data due to hardware failures. When a particular disk storage device fails, the data either resides on or can be reconstructed from data on other disk storage devices. However, if an event occurs, such as major damage caused by fire or the like, or if an application program corrupts data, it is not possible to reconstruct the data as it existed prior to the event because redundant systems generally do not save information on an historical basis. Tape backup systems, which now also include optical disks and other media, provide a method of moving data offsite to avoid destruction by a major physical catastrophe. They also provide an historical record because each backup generally seeks to obtain a snapshot of the entire data storage system at a particular point in time. However, tape backups must be scheduled and are not made continuously.
Combining both redundancy and external backups provides the potential for achieving all the advantages of the individual integrity systems and eliminating many of the disadvantages of both. However, the needs of such a system have become more difficult to satisfy in recent years. For example, demands on the use or availability of the data storage devices for applications programs have increased. The size of those data storage devices has increased from capacities measured in gigabytes (10^9 bytes) to terabytes (10^12 bytes). In computer systems with a single data storage facility, data storage devices in the facility or some portion of them are taken out of service during the backup operation. In many systems the time for such backups cannot be tolerated by the applications running on the system. Several systems that have been proposed for providing concurrent backups while avoiding these problems are disclosed in the following United States Letters Patent:
U.S. Pat. No. 5,212,784 (1993) Sparks
U.S. Pat. No. 5,241,668 (1993) Eastridge et al.
U.S. Pat. No. 5,241,670 (1993) Eastridge et al.
U.S. Pat. No. 5,473,776 (1995) Nosaki et al.
U.S. Pat. No. 5,212,784 to Sparks discloses an automated concurrent data backup system in which a Central Processing Unit (CPU) transfers data to and from storage devices through a primary controller. The primary controller connects through first and second independent buses to first and second mirrored storage devices respectively (i.e., a primary, or mirrored, data storage device and a secondary, or mirroring, data storage device). A backup controller and device connect to the secondary storage device through its bus. Normally the primary controller writes data to both the primary and secondary data storage devices. The CPU initiates a backup through the primary controller. In response the primary controller then writes only to the primary data storage device and enables the backup controller to take control of the second bus and transfer data from the secondary data storage device to the backup media. After a backup operation is completed, the primary controller resynchronizes the storage devices by copying to the secondary data storage device any changes that occurred to the primary data storage device while the backup operation was underway. Examples are also disclosed in which the primary controller connects to three and four storage devices that enable the system to operate with redundancy by mirroring two storage devices while the backup occurs with a third storage device.
U.S. Pat. Nos. 5,241,668 and 5,241,670 to Eastridge et al. disclose different aspects of concurrent backup procedures. In both systems a request for a backup copy designates a portion of the stored data called a data set. For example, if the data storage devices contain a plurality of discrete data bases, a data set could include files associated with a corresponding data base. In a normal operation, the application program is suspended to allow the generation of an address concordance for the designated data sets. Execution of the application program then resumes. A resource manager is established to manage all input and output functions between the storage sub-systems and associated memory and temporary memory. The backup copy is formed on a scheduled and opportunistic basis by copying the designated data sets from the storage sub-systems and updating the address concordance in response to the copying. Application updates are processed during formation of the backup copy by buffering the updates, copying the affected uncopied designated data sets to a storage sub-system memory, updating the address concordance in response to the copying, and processing the updates. The designated data sets can also copy to the temporary storage memory if the number of designated data sets exceeds some threshold. The designated sets are also copied to an alternate memory from the storage sub-system, storage sub-system memory and temporary host memory utilizing the resource manager and the altered address concordance to create a specified order backup copy of the designated data sub-sets from the copied portions of the designated sub-sets without user intervention.
If an abnormal event occurs requiring termination of the backup, a status indication is entered into activity tables associated with the plurality of storage sub-systems and devices in response to the initiation of the backup session. If an external condition exists that requires the backup to be interrupted, the backup copy session terminates and indications within the activity tables are reviewed to determine the status of the backup if a reset notification is raised by a storage sub-system. This enables the track extents which are active for a volume associated with a particular session to be determined. A comparison is then made between the track extents which are active and the volume and track extent information associated with a physical session identification. If a match exists between the track extents which are active and the volume and track extent information associated with a physical session identification, the backup session resumes. If the match does not exist, the backup terminates.
U.S. Pat. No. 5,473,776 to Nosaki et al. discloses a concurrent backup operation in a computer system having a central processing unit and a multiple memory constituted by a plurality of memory devices for on-line storing data processed by tasks of the central processing unit.
A data backup memory is provided for saving data of the multiple memory. The central processing unit performs parallel processing of user tasks and a maintenance task. The user tasks include those that write currently processed data into the multiple memory. The maintenance task stops any updating of memory devices as a part of the multiple memory and saves the data to a data backup memory.
Each of the foregoing references discloses an approach for performing backup operations concurrently with the execution of applications programs in a computer system. However, in each, the system operates in the environment of a single computer system under common control. For example, in the Sparks patent the CPU connects through a primary controller to the first and second memories and to the backup controller. The Eastridge et al. and the Nosaki et al. patent references disclose systems in which the execution of applications programs is also involved in the backup operation. Further the components required for the backup operation and for maintaining redundancy are all located at a common site in each of the systems.
More recently, redundancy has come to include a concept by which an array of disks at one location (i.e., a local data facility at a local site) is mirrored by a second array of disks at a remote location (i.e., a remote data facility at a remote site). The remote site may be in a common building with the local site or up to hundreds of miles away from the local site. None of the foregoing systems suggests a viable solution for providing data integrity by combining redundancy and physical tape backup in such systems, particularly given the apparent dependence of each of those systems on operations within the CPU that is performing applications programs.
Therefore it is an object of this invention to provide a computer system that enables redundant storage at a remote data facility and incorporates a provision for backup into an independent media at that remote data facility.
Another object of this invention is to provide a system adapted to provide backup in a remote data facility that provides a point in time backup without interfering with the operations on a data processing system at a local site.
Still another object of this invention is to provide a method and apparatus for backing up data in a remote data facility that is fully transparent to operations at a local site.
In accordance with one aspect of this invention it is possible to produce a point-in-time backup of data in a data processing system having a host computer and a first data storage facility that stores data at predetermined locations in data blocks, a second data storage facility and a data backup facility. During a normal operating mode the second data storage facility mirrors the first data storage facility in response to a copy program. The copy program is disabled thereby isolating the second data storage facility from the first data storage facility while enabling the first data processing system to continue its operations with the first data storage facility. This allows the backup of the data in the second data storage facility onto the backup facility. While the backup is proceeding, a recording takes place at the first data processing system to identify each data block in the first data storage facility that changes as a result of the operation of the data processing system. Upon completion of the backup operation, the copy program is enabled to copy data blocks from the first data storage facility to the second data storage facility corresponding to the recorded identifications thereby reestablishing the second data storage facility as a mirror of the first data storage facility.
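The sequence described in this aspect (mirror, isolate, record changed blocks, back up, resynchronize) can be sketched in a few lines. The following is an illustrative in-memory model only, not the disclosed implementation. The class MirroredPair, its method names, and the representation of the storage facilities as Python lists are all assumptions made for the sketch.

```python
class MirroredPair:
    """Toy model of a first facility mirrored by a second facility."""

    def __init__(self, num_blocks):
        self.first = [b""] * num_blocks   # first data storage facility
        self.second = [b""] * num_blocks  # second (mirroring) facility
        self.copy_enabled = True          # state of the copy program
        self.changed = set()              # block numbers changed while isolated

    def write(self, block, data):
        """Host write: always reaches the first facility."""
        self.first[block] = data
        if self.copy_enabled:
            self.second[block] = data     # normal mode: mirror the write
        else:
            self.changed.add(block)       # backup mode: record the change only

    def begin_backup(self):
        """Disable the copy program, isolating the second facility."""
        self.copy_enabled = False

    def run_backup(self):
        """Copy the frozen second facility to the backup medium (a list here)."""
        return list(self.second)

    def end_backup(self):
        """Re-enable copying and transfer only the recorded blocks."""
        for block in sorted(self.changed):
            self.second[block] = self.first[block]
        self.changed.clear()
        self.copy_enabled = True


pair = MirroredPair(4)
pair.write(0, b"a")
pair.write(1, b"b")
pair.begin_backup()
pair.write(1, b"B")        # host keeps working during the backup
pair.write(3, b"d")
tape = pair.run_backup()   # point-in-time image taken at begin_backup()
pair.end_backup()
```

In this model the backup image reflects the data as it stood when the copy program was disabled, while the host writes made during the backup reach the second facility only after the backup completes. Recording block identifications rather than the data itself keeps the resynchronization proportional to the number of changed blocks.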
In accordance with another aspect of this invention, a point-in-time backup of data in a first disk storage facility associated with a data processing system is achieved by providing a backup facility and a second disk storage facility for operating normally as a mirror for the first disk storage facility. A backup operating mode is established whereby the second disk storage facility is isolated from the first disk storage facility. This enables the data processing system and the first disk storage facility to continue normal operations and to record changes to the data in the first disk storage facility, and enables the operation of the backup facility simultaneously with the operation of the data processing system with its first disk storage system to produce a backup of the data in the second disk storage facility. When the backup operation is complete, any data in the first disk storage facility that was altered during the backup operation is copied to the second disk storage facility whereby the second disk storage facility resumes its operation as a mirror for the first disk storage facility.
In accordance with still another aspect of this invention, data redundancy is provided for a first data storage facility in a data processing system by providing a backup facility using a backup medium and a second data storage facility that operates in a first mode for producing on the second data storage facility a redundant copy of the data stored in the first storage facility. A second operating mode is enabled thereby isolating the second data storage facility from the first data storage facility, continuing normal operations between the first data storage facility and the data processing system, and transferring data from the isolated second data storage facility to the medium in the backup facility simultaneously with and independently of the operation of the data processing system with the first data storage facility. After completing the backup operation and independently of the data processing system, the first operating mode is reestablished whereby the second data storage facility updates the copy of the data stored therein by transferring data from the first data storage facility changed during the second operating mode.
In accordance with yet another aspect of this invention, backup is provided for data in a first data storage facility in a first data processing system by providing a backup system including a second data storage facility and a backup facility using a backup medium for receiving data from the second data storage facility and a program for effecting a backup operation. A path is established between the first and second data storage facilities to enable the second data storage facility to mirror the first data storage facility. In response to a backup command the backup system interrupts communications over the path between the first and second data storage facilities without disrupting normal operations between the first data storage facility and the data processing system, enables the backup program to transfer data from the isolated second data storage facility to the medium in the backup facility simultaneously with and independently of the operation of the data processing system and, after completion of the backup operation and independently of the data processing system, reestablishes the path between the first and second data storage facilities whereby the second data storage facility is reestablished as a mirror for the first data storage facility.