1. Field of the Invention
This invention generally relates to backup systems for computer storage devices and more particularly to a method and apparatus for performing concurrent backups in a computer system with geographically remote redundant computer storage devices.
2. Description of Related Art
Maintaining the integrity of data in computer storage devices has been and continues to be an important area of computer development. Systems today generally maintain integrity by using redundant storage devices or by using periodic backup procedures that transfer data onto removable media. Many systems incorporate both redundancy and periodic backup procedures to benefit from the known advantages of each and to minimize the effect of the disadvantages of each.
There are several ways to implement redundancy that have a variety of names. Generally, however, the popular methods are known as RAID (Redundant Array of Independent Disks) methods that are further defined by different levels. These levels extend from a RAID-1 level in which one data storage device mirrors the data in another data storage device to striping in accordance with RAID-0 procedures and to variants of redundant storage of data and parity information in accordance with RAID-3 through RAID-5 procedures. These systems are all characterized by performing the corresponding redundant operation concurrently with the execution of application programs in the main system.
RAID procedures are particularly useful in preventing the loss of data due to hardware failures. When a particular disk storage device fails, the data either resides on or can be reconstructed from data on other disk storage devices. However, if an event occurs, such as major damage caused by fire or the like, or if an application program corrupts data, it is not possible to reconstruct the data as it existed prior to the event because redundant systems generally do not save information on an historical basis. Tape backup systems, which now also include optical disks and other media, provide a method of moving data offsite to avoid destruction by a major physical catastrophe. They also provide an historical record because each backup generally seeks to obtain a snapshot of the entire data storage system at a particular point in time. However, tape backups must be scheduled and are not made continuously.
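By way of illustration only, the reconstruction of data from other disk storage devices can be sketched with a simple byte-wise XOR parity scheme of the general kind used in parity-based RAID levels. The function names `xor_blocks` and `reconstruct` and the sample data are assumptions introduced for this sketch and do not come from any of the systems discussed here.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three hypothetical data blocks, each held on a different device.
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)  # held on a fourth device

# The device holding data[1] fails; rebuild its block from the others.
recovered = reconstruct([data[0], data[2]], parity)
```

The sketch also shows the limitation noted above: the parity block always tracks the current data, so a corrupted write is reflected in the parity and the earlier contents cannot be recovered from the redundancy alone.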
Combining both redundancy and external backups provides the potential for achieving all the advantages of the individual integrity systems and eliminating many of the disadvantages of both. However, the needs of such a system have become more difficult to satisfy in recent years. For example, demands on the use or availability of the data storage devices for applications programs have increased. The size of those data storage devices has increased from capacities measured in gigabytes (10^9 bytes) to terabytes (10^12 bytes). In computer systems with a single data storage facility, data storage devices in the facility or some portion of them are taken out of service during the backup operation. In many systems the time for such backups cannot be tolerated by the applications running on the system. Several systems that have been proposed for providing concurrent backups while avoiding these problems are disclosed in the following U.S. Pat. Nos.:
5,212,784 (1993) Sparks
5,241,668 (1993) Eastridge et al.
5,241,670 (1993) Eastridge et al.
5,473,776 (1995) Nosaki et al.
U.S. Pat. No. 5,212,784 to Sparks discloses an automated concurrent data backup system in which a Central Processing Unit (CPU) transfers data to and from storage devices through a primary controller. The primary controller connects through first and second independent buses to first and second mirrored storage devices respectively (i.e., a primary, or mirrored, data storage device and a secondary, or mirroring, data storage device). A backup controller and device connect to the secondary storage device through its bus. Normally the primary controller writes data to both the primary and secondary data storage devices. The CPU initiates a backup through the primary controller. In response, the primary controller then writes only to the primary data storage device and enables the backup controller to take control of the second bus and transfer data from the secondary data storage device to the backup media. After a backup operation is completed, the primary controller resynchronizes the storage devices by applying to the secondary data storage device any changes that occurred to the primary data storage device while the backup operation was underway. Examples are also disclosed in which the primary controller connects to three and four storage devices that enable the system to operate with redundancy by mirroring two storage devices while the backup occurs with a third storage device.
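The sequence described for the Sparks system (mirrored writes, suspension of writes to the secondary device during backup, and later resynchronization) can be sketched as follows. This is a simplified model under stated assumptions and not the patented implementation: the class `MirroredPair`, its method names, and the use of a set of changed block numbers to drive resynchronization are all hypothetical, since the description above does not say how changes are tracked.

```python
class MirroredPair:
    """Toy model of a primary/secondary mirrored pair with a split for backup."""

    def __init__(self, num_blocks):
        self.primary = [b""] * num_blocks
        self.secondary = [b""] * num_blocks
        self.backup_active = False
        self.changed = set()  # assumption: blocks written to primary during a backup

    def write(self, block, data):
        """Write to both devices normally; to the primary only during a backup."""
        self.primary[block] = data
        if self.backup_active:
            self.changed.add(block)
        else:
            self.secondary[block] = data

    def start_backup(self):
        """Stop mirroring so the secondary holds a stable point-in-time image."""
        self.backup_active = True
        self.changed.clear()

    def copy_to_backup(self):
        """Stand-in for the backup controller reading the secondary device."""
        return list(self.secondary)

    def finish_backup(self):
        """Resynchronize: apply changes made to the primary during the backup."""
        for block in self.changed:
            self.secondary[block] = self.primary[block]
        self.changed.clear()
        self.backup_active = False


pair = MirroredPair(4)
pair.write(0, b"old")
pair.start_backup()
pair.write(0, b"new")          # reaches the primary only
backup = pair.copy_to_backup()  # still sees the pre-backup image
pair.finish_backup()
```

In this model the pair is not redundant between `start_backup` and `finish_backup`, which is consistent with the disclosed examples that add a third or fourth storage device to preserve mirroring while the backup runs.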
U.S. Pat. Nos. 5,241,668 and 5,241,670 to Eastridge et al. disclose different aspects of concurrent backup procedures. In both systems a request for a backup copy designates a portion of the stored data called a data set. For example, if the data storage devices contain a plurality of discrete data bases, a data set could include files associated with a corresponding data base. In a normal operation, the application program is suspended to allow the generation of an address concordance for the designated data sets. Execution of the application program then resumes. A resource manager is established to manage all input and output functions between the storage sub-systems and associated memory and temporary memory. The backup copy is formed on a scheduled and opportunistic basis by copying the designated data sets from the storage sub-systems and updating the address concordance in response to the copying. Application updates are processed during formation of the backup copy by buffering the updates, copying the affected uncopied designated data sets to a storage sub-system memory, updating the address concordance in response to the copying, and processing the updates. The designated data sets can also be copied to the temporary storage memory if the number of designated data sets exceeds some threshold. The designated data sets are also copied to an alternate memory from the storage sub-system, storage sub-system memory and temporary host memory utilizing the resource manager and the altered address concordance to create a specified order backup copy of the designated data sub-sets from the copied portions of the designated sub-sets without user intervention.
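The handling of application updates during formation of the backup copy, as summarized above, resembles what is now commonly called copy-before-write. The following sketch illustrates that general idea only. The class `ConcurrentBackupSession`, the `side_file` dictionary standing in for the storage sub-system memory, and the three location labels are assumptions made for illustration; the sketch applies each update immediately rather than buffering it, and it omits the resource manager and the threshold-based spill to temporary memory.

```python
class ConcurrentBackupSession:
    """Toy copy-before-write backup over a dict mapping track -> data."""

    def __init__(self, storage, designated):
        self.storage = storage
        # Address concordance: where the point-in-time image of each track lives.
        self.concordance = {track: "storage" for track in designated}
        self.side_file = {}  # assumption: stands in for storage sub-system memory
        self.backup = {}

    def update(self, track, data):
        """Apply an application update, preserving the old image if still uncopied."""
        if self.concordance.get(track) == "storage":
            self.side_file[track] = self.storage[track]
            self.concordance[track] = "side_file"
        self.storage[track] = data

    def copy_next(self):
        """Copy one uncopied designated track to the backup; return it, or None."""
        for track, location in self.concordance.items():
            if location == "copied":
                continue
            source = self.side_file if location == "side_file" else self.storage
            self.backup[track] = source[track]
            self.side_file.pop(track, None)
            self.concordance[track] = "copied"
            return track
        return None


storage = {0: b"a0", 1: b"b0", 2: b"c0"}
session = ConcurrentBackupSession(storage, designated=[0, 1, 2])
session.copy_next()         # track 0 copied before any update
session.update(1, b"b1")    # track 1 uncopied: old image saved first
session.update(0, b"a1")    # track 0 already copied: update applied directly
while session.copy_next() is not None:
    pass
```

After the loop, the backup holds the image of every designated track as it stood when the session began, while the storage reflects the later updates.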
If an abnormal event occurs requiring termination of the backup, a status indication is entered into activity tables associated with the plurality of storage sub-systems and devices in response to the initiation of the backup session. If an external condition exists that requires the backup to be interrupted, the backup copy session terminates and indications within the activity tables are reviewed to determine the status of the backup if a reset notification is raised by a storage sub-system. This enables the track extents which are active for a volume associated with a particular session to be determined. A comparison is then made between the track extents which are active and the volume and track extent information associated with a physical session identification. If a match exists between the track extents which are active and the volume and track extent information associated with a physical session identification, the backup session resumes. If the match does not exist, the backup terminates.
U.S. Pat. No. 5,473,776 to Nosaki et al. discloses a concurrent backup operation in a computer system having a central processing unit and a multiple memory constituted by a plurality of memory devices for on-line storing data processed by tasks of the central processing unit. A data backup memory is provided for saving data of the multiple memory. The central processing unit performs parallel processing of user tasks and a maintenance task. The user tasks include those that write currently processed data into the multiple memory. The maintenance task stops any updating of memory devices as a part of the multiple memory and saves the data to a data backup memory.
Each of the foregoing references discloses an approach for performing backup operations concurrently with the execution of applications programs in a computer system. However, in each, the system operates in the environment of a single computer system under common control. For example, in the Sparks patent the CPU connects through a primary controller to the first and second memories and to the backup controller. The Eastridge et al. and the Nosaki et al. patent references disclose systems in which the execution of applications programs is also involved in the backup operation. Further, the components required for the backup operation and for maintaining redundancy are all located at a common site in each of the systems.
More recently, redundancy has come to include a concept by which an array of disks at one location (i.e., a local data facility at a local site) is mirrored by a second array of disks at a remote location (i.e., a remote data facility at a remote site). The remote site may be in a common building with the local site or up to hundreds of miles away from the local site. None of the foregoing systems suggests a viable solution for providing data integrity by combining redundancy and physical tape backup in such systems, particularly given the apparent dependence of each of those systems on operations within the CPU that is performing applications programs.