1. Field of the Invention
The present invention relates in general to data backup in networked computer systems and, in particular, to simultaneous data backup with multiple destinations on one or more hosts in data processing systems.
2. Description of the Background Art
Data processing systems, particularly networked computer systems, manipulate large sets of data and typically employ large amounts of data storage. Physical data storage devices are not infallible; therefore, most large data processing systems have provisions to generate and store one or more copies of data sets. Data sets are contained in files used by the computer system. The data sets can contain lists of instructions in a program, contents of a database, portions of the operating system, and the like. It is customary to create multiple copies of important data sets to ensure redundancy and to enhance the ability to recover from a disaster. If a primary data set is damaged or destroyed by a failure of a storage device, operator error, or other causes, the copied version of the data set may be retrieved. A robust data processing system with accessible copies of important data allows the user to quickly recover from these failures with minimal disruption to normal operations.
One known backup method is to periodically copy data from a primary storage pool such as a group of disk drives to a secondary storage pool such as another group of disk drives, tape drives, or optical drives. This method is reasonably effective, but both writing the copied data and retrieving the copied data are typically slow. Data mirroring is also a known method for generating a copy of data. For data mirroring to be effective, the storage devices for both the primary and the copied data should be the same type and format. For example, if a 40 gigabyte SCSI disk drive contains data and is to be efficiently used with data mirroring, then the device for the copied data should also be a 40 gigabyte SCSI disk drive. Contemporary storage systems usually have a large number of different storage device types. Thus the use of mirroring is increasingly constrained as storage systems become more diverse in storage device types. Furthermore, mirroring is usually implemented for storage devices which are in close proximity, so a single disaster can damage both the primary and the mirrored copy. Therefore mirroring is not adequate protection against a loss of data due to a disaster.
Client-server systems have been developed to satisfy many computing needs including management of stored sets of primary and copied data. The client in a client-server system usually generates data sets or modifies existing data sets. The server in a client-server system typically manages backup functions including generating copies of data sets, sending data to a designated storage device, and retrieving data sets. Servers may be configured to be less dependent on having identical storage device types. Typically, a server system contains several storage device types and has a hierarchical storage manager program to manage the storage devices. However, client-server systems are usually connected with a local area network (LAN), and moving large sets of data over the LAN can have a significant undesirable impact on client-server performance. An improvement in the efficiency of creating backup data sets within a client-server system is obtained by copying only those portions of data sets which have been modified by the user.
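By way of illustration only, copying only the modified portions of a data set can be sketched as follows. The sketch assumes fixed-size blocks compared by SHA-256 digest; the block size and the function names block_digests, changed_blocks, and apply_delta are hypothetical and are not drawn from any particular backup product or from the systems described above.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical block size in bytes, kept tiny for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Return one SHA-256 digest per fixed-size block of data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(previous: bytes, current: bytes,
                   block_size: int = BLOCK_SIZE) -> dict:
    """Map block index -> new contents for every block that is new or differs."""
    old = block_digests(previous, block_size)
    new = block_digests(current, block_size)
    delta = {}
    for index, digest in enumerate(new):
        if index >= len(old) or old[index] != digest:
            start = index * block_size
            delta[index] = current[start:start + block_size]
    return delta


def apply_delta(backup: bytes, delta: dict, new_length: int,
                block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the current data set from the old copy plus the changed blocks."""
    block_count = (new_length + block_size - 1) // block_size
    blocks = []
    for index in range(block_count):
        start = index * block_size
        # Prefer the changed block; otherwise reuse the block from the old copy.
        blocks.append(delta.get(index, backup[start:start + block_size]))
    return b"".join(blocks)[:new_length]


previous = b"AAAABBBBCCCC"
current = b"AAAAXXXXCCCCDD"
delta = changed_blocks(previous, current)
restored = apply_delta(previous, delta, len(current))
```

In this sketch only the second block and the newly appended fourth block would be sent over the LAN; the unchanged blocks are reused from the existing copy.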
Storage area networks (SANs) are rapidly becoming a preferred system to manage data storage in network systems. A SAN is a network of storage devices and one or more hosts connected to those storage devices. A typical SAN may be connected with a LAN client-server system, or may be connected in a LAN-free environment. A SAN can be built using a number of technologies, such as a multi-host SCSI chain, SCSI over Fibre Channel, iSCSI, or any other connection technology that meets this definition. LAN-free typically refers to backup/archive operations where the data to be stored is transferred over the SAN directly from the client host system to one or more storage devices. One of the advantages of a SAN is that the transfer of large data sets to and from storage devices is relatively efficient. Using a SAN to move large amounts of data effectively releases LAN resources and leads to better LAN performance. Another advantage of SAN systems is that management of the physical storage devices is simplified. A SAN enables the transfer of data sets in a more direct path from one data storage device to another while consuming fewer network resources. SAN systems can make use of a storage agent to offload some of the routine tasks of data set storage management from a SAN controller or server. A storage agent typically has a limited subset of the functionality of a server or host system.
A disadvantage of existing systems is that a storage agent may not have physical access to all the storage devices required to create and store multiple copies. The physical devices may reside on another SAN or may be directly attached to a server. Therefore, a system is needed which has the ability to offload the backup functions to a storage agent and perform simultaneous backup operations for storage devices connected to other SANs or LANs. Also, a method and system are needed to create multiple copies of data sets during backup or archive operations where the destinations of the storage volumes reside on multiple hosts.