1. Field of the Invention
This invention is related in general to the field of data management systems. In particular, the invention consists of a system for fast replication of multiple data sets using a double allocation process.
2. Description of the Prior Art
Data storage libraries are used for providing cost-effective storage and retrieval of large quantities of data. In a data storage library, data is stored on data storage media. These data storage media may comprise any type of media on which data may be stored, including but not limited to magnetic media (such as magnetic tape or disks), optical media (such as optical tape or disks), electronic media (such as PROM, EEPROM, flash PROM, CompactFlash™, SmartMedia™, Memory Stick™, etc.), or other like media.
Typically, the data stored in a data storage library are segregated into data sets. These data sets may comprise physical data storage devices, such as one or more hard disk drives. Alternatively, the data sets may include virtual storage devices, such as one or more partitions residing on one or more physical hard disk drives. It is customary to make copies of data, i.e., to back up the data, to protect against loss or corruption. The process of backing up data usually requires significant allocation of the data storage library's resources, such as processor capacity and communication bandwidth. A large portion of this resource allocation is dedicated to setting up and managing the transfer of each data set. Because a set-up process is traditionally required for each and every data set to be transferred, the utilization of system resources is compounded when multiple data sets are to be backed up. Accordingly, it is desirable to have a system for making copies of multiple data sets that reduces the cumulative demand for system resources associated with setting up and managing the data transfer process.
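By way of illustration only, the compounding of set-up overhead described above can be sketched with a simple cost model. The function names and the cost figures below are hypothetical and are not taken from any particular data storage library; the sketch assumes that every data set incurs the same fixed set-up cost and the same transfer cost, expressed in arbitrary units.

```python
def total_backup_cost(num_data_sets, setup_cost, transfer_cost):
    """Total cost when each data set pays its own set-up before transfer.

    All costs are in arbitrary, hypothetical units.
    """
    return num_data_sets * (setup_cost + transfer_cost)


def setup_overhead(num_data_sets, setup_cost):
    """Portion of the total cost spent purely on per-data-set set-up."""
    return num_data_sets * setup_cost


if __name__ == "__main__":
    # Assumed figures: 3 units of set-up and 2 units of transfer per data set.
    for n in (1, 10, 100):
        total = total_backup_cost(n, setup_cost=3, transfer_cost=2)
        overhead = setup_overhead(n, setup_cost=3)
        print(f"{n} data sets: total={total}, set-up overhead={overhead}")
```

Under these assumed figures the set-up overhead grows linearly with the number of data sets, which is the cumulative demand that the preceding paragraph identifies as desirable to reduce.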
One approach to improving the replication of data is disclosed by Midgley et al. in U.S. Pat. No. 6,847,984. Here, Midgley teaches a system and method for continuous backup of data stored on a computer network. To this end, Midgley utilizes a synchronization process that replicates selected source data files stored on the network and creates a corresponding set of replicated data files, referred to as target data files, that are stored on a backup server. This produces a baseline data structure of target data files. Additionally, the Midgley invention utilizes a plurality of agents, each of which monitors a portion of the source data files to detect and capture changes to the source data files. However, the invention, as disclosed by Midgley, is a process for mirroring data from the source to the target and does not address reducing the system requirements (overhead) necessary to initiate and manage transfers of complete data sets. In fact, because Midgley's invention captures changes to the source data set at the byte level, the number of communication sessions initiated to transfer data to the target data set is much higher than that envisioned by the instant invention.
Another approach to the replication of data sets is disclosed by Briam et al. in U.S. Pat. No. 6,775,676. Here, Briam teaches deferring dataset creation by first creating database objects at a computer connected to a data storage device. Initially, a command to create a database object is received. Next, a database object definition for the database object is recorded. When the database object is accessed, a dataset for the database object is created from its database object definition. However, as with Midgley, Briam does not address reducing the overhead required to establish communication channels and manage the transfer of multiple data sets.
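The deferred-creation sequence summarized above (receive a create command, record the definition, create the dataset upon first access) can be illustrated by the following minimal sketch. It is not Briam's implementation; the class name, method names, and the dictionary used to stand in for a dataset are all hypothetical and are chosen only to show the general idea of lazy creation.

```python
class DeferredCatalog:
    """Hypothetical catalog that defers dataset creation until first access."""

    def __init__(self):
        self._definitions = {}  # object name -> recorded definition
        self._datasets = {}     # object name -> dataset, built on first access

    def create_object(self, name, definition):
        """Handle a 'create' command by recording the definition only."""
        self._definitions[name] = definition

    def is_materialized(self, name):
        """Report whether a dataset has actually been created for the object."""
        return name in self._datasets

    def access(self, name):
        """Return the dataset, creating it from the definition if needed."""
        if name not in self._datasets:
            definition = self._definitions[name]
            # A plain dict stands in for a real dataset (an assumption).
            self._datasets[name] = {"definition": definition, "rows": []}
        return self._datasets[name]


if __name__ == "__main__":
    catalog = DeferredCatalog()
    catalog.create_object("orders", {"columns": ["id", "amount"]})
    print(catalog.is_materialized("orders"))  # no dataset exists yet
    catalog.access("orders")
    print(catalog.is_materialized("orders"))  # dataset created on first access
```

In this sketch the cost of creating a dataset is postponed, not eliminated, and nothing in it reduces the per-data-set overhead of establishing a transfer, which is consistent with the limitation noted above.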
Yet another approach to data replication is explored by Buckingham in U.S. Pat. No. 6,833,970. Here, Buckingham discloses a data reader that reads a medium holding user data and non-user data, the non-user data including information relating to the user data. The reader includes a read head that generates a data signal comprising user and non-user data. The user data is arranged into plural sets interspersed with the non-user data that identifies the user data within the sets. Processing circuitry receives and processes the data signal and obtains the user data from the data signal by using the non-user data to identify the user data within the data signal. While Buckingham teaches reading both data and metadata without relying on separation markers placed on the data storage medium, Buckingham also does not teach reducing the processor and communication system overhead when copying multiple data sets. Accordingly, it is desirable to have a system for replicating multiple data sets while reducing the system requirements for initiating multiple communication sessions.