1. Field of the Invention
The present invention relates to computer data management and, more specifically, to a system and method that create a snapshot copy of data from a source disk to a virtual tape implemented with a destination disk.
2. Discussion of Related Art
Conventionally, computer systems that process and maintain large amounts of data include different types of mass data storage devices. Two examples of such mass data storage devices are (1) direct access storage devices, such as disk devices; and (2) sequential access storage devices, such as tape devices. To improve reliability and maintainability, data stored on a disk device is backed up (i.e., copied) to tape devices at regular intervals. In this fashion, if the disk device fails, the tape copy may be used to restore the data to the image it had at the time of the backup.
One technique for backing up data is a “virtual tape” technique, in which a disk device is used to simulate, or emulate, a tape device. In short, this technique scans a source disk for data to be backed up, location by location, and sequentially writes the scanned data to the virtual tape created on a destination disk. As such, the technique stores data on the destination disk as if the data were being stored on a physical tape. In particular, data stored on the virtual tape disk includes tape overhead information (e.g., tape header and trailer labels) in addition to the data itself. The data stored in the virtual tape is also arranged as a series of sequentially organized tape records. As data is read from the source disk, this technique creates a corresponding tape record data structure and stores it onto the virtual tape disk. Access to the data stored on the virtual tape disk is also sequential, just as it is with real tape. An advantage of this technique is that it is simple to copy the data stored on the virtual tape to physical tapes; moreover, the same backup tools used with real tapes can be used with virtual tapes.
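The virtual tape layout described above can be illustrated with a minimal Python sketch. The 16-byte label size, the `HDR:`/`EOT:` label prefixes, the 4-byte record length prefix, and the function names are all assumptions made for illustration; the text does not specify any on-disk format.

```python
LABEL_SIZE = 16    # hypothetical fixed size of a header or trailer label, in bytes
LENGTH_PREFIX = 4  # hypothetical per-record length field, in bytes


def make_label(kind: bytes, name: str) -> bytes:
    """Build a fixed-size label (illustrative format only)."""
    return (kind + name.encode("ascii")).ljust(LABEL_SIZE, b" ")[:LABEL_SIZE]


def write_virtual_tape(source_blocks, name):
    """Scan the source location by location and append tape records in order."""
    tape = bytearray(make_label(b"HDR:", name))           # tape header label
    for block in source_blocks:                           # sequential scan of the source
        tape += len(block).to_bytes(LENGTH_PREFIX, "big") # record overhead
        tape += block                                     # record data
    tape += make_label(b"EOT:", name)                     # tape trailer label
    return bytes(tape)


def read_virtual_tape(image):
    """Read the records back strictly sequentially, as a tape reader would."""
    records = []
    pos, end = LABEL_SIZE, len(image) - LABEL_SIZE
    while pos < end:
        length = int.from_bytes(image[pos:pos + LENGTH_PREFIX], "big")
        pos += LENGTH_PREFIX
        records.append(image[pos:pos + length])
        pos += length
    return records
```

In this sketch the destination image carries both the data and the tape overhead, so a tool that understands the tape format could read it without knowing that it resides on a disk.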
When virtual tapes are copied to a physical tape, more than one virtual tape can fit on a single physical tape, thereby using tapes more efficiently. In addition, restore operations (i.e., copying data from the virtual tape disk back to the source device) can be completely automated, since no human intervention is required when using the virtual tape technique, whereas a restore operation from physical tapes requires human intervention (e.g., locating and mounting the correct tapes).
A disadvantage of the above-described technique is that it may require a long backup window period during which the source disk cannot be used by the computer system. That is, while data is being backed up to a tape/virtual tape, the source disk is made inaccessible so that data copied to the destination disk is identical to the data stored on the source disk at one instant in time, i.e., the time at which the backup began. This is needed to maintain data coherency, as the copied data corresponds to a given instant.
Another backup technique, which reduces the length of the backup window, is called a snapshot technique and backs up the data in two stages. During the first stage, an exact duplicate of the disk to be backed up is created on another disk called a snapshot disk. This exact duplicate is called a snapshot copy. This technique creates a disk image of data as it existed when the snapshot copy was initiated, while permitting new updates to the “to be backed up” disk during the creation of the snapshot disk. These updates do not propagate to the snapshot disk. In the second stage, the data stored in the snapshot disk is copied to a tape device. The backup window is shortened because the source disk may be accessed and updated as soon as the snapshot copy operation is initiated.
To permit updates while the snapshot copy is in progress, the snapshot copy is created by the following two processes:
1. A normal data copy process. Data is read, in the order in which it is scanned from the source disk, and copied to the snapshot disk. As part of this process, a data structure created to keep track of which portions of the data have been copied is accessed to determine whether a portion of the data to be copied has already been copied to the snapshot disk. The process does not copy data that has already been copied to the snapshot disk as part of an “out-of-order data copy,” which is described below.
2. Out-of-order data copy. The out-of-order data copy process takes place when an update is requested by the computer during the normal data copy process. More specifically, upon receiving the update request, it is determined whether the update request targets a data location that has already been copied as part of the normal data copy process. If the data has already been copied, the update is performed and the normal data copy process resumes. If the targeted location has not yet been copied as part of the normal data copy process, the data at the location to be updated is first copied to the snapshot disk (out-of-order), and once the copy has been performed, the update to the data proceeds. Subsequently, the data structure is updated to indicate that the location has been copied. In this fashion, when the normal data copy process resumes, it will not copy the “updated” data.
The above-described processes permit updates to the source disk while the snapshot disk is being created, without corrupting the snapshot disk with the updated data. However, this technique relies on direct (rather than sequential, as in tape devices) access to the disk on which the snapshot copy resides in order to perform the out-of-order copy. The result is therefore an exact image of the source disk, which must still be copied to a virtual tape.
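The two copy processes described above can be sketched as follows. This is a minimal in-memory illustration, not an implementation of any particular product: the class name `SnapshotCopy`, its methods, and the use of Python lists to stand in for the source disk, the snapshot disk, and the tracking data structure are all assumptions.

```python
class SnapshotCopy:
    """In-memory model of a conventional snapshot copy (illustrative only)."""

    def __init__(self, source):
        self.source = source                   # source disk: list of blocks, updated in place
        self.snapshot = [None] * len(source)   # snapshot disk: same layout as the source
        self.copied = [False] * len(source)    # tracking structure: which locations are copied
        self.cursor = 0                        # next location for the normal copy process

    def _copy(self, i):
        self.snapshot[i] = self.source[i]
        self.copied[i] = True

    def step(self):
        """Normal data copy: copy the next uncopied location; return False when done."""
        while self.cursor < len(self.source) and self.copied[self.cursor]:
            self.cursor += 1                   # skip locations copied out of order
        if self.cursor == len(self.source):
            return False
        self._copy(self.cursor)
        self.cursor += 1
        return True

    def write(self, i, data):
        """Update request: preserve the old data first if it is not yet copied."""
        if not self.copied[i]:
            self._copy(i)                      # out-of-order data copy
        self.source[i] = data                  # the update then proceeds
```

For example, if `write(2, ...)` arrives after only location 0 has been copied, the old contents of location 2 are written directly into slot 2 of the snapshot. That direct write into an arbitrary slot is the random access the paragraph above refers to.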
Embodiments of the present invention provide a method and system that overcome the above-described shortcomings of the virtual tape and snapshot methods. In particular, the method of the present invention includes the acts of receiving information identifying a set of data that is to be copied from a first direct access storage device and mapping destination locations in a second direct access storage device for each element of the set. The destination locations are in a sequence emulating a tape copy. The method also includes the act of iterating through the set of data. For each element of the set, the method of the present invention also includes the acts of determining if the element has already been copied to the second direct access storage device, and, if the element has not already been copied, then copying the element to its mapped location in the second direct access storage device.
The method of the present invention may also include the act of, during the iterating act, intercepting a write command to an element that has not yet been copied. If such a write command is intercepted, the method may also copy the element from the first direct access storage device to its mapped location in the second direct access storage device, then execute the write command.
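One way to picture the mapping, iterating, and intercepting acts together is the following Python sketch. It assumes a hypothetical record format with a 4-byte length prefix, models both storage devices in memory, and uses an `on_step` callback to simulate writes arriving during the iteration; none of these names or choices come from the text.

```python
RECORD_HEADER = 4  # hypothetical per-record length prefix, in bytes


def map_destinations(sizes, start=0):
    """Assign each element a destination offset in tape (sequential) order."""
    offsets, pos = [], start
    for size in sizes:
        offsets.append(pos)
        pos += RECORD_HEADER + size
    return offsets, pos  # per-element offsets and total space required


class TapeSnapshot:
    """In-memory model of a snapshot copy written in tape sequence (illustrative)."""

    def __init__(self, source):
        self.source = source                                 # first device: list of elements
        self.offsets, total = map_destinations([len(e) for e in source])
        self.dest = bytearray(total)                         # second device
        self.copied = [False] * len(source)

    def _copy(self, i):
        data, off = self.source[i], self.offsets[i]
        body = off + RECORD_HEADER
        self.dest[off:body] = len(data).to_bytes(RECORD_HEADER, "big")
        self.dest[body:body + len(data)] = data              # lands at its mapped location
        self.copied[i] = True

    def run(self, on_step=None):
        """Iterate through the set, copying each element not yet copied."""
        for i in range(len(self.source)):
            if on_step:
                on_step(i)           # hook simulating concurrent write commands
            if not self.copied[i]:
                self._copy(i)

    def write(self, i, data):
        """Intercepted write: copy the old element first, then execute the write."""
        if not self.copied[i]:
            self._copy(i)
        self.source[i] = data
```

Because every element's destination is fixed before copying begins, an element copied early in response to an intercepted write still lands at its tape-order position, and the destination reads as a sequential tape image when the iteration completes.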
The method of the present invention may retrieve the set of data stored in the first direct access storage device using a first input-output (I/O) access protocol. The first I/O access protocol may be the Enterprise Systems Connection (ESCON) protocol.
The method of the present invention can store the set of data into the second direct access storage device using a second I/O access protocol (e.g., SCSI/FC). The second I/O access protocol can be an Open System protocol.
The method of the present invention may also include the acts of identifying the elements of the set of data and calculating computer memory size information of each of the elements in the first direct access storage device.
In addition, the method of the present invention can also include the act of creating an ordered list in order to extract the size information from the first direct access storage device. In this embodiment, each entry of the ordered list is associated with one of the elements of the set. The method may also include the act of storing into each entry physical block addresses of one or more memory blocks that store the element associated with the entry.
The method of the present invention may also include the act of creating an ordered list in order to extract the size information from the first direct access storage device. In this embodiment, each entry of the ordered list is associated with one of the elements of the set. The method may also include the act of storing into each entry physical cylinder and head (CH) addresses of one or more tracks that store the element associated with the entry.
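The ordered list described in the two preceding paragraphs might be modeled as shown below. This is a hedged sketch: the `Entry` class, the `unit_size` field, and the 512-byte block and 56,664-byte track sizes are assumptions introduced only to show how size information could be derived from the stored addresses.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Address = Union[int, Tuple[int, int]]  # a block number, or a (cylinder, head) pair


@dataclass
class Entry:
    """One ordered-list entry, associated with one element of the set."""
    element: str
    addresses: List[Address]  # physical locations that store the element
    unit_size: int            # assumed bytes per block or per track

    def size_bytes(self) -> int:
        return len(self.addresses) * self.unit_size


def build_ordered_list(allocation, unit_size):
    """Create one entry per element, in the order the elements were identified."""
    return [Entry(name, list(addrs), unit_size) for name, addrs in allocation.items()]


# Block-addressed variant (assumed 512-byte blocks).
by_block = build_ordered_list({"fileA": [10, 11, 12], "fileB": [40]}, unit_size=512)

# Cylinder/head variant (assumed fixed track capacity; real tracks hold variable data).
by_track = build_ordered_list({"dataset1": [(0, 3), (0, 4)]}, unit_size=56664)
```

Summing `size_bytes()` over the list gives an upper bound on the data to be copied under these assumptions.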
The method may also include the act of creating a file system size table. In this embodiment, each entry of the size table includes information relating to at least one of fields, key fields and data field for one or more records on one of a plurality of tracks. The method may also include the act of updating the file system size table each time a format write I/O command is administered to one of the plurality of tracks.
The method may also include the act of calculating a computer memory size required in the second direct access storage device to copy each element of the set from the first direct access storage device. The method may also include the acts of creating a bit array, each bit of the bit array associated with one of the elements of the set and initializing each bit of the bit array to a first state, wherein the first state of each bit designates that the element associated therewith is not yet copied.
The method may also include the act of changing the first state of one of the bits in the bit array when the element associated with that bit has been copied from the first direct access storage device to the second direct access storage device.
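The bit array described above can be sketched in a few lines of Python. The class name `CopyBitmap` and the choice of 0 as the first ("not yet copied") state are assumptions; the text only requires that each bit start in a first state and change once its element has been copied.

```python
class CopyBitmap:
    """One bit per element; 0 (the assumed first state) means not yet copied."""

    def __init__(self, n):
        self.n = n
        self.bits = bytearray((n + 7) // 8)  # every bit initialized to the first state

    def is_copied(self, i):
        return bool(self.bits[i >> 3] & (1 << (i & 7)))

    def mark_copied(self, i):
        """Change the bit's state once element i has been copied."""
        self.bits[i >> 3] |= 1 << (i & 7)

    def all_copied(self):
        return all(self.is_copied(i) for i in range(self.n))
```

A packed bit array keeps the tracking structure small (one byte per eight elements), which matters when it is consulted on every copy step and every intercepted write.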
The present invention also includes a system for creating a snapshot copy of data stored on a first direct access storage device. The system includes means for receiving information identifying a set of data that is to be copied from the first direct access storage device and means for mapping destination locations in a second direct access storage device for each element of the set, wherein the destination locations are in a sequence emulating a tape copy. The system also includes means for iterating through the set of data, which includes means for determining if each element of the set has already been copied to the second direct access storage device, and means for copying the element to its mapped location in the second direct access storage device if the element has not already been copied.
The system of the present invention may also include means for intercepting a write command to an element that has not yet been copied, wherein, if such a write command is intercepted, the element is copied from the first direct access storage device to its mapped location in the second direct access storage device, and the write command is then executed.
The system of the present invention may also include means for retrieving the set of data stored in the first direct access storage device using a first I/O access protocol. The first I/O access protocol can be the Enterprise Systems Connection (ESCON) protocol.
The system of the present invention may also include means for storing the set of data into the second direct access storage device using a second I/O access protocol. The second I/O access protocol can be an Open System protocol.
The system of the present invention may also include means for identifying the elements of the set of data and means for calculating computer memory size information of each of the elements in the first direct access storage device.
The system of the present invention may also include means for creating an ordered list in order to extract the size information from the first direct access storage device. In this embodiment, each entry of the ordered list is associated with one of the elements of the set. The system may also include means for storing into each entry physical block addresses of one or more memory blocks that store the element associated with the entry.
The system of the present invention may also include means for creating an ordered list in order to extract the size information from the first direct access storage device. In this embodiment, each entry of the ordered list may be associated with one of the elements of the set. The system may also include means for storing into each entry physical cylinder and head (CH) addresses of one or more tracks that store the element associated with the entry.
The system of the present invention may also include means for creating a file system size table. In this embodiment, each entry of the size table may include information relating to at least one of fields, key fields and data field for one or more records on one of a plurality of tracks. The system may also include means for updating the file system size table each time a format write I/O command is administered to one of the plurality of tracks.
The system of the present invention may also include means for calculating a computer memory size required in the second direct access storage device to copy each element of the set from the first direct access storage device.
The system of the present invention may also include means for creating a bit array, each bit of the bit array associated with one of the elements of the set, and means for initializing each bit of the bit array to a first state, wherein the first state of each bit designates that the element associated therewith is not yet copied.
The system of the present invention may also include means for changing the first state of one of the bits in the bit array when the element associated with that bit has been copied from the first direct access storage device to the second direct access storage device.