1. Field of the Invention
The present invention relates to a method and system for backing up data from a plurality of disks, and particularly to interleaving the data from the disks during backup, thereby decreasing the time to perform a backup as well as a restore.
2. Related Art
Backing up data from one or more computer disks is typically performed to recover from inadvertent user deletions/overwrites or from disk hardware failure. In the case of inadvertent user deletions/overwrites, only the deleted or overwritten data is copied from the backup to the original disk. In the case of disk hardware failure, the user can restore all files to the original disk from the most recent backup.
In most computer systems, the backup device is a tape drive, which can accommodate large amounts of data at a relatively low cost per byte of storage. However, because a tape drive is fundamentally a sequential access medium, random access or adjusting backward/forward takes significantly longer for the tape drive compared to the disk drive. Therefore, the most efficient way to use a tape drive is to “stream” the data, i.e. ensure that the tape drive does not have to stop until the backup or restore is complete.
Generally, conventional backup methods provide for either file-by-file backup or image backup. In a file-by-file backup, the backup program copies one file at a time from the disk to the tape. Specifically, the program places all pieces of data for each file, irrespective of actual locations on the disk, into a single sequential block that is stored on the tape. Thus, a file-by-file backup can provide an incremental backup (wherein only those files that have changed since the last backup are written to tape), but is extremely time consuming for a full backup.
In an image backup, the data image is read sequentially from the disk and written to the tape. Thus, in prior art systems in which the disk drive is substantially faster than the tape drive, an image backup can keep the tape drive streaming. However, current technology has significantly improved tape drive speed. In fact, in state of the art systems, the tape drive speed is actually equal to or greater than the disk drive speed. In these systems, a single disk drive cannot keep pace with the tape drive, so the tape drive cannot stream, which degrades both backup and restore performance.
Therefore, a need arises for backup and restore operations that can take advantage of technology improvements in tape drive speed.
In accordance with the present invention, the data from a plurality of primary data sources are interleaved and captured in a secondary data source during a backup operation. The interleaving of data allows the overlap of read/write operations performed by the plurality of primary data sources, thereby optimizing the performance of the backup as well as the restore. Typical primary data sources could include disk drives or disk arrays. A typical secondary data source could include a tape drive.
The present invention recognizes the advantages of substantially equal data transfers when most disk drives have substantially the same amount of used bits. Specifically, substantially equal data transfers allow multi-tasking during both backup and restore, thereby improving the performance of those operations. For example, backup performance can be optimized if data from a plurality of disk drives are transferred to the tape drive in parallel. In a similar manner, restore performance can be optimized if data from the tape drive are transferred to the plurality of disk drives in parallel. To provide this optimization in one embodiment, the maximum size of a data block to be analyzed from each disk drive during a backup transfer is determined. The used bits from one or more data blocks of a disk drive are read and written to the tape drive until the amount of captured used bits for that disk drive is equal to or greater than the largest amount of used bits captured from any disk drive up to that point in time. The disk drives can be read in a round robin sequence. When all the used bits from one disk drive are written to the tape drive, that disk drive is eliminated from the sequence. The remaining disk drives are accessed in a modified sequence. This data interleaving continues until all used bits from all disk drives are written to the tape drive.
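The balanced interleaving described above can be illustrated with a minimal in-memory sketch. It is not the claimed implementation: the function name `interleave_balanced`, the modeling of each disk drive as a list of per-block payloads (an empty payload standing for a block with no used data), and the use of byte counts in place of bit counts are all assumptions made for illustration.

```python
def interleave_balanced(disks):
    """Interleave used data from several disks onto one tape (hypothetical model).

    disks: dict mapping a disk name to a list of blocks, where each block is
    the bytes of used data found in that block (b"" if the block is unused).
    Returns the tape as a list of (disk_name, payload) records.
    """
    tape = []
    captured = {name: 0 for name in disks}   # used bytes written so far, per disk
    cursor = {name: 0 for name in disks}     # next block to analyze, per disk
    active = [name for name in disks if disks[name]]

    while active:
        # Round robin over the disks that still have blocks left.
        for name in list(active):
            # Largest amount captured from any disk up to this point.
            target = max(captured.values())
            blocks = disks[name]
            while True:
                payload = blocks[cursor[name]]
                cursor[name] += 1
                if payload:                  # only used data goes to tape
                    tape.append((name, payload))
                    captured[name] += len(payload)
                if cursor[name] == len(blocks):
                    active.remove(name)      # disk finished: drop from sequence
                    break
                if captured[name] >= target:
                    break                    # caught up: yield to the next disk
    return tape


disks = {
    "A": [b"aaaa", b"aa", b""],
    "B": [b"b", b"b", b"bbbb"],
}
for record in interleave_balanced(disks):
    print(record)
```

In this example, disk B reads three consecutive blocks on its first turn because its first two blocks hold too little used data to catch up with disk A.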
During a restore operation, the interleaved data is read from the tape drive and written to the plurality of disk drives. In one embodiment, the interleaved data includes information regarding the original configuration of the data, thereby allowing reconstruction of the original data (i.e. both used and unused bits) for each disk drive.
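As a rough illustration of the restore operation, the sketch below assumes that each tape record carries the name of its disk drive and the byte offset at which its used data originally resided. This record layout, the function name `restore_images`, and the zero-filling of unused regions are assumptions; the text above says only that the interleaved data includes information regarding the original configuration.

```python
def restore_images(tape, disk_sizes):
    """Rebuild per-disk images from interleaved tape records (hypothetical model).

    tape: iterable of (disk_name, offset, payload) records, in tape order.
    disk_sizes: dict mapping a disk name to its original size in bytes.
    Unused regions are left zero-filled in the rebuilt image.
    """
    images = {name: bytearray(size) for name, size in disk_sizes.items()}
    for name, offset, payload in tape:
        # Place the used data back at its original location on its own disk.
        images[name][offset:offset + len(payload)] = payload
    return {name: bytes(image) for name, image in images.items()}


tape = [("A", 0, b"aa"), ("B", 2, b"b"), ("A", 4, b"a")]
print(restore_images(tape, {"A": 5, "B": 3}))
```

Because each record names its destination, records for different disk drives can be dispatched as they come off the tape, which is what would allow the writes to overlap in a real system.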
In another embodiment, a set number of bits to be read from each disk drive during a backup transfer is determined. These bits could include only used bits or could include both used and unused bits. The disk drives are typically read in a round robin sequence. When all the bits from one disk drive are written to the tape drive, that disk drive is eliminated from the sequence. The remaining disk drives are accessed in a modified sequence. This data interleaving continues until all bits from all disk drives are written to the tape drive.
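The fixed-size embodiment above can be sketched as a simple round robin over equal-sized chunks. The function name `interleave_fixed`, the modeling of each disk drive as a single byte string, and the measurement of the set amount in bytes rather than bits are assumptions made for illustration.

```python
def interleave_fixed(disks, chunk_size):
    """Round-robin interleave with a fixed amount per disk per turn.

    disks: dict mapping a disk name to the bytes to be backed up from it.
    chunk_size: the set number of bytes read from a disk on each turn.
    Returns the tape as a list of (disk_name, chunk) records.
    """
    tape = []
    offset = {name: 0 for name in disks}
    active = [name for name in disks if disks[name]]

    while active:
        for name in list(active):
            start = offset[name]
            chunk = disks[name][start:start + chunk_size]
            tape.append((name, chunk))
            offset[name] = start + len(chunk)
            if offset[name] >= len(disks[name]):
                active.remove(name)   # all data written: drop from sequence
    return tape


for record in interleave_fixed({"A": b"aaaaa", "B": b"bb", "C": b"cccc"}, 2):
    print(record)
```

Here disk B leaves the sequence after the first round and disk C after the second, so the final round visits disk A alone.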
The present invention also recognizes the advantages of data transfer based on percentage bandwidth when some disk drives have substantially fewer used bits than other disk drives. In this embodiment, a percentage bandwidth associated with each disk drive can be generated by dividing the amount of used bits from that disk drive by the total number of used bits from all disk drives. The amount of used bits read from each disk drive and written to the tape drive during a transfer is based on the percentage bandwidth. The disk drives can be read in a round robin sequence. When all the used bits from one disk drive are written to the tape drive, that disk drive is eliminated from the sequence. The remaining disk drives are accessed in a modified sequence. This data interleaving continues until all used bits from all disk drives are written to the tape drive. In this weighted interleaved embodiment, each disk drive participates in a transfer in direct proportion to its total used bits. In this manner, disk drives having comparatively large amounts of information are given more data bandwidth, and thus given more opportunity to run at full speed. Therefore, this embodiment can improve the performance of the disk drives having comparatively large amounts of information.
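The weighted embodiment can be illustrated as follows. The sketch assumes a hypothetical per-round budget, `transfer_size`, that is divided among the disk drives according to their percentage bandwidth. The function name `interleave_weighted`, the byte-string disk model, and the minimum quota of one byte (so that a very small disk drive still makes progress) are illustrative assumptions.

```python
def interleave_weighted(disks, transfer_size):
    """Round-robin interleave with per-disk quotas proportional to used data.

    disks: dict mapping a disk name to the bytes of used data on that disk.
    transfer_size: hypothetical total number of bytes moved per full round.
    Returns the tape as a list of (disk_name, chunk) records.
    """
    total = sum(len(data) for data in disks.values())
    # Percentage bandwidth = used / total; quota = that share of one round.
    quota = {
        name: max(1, transfer_size * len(data) // total)
        for name, data in disks.items()
        if data
    }
    tape = []
    offset = {name: 0 for name in quota}
    active = list(quota)

    while active:
        for name in list(active):
            start = offset[name]
            chunk = disks[name][start:start + quota[name]]
            tape.append((name, chunk))
            offset[name] = start + len(chunk)
            if offset[name] >= len(disks[name]):
                active.remove(name)   # all used data written: drop from sequence
    return tape


for record in interleave_weighted({"A": b"a" * 6, "B": b"b" * 2}, 4):
    print(record)
```

With 6 and 2 used bytes, the two disk drives receive quotas of 3 and 1 bytes per round, so both finish in the same round rather than the smaller one dropping out early.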