1. Field of the Invention
The present invention relates generally to data duplication, and in particular to a method and system for duplicating data at a block level from a first storage medium to a second storage medium.
2. Description of the Related Art
The importance of data to businesses and other organizations necessitates ensuring the preservation of the data. One way to preserve data is to back it up to multiple, separate backup storage media. Typically, when data is written from one storage medium to another, it is written on a file-by-file basis. Writing on a file-by-file basis may result in a slow duplication process.
The duplication of data may be time-consuming due to factors relating to the data being replicated and how it is stored on a source storage medium. One of these factors is reading from or writing to a file system that is fragmented. Another factor that may impact performance is the size of the metadata in relation to the size of the data. A third factor impacting performance is the density of images stored in blocks on the source storage medium. One or more of these factors may contribute to a slow duplication operation as compared to the peak read and write speeds of the backup system's storage media. In some cases, when writing data on a file-by-file basis, metadata may be written to a catalog of metadata after each file is written to the second storage medium. When many files are being duplicated, updating the catalog metadata after copying each file may be a bottleneck for the duplication operation.
Other factors may also cause delays when duplicating data, such that data is not read from the source device fast enough to keep the target device supplied. For example, a tape drive may be the target storage device during a duplication operation, and writing to a tape drive is typically performed at a constant speed. If data is not being read fast enough from the source device to fill up the write buffer of the target tape drive, the target tape drive may run out of data to write. When this happens, the tape will still be moving forward at a fixed speed. The tape drive will then need to return to the place on the tape where it ran out of data. This type of operation is called backstitching, and it can add to the overhead of duplicating data from a first storage device to a second storage device. Backstitching is not limited to tape drives; backstitching or other similar backtracking movements can occur for other types of storage devices (e.g., disk drives, optical drives).
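By way of illustration only, the buffer starvation described above can be modeled with a short sketch. The function name `count_backstitches`, its parameters, and the per-interval model (a fixed number of blocks written per interval, with the drive stalling whenever the buffer holds fewer blocks than that) are assumptions introduced for this example and do not describe the behavior of any particular tape drive.

```python
def count_backstitches(read_rates, write_rate, buffer_capacity):
    """Count intervals in which the target drive runs out of data.

    read_rates      -- blocks delivered by the source in each interval
    write_rate      -- blocks the target drive writes per interval (constant)
    buffer_capacity -- maximum number of blocks the write buffer can hold

    Hypothetical model: if the buffer holds fewer than write_rate blocks,
    the drive cannot stream for that interval and must reposition later.
    """
    buffered = 0
    backstitches = 0
    for produced in read_rates:
        # The source fills the write buffer, up to its capacity.
        buffered = min(buffer_capacity, buffered + produced)
        if buffered >= write_rate:
            # Enough data is available: the drive streams at constant speed.
            buffered -= write_rate
        else:
            # The drive ran out of data; count one backstitch and keep
            # the partial data in the buffer for the next interval.
            backstitches += 1
    return backstitches
```

In this model, a source that delivers data at least as fast as the constant write rate in every interval produces no backstitches, while each interval of slow reads that drains the buffer produces one.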
In many cases, the data selected for duplication may be a plurality of images. A typical method for duplicating a plurality of images is to locate the first image on the first storage device, read the image, and then copy it to a second storage device. Then, the typical method would move to the next image, read the image, and then copy it to the second storage device. This method would continue for all of the images selected for duplication. This traditional method may require repositioning after reading each image. As the number of images selected for duplication increases, this repositioning can waste time and increase the inefficiency and overhead of the duplication operation.
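The repositioning cost described above can be illustrated with a simplified sketch. The representation of each image as a hypothetical `(start_block, block_count)` pair, and both function names, are assumptions made for this example. The sketch counts one reposition whenever the read position is not already at the start of the next image.

```python
def repositions_image_by_image(images):
    """Count repositions when images are read one at a time.

    images -- list of (start_block, block_count) pairs, in the order
              in which the images were selected for duplication.
    """
    head = 0   # current read position, in blocks (assumed to start at 0)
    seeks = 0
    for start, count in images:
        if head != start:
            seeks += 1          # must reposition to the start of this image
        head = start + count    # position after reading the whole image
    return seeks


def repositions_sequential_pass(images):
    """Count repositions for one sequential pass over the medium.

    A single pass from the lowest occupied block onward needs at most
    one initial positioning, regardless of the number of images.
    """
    first = min(start for start, _ in images)
    return 0 if first == 0 else 1
```

For four images selected in the order (40, 10), (0, 10), (20, 10), (10, 10), this model counts four repositions for the image-by-image approach and none for a single sequential pass. The sequential pass also reads any blocks lying between images, which this sketch does not account for.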
A faster method of duplicating data may be to copy data block-by-block at a raw level from one medium to another medium. However, typical block level replication techniques do not include updating a catalog of metadata with metadata describing the new copies of data, such that a backup application may be aware of or be able to access the new copies for restoration operations. It would be desirable to perform a block level replication of data from a primary storage medium to a secondary storage medium while maintaining the ability to access the data on the secondary storage medium. A user may wish to retrieve one or more individual files from the backup storage media, and a backup application may then have the flexibility to choose from the primary and/or the secondary storage medium.
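As a minimal sketch of the goal stated above, and not a description of any particular embodiment, the following example copies raw blocks between two file-like objects and then records the new copy in a catalog. The catalog layout (a dictionary mapping an image name to `offset`, `length`, and `copies`), the block size, and all function names are assumptions introduced for illustration. The sketch further assumes that a block level copy preserves each image's byte offset, so the existing offsets remain valid on the secondary medium.

```python
import io

BLOCK_SIZE = 4096  # assumed block size, in bytes


def duplicate_blocks(source, target, block_size=BLOCK_SIZE):
    """Copy raw blocks sequentially from source to target.

    Returns the number of bytes copied. No file boundaries are
    inspected; the data is treated as an opaque run of blocks.
    """
    copied = 0
    while True:
        block = source.read(block_size)
        if not block:
            break
        target.write(block)
        copied += len(block)
    return copied


def register_copy(catalog, medium_id):
    """Record medium_id as an additional location for every image.

    The catalog is updated once, after the block level copy completes,
    rather than after each individual file.
    """
    for entry in catalog.values():
        if medium_id not in entry["copies"]:
            entry["copies"].append(medium_id)


def read_image(medium, entry):
    """Retrieve a single image from a medium using its catalog entry."""
    medium.seek(entry["offset"])
    return medium.read(entry["length"])
```

With the catalog updated in this way, a restore operation could locate an individual image on either medium by consulting the `copies` list and calling `read_image` on the chosen medium.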
In view of the above, improved methods and mechanisms for performing block level data duplication while maintaining up-to-date catalog metadata are desired.