Backup storage media, such as tape or disk, are often replicated for retention and security purposes. It is well known that storage media are subject to corruption, deterioration or loss. Therefore, rather than relying on a single copy of the backed-up data, or “volume,” on the storage media, another copy of the backup volume is often created and stored in a secure remote location. One process for creating a copy of the original backup volume is called replication or cloning. For example, the original “source” backup volume may be replicated to create a clone on another “destination” storage medium. At present, tapes are the most prevalent media used for storing backup volumes because of their perceived low cost and stability. However, cloning tape backup media is a slow and resource-intensive process due to the way backup data is written to tape. The cloning process is similarly slow for file system devices that store backup data on disk but emulate tape backup.
A. Backup to Tape and File System Devices
The backup process typically involves a backup and recovery application (“BURA”) server connected over a network to a number of backup clients, such as desktop computers, servers or networked storage devices. One will appreciate that a BURA software application that manages the backup and recovery process may reside on a dedicated BURA server. During the backup process, the BURA software application may create one or more databases and write them to the storage media. For example, the BURA may create a volume header file, or “volume label,” that describes the data volume written to the storage media. To keep track of the backup media itself, the BURA may also create and store a media database with information describing the storage media, such as whether it is tape or disk, how the data is organized on the media, and/or whether the storage media holds an original source backup volume or a clone. This media database may be stored on the dedicated BURA server.
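As a rough illustration, a media database entry of the kind described above might be modeled as follows. This is a minimal sketch under stated assumptions: the `MediaRecord` fields, the `register` helper and the volume labels are hypothetical and do not reflect the schema of any particular BURA product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MediaType(Enum):
    TAPE = "tape"
    DISK = "disk"  # e.g. a file system or file type device


@dataclass
class MediaRecord:
    """Hypothetical media database entry describing one backup volume."""
    volume_label: str                   # matches the volume label written to the media
    media_type: MediaType               # tape or disk
    layout: str                         # how data is organized, e.g. "chunked"
    is_clone: bool                      # False for an original source backup volume
    source_label: Optional[str] = None  # label of the source volume, if this is a clone


# Hypothetical in-memory media database kept on the BURA server, keyed by volume label.
media_db: dict[str, MediaRecord] = {}


def register(record: MediaRecord) -> None:
    """Add or replace the entry for a volume."""
    media_db[record.volume_label] = record


register(MediaRecord("VOL001", MediaType.TAPE, "chunked", is_clone=False))
register(MediaRecord("VOL001-C1", MediaType.DISK, "chunked", is_clone=True,
                     source_label="VOL001"))
```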
During the backup process, data is transmitted from backup clients to the BURA server as save sets. A save set may comprise the file system and file system data from a single backup client. Save sets are temporarily stored in one or more memory buffers in the BURA server before being written to the storage media. Since save sets may be quite large, the BURA may package or segment each save set into one or more data “chunks.” Without this segmentation, a network may be overwhelmed by the volume of data being transferred; transmitting save sets as chunks helps manage network traffic.
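The segmentation step can be sketched as splitting a save set's byte stream into fixed-size chunks. The fixed chunk size and the `segment_save_set` name are assumptions for illustration; an actual BURA may choose chunk boundaries differently.

```python
from typing import Iterator


def segment_save_set(data: bytes, chunk_size: int) -> Iterator[bytes]:
    """Split a save set into consecutive chunks of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


save_set = b"x" * 10_000
chunks = list(segment_save_set(save_set, 4096))
# Three chunks of 4096, 4096 and 1808 bytes; joined, they reproduce the save set.
```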
Once a save set has been divided into chunks, these chunks are transmitted over the network to the BURA server. Chunks from different save sets may reach the BURA server at different times. In order to keep track of each chunk, the BURA will associate the chunk with information that identifies the chunk's corresponding save set. This information may be written to the BURA's media database. The information may also be written as a “chunk header” that typically precedes its respective chunk on the storage media. As a result, chunks are separated on the storage media by respective chunk headers.
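A minimal sketch of how chunk headers might precede their chunks on the media is shown below, assuming a hypothetical fixed-size header that records the save-set ID, the chunk length and the byte offset of the chunk data. The header layout and the `write_chunk` helper are illustrative only.

```python
import struct

# Hypothetical chunk header layout (not taken from any particular BURA product):
# save-set ID (uint32), chunk length (uint32), byte offset of the chunk data (uint64).
CHUNK_HEADER = struct.Struct("<IIQ")


def write_chunk(media: bytearray, save_set_id: int, chunk: bytes) -> None:
    """Append a chunk header, then the chunk it describes, to the media image."""
    data_offset = len(media) + CHUNK_HEADER.size  # where the chunk data will start
    media.extend(CHUNK_HEADER.pack(save_set_id, len(chunk), data_offset))
    media.extend(chunk)


# Chunks from two save sets arriving interleaved at the BURA server.
media = bytearray()
for save_set_id, chunk in [(1, b"AAAA"), (2, b"BB"), (1, b"AAA")]:
    write_chunk(media, save_set_id, chunk)
```

Because each header carries its save-set ID, chunks from save sets 1 and 2 can be written in arrival order and still be attributed to the correct save set when read back.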
FIG. 1 illustrates how a portion, or “tape block” 101, may appear after backup to tape storage media. As shown in FIG. 1, chunk data from different save sets may be written to the same tape block 101. Block header 105 may contain metadata about all of the data in its tape block. Chunk header 111 may describe the physical location, save set information and other information about chunk data 115. Similarly, chunk header 121 may describe the physical location, save set information and other information about chunk data 125. Chunk data 115 and chunk data 125 may originate from different save sets, but each is preceded by, and separated from the other by, its own chunk header.
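The block layout of FIG. 1 can be sketched as a block header followed by alternating chunk headers and chunk data. In this hypothetical sketch, the block header holds only a chunk count and each chunk header holds only a save-set ID and a length; real block and chunk headers would carry more metadata, such as physical location. The save-set IDs 7 and 9, and the payloads named after chunk data 115 and 125, are invented for the example.

```python
import struct
from collections import defaultdict

# Hypothetical layouts, assumed for illustration:
#   block header: number of chunks in the block (uint32)
#   chunk header: save-set ID (uint32), chunk length (uint32)
BLOCK_HEADER = struct.Struct("<I")
CHUNK_HEADER = struct.Struct("<II")


def build_block(chunks: list[tuple[int, bytes]]) -> bytes:
    """Lay out a block: block header, then each chunk header followed by its data."""
    out = bytearray(BLOCK_HEADER.pack(len(chunks)))
    for save_set_id, data in chunks:
        out += CHUNK_HEADER.pack(save_set_id, len(data))
        out += data
    return bytes(out)


def parse_block(block: bytes) -> dict[int, list[bytes]]:
    """Walk a block, using each chunk header to locate and attribute its chunk."""
    (count,) = BLOCK_HEADER.unpack_from(block, 0)
    pos = BLOCK_HEADER.size
    by_save_set = defaultdict(list)
    for _ in range(count):
        save_set_id, length = CHUNK_HEADER.unpack_from(block, pos)
        pos += CHUNK_HEADER.size
        by_save_set[save_set_id].append(block[pos:pos + length])
        pos += length
    return dict(by_save_set)


block = build_block([(7, b"chunk-115"), (9, b"chunk-125")])
print(parse_block(block))  # {7: [b'chunk-115'], 9: [b'chunk-125']}
```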
Disk media, such as a file system or file type device, is increasingly favored over tape media. Disk media can store more data and can be read faster than tape. Nevertheless, backup to disk media often follows the same save set chunking, buffering and recording steps used during backup to tape. As a result, the backup data on a file system device may resemble data backed up to tape: the file system device stores backup data as blocks of chunks separated by chunk headers. As will be discussed further below, the problems with cloning file system device backups are therefore similar to the problems with cloning tape backups.
B. Cloning Tape and/or File System Device Volumes
As previously discussed, the purpose of replicating or cloning backup storage media is to create an exact copy of all of the data on the storage media, also known as the “volume.” However, cloning a tape storage volume is not a trivial process. Cloning data that is organized by chunks and chunk headers requires reading each chunk header to determine information about its respective chunk, extracting the chunk data, and then recording the chunk data with a new chunk header onto the new “destination” backup device. The new chunk header will contain information about the physical location of the chunk on the destination backup device. In addition, the media database on the BURA server that stores information for the destination backup device may require updating. As a result, cloning tape media is tedious, slow, and akin to a complete recovery and write process.
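The chunk-by-chunk cloning procedure described above can be sketched as follows, assuming a hypothetical chunk header that stores a save-set ID, a chunk length and the physical byte offset of the chunk data on its own volume. The volume labels, offsets and helper names (`iter_chunks`, `append_chunk`, `clone_volume`) are illustrative assumptions, and the media database update is omitted.

```python
import struct
from typing import Iterator, Tuple

# Hypothetical chunk header: save-set ID (uint32), length (uint32), data offset (uint64).
CHUNK_HEADER = struct.Struct("<IIQ")


def iter_chunks(volume: bytes, start: int = 0) -> Iterator[Tuple[int, bytes]]:
    """Read each chunk header in turn and yield (save_set_id, chunk_data)."""
    pos = start
    while pos < len(volume):
        save_set_id, length, _offset = CHUNK_HEADER.unpack_from(volume, pos)
        pos += CHUNK_HEADER.size
        yield save_set_id, volume[pos:pos + length]
        pos += length


def append_chunk(volume: bytearray, save_set_id: int, data: bytes) -> None:
    """Write a new chunk header recording where the data lands on this volume."""
    data_offset = len(volume) + CHUNK_HEADER.size
    volume.extend(CHUNK_HEADER.pack(save_set_id, len(data), data_offset))
    volume.extend(data)


def clone_volume(source: bytes, source_start: int, dest_label: bytes) -> bytearray:
    """Extract every chunk from the source and rewrite it with a fresh header."""
    dest = bytearray(dest_label)
    for save_set_id, data in iter_chunks(source, source_start):
        append_chunk(dest, save_set_id, data)
    return dest


src = bytearray(b"SRC-LABEL")  # hypothetical 9-byte volume label
append_chunk(src, 1, b"alpha")
append_chunk(src, 2, b"beta")
dst = clone_volume(bytes(src), 9, b"DEST-LABEL-01")  # hypothetical 13-byte label
```

The recovered chunks are the same on both volumes, but because each new header records a new physical location, the raw bytes following the two volume labels differ.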
Disk media, particularly file system devices used to store backed-up file systems, suffer from the same problems. Even though a disk does not require data to be organized by chunks and chunk headers, most backup processes emulate this organization because of the legacy software used to manage the backup process. In addition, users are familiar with the way data is organized on tapes and therefore adopt this organization, even though it may not be the most efficient use of disk media.
Since the BURA is the only software application designed to manage recovery operations, and cloning is akin to a complete recovery and write process, cloning tapes may divert the BURA's resources away from its usual task of backing up data. Furthermore, since the destination backup volume will necessarily have different chunk header information for each chunk, the data on the destination backup volume will not be a true clone of the data on the source backup volume. These discrepancies, however slight, increase the likelihood of error, which defeats the purpose of the cloning process.
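The effect of rewriting chunk headers can be shown with a small, hypothetical example: the same chunk recorded at two different offsets yields identical chunk data but different raw bytes, so a byte-level comparison (here via SHA-256 digests) fails even though the payload was copied intact. The header layout, the `record` helper and the offsets are assumptions for illustration.

```python
import hashlib
import struct

# Hypothetical chunk header: save-set ID (uint32), length (uint32), data offset (uint64).
CHUNK_HEADER = struct.Struct("<IIQ")


def record(save_set_id: int, data: bytes, offset: int) -> bytes:
    """Chunk header plus chunk data, with the header's offset set to where the data lands."""
    return CHUNK_HEADER.pack(save_set_id, len(data), offset + CHUNK_HEADER.size) + data


chunk = b"payload"
on_source = record(1, chunk, offset=4096)  # position on the source volume
on_dest = record(1, chunk, offset=8192)    # different position on the destination

same_payload = on_source[CHUNK_HEADER.size:] == on_dest[CHUNK_HEADER.size:]
same_bytes = hashlib.sha256(on_source).digest() == hashlib.sha256(on_dest).digest()
print(same_payload, same_bytes)  # True False
```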
What is therefore needed is an improved way to clone data, particularly backup data volumes.