In a large organization, data from multiple backup clients are typically backed up for retention and security purposes. Backup clients include desktop computers, servers, networked storage devices and other data stores that can contain large amounts of data. The backup process is usually managed by a backup and recovery application (“BURA”) resident on a dedicated backup or BURA server. In a typical backup operation, data selected for backup will be transmitted over a network from the backup clients to the BURA server, and then from the BURA server to the storage media, which may be tape or disk. Each set of client data selected for backup is known as a “save set.” For example, all the data selected for backup from a desktop computer may be one save set; all the data selected for backup from a server may be another save set; a single database on a networked storage device may be another save set; and so forth. A backup process may involve backing up many of these save sets and will usually require the transmission of many large save sets over the network.
A. Chunk Level Multiplexing
Transmitting large save sets can place significant strain on the network. Current methods for relieving network congestion include a process known as “chunk level multiplexing,” which involves packaging each save set into smaller, more manageable units before transmitting or “streaming” them over the network to the backup server. These units are known as data chunks, or simply, “chunks.” Thus, instead of transmitting backup data as one large save set and potentially overwhelming the network, the save set is transmitted as chunks, allowing the chunks to reach the BURA server using the most efficient path on the network. Because save sets are divided into chunks and the chunks from different save sets can then be interleaved (multiplexed) with one another, the process is known as “chunk level multiplexing.”
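As a rough illustration, the following Python sketch splits each save set into chunks and interleaves them round-robin onto a single stream. It is a minimal sketch under stated assumptions, not how any particular BURA works: save sets are modeled as in-memory byte strings, `CHUNK_SIZE`, `split_into_chunks` and `multiplex` are hypothetical names, and the round-robin order merely stands in for whatever scheduling a real system uses.

```python
from itertools import zip_longest

# Hypothetical chunk size in bytes; real systems would use much larger chunks.
CHUNK_SIZE = 4


def split_into_chunks(save_set_id, data, chunk_size=CHUNK_SIZE):
    """Package one save set into (save_set_id, sequence_number, payload) chunks."""
    return [
        (save_set_id, seq, data[offset:offset + chunk_size])
        for seq, offset in enumerate(range(0, len(data), chunk_size))
    ]


def multiplex(save_sets, chunk_size=CHUNK_SIZE):
    """Interleave chunks from several save sets round-robin onto one stream.

    save_sets maps a save set ID to that save set's bytes (an assumption
    made for illustration only).
    """
    per_set = [
        split_into_chunks(sid, data, chunk_size) for sid, data in save_sets.items()
    ]
    stream = []
    for group in zip_longest(*per_set):
        # zip_longest pads save sets that have run out of chunks with None.
        stream.extend(chunk for chunk in group if chunk is not None)
    return stream
```

For example, `multiplex({"A": b"aaaaaaaaaa", "B": b"bbbbbb"})` yields chunks in the order A0, B0, A1, B1, A2, so consecutive chunks on the stream need not belong to the same save set.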
As previously discussed, a backup process typically involves many save sets. Each save set will therefore be packaged into chunks and streamed to the BURA server. As a result, chunks from different save sets may reach the BURA server at the same time. The BURA server will temporarily store the chunks in a buffer on the BURA server before writing them to storage media. To identify each chunk's originating save set, the BURA will also create metadata for each chunk. Each chunk's metadata will later be written into a header (“chunk header”) that will physically precede the chunk on the storage media. Chunk headers may be created and associated with their respective chunks during the buffering step.
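A chunk header can be pictured as a small fixed-size structure written immediately before the chunk's data. The sketch below assumes a hypothetical layout of three big-endian 32-bit fields (save set ID, sequence number within the save set, and payload length); the actual contents of a chunk header are not specified here, and `CHUNK_HEADER`, `frame_chunk` and `parse_chunk` are illustrative names only.

```python
import struct

# Hypothetical chunk header layout (not a real product format):
# save set ID, sequence number, payload length, each a big-endian uint32.
CHUNK_HEADER = struct.Struct(">III")


def frame_chunk(save_set_id, seq, payload):
    """Prepend a chunk header so the chunk's originating save set can be identified later."""
    return CHUNK_HEADER.pack(save_set_id, seq, len(payload)) + payload


def parse_chunk(buf, offset=0):
    """Read one framed chunk starting at offset.

    Returns (save_set_id, seq, payload, next_offset), where next_offset is
    the position of the following chunk header.
    """
    save_set_id, seq, length = CHUNK_HEADER.unpack_from(buf, offset)
    start = offset + CHUNK_HEADER.size
    return save_set_id, seq, buf[start:start + length], start + length
```

Because the payload length is stored in the header, a reader can step from one chunk header to the next without interpreting the chunk data itself.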
The buffering step will typically involve a single circular buffer 201, shown in FIG. 2. A person having skill in the art will appreciate that a circular buffer is a type of data structure composed of at least two data blocks. While data is added to one data block, data may be removed from another data block. In the case of a backup process, the removal of data from the circular buffer is known as “flushing.” In other words, once a data block in the circular buffer has been filled with data chunks (and their respective chunk headers), the buffer will flush the chunks for recording on the storage media. While one block is being flushed, the BURA may add data chunks and chunk headers to another block on the circular buffer 201. In this fashion, circular buffer 201 allows for a continuous cycle of filling and flushing of its data blocks. This process of filling and flushing is generally controlled and managed by the BURA.
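A circular buffer of this kind can be sketched in Python as follows. This is a simplified, single-threaded model under stated assumptions: the hypothetical `CircularChunkBuffer` flushes synchronously through a caller-supplied `flush` callback (standing in for a write to storage media) rather than draining one block in the background while another fills, and it assumes a framed chunk never spans two blocks.

```python
class CircularChunkBuffer:
    """Minimal sketch of a circular buffer made of fixed-size data blocks.

    Hypothetical assumptions: each block holds at most block_size bytes of
    framed chunks (chunk header + chunk data), a chunk never spans two
    blocks, and flushing is synchronous via the flush callback.
    """

    def __init__(self, block_size, flush, num_blocks=2):
        if num_blocks < 2:
            raise ValueError("a circular buffer needs at least two data blocks")
        self.block_size = block_size
        self.flush = flush
        self.blocks = [bytearray() for _ in range(num_blocks)]
        self.current = 0  # index of the block currently being filled

    def add(self, framed_chunk):
        """Add one framed chunk, flushing the current block first if the chunk does not fit."""
        if len(framed_chunk) > self.block_size:
            raise ValueError("chunk is larger than a data block")
        if len(self.blocks[self.current]) + len(framed_chunk) > self.block_size:
            self._flush_current()
        self.blocks[self.current].extend(framed_chunk)

    def _flush_current(self):
        """Hand the current block to the flush callback, then move to the next block."""
        block = self.blocks[self.current]
        if block:
            self.flush(bytes(block))
            block.clear()
        # Advance around the ring, wrapping back to block 0 after the last block.
        self.current = (self.current + 1) % len(self.blocks)

    def close(self):
        """Flush any partially filled block at the end of the backup."""
        self._flush_current()
```

With `block_size=10` and `flush=records.append`, adding framed chunks of 4, 4, 4 and 2 bytes and then calling `close()` produces two flushed blocks of 8 and 6 bytes: the third chunk does not fit in the first block, so that block is flushed and filling continues in the next one.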
Backup storage media include both tape and disk media, either of which may be further divided into units called “media records.” As noted previously, the circular buffer 201 may also be composed of units called data blocks. Usually, a data block on circular buffer 201 will be the same size as a media record. As shown in FIG. 2, one circular buffer data block has been flushed and written to media record 205, and another circular buffer data block has been flushed and written to media record 207. Media record 205 is preceded by a record header 203, which may contain metadata or information on the chunks in media record 205. Media record 205 contains data chunks 211 and 213, each of which corresponds to a different save set. Media record 207 contains data chunks 215 and 217. Note that chunk 215 is from a different save set than chunk 211, but chunk 217 is from the same save set as chunk 211. The separation of chunks 211 and 217 is due to the chunk level multiplexing discussed above. Also shown are chunk headers 210, 212, 214 and 216, each of which corresponds to the data chunk it precedes. For example, chunk header 210 precedes and contains information for chunk 211; chunk header 212 precedes and contains information for chunk 213; and so on. In some cases, metadata for a chunk may instead follow the chunk, but in either case, chunks from one save set are physically separated from other chunks from the same save set. FIG. 2 therefore shows not only that chunks from the same save set are separated from one another on the storage media (211 and 217), but also that each chunk is further separated from the next by chunk headers.
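The layout in FIG. 2 can be approximated in code. The following sketch assumes hypothetical binary formats (a record header holding only a chunk count, and a chunk header holding save set ID, sequence number and payload length); `build_media_record` is an illustrative name, the reference numerals appear only in variable names, and the save set IDs 1, 2 and 3 are invented for the example.

```python
import struct

# Hypothetical layouts, for illustration only:
#   record header: number of chunks in the record (uint16)
#   chunk header:  save set ID, sequence number, payload length (uint32 each)
RECORD_HEADER = struct.Struct(">H")
CHUNK_HEADER = struct.Struct(">III")


def build_media_record(chunks):
    """Serialize (save_set_id, seq, payload) chunks into one media record.

    The record header comes first, and each chunk header physically
    precedes the chunk it describes.
    """
    parts = [RECORD_HEADER.pack(len(chunks))]
    for save_set_id, seq, payload in chunks:
        parts.append(CHUNK_HEADER.pack(save_set_id, seq, len(payload)))
        parts.append(payload)
    return b"".join(parts)


# A layout analogous to FIG. 2: the first record holds chunks from save sets
# 1 and 2; the second holds a chunk from save set 3 and the next chunk of save set 1.
record_205 = build_media_record([(1, 0, b"first"), (2, 0, b"other")])
record_207 = build_media_record([(3, 0, b"third"), (1, 1, b"second")])
media = [record_205, record_207]
```

Save set 1's data ends up split between `record_205` and `record_207`, with other chunks and chunk headers between its pieces, which mirrors the separation of chunks 211 and 217.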
While this method efficiently ensures that all save sets are streamed and written to backup storage media, the resulting layout can complicate recovery. Recovery typically involves restoring an entire save set, as opposed to individual chunks within the recorded save set. When the BURA server receives a request to recover a save set, the BURA must locate all of the chunks associated with that save set on the storage media before the save set can be recovered. This requires navigating to each and every media record on the storage media (such as media records 205 and 207) and reading each chunk header to identify the respective chunk's originating save set. This is a tedious process that slows recovery and taxes BURA resources.
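The cost of this recovery procedure is easiest to see in a sketch of the naive scan. The code below assumes hypothetical record and chunk header layouts (a chunk count in each record header; save set ID, sequence number and payload length in each chunk header); `recover_save_set` is an illustrative name, and the function also counts how many chunk headers it had to read.

```python
import struct

# Hypothetical layouts, for illustration only (not a real on-media format):
RECORD_HEADER = struct.Struct(">H")   # number of chunks in the record
CHUNK_HEADER = struct.Struct(">III")  # save set ID, sequence number, payload length


def recover_save_set(media_records, save_set_id):
    """Naive recovery: visit every media record and read every chunk header.

    Returns the requested save set's data reassembled in sequence order,
    together with the number of chunk headers that had to be read.
    """
    found = []
    headers_read = 0
    for record in media_records:
        (chunk_count,) = RECORD_HEADER.unpack_from(record, 0)
        offset = RECORD_HEADER.size
        for _ in range(chunk_count):
            sid, seq, length = CHUNK_HEADER.unpack_from(record, offset)
            headers_read += 1
            offset += CHUNK_HEADER.size
            if sid == save_set_id:
                found.append((seq, record[offset:offset + length]))
            offset += length  # skip the payload to reach the next chunk header
    found.sort(key=lambda item: item[0])
    return b"".join(payload for _, payload in found), headers_read
```

Every chunk header on every record is read regardless of which save set was requested, so the work grows with the total amount of backed-up data rather than with the size of the save set being recovered.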
What is therefore needed is a more efficient way to transmit backup data to a storage device, one that also improves the recovery process.