The present invention relates to modifying a backup data stream to be processed by a fixed position delta reduction backup process. More particularly, the present invention relates to modifying a backup data stream to be processed by a fixed position delta reduction backup method, where the backup data stream includes concatenated logical partitions of data blocks.
Data backups are often performed via what is commonly referred to as a “backup application.” To back up the data, the backup application sends the data to be stored either to a local storage medium or via a network interface for remote transmission. The amount of data that is stored by the backup application varies with the method implemented by the backup application. For instance, some backup applications back up all data in the specified directory, database or file, while other applications attempt to increase the efficiency of the backup process by storing only the data that has been modified since the last backup. One commonly used method is the fixed position delta reduction method, which determines which fixed position segments of data have been modified since the last backup and stores the data reflecting those changes. In other words, the fixed position delta reduction method determines which segments of data have been modified by comparing each segment of data at a fixed position in a file or data stream received during a current backup with the segment of data that occupied that same fixed position in the file or data stream during the last backup of that particular file.
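The position-by-position comparison described above can be sketched in a few lines. This is only an illustrative sketch, not the method of any particular backup application: the function name `changed_segments`, the representation of a backup as a list of byte segments, and the sample values are all assumptions made for the example.

```python
from typing import List, Sequence


def changed_segments(previous: Sequence[bytes], current: Sequence[bytes]) -> List[int]:
    """Return the fixed positions whose segment must be stored again.

    A position is reported when the current segment differs from the segment
    that occupied the same position in the previous backup, or when the
    previous backup had no segment at that position.
    """
    changed = []
    for position, segment in enumerate(current):
        if position >= len(previous) or previous[position] != segment:
            changed.append(position)
    return changed


# Hypothetical data: only the segment at position 1 was edited in place.
previous_backup = [b"aaaa", b"bbbb", b"cccc"]
current_backup = [b"aaaa", b"BBBB", b"cccc"]
edited_positions = changed_segments(previous_backup, current_backup)
print(edited_positions)  # [1]
```

When data is modified in place, as in this hypothetical case, only the edited segment is reported, which is the situation in which a fixed position comparison works well.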
The process via which a backup application implementing a fixed position delta reduction method executes, and the effectiveness of that process, vary with the format in which data is stored. Specifically, data associated with a particular file or database may be retrieved in the form of separate physically organized streams or in a single stream including a plurality of data segments (i.e., blocks), which may be formed by concatenated logical partitions of data blocks. In other words, each logical partition includes one or more data blocks. Unfortunately, a number of problems are introduced into a backup application implementing a fixed position delta reduction backup method when data is added to or deleted from a system implementing a backup data stream including a plurality of data blocks, which may be formed by concatenating logical partitions of data blocks.
In order to illustrate the effectiveness of a fixed position delta reduction backup application for a system implementing a backup data stream including a plurality of data blocks, the operation of the fixed position delta reduction backup application will be described with reference to FIGS. 1-3. FIG. 1 is a diagram illustrating an exemplary data stream including a plurality of blocks of data. As shown in FIG. 1, backup data is typically sent to the backup application as a data stream. In this example, a database 102 transmits the data stream 104 to a fixed position delta reduction backup application 105 for storing to a storage medium 106. As shown, the data stream 104 includes data blocks 1, 2, 3, 4, 5.
New data added to a file or database implementing a plurality of data blocks typically requires that a new data block be allocated. This new data block will appear as an insertion of a data block in the data stream provided to the fixed position delta reduction backup application, resulting in a “shift” in all subsequent data blocks in the data stream. As shown in FIG. 2, the insertion of data block “Inserted block” 108 between data blocks 2 and 3 results in a shift of the subsequent data blocks 3, 4, and 5 from their position in the original data stream 104, creating a modified data stream 110. As a result, when each data block in the modified data stream 110 is compared to the corresponding data block of the original data stream 104 (represented by corresponding arrows), the data blocks subsequent to the inserted data block 108 appear to have been modified or to be new data. Specifically, in this example, the blocks 3, 4, and 5 of the modified data stream 110 are compared to the corresponding data blocks 4, 5, and a non-existent data block in the original data stream 104, respectively, resulting in the storing of the data blocks 3, 4, and 5, as well as the inserted block 108. In other words, the “shifting” causes a backup application implementing a fixed position delta reduction method to recognize all data following the inserted data block as new (or modified) data. Thus, for data streams including a plurality of concatenated logical components, data in the logical components following the inserted data block will be perceived as new (or modified) data. As a result, the detection of this “new data” requires that all of the “new data” be written to a local storage medium or transmitted via a network interface for storing to a remote storage medium in order to perform a complete backup.
Accordingly, this “new data” is stored unnecessarily, resulting in an inefficient processing of backup data provided to the fixed position delta reduction backup application.
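The insertion case of FIG. 2 can be reproduced with a short sketch. The helper `blocks_to_store` and the use of short strings to stand in for the contents of data blocks 1 through 5 are assumptions made purely for illustration; they are not part of any described backup application.

```python
from typing import List, Sequence


def blocks_to_store(original: Sequence[str], modified: Sequence[str]) -> List[str]:
    """Return the blocks a fixed position comparison would treat as new or modified."""
    stored = []
    for position, block in enumerate(modified):
        # A block is stored when nothing existed at this position before,
        # or when a different block occupied this position.
        if position >= len(original) or original[position] != block:
            stored.append(block)
    return stored


# Stand-ins for the original data stream 104 and the modified data stream 110.
original_stream = ["1", "2", "3", "4", "5"]
modified_stream = ["1", "2", "Inserted block", "3", "4", "5"]

stored_after_insert = blocks_to_store(original_stream, modified_stream)
print(stored_after_insert)  # ['Inserted block', '3', '4', '5']
```

Although only one block is genuinely new, four blocks are selected for storage, because blocks 3, 4, and 5 are each compared against a different block (or against no block at all) at their shifted positions.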
Similarly, when data is deleted from a file or database implementing a plurality of data blocks, a data block is either deleted or de-allocated (e.g., marked as unused). When this data block is removed from the data stream, the removal causes a shift in the data blocks in the data stream in the direction opposite to that shown in FIG. 2. As shown in FIG. 3, the removal of data block “3” indicated at 112 of the modified data stream 114 results in a shift of data blocks from their position in the original data stream 104, as shown. As a result, this “shifting” causes a fixed position delta reduction backup application to recognize all data following the deleted data block as new data. In this example, blocks 4 and 5 follow the deleted data block 112, and are therefore recognized as new data (since the last backup). Thus, for data streams including a plurality of concatenated logical components, data in the logical components following the deleted data block will be perceived as new (or modified) data. This “new data” is then unnecessarily written or transmitted to local or remote storage, respectively, introducing inefficiencies into the fixed position delta reduction backup process.
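The deletion case of FIG. 3 behaves the same way under a fixed position comparison. The following sketch is again only illustrative: the helper `blocks_to_store` and the string stand-ins for the data blocks are assumptions, not details taken from any described backup application.

```python
from typing import List, Sequence


def blocks_to_store(original: Sequence[str], modified: Sequence[str]) -> List[str]:
    """Return the blocks a fixed position comparison would treat as new or modified."""
    return [
        block
        for position, block in enumerate(modified)
        if position >= len(original) or original[position] != block
    ]


# Stand-ins for the original data stream 104 and the modified data stream 114,
# in which data block 3 has been removed.
original_stream = ["1", "2", "3", "4", "5"]
modified_stream = ["1", "2", "4", "5"]

stored_after_delete = blocks_to_store(original_stream, modified_stream)
print(stored_after_delete)  # ['4', '5']
```

No block in the modified stream actually changed, yet blocks 4 and 5 are selected for storage because each now sits one position earlier than in the previous backup.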
The inefficiencies introduced into the fixed position delta reduction backup process for systems implementing a backup data stream including a plurality of data blocks may go unnoticed for a single file that has been edited, resulting in the storing or re-transmission of a larger portion of the file than necessary. However, for a database application backing up a large number of files in the database, the amount of data that is stored or re-transmitted by a fixed position delta reduction backup application could be significant. As a result, these undesirable insertion and deletion characteristics could have a significant impact on the time a fixed position delta reduction backup application takes to complete a single data backup session in a system implementing data transmitted in the form of a stream including a plurality of data blocks, as well as in a system implementing data transmitted in the form of a stream including logical partitions of data blocks. This is particularly problematic because many common database programs, such as Microsoft's SQL Server™, provide data during data backup in the form of a stream including a plurality of data blocks, as well as in the form of a stream including concatenated logical partitions.
A number of fixed position delta reduction methods have been developed for use in backup applications. Those fixed position delta reduction methods that have been developed for use with systems implementing fixed length data blocks include those described in U.S. Pat. No. 5,990,810, entitled “Method for partitioning a block of data into subblocks and for storing and communicating such subblocks,” issued Nov. 23, 1999 to Ross Williams and in U.S. Pat. No. 5,745,906, entitled “Method and apparatus for merging delta streams to reconstruct a computer file,” issued Apr. 28, 1998 to Mark Squibb, both of which are incorporated herein by reference. However, none of the existing methods are effective in reducing the inefficiencies resulting from the insertion and deletion characteristics set forth above.
In view of the above, it would be beneficial if the inefficiencies introduced into a fixed position delta reduction backup process as a result of new or deleted data in a system implementing a backup data stream including a plurality of data blocks and/or concatenated logical partitions of data blocks could be eliminated.