The present invention relates to modifying a backup data stream to be processed by a fixed position delta reduction backup process. More particularly, the present invention relates to modifying a backup data stream to be processed by a fixed position delta reduction backup method, where the backup data stream includes a set of validation bytes for each data block.
Data backups are often performed via what is commonly referred to as a “backup application.” During a data backup, the backup application sends the data to be stored either to a local storage medium or via a network interface for remote transmission. The amount of data that is stored by the backup application varies with the method implemented by the backup application. For instance, some backup applications back up all data in the specified directory, database or file, while other applications attempt to increase the efficiency of the backup process by storing only the data that has been modified since the last backup. One commonly used method is the fixed position delta reduction method, which determines which fixed position segments of data have been modified since the last backup and stores the data reflecting those changes. In other words, the fixed position delta reduction method determines which segments of data have been modified by comparing the segment of data at a fixed position in a file or data stream received during a current backup with the segment of data previously at that same fixed position in the file or data stream during the last backup for that particular file.
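By way of illustration only, the fixed position comparison described above may be sketched as follows. This is a minimal sketch, not the implementation of any particular backup application: the function name `changed_segments`, the 4-byte segment size, and the sample byte strings are hypothetical and chosen solely for the example.

```python
def changed_segments(previous: bytes, current: bytes, segment_size: int) -> list[int]:
    """Return the indices of fixed position segments of `current` that differ
    from the segment at the same position in `previous`."""
    changed = []
    for offset in range(0, len(current), segment_size):
        new_segment = current[offset:offset + segment_size]
        # A segment beyond the end of `previous` slices to b"" and therefore
        # compares as different, so appended data is treated as changed.
        old_segment = previous[offset:offset + segment_size]
        if new_segment != old_segment:
            changed.append(offset // segment_size)
    return changed


# Hypothetical data: only the second 4-byte segment was edited between backups.
last_backup = b"AAAABBBBCCCCDDDD"
this_backup = b"AAAABxBBCCCCDDDD"
print(changed_segments(last_backup, this_backup, 4))  # [1]
```

In this sketch, only the segment at index 1 would need to be stored or transmitted during the current backup.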
The process via which a backup application implementing a fixed position delta reduction method executes and the effectiveness of that process vary with the format in which data is stored. Specifically, data associated with a particular file or database may be retrieved in the form of separate physical-organized streams or in a single stream including a plurality of data segments (i.e., blocks). Unfortunately, there are a number of problems that are introduced into a backup application implementing a fixed position delta reduction backup method when data is retrieved from a system providing a backup data stream including a plurality of data blocks, where each of the data blocks has an associated set of validation bytes.
In order to illustrate the effectiveness of a fixed position delta reduction backup application for a system implementing a backup data stream including a plurality of data blocks, each having an associated set of validation bytes, the operation of the fixed position delta reduction backup application will be described with reference to FIGS. 1-2. FIG. 1 is a diagram illustrating an exemplary data stream including a plurality of blocks of data. As shown in FIG. 1, backup data is typically sent to the backup application as a data stream. In this example, a database or Application Programming Interface (API) 102 transmits the data stream 104 to a fixed position delta reduction backup application 105 for storing to a storage medium 106. As shown, the data stream 104 includes data blocks 1, 2, and 3, where each of the data blocks has an associated set of validation bytes.
When a data stream is received via an application implemented by an IBM iSeries™ platform, the data stream includes a set of validation bytes for each block of data. More particularly, the set of validation bytes includes a Cyclic Redundancy Check (CRC) value. Since each set of validation bytes generated by an IBM iSeries™ platform also includes an “unknown seeding” component, the set of validation bytes associated with each data block will change with each request to the API. As a result, the validation bytes will appear to be changed data to the fixed position delta reduction backup application, regardless of whether the corresponding data block has been modified.
As shown in FIG. 2, each set of validation bytes 107 in the data stream 104 includes an “unknown seed” component. More particularly, the set of validation bytes 107 for the data blocks 1, 2, and 3 of the data stream 104 includes a CRC that is calculated using an “unknown seed,” seed1, that changes from one data backup to the next data backup. When a second request is submitted to the API, a second data stream 110 is received, which again includes a set of validation bytes 108 for each of data blocks 1, 2 and 3. Each set of validation bytes 108 in the second data stream 110 includes a CRC that is calculated using an unknown seed, seed2, that changes from one data backup to the next data backup. Thus, the CRC and therefore the set of validation bytes associated with a particular data block will differ from one backup session to the next, regardless of whether the contents of the data block have changed. As a result, when the set of validation bytes associated with each data block in the second data stream 110 is compared to the corresponding set of validation bytes of the original data stream 104 (represented by corresponding arrows), the sets of validation bytes appear to have been modified or to be new data. As set forth above, the fixed position delta reduction backup application monitors segments of data for changes. Since each segment of the data stream being backed up typically includes both a data block and a set of validation bytes (and possibly other data block(s) and associated set(s) of validation bytes), the detection of a change in a set of validation bytes typically requires that the data blocks in that segment also be stored.
In this example, the sets of validation bytes 108 associated with data blocks 1, 2, and 3 of the second data stream 110 are compared to the corresponding sets of validation bytes 107 associated with data blocks 1, 2, and 3 in the original data stream 104, respectively. Since the unknown seed component used to generate the CRC of each set of validation bytes 107 of the data stream 104 differs from that of each set of validation bytes 108 of the data stream 110, the sets of validation bytes appear to have changed. The sets of validation bytes therefore appear to the backup application to be modified data, resulting in the storing of the segment(s) of the data stream including the validation bytes 108 and the corresponding data blocks 1, 2, and 3. Thus, for data streams including a set of validation bytes associated with each of a plurality of data blocks, each of the data blocks may be perceived as new (or modified) data upon a determination that the associated set of validation bytes in the same segment of the data stream has “changed.” As a result, the detection of this “new data” requires that all of the “new data” be written to a local storage medium or transmitted via a network interface for storing to a remote storage medium in order to perform a complete backup. Accordingly, this “new data” is stored unnecessarily, resulting in inefficient processing of backup data provided to the fixed position delta reduction backup application.
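The effect described above may be reproduced with the following minimal sketch. It rests on several assumptions that are not part of the systems described: the CRC is modeled with the CRC-32 function of Python's `zlib` module, the seed is modeled as the starting value passed to that function, and each set of validation bytes is assumed to be four bytes appended directly after its data block. The actual validation byte layout and seeding scheme are unknown, and the names `build_stream` and `changed_segments` are hypothetical.

```python
import zlib


def build_stream(blocks: list[bytes], seed: int) -> bytes:
    """Concatenate each data block with a 4-byte seeded CRC (assumed layout)."""
    stream = b""
    for block in blocks:
        crc = zlib.crc32(block, seed)  # seed modeled as the CRC starting value
        stream += block + crc.to_bytes(4, "big")
    return stream


def changed_segments(previous: bytes, current: bytes, segment_size: int) -> list[int]:
    """Indices of fixed position segments that differ between two streams."""
    return [
        offset // segment_size
        for offset in range(0, len(current), segment_size)
        if current[offset:offset + segment_size] != previous[offset:offset + segment_size]
    ]


# Three unmodified 8-byte data blocks, retrieved in two backup sessions.
blocks = [b"block-01", b"block-02", b"block-03"]
stream_104 = build_stream(blocks, seed=0x1111)  # first session, seed1
stream_110 = build_stream(blocks, seed=0x2222)  # second session, seed2

# Each 12-byte segment holds one data block plus its validation bytes.
print(changed_segments(stream_104, stream_110, 12))  # [0, 1, 2]
```

Although no data block was modified, every segment is reported as changed because its validation bytes differ, so all three data blocks would be stored again.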
The inefficiencies introduced into the fixed position delta reduction backup process for systems implementing a backup data stream including a set of validation bytes associated with each data block may go unnoticed for a single file that has been edited, even though a larger portion of the file than necessary is stored or re-transmitted. However, for a database application backing up a large number of files in the database, the amount of data that is stored or re-transmitted by a fixed position delta reduction backup application could be significant. As a result, these undesirable characteristics could have a significant impact on the time required for a fixed position delta reduction backup application to complete a single data backup session in a system in which data is transmitted in the form of a stream including a plurality of data blocks, each having an associated set of validation bytes. This is particularly problematic since many common database programs, such as those implemented on an IBM iSeries™ platform, provide data during data backup in the form of a stream including a set of validation bytes for each data block, where the set of validation bytes changes with each data backup.
A number of fixed position delta reduction methods have been developed for use in backup applications. Those fixed position delta reduction methods that have been developed for use with systems implementing fixed length data blocks include those described in U.S. Pat. No. 5,990,810, entitled “Method for partitioning a block of data into subblocks and for storing and communicating such subblocks,” issued Nov. 23, 1999 to Ross Williams and in U.S. Pat. No. 5,745,906, entitled “Method and apparatus for merging delta streams to reconstruct a computer file,” issued Apr. 28, 1998 to Mark Squibb, both of which are incorporated herein by reference. However, none of the existing methods are effective in reducing the inefficiencies resulting from the characteristics set forth above.
In view of the above, it would be beneficial if the inefficiencies introduced into a fixed position delta reduction backup process as a result of the generation of a backup data stream including a set of validation bytes for each data block could be eliminated.