This invention pertains to a computer apparatus and method for merging system deltas, and more particularly, to a computer apparatus and method for 1) merging a number of system deltas with a copy of a computer system's files (primary input stream) to create a revised copy of a computer system's files (primary output stream), 2) merging a plurality of system deltas with one another to create a compiled system delta, 3) creating inverse system deltas, and 4) merging inverse system deltas as in 1) and 2), supra. The methods may be used to save, construct and/or retrieve current and historical states of a computer system's files. The apparatus and method may be used in conjunction with a computer backup process, version manager, or the like.
The above-referenced application to Squibb, Ser. No. 08/039,702, discloses a method and apparatus for producing a change list for an original and updated version of a computer file. The method and apparatus utilize a hash generator CPU to produce a token set for an original file. A comparator CPU later uses the token set and a windowing technique to identify and correlate the locations of coexistent pages in the original and updated files. The comparator CPU then uses the coexistent page information and the residue of the updated file (or original) to create a delta expressing the differences between the original and updated files. The delta is transmitted to another computer and combined with a backup copy of the original file to create a backup copy of the updated file. The original file and a series of deltas are used to retain historical file information in a cost effective manner.
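By way of illustration only, the token set and windowing approach summarized above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the referenced application: the page size `PAGE`, the use of SHA-256 as the hash, the `("copy", offset)` and `("data", bytes)` operation format, and the function names `token_set`, `make_delta` and `apply_delta` are all hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple

PAGE = 4  # hypothetical page size, kept tiny for illustration


def token_set(original: bytes) -> Dict[bytes, int]:
    """Map the hash of each full page of the original file to its offset."""
    tokens: Dict[bytes, int] = {}
    for off in range(0, len(original) - PAGE + 1, PAGE):
        digest = hashlib.sha256(original[off:off + PAGE]).digest()
        tokens.setdefault(digest, off)
    return tokens


def make_delta(original: bytes, updated: bytes) -> List[Tuple]:
    """Slide a page-sized window over the updated file.

    Pages also present in the original become ("copy", offset) operations;
    everything else is collected as residue in ("data", bytes) operations.
    """
    tokens = token_set(original)
    ops: List[Tuple] = []
    residue = bytearray()
    i = 0
    while i < len(updated):
        window = updated[i:i + PAGE]
        off = None
        if len(window) == PAGE:
            off = tokens.get(hashlib.sha256(window).digest())
        if off is not None:
            if residue:
                ops.append(("data", bytes(residue)))
                residue = bytearray()
            ops.append(("copy", off))
            i += PAGE
        else:
            residue.append(updated[i])
            i += 1
    if residue:
        ops.append(("data", bytes(residue)))
    return ops


def apply_delta(original: bytes, ops: List[Tuple]) -> bytes:
    """Combine a copy of the original file with a delta to rebuild the update."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += original[op[1]:op[1] + PAGE]
        else:
            out += op[1]
    return bytes(out)


original = b"aaaabbbbcccc"
updated = b"aaaaXXbbbbcccc"
delta = make_delta(original, updated)
rebuilt = apply_delta(original, delta)
```

In this sketch only the two inserted bytes travel in the delta as residue; the three coexistent pages are expressed as references into the original file.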
The above-referenced application to Squibb entitled "Computer Apparatus and Method for Merging a Sequential Plurality of Delta Streams" discloses a computer apparatus and method for merging a sequential plurality of delta streams, wherein each delta stream represents a change to either a prior delta stream, an original data stream, or an updated data stream. The method and apparatus may be used to 1) merge a sequential plurality of delta streams with an original data stream to create an updated data stream, 2) merge a sequential plurality of delta streams to create a compiled delta stream, or 3) merge a sequential plurality of negative delta streams to retrieve a desired prior data stream. The method and apparatus may be used in conjunction with a computer backup process, version manager, or the like. In summary, a consumer process initiates a number of search requests, within a transaction chain corresponding to the sequential plurality of delta streams, for a number of data bytes to transfer to an updated data stream. The search requests may be fulfilled with data bytes provided by the last delta stream in the transaction chain capable of supplying data bytes, or if the sequential plurality of delta streams is incapable of fulfilling the search request, it may be fulfilled with data bytes provided by the original data stream. As the sequential plurality of delta streams is merged, a sequential plurality of negative delta streams may be generated, thus enabling reconstruction of a desired prior data stream.
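The fulfillment rule for search requests described above may be sketched as follows. The sketch assumes a deliberately simplified model in which each delta stream merely overrides bytes at fixed offsets; actual delta streams also express insertions and deletions, which shift offsets and are omitted here. The names `Delta`, `fulfill` and `merge` are hypothetical.

```python
from typing import Dict, List

# Hypothetical, simplified model: each delta maps a byte offset to a
# replacement byte value. Insertions and deletions are not modeled.
Delta = Dict[int, int]


def fulfill(offset: int, chain: List[Delta], original: bytes) -> int:
    """Return the byte at `offset` from the last delta able to supply it.

    If no delta in the transaction chain can supply the byte, the request
    is fulfilled from the original data stream.
    """
    for delta in reversed(chain):
        if offset in delta:
            return delta[offset]
    return original[offset]


def merge(chain: List[Delta], original: bytes) -> bytes:
    """Build the updated data stream one search request at a time."""
    return bytes(fulfill(i, chain, original) for i in range(len(original)))


original = b"abcdef"
chain = [{1: ord("X")}, {1: ord("Y"), 4: ord("Z")}]
updated = merge(chain, original)  # b"aYcdZf"
```

Offset 1 is supplied by the second (last capable) delta rather than the first, and offsets untouched by any delta fall through to the original data stream.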
Although the above apparatus and methods of Squibb provide unique advantages in performing backup operations, version management, and the like, especially when reading and writing to sequential media, they do not provide an efficient or appropriate method of managing "system" events such as file creations, deletions, replacements, and duplications (clonings). For example, assume that the second system delta in a sequential plurality of twenty system deltas contains delta information indicating that file AZ was deleted. Processing file AZ through a twenty element transaction chain (hereinafter referred to as a "delta bridge") is a cumbersome way to acknowledge that file AZ was deleted by the second system delta and that, consequently, all of its data was routed from an original copy of a system's files (original data stream) to an inverse system delta (negative delta stream) corresponding to the second system delta.
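The file AZ example may be illustrated with a short sketch in which file-level events are resolved before any file data is processed. The representation of a system delta as a mapping from file name to an event string ("create", "delete" or "modify"), and the name `surviving_files`, are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set

# Hypothetical representation: each system delta maps a file name to a
# file-level event, one of "create", "delete", or "modify".
SystemDelta = Dict[str, str]


def surviving_files(initial: Set[str], deltas: List[SystemDelta]) -> Set[str]:
    """Resolve file-level events without touching any file data."""
    live = set(initial)
    for delta in deltas:
        for name, event in delta.items():
            if event == "delete":
                live.discard(name)
            elif event == "create":
                live.add(name)
    return live


# Twenty system deltas; the second one deletes file AZ.
deltas: List[SystemDelta] = [{"AZ": "modify"}, {"AZ": "delete"}]
deltas += [{"B": "modify"} for _ in range(18)]

live = surviving_files({"AZ", "B"}, deltas)  # {"B"}
```

Under these assumptions, only file B would need to pass through the delta bridge; file AZ is known to be absent from the final system state from its event record alone.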
It is therefore a primary object of this invention to provide a computer apparatus and method for efficiently merging changes to a computer "system", and more particularly, a computer apparatus and method for 1) merging system deltas with a copy of a computer system's files (primary input stream) stored in a sequential storage media, or 2) merging system deltas with one another.
The prior art comprises three procedures for system backup and/or version control. The first is "full copy versioning." In a full copy versioning system, a system's files are initially copied to a backup repository. Revised files are transferred to the backup repository as incremental backups. An incremental backup comprises a complete set of data for each file in the system that has been modified, copied, moved, renamed, or otherwise changed. In addition to the data of changed files, an incremental backup comprises the data of any new file. A system directory tracks which file versions in the backup repository comprise the current versions of a system's files. Full copy versioning is currently the method of choice because full and incremental backups may be saved to sequential media (which is dramatically cheaper than seekable media).
Unfortunately, full copy versioning has several disadvantages. First, full and incremental backups require a significant amount of storage space. For example, a 0.5% change in one Gigabyte of mixed file types may be spread through 15% of the mixed files, while a 0.5% change in a one Gigabyte database may be spread throughout 100% of the database's files. Because an incremental backup stores every changed file in full, it can therefore be 15-100% of the size of a full backup. For a database or similar system, full copy versioning may represent little savings over performing consecutive full system backups.
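The arithmetic behind these figures can be worked through as follows. The sketch assumes that one Gigabyte is 1000 MB and that all files are of equal size, so that a given share of files corresponds to the same share of bytes; both assumptions are made only to keep the numbers round.

```python
# Assumptions (not from the text above): 1 Gigabyte = 1000 MB, and files are
# of equal size, so a share of files equals the same share of bytes.
TOTAL_MB = 1000

changed_mb = TOTAL_MB * 5 // 1000             # 0.5% of the data changed: 5 MB
incremental_mixed_mb = TOTAL_MB * 15 // 100   # 15% of files touched: 150 MB
incremental_db_mb = TOTAL_MB * 100 // 100     # 100% of files touched: 1000 MB

# Ratio of data stored by the incremental backup to data actually changed.
overhead_mixed = incremental_mixed_mb // changed_mb   # 30
overhead_db = incremental_db_mb // changed_mb         # 200
```

Under these assumptions, an incremental backup stores between 30 and 200 times as much data as actually changed.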
Second, although full and incremental backups may be stored as streams in a sequential media, the totality of files representing a particular system state is not stored in the sequential media as a "continuous" stream. Thus, if it is desired to reconstruct a system's files (even a most current version of the system's files), the desired versions of the system's files must be sought out from a myriad of full and incremental backups. As is well known in the art, sequential media is not easily seekable.
Third, due to the size of full and incremental backups, and the great amount of data to be managed by a backup repository, it is difficult, if not impossible, to merge incremental system backups with an original full system backup. As a result, a significant amount of duplicated data is stored within a backup repository, and management of the data can become an overwhelming task. It is therefore desirable to repeat full system backups on a periodic basis thereby reducing the amount of stored data to be managed. However, replacing incremental system backups with a new full system backup increases the number of I/O operations required of a backup repository, and erases historical system data.
Fourth, since the periodic full and incremental backups contain much duplicated (unchanged) data, physical storage requirements are immense, and media costs are high. However, despite the fact that a significant amount of duplicated data is saved in a full copy versioning system, system redundancy (or even file redundancy) does not exist. Though most of the data in an incremental backup may be duplicative of data contained in other full or incremental backups, only one copy of a particular version exists (a system or file version can only be retrieved from a single storage means, and not from either one of a redundant storage means). Archiving the already large amount of duplicative data makes system redundancy a very unattractive prospect.
The second known method of version control is commonly referred to as "revision control." Revision control systems are used to store detailed information concerning the changes made to computer source code. Source code changes are stored as deltas in an online seekable library. When a source code version is required, a base version is retrieved to disk, and the appropriate deltas are merged with it. An advantage of revision control is the ability to store deltas (only the changes to files), and merge particular deltas to construct either a prior or future system version. However, problems still remain. A first disadvantage of revision control is that it requires a seekable storage media. Furthermore, data (source code) must often be stored in a specific format (e.g., newline delimited) for efficient operation of the system. This often creates inefficiencies when revision control is used in conjunction with non-text files. Additionally, revision control is limited to use with small to medium size systems because 1) online storage libraries create excessive seekable media storage costs, and 2) revision control delta merging is iterative, requiring an excessive number of I/O operations to merge many system deltas. Finally, the online accessibility requirement of a revision control system does not allow for system redundancy.
A third method of system versioning (backup) is "single entity versioning." Single entity versioning is a variant of revision control. However, single entity versioning differs from revision control in that it uses pointers to retrieve particular instances of a system data element (file). Storage media must therefore be seekable. On a large scale, single entity versioning has the same disadvantages as revision control.
It is therefore a further object of this invention to provide a computer apparatus and method wherein data can be directly retrieved from, and recorded to, a sequential storage media.
It is yet another object of this invention to provide a computer apparatus and method which do not require seeking within a sequential media.
It is also an object of this invention to provide a computer apparatus and method which allow for backup of large computer systems, with minimal physical storage requirements and costs.
It is another object of this invention to provide a computer apparatus and method which operate using deltas, thereby allowing the best data compression and the simplest version management.
A further object of this invention is to provide a computer apparatus and method which allow rapid retrieval of prior or future system versions with a minimum number of I/O operations.
Another object of this invention is to provide a computer apparatus and method which eliminate the need for periodic full system backups.
An additional object of this invention is to provide a computer apparatus and method which eliminate data duplication within a single storage means (e.g., a backup repository), but allow for cost efficient storage of data in independent and redundant storage means.
Yet another object of this invention is to provide methods of creating and merging inverse system deltas, wherein the merging of system deltas or inverse system deltas may be used to retrieve a prior or future version of a system's data.