Copies of files are frequently transmitted over a network from one computer to another. One reason to copy a file is for backup purposes. If a file created on one computer has been backed up on another computer, it can be easily recovered in the event the hard drive of the first computer fails. Because the loss of a file could mean the loss of extremely important data, and/or result in significant effort to recreate the file, file backup processes are very common. However, file backup has at least two problems associated with it: first, it can require significant network bandwidth to transfer file data from the first computer to the backup computer, and second, it can require significant storage space to maintain copies of files. Both of these problems can be alleviated to some extent through the use of an incremental backup. An incremental backup copies only those files that have been changed since the previous backup. Incremental backups can significantly reduce the number of files that are backed up on a periodic basis.
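The file selection step of an incremental backup can be illustrated with a minimal Python sketch. It assumes that a file's modification time is a sufficient indicator of change, which is only one of several possible tests. The function name `files_changed_since` and its parameters are hypothetical and are not taken from any system discussed here.

```python
import os


def files_changed_since(root, last_backup_time):
    """Return, in sorted order, the paths under `root` whose modification
    time is later than `last_backup_time` (seconds since the epoch).

    Assumption: modification time alone identifies a changed file.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Only files modified after the previous backup are selected.
            if os.path.getmtime(path) > last_backup_time:
                changed.append(path)
    return sorted(changed)
```

Only the paths returned by such a function would be copied to the backup computer. Each selected file would still be copied in its entirety.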
Typically, when a file is modified, only a small portion of the file is actually changed from the previous version. While an incremental backup can reduce network bandwidth and save storage space compared to a complete backup, it is still inefficient in that a complete file is backed up even though only a small portion of the file may actually have been modified. In an attempt to improve upon incremental backups, backup processes exist that identify the differences between two versions of a file and attempt to back up only those differences. This is referred to as a differencing process. Differencing processes can reduce network bandwidth and storage requirements because only the changed portions of the file are backed up.
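A differencing process of this general kind can be sketched in Python using the standard library's `difflib.SequenceMatcher`. This is an illustration under stated assumptions, not the mechanism of any particular backup system: the function names `make_delta` and `apply_delta` and the delta format (a list of "copy" and "data" operations) are hypothetical. The sketch also requires both versions of the file to be available on the same computer.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, revised: bytes):
    """Describe `revised` as copies from `base` plus literal new data."""
    delta = []
    matcher = SequenceMatcher(None, base, revised, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged region: record only the offsets into the base file.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed or new region: record the literal bytes.
            delta.append(("data", revised[j1:j2]))
        # A "delete" region contributes nothing to the revised file.
    return delta


def apply_delta(base: bytes, delta) -> bytes:
    """Rebuild the revised file from the base file and a delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            out += base[op[1]:op[2]]
        else:
            out += op[1]
    return bytes(out)
```

Only the "data" entries carry file content, so for a small modification the delta can be much smaller than the revised file. `SequenceMatcher` is convenient for illustration but can be slow on large inputs.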
Copies of files are also frequently made for purposes of synchronization or replication. A synchronized file exists in two different locations, such as on two different servers, and changes made to one file must be reflected in the other file. Synchronization usually occurs by periodically copying the file from one location to the other location.
U.S. Pat. No. 5,634,052 discloses a system for reducing storage requirements in a backup subsystem. The system includes creating a delta file reflecting the differences between a base file and a modified version of the base file, and transmitting the delta file to a server for backup purposes. One problem associated with this system is that the base file is necessary to create the delta file that reflects the differences between the base file and the revised file. Thus, if the delta file is to be created on another computer, such as the server, the base file must first be transmitted to the server where the differencing operation is carried out. Moreover, the '052 patent does not disclose optimal mechanisms for creating the delta file.
In a differencing backup system, the differencing mechanism used to create the delta file can be quite important. It is not uncommon for files to be many megabytes in size. A differencing mechanism that processes a file multiple times, or processes a file in an inefficient manner, can result in excessive backup times. Moreover, an inefficient differencing mechanism can result in more data being backed up than necessary. In other words, two differencing mechanisms can vary in their ability to efficiently recognize and reflect differences between two files. It would also be preferable for a differencing mechanism to determine the differences between a base file and a modified version of the base file without having to repeatedly process the base file, so that the differencing operation can be performed on a remote computer without the need to process the entire base file.
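One way to picture differencing without access to the base file is to compare compact per-block digests in place of the file contents. The Python sketch below is a simplified illustration under stated assumptions and is not the mechanism of any patent discussed here. The names `block_signatures` and `changed_blocks` are hypothetical, the block size is deliberately tiny for demonstration, and blocks are compared only at fixed offsets, so an insertion that shifts later data would cause every following block to be reported as changed.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical; deliberately tiny for demonstration


def block_signatures(data: bytes, block_size: int = BLOCK_SIZE):
    """Compute one SHA-256 digest per fixed-size block of `data`."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(base_signatures, revised: bytes,
                   block_size: int = BLOCK_SIZE):
    """Return (block index, block bytes) for each block of `revised` whose
    digest differs from the base signature at the same index.

    Only the signatures of the base file are needed here, not its contents.
    """
    changed = []
    for index, offset in enumerate(range(0, len(revised), block_size)):
        block = revised[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(base_signatures) or base_signatures[index] != digest:
            changed.append((index, block))
    return changed
```

In this sketch the base file is processed once to produce its signatures, and the computer holding the revised file needs only those signatures to decide which blocks to transmit. A complete scheme would also have to convey the length of the revised file so that truncation can be detected.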
U.S. Pat. No. 5,574,906 discloses a system and method for reducing storage requirements in backup subsystems. The '906 patent discloses a system similar to that disclosed in the '052 patent, with the enhancement that the base file from which the differencing operation is derived can be compressed. For certain files, a compressed base file will utilize less bandwidth and less storage space on a computer than would an uncompressed base file. One problem with this approach is that the compressibility of files differs greatly. While compression can significantly reduce the size of some files, compression algorithms do not obtain a significant reduction with other types of files. Additionally, the differencing mechanism of the '906 patent works by first compressing the revised version of the file, and upon determining that compressed portions of the base file and the revised file differ, both the base file and the revised file are uncompressed at those locations so that the differences between the two files can be determined. The overhead involved in such compression/decompression algorithms can be significant.
U.S. Pat. No. 5,479,654 discloses an apparatus and method for generating a set of representations of the changes made in a computer file during a period of time. The process disclosed in the '654 patent makes multiple passes through portions of the most recent version of the file to determine the differences between it and the previous version of the file.
Thus, it is apparent that a differencing system that reduces network traffic, quickly and efficiently determines and reflects differences between two files, and reduces storage requirements would be highly desirable.