1. Field of the Invention
This invention relates to the synchronisation of data files, and in particular to efficient methods for updating one or more remote data files to coincide with a source data file, over a communications link, network or the like.
2. Related Art
There are numerous situations, particularly in the field of computer networks, where several computers store data files which are intended to coincide with one another. Additions and/or alterations to the data file may typically be made at one of the computers, and then it is necessary to make the same additions and/or alterations to the files at the other computers to ensure they are all synchronised with the latest version of the file. The simplest way of achieving that synchronisation is to merely communicate the entire changed file to all of the relevant computers and replace the entire existing files thereat. However, where the computers communicate the data to one another over a communications medium such as a telecommunications network or the like, it is frequently desirable to reduce the amount of data to be communicated, for example to save time in transferring large amounts of data over a limited-bandwidth communications link. Thus, it is desirable that only the changed portions of the file be distributed from the computer at which the file was changed to the other computers, so that the changed portions can be combined with the unchanged portions to reconstruct the entire file.
One application in which this function is useful is in distributing computer software or other data files over the Internet or World Wide Web. For example, software distributed by a vendor to users over the Internet may be periodically updated by the vendor. Even if only a small portion of the program code is changed by the update, it may be necessary for all of the users to download the entire new program file from the vendor computer server. Systems employing the above mentioned scheme of determining the differences between files and transferring only the changed portions have been proposed, but these also suffer from some drawbacks. One known system involves summarising by sections the version of the file held at the user's computer and sending the file summary (compressed to a fraction of the actual file) to the server. At the server the received file summary is compared to the (updated) source file to determine the differences therebetween, and the server then transmits back to the user's computer the unencoded portions of the source file which are not found in the version of the file existing on the user's computer. The new file portions are then inserted into the existing file to form a file which corresponds precisely with the source file.
One difficulty which is inherent in the above described system is that a large computational burden is placed on the server computer if files at numerous users' computers require updating. For example, the server computer must analyse the file summary from each user by comparing it against the source file. When hundreds or thousands of users wish to update their data files, the operation of the server computer can be slowed considerably, which may negate the time savings of employing the updating scheme or necessitate the use of powerful processing equipment at the server.
In accordance with the present invention, there is provided a method for synchronising data between a receiving computer and a sending computer, wherein the sending computer has a source file and the receiving computer has a reference file and the receiving and sending computers are coupled for communication therebetween by way of a communications link or network, the method comprising the steps of:
i) arranging the source file at the sending computer into a sequence of data blocks, each block comprising a predetermined number of data units, and computing a source key value for each block in the source file;
ii) transmitting the source key values from the sending computer to the receiving computer;
iii) at the receiving computer, comparing the source key values with reference key values computed for each predetermined number of contiguous data units in the reference file to determine matches between source key values and reference key values;
iv) communicating from the receiving computer to the sending computer an indication of which source keys do not have matching reference keys, and transmitting data blocks from the source file corresponding to the unmatched source keys from the sending computer to the receiving computer; and
v) constructing at the receiving computer a target file from the contiguous data units in the reference file determined to have reference key values matching respective source key values and the data blocks from the source file received from the sending computer, wherein the constructed target file at the receiving computer is synchronised with the source file at the sending computer.
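By way of illustration only, steps i) to v) above may be sketched in Python as a single-process simulation. The block size, the use of SHA-256 as the key function, and all function names are assumptions made for this sketch and are not taken from the specification; the sketch also assumes that equal key values imply equal data.

```python
import hashlib

BLOCK_SIZE = 8  # hypothetical "predetermined number of data units" (bytes here)


def key_of(data: bytes) -> bytes:
    """Key value for a run of data units; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).digest()


def source_keys(source: bytes) -> list[bytes]:
    """Step i: split the source file into blocks and compute one key per block."""
    return [key_of(source[i:i + BLOCK_SIZE])
            for i in range(0, len(source), BLOCK_SIZE)]


def match_keys(keys: list[bytes], reference: bytes) -> dict[int, int]:
    """Step iii: map source block index -> reference offset with a matching key."""
    wanted: dict[bytes, list[int]] = {}
    for index, k in enumerate(keys):
        wanted.setdefault(k, []).append(index)
    matches: dict[int, int] = {}
    # Every run of BLOCK_SIZE contiguous units is examined, not only aligned ones.
    for offset in range(len(reference) - BLOCK_SIZE + 1):
        k = key_of(reference[offset:offset + BLOCK_SIZE])
        for index in wanted.get(k, ()):
            matches.setdefault(index, offset)
    return matches


def synchronise(source: bytes, reference: bytes) -> tuple[bytes, int]:
    """Run steps i-v; return the target file and the number of blocks sent."""
    keys = source_keys(source)                    # steps i and ii (sender -> receiver)
    matches = match_keys(keys, reference)         # step iii (at the receiver)
    missing = [i for i in range(len(keys)) if i not in matches]  # step iv: indication
    sent = {i: source[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE] for i in missing}
    target = b"".join(                            # step v: assemble the target file
        sent[i] if i in sent
        else reference[matches[i]:matches[i] + BLOCK_SIZE]
        for i in range(len(keys))
    )
    return target, len(missing)
```

In this sketch a final source block shorter than `BLOCK_SIZE` never matches a full-length run in the reference file and is therefore always transmitted. Hashing every offset of the reference file is costly; the sketch makes no attempt to reduce that cost.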
Preferably the source key values for the sequence of source file data blocks are pre-computed and stored for subsequent use. In one form of the invention, the sending computer and receiving computer are coupled to communicate by way of an intervening computer containing a cache memory, and a copy of the source key values is stored in the intervening computer cache memory and provided therefrom to the receiving computer.
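The benefit of pre-computing the source key values may be illustrated by the following minimal sketch, in which the keys are computed once when a source file is published and then served to any number of receiving computers without further hashing. The class name `SummaryStore`, its methods, the block size and the SHA-256 key function are all hypothetical and serve only this example.

```python
import hashlib

BLOCK_SIZE = 8  # hypothetical block length in bytes


class SummaryStore:
    """Holds pre-computed source key values, one key sequence per source file."""

    def __init__(self) -> None:
        self._summaries: dict[str, list[bytes]] = {}
        self.computations = 0  # how many times keys were actually computed

    def publish(self, name: str, source: bytes) -> None:
        """Compute and store the key sequence when a source file is (re)published."""
        self._summaries[name] = [
            hashlib.sha256(source[i:i + BLOCK_SIZE]).digest()  # assumed key function
            for i in range(0, len(source), BLOCK_SIZE)
        ]
        self.computations += 1

    def summary(self, name: str) -> list[bytes]:
        """Serve the stored keys; no hashing happens on this path."""
        return self._summaries[name]
```

Under these assumptions the work done by the sending computer per update request is a lookup, regardless of how many receiving computers ask for the same summary.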
The present invention also provides a method for constructing a target data file at a first computer from a reference file stored at the first computer and a source file at a remote second computer such that the constructed target file is synchronised with the source file, comprising the steps of:
i) requesting and receiving from the remote second computer a source file summary comprising a sequence of source key values being codes derived from data blocks of predetermined length making up the source file;
ii) generating a reference key value for each contiguous portion of the reference file of predetermined length and comparing the reference key value with the received source key values, to determine matches therebetween;
iii) requesting and receiving from the remote second computer those data blocks from the source file for which no match was found between the corresponding source key value and the reference key values; and
iv) constructing a target data file from the received source file data blocks and those contiguous portions of the reference file for which the corresponding reference key value was found to match a source key value, wherein the constructed target file is synchronised with the source file.
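Viewed from the first computer alone, the four steps above might be sketched as follows, with the remote second computer represented only by two callables. The names `build_target`, `fetch_summary` and `fetch_blocks`, the block length and the SHA-256 key function are assumptions for illustration; the specification does not prescribe an interface.

```python
import hashlib
from typing import Callable

BLOCK = 4  # hypothetical predetermined block length, in bytes


def digest(chunk: bytes) -> bytes:
    """Assumed key function; any code derived from the block contents would do."""
    return hashlib.sha256(chunk).digest()


def build_target(
    reference: bytes,
    fetch_summary: Callable[[], list[bytes]],
    fetch_blocks: Callable[[list[int]], dict[int, bytes]],
) -> bytes:
    keys = fetch_summary()                         # step i: request the summary
    # Step ii: key every contiguous BLOCK-length portion of the reference file.
    offsets: dict[bytes, int] = {}
    for off in range(len(reference) - BLOCK + 1):
        offsets.setdefault(digest(reference[off:off + BLOCK]), off)
    missing = [i for i, k in enumerate(keys) if k not in offsets]
    received = fetch_blocks(missing) if missing else {}   # step iii
    parts = []
    for i, k in enumerate(keys):                   # step iv: assemble the target
        if i in received:
            parts.append(received[i])
        else:
            off = offsets[k]
            parts.append(reference[off:off + BLOCK])
    return b"".join(parts)
```

Here the comparison work is carried out entirely at the first computer; the remote computer only answers a summary request and, where needed, a single request for the unmatched blocks.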
In one form of the invention, the first and second computers are coupled to communicate over a computer network including a proxy computer which is closer to, or more conveniently located for communication with, the first computer than is the second computer, and the step of requesting and receiving the source file summary includes providing the source file summary to the first computer from a copy of the source file summary generated at the second computer and previously received and stored by the proxy computer. Furthermore, the step of requesting and receiving data blocks for which no match was found may include providing those data blocks to the first computer from the proxy computer from a copy of the source file data blocks previously provided from the second computer and stored in a cache memory at the proxy computer.
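The proxy arrangement may be sketched, purely as an example, by a cache that forwards a request to the second computer only the first time a given summary or data block is asked for. The classes `Origin` and `Proxy` and their methods are hypothetical stand-ins; the sketch assumes a single source file that does not change while cached, and omits any cache expiry.

```python
class Origin:
    """Stand-in for the second (source) computer; counts the requests it serves."""

    def __init__(self, summary: list[bytes], blocks: list[bytes]) -> None:
        self.summary = summary
        self.blocks = blocks
        self.hits = 0

    def get_summary(self) -> list[bytes]:
        self.hits += 1
        return self.summary

    def get_block(self, index: int) -> bytes:
        self.hits += 1
        return self.blocks[index]


class Proxy:
    """Stand-in for the proxy computer with a cache memory."""

    def __init__(self, origin: Origin) -> None:
        self._origin = origin
        self._summary = None                  # cached source file summary
        self._blocks: dict[int, bytes] = {}   # cached source file data blocks

    def get_summary(self) -> list[bytes]:
        if self._summary is None:             # first request goes to the origin
            self._summary = self._origin.get_summary()
        return self._summary

    def get_block(self, index: int) -> bytes:
        if index not in self._blocks:         # fetch each block at most once
            self._blocks[index] = self._origin.get_block(index)
        return self._blocks[index]
```

When several first computers hold the same outdated reference file, they tend to request the same unmatched blocks, so in this sketch the second computer serves each such block once and the proxy serves the repeats.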