A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention is related to computer software and more specifically to computer software for determining the existence of differences between two personal computer files.
Computer systems store information in files. Some computer programs use files to store data used by the computer programs. For example, the Windows 95 operating system commercially available from Microsoft Corp. of Redmond, Wash. stores information used by the Windows operating system and other programs running under the Windows operating system in a file referred to as the Windows registry. The Windows registry contains values used to control the operation of the Windows operating system and other programs running under the Windows operating system. In addition to these values, the Windows registry also contains keys which identify each of the values. The Windows 95 operating system and many computer programs running under the Windows 95 operating system insert keys and values into the Windows registry so that the operating system and programs can operate properly.
Occasionally, the operating system or one or more computer programs may operate improperly. Among the many potential causes of such improper operation is a corrupt file used by the computer program. For example, if the Windows registry becomes corrupt, one or more computer programs or the Windows operating system may not operate properly. Because the Windows registry can be modified by any of a large number of programs and the operating system, and the Windows registry is subject to other conventional sources of file corruption such as disk errors, the Windows registry can be a commonly suspected source of improper operation of the operating system or computer programs.
It may therefore be necessary to determine whether the Windows registry has been modified since the last time the operating system or program operated properly. If such modification has occurred, the improper operation may be resolved by restoring certain values of the Windows registry to contain values identical to those which it contained the last time the operating system or program operated properly.
One way of determining whether the Windows registry has been modified is to store a copy of the registry on a separate disk. If a computer program operates improperly, computer support personnel may visually compare each key and each value corresponding to each key in the Windows registry on the computer containing the operating system or program that is operating improperly with the copy of the Windows registry previously stored. If the Windows registry is different from the copy, the Windows registry may be restored from the copy to identify whether the differences are causing the improper operation of the computer program.
This way of determining whether the Windows registry has been modified is subject to several problems. Although the Windows registry file is relatively small, in a company with tens of thousands of computers each containing a Windows registry, storing a copy of all of the Windows registries used by the employees of the company can utilize significant storage resources. Additionally, such a visual comparison is time-consuming and subject to error.
Therefore, a system and method are needed which can quickly and easily determine whether a file, such as the Windows registry, is different from another file, such as a previously-stored version of the Windows registry, without requiring the resources necessary to duplicate every Windows registry and without requiring a visual comparison of the files.
A method and apparatus hashes some or all of two files to be compared, allowing comparison of the hash results to identify whether differences exist between the files (a file can be identified as changed over a period by hashing it at the start and end of the period). Files that hash to a different result may be identified as having differences, and files that hash to the same result may be identified as unlikely to have differences. To reduce the probability that two files that are different nevertheless hash to the same result, and are therefore identified as unlikely to have differences, a characteristic of each file, such as its size, may be compared as well. If either the hash results or the characteristics are different, the files are identified as having differences. Otherwise, the files are identified as unlikely to have differences. If differences exist between the two files, portions of one file may be used to restore portions of the other file.
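The comparison described above can be illustrated with a minimal Python sketch. It is not the hash function of the invention: the standard CRC-32 checksum from Python's `zlib` module is swapped in purely as a stand-in four-byte hash, and the names `fingerprint` and `has_differences` are hypothetical, introduced only for this illustration.

```python
import os
import zlib


def fingerprint(path, chunk_size=65536):
    """Return (size_in_bytes, 32-bit checksum) for the file at `path`.

    Assumption: CRC-32 stands in for the hash; any fixed-size hash could be used.
    """
    checksum = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running checksum back in so the whole file is covered.
            checksum = zlib.crc32(chunk, checksum)
    return os.path.getsize(path), checksum & 0xFFFFFFFF


def has_differences(fp_a, fp_b):
    """True if either the size or the checksum differs.

    False means only that the files are *unlikely* to differ.
    """
    return fp_a != fp_b
```

Only the small fingerprint needs to be stored for each file, for example at the start of a period, and it can later be compared against a fingerprint computed at the end of the period.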
The hashing of each of the files may be performed using a hash function that includes exclusive-oring bit values from such file with a finite-sized work area and storing the result in the work area. When all bytes of the work area have been exclusive-ored, each byte of the work area is replaced by a byte which can have a different value, using a translation table of bit values selected from several such tables. After the bytes of the work area have been replaced, the exclusive-or process continues using the replaced values in the work area and any additional values from the file until all of the values of the file have been exclusive-ored into the work area. The work area is then halved in size by exclusive-oring the upper half with the lower half, selecting a table and replacing the result of the exclusive-or of each half of the work area with the corresponding value in the table. The work area is repeatedly halved in this manner until it is four bytes in size to produce a four-byte hash result.
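The exclusive-or, translate, and halve steps described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the 16-byte work area, the four translation tables built as seeded permutations, the rule of cycling through the tables in order, and the choice to translate only after a full pass over the work area are all assumptions, as are the names `xor_fold_hash`, `WORK_SIZE`, `NUM_TABLES`, and `TABLES`.

```python
import random

WORK_SIZE = 16   # assumed work-area size in bytes (a power of two, at least 4)
NUM_TABLES = 4   # assumed number of translation tables


def _make_tables(count, seed=12345):
    """Build `count` byte-translation tables (assumed: seeded permutations of 0..255)."""
    rng = random.Random(seed)
    tables = []
    for _ in range(count):
        table = list(range(256))
        rng.shuffle(table)
        tables.append(table)
    return tables


TABLES = _make_tables(NUM_TABLES)


def _translate(work, selector):
    """Replace every byte of `work` using the table chosen by `selector`."""
    table = TABLES[selector % NUM_TABLES]  # assumed selection rule: cycle in order
    return bytearray(table[b] for b in work)


def xor_fold_hash(data):
    """Return a four-byte hash of `data` using XOR, translation, and halving."""
    work = bytearray(WORK_SIZE)
    passes = 0
    for offset in range(0, len(data), WORK_SIZE):
        chunk = data[offset:offset + WORK_SIZE]
        # Exclusive-or the next bytes of the file into the work area.
        for i, b in enumerate(chunk):
            work[i] ^= b
        # Once every byte of the work area has been touched, translate it.
        if len(chunk) == WORK_SIZE:
            work = _translate(work, passes)
            passes += 1
    # Halve the work area until four bytes remain.
    while len(work) > 4:
        half = len(work) // 2
        folded = bytearray(work[i] ^ work[i + half] for i in range(half))
        work = _translate(folded, passes)
        passes += 1
    return bytes(work)
```

For example, `xor_fold_hash(b"hello")` returns a four-byte value. The specific bytes depend on the assumed tables, so hash results are comparable only when both files are hashed with the same tables and work-area size.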