This invention relates to the delivery of information from a sender to a recipient and, more particularly, to the automatic verification of the accurate delivery of electronically encoded information to a recipient.
Information often consists of separate pieces that are to be delivered individually. Such a situation frequently arises when there is a new release of electronically encoded information that is used to supplement or update an existing information base held by a recipient; for example, when a new version of a software system is released by a software publisher. The process of delivering these new releases of information consisting of separate pieces can be cumbersome. The recipient must not only reassemble the pieces, but must ensure that all the pieces have been received and are error free. The process can be very labor intensive and can require the active participation of both the sender and the recipient.
As mentioned above, one example of a situation in which the delivery of pieces of information is involved is the propagation of a software release by a software publisher. A new software version may consist of tens to hundreds of modules of source and/or object code, data and text files released together. These modules are desirably delivered and installed together in order to guarantee accurate performance of the new release. Often, however, the modules are transmitted and copied asynchronously, i.e., at separate times. The actual transmission can take place via electronic mail, file transfer protocols, diskettes, or other means. These separate transmissions must then be assembled to reproduce the complete release package and checked for inaccuracies and omissions.
As another example, electronic document transmission frequently involves delivery of multiple items. A document can consist of several pieces, such as drawings, text, and audio. New versions of that document are likely to involve changes in some but not all of these pieces. Here, as in the case of software propagation, accurate delivery and assembly of the various pieces must be ensured.
Currently, in the software field, propagation of new versions is carried out with a great deal of human intervention. New software modules are sent or carried to each recipient where the sender then decides whether all the pieces for a new release installation are in place. An installer then proceeds to expand any compressed modules and/or compile and link the modules to generate the new release for the recipient. This process is prone to errors and is especially cumbersome if the number of recipients is large. Among the problems that face software distribution today are the following:
1. it may represent a lot of work for the installer of the software;
2. it may require the installer of the software to be extremely familiar with the new release;
3. the contents of the version may not be certain (some old modules with the same name may be mixed up with the newer version of other modules, or a number of modules may be corrupted because of errors).
Checksums, cyclic redundancy codes and error detecting codes have been employed to characterize pieces of data in situations where there is a release of multiple items of data and/or programs. A 16-bit check sequence, which is data-dependent and calculated on the contents of the data field, is found in U.S. Pat. No. 4,964,127 of Calvignac, et al. That system is applied to data transmitted along a data path, presumably in digital format.
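By way of illustration only, a data-dependent 16-bit check sequence of the general kind described above may be sketched as follows. The sketch assumes a CRC-CCITT computation as provided by Python's standard `binascii.crc_hqx`; it is not taken from the cited patent, and the names `check_sequence`, `frame` and `verify` are hypothetical.

```python
import binascii


def check_sequence(data: bytes) -> int:
    """Return a 16-bit check value computed over the contents of the data field.

    Assumption: CRC-CCITT with an initial value of 0xFFFF, chosen only for
    illustration; the cited reference may use a different code.
    """
    return binascii.crc_hqx(data, 0xFFFF)


def frame(data: bytes) -> bytes:
    """Append the 16-bit check sequence (big-endian) to the data field."""
    return data + check_sequence(data).to_bytes(2, "big")


def verify(framed: bytes) -> bool:
    """Recompute the check sequence at the receiver and compare it."""
    data, received = framed[:-2], int.from_bytes(framed[-2:], "big")
    return check_sequence(data) == received
```

A receiver calling `verify` on a frame learns only whether that one data field arrived intact; it learns nothing about whether other items of a multi-item release are present.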
The combining of checksums in a triply redundant data base to determine whether the information in any of the copies contains errors or is missing has been addressed in "A Class of Randomized Strategies for Low-Cost Comparison of File Copies," by Daniel Barbara and Richard J. Lipton, IEEE Transactions on Parallel and Distributed Systems, vol. 2, No. 2, April 1991. In that application, two sites, each with a copy of the file, exchange checksums or a vector of combined checksums. Each site then generates its own checksum to be compared with the checksum that was sent. By comparing checksums or vectors, each site can determine which pieces of information may be damaged or missing in its local copy. This system, however, acts only as a check to ensure that two copies of the same file are identical. A given recipient can only identify inconsistencies; a voting scheme is needed to determine which copy is valid. Moreover, the described system cannot deal with the release of new information from a sender in the form of additions, substitutions or revisions.
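The exchange and comparison of checksums between two sites may be sketched, in simplified form, as follows. This sketch assumes one CRC-32 per piece (Python's `zlib.crc32`) rather than the randomized combined checksums of the cited paper, and the names `checksum_vector` and `compare` are hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def checksum_vector(pieces: Dict[str, bytes]) -> Dict[str, int]:
    """Compute one checksum per named piece held at a site."""
    return {name: zlib.crc32(data) for name, data in pieces.items()}


def compare(local: Dict[str, int],
            remote: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Compare the local checksum vector with the one received from the other site.

    Returns (missing, mismatched): pieces absent locally, and pieces present
    at both sites whose checksums disagree.
    """
    missing = sorted(remote.keys() - local.keys())
    mismatched = sorted(
        name for name in local.keys() & remote.keys()
        if local[name] != remote[name]
    )
    return missing, mismatched
```

A mismatch reported by `compare` indicates only that the two copies differ, not which of them is correct, which is why a third copy and a voting scheme are needed in such an arrangement.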
U.S. Pat. No. 4,864,569 to DeLucia et al. discloses a method of checking releases that involves storing code in a data base. Each time a program is run, the system makes a line-by-line comparison of the code with the new release. Any differences that are found are checked against a data base of changes stored in the system's memory. If the differences comport with changes logged in the data base, the user is assured of having the new version. If the differences do not match the stored changes, a message is sent indicating errors in the updated version.
This system operates in several steps: the comparison of the new code to the old code, the comparison of the differences between the new and old codes with the changes logged in the data base, and the storage of this information. A more efficient method of verifying accuracy of the releases is desired. In addition, methods using comparison systems of the type described in this reference do not identify which parts of the program are missing. It is desirable for a system to indicate specifically to the recipient which modules are missing or damaged.
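The multi-step comparison discussed above may be sketched roughly as follows. The sketch assumes Python's standard `difflib.ndiff` for the line-by-line comparison and a simple list of expected diff lines standing in for the data base of logged changes; it is not the method of the cited patent, and the names `line_changes` and `release_is_expected` are hypothetical.

```python
import difflib
from typing import Iterable, List, Set


def line_changes(old: List[str], new: List[str]) -> Set[str]:
    """Step 1: compare the old code with the new code line by line.

    Keeps only added ("+ ") and removed ("- ") lines from the diff.
    """
    return {
        line for line in difflib.ndiff(old, new)
        if line.startswith(("+ ", "- "))
    }


def release_is_expected(old: List[str], new: List[str],
                        logged_changes: Iterable[str]) -> bool:
    """Step 2: compare the differences found with the logged changes."""
    return line_changes(old, new) == set(logged_changes)
```

The result is a single pass or fail indication for the release as a whole; nothing in such a comparison identifies which modules are missing or damaged.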