Computer systems have long been unable to distinguish between a given file and an exact copy of that file. When a file is copied to create another file, it is replicated bit for bit, so that the two files have identical content. Thus, it has been impossible to know whether a file is an original or a copy simply by looking at the file's content.
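The point can be illustrated with a minimal Python sketch. The helper names `sha256_of` and `copy_and_compare`, and the file names used, are hypothetical and chosen only for illustration. The sketch writes a file, copies it, and shows that a content comparison reports the two files as identical even though they are different files at different paths.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of the bytes stored in a file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def copy_and_compare(content):
    """Write content to a file, copy that file, and compare the two.

    Returns (same_content, same_path). The file names are illustrative.
    """
    with tempfile.TemporaryDirectory() as directory:
        original = os.path.join(directory, "original.txt")
        duplicate = os.path.join(directory, "copy.txt")
        with open(original, "wb") as f:
            f.write(content)
        # A bit-for-bit copy of the file's content.
        shutil.copyfile(original, duplicate)
        same_content = sha256_of(original) == sha256_of(duplicate)
        same_path = original == duplicate
        return same_content, same_path


if __name__ == "__main__":
    print(copy_and_compare(b"contract text"))
```

Nothing in the copied bytes records which of the two files was written first; any such distinction has to come from information outside the content.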
In situations where it is important to identify a file as an “original,” strategies have been developed to designate a file as the “original” and to protect it with security measures, agreements, and procedures that allow a custodian of the file to certify that it is the agreed-upon “original.” However, the precise meaning of “original,” and the procedures used, have varied significantly from one situation to the next.
For instance, many computer programmers are familiar with source code version control tools such as the Source Code Control System (“SCCS”) and the Revision Control System (“RCS”). These programs can also be used for version control of files that contain content other than source code, such as the text of technical articles. Similar document management software, which tends to be used more with natural language texts than with program source code, often uses the same library paradigm as RCS and SCCS. That is, managed files reside in a library, which may be implemented as a directory with file system access restrictions. A file must be “checked out” of the library before it can be edited by the file's creator or another authorized person. When the file is “checked in” (returned to the library), the version control program detects and records any revisions made to it. Locks may be imposed so that only one person can edit the file at a time; other people must wait until the file is checked in and then check it out in turn to edit it.
Details about the actual structures and steps used by at least some version control programs are available on the Internet; some of that information is summarized here for convenience and/or is provided elsewhere in the application file history. Version control programs keep an administrative history file which records the changes made to a given library file. They track the original library file's content, and they also keep “delta” information that reflects subsequent changes to the original content. RCS keeps the original content and the deltas in a single file, while SCCS maintains several files. Both these programs follow and rely upon specified file naming conventions. Both of them also rely on conventions regarding directory paths: by default, RCS looks for RCS files in the current directory or in an RCS subdirectory, or an alternate can be specified; the SCCS front end looks for SCCS files in an SCCS directory, but a full filename may also be specified.
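The general idea of keeping original content plus “delta” information can be sketched as follows. This is an illustration only and does not reproduce the actual RCS or SCCS file formats; the operation tuples and the function names `apply_delta` and `reconstruct` are assumptions made for this example.

```python
def apply_delta(lines, delta):
    """Apply one delta to a list of text lines and return the new list.

    A delta is a list of operations applied in order. Each operation is
    ("delete", index) or ("insert", index, text). This encoding is
    hypothetical, not the format used by any real tool.
    """
    result = list(lines)
    for operation in delta:
        if operation[0] == "delete":
            del result[operation[1]]
        elif operation[0] == "insert":
            result.insert(operation[1], operation[2])
        else:
            raise ValueError("unknown operation: %r" % (operation,))
    return result


def reconstruct(original, deltas, version):
    """Rebuild a version: 0 is the original, k applies the first k deltas."""
    if not 0 <= version <= len(deltas):
        raise ValueError("no such version")
    lines = list(original)
    for delta in deltas[:version]:
        lines = apply_delta(lines, delta)
    return lines


if __name__ == "__main__":
    original = ["alpha", "beta", "gamma"]
    deltas = [[("delete", 1)], [("insert", 0, "title")]]
    for version in range(len(deltas) + 1):
        print(version, reconstruct(original, deltas, version))
```

In this sketch every later version is derived from the stored original, so whatever content is stored as the original determines what every reconstructed version looks like.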
Because they focus on version control, these programs are apparently concerned more with reconstructing file content for a given version than with preventing changes to that content. Indeed, content changes are expected. Changes are somewhat controlled (e.g., by the file check-out procedure), and changes are normally tracked so that different versions can be reproduced. But unauthorized changes to an original file would not be difficult to make. For instance, one can apparently substitute a new “original” into RCS or SCCS, with the same or different content, by simply specifying an alternate file path and name. Also, it is apparently possible for someone who has sufficient file system access privileges to effectively change the content that these programs treat as “original” by bypassing the version control system's check-out procedure, that is, by accessing the file directly, editing the original content, and then overwriting the original file with the edited version. Care would need to be taken to avoid edits that are inconsistent with delta information, such as entirely removing a line of text that is referenced by a delta, so that the reference fails to find an operand. But many substantive edits could apparently be made, after which the version control system would treat the edited version as if it were actually the original. Some version control systems compute a checksum and place it in the revision file, but that checksum could be replaced by another which is computed from the edited content that will masquerade as the “original” content. In short, version control systems may treat either or both of two or more files as the “original” file when the content and context of the file meet relatively loose restrictions.
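The weakness of a checksum that is stored alongside the content it covers can be sketched briefly. The record layout (a dictionary with `content` and `checksum` keys) and the function names are hypothetical, and SHA-256 is used here only as a stand-in for whatever checksum a particular version control system might compute.

```python
import hashlib


def checksum(content):
    """Compute an unkeyed checksum of the content (SHA-256 as a stand-in)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def make_record(content):
    """Build a hypothetical revision record holding content and checksum."""
    return {"content": content, "checksum": checksum(content)}


def verify(record):
    """Return True if the stored checksum matches the stored content."""
    return record["checksum"] == checksum(record["content"])


def tamper(record, new_content, fix_checksum):
    """Return an edited copy of the record, optionally with a new checksum."""
    edited = dict(record)
    edited["content"] = new_content
    if fix_checksum:
        # Anyone who can write the record can also recompute the checksum.
        edited["checksum"] = checksum(new_content)
    return edited


if __name__ == "__main__":
    record = make_record("pay the bearer 100")
    print(verify(record))
    print(verify(tamper(record, "pay the bearer 900", fix_checksum=False)))
    print(verify(tamper(record, "pay the bearer 900", fix_checksum=True)))
```

A careless edit fails verification, but an edit accompanied by a recomputed checksum passes, because the check only confirms that the content and the checksum agree with each other.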
Another situation which deals generally with a distinction between an “original” and a copy is the installation of software that is “copy protected” to discourage unauthorized reproduction. However, the important distinction in this situation is not between a single original file and copies of that file, but is rather between one or more authorized copies of some master file, on the one hand, and one or more unauthorized copies of that master file, on the other hand. There may be many authorized copies of an executable file, for instance, since there may be many licensed users, and each authorized copy is treated as that user's “original” of the software.
The authorized copies of a given copy protected file may be bitwise identical, or they may differ slightly, depending on the copy protection scheme that is used. For example, U.S. Pat. No. 5,513,260 discusses copy protection technology which checks for an Authenticating Signature on compact disks. When an illicit copy of a protected disk is made, it may contain a faithful replica of the disk's program data but it will lack the Authenticating Signature and thus be distinguishable from authorized copies. U.S. Pat. No. 5,615,061 discusses copy protection technology which uses bad disk sectors to generate an identification number identifying a particular magnetic storage device. The identification number is placed on a software distribution disk the first time the software is installed from that distribution disk. When installation is requested, the distribution disk is checked to see if it already has an identification number; if so, then the software was already installed from that particular distribution disk, and it may be installed again only if the identification number on the distribution disk matches the identification number generated from the bad sectors of the magnetic disk that would receive the installation. One authorized distribution disk may thus differ from another by having a different identification number.
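The installation check described for the bad-sector scheme can be sketched as follows. This is a simplified model of the logic as summarized above, not the patented implementation: the way an identification number is derived from bad sectors (a truncated SHA-256 of the sorted sector list), the dictionary standing in for the distribution disk, and the function names are all assumptions made for illustration.

```python
import hashlib


def device_id_from_bad_sectors(bad_sectors):
    """Derive an identification number from a device's bad sector list.

    Hypothetical derivation: hash the sorted sector numbers.
    """
    canonical = ",".join(str(sector) for sector in sorted(bad_sectors))
    return hashlib.sha256(canonical.encode("ascii")).hexdigest()[:16]


def try_install(distribution_disk, bad_sectors):
    """Return True if installation onto the target device is permitted.

    distribution_disk is a dict standing in for the writable distribution
    disk; bad_sectors describes the device that would receive the software.
    """
    device_id = device_id_from_bad_sectors(bad_sectors)
    stored = distribution_disk.get("identification_number")
    if stored is None:
        # First installation: mark the distribution disk with this device.
        distribution_disk["identification_number"] = device_id
        return True
    # Later installations are allowed only onto the same device.
    return stored == device_id


if __name__ == "__main__":
    disk = {}
    print(try_install(disk, [7, 42, 1031]))
    print(try_install(disk, [1031, 42, 7]))
    print(try_install(disk, [5, 88]))
```

After the first installation the distribution disk carries device-specific data, which is why two authorized distribution disks need not be bitwise identical.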
A situation that distinguishes between an original and copies in yet another way is discussed in U.S. Pat. No. 5,319,562. This patent discusses technology for purchasing postage with a personal computer and then printing metered envelopes. The data stream containing a postage meter mark could be captured on its way to the printer from a metering program, and placed in a file instead of being immediately printed. If the image of a metered envelope were captured in this manner, it could be printed an unlimited number of times without using the patent's postage metering program. Accordingly, the postage program assigns a unique serial number to every printed envelope. The postage program also directly controls the printer to prevent end users from printing more than one copy of any envelope with the same serial number. By capturing and storing the serial numbers on all mail pieces, and then periodically processing that information, the postal service can detect fraudulent duplication of metered envelopes. Apparently, unused duplicates are harmless, and using only one duplicate while discarding the original would not burden the postal service with letters for which no postage was paid. From the postal service perspective, the important distinction is thus not between the originally printed meter mark and some copy of it, but is instead between the first used meter mark and any subsequently used meter marks, regardless of whether the first used mark was the original printed by the postage program or was a duplicate of that printing.
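The periodic processing of captured serial numbers can be sketched as a simple duplicate check. The function name `find_reused_serials` and the serial number format are hypothetical; the sketch assumes only that each scanned mail piece yields one serial number.

```python
from collections import Counter


def find_reused_serials(scanned_serials):
    """Return the serial numbers that appear on more than one mail piece."""
    counts = Counter(scanned_serials)
    return sorted(serial for serial, count in counts.items() if count > 1)


if __name__ == "__main__":
    scanned = ["A-0001", "A-0002", "A-0001", "A-0003"]
    print(find_reused_serials(scanned))
```

The check counts uses of a serial number and cannot tell which of two mail pieces bearing the same number was the originally printed envelope, which is consistent with the observation that only the first use versus later uses matters to the postal service.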
Another situation in which distinctions can be made between an “original” and a copy is the situation in which a legitimate program is replaced by a “Trojan horse” that masquerades as the original program but also performs hidden functions such as copying passwords, copying or altering files, tracking user activity, and so on. This situation is similar to the version control situation discussed above, in that distinctions based on file content are important. It resembles copy protection situations in that many legitimate copies of a program may co-exist, with each treated as a given user's “original” program.
Encryption and related technologies such as steganography have been used to create digital signatures and digital watermarks that can be used to authenticate digital documents, that is, to determine whether the digital content of a file has been altered. Examples and references are discussed in U.S. Pat. No. 5,765,176, which is titled “Performing Document Image Management Tasks Using an Iconic Image Having Embedded Encoded Information.” The iconic images discussed in this patent are reduced size partial copies of larger documents, and the images may be embedded in their respective larger documents. An iconic image appears at first glance to be a simple “thumbnail” representing the larger document, but the iconic image can hold encoded information such as a digital signature computed from the original content of the larger document, and a URL for locating a file containing that content.
U.S. Pat. No. 6,144,745 discusses technology for retaining and verifying file data on a recording medium. Logs, digital signatures, hash functions, time data, and medium identification numbers are discussed. Claim 1 is directed to a method of retaining N+1 documents on a recording medium such as a magneto-optical disk. The method apparently computes an authenticator_N based on data in a document_N, records authenticator_N and the data of document_N on the medium, and then computes an authenticator_(N+1) based on authenticator_N and data in a document_(N+1). That is, the authenticator for the second, third, fourth etc. document apparently depends not only on the content of that document but also on the authenticator for the previous document. According to the Summary of the Invention, this makes it “possible to warrant a continuity of the documents and easily detect illegal acts such as a falsification of an intermediate document and a disposal of the intermediate document, and to therefore restrict the illegal acts against the documents.”
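The chaining of authenticators can be sketched with an ordinary hash function. This is a minimal illustration of the dependency described above, not the method claimed in that patent: plain SHA-256 over the previous authenticator concatenated with the document data is an assumption, as are the function names `chain_authenticators` and `first_mismatch`.

```python
import hashlib


def chain_authenticators(documents):
    """Compute one authenticator per document, each depending on the last.

    The first authenticator depends only on the first document's data.
    """
    authenticators = []
    previous = b""
    for data in documents:
        current = hashlib.sha256(previous + data).digest()
        authenticators.append(current)
        previous = current
    return authenticators


def first_mismatch(documents, recorded):
    """Return the index of the first broken link, or None if the chain holds."""
    recomputed = chain_authenticators(documents)
    for index, (expected, stored) in enumerate(zip(recomputed, recorded)):
        if expected != stored:
            return index
    if len(recomputed) != len(recorded):
        return min(len(recomputed), len(recorded))
    return None


if __name__ == "__main__":
    documents = [b"first", b"second", b"third"]
    recorded = chain_authenticators(documents)
    print(first_mismatch(documents, recorded))
    # Falsifying the intermediate document breaks the chain at its position.
    print(first_mismatch([b"first", b"SECOND", b"third"], recorded))
    # Disposing of the intermediate document and its authenticator
    # breaks the chain at the document that followed it.
    print(first_mismatch([b"first", b"third"], [recorded[0], recorded[2]]))
```

Because each authenticator folds in its predecessor, changing or removing an intermediate document invalidates the recorded authenticators from that point onward. Since this sketch uses an unkeyed hash, anyone able to rewrite every recorded authenticator could rebuild a consistent chain, so it shows only the chaining idea and not a complete protection scheme.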
In summary, distinctions between an “original” file and copies of that file can be made in a variety of ways. Version control programs and Trojan horse detection tools try to distinguish between file content that is treated as original (but might not be) and revisions of that content. Copy protection technologies distinguish between authorized copies and unauthorized copies, and permit many users to each license their own authorized “original” files. Postal metering programs and procedures distinguish between the original use of a serialized meter mark and any subsequent use, regardless of whether the originally used mark was on the originally printed envelope or copied onto another envelope. Digital signatures may be used to determine if the content of a digitally signed file was altered after the signature was first computed, and may be used in some cases to help determine whether files that were originally recorded in a particular order have been rearranged and/or copied to another medium. Existing approaches tend to treat any file as an original if enough bits in the file and in its immediate context satisfy the authenticity criteria, even when the file is not a unique original.
It would be an improvement in the art to provide a new computer implemented method for distinguishing a single file from all other copies in an automated way. This would allow identification of a unique original of a file, of serialized copies of a file such as certificates or “certified copies” of an original digital file, and of administrative copies such as backups or mirrored copies. Tools and techniques for creating, managing, and using an original file based on its unique physical location are described and claimed herein.