Document retrieval is usually done physically, e.g. by marking documents in files according to their subject and time sequence. These markings are usually recorded in lists defined for retrieval purposes. Such lists can only be compiled after the information carried by the documents has been collected. Documents are then retrieved by looking up their subject and time sequence in these lists. Such a list is commonly called an index.
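By way of illustration only, an index of the kind described above can be sketched as a mapping from subject to a date-ordered list of documents. The names `build_index` and `retrieve`, the document identifiers and the sample dates are hypothetical and are not taken from any particular filing system.

```python
from collections import defaultdict
from datetime import date

def build_index(documents):
    """Map each subject to its documents, ordered by date (the time sequence)."""
    index = defaultdict(list)
    for doc_id, subject, filed_on in documents:
        index[subject].append((filed_on, doc_id))
    for entries in index.values():
        entries.sort()  # sort by filing date, then by document id
    return dict(index)

def retrieve(index, subject, start, end):
    """Return ids of documents on `subject` filed between start and end, inclusive."""
    return [doc_id for filed_on, doc_id in index.get(subject, [])
            if start <= filed_on <= end]

# Hypothetical sample data: (document id, subject, filing date).
documents = [
    ("D-003", "invoices", date(2001, 3, 14)),
    ("D-001", "invoices", date(2001, 1, 5)),
    ("D-002", "contracts", date(2001, 2, 20)),
]
index = build_index(documents)
found = retrieve(index, "invoices", date(2001, 1, 1), date(2001, 2, 28))
```

The sketch also shows why the index can only be built after the information has been collected: every entry requires the subject and the date of the document it points to.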
More compact storage and retrieval is possible by storing the contents of paper files on microfilm or, in computer-readable form, on magnetic tapes, compact discs, digital versatile discs and the like.
There is also a need to provide an identical image of the document to be stored and retrieved, regardless of whether the said document is computer-generated, hand-written or otherwise produced. This may be done by scanning a hand-written document, a hand-made drawing or a hand-signed document, or by collecting the pixel information during the printing of a document. A method for doing the latter is described in European Patent EP 0 838 061 B1.
For a computer-generated document, retrieval requires that the document be provided with a means of identifying its origin and of distinguishing the original printout from a later printout. Such a “fingerprint” should be added to the document during the original printing process. This “fingerprint” is computed as a hashcode, which is comparable in its uniqueness to the DNA sequence of an individual living being.
It is not possible to retrieve the contents of the document from its hashcode, just as it is not possible to reconstruct an individual living being from its DNA. In both cases, however, the hashcode and the DNA are each unique to their source, i.e. the document and the individual living being, respectively.
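The one-way property of a hashcode can be illustrated with a minimal sketch. The text does not name a hash function; SHA-256 from the Python standard library is used here purely as an assumption, and the function name `fingerprint` and the sample document contents are hypothetical.

```python
import hashlib

def fingerprint(document_bytes: bytes) -> str:
    """Return a fixed-length hashcode of the document (SHA-256 is an assumption)."""
    return hashlib.sha256(document_bytes).hexdigest()

# Hypothetical document contents differing in a single character.
original = b"Invoice 4711: total due 100.00 EUR"
altered = b"Invoice 4711: total due 900.00 EUR"

fp_original = fingerprint(original)
fp_altered = fingerprint(altered)

# The hashcode always has the same length, whatever the size of the document,
# so the contents cannot be read back from it.
print(len(fp_original), fp_original != fp_altered)
```

Hashing the same bytes again yields the same hashcode, whereas even a one-character change yields a different one, which is what allows the hashcode to serve as a fingerprint.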
Printing and scanning are dynamic, ongoing processes which are never exactly the same in time, location and behaviour of the mechanical parts, even if the data to be scanned or printed are the same. Their hashcodes are therefore never the same, even when the information of the document appears identical to the eye of the onlooker. To detect these differences, an extremely high sampling rate has to be used to monitor the ongoing printing or scanning process in real time.
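The following sketch illustrates the principle under stated assumptions. The text does not specify how the process data enter the hashcode; here it is assumed that finely sampled per-line timestamps, in nanoseconds, are hashed together with the pixel data. The function `process_fingerprint`, the SHA-256 hash and all sample values are hypothetical.

```python
import hashlib
import struct

def process_fingerprint(pixels: bytes, line_times_ns: list[int]) -> str:
    """Hash the pixel data together with sampled per-line timestamps (assumed scheme)."""
    h = hashlib.sha256()
    h.update(pixels)
    for t in line_times_ns:
        h.update(struct.pack("<Q", t))  # each timestamp as an unsigned 64-bit integer
    return h.hexdigest()

# Identical pixel data, i.e. the same visible document.
pixels = bytes([0, 255, 255, 0] * 4)

# Hypothetical timestamps of two print runs; the mechanics never repeat exactly.
first_print = process_fingerprint(pixels, [0, 1_000_120, 2_000_310, 3_000_275])
later_print = process_fingerprint(pixels, [0, 1_000_095, 2_000_340, 3_000_260])

print(first_print != later_print)
```

With a coarse sampling rate, for example timestamps rounded to the nearest millisecond, both runs above would produce the same values and hence the same hashcode, which is why a very high sampling rate is needed to tell the printouts apart.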
The present invention is directed to overcoming one or more of the problems set forth above.