In today's computer networks, the circulation of soft-copy (electronic) text documents such as e-mail over unsecured media, e.g., the Internet, is growing exponentially. Combined with the ease with which anyone can print and photocopy a hard copy of the same text documents, this makes authentication a key issue. The recipient of a text document, be it an electronic message or a hard copy of it, should be able to ascertain its origin so that no one can masquerade as someone else. It should also be possible to verify that the document has not been modified, accidentally or maliciously, en route. To this end, methods have been devised to perform authentication.
The standard solution, which fits well with electronic text documents, consists in adding a MAC, or Message Authentication Code, to soft-copy text documents. A MAC is a digest computed over the text with a one-way hash function and made dependent on a key, e.g., a secret key known only to the sender and the receiver. The receiver can thus check, first, that what it received was indeed originated by the party sharing the secret key with it and, second, that the document has not been altered. For example, the Secure Hash Algorithm, or SHA, specified by the National Institute of Standards and Technology, NIST, FIPS PUB 180-1, “Secure Hash Standard”, US Dept. of Commerce, May 93, produces a 160-bit hash. It may be combined with a key, e.g., through the mechanism referred to as HMAC, or Keyed-Hashing for Message Authentication, the subject of RFC (Request For Comments) 2104 of the IETF (Internet Engineering Task Force). HMAC is devised so that it can be used with any iterative cryptographic hash function, including SHA. A MAC can therefore be appended to the soft copy of a text document so that the whole can be checked by the recipient. Obviously, this method does not work on hard-copy text documents, since it assumes the addition of checking information to a file. Moreover, this scheme has the drawback of separating the text from the checking information. The latter can thus easily be isolated and removed, either intentionally, in an attempt to cheat, or accidentally, simply because intermediate pieces of equipment in charge of forwarding the electronic documents are not devised to handle this extra piece of information.
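By way of illustration only, the keyed-digest mechanism described above may be sketched with the `hmac` and `hashlib` modules of the Python standard library, using HMAC over SHA-1 (160-bit output). The function names `compute_mac` and `verify_mac`, the key and the sample text are assumptions introduced for this sketch and are not taken from the cited standards.

```python
import hashlib
import hmac


def compute_mac(key: bytes, text: str) -> bytes:
    """Return the HMAC-SHA1 digest (20 bytes, i.e. 160 bits) of the text."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha1).digest()


def verify_mac(key: bytes, text: str, mac: bytes) -> bool:
    """Recompute the MAC on the receiver side and compare in constant time."""
    return hmac.compare_digest(compute_mac(key, text), mac)


# Hypothetical secret key shared by sender and receiver only.
shared_key = b"example shared secret"
document = "Please wire the agreed amount on Friday."

# The sender appends this value to the soft copy of the document.
mac = compute_mac(shared_key, document)
```

The receiver calls `verify_mac` with the same key; the check fails if the text was altered or if the MAC was computed with a different key. The MAC travels next to the text rather than inside it, which is the separation discussed above.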
The checking information should therefore rather be encoded transparently into the body of the text document itself, i.e., in a manner that does not affect text readability whatsoever, so that it remains intact across the various manipulations the document is exposed to on its way to its destination, while still enabling the end recipient to authenticate the document.
Another type of approach to authentication, which applies mainly to soft-copy images, consists in hiding data in their digital representation, thereby meeting the above requirement that the checking information should be merged into the document itself. (This approach may thus also be used on the image of a hard-copy text document, although it still fails to work directly from the hard copy.) Data hiding has received considerable attention, mainly because of the copyrights attached to digital multimedia materials, which can easily be copied and distributed everywhere through the Internet and networks in general. A good review of data hiding techniques is in ‘Techniques for data hiding’ by W. Bender et al., published in the IBM Systems Journal, Vol. 35, Nos 3&4, 1996. As an illustration of how data hiding may be carried out, the most common form of high bit-rate encoding reported in the above paper is the replacement of the least significant luminance bit of image data with the embedded data. This technique, which indeed meets the requirement of being imperceptible (the restored image is far from being altered to a point where the change would become noticeable), may serve various purposes similar to authentication, including watermarking, aimed at placing an indelible mark on an image, and tamper-proofing, aimed at detecting image alterations, especially through the embedding of a MAC into the soft-copy image.
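The least-significant-bit replacement mentioned above may be sketched as follows. This is a minimal illustration, assuming the image is available as a flat list of 8-bit luminance values; the names `embed_bits` and `extract_bits` and the sample values are hypothetical and do not come from the cited paper.

```python
def embed_bits(luminance: list[int], bits: list[int]) -> list[int]:
    """Replace the least significant bit of the first samples with the data bits."""
    if len(bits) > len(luminance):
        raise ValueError("not enough luminance samples to carry the data")
    marked = list(luminance)
    for i, bit in enumerate(bits):
        # Clear bit 0 of the sample, then set it to the embedded bit.
        marked[i] = (marked[i] & 0xFE) | (bit & 1)
    return marked


def extract_bits(luminance: list[int], count: int) -> list[int]:
    """Read back the least significant bit of the first `count` samples."""
    return [value & 1 for value in luminance[:count]]


# Hypothetical luminance samples (0..255) and data bits, e.g. part of a MAC.
pixels = [200, 13, 255, 0, 97, 128, 64, 31]
payload = [1, 0, 1, 1, 0, 0, 1, 0]

marked_pixels = embed_bits(pixels, payload)
recovered = extract_bits(marked_pixels, len(payload))
```

Each sample changes by at most one luminance level, which is why the alteration goes unnoticed by a viewer.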
However, having to treat a text as an image would be a very costly and inadequate solution in terms of the storage and bandwidth necessary to transmit it. As stated in the above paper, soft-copy text is in many ways the most difficult place to hide data, due to the lack of redundant information in a text file as compared to a picture. Nevertheless, the manipulation of white space, i.e., of blank characters and more specifically of inter-word blank characters purposely inserted by the originator of a text document in excess of what is necessary to make the text readable, is the simplest way of marking a text so that it can be authenticated without the addition of a separate MAC. The information necessary for the checking is then embedded, somehow hidden, in the text itself in the form of blanks that the casual reader is unlikely to notice.
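The general idea of carrying information in inter-word blanks may be illustrated by the following sketch. The encoding rule used here (one blank between two words for a 0 bit, two blanks for a 1 bit) is an assumption chosen for simplicity and should not be read as the method of the invention; the names `embed_in_blanks` and `extract_from_blanks` are likewise hypothetical.

```python
import re


def embed_in_blanks(text: str, bits: list[int]) -> str:
    """Encode bits in word gaps: one blank for 0, two blanks for 1 (assumed rule)."""
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough inter-word gaps to carry the data")
    pieces = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps beyond the payload keep the single blank needed for readability.
        extra = bits[i] if i < len(bits) else 0
        pieces.append(" " * (1 + extra) + word)
    return "".join(pieces)


def extract_from_blanks(text: str, count: int) -> list[int]:
    """Recover bits from the length of the first `count` inter-word gaps."""
    gaps = re.findall(r" +", text.strip())
    return [len(gap) - 1 for gap in gaps[:count]]


# Hypothetical sample text and payload bits.
original = "It should be possible to verify the origin"
payload_bits = [1, 0, 1, 1]

marked_text = embed_in_blanks(original, payload_bits)
recovered_bits = extract_from_blanks(marked_text, len(payload_bits))
```

The words themselves are unchanged, so the marked text reads exactly like the original; collapsing every run of blanks to a single blank restores the original text.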
Therefore, it is an object of the invention to provide a method to merge the information necessary to authenticate a text document into the body of the document itself.
It is another object of the invention to make this method applicable to both soft-copy and hard-copy text documents.
Further objects, features and advantages of the present invention will become apparent to those skilled in the art upon examination of the following description in reference to the accompanying drawings. It is intended that any additional advantages be incorporated herein.