The present invention relates generally to the field of information security, and particularly to information security using digital watermarking and information hiding techniques.
Paperless office environments and electronic transactions are widely adopted in today's business world. Many important documents, such as wills, forms, identification documents, and contracts, require strict authentication and integrity assurance. For documents containing sensitive information, even a small revision, such as adding, deleting, or modifying a paragraph, phrase, or word, is not allowed, since it may greatly change the meaning of the content and cause serious damage in business activities.
The digital signature is a well-known traditional technique for verifying the integrity of electronic content. The technique first generates a digest of the content with a one-way hash function, then encrypts the digest using the author's private key and appends the encrypted digest to the content to be signed. This whole procedure is called digital signing. Anyone who has the corresponding public key can decrypt the digest and verify whether it is the same as the hash value of the received content.
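As an illustration only, the hash, sign, and verify steps described above might be sketched as follows. The sketch assumes textbook RSA without padding, and a toy key pair built from two well-known Mersenne primes; the names `digest`, `sign`, and `verify` are hypothetical. It is not secure and is not presented as the scheme of any particular system.

```python
import hashlib

# Toy key pair (assumption, for illustration only): two known Mersenne primes.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (requires Python 3.8+)


def digest(content: bytes) -> int:
    """One-way hash of the content, reduced so that it fits below the modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """Encrypt the digest with the private key; the result travels with the content."""
    return pow(digest(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Decrypt the signature with the public key and compare it with a fresh digest."""
    return pow(signature, E, N) == digest(content)


document = b"I leave my house to Alice."
signature = sign(document)
print(verify(document, signature))                      # unchanged content
print(verify(b"I leave my house to Bob.", signature))   # one word modified
```

The signature is side information that must accompany the electronic content, and changing a single word in the content causes the comparison in `verify` to fail.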
However, in most real applications, paper still holds an indispensable position. Signed documents are often printed out or faxed. Since a digital signature requires side information (the encrypted digest) to be transmitted together with the electronic document itself, it is of no use once the electronic document has been printed out. Furthermore, a digital signature can only ensure the integrity of the document; it cannot hide any additional information that the author does not want others to see directly.
Furthermore, with the development of digital techniques and the Internet, digital watermarking has become an active topic in multimedia information security research and an important branch of information hiding research. Digital watermarking techniques verify ownership of data by embedding watermark information into the original data. The embedded watermark may be a string of characters, an identification, a serial number, or the like, and is often invisible or unobservable. The watermark is tightly combined with the original data (for example, text, image, audio, or video data) and hidden therein, and it survives operations that do not destroy the use value or commercial value of the source data.
Text watermarking usually refers to watermarking for text documents. A typical text document consists of regular structures including words, inter-word spaces, lines, paragraphs, and sometimes equations and graphs. Compared with still images and videos, a text document offers less space in which to hide information.
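The limited hiding space can be made concrete with a minimal sketch. It assumes a plain-text stand-in for inter-word spacing, in which one space between words encodes a 0 bit and two spaces encode a 1 bit. The function names are hypothetical, and the scheme is chosen only for illustration; it is not the method of the present invention.

```python
import re


def capacity(text: str) -> int:
    """Number of inter-word gaps, i.e. the number of bits this toy scheme can hide."""
    return max(len(text.split()) - 1, 0)


def embed_bits(text: str, bits: list) -> str:
    """Hide bits in the gaps between words: one space for 0, two spaces for 1."""
    words = text.split()
    if len(bits) > capacity(text):
        raise ValueError("not enough inter-word spaces to hold the payload")
    marked = words[0] if words else ""
    for i, word in enumerate(words[1:]):
        bit = bits[i] if i < len(bits) else 0   # unused gaps stay single-spaced
        marked += (" " if bit == 0 else "  ") + word
    return marked


def extract_bits(marked: str, count: int) -> list:
    """Read the first `count` gaps back as bits."""
    gaps = re.findall(r" +", marked)
    return [1 if len(gap) == 2 else 0 for gap in gaps[:count]]


cover = "the quick brown fox jumps over the lazy dog"
payload = [1, 0, 1, 1, 0, 0, 1, 0]
marked = embed_bits(cover, payload)
print(capacity(cover), extract_bits(marked, len(payload)) == payload)
```

Under this assumption a sentence of nine words carries only eight bits, whereas an image or video of comparable importance has many thousands of samples available for embedding.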
In general, watermarking can be used in two kinds of applications: copyright protection and integrity checking. The corresponding attack models and technical requirements of the two differ considerably. When watermarking is used for copyright protection, the purpose of an attack is to make the watermark irretrievable. Since text watermarks are vulnerable to deliberate destruction, using them for content protection may encounter major technical challenges. In integrity checking, however, the potential attack is not to remove the watermark but to modify the meaning of the watermark or of the content. In this case, resistance to some kinds of deliberate destruction, such as non-linear processing, is not very important, but robustness against the distortions caused by normal printing, copying, and scanning is still required to match real-life applications.
Existing watermarking techniques are mostly symmetric watermarking techniques. In general, integrity protection by means of symmetric watermarking is realized by first generating a signature by applying a public key algorithm to the digest of the text, and then hiding the encrypted digest in the document using a symmetric watermarking technique. However, some applications need not only to verify integrity but also to add extra secret information, for example, information that the author might not want others to see. If such information is embedded by means of symmetric watermarking, then a person who is authorized to detect the watermark can easily forge another watermark without permission, since the embedding and extraction procedures of a symmetric watermark can be derived from each other.
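The forgery risk described above can be sketched under a simplifying assumption: a keyed hash (HMAC) stands in for a symmetric watermark, so that the same shared key both produces and checks the mark. The function names and the key are hypothetical, and the actual embedding of the mark into the document is omitted.

```python
import hashlib
import hmac


def make_watermark(key: bytes, content: bytes) -> bytes:
    """Produce the mark that would be embedded; stands in for symmetric embedding."""
    return hmac.new(key, content, hashlib.sha256).digest()[:8]


def detect_watermark(key: bytes, content: bytes, mark: bytes) -> bool:
    """Detection recomputes the mark with the very same key."""
    return hmac.compare_digest(make_watermark(key, content), mark)


shared_key = b"key given to every authorized detector"   # hypothetical key

# The author marks the genuine document.
genuine = b"Pay 100 units to Alice."
mark = make_watermark(shared_key, genuine)
print(detect_watermark(shared_key, genuine, mark))

# An authorized detector holds the same key, so it can mark altered content too.
forged = b"Pay 900 units to Mallory."
forged_mark = make_watermark(shared_key, forged)
print(detect_watermark(shared_key, forged, forged_mark))
```

Both checks succeed, so under this assumption a valid-looking mark says nothing about whether the author or a detector produced it, which is the weakness of a symmetric scheme noted above.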