With the development and widespread use of e-commerce and electronic communications and transactions, government agencies, enterprises and public institutions, institutions affiliated with political parties or the government, and national security organizations and agencies process a great number of written materials, including important files and documents such as contracts and classified or confidential documents. Copyright protection and the protection and security of the contents of these documents are critically important. Digital watermark technology provides one approach to addressing these issues.
Digital watermarking embeds specific information in digital signals, which may be audio signals, image signals, video signals, or the like. Watermarks may be classified as appeared (visible) watermarks and concealed watermarks. An appeared watermark is visible, and the information it carries is shown to the user viewing the image or video. Generally, an appeared watermark comprises a name or a symbol of the copyright owner. A TV station's logo placed in a corner of a television show is one type of appeared watermark.
In a concealed watermark, information in the form of numeric data is embedded in the audio signals, image signals, or video signals and is generally invisible. An important application of the concealed watermark is copyright protection, which aims to prevent unauthorized duplication and copying of media files. Steganography, which allows users to communicate with each other using information concealed in digital signals, is also an application of the digital watermark. Annotation data in a digital picture, which can record the time at which the picture was captured, the aperture and shutter speed used, and even the brand of the camera or other information about the captured picture, is a further application of the digital watermark. Some file formats may carry this and other information as additional information referred to as “metadata”.
In addition, many text files, including a great number of western language documents, may be disseminated or transmitted not only in digital form but also in printed or photocopied form on paper or the like. With increasing internationalization, communication via western language documents is becoming more frequent, and thus there is a strong demand for protecting such documents. With the rapid development of digital technology, communication via paper documents printed or photocopied from documents in digital form has become widespread, so that much important or classified information is leaked during the dissemination or transmission of paper documentation. Thus, it is important to develop a binary text watermark technology that is capable of protecting information in documents that are printed and photocopied.
1. Chinese Patent Application No. 200710121642.7 discloses “a method and device for embedding digital watermark into a binary image”. The disclosed method comprises a step of partitioning a part of or the whole binary image into at least two watermark image blocks, a step of obtaining multiple groups according to the number of black pixels in each watermark image block, and a step of applying a Hadamard transform to the data in each group. The watermark signals are embedded through a quantization method, and the pixels to be changed in each watermark image block are determined using the Hadamard transform, so as to facilitate embedding or extracting of the watermark.
2. Chinese Patent Application No. 200810055770.0 discloses “a method and device for embedding digital watermark into a binary text image”. The disclosed method comprises a step of partitioning a part of or the whole binary image into a part to be embedded and a part to be adjusted, a step of calculating an average value of the number of black pixels in each group of the part to be embedded and the part to be adjusted, a step of determining a color change parameter according to the calculated average value and the number of black pixels in each group of the part to be embedded, and a step of adjusting the number of black pixels in each group of the part to be embedded and the part to be adjusted according to the color change parameter, so as to embed the watermark.
3. Chinese Patent Application No. 200810055770.0 discloses “a method and a device for embedding and extracting digital watermark into and from a black-and-white binary text image”. The disclosed method for embedding comprises a step of locating and grouping the valid character zones to obtain the number of black pixels in each character zone. A first number of pixels to be turned is calculated according to the relation between the number of black pixels in the respective character zones, the bit string of the watermark information, and a first step length. The method for embedding further comprises a step of turning pixels in each character zone according to the first number. The disclosed method for extracting comprises a step of locating valid character zones in a text image; a step of grouping the valid character zones to obtain the number of black pixels in each character zone; and a step of extracting a bit string of the embedded watermark information according to the relation between the number of black pixels in the respective character zones and a first step length.
In the above-mentioned prior solutions, the key point is that the watermark image block serves as the zone to be embedded. It can be seen that, in the first of the patent applications mentioned above, the watermark image block directly serves as the zone to be embedded; in the second, the zones of the binary text image are grouped to form the zone to be embedded, i.e., the watermark image block; and in the third, the grouped valid character zones in the text image serve as the watermark image block. In all three patent applications, the watermark is embedded by changing the number of black pixels in the watermark image block, and the watermark is extracted by quantizing the number of black pixels in the watermark image block.
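The count-based embedding and extraction summarized above can be illustrated with a minimal sketch. It assumes a simple odd-even quantization rule with a fixed step length; the function names `target_count` and `extract_bit` and the step value are hypothetical, and none of the cited applications is confirmed to use exactly this rule. The choice of which pixels to flip is omitted.

```python
def target_count(count: int, bit: int, step: int) -> int:
    """Return the black-pixel count nearest to `count` that encodes `bit`.

    Assumed rule: a block encodes `bit` when its count sits at the center
    of a quantization cell whose index has the same parity as the bit,
    i.e. count == (2 * k + bit) * step for some integer k >= 0.
    """
    k = max(round((count / step - bit) / 2), 0)
    return (2 * k + bit) * step


def extract_bit(count: int, step: int) -> int:
    """Recover the bit by quantizing the count and taking the index parity."""
    index = (count + step // 2) // step  # nearest cell index, integer math
    return index % 2


step = 10                # hypothetical step length
original = 103           # black pixels in one watermark image block
embedded = target_count(original, 1, step)   # pixels would be flipped to reach this count
noisy = embedded + 4     # print-and-scan noise smaller than step / 2
recovered = extract_bit(noisy, step)
```

Under this rule the bit survives as long as the print-and-scan process changes the count by less than half a step, which is why the approach relies on the block counts being stable and comparable.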
Therefore, the above methods are based on two premises.
Premise 1. The partitioning results of characters shall be correct. At present, algorithms for partitioning characters generally depend on the character recognition results of an OCR (Optical Character Recognition) system. However, an OCR mechanism is generally not used in a digital watermark system because of the recognition speed and efficiency of the OCR system. Moreover, an OCR system has a certain error rate when recognizing touching western language characters; and
Premise 2. The variation range of the number of black pixels in the watermark image block is not very large. For example, in a Chinese character document, each Chinese character serves as a watermark image block. Chinese characters are presented in block form, and the difference in area between Chinese characters is small. Accordingly, the variation range of the number of black pixels in the watermark image block is not very large, and thus the accuracy of embedding and extracting the watermark can be ensured.
However, the above methods are not suitable for the western language documents due to the following problems.
a). Touching between adjacent western language characters occurs frequently both before and after printing, and thus it is difficult to ensure that characters (for example, “mn”, “tt” or the like) are partitioned consistently before and after print-and-scan. If a single western language character serves as a watermark image block, touching between characters will necessarily disturb the resynchronization of the partitioning sequence of the character image blocks between embedding and extraction, and thus will necessarily reduce the success rate of embedding and extracting the watermark.
b). The difference in length between words in a western language is relatively large, and thus the variation range of the number of black pixels in the watermark image block tends to be large. For example, in the phrase “My extraordinary power”, the difference in length between the words is significant. If a word serves as a watermark image block, the number of black pixels in the watermark image blocks will be very unstable, and thus the watermark processing cannot be carried out reliably.
c). A change in the font size of western language characters leads to a change in the size of the characters. For example, the difference in the number of black pixels between the word “Here” set in a small font size and the same word “Here” set in a large font size is very large. Consequently, different quantization methods would need to be applied to documents with different font sizes.
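Problems a) and b) can be made concrete with a small sketch. It assumes that blocks are partitioned by splitting at all-white pixel columns, which is only one possible partitioning scheme and is not taken from the cited applications; `split_columns`, `estimated_black_pixels`, and the value of `PIXELS_PER_CHAR` are hypothetical and serve purely as illustration.

```python
def split_columns(image):
    """Return (start, end) column ranges separated by all-white columns."""
    width = len(image[0])
    has_ink = [any(row[x] for row in image) for x in range(width)]
    blocks, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a new block begins
        elif not ink and start is not None:
            blocks.append((start, x))    # a white column closes the block
            start = None
    if start is not None:
        blocks.append((start, width))
    return blocks


# Problem a): two tiny "characters" (1 = black pixel) separated by one white column.
clean = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
# After print-and-scan, ink spread bridges the gap and the characters touch.
scanned = [row[:] for row in clean]
scanned[2][2] = 1

blocks_before = split_columns(clean)     # two blocks
blocks_after = split_columns(scanned)    # one merged block: block order is lost

# Problem b): word-level blocks have very different sizes.
PIXELS_PER_CHAR = 40  # hypothetical average black pixels per character


def estimated_black_pixels(word):
    """Rough estimate: black-pixel count grows with the number of characters."""
    return len(word) * PIXELS_PER_CHAR


counts = {w: estimated_black_pixels(w) for w in "My extraordinary power".split()}
spread = max(counts.values()) / min(counts.values())  # ratio of largest to smallest block
```

In this toy case a single bridging pixel changes the number of blocks from two to one, so every subsequent block is misaligned at extraction, while the word-level counts differ by a factor that does not depend on the assumed per-character constant.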
Therefore, for western language characters, the following conditions need to be satisfied in order to obtain the watermark image block:
1. the effects of desynchronization of the watermark image blocks caused by touching characters shall be avoided;
2. the difference in the number of black pixels among the watermark image blocks shall be small; and
3. for documents with different font sizes, the watermark image blocks shall be adaptively partitioned according to the font size.