1. Field of the Invention
The present invention relates to an image processing technique of extracting watermark information embedded in a document image based on the line spacing between the character strings in the document image.
2. Description of the Related Art
As a technique for adding information such as copyright or copy control information to a document image, a method is known in which the information is embedded using line spacing (to be referred to as a line spacing watermark hereinafter), as described in Kineo Matsui, “Basics of a digital watermark”, Morikita Publishing Co., Ltd. pp. 198-199, 1998 (ISBN:4-627-82551-X). FIG. 2 is a view showing the concept of a line spacing watermark. To extract embedded information from a document image using a line spacing watermark, the line spacing between the character strings in the document image is obtained first. Generally, a histogram is obtained by fully scanning the document image, and the line spacing is derived from the histogram. The information is then extracted in accordance with the derived line spacing and the rule used for embedding. For example, to embed binary information “0”, line spacings U and D are set so that U>D, as shown in FIG. 2. Conversely, to embed binary information “1”, the line spacings U and D are set so that U<D.
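The conventional extraction flow described above can be illustrated by the following minimal sketch. It is an illustration under stated assumptions, not the method of the cited reference: the document image is assumed to be a binary raster (1 for a dark pixel, 0 for background) whose character strings run parallel to the rows, and consecutive line spacings are assumed to be grouped into non-overlapping (U, D) pairs. All function names, including the `make_page` helper that builds a synthetic page, are hypothetical.

```python
from typing import List, Tuple


def row_histogram(image: List[List[int]]) -> List[int]:
    """Fully scan the image and count dark pixels (value 1) in every row."""
    return [sum(row) for row in image]


def text_line_spans(hist: List[int]) -> List[Tuple[int, int]]:
    """Return (first_row, last_row) for each run of rows containing dark pixels."""
    spans = []
    start = None
    for y, count in enumerate(hist):
        if count > 0 and start is None:
            start = y                      # a character string begins
        elif count == 0 and start is not None:
            spans.append((start, y - 1))   # the character string ended on the previous row
            start = None
    if start is not None:                  # text reaches the bottom edge of the image
        spans.append((start, len(hist) - 1))
    return spans


def line_spacings(spans: List[Tuple[int, int]]) -> List[int]:
    """Number of blank rows between each pair of consecutive character strings."""
    return [nxt[0] - cur[1] - 1 for cur, nxt in zip(spans, spans[1:])]


def extract_bits(spacings: List[int]) -> List[int]:
    """Decode bits from (U, D) pairs: U > D gives 0, U < D gives 1.

    Assumption: spacings are paired as (0, 1), (2, 3), ... and a pair with
    U == D carries no information and is skipped.
    """
    bits = []
    for i in range(0, len(spacings) - 1, 2):
        u, d = spacings[i], spacings[i + 1]
        if u > d:
            bits.append(0)
        elif u < d:
            bits.append(1)
    return bits


def make_page(gaps: List[int], line_height: int = 2, width: int = 8) -> List[List[int]]:
    """Hypothetical helper: build a synthetic page with the given blank gaps."""
    page = [[1] * width for _ in range(line_height)]
    for gap in gaps:
        page += [[0] * width for _ in range(gap)]
        page += [[1] * width for _ in range(line_height)]
    return page


page = make_page([5, 3, 3, 5])             # (U, D) = (5, 3) then (3, 5)
spacings = line_spacings(text_line_spans(row_histogram(page)))
print(extract_bits(spacings))              # -> [0, 1]
```

Note that `row_histogram` visits every pixel of the image before any spacing can be measured, and that `text_line_spans` relies on blank rows existing between the character strings.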
However, the above-described method of extracting information embedded in a document image using a line spacing watermark has the following problems. To measure the line spacing, it is necessary to fully scan the document image and obtain a histogram. Hence, an accurate information extraction process is time-consuming. In particular, when copy control information is embedded, the copy control information is extracted in a copying machine, whether or not copying is permitted is determined based on the extracted information, and only then is a copy process performed. The series of processes for copying one document therefore takes a lot of time. Additionally, when the character string direction and the scanning direction of the input document image are tilted with respect to each other, no line spacing can be derived from the histogram. In this case, the document image needs to be input again. Alternatively, cumbersome image processing such as rotating the input document image is necessary.