The present invention relates to an apparatus and a method for reading a document image, and more particularly to an apparatus and a method that read a document image and input the content of a printed document into a computer by structuring the extracted content through a predetermined process.
Apparatus for reading a document image exists that takes the content of a printed document, such as a newspaper article, a book, an office document, or an official document, into a computer and makes it usable as electronic information on the computer. In such apparatus, a printed document is taken into the computer as a file. Then a layout structure, showing how the content of the document is arranged on the page, and a logical structure, showing how characters or character lines (sets of characters) are semantically related to each other, are extracted from the printed document. In general, a series of processes is then carried out to reproduce the printed document in the computer by coordinating the two structures.
There is a method that extracts the layout and logical structures by exploiting the close relationship between the two. For example, the paper "K. Kise, M. Yamaoka, N. Babaguchi and Y. Tezuka: Organizing Knowledge-Base for Structure Analysis of A document image, Transactions of Information Processing Society of Japan, Vol. 34, No. 1, pp. 75-87 (1993)" uses a document model that indicates the relationship between the layout structure and the logical structure. With this model, the document structure is extracted by applying a predetermined estimation to an input document. In addition, the model adopts a frame expression capable of describing a structural hierarchy, so that a layout description such as centering can describe variations in each component of the structural hierarchy.
However, conventional apparatus for reading a document image can handle printed documents only under specific layout conditions. It has been difficult to extract desired logical information flexibly by analyzing a wide variety of printed documents in detail.
Furthermore, it is difficult for the conventional apparatus to process precisely documents written in several languages, documents in which a single line contains several logical elements, documents in which some characters in a line are rotated by 90 degrees, and so on. It is also difficult to output the extracted information in a desired order or form.
The purpose of this invention is to provide an apparatus and a method for reading a document image that extract desired pieces of information with arbitrary logical elements from various kinds of documents, such as single-column business letters and multi-column newspapers, including documents that contain words written in several languages, lines that hold several logical elements, or characters rotated through 90 degrees within a line, and that input the desired information into a computer in an arbitrary data form.
According to the present invention, there is provided:
an apparatus for reading a document image including a plurality of character groups, each of the character groups having a logical element which indicates a particular logical meaning of the character group, the apparatus comprising:
means for memorizing keywords with relation to the logical element of the document image;
means for storing image data segments of the document image, each of the image data segments corresponding to each of the character groups having the logical element;
means for selecting one of the image data segments;
means for sequentially identifying each of the character groups in the image data segments with a corresponding one of the keywords extracted from the memorizing means; and
means for sequentially applying one of tags to each of the character groups, the tags denoting a particular logical meaning related to one of the keywords.
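The keyword-based tagging described above can be illustrated with a minimal sketch. It assumes the character groups of the selected image data segment have already been recognized as text strings; the keyword table, the tag names, and the function `tag_character_groups` are hypothetical and are not taken from this specification.

```python
# Hypothetical keyword table (keyword -> tag). The entries and tag names
# are illustrative assumptions, not values given in the specification.
KEYWORD_TAGS = {
    "tel": "PHONE",
    "fax": "FAX",
    "dear": "SALUTATION",
    "sincerely": "CLOSING",
}


def tag_character_groups(groups, keyword_tags=KEYWORD_TAGS):
    """Visit each character group in order and attach the tag of the first
    keyword found in it. Groups with no matching keyword get None."""
    tagged = []
    for group in groups:
        lowered = group.lower()
        tag = None
        for keyword, candidate in keyword_tags.items():
            if keyword in lowered:
                tag = candidate
                break
        tagged.append((group, tag))
    return tagged


# One selected segment, assumed to be already converted to text.
segment = ["Dear Mr. Smith", "Tel: 03-1234-5678", "Thank you for your order"]
result = tag_character_groups(segment)
for text, tag in result:
    print(tag, "|", text)
```

A group that matches no keyword is left untagged in this sketch, which is the case that later estimation from neighbouring lines would have to cover.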
Moreover, according to the present invention, there is provided: an apparatus for reading a document image including a plurality of character groups, each of the character groups having a logical element which indicates a particular logical meaning of the character group, the apparatus comprising:
means for memorizing keywords with relation to the logical element of the document image;
means for storing image data segments of the document image, each of the image data segments corresponding to each of the character groups having the logical element;
means for selecting one of the image data segments;
means for sequentially identifying each of the character groups in the image data segments with a corresponding one of the keywords extracted from the memorizing means;
means for sequentially applying one of tags to each of the character groups, the tags denoting a particular logical meaning related to one of the keywords;
means for storing identification data which contains, for each of lines, every one of the character groups and every one of the corresponding tags, each line being composed of character groups and being a row of words, numbers, or other symbols on a written or printed page;
means for, when a line includes only one tag of the tags according to the identification data and the one tag is applied to a character group that is arranged in only a part of the line, replacing the one tag with another tag applied to a new character group that is arranged over all of the line, the other tag having the same logical meaning as the one tag;
means for, when a line includes two or more tags of the tags according to the identification data, repeatedly dividing the line until each of the divided lines includes only one tag;
means for, when the identification data shows that a logical inconsistency exists in an arbitrary combination of some of the tags, replacing each tag applied to a line, and each adjacent tag adjacent to that tag, with a different tag and a different adjacent tag respectively, the different tag and the different adjacent tag each having an appropriate logical meaning so that the combination of the different tags and the different adjacent tags is logically consistent; and
means for, when it is impossible to apply any of the tags to a line, applying an appropriate tag of the tags to the line, the appropriate tag being estimated with reference to the tags applied to the lines adjacent to that line.
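Three of the line-level rules above (extending a single tag to the whole line, dividing a line that carries several tags, and estimating a tag for an untagged line from its neighbours) can be sketched as follows. This is only an illustration under stated assumptions: each line is modelled as a list of `(text, tag)` pairs, text is joined with a space, untagged text is attached to the nearest preceding tagged group, and an untagged line simply inherits the tag of the previous line, or else of the next line. The function names and the sample data are hypothetical. The consistency-repair rule is omitted because the consistency conditions are not spelled out in this passage.

```python
def normalize_line(groups):
    """groups: list of (text, tag_or_None) for one line.
    Returns a list of (text, tag) lines carrying at most one tag each."""
    tags = [tag for _, tag in groups if tag is not None]
    if len(tags) <= 1:
        # Zero or one tag: the whole line becomes one character group,
        # so a tag on part of the line now covers all of it.
        text = " ".join(text for text, _ in groups)
        return [(text, tags[0] if tags else None)]
    # Two or more tags: divide so each piece carries exactly one tag.
    pieces = []
    pending = []  # untagged text seen before the first tagged group
    for text, tag in groups:
        if tag is None:
            if pieces:
                pieces[-1][0] += " " + text
            else:
                pending.append(text)
        else:
            pieces.append([" ".join(pending + [text]), tag])
            pending = []
    return [(text, tag) for text, tag in pieces]


def fill_missing_tags(lines):
    """Assumed heuristic: an untagged line takes the tag of the previous
    line, or of the next line when there is no tagged previous line."""
    filled = list(lines)
    for i, (text, tag) in enumerate(filled):
        if tag is not None:
            continue
        prev_tag = filled[i - 1][1] if i > 0 else None
        next_tag = lines[i + 1][1] if i + 1 < len(lines) else None
        filled[i] = (text, prev_tag or next_tag)
    return filled


# Hypothetical identification data for three lines of a letter.
page = [
    [("Date:", "DATE"), ("12 March 1999", None)],
    [("Tel: 03-1234-5678", "PHONE"), ("Fax: 03-1234-5679", "FAX")],
    [("(weekdays only)", None)],
]
lines = [piece for groups in page for piece in normalize_line(groups)]
filled = fill_missing_tags(lines)
```

In this sample the first line ends up as a single `DATE` line, the second is divided into a `PHONE` line and a `FAX` line, and the untagged third line inherits `FAX` from the line before it.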
Furthermore, according to the present invention, there is provided:
a method for reading a document image including a plurality of character groups, each of the character groups having a logical element which indicates a particular logical meaning of the character group, the method comprising:
memorizing keywords with relation to the logical element of the document image;
storing image data segments of the document image, each of the image data segments corresponding to each of the character groups having the logical element;
selecting one of the image data segments;
identifying sequentially each of the character groups in the image data segments with a corresponding one of the keywords extracted; and
applying sequentially one of tags to each of the character groups, the tag denoting a particular logical meaning related to one of the keywords.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.