The present invention relates to a method for processing a document image and, more particularly, to a method for analyzing the structure of a treatise-type document image in order to detect title, author and abstract regions and recognize the content in each of the regions.
There are many techniques for processing a document image to construct a database system. One such technique is document image structure analysis (see ChunChen Lin, “Logical Structure Analysis of Book Document Image using Contents Information”, ICDAR 97, Vol. II, pp. 1048-1054, August 1997). According to this document structure analysis, a character recognition process is performed on the table of contents of a book so that the entire logical structure of the book is analyzed. However, since this technique requires that a table of contents of the book be provided, it cannot be used to construct a database system of treatise-type document images.
In order to construct a database system that provides a portion or the entirety of the treatises contained in each journal in the form of a document image or a hypertext file, a table of contents having title, author and abstract information has to be generated.
Hitherto, the table of contents having title, author and abstract information has been made manually. One reason is that multi-language recognition is very difficult; generally, the title and the author are given in two languages. Another reason is that the positions of the title, the author and the abstract differ from journal to journal, so it is difficult to detect them. A third reason is that there is no distinct difference between the title and the author.
Therefore, it is required to automatically detect the title, author and abstract regions and recognize the content in each of the regions so as to make a table of contents of the treatises in the journals.
It is, therefore, a primary object of the invention to provide a method for automatically detecting title, author and abstract regions in a document image and recognizing the content in each of the regions so as to make a table of contents of the treatises in the journals.
In accordance with the present invention, there is provided a method for analyzing the structure of a treatise-type document image to make a table of contents having title, author and abstract information, comprising the steps of: dividing the document image into a number of regions and classifying the divided regions into text regions and non-text regions according to attributes of the regions; selecting candidate regions representing an abstract and an introduction, extracting word regions from the candidate regions, and determining an abstract content portion; separating the title and the author using the basic form and the type definition representing the arrangement of each journal; and recognizing the content of the separated regions to generate said table of contents.
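By way of illustration only, the first three steps recited above may be sketched in Python as follows. This is a minimal sketch under stated assumptions and not the disclosed implementation: the `Region` record, the density and height thresholds, the keyword matching on "abstract" and "introduction", the single-column top-to-bottom ordering, and the representation of the type definition as an ordered tuple of field names are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Region:
    # Bounding box in pixels: (left, top, right, bottom).
    bbox: Tuple[int, int, int, int]
    # Fraction of black pixels inside the box, from 0.0 to 1.0.
    black_density: float
    # Recognized text, if any (hypothetical field; empty when unknown).
    text: str = ""

    @property
    def height(self) -> int:
        return self.bbox[3] - self.bbox[1]


def is_text_region(region: Region,
                   max_height: int = 400,
                   density_range: Tuple[float, float] = (0.05, 0.5)) -> bool:
    """Assumed attribute test: bounded height and moderate ink density."""
    low, high = density_range
    return region.height <= max_height and low <= region.black_density <= high


def split_regions(regions: Sequence[Region]) -> Tuple[List[Region], List[Region]]:
    """Classify the divided regions into text and non-text regions."""
    text = [r for r in regions if is_text_region(r)]
    non_text = [r for r in regions if not is_text_region(r)]
    return text, non_text


def find_abstract_content(text_regions: Sequence[Region]) -> List[Region]:
    """Return the text regions between the 'abstract' and 'introduction'
    candidate regions, assuming a single column read top to bottom."""
    ordered = sorted(text_regions, key=lambda r: r.bbox[1])
    start = end = None
    for i, region in enumerate(ordered):
        word = region.text.strip().lower()
        if start is None and word.startswith("abstract"):
            start = i
        elif start is not None and "introduction" in word:
            end = i
            break
    if start is None:
        return []
    # When no introduction is found, end stays None and the slice runs to the end.
    return ordered[start + 1:end]


def separate_title_author(header_regions: Sequence[Region],
                          type_definition: Sequence[str]) -> Dict[str, Region]:
    """Label header regions top to bottom using an assumed per-journal
    field order such as ("title", "author")."""
    ordered = sorted(header_regions, key=lambda r: r.bbox[1])
    return dict(zip(type_definition, ordered))
```

In this sketch, a journal that prints the author above the title would be handled by passing `("author", "title")` as the type definition, which reflects the idea that the arrangement differs from journal to journal.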
In accordance with another aspect of the present invention, there is provided a computer-readable medium containing a program, the program having functions of: dividing the document image into a number of regions and classifying the divided regions into text regions and non-text regions according to attributes of the regions; selecting candidate regions representing an abstract and an introduction, and finding word regions from the candidate regions to determine the position of an abstract content portion; separating the title and the author using the basic form and the type definition representing the arrangement of each journal; and recognizing the content of the separated regions to generate said table of contents.
These and other features of the present invention are more fully shown and described in the drawings and detailed description of this invention. It is to be understood, however, that the description and drawings are for the purpose of illustration and should not be read in a manner that would unduly limit the scope of this invention.