In prior art attempts to extract a title from documents, one significant factor is whether the documents are in a predetermined format. If the input documents have a predetermined layout, the position of a title area or of a minimal circumscribing rectangle is used to extract the title information. On the other hand, if the input documents are in a free form or do not have a rigid layout, one option is to extract the title manually.
To efficiently extract a title from documents in a free form, Japanese Patent Laid Publication Hei 9-134406 and Japanese Patent Hei 5-274471 disclose prior attempts that use layout features of the documents. Japanese Patent Hei 5-274471 discloses a priority scheme for selecting a title-containing area. The priority scheme considers (1) the relative closeness of an area to the upper left corner of the document image, (2) the number of characters contained in an adjacent area, and (3) the number of characters in the area and the adjacent area combined. The priority is determined from minimal circumscribing rectangles, and no character recognition is performed. Japanese Patent Laid Publication Hei 9-134406 discloses the selection of a title based upon a projection histogram of a document image. Regions of the projection histogram are compared to a pair of predetermined threshold values to determine a title area. Although the above prior attempts have some success in determining a title in documents, accuracy remains a problem.
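As a rough illustration of the projection-histogram approach described above, the following sketch builds a row-wise projection of a binary image, splits it into regions, and keeps the regions that fall between two thresholds. The cited publication's actual procedure is not given here, so several points are assumptions made only for illustration: the image is a list of rows of 0/1 pixels, a region is a run of consecutive rows containing black pixels, and the quantity compared to the pair of thresholds is the region height. The function names and the `min_height` and `max_height` parameters are hypothetical.

```python
def row_projection(image):
    """Count black pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def find_regions(profile):
    """Return (start, end) row ranges where the profile is non-zero.

    The end index is exclusive, so the region height is end - start.
    """
    regions = []
    start = None
    for y, count in enumerate(profile):
        if count > 0 and start is None:
            start = y                      # a new region begins
        elif count == 0 and start is not None:
            regions.append((start, y))     # the current region ends
            start = None
    if start is not None:                  # region runs to the last row
        regions.append((start, len(profile)))
    return regions


def title_candidates(image, min_height, max_height):
    """Keep regions whose height lies between the two thresholds.

    Assumption: region height is the compared quantity; the source
    only says regions are compared to a pair of thresholds.
    """
    profile = row_projection(image)
    return [(s, e) for s, e in find_regions(profile)
            if min_height <= e - s <= max_height]


# Toy page: one tall band (rows 1-3) followed by two one-row bands.
page = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]

candidates = title_candidates(page, min_height=2, max_height=5)
print(candidates)
```

With the thresholds set to 2 and 5 rows, only the tall band is kept as a title candidate. A scheme of this kind relies on layout alone, which suggests why fixed thresholds may fail on documents whose title size or position varies.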