1. Field of the Invention
The present invention relates to an image processing method and an image processing apparatus, and particularly to an image processing method and an image processing apparatus for extracting a heading region from an image of a document.
2. Description of the Related Art
Conventionally, there has been a method of extracting headings (such as title headings and section headings) from the character string element regions in the whole image of a document by applying a common extraction rule based on a feature quantity.
For example, Document 1 (Japanese Laid-Open Patent Publication No. 11-238096) describes extracting listed elements (corresponding to headings), on a row-by-row basis, from all rows included in an image of a document based on a specific feature quantity.
The technique of Document 1, however, applies a specific feature quantity to all rows included in a document, treating the whole document as a single range from which headings are extracted. As a result, the document styles from which headings can be appropriately extracted are significantly restricted.
Specifically, when a document includes background characters and small character string elements that appear repeatedly (such as itemized elements, or headers and footers added to each page or slide to be displayed), heading regions cannot be appropriately extracted if the same rule is applied uniformly to the whole document.
Conventionally, appropriately extracting headings of a plurality of different levels from a document of such a style has required adding a feature quantity extraction process and making the rule more sophisticated. Such an additional process and a more sophisticated rule, however, increase cost and processing time, and are therefore impractical.