1. Field of the Invention
The present invention relates to a document processing device, a document processing method, and a recording medium. More specifically, the present invention relates to a technique for extracting character strings, such as headings, from a document in electronic form.
2. Description of the Background Art
Image processing devices identified by names such as complex devices or MFPs (multifunction peripherals) are capable of converting document data acquired by scanning an original into a certain file format such as PDF (portable document format), and outputting the converted document data. For such data output, character strings such as a title and a heading of each chapter and each item contained in the document may be extracted, and the document data may be output with the extracted character strings added thereto as bookmark data. This enhances the convenience in the use of document data.
An example of such a conventional technique of automatically extracting character strings is disclosed in Japanese Patent Application Laid-Open No. JP 2008-305088 A. In this conventional technique, content regions such as character strings contained in a document are extracted row by row from an image of the document, and all the extracted content regions are classified into at least one group. Then, based on the respective positions of the content regions of each classified group in the document image, the suitability of the group as a bookmark is evaluated. Based on a result of the evaluation, at least one group having the highest level of suitability is selected as a target for generation of bookmark data. Thereafter, based on the attribute information of the content regions of the selected group, bookmark data indicating the respective positions of the content regions of the selected group in the document image is generated. This conventional technique is thus capable of automatically extracting character strings such as headings contained in a document.
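By way of illustration only, the flow described above (classify regions into groups, evaluate each group from the positions of its regions, select the most suitable group, generate bookmark data) may be sketched as follows. The cited publication's actual grouping key and evaluation formula are not reproduced here. The `Region` structure, the grouping by rounded row height, and the coverage-times-sparsity score are all assumptions introduced for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One row-level content region (hypothetical structure)."""
    page: int      # 1-based page number
    top: float     # y position of the region's top edge on the page
    height: float  # row height, used here as a stand-in for character size
    text: str


def classify(regions):
    """Classify regions into groups; the key (rounded height) is an assumption."""
    groups = defaultdict(list)
    for region in regions:
        groups[round(region.height)].append(region)
    return dict(groups)


def suitability(group, total_pages):
    """Assumed score: groups spread over many pages but sparse on each page."""
    pages = len({region.page for region in group})
    coverage = pages / total_pages   # fraction of pages the group appears on
    sparsity = pages / len(group)    # 1.0 when there is one region per page
    return coverage * sparsity


def build_bookmarks(regions, total_pages):
    """Select the highest-scoring group and emit bookmark entries in reading order."""
    groups = classify(regions)
    best = max(groups.values(), key=lambda g: suitability(g, total_pages))
    ordered = sorted(best, key=lambda r: (r.page, r.top))
    return [{"title": r.text, "page": r.page, "top": r.top} for r in ordered]


# Sample input: one large-type heading and three body rows on each of 3 pages.
sample = []
for page, title in enumerate(["1. Introduction", "2. Method", "3. Results"], start=1):
    sample.append(Region(page, 50.0, 18.0, title))
    for row in range(3):
        sample.append(Region(page, 100.0 + 20.0 * row, 10.0, f"body text p{page} r{row}"))

bookmarks = build_bookmarks(sample, total_pages=3)
```

With this sample input, the heading group scores higher than the body-text group, so `bookmarks` holds the three headings in page order. Because the score is fixed in advance, a document whose headings do not fit the assumed pattern would yield unintended character strings.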
According to a conventional technique disclosed for example in Japanese Patent Application Laid-Open No. JP 2008-305089 A, generation of bookmark data allows a user to easily find the respective positions of document contents throughout the document and the respective types of the document contents.
In the above-described conventional techniques, character strings such as headings contained in a document are extracted under a predetermined condition. This may result in extraction of a character string that the user did not intend. To avoid extraction of such unintended character strings, the condition under which character strings are extracted from document data must be corrected. However, the conventional techniques provide no efficient way to correct the condition.
The optimum condition differs for each type of document depending on its settings, such as its documentary form. At the same time, a user can freely choose the settings of a document, such as its documentary form, so it is difficult to define in advance an optimum condition that matches all documentary forms. For this reason, when a character string that the user did not intend is extracted as a bookmark, it is desirable that the condition be correctable with a relatively simple operation.