Field of the Invention
The present invention relates to an image processing apparatus capable of clipping out an object written in a digitized document and to an image processing method.
Description of the Related Art
There are mainly two methods for extracting a desired area from a document to digitize the desired area.
In the first method, an operator designates, on each occasion, a desired area to be extracted from an input image that is acquired by reading a document with a scanner.
For example, the document is read by the scanner, and the resultant input image is displayed on a display. Then, the operator designates a desired area from the displayed input image using, for example, a mouse.
In the second method, the operator creates in advance a template that defines position information of a rectangle, so that the rectangular area defined by the template is applied to an input image as it is and then extracted. In this case, the rectangular area with the position and size defined by the template is extracted from the input image. This saves the operator from having to designate an extraction area for each input image.
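The second method can be illustrated with a minimal sketch, assuming the input image is held as a two-dimensional list of pixel values and the template stores a rectangle as left, top, width, and height. The names `Template` and `extract_by_template` are hypothetical and are introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    """Hypothetical template: a rectangle fixed in advance by the operator."""
    left: int
    top: int
    width: int
    height: int


def extract_by_template(image, template):
    """Crop the template rectangle from the image as it is.

    `image` is assumed to be a list of rows (a 2-D list of pixel values).
    The rectangle is applied unchanged, regardless of the image content.
    """
    rows = image[template.top:template.top + template.height]
    return [row[template.left:template.left + template.width] for row in rows]


# A 4x6 toy "image" whose pixel values encode their position (row * 10 + column).
image = [[r * 10 + c for c in range(6)] for r in range(4)]
template = Template(left=1, top=1, width=3, height=2)
cropped = extract_by_template(image, template)
```

Because the rectangle is fixed, the same crop is applied to every input image without further operator input, which is also why content that extends beyond the rectangle is lost.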
In the first method in which the operator designates the desired area in the input image, or the second method in which the operator creates the template in which the position information about the rectangle is defined, the operator can determine an area to be extracted. That is, only one area within a designated block in an input image can be selected in a pinpoint manner.
In the first method, however, the operator needs to designate a desired area on each occasion. When many documents are read, the operator has to designate each of the desired areas in every resulting input image. In such a case, designating the desired areas is time consuming.
In the second method using the template, a desired area to be extracted from the input image and an area that is set in the template may differ in position or size. In such a case, an area in which the desired area is chipped off may be extracted.
For example, the length of text written in a desired area to be extracted may differ depending on the document, and the area may be designated too narrow because the text used when the template was created was short. In such a case, an area in which part of the desired area is missing is extracted from the input image.
Japanese Patent Application Laid-Open No. 11-203491 discusses a method for solving such a problem of extracting an area in which a desired area is chipped off in a case where the desired area to be extracted and an area that is set in a template differ in position or size. First, a position, size, and attribute of an area of an input image are retained as template information. Next, a scanner reads a document to acquire an input image, and block areas are extracted from the input image to determine attributes of the respective extracted block areas. Among the extracted block areas, a block area that includes at least one portion overlapping with the area indicated by the template information, and whose attribute coincides with the attribute set in the template information, is extracted. Such processing eliminates the problem of extracting an area in which a desired area is chipped off even if the area designated in the template is narrower than the desired area.
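The selection rule described above can be sketched as follows. This is only an illustrative reading of the rule (overlap with the template area plus a matching attribute), not the implementation of the cited publication. The names `Area`, `overlaps`, and `select_blocks`, the attribute strings, and the coordinates are assumptions made for the example, and the block areas are assumed to have already been extracted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Area:
    """Axis-aligned rectangle with an attribute such as "text" or "image"."""
    left: int
    top: int
    right: int   # exclusive
    bottom: int  # exclusive
    attribute: str


def overlaps(a, b):
    """Return True if the two rectangles share at least one pixel."""
    return (a.left < b.right and b.left < a.right
            and a.top < b.bottom and b.top < a.bottom)


def select_blocks(template_area, blocks):
    """Keep blocks that overlap the template area and match its attribute."""
    return [
        block for block in blocks
        if overlaps(template_area, block)
        and block.attribute == template_area.attribute
    ]


template_area = Area(10, 10, 60, 30, "text")
blocks = [
    Area(10, 10, 120, 30, "text"),  # text longer than the template area
    Area(10, 40, 120, 60, "text"),  # text on another line, no overlap
    Area(40, 5, 80, 35, "image"),   # overlaps, but the attribute differs
]
selected = select_blocks(template_area, blocks)
```

In this sketch the first block is selected in full even though the template area covers only its left part, which is how the rule avoids extracting a chipped-off area.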
However, since the method discussed in Japanese Patent Application Laid-Open No. 11-203491 is dependent on the block area extraction processing, determination of the area is affected by a designated block recognition result. For example, in a case where only a character block arranged in the middle among three character blocks arranged side by side needs to be selected in a pinpoint manner, all of the three character blocks may be selected due to the block area extraction processing. Consequently, when the operator needs to select only one area within a desired designated block, pinpoint selection of one area is difficult.
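The dependence on block area extraction can be illustrated with a small sketch. It assumes rectangles are given as (left, top, right, bottom) tuples and omits attributes for brevity. The coordinates and variable names are hypothetical.

```python
def overlaps(a, b):
    """Rectangles are (left, top, right, bottom) with exclusive right/bottom."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


# The operator wants only the middle of three side-by-side character blocks.
template_area = (110, 10, 190, 30)

# Case 1: block extraction keeps the three character blocks separate.
separate_blocks = [(10, 10, 90, 30), (110, 10, 190, 30), (210, 10, 290, 30)]
# Case 2: block extraction merges them into one wide block.
merged_blocks = [(10, 10, 290, 30)]

picked_separate = [b for b in separate_blocks if overlaps(template_area, b)]
picked_merged = [b for b in merged_blocks if overlaps(template_area, b)]
```

With separate blocks only the middle block is picked, whereas with the merged block the whole width covering all three character blocks is picked, so the result depends on how the blocks were recognized.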