1. Field of the Invention
The present invention relates to an image processing apparatus and an image processing method for extracting a line segment included in an image.
2. Description of the Related Art
In some models of network multifunction peripherals, when image recognition processing such as optical character recognition (OCR) or image-area separation is executed on image data input through an image scanner or the like, a recognition target such as a character may not be recognized precisely, or the recognition processing may take a long time, in a case where the character overlaps with a line segment such as a ruled line, closing line, or underline.
A technology for extracting and eliminating a line segment that disturbs character recognition has been conventionally proposed. In the prior art, an area of a ruled line is predicted based on image data, and black run data in the direction perpendicular to the predicted ruled line is extracted from the vicinity of the predicted ruled line area. Then, a regression line passing through the centers of the run data is calculated, and the run data adjacent to the regression line is deleted while the run data far from the regression line is saved. With this technology, it is possible to precisely eliminate a ruled line overlapping with characters without deleting the characters by mistake.
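The prior-art procedure described above can be illustrated by the following minimal sketch. It is not the implementation of the prior art. It assumes a binary image given as a list of rows in which 1 denotes a black pixel, a roughly horizontal ruled line, an ordinary least-squares fit, and a distance measured between the center of each run and the regression line. The names vertical_black_runs, fit_regression_line, and erase_ruled_line, and the parameters band and max_dist, are hypothetical and do not appear in the prior art.

```python
def vertical_black_runs(image, x):
    """Return (top, bottom) inclusive row ranges of black runs in column x."""
    runs = []
    start = None
    for y, row in enumerate(image):
        if row[x]:
            if start is None:
                start = y
        elif start is not None:
            runs.append((start, y - 1))
            start = None
    if start is not None:
        runs.append((start, len(image) - 1))
    return runs


def fit_regression_line(points):
    """Ordinary least-squares fit of y = a * x + b through (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0  # fall back to a flat line if all x are equal
    b = mean_y - a * mean_x
    return a, b


def erase_ruled_line(image, band, max_dist=1.0):
    """Delete runs near the regression line; save runs far from it.

    band is the assumed (y_min, y_max) row range of the predicted ruled line
    area, and max_dist is an assumed threshold in pixels.
    """
    y_min, y_max = band
    out = [row[:] for row in image]  # leave the input image untouched

    # Collect the runs perpendicular to the (horizontal) predicted ruled line
    # that touch the predicted area.
    runs = []
    for x in range(len(image[0])):
        for top, bottom in vertical_black_runs(image, x):
            if bottom >= y_min and top <= y_max:
                runs.append((x, top, bottom))
    if not runs:
        return out

    # Regression line through the centers of the runs.
    a, b = fit_regression_line([(x, (t + btm) / 2) for x, t, btm in runs])

    for x, top, bottom in runs:
        center = (top + bottom) / 2
        if abs(center - (a * x + b)) <= max_dist:
            for y in range(top, bottom + 1):
                out[y][x] = 0  # run adjacent to the line: treated as ruled line
        # otherwise the run is saved, since it is presumed to be a character
    return out
```

In this sketch, a run whose center lies far from the regression line, such as a character stroke that touches the ruled line from one side, is saved in its entirety, including the pixel it shares with the ruled line.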
However, it has been pointed out that the above-described prior art may sometimes be unable to extract a line segment precisely.
The above-described prior art takes the following method as an example: document alignment is executed using an image of a blank original format in which no handwriting has been entered, and an area of the target image where characters are supposed to be entered is predicted based on the area specified with location coordinate values. Then, it is determined that a ruled line exists in the vicinity of the predicted area. In this method, every time image data is input, document alignment needs to be executed with image data of the blank original format. Thus, in a case where an original (printed material) is not in a regular format, or in a case where a blank original format has not been pre-stored in the image processing apparatus, it is difficult to predict an area of a ruled line.
Further, in a case where a plurality of ruled lines are not parallel, that is, where the ruled lines intersect each other as in a closing line or the lines of a table, a black run in the direction perpendicular to one of the intersecting ruled lines could be data of the other ruled line, which makes it difficult to distinguish between a ruled line and characters which need to be saved.
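The difficulty with intersecting ruled lines can be seen in the following small sketch, which is given only as an illustration under stated assumptions: a binary image represented as a list of rows in which 1 denotes a black pixel, and two ruled lines crossing at right angles. The names column_runs and make_cross are hypothetical.

```python
from itertools import groupby


def column_runs(image, x):
    """Return (top, bottom) inclusive row ranges of black runs in column x."""
    runs = []
    y = 0
    for is_black, group in groupby(row[x] for row in image):
        length = len(list(group))
        if is_black:
            runs.append((y, y + length - 1))
        y += length
    return runs


def make_cross(size=7, mid=3):
    """Build a size x size image with one horizontal and one vertical ruled line."""
    image = [[0] * size for _ in range(size)]
    for i in range(size):
        image[mid][i] = 1  # horizontal ruled line
        image[i][mid] = 1  # vertical ruled line
    return image


image = make_cross()
# Runs perpendicular to the horizontal ruled line, taken column by column.
runs_per_column = {x: column_runs(image, x) for x in range(len(image[0]))}
```

In every column except the middle one, the run perpendicular to the horizontal ruled line is one pixel long. In the middle column, the run spans the full height of the image because it consists of the vertical ruled line, so the run data alone does not indicate whether it belongs to a ruled line or to a character stroke that should be saved.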