The present invention relates to an image processing apparatus and method applied to image processing devices such as an OCR (optical character recognition) device, a copying machine and a facsimile, for dividing an input image into image areas such as document areas, figure (geometric structure) areas and table areas.
Conventionally, image processing devices are known which divide input image data into image areas such as document areas, figure areas, picture (photograph) areas, and table areas, and process each image area using a type of processing corresponding to the type of that area. For example, when an original image is a mixture of a document and a photograph, the document area of the input image data is converted into character codes through OCR processing, and the picture area is subjected to compression.
In this area-based image dividing, dotted lines and broken lines are extracted by:
(1) finding a figure that seems like a part of a broken line, and searching for a possible next part on the line extended from the figure;
(2) utilizing the angle and distance between short lines;
(3) grouping isolated pixels and extracting a dotted line/broken line based on the straight line connecting the first and last pixels, the distance and height between the pixels.
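As an illustration only, approach (3) might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation used by any conventional device: the function name `is_dotted_line` and the thresholds `max_gap`, `max_height` and `min_points` are hypothetical, "distance" is taken to mean the spacing between neighbouring pixels, and "height" is taken to mean the perpendicular offset of a pixel from the straight line connecting the first and last pixels.

```python
import math


def is_dotted_line(points, max_gap=6.0, max_height=1.5, min_points=4):
    """Return True if a group of isolated pixels plausibly forms one dotted line.

    points: list of (x, y) pixel coordinates (hypothetical input format).
    max_gap, max_height, min_points: illustrative thresholds, not from the source.
    """
    if len(points) < min_points:
        return False

    # Order the pixels along the line (by x, then y). This simple ordering
    # can misorder nearly vertical lines whose x values jitter.
    pts = sorted(points)
    (x0, y0), (x1, y1) = pts[0], pts[-1]
    length = math.hypot(x1 - x0, y1 - y0)
    if length == 0:
        return False

    # Distance test: neighbouring pixels must not be too far apart.
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        if math.hypot(bx - ax, by - ay) > max_gap:
            return False

    # Height test: every pixel must lie close to the straight line
    # connecting the first and last pixels.
    for px, py in pts:
        height = abs((x1 - x0) * (py - y0) - (y1 - y0) * (px - x0)) / length
        if height > max_height:
            return False
    return True
```

Under these assumptions, a call such as `is_dotted_line([(0, 0), (4, 0), (8, 1), (12, 0), (16, 0)])` accepts the group, while a group containing a gap larger than `max_gap` is rejected. Note that every pixel is compared against every threshold, which hints at why such pixel-level tests become costly on large images.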
However, these methods require a long calculation time, and extract dotted lines with only low precision. Above all, they cannot detect special-shaped broken lines such as lines 1001 and 1002 in FIG. 11. When these methods are used in the area-based image dividing, a conceivable problem is that character recognition processing may be performed on image areas as a result of incorrect dotted/broken line extraction.
The area-based image dividing is necessary for image processing devices such as an OCR device, a copying machine and a facsimile device, to process document areas, figure areas, table areas and the like with high precision.
Image dividing is performed by, e.g., a spectrum analysis method that divides input image data into various areas by analyzing the Fourier spectrum of the image data, or a projection method, disclosed in Japanese Patent Application Laid-Open No. 64-15889, that alternates between detection of vertical projections and detection of horizontal projections of an image and divides the image into areas based on peripheral-pixel distribution information.
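The general idea of alternating projections can be illustrated with a generic recursive projection-cut sketch. This is not the method of the cited Laid-Open application, whose details are not given here; it is a minimal example assuming a binary image stored as a list of rows of 0/1 values, with hypothetical names `split_runs` and `xy_cut` and a hypothetical `min_gap` threshold for the width of the blank band that separates two areas.

```python
def split_runs(profile, min_gap):
    """Return (start, end) ranges of non-empty runs in a projection profile.

    Runs separated by fewer than min_gap empty positions are kept together.
    Ranges are end-exclusive.
    """
    runs = []
    start = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                runs.append((start, i - gap + 1))
                start = None
                gap = 0
    if start is not None:
        runs.append((start, len(profile) - gap))
    return runs


def xy_cut(image, box, min_gap=2, horizontal=True, stalled=False):
    """Divide box = (top, bottom, left, right) into areas by alternating projections."""
    top, bottom, left, right = box
    if horizontal:
        # Projection onto the vertical axis: black-pixel count per row.
        profile = [sum(row[left:right]) for row in image[top:bottom]]
        parts = [(top + s, top + e, left, right)
                 for s, e in split_runs(profile, min_gap)]
    else:
        # Projection onto the horizontal axis: black-pixel count per column.
        profile = [sum(row[x] for row in image[top:bottom])
                   for x in range(left, right)]
        parts = [(top, bottom, left + s, left + e)
                 for s, e in split_runs(profile, min_gap)]

    if not parts:
        return []  # the box contains no black pixels
    if len(parts) == 1:
        if stalled:
            return parts  # neither direction can split this area further
        return xy_cut(image, parts[0], min_gap, not horizontal, stalled=True)

    areas = []
    for part in parts:
        areas.extend(xy_cut(image, part, min_gap, not horizontal))
    return areas
```

Each recursion level recomputes pixel sums over its box, so the cost of this sketch grows with both the image resolution and the depth of the area structure.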
The divided image areas are labeled by, e.g., a method that tracks the outline formed by the connected pixels.
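For illustration, pixel-based labeling of connected pixels can be sketched as below. Note that this sketch substitutes a simpler breadth-first flood fill for the outline-tracking method mentioned above; it only shows what labeling produces. The function name `label_components`, the 0/1 list-of-rows image format, and the use of 8-connectivity are assumptions.

```python
from collections import deque


def label_components(image):
    """Assign a positive integer label to each group of 8-connected black pixels.

    image: list of rows of 0/1 values (assumed format).
    Returns (labels, count), where labels has the same shape as image
    and 0 marks background.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0

    for y in range(height):
        for x in range(width):
            if not image[y][x] or labels[y][x]:
                continue
            # Unlabeled black pixel: start a new component and flood it.
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

The label array here is as large as the image itself, so the storage needed for this kind of pixel-based labeling grows directly with the resolution of the input image.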
However, the conventional area-based image dividing methods require a long calculation time, and require an increased storage area because the processing is pixel-based. Further, for an image having a complicated area structure, area dividing is achieved only with low precision.
Further, in the above methods, if the resolution of input image data is low, dividing precision is degraded, while if the resolution is high, labeling processing time is prolonged and storage area is increased.