In recent years, it has been proposed that various optional functions involving the processing of input image data as digital signals be applied to digital image processing apparatuses such as digital color multifunction printers.
An example of the optional functions is an OCR function of extracting a character image from a document image and converting the extracted character image into text data. Generally, the OCR function is performed by (i) performing a segmentation process on a document image, (ii) detecting regions of concatenated edge-detected pixels extracted as a feature of a character portion, i.e., runs of edge-detected pixels extending in the main scanning and sub-scanning directions within the document image, and thereby extracting a rectangular region considered to be a character, (iii) analyzing a distribution of edge-detected pixels within the extracted rectangular region and thereby identifying the character, and (iv) converting the character into a character code assigned to that character. Because every subsequent step operates on the extracted rectangular region, the process of extracting a rectangular character portion has a decisive influence on the precision of the OCR process.
It should be noted here that the conventional process of extracting a rectangular character region presumes, basically, that there is no run of edge-detected pixels extending in the main scanning or sub-scanning direction between characters in the document image, i.e., that no character is joined to another.
However, in cases where the document image contains a ruled-line image such as an underline drawn under a character portion, a line surrounding a character portion, or a line extending in the main scanning or sub-scanning direction, the ruled-line image is judged as a run of edge-detected pixels, as with a character or image portion. Therefore, a landscape or portrait image area constituted by a plurality of characters joined by the ruled-line image is extracted as a single rectangular character area in the process of extracting a rectangular character area. This causes a remarkable decrease in precision of a character analysis process in the image area, thus causing a remarkable decrease in precision of the OCR function.
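The effect described above can be illustrated with a minimal sketch. It assumes a much simplified extraction step that splits a binary edge map into character areas wherever a column contains no edge-detected pixel; the function name `column_spans` and the tiny sample images are hypothetical and are not taken from any cited document.

```python
def column_spans(image):
    """Return (start, end) column ranges that contain at least one edge pixel.

    `image` is a list of rows; 1 marks an edge-detected pixel, 0 marks background.
    This is an assumed stand-in for rectangular character area extraction.
    """
    width = len(image[0])
    # A column is "occupied" if any row has an edge pixel in it.
    occupied = [any(row[x] for row in image) for x in range(width)]
    spans = []
    start = None
    for x, filled in enumerate(occupied):
        if filled and start is None:
            start = x                      # a new area begins
        elif not filled and start is not None:
            spans.append((start, x - 1))   # an empty column closes the area
            start = None
    if start is not None:
        spans.append((start, width - 1))
    return spans


# Two separate "characters" with blank columns between them.
glyphs = [
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
# The same characters with an underline drawn along the bottom row.
underlined = glyphs[:-1] + [[1, 1, 1, 1, 1, 1, 1]]

separate_areas = column_spans(glyphs)
merged_areas = column_spans(underlined)
```

In this sketch the image without the underline yields two areas, whereas the underlined image yields a single area spanning both characters, which is the situation that degrades the subsequent character analysis.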
Accordingly, the OCR function requires a ruled-line removal process (ruled-line extraction process) as a pre-process preceding the process of extracting a rectangular character area, thus requiring that the precision of the ruled-line extraction process be improved for the purpose of enhancing the precision of the OCR function.
Another example of the optional functions is a document matching process. For example, Patent Document 1 discloses a technique for extracting ruled-line information from a document image and performing a matching process on a ledger sheet or the like with use of the extracted ruled-line information. Even such a technique for performing a matching process with use of ruled-line information requires that the precision of the ruled-line extraction process be improved for the purpose of enhancing the precision of the matching process.
It should be noted, for example, that Patent Document 2 discloses, as a technique relating to the ruled-line extraction process, a technique for scanning a document image in a first direction, detecting as a “first ruled line” a segment whose length is not less than a threshold value, and then detecting as a “second ruled line” a segment, belonging to a group of segments each extending in a second direction from any point on the segment, whose length is not less than a threshold value.
(Patent Document 1) Japanese Unexamined Patent Application Publication No. 255236/1996 (Tokukaihei 8-255236; published on Oct. 1, 1996)
(Patent Document 2) Japanese Unexamined Patent Application Publication No. 193446/2007 (Tokukai 2007-193446; published on Aug. 2, 2007)
(Patent Document 3) Japanese Patent No. 3779741 (published on Feb. 2, 1996)
(Patent Document 4) Japanese Unexamined Patent Application Publication No. 094805/2002 (Tokukai 2002-094805; published on Mar. 29, 2002)
According to the technique of Patent Document 2, however, only a ruled line connected to a “first ruled line” is extracted as a “second ruled line”; therefore, a ruled line, if any, isolated from a “first ruled line” is not extracted.
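The two-stage detection summarized above, and the limitation just noted, can be sketched as follows. This is only an illustrative reading of that summary, not the implementation of Patent Document 2: the function names, the use of a single shared threshold `MIN_LEN`, and the sample image are all assumptions.

```python
def horizontal_runs(image, min_len):
    """Scan each row (first direction); return (row, x_start, x_end) of long runs."""
    found = []
    for y, row in enumerate(image):
        x = 0
        while x < len(row):
            if row[x]:
                start = x
                while x < len(row) and row[x]:
                    x += 1
                if x - start >= min_len:
                    found.append((y, start, x - 1))
            else:
                x += 1
    return found


def connected_vertical_runs(image, first_lines, min_len):
    """From each point on a first ruled line, follow the column (second direction).

    Returns sorted (x, y_start, y_end) of vertical runs at least min_len long.
    Only columns touching a first ruled line are ever examined.
    """
    height = len(image)
    found = set()
    for y, x_start, x_end in first_lines:
        for x in range(x_start, x_end + 1):
            top = y
            while top > 0 and image[top - 1][x]:
                top -= 1
            bottom = y
            while bottom < height - 1 and image[bottom + 1][x]:
                bottom += 1
            if bottom - top + 1 >= min_len:
                found.add((x, top, bottom))
    return sorted(found)


MIN_LEN = 4  # assumed threshold, in pixels
page = [
    [1, 1, 1, 1, 1, 0, 0, 0],  # horizontal ruled line in row 0
    [1, 0, 0, 0, 0, 0, 0, 1],  # column 0: vertical line joined to row 0
    [1, 0, 0, 0, 0, 0, 0, 1],  # column 7: vertical line isolated from row 0
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
]

first = horizontal_runs(page, MIN_LEN)
second = connected_vertical_runs(page, first, MIN_LEN)
```

Here the vertical line in column 0 is reported because it starts on the first ruled line, while the vertical line in column 7 is never examined even though it meets the length threshold, since no point on a first ruled line leads to it.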
Further, according to the technique of Patent Document 2, it is necessary to scan a document image in various directions; therefore, it is necessary for a detecting device to contain a frame buffer memory in which image data of the entire document image is stored. This causes an increase in circuit size of the detecting device.
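To give a rough sense of the memory cost involved, the following sketch compares a buffer holding an entire page with a buffer holding a single scan line. The page dimensions (an A4 sheet at about 600 dpi, taken as 4960 x 7016 pixels) and the 1-bit-per-pixel binarized format are assumptions chosen for illustration only; the helper name `buffer_bytes` is hypothetical.

```python
def buffer_bytes(width_px, height_px, bits_per_pixel=1):
    """Bytes needed to hold width_px x height_px pixels, rounded up to whole bytes."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed page geometry: A4 at roughly 600 dpi, binarized to 1 bit per pixel.
WIDTH_PX, HEIGHT_PX = 4960, 7016

frame_buffer = buffer_bytes(WIDTH_PX, HEIGHT_PX)  # whole document image
line_buffer = buffer_bytes(WIDTH_PX, 1)           # one main-scanning line

print(f"frame buffer: {frame_buffer} bytes")
print(f"line buffer:  {line_buffer} bytes")
```

Under these assumptions the full-page buffer is several thousand times larger than a single-line buffer, which is consistent with the concern that storing the entire document image enlarges the circuit.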