The present invention relates to a character detecting method which detects a character line from a document image when a document written on paper is directly read and its characters are recognized in, for example, an optical character reader (OCR), and relates to a character recognition apparatus using the character detecting method.
For example, in a character recognition device which reads a handwritten or printed document through an image reader such as a scanner and extracts document information by recognizing characters from the read document image, a process for extracting characters or a character line from the document image is required.
One example of a method for detecting a character line obtains rectangular information circumscribing each group of black pixels on the image and integrates this rectangular information to detect a character line. In this character line detecting method, an image of a document read by a scanner or the like undergoes differentiation and binarization so that a binarized image is obtained. Next, each image area where black pixels are connected on the binarized image is obtained, together with the rectangular information circumscribing that area. Thereafter, the circumscribed rectangular areas are integrated based on their shapes and positions so that a line candidate area is extracted.
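The rectangle-based procedure above can be illustrated with a minimal Python sketch. It is not the method of any particular device. It assumes a binarized image given as a list of rows in which 1 denotes a black pixel, 8-connectivity between black pixels, and a deliberately simple integration rule (rectangles whose vertical extents overlap are joined into one horizontal line candidate). The function names bounding_rectangles and merge_into_lines are hypothetical.

```python
from collections import deque


def bounding_rectangles(image):
    """Return (top, left, bottom, right) for each 8-connected group of
    black pixels (value 1) in a binarized image given as a list of rows."""
    height = len(image)
    width = len(image[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    rects = []
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first search over one connected group of black pixels.
            top, left, bottom, right = y, x, y, x
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            rects.append((top, left, bottom, right))
    return rects


def merge_into_lines(rects):
    """Join rectangles whose vertical extents overlap into line candidates.
    This overlap rule is an assumption made only for illustration."""
    lines = []
    for top, left, bottom, right in sorted(rects):
        for i, (lt, ll, lb, lr) in enumerate(lines):
            if top <= lb and bottom >= lt:
                lines[i] = (min(lt, top), min(ll, left),
                            max(lb, bottom), max(lr, right))
                break
        else:
            lines.append((top, left, bottom, right))
    return lines
```

With such a simple rule, a stain bridging two lines or a slanted line readily produces the wrong candidates, which is the inaccuracy the conventional method then attempts to correct.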
For a line candidate area extracted in this manner, the circumscribed rectangular information is not always accurate because, for example, the document read by the scanner or the like is stained, a line is slanted, or the size of characters in one line is not uniform. As a result, one line is occasionally extracted as a plurality of divided lines, or a plurality of adjacent lines are occasionally extracted as one line.
Therefore, in the conventional line detecting method, projection information within the line candidate areas is calculated, and the line candidate areas are integrated and divided accordingly. Namely, when a binarized image in a line candidate area is extracted, the projection information in the line direction is calculated according to the following procedure.
First, each pixel in the line candidate area is evaluated, and when the pixel is a black pixel, the projection value at the position onto which the pixel is projected in the line direction is incremented so that the projection information is updated.
When all the pixels have been evaluated, the projection information in the line direction is obtained. For example, when the black pixels on a line image 101 shown in FIG. 3 are projected in the line direction, a result 103 is obtained.
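The pixel-by-pixel projection procedure described above can be sketched as follows. The sketch assumes a horizontal line direction, so that each black pixel is projected onto its row position, a binarized image stored as a list of rows with 1 for black, and a candidate area given as an inclusive (top, left, bottom, right) rectangle. The function name line_direction_projection is hypothetical.

```python
def line_direction_projection(image, rect):
    """Count black pixels (value 1) per row inside an inclusive
    (top, left, bottom, right) line candidate area."""
    top, left, bottom, right = rect
    projection = [0] * (bottom - top + 1)
    for y in range(top, bottom + 1):
        for x in range(left, right + 1):
            # Every pixel is examined individually, as in the conventional method.
            if image[y][x] == 1:
                projection[y - top] += 1
    return projection
```

The nested loop visits every pixel of the candidate area once, so the cost grows with the area of each candidate and is incurred again whenever a candidate is recalculated.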
Based on the projection information obtained in this manner for each line candidate area, a judgment is made as to whether a separating position within the line candidate is detected or whether line candidate areas are to be integrated, and a line is finally detected.
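The text does not state the criterion used to find a separating position. As one illustrative assumption, a candidate may be divided wherever the projection value falls to or below a threshold. The sketch below covers only this division step, not integration, and the name split_by_projection and its threshold parameter are hypothetical.

```python
def split_by_projection(projection, threshold=0):
    """Return inclusive (start, end) index ranges where the projection
    exceeds the threshold; gaps between the ranges are treated as
    separating positions."""
    runs = []
    start = None
    for i, value in enumerate(projection):
        if value > threshold and start is None:
            start = i  # a new run of text rows begins
        elif value <= threshold and start is not None:
            runs.append((start, i - 1))  # the run ended on the previous row
            start = None
    if start is not None:
        runs.append((start, len(projection) - 1))
    return runs
```

For instance, a projection with one empty row between two groups of non-empty rows yields two ranges, meaning that the candidate would be divided into two lines.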
However, in the conventional line detecting method, it is necessary to judge, one by one, whether each pixel in the line candidate areas is a black pixel and, if so, to add it to the projection information. For this reason, many calculations are required for one line, and dedicated hardware is required for high-speed calculation.
In addition, in the case where line candidates are divided and integrated many times, processing time increases.
Furthermore, since characters which belong to a line are treated as image information, accuracy of the detection of a character line is low, for example, in the case where characters are written at a slant in a document.