1. Field of the Invention
The present invention relates to a ruled line elimination apparatus and method for eliminating a ruled line from an input image of a form in which the ruled line contacts or overlaps a character.
2. Description of the Related Art
An optical character reader (OCR) can read characters in an entry item printed in a dropout colour on a form. Recently, various kinds of forms have become easy to create, and forms in which the entry item is printed in a non-dropout colour have come into use. Such forms have many uses, including government forms, bank forms, and various applications. The forms may contain character boxes or ruled lines in known locations. A person filling out the form should enter all characters within the ruled lines. Sometimes, however, the characters do not all fall within the ruled lines.
Therefore, the need to recognize characters in entry items printed in a non-dropout colour is increasing. An OCR which recognizes such characters must eliminate the ruled line of the entry item and extract only the characters when the characters overlap the ruled line. One method is disclosed in Japanese Patent Publication (Kokoku) No. 6-97471. In this method, for an input image in which a character "5" contacts the entry frame or ruled line as shown in FIG. 1, the entry frame is cut along a boundary line inside the entry frame and the entry frame is eliminated as shown in FIG. 2A.
Alternatively, the entry frame is cut by extending a line outside the boundary line from the cut beginning point to the end point, and the entry frame is eliminated as shown in FIG. 2B. However, in this method, if one side of the entry frame overlaps the character at three or more parts as shown in FIG. 3A, the entry frame is not eliminated completely, as shown in FIG. 3B. In this case, the character is extracted with part of the ruled line still attached to it.
Also in this method, if the character "8" overlaps plural sides of the entry frame as shown in FIG. 4, plural images of the character from which the ruled line is eliminated are extracted as shown in FIG. 5. These are image candidates to be recognized. However, character recognition involves a large amount of calculation. Therefore, a long calculation time is necessary to recognize many candidates.
Furthermore, in FIG. 3B, it often happens that the character "2" is mistakenly recognized as "8" because the ruled line is added to the character. In FIG. 5, when plural image candidates are recognized, the possibility of misrecognition increases.
On the other hand, as a method of eliminating the ruled line from a multivalued image, the character and the ruled line are separated from each other by binarization with a threshold based on the density difference between the character and the ruled line, and pixels corresponding to the density of the ruled line are eliminated from the multivalued image. This method is disclosed in "An Automatic Threshold Selection Method Based on Discriminant and Least Squares Criteria", '80/4, vol. J63-D, No. 4, pp. 349-356, THE TRANSACTIONS OF THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS. However, in this method, when the density difference between the character and the ruled line is small, the character is not completely separated from the ruled line.
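The density-threshold approach described above, and its weakness, can be illustrated with a minimal sketch. It assumes the threshold is chosen by a discriminant criterion that maximizes between-class variance (the approach widely known as Otsu's method), that density values run from 0 to 255 with higher meaning darker, that the ruled line is lighter than the character, and that background pixels have already been excluded. The function names `discriminant_threshold` and `remove_ruled_line` and all sample density values are hypothetical and chosen only for illustration; they are not taken from the cited publication.

```python
def discriminant_threshold(densities, levels=256):
    """Return the threshold t that maximizes between-class variance.

    Values <= t form the low-density class; values > t form the high-density class.
    """
    hist = [0] * levels
    for d in densities:
        hist[d] += 1

    total = len(densities)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels in the low-density class
    sum0 = 0  # sum of densities in the low-density class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue  # no pixels at or below t yet
        if w1 == 0:
            break     # no pixels left above t
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance for this split.
        score = w0 * w1 * (mean0 - mean1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def remove_ruled_line(densities):
    """Keep only pixels darker than the threshold (assumed to be character pixels)."""
    t = discriminant_threshold(densities)
    return t, [d for d in densities if d > t]


# Hypothetical ink densities (higher = darker), background already excluded.
# Case 1: large density difference between ruled line and character.
line_far = [98, 100, 102, 100, 99, 101]
char_far = [198, 200, 202, 201, 199, 200]
t_far, kept_far = remove_ruled_line(line_far + char_far)

# Case 2: small density difference, with overlapping density ranges.
line_near = [178, 182, 186, 190]
char_near = [184, 188, 192, 196]
t_near, kept_near = remove_ruled_line(line_near + char_near)

print("large difference, clean separation:", sorted(kept_far) == sorted(char_far))
print("small difference, clean separation:", sorted(kept_near) == sorted(char_near))
```

In the first case a single threshold falls between the two density groups, so removing the low-density pixels leaves only the character. In the second case the density ranges interleave, so no single threshold can remove every ruled-line pixel while keeping every character pixel, which corresponds to the incomplete separation noted above.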