The present invention relates to an optical character reader (OCR) which can accurately recognize and read underlined characters.
The conventional OCR detects and extracts a single character from a character block using projection data arranged in columns (horizontal) and rows (vertical). However, this method cannot detect, extract, or recognize adjacent characters individually when several consecutive characters are underlined, because the underline fills the blank gaps between the characters in the projection data.
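A minimal sketch of the conventional projection-based segmentation, assuming the character block is a binary image stored as a list of pixel rows (1 = black). The helpers `column_projection` and `segment_columns` are hypothetical names used only for illustration. The sketch shows the drawback described above: a continuous underline makes every column non-empty, so the blank gaps that separate the characters disappear.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image as rows of pixels: 1 = black, 0 = white


def column_projection(img: Image) -> List[int]:
    """Count the black pixels in each column of the block (hypothetical helper)."""
    return [sum(row[x] for row in img) for x in range(len(img[0]))]


def segment_columns(proj: List[int]) -> List[Tuple[int, int]]:
    """Split the projection at empty columns; return (start, end) ranges, end exclusive."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(proj + [0]):  # trailing 0 closes a run at the right edge
        if count and start is None:
            start = x
        elif not count and start is not None:
            segments.append((start, x))
            start = None
    return segments


# Three one-column-wide "characters" separated by blank columns.
plain = [
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
]
# The same characters with a continuous underline added as a bottom row.
underlined = plain + [[1, 1, 1, 1, 1]]

print(segment_columns(column_projection(plain)))       # [(0, 1), (2, 3), (4, 5)]
print(segment_columns(column_projection(underlined)))  # [(0, 5)]
```

With the underline present, the three characters collapse into one segment spanning the whole block.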
An OCR which eliminates the above drawback is disclosed in U.S. Pat. No. 4,377,803 to Lotspiech et al., for the invention entitled "Algorithm of the Segmentation of Printed Fixed Pitch Documents." In this system, as shown in FIG. 1A, a character block 1 is scanned in the horizontal direction (hereafter referred to as the row direction), designated by arrow 3, to form projection data 5. As also shown in FIG. 1A, when an underline 7 is spaced apart from the character block 1, the character block 1, excluding the underline 7, is scanned in the vertical direction designated by arrow 9 to obtain vertical projection data 11. The vertical projection data 11 is used to detect, extract, and recognize the respective characters of the character block 1. When the underline 7 contacts the characters, as shown in FIG. 1C, the areas of the character block 1 other than the portions corresponding to the width of the underline 7 are scanned to obtain the projection data 11 shown in FIG. 1D. The projection data 11 is then used to detect, extract, and recognize the individual characters, as described above.
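The idea of scanning the block vertically while skipping the rows occupied by the underline can be sketched as follows. This is a simplified illustration, not the algorithm claimed in U.S. Pat. No. 4,377,803: it assumes that an underline shows up as a row that is almost entirely black, and the 90% fill threshold and the names `underline_rows` and `segment_excluding` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white


def underline_rows(img: Image, min_fill: float = 0.9) -> List[int]:
    """Rows that are at least `min_fill` black; a hypothetical stand-in for underline detection."""
    width = len(img[0])
    return [y for y, row in enumerate(img) if sum(row) >= min_fill * width]


def segment_excluding(img: Image, skip: Sequence[int]) -> List[Tuple[int, int]]:
    """Vertical projection over every row except those in `skip`, split at empty columns."""
    skipped = set(skip)
    proj = [sum(row[x] for y, row in enumerate(img) if y not in skipped)
            for x in range(len(img[0]))]
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for x, count in enumerate(proj + [0]):  # trailing 0 closes a run at the right edge
        if count and start is None:
            start = x
        elif not count and start is not None:
            segments.append((start, x))
            start = None
    return segments


# Two characters whose bottoms touch an underline (last row).
block = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]

rows = underline_rows(block)           # [2]
print(segment_excluding(block, []))    # [(0, 5)]: the underline joins the characters
print(segment_excluding(block, rows))  # [(0, 2), (3, 5)]
```

Skipping only the rows covered by the underline restores the blank column between the two characters, which is the effect the cited system relies on.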
However, in the above OCR system, when the underlined characters have narrow line spacing (for example, 6 lines/inch), as shown in FIG. 2A, the individual characters must be detected and extracted in such a way that the underline belonging to the character block on the preceding line is excluded. Such processing complicates the construction of the system.
Furthermore, a system in which the underline is located by using horizontal projection data (a histogram) also has problems. For example, when parts of the underlined characters extend below the horizontal broken line shown in FIG. 3A, the position of the upper end of the underline is detected incorrectly, and those parts of the characters are undesirably extracted as part of the underline. Consequently, the resulting characters are incomplete, as shown in FIG. 3B, and are not recognized, which considerably lowers the recognition accuracy.
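The failure mode can be illustrated with a minimal sketch of histogram-based underline detection. It assumes a hypothetical rule: starting from the bottom row, every row whose black-pixel count reaches half the block width is treated as underline, and the block is cropped at the highest such row. The names `row_histogram`, `underline_top`, and `strip_underline` are hypothetical. When a descender row sitting on the underline is dense enough, it is absorbed into the underline and cut off.

```python
from typing import List

Image = List[List[int]]  # 1 = black pixel, 0 = white


def row_histogram(img: Image) -> List[int]:
    """Horizontal projection: number of black pixels in each row."""
    return [sum(row) for row in img]


def underline_top(img: Image, min_fill: float = 0.5) -> int:
    """Walk upward from the bottom row while rows are at least `min_fill` black.

    Returns the index of the first row treated as underline, or len(img) when the
    bottom row is not dense enough to be an underline (hypothetical rule).
    """
    hist = row_histogram(img)
    limit = min_fill * len(img[0])
    top = len(img)
    while top > 0 and hist[top - 1] >= limit:
        top -= 1
    return top


def strip_underline(img: Image) -> Image:
    """Keep only the rows above the detected upper end of the underline."""
    return img[:underline_top(img)]


# Two "g"-like characters whose descender hooks (row 4) sit on the underline (row 5).
block = [
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

print(underline_top(block))         # 4, although the underline itself is only row 5
print(len(strip_underline(block)))  # 4: the descender hooks are cut off
```

The upper end of the underline is placed one row too high, so the cropped characters are incomplete in the way the text describes.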
Another type of OCR is disclosed in U.S. Pat. No. 4,292,622 to Henrichon, Jr. for the invention entitled "System and Method for Processing Horizontal Line Characteristics in an Image." The disclosed system detects and eliminates the underline and thereby extracts the individual characters. However, in this system, when the underline contacts the characters, eliminating the underline inevitably removes part of the bottom portions of the characters as well. This reduces the accuracy of character recognition.
An OCR of the so-called pattern matching type is also known. In this type of OCR, an area of constant size containing a character block is extracted from the pattern memory and compared with reference data for coincidence. However, the pattern of any character whose center is close to an underline may be extracted together with the underline. This inevitably reduces the accuracy of character recognition.
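A minimal sketch of fixed-window pattern matching, assuming binary images, a window of constant size centered on the character, and a simple pixel-agreement score. The reference patterns, the scoring rule, and the names `extract_window`, `match_score`, and `best_match` are all hypothetical. It shows how an underline captured inside the window can turn one character into a perfect match for another.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # 1 = black pixel, 0 = white


def extract_window(img: Image, cy: int, cx: int, h: int, w: int) -> Image:
    """Cut a fixed h x w window centered at (cy, cx); pixels outside the image read as 0."""
    top, left = cy - h // 2, cx - w // 2
    return [[img[y][x] if 0 <= y < len(img) and 0 <= x < len(img[0]) else 0
             for x in range(left, left + w)]
            for y in range(top, top + h)]


def match_score(a: Image, b: Image) -> float:
    """Fraction of positions at which two equal-size binary patterns agree."""
    total = sum(len(row) for row in a)
    same = sum(p == q for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return same / total


def best_match(window: Image, refs: Dict[str, Image]) -> Tuple[str, float]:
    """Return the reference name with the highest agreement score."""
    return max(((name, match_score(window, ref)) for name, ref in refs.items()),
               key=lambda item: item[1])


# Hypothetical 4 x 3 reference patterns: a "T" and a serifed "I".
REFS = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0], [0, 0, 0]],
    "I": [[1, 1, 1], [0, 1, 0], [0, 1, 0], [1, 1, 1]],
}

plain_t = [[1, 1, 1], [0, 1, 0], [0, 1, 0], [0, 0, 0]]
underlined_t = [[1, 1, 1], [0, 1, 0], [0, 1, 0], [1, 1, 1]]  # underline in the last row

print(best_match(extract_window(plain_t, 2, 1, 4, 3), REFS))       # ('T', 1.0)
print(best_match(extract_window(underlined_t, 2, 1, 4, 3), REFS))  # ('I', 1.0)
```

Because the window size is fixed, the underline row falls inside it and the underlined "T" is matched as "I".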