1. Field of the Invention
The invention relates to an image processing method and apparatus which can correctly extract desired information from input image information and can recognize the extracted partial information.
2. Related Background Art
A conventional apparatus which can recognize characters from image information is constructed as shown in FIG. 12. Reference numeral 41 denotes a scanner for converting a character image on an object to be read into an analog electric signal; 42 a preparation signal processor for binarizing the analog signal and eliminating noise from the signal; 43 a character extractor for separating a character train into individual characters; 44 a feature extractor for extracting a feature which is peculiar to the particular character being examined and for producing a feature vector in accordance with a predetermined algorithm; 45 a recognition dictionary for storing statistics (mean value, distribution, and the like) of the feature vector for each character type; 46 a comparator for comparing the feature vector obtained from the input character image with the statistics stored in the recognition dictionary, thereby selecting the optimum candidate; 47 a word dictionary for storing translations of words; 48 a dictionary retrieval unit for retrieving, with reference to the word dictionary, the translated word corresponding to the recognized character train; and 49 a display for displaying the translated word.
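The role of the comparator 46 can be illustrated with a minimal sketch. The source states only that the recognition dictionary holds statistics such as the mean value and distribution for each character type and that the optimum candidate is selected; the variance-weighted squared distance used here, the function name `best_candidate`, and the two-element feature vectors are assumptions made purely for illustration.

```python
def best_candidate(feature, dictionary):
    """Return the character type whose stored statistics best match `feature`.

    `dictionary` maps a character type to (mean vector, variance vector).
    The variance-weighted squared distance is an assumed matching measure.
    """
    def distance(stats):
        mean, var = stats
        return sum((x - m) ** 2 / v for x, m, v in zip(feature, mean, var))

    # The optimum candidate is taken to be the entry with the smallest distance.
    return min(dictionary, key=lambda label: distance(dictionary[label]))


# Hypothetical two-entry recognition dictionary with made-up statistics.
dictionary = {
    "t": ([0.9, 0.1], [0.05, 0.05]),
    "a": ([0.2, 0.8], [0.05, 0.05]),
}

candidate = best_candidate([0.8, 0.2], dictionary)
```

In this sketch the input vector lies much closer to the stored mean of "t" than to that of "a", so "t" is selected as the candidate.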
The above conventional technique, however, has the following two drawbacks.
(1) What are called blank characters, which are written in white on a black background, cannot be recognized. FIGS. 7 and 10 show examples of such blank characters.
As a conventional technique, there is known a method whereby the ratio of black pixels to all pixels in the whole image buffer is calculated, and when the ratio is equal to or larger than a predetermined value, the characters are determined to be blank characters, those blank characters are inverted from white to black, and, thereafter, a recognizing process is executed. Such a method, however, has a drawback in that it takes a long time to count the black pixels, and consequently the whole processing time increases. Moreover, as shown in the example of FIG. 9, when character lines are very thick, ordinary characters may be erroneously identified as blank characters. Conversely, as shown in FIG. 10, characters which really are blank characters may not be identified as such because the image contains a large number of white pixels. There is, consequently, a drawback in that it is not always possible to judge correctly whether given characters are blank characters.
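The conventional ratio test described above can be sketched as follows. The image representation (a list of rows with 1 for black and 0 for white), the function names, and the threshold value 0.5 are assumptions for illustration; the source says only that a predetermined value is used.

```python
def black_ratio(image):
    """Fraction of black pixels in the whole buffer (1 = black, 0 = white)."""
    total = sum(len(row) for row in image)
    black = sum(sum(row) for row in image)  # visits every pixel, hence the cost
    return black / total if total else 0.0


def is_blank_by_ratio(image, threshold=0.5):
    """Conventional judgment: treat the image as blank characters if the
    black ratio is equal to or larger than the (assumed) threshold."""
    return black_ratio(image) >= threshold


def invert(image):
    """Swap black and white so that blank characters become ordinary ones."""
    return [[1 - p for p in row] for row in image]


# Ordinary black character with very thick strokes (compare FIG. 9).
thick_character = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 0],
]

# White character on a black background in which white pixels dominate
# (compare FIG. 10).
blank_with_much_white = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
```

With these hypothetical inputs the ratio test misjudges both cases: the thick ordinary character has a black ratio of 10/16 and is treated as blank, while the blank character has a black ratio of 6/16 and is not.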
(2) As shown in FIG. 11, when a ruled line exists before a character train to be recognized, the characters of the train (here, the word "take") cannot be extracted.
Hitherto, when the operator tries to extract character information by tracing its outline from a certain point of an input image used as a start point, and a ruled line exists as shown in FIG. 11, the tracing that starts from the start point reaches the edge of the input image and ends at that edge. Therefore, the outline tracing operation does not reach the character train "take", and "take" cannot be extracted.
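The failure caused by a ruled line can be illustrated with a small sketch. It is not the outline tracing of the conventional apparatus: a flood fill over 8-connected black pixels is substituted for outline tracing, and the rightward scan from the start point, the function names, and the tiny test image are all assumptions chosen only to show why the trace ends at the image edge without reaching the characters.

```python
from collections import deque


def first_black(image, start):
    """Scan rightward from start (row, col); return the first black pixel or None."""
    r, c0 = start
    for c in range(c0, len(image[r])):
        if image[r][c] == 1:
            return (r, c)
    return None


def region_from(image, seed):
    """Collect the 8-connected black region containing `seed`
    (flood fill used here as a stand-in for outline tracing)."""
    h, w = len(image), len(image[0])
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < h and 0 <= nc < w
                        and image[nr][nc] == 1 and (nr, nc) not in seen):
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen


def touches_edge(image, region):
    """True if any pixel of the region lies on the border of the image."""
    h, w = len(image), len(image[0])
    return any(r in (0, h - 1) or c in (0, w - 1) for r, c in region)


# Column 1 is a vertical ruled line; columns 3-4 hold a character blob.
image = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 0],
]

seed = first_black(image, (1, 0))    # lands on the ruled line
region = region_from(image, seed)    # contains only ruled-line pixels
```

In this sketch the first black pixel found from the start point belongs to the ruled line, the region collected from it runs to the top and bottom edges of the image, and no pixel of the character blob is included.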