The present invention relates to image processing, and more particularly to a method and a unit for binarizing images having a complex background or uneven brightness. The present invention also relates to a method and a unit for recognizing characters, suitable for recognizing distorted characters and for separating individual characters from an input character string pattern in a character recognition unit which receives characters and the like as input and recognizes them.
In recognizing characters by using an image processing unit, image data picked up by a television camera or the like is generally reduced to binary data of "0" and "1" at a certain threshold level, and the binary data is then processed. For example, characters are expressed at a binary level of "1" and the background is expressed at a binary level of "0", and the data at the "1" level is processed to recognize the characters.
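The fixed-threshold binarization described above can be sketched as follows. This is a minimal illustration only, assuming 8-bit density levels and dark characters on a light background; the function name `binarize` and the threshold of 128 are hypothetical and do not come from the text.

```python
import numpy as np

def binarize(gray, threshold):
    """Return 1 for picture elements darker than the threshold, else 0.

    Assumption: characters are dark and the background is light, so the
    dark side of the threshold is mapped to the "1" (character) level.
    """
    gray = np.asarray(gray)
    return (gray < threshold).astype(np.uint8)

# Hypothetical 3x3 variable-density image (0 = black, 255 = white).
image = np.array([[200, 210, 205],
                  [ 30, 220,  25],
                  [215,  20, 208]])

binary = binarize(image, threshold=128)
print(binary)
```

A single threshold of this kind is adequate only when one density level separates characters from background over the whole picture.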
When a relatively clear image is to be recognized, such as black characters written on white paper, the above-described threshold value can easily be determined in advance (for example, an average density may be set as the threshold value). However, in more complex applications, satisfactory performance cannot be obtained in many cases by such simple binary processing alone. The following are some examples of complex applications.
(1) extracting characters from a patterned corrugated cardboard box
(2) extracting characters from an outdoor billboard
(3) extracting characters from a printed substrate
Characters cannot be extracted from objects such as those described above by a simple binary processing method, because the background of the characters is complex and the brightness varies extremely over the objects. A binarization method capable of satisfactorily extracting characters from such objects is therefore necessary. As one conventional example of such a method, a binary extracting method has been proposed in "An Investigation of a Method of Extracting Characters from a Scenary Image" by Mr. Ohtani (lecture theses for the national meeting of the Information Processing Society of Japan; March, 1986).
As shown in FIG. 2a to FIG. 2c, one picture of a variable-density image 140 inputted from a television camera or the like is divided into a plurality of sub-blocks 141, and an optimum binary threshold level θij is determined in each of the sub-blocks 141 as shown in FIG. 2b. In this case, determining the threshold value θij is treated as a two-class problem of dividing the image within each of the sub-blocks 141 into the two classes of white and black, and the threshold value is the value at which the variance between the two classes is a maximum. Further, in order to maintain continuity between the sub-blocks 141, the threshold values θij are interpolated for each picture element as shown in FIG. 2c. As a result, a threshold value θx,y is obtained. In other words, a threshold value θx,y is determined for each picture element, and the input image is expressed in binary data by using it.
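The sub-block procedure above can be sketched roughly as follows. This is an illustration under stated assumptions, not the cited author's implementation: the per-block threshold is computed by maximizing the between-class variance of the density histogram, and the per-picture-element threshold is obtained by linear interpolation between block centres in each direction. The function names, the block counts, and the choice of interpolation are all hypothetical.

```python
import numpy as np

def block_threshold(block):
    """Density level maximizing the between-class variance within a block."""
    hist = np.bincount(block.ravel(), minlength=256).astype(float)
    levels = np.arange(256)
    w0 = np.cumsum(hist)                 # number of elements with level <= t
    w1 = hist.sum() - w0                 # number of elements with level > t
    s0 = np.cumsum(hist * levels)        # sum of levels <= t
    s1 = s0[-1] - s0                     # sum of levels > t
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    between = np.zeros(256)
    m0 = s0[valid] / w0[valid]
    m1 = s1[valid] / w1[valid]
    between[valid] = w0[valid] * w1[valid] * (m0 - m1) ** 2
    # np.argmax returns the lowest level when several levels tie.
    return int(np.argmax(between))

def threshold_map(gray, blocks_y, blocks_x):
    """Return (per-block thresholds, per-picture-element thresholds)."""
    h, w = gray.shape
    ys = np.linspace(0, h, blocks_y + 1).astype(int)
    xs = np.linspace(0, w, blocks_x + 1).astype(int)
    theta = np.empty((blocks_y, blocks_x))
    for i in range(blocks_y):
        for j in range(blocks_x):
            theta[i, j] = block_threshold(gray[ys[i]:ys[i + 1], xs[j]:xs[j + 1]])
    cy = (ys[:-1] + ys[1:]) / 2.0        # block centres, vertical
    cx = (xs[:-1] + xs[1:]) / 2.0        # block centres, horizontal
    # Interpolate along x for each block row, then along y for each column.
    rows = np.array([np.interp(np.arange(w), cx, theta[i]) for i in range(blocks_y)])
    full = np.array([np.interp(np.arange(h), cy, rows[:, x]) for x in range(w)]).T
    return theta, full

# Hypothetical 4x4 image split into 2x2 sub-blocks of differing brightness.
gray = np.array([[ 50, 200, 100, 250],
                 [200,  50, 250, 100],
                 [ 10,  60,  30,  90],
                 [ 60,  10,  90,  30]], dtype=np.uint8)

theta_ij, theta_xy = threshold_map(gray, blocks_y=2, blocks_x=2)
binary = (gray <= theta_xy).astype(np.uint8)   # assumption: dark elements are characters
print(theta_ij)
print(binary)
```

Even in this small sketch the work is proportional to the number of picture elements for every block histogram and again for the interpolation, which gives some sense of why a per-element threshold is costly to compute.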
The above-described conventional technology has problems. In this method, a density histogram is used to determine the threshold value θij within each sub-block (that is, the frequency of each density level is obtained within each sub-block), so that two-dimensional image data is reduced to one-dimensional data. Accordingly, no consideration is given to the positional information of brightness, and an optimum threshold value cannot be determined by this method.
Further, it takes an extremely long processing time to obtain θx,y, making it impossible to extract characters in real time.
When an input image is differentiated, non-edge portions become almost "0" and edge portions become a variable-density image having values corresponding to the intensity of the edges. When such an image is to be expressed in binary values, the above method is not suitable for performing optimum edge extraction.
According to a conventional character recognition unit shown in FIG. 13, when a character string pattern 41 is obtained from an image pick-up unit or the like, each character is separated by a single character separation circuit 42 and each character is recognized by being compared with dictionary data 44 in a recognition circuit 43. In recognizing the characters, when a shape pattern is used as dictionary data, a magnitude of proximity between the dictionary data and the character data is obtained. The character data is then allocated to the category having the highest magnitude of proximity (which is usually called pattern matching) to obtain the character. Further, when "characteristics" such as the number of holes and the number of dots in a character are held as dictionary data, a decision tree on these "characteristics" is followed to recognize the character.
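The pattern matching step described above can be sketched as follows. The text does not define how the magnitude of proximity is computed, so this sketch assumes, purely for illustration, that it is the fraction of picture elements on which two binary patterns of equal size agree. The function names and the tiny 3x3 dictionary are hypothetical.

```python
import numpy as np

def proximity(pattern, template):
    """Assumed proximity measure: fraction of matching picture elements."""
    pattern = np.asarray(pattern)
    template = np.asarray(template)
    return float(np.mean(pattern == template))

def recognize(pattern, dictionary):
    """Allocate the pattern to the category with the highest proximity."""
    scores = {category: proximity(pattern, template)
              for category, template in dictionary.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical dictionary of 3x3 shape patterns.
dictionary = {
    "I": np.array([[0, 1, 0],
                   [0, 1, 0],
                   [0, 1, 0]]),
    "L": np.array([[1, 0, 0],
                   [1, 0, 0],
                   [1, 1, 1]]),
}

# A separated character with one picture element missing from an "L".
observed = np.array([[1, 0, 0],
                     [1, 0, 0],
                     [1, 1, 0]])

category, score = recognize(observed, dictionary)
print(category, score)
```

A measure of this kind compares patterns element by element, so it degrades quickly when the input character is geometrically distorted relative to the dictionary pattern.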
There are methods for recognizing characters with high precision when the character string is slanted as shown in FIG. 15, as disclosed in JP-A-59-66783 and JP-A-1-156887. According to these methods, projection patterns of a character string in both the horizontal and vertical directions are obtained to obtain an approximate slope φ of the character string. Starting from this slope φ, the direction in which the character string is scanned is gradually changed, and the slope φ of the character string is determined to be the one at which the change in the number of segments obtained (that is, the number of characters contributing to the projection distribution) and in the projection value becomes the largest.
As shown in FIG. 15, the slope φ of the character string obtained by this method corresponds to the angle by which each character is rotated from its upright position. Thus, the character string coincides with the dictionary pattern when the character string is rotated in the reverse direction by the angle φ to return it to the upright state. An image picked up from a car number plate, however, has a character string pattern 41, for example, as shown in FIG. 14. This character string pattern 41 is the character string pattern 41 of FIG. 13 viewed obliquely from the lower left, so that the horizontal direction is unchanged while the characters are distorted so as to lean to the left. If the magnitude of the distortion were constant, the characters could be recognized by providing dictionary data for each distorted character. However, this distortion differs depending on the positional relationship between the car and the camera picking up the number plate. Further, the distortion also differs depending on the position at which the number plate is fitted to the car. Accordingly, the magnitude of distortion is different for each car, so that it is impossible to have dictionary data for each of the individual distortions. Therefore, unless these distorted characters are corrected, characters cannot be recognized accurately, because the magnitude of proximity in a proximity calculation circuit 43 is low even if pattern matching is tried between a distorted character 44a and a dictionary pattern 44b as shown in FIG. 14. This distortion does not disappear even if the whole character is rotated by the slope angle φ as is done in the prior art, and accurate recognition of characters therefore cannot be performed.
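The reason why rotation alone cannot remove this distortion can be illustrated numerically. The sketch below assumes, as a simplification not stated in the text, that the distortion is a horizontal shear (horizontal strokes unchanged, vertical strokes inclined). A rotation preserves the angle between strokes, so whatever rotation is applied, the horizontal and vertical strokes of the sheared character never return to a right angle. The shear factor and all names are hypothetical.

```python
import numpy as np

def shear(k):
    """Horizontal shear: leaves horizontal strokes fixed, inclines vertical ones."""
    return np.array([[1.0, k],
                     [0.0, 1.0]])

def rotation(phi):
    """Rotation by the angle phi (radians)."""
    c, s = np.cos(phi), np.sin(phi)
    return np.array([[c, -s],
                     [s,  c]])

def stroke_angle_deg(m):
    """Angle between the images of the horizontal and vertical unit strokes."""
    u, v = m[:, 0], m[:, 1]
    cos_a = (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.degrees(np.arccos(cos_a)))

distortion = shear(-0.5)   # hypothetical shear factor

# Try every rotation from -90 to +90 degrees in 5 degree steps.
angles = [stroke_angle_deg(rotation(np.radians(d)) @ distortion)
          for d in range(-90, 91, 5)]

print(min(angles), max(angles))   # the stroke angle is the same for every rotation
```

Under this assumed model, restoring the character would require an inverse shear rather than a rotation.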
A character recognition unit for recognizing printed characters or stamped characters has functions as shown in FIG. 25. A pattern to be recognized is picked up by a television camera 10 or the like, and the obtained analog signal is quantized into about seven to eight bits by an A/D converter 11. The quantized values are changed into binary values by a binary circuit 18 and the result is stored in a frame memory 12. From the pattern to be recognized stored in the frame memory 12, individual characters are separated by a character separation circuit 30, and each is compared by a recognition circuit 27 with dictionary patterns stored in advance, for example, in a dictionary pattern memory 28. The dictionary pattern most similar to the separated character is outputted as the result of recognition.
In such a recognition unit as described above, when the quality of the characters is poor, characters often come into contact with each other in the above-described binary processing stage. Therefore, in order to separate contacting characters from each other, the character separation circuit 30 obtains a projection pattern (shown in FIG. 26b) in the direction perpendicular to the line of the character string shown in FIG. 26a. The characters are separated at the position where the projection value is smallest, and each character is recognized. However, according to this method, erroneous separation of characters occurs frequently and the rate of recognition is extremely low.
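The projection-based separation described above can be sketched as follows. The sketch assumes a horizontal character string, so that the projection counts character picture elements in each column; the search range passed to `split_position` and the small test pattern are hypothetical.

```python
import numpy as np

def projection_pattern(binary):
    """Count the character ("1") picture elements in each column."""
    return np.asarray(binary).sum(axis=0)

def split_position(binary, lo, hi):
    """Column in [lo, hi) at which the projection value is smallest."""
    proj = projection_pattern(binary)
    return lo + int(np.argmin(proj[lo:hi]))

# Two hypothetical characters touching through a single picture element.
touching = np.array([
    [1, 1, 1, 0, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 1, 1],
])

proj = projection_pattern(touching)
cut = split_position(touching, lo=1, hi=6)
print(proj, cut)
```

Here the contact is a single thin bridge, so the minimum of the projection is unambiguous; the method relies entirely on such a minimum existing at the true boundary.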
The methods disclosed in JP-A-61-72373 and JP-A-63-216188 provide an improvement for preventing the erroneous separation of characters that occurs in the above simple processing. However, both of these methods use a projection pattern, and there are therefore still cases where characters are separated at erroneous positions when the characters are connected as shown in FIGS. 27a and 27b.
Since the projection pattern counts the number of picture elements in the vertical direction, it is difficult to determine at what position within portion "A" the two characters in FIG. 27a are to be separated, because the number of picture elements in the vertical direction is the same throughout portion "A". Further, in the case of FIG. 27b, it is difficult to discriminate between the projection pattern of the "-" (hyphen) (portion "A") and the projection pattern of the upper portion of "7" (portion "B"), so that the position for separating the two characters may be selected incorrectly. Further, no consideration has been given to handling the case where the sizes of characters and the distances between characters vary. (The distance between characters is regarded as being constant.)
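The ambiguity in the FIG. 27a case can be reproduced with a small hypothetical pattern (it is not taken from the figure): two characters joined by a bar of constant thickness give a projection that is flat over the whole joint, so every column of the joint is an equally good candidate for the separation position.

```python
import numpy as np

# Hypothetical pattern: two strokes joined by a bar one picture element thick.
joined = np.array([
    [1, 1, 0, 0, 0, 0, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
])

# Projection pattern: number of picture elements in the vertical direction.
proj = joined.sum(axis=0)

# Every column attaining the minimum is a candidate separation position.
candidates = np.flatnonzero(proj == proj.min())
print(proj, candidates)
```

The projection alone gives no basis for preferring one of these columns over another.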