1. Field of the Invention
The present invention relates in general to a method of sorting out candidate characters in a character recognition system which recognizes characters statistically, and more particularly to a method of sorting out candidate characters which is capable of sorting out the candidate characters quickly and accurately by extracting characteristics of the characters on the basis of run-lengths, in order to recognize combination type characters such as Hangul, Chinese characters and the like.
2. Description of the Prior Art
In a prior-art method of sorting out candidate characters in a statistical character recognition system, primary characteristics are obtained for all of the characters, and the characters are then classified into a tree structure on the basis of the similarities of those primary characteristics. Thereafter, upon input of a character to be recognized, the primary characteristic of the input character is obtained and candidate characters for the input character are sought along the pre-stored trees on the basis of that primary characteristic. That is, the characters (or character group) at the tree position to which the input character corresponds are determined to be the candidate characters for the input character.
Well-known methods of obtaining the primary characteristics of all of the characters and classifying the characters into the tree structure on the basis of the similarities of the primary characteristics include a character classifying method employing meshes, a character classifying method employing a parallel characteristic based on distances to the pixels of the characters, and a character classifying method employing a time/frequency transformation.
Referring to FIG. 1, there is illustrated the character classifying method employing the meshes in accordance with the prior art. As shown in this figure, each character is covered with an n×n lattice of cells, which are called the meshes. The number of pixels (for example, black pixels) of each character which fall within each individual mesh is counted. The counted values are adopted as the primary characteristics of the characters. Similarities of the primary characteristics of the characters are obtained mesh by mesh, between corresponding meshes. The characters are then classified into a tree structure as shown in FIG. 2, which is formed on the basis of the similarities of the primary characteristics.
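The mesh characteristic described above can be sketched as follows. This is a minimal illustration, not the method of any particular prior-art system: the function name `mesh_features`, the 0/1 image representation and the integer rule that assigns a pixel to a mesh are all assumptions made for the example, and the image is assumed to be already cropped to the character circumscribing box.

```python
def mesh_features(image, n):
    """Count black pixels (value 1) in each cell of an n x n mesh.

    `image` is a list of equal-length rows of 0/1 values, assumed to be
    cropped to the character's circumscribing box. Returns a flat list
    of n*n counts in row-major order (mesh 1 first, mesh n*n last).
    """
    height, width = len(image), len(image[0])
    counts = [0] * (n * n)
    for y, row in enumerate(image):
        cell_row = y * n // height      # mesh row containing this pixel row
        for x, pixel in enumerate(row):
            if pixel:
                cell_col = x * n // width   # mesh column containing this pixel
                counts[cell_row * n + cell_col] += 1
    return counts


# Hypothetical 4 x 4 glyph shaped like a "C".
glyph = [
    [1, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
features = mesh_features(glyph, 2)
print(features)  # [3, 2, 3, 2]
```

With n = 2 the glyph yields four characteristics, one per mesh; an 8×8 mesh would yield 64.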
For example, the n×n meshes are numbered and each character is covered with the numbered n×n meshes. The similarities of different characters are calculated on the basis of the primary characteristics for each mesh of the same number, from 1 to n×n. The characters are grouped into classes on the basis of the similarities. The characters grouped into the same class are then re-classified on the meshes from the 2nd to the (n×n)th, thereby forming an enormous tree structure as shown in FIG. 2. Various methods exist for calculating the similarities of the characters for each mesh of the same number; the calculation is performed mainly utilizing Fisher's law, the Euclidean distance, the Mahalanobis distance and the like.
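A minimal sketch of the per-mesh grouping just described is given below. It covers only the Euclidean distance; Fisher's law and the Mahalanobis distance are not shown. The grouping rule (a sorted sweep with a fixed threshold), the names `euclidean_distance` and `group_by_mesh`, and the sample feature vectors are assumptions chosen for illustration and are not taken from the prior art itself.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_mesh(features_by_char, mesh_index, threshold):
    """Group characters whose counts at one mesh are close to each other.

    Characters are sorted by their value at `mesh_index`; a new group is
    started whenever a value exceeds the first value of the current group
    by more than `threshold` (an assumed, simplistic similarity rule).
    """
    ordered = sorted(
        features_by_char,
        key=lambda ch: (features_by_char[ch][mesh_index], ch),
    )
    groups = []
    for ch in ordered:
        value = features_by_char[ch][mesh_index]
        if groups and value - groups[-1]["anchor"] <= threshold:
            groups[-1]["members"].append(ch)
        else:
            groups.append({"anchor": value, "members": [ch]})
    return [g["members"] for g in groups]


# Hypothetical 2 x 2 mesh characteristics for three characters.
samples = {"A": [3, 2, 3, 2], "B": [3, 0, 3, 4], "C": [0, 5, 1, 1]}

level_1 = group_by_mesh(samples, 0, 1)
print(level_1)  # [['C'], ['A', 'B']]

# Re-classify the class {A, B} on the next mesh, as in the tree of FIG. 2.
subset = {ch: samples[ch] for ch in ["A", "B"]}
print(group_by_mesh(subset, 1, 1))  # [['B'], ['A']]
```

Repeating the re-classification for every mesh number is what makes the resulting tree grow so large.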
Thereafter, upon input of an unknown character, the input character is covered with the numbered meshes, and the primary characteristic of the input character is extracted from the number of pixels of the input character in each mesh. The previously defined tree structure shown in FIG. 2 is searched, on the basis of the primary characteristic of the unknown character, for the tree position to which the unknown character belongs. When the tree position most similar to the unknown character has been found, the characters (character group) at that tree position are determined to be the candidate characters for the unknown character.
Alternatively, a small number of the most distinctive characteristics may be selected instead of using all n×n characteristics, so that the trees can be reduced in number. This makes it possible to classify the characters at a high speed.
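The search described above can be sketched as a greedy descent through a pre-built tree. The dictionary layout of the tree, the function name `find_candidates` and the sample tree are hypothetical; the prior art does not specify a data structure.

```python
def find_candidates(tree, features):
    """Descend a classification tree and return the candidate characters.

    Inner nodes (assumed layout) hold the mesh index to inspect and a list
    of branches, each with a representative value and a child node. At each
    level the branch whose value is nearest to the input's count at that
    mesh is followed. Leaves hold the candidate character group.
    """
    node = tree
    while "candidates" not in node:
        value = features[node["mesh"]]
        best = min(node["branches"], key=lambda b: abs(b["value"] - value))
        node = best["child"]
    return node["candidates"]


# Hypothetical two-level tree over 2 x 2 mesh characteristics.
tree = {
    "mesh": 0,
    "branches": [
        {"value": 0, "child": {"candidates": ["C"]}},
        {"value": 3, "child": {
            "mesh": 1,
            "branches": [
                {"value": 0, "child": {"candidates": ["B"]}},
                {"value": 2, "child": {"candidates": ["A"]}},
            ],
        }},
    ],
}

print(find_candidates(tree, [3, 2, 3, 2]))  # ['A']
print(find_candidates(tree, [1, 5, 1, 1]))  # ['C']
```

Because each level commits to a single branch, one badly distorted mesh value near the root sends the search down a wrong subtree with no opportunity to recover.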
Referring to FIG. 3, there is illustrated the character classifying method employing the parallel characteristic on the basis of the distances to the pixels of the characters. As shown in this figure, the distances from the left side of a box circumscribing each character to the first pixels (for example, black pixels) of the character are extracted line by line as a classifying characteristic (the parallel characteristic). This method thus classifies the characters on the basis of the classifying characteristic extracted in the above manner. In this method, measuring points are selected on the character circumscribing box at a constant interval from one another, and straight lines are drawn from the measuring points on the character circumscribing box to the first pixels of the character. The lengths of the straight lines are adopted as the primary characteristics of the characters.
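A minimal sketch of the parallel characteristic follows. The function name `left_profile`, the `step` parameter used for the constant measuring interval, and the choice of returning the full box width for a line with no black pixel are assumptions made for the example.

```python
def left_profile(image, step=1):
    """Distance from the left side of the box to the first black pixel.

    `image` is a list of 0/1 rows cropped to the circumscribing box.
    One measuring point is taken every `step` rows. A row without any
    black pixel is assigned the full width (an assumed convention).
    """
    width = len(image[0])
    profile = []
    for y in range(0, len(image), step):
        row = image[y]
        distance = next((x for x, pixel in enumerate(row) if pixel), width)
        profile.append(distance)
    return profile


# Hypothetical 4 x 4 glyph.
stroke = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
]
print(left_profile(stroke))          # [2, 1, 0, 3]
print(left_profile(stroke, step=2))  # [2, 0]
```

The characteristic vector has one entry per measuring point, so it is far shorter than an n×n mesh vector.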
Referring to FIG. 4, there is illustrated the character classifying method employing the time/frequency transformation. As shown in this drawing, this method emphasizes the characteristic which each character possesses by transforming the time domain into the frequency domain over a two-dimensional plane, utilizing the Fourier transformation or the Laplace transformation. The method then classifies the characters on the basis of the characteristics emphasized by the transformation.
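As an illustration of the Fourier variant only, the sketch below computes the magnitudes of the lowest-frequency coefficients of a naive two-dimensional discrete Fourier transform of a binary image. The function name `dft2_magnitudes`, the use of magnitudes as characteristics and the `max_freq` cut-off are assumptions; the prior art does not state which coefficients are retained.

```python
import cmath


def dft2_magnitudes(image, max_freq):
    """Magnitudes of the 2-D DFT coefficients F(u, v) for u, v < max_freq.

    `image` is a list of 0/1 rows. The transform is evaluated directly
    from its definition, which is slow but easy to follow.
    """
    height, width = len(image), len(image[0])
    features = []
    for u in range(max_freq):
        for v in range(max_freq):
            total = 0j
            for y in range(height):
                for x in range(width):
                    if image[y][x]:
                        angle = -2 * cmath.pi * (u * y / height + v * x / width)
                        total += cmath.exp(1j * angle)
            features.append(abs(total))
    return features


# Hypothetical 2 x 2 image with black pixels on the main diagonal.
diagonal = [
    [1, 0],
    [0, 1],
]
spectrum = dft2_magnitudes(diagonal, 2)
print([round(m, 6) for m in spectrum])  # [2.0, 0.0, 0.0, 2.0]
```

The first coefficient, F(0, 0), always equals the number of black pixels. The four nested loops show why the transformation is costly: the work grows with the image area multiplied by the number of coefficients kept.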
However, the character classifying method employing the meshes encounters the problem of finding the most effective number of meshes into which the area covering one character should be divided. Although a larger number of meshes results in a more accurate classification of the characters, more time is lost because similarities must be extracted for a larger number of meshes. This lowers the character recognition speed of the system. For this reason, 8×8 (64) meshes are mainly used in most Hangul cases and 16×16 (256) meshes are mainly used in most Chinese character cases. When the dimension is as high as mentioned above, the meshes become so numerous that the character recognition speed of the system falls.
The character classifying method employing the meshes has another disadvantage, in that the characters may be misrecognized when they are subject to distortion, since the tree structure is defined in advance and the candidate characters are then determined on the basis of that tree structure. That is, when a certain character is distorted so that the characteristics of one or more of its meshes exceed a critical value, the tree search is led to an abnormal tree position. For this reason, the search falls into a local minimum, resulting in misrecognition of the character.
Also, the character classifying method employing the meshes has a further disadvantage, in that much processing time is required, since meshes of a higher dimension result in an increase in the number of characteristics. Moreover, when the character circumscribing box is formed with a size different from the original size of an input character because of noise mixed in at the periphery of the input character, the possibility of misclassifying the input character is high because the positions of the meshes shift.
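The sensitivity of the circumscribing box to peripheral noise can be seen in a small sketch. The function name `bounding_box`, the (top, left, bottom, right) return convention and the sample images are assumptions for illustration; the image is assumed to contain at least one black pixel.

```python
def bounding_box(image):
    """Return (top, left, bottom, right) of the black pixels, inclusive.

    `image` is a list of 0/1 rows and must contain at least one 1.
    """
    rows = [y for y, row in enumerate(image) if any(row)]
    cols = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    return (min(rows), min(cols), max(rows), max(cols))


# Hypothetical 3 x 3 ring centred in a 5 x 5 scan.
clean = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# The same scan with a single noise pixel in the top-right corner.
noisy = [row[:] for row in clean]
noisy[0][4] = 1

print(bounding_box(clean))  # (1, 1, 3, 3)
print(bounding_box(noisy))  # (0, 1, 3, 4)
```

One stray pixel enlarges the box from 3×3 to 4×4, so every mesh laid over the noisy box covers a different part of the character than it would over the clean box.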
The character classifying method employing the parallel characteristic as shown in FIG. 3 is advantageous, in that the characteristics are reduced in number as compared with those in the character classifying method employing the meshes. This has the effect of simplifying the tree structure and reducing the character recognition processing time. However, the character classifying method employing the parallel characteristic is disadvantageous, in that the branches of the character classifying characteristic are small in number. This results in an inaccurate classification of the characters. Also, as in the character classifying method employing the meshes, since the character circumscribing box varies in size when noise is present in the character, positioning the character circumscribing box becomes a bottleneck.
The character classifying method employing the time/frequency transformation is desirable in that the characteristics of the characters are clearly distinguished, but has the disadvantage of requiring much time for the transformation. Also, in the case of combination type characters such as Hangul, Chinese characters and the like, the characteristic positions cannot be clearly discriminated because of the structure of the characters.