1. Field of the Invention
The current invention is generally related to a system and a method of comparing a given sample image with a predetermined set of reference images and is more particularly related to a system and a method of improving the above pattern matching by adjusting a feature value of the sample image according to predetermined characteristics of the sample image with respect to those of the corresponding reference image.
2. Related Prior Art
In the field of image recognition, a predetermined set of standard image patterns is stored in a library and used as a standard for recognizing a given image pattern. The given sample pattern is generally compared to these standard images to determine whether an acceptable match exists. Character recognition follows substantially the same process. A predetermined character set is generally stored in digital format, and an inputted sample character is compared to the digitally stored characters for recognition.
In prior art character recognition, comparing an inputted character as a whole with the stored reference character is unreliable because of factors such as the size, font and positional shift of the inputted character. To improve recognition, the inputted data is generally broken into smaller units, each representing a part of the character, and these units are compared with the corresponding parts of the reference character data. For example, in FIG. 1, the inputted character data is indicated by the dotted line while the stored reference data is shown by the solid line. Although the inputted character "L" has the same size and font as the reference, it is shifted to the right. Because of this positional shift, when the inputted data as a whole is compared with the stored data, the inputted character is not recognized as "L." However, when the comparison is made for each of the divided sub-areas based upon a predetermined feature, such as the number of "on" pixels or the length of edges, the pattern matching method recognizes the sampled input as "L."
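By way of illustration only, the difference between whole-image comparison and sub-area comparison may be sketched as follows. The sketch is not taken from any cited reference. The glyph shape, the 8x8 grid, the 2x2 division into equal sub-areas, the use of the "on" pixel count as the feature, and the names make_L, pixel_mismatch, block_counts and feature_distance are all assumptions chosen for the example.

```python
def make_L(size, offset):
    """Draw a hypothetical 'L' on a size x size binary grid, shifted right by offset."""
    img = [[0] * size for _ in range(size)]
    for r in range(1, size - 1):
        img[r][1 + offset] = 1                      # vertical stroke
    for c in range(1 + offset, size - 2 + offset):
        img[size - 2][c] = 1                        # horizontal stroke
    return img


def pixel_mismatch(a, b):
    """Whole-image comparison: number of pixel positions that differ."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def block_counts(img, rows, cols):
    """Sub-area feature: number of 'on' pixels in each cell of a rows x cols grid."""
    size = len(img)
    bh, bw = size // rows, size // cols
    counts = []
    for br in range(rows):
        for bc in range(cols):
            counts.append(sum(img[r][c]
                              for r in range(br * bh, (br + 1) * bh)
                              for c in range(bc * bw, (bc + 1) * bw)))
    return counts


def feature_distance(f, g):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(f, g))


reference = make_L(8, 0)   # stored reference character
sample = make_L(8, 1)      # same character shifted one pixel to the right

whole = pixel_mismatch(reference, sample)
by_area = feature_distance(block_counts(reference, 2, 2),
                           block_counts(sample, 2, 2))
print(whole, by_area)
```

In this toy case 12 pixel positions disagree in the whole-image comparison, whereas the sub-area counts differ by only 2 in total, which is why a sub-area feature tolerates a small positional shift better.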
In a further prior art approach to accommodating variations in the input data, the inputted data is divided into sub-areas of variable size, as disclosed in Japanese Patent Nos. 59-202823 and 59-202825. These prior art references also disclose that a predetermined set of identification values is assigned to certain pixel patterns. For example, FIG. 2 illustrates a predetermined set of pixel patterns and associated identification values. Type 1 patterns include five predetermined patterns that share a certain common feature. In dividing the input character data according to these references, the identification values are distributed among the divided sub-areas in a predetermined manner. Under this scheme, the sub-areas are often unequal in size. The identification values are then compared between the sample character and the reference character for recognition.
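One possible way to obtain sub-areas of unequal size is to place the boundaries so that each sub-area receives a roughly equal share of the identification values. The following is a minimal sketch under that assumption and does not reproduce the exact rule of the cited references. The one-dimensional column profile, its values, and the name equal_mass_boundaries are hypothetical.

```python
from itertools import accumulate


def equal_mass_boundaries(profile, parts):
    """Split a 1-D profile of per-column counts into `parts` consecutive sub-areas.

    Each boundary is placed just after the first column whose running total
    reaches the next equal share of the overall total.
    """
    total = sum(profile)
    cumulative = list(accumulate(profile))
    bounds = [0]
    for k in range(1, parts):
        target = total * k / parts
        idx = next(i for i, c in enumerate(cumulative) if c >= target)
        bounds.append(idx + 1)
    bounds.append(len(profile))
    return bounds


# Hypothetical per-column counts for a character with a heavy left stroke.
profile = [0, 6, 1, 1, 1, 1, 0, 0]
bounds = equal_mass_boundaries(profile, 2)
widths = [b - a for a, b in zip(bounds, bounds[1:])]
print(bounds, widths)
```

With this profile the two sub-areas are 2 and 6 columns wide, which illustrates how an equal-share rule leads to sub-areas of unequal size.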
Even with the above-described character recognition method, handwritten characters are often not correctly recognized. This is because handwritten characters have other inconsistencies in addition to variations in size, font and positional shift. To recognize a handwritten character despite these inconsistencies, prior art attempts further included character image standardization. For example, Japanese Patent No. 1-116892 and "Improvement by Non-Linear Normalization" by Hiromi Yamada, Electronic Information and Communication, D-439, Spring Conference 1988, both disclose a technique for normalizing a character image by expanding or compressing certain portions of the input character image. To illustrate normalization, referring to FIG. 3A, a Japanese character " " is used as a reference character. Although FIGS. 3B and 3C show the same character, since they are handwritten, certain lines such as the left vertical line are shorter than the corresponding lines in the standard character of FIG. 3A. Certain other lines, such as the bottom horizontal line in FIG. 3B, are longer than the corresponding line in the standard character of FIG. 3A. When these lines are locally expanded or compressed, the normalized character becomes substantially similar to the reference character. However, the above normalization technique requires additional processing to determine which portion of the character image needs to be modified. For example, Japanese Patent No. 3-286390 discloses an interpolation technique using a spline function to normalize a non-linear shape. Furthermore, when the orientation of an input character does not coincide with that of a reference character, the normalization technique becomes even more complex because the orientation must also be corrected.
To simplify the above recognition technique, prior art attempts also included a technique using overlapping variable mesh regions as disclosed in U.S. Pat. No. 4,903,313. According to this technique, a sample image is first divided into sub-regions based upon a predetermined rule. For example, as already discussed above, the identification values are equally distributed among the sub-regions, and the sub-regions are often unequal in size. Then, as FIG. 4 illustrates, the sub-regions are integrated into a smaller number of mesh regions. Consequently, some mesh regions overlap with each other. In other words, because adjacent mesh regions overlap, certain identification values are used in at least two mesh regions. Because of this redundancy, when a sample character varies from a reference character, the effect on pattern matching is minimal.
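The integration of sub-regions into overlapping mesh regions may be sketched in one dimension as follows. This is an illustration only and is not the procedure of U.S. Pat. No. 4,903,313. The per-sub-region values, the span of three sub-regions per mesh region, the step of two, and the name integrate_subregions are assumptions.

```python
def integrate_subregions(values, span, step):
    """Sum consecutive sub-region values into mesh regions.

    Each mesh region covers `span` sub-regions and successive mesh regions
    start `step` sub-regions apart, so they overlap whenever step < span.
    """
    if span <= 0 or step <= 0:
        raise ValueError("span and step must be positive")
    meshes = []
    for start in range(0, len(values) - span + 1, step):
        meshes.append(sum(values[start:start + span]))
    return meshes


# Hypothetical identification-value totals for five sub-regions.
sub_values = [4, 3, 5, 2, 6]
meshes = integrate_subregions(sub_values, span=3, step=2)
print(meshes)
```

Here five sub-regions are integrated into two mesh regions covering sub-regions 0 to 2 and 2 to 4, so the value of the middle sub-region contributes to both mesh regions.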
On the other hand, the size of each mesh region differs significantly when handwritten characters are processed. In fact, the size difference of sub-areas is a crucial problem in maintaining an acceptable level of accuracy for character recognition. The size difference of the divided sub-areas tends to cause the identification value to deviate from the predetermined reference identification value. FIGS. 5A-5D illustrate this problem using a Japanese character meaning a tree. Assume that FIG. 5A is a reference character while FIG. 5C is a handwritten sample character. The corresponding upper right sub-areas each contain a part of the horizontal line of the character. Because the character in FIG. 5C is handwritten, its right and left halves are not completely symmetrical about the vertical line. FIGS. 5B and 5D show the corresponding portions in enlarged views. Although the heights of these corresponding portions are the same, the widths differ by 4 arbitrary units. Consequently, the corresponding portions in FIGS. 5B and 5D contain seven and nine dark pixels, respectively. When patterns are counted, the same pixel may be counted twice depending upon the position of the center pixel of the predetermined pattern. For this reason, the upper right portion as shown in FIG. 5B contains fourteen type 1 patterns (half of which are patterns 301 and the other half patterns 302) and one type 2 pattern 303, as defined in FIG. 2. However, the corresponding portion as shown in FIG. 5D for the handwritten character of FIG. 5C contains eighteen type 1 patterns and one type 2 pattern. Because of this difference in the number of patterns in the corresponding sub-areas, recognition of a handwritten character is less accurate. In order to improve the recognition capability for handwritten characters, a proper adjustment needs to be made to the identification values.
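The size of the deviation in the above example can be worked out directly from the stated pattern counts. The short sketch below uses only the counts given for FIGS. 5B and 5D. The dictionary layout and the variable names are assumptions made for the illustration.

```python
# Pattern counts stated for the upper right sub-area.
reference = {"type 1": 14, "type 2": 1}   # FIG. 5B, reference character
sample = {"type 1": 18, "type 2": 1}      # FIG. 5D, handwritten character

# Absolute and relative deviation of the sample from the reference.
deviation = {k: sample[k] - reference[k] for k in reference}
relative = {k: deviation[k] / reference[k] for k in reference}

print(deviation)
print(relative)
```

The type 1 count deviates by 4 patterns, or about 29 percent of the reference count, even though both sub-areas contain the same stroke, while the type 2 count is unchanged.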