1. Field of the Invention
The present invention relates to a method and apparatus for reading and recognizing characters, printed or handwritten, on a sheet of paper.
2. Description of the Related Art
Japanese Unexamined Patent Publication Nos. 64-78395 and 5-108882 disclose apparatuses for reading and recognizing characters printed or handwritten on a sheet of paper. To execute character recognition, image data of a character read by a CCD (Charge Coupled Device; a solid-state image sensing element) in a reading section is converted to binary image data by a binarization section, and a character area is extracted from this binary image. The extracted character area is then segmented into mesh areas in a matrix form (e.g., 8×8).
For each mesh area, the density, i.e., the ratio of the area of black pixels to the area of the mesh area, is acquired. The density distribution of the mesh areas in a character area represents the characteristics of the character pattern. Character recognition is performed by comparing, on the basis of these characteristics, the density distribution for a character area with the density distributions of character patterns in a previously prepared dictionary.
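By way of illustration only, the mesh density acquisition and dictionary comparison described above may be sketched as follows. The function names `mesh_densities` and `nearest_pattern`, the representation of the binary image as nested lists of 0 and 1, and the use of a squared-difference distance are assumptions made for this sketch; the cited publications do not specify the comparison measure.

```python
def mesh_densities(image, rows=8, cols=8):
    """Return a rows x cols grid of black-pixel densities.

    image is a list of equal-length lists holding 0 (white) or 1 (black).
    """
    h, w = len(image), len(image[0])
    densities = []
    for r in range(rows):
        y0, y1 = r * h // rows, (r + 1) * h // rows
        row = []
        for c in range(cols):
            x0, x1 = c * w // cols, (c + 1) * w // cols
            area = (y1 - y0) * (x1 - x0)
            black = sum(image[y][x] for y in range(y0, y1) for x in range(x0, x1))
            # Density = black pixel area / mesh area (0.0 for an empty mesh).
            row.append(black / area if area else 0.0)
        densities.append(row)
    return densities


def nearest_pattern(densities, dictionary):
    """Return the dictionary label whose density grid is closest.

    dictionary maps a label to a density grid of the same shape.
    The squared-difference distance is an assumption of this sketch.
    """
    def distance(a, b):
        return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))

    return min(dictionary, key=lambda label: distance(densities, dictionary[label]))
```

In this sketch the dictionary simply holds one density grid per character pattern, so its size grows with the number of patterns that must be distinguished.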
Handwritten characters and printed characters vary in size and shape even if they are the same character. To facilitate comparison of a handwritten or printed character with the character patterns in the dictionary, therefore, size and outline shaping processing, i.e., normalization, is performed.
Normalization has been accomplished mainly in two ways. One is to normalize a rectangle circumscribing a character and the other is to normalize a square circumscribing a character. As shown in FIG. 9, the former method normalizes a character by forming a circumscribing rectangle F with respect to the character pattern L of a binary image and converting the circumscribing rectangle F to a specified area S of a predetermined square size. According to this method, even if the input characters differ in size and shape, their character patterns L, after normalization, become substantially the same in size and shape as shown at the lower portions in FIGS. 9(a) and 9(b), thus ensuring a constant aspect ratio (the ratio of the vertical size to the horizontal size). This scheme is advantageous in that the number of character patterns in the dictionary can be reduced.
In the case of a vertically elongated character such as "1" and a horizontally elongated character such as "-" as shown in FIGS. 9(c) and 9(d), however, the normalization based on a circumscribing rectangle fills most of the specified area S with black pixels as shown in the lower portions in those drawings. This makes character recognition difficult.
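By way of illustration only, normalization based on a circumscribing rectangle may be sketched as follows. The names `bounding_box` and `normalize_rect` and the nearest-neighbor resampling are assumptions made for this sketch, and the input is assumed to contain at least one black pixel.

```python
def bounding_box(image):
    """Return (top, bottom, left, right) of the black pixels; bottom/right exclusive."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    return min(ys), max(ys) + 1, min(xs), max(xs) + 1


def normalize_rect(image, size=8):
    """Stretch the circumscribing rectangle onto a size x size square.

    Height and width are scaled independently (nearest-neighbor sampling),
    so the aspect ratio of the input character is not preserved.
    """
    top, bottom, left, right = bounding_box(image)
    h, w = bottom - top, right - left
    return [
        [image[top + y * h // size][left + x * w // size] for x in range(size)]
        for y in range(size)
    ]
```

Applied to a one-pixel-wide vertical stroke such as "1", this sketch stretches the stroke across the full width of the square, so every pixel of the output is black, which corresponds to the difficulty described above.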
By contrast, as shown in FIG. 10, the other method, which normalizes a character based on a circumscribing square, forms a circumscribing square "A" with respect to the character pattern L of a binary image and converts the circumscribing square "A" to a specified area S. Space, or white pixel area, is added to the sides of the character or above and below the character, depending on whether the character is elongated vertically or horizontally. This method therefore overcomes the aforementioned shortcoming of the normalization that is based on a circumscribing rectangle.
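By way of illustration only, normalization based on a circumscribing square may be sketched as follows. The names `bounding_box` and `normalize_square`, the centering of the character within the square, and the nearest-neighbor resampling are assumptions made for this sketch; the input is assumed to contain at least one black pixel.

```python
def bounding_box(image):
    """Return (top, bottom, left, right) of the black pixels; bottom/right exclusive."""
    ys = [y for y, row in enumerate(image) if any(row)]
    xs = [x for x in range(len(image[0])) if any(row[x] for row in image)]
    return min(ys), max(ys) + 1, min(xs), max(xs) + 1


def normalize_square(image, size=8):
    """Scale the circumscribing square onto a size x size area.

    The shorter side of the bounding rectangle is padded with white pixels
    (centered here by assumption), so the aspect ratio is preserved.
    """
    top, bottom, left, right = bounding_box(image)
    h, w = bottom - top, right - left
    side = max(h, w)
    off_y, off_x = (side - h) // 2, (side - w) // 2
    out = []
    for y in range(size):
        sy = y * side // size - off_y
        row = []
        for x in range(size):
            sx = x * side // size - off_x
            inside = 0 <= sy < h and 0 <= sx < w
            # Positions outside the bounding rectangle are the added white space.
            row.append(image[top + sy][left + sx] if inside else 0)
        out.append(row)
    return out
```

In this sketch a one-pixel-wide vertical stroke remains a narrow stroke surrounded by white space after normalization, rather than filling the specified area.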
When a character like "7" or "9" is written significantly long vertically, for example by hand, however, adding space to the lateral sides of the character at the time of normalization may yield a deformed character that is difficult to recognize. The same applies to characters that are significantly horizontally elongated. Since, in the circumscribing-square based normalization scheme, the shape of a read character is directly reflected in the character pattern L after normalization, as shown in the lower portions of FIGS. 10(a) and 10(b), normalized character patterns L come in various sizes and shapes that cannot be classified. This necessitates the preparation of many character patterns in the dictionary.
A solution to this problem was proposed in, for example, Japanese Unexamined Patent Publication No. 5-108882, which discloses an apparatus for extracting a character area in the form of a circumscribing rectangle with respect to the character pattern of a binary image and changing the number of segments of the character area in accordance with the aspect ratio of the character area. With regard to vertically elongated characters, this method increases the number of segments in the vertical direction and reduces it in the horizontal direction. For horizontally elongated characters, the number of segments is reduced vertically and is increased horizontally.
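By way of illustration only, the selection of segment counts according to aspect ratio may be sketched as follows. The name `segment_counts`, the base count of 8, the step of 2, and the ratio threshold of 1.5 are hypothetical values chosen for this sketch; the cited publication is described here only as increasing or reducing the number of segments per direction.

```python
def segment_counts(height, width, base=8, step=2, ratio_threshold=1.5):
    """Return (vertical_segments, horizontal_segments) for a character area.

    base, step and ratio_threshold are hypothetical parameters of this sketch.
    """
    if height >= ratio_threshold * width:
        # Vertically elongated: more segments vertically, fewer horizontally.
        return base + step, base - step
    if width >= ratio_threshold * height:
        # Horizontally elongated: fewer segments vertically, more horizontally.
        return base - step, base + step
    return base, base
```

Each distinct pair of segment counts produces a density grid of a different shape, so in this sketch a separate set of dictionary patterns would be needed for each pair.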
With this method, even a character written significantly long in either direction is normalized without being deformed, and it is easily recognized. This character recognition apparatus, however, needs a dedicated dictionary for each way of segmentation, and thus suffers from an increased number of dictionaries and an increased number of character patterns to be stored in the dictionaries.