The present invention generally relates to a method of extracting feature quantities of a character for character recognition. More particularly, the present invention relates to a method of extracting feature quantities of a character obtained by optical scanning, the method being unaffected by the font style or size of the character.
A known method of extracting feature quantities of a character for use in an optical character reader (hereafter simply referred to as OCR) traces the contour portion of an unknown character and extracts feature quantities of that contour portion. The feature quantities of the contour portion are obtained in the form of a closed loop of direction codes. When the obtained feature quantities are compared with the feature quantities of a reference (known) character stored in a dictionary, the loop must be expanded into a one-dimensional sequence, or chain, of direction codes. Expanding the loop requires a reference point at which the loop is cut. The reference point of the feature quantities of the reference character, on the other hand, is predetermined. In practice, it is very difficult to reliably determine the reference point of the loop of direction codes of the unknown character. When the loop is cut at an erroneous point, the difference (distance) between the feature quantities of the unknown character and those of the known character corresponding to the unknown character increases, which leads to an increase in recognition errors. Conventionally, the start point from which tracing of the contour portion of the unknown character begins is used as the reference point. The start point is obtained by raster-scanning a rectangular region containing the character image, starting from the top of the region, and finding the first change from a white pixel to a black pixel. However, the reference point thus obtained shifts with fluctuations in the character image and is therefore unstable.
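The instability of the conventional reference point may be illustrated by the following minimal Python sketch. It is not taken from any particular OCR implementation. The function names (`find_start_point`, `cut_loop`, `chain_distance`), the 4-direction code values, the sample images, and the position-wise mismatch count used as the distance are all assumptions made for illustration only. A background pixel is assumed to be white (0), so that the first black pixel (1) met in raster order stands in for the first white-to-black change.

```python
def find_start_point(image):
    """Raster-scan from the top row; return (row, col) of the first black pixel.

    Assumption: 0 = white, 1 = black, and the region is bordered by white,
    so the first black pixel marks the first white-to-black change.
    """
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            if pixel == 1:
                return (r, c)
    return None


def cut_loop(loop, index):
    """Expand a closed loop of direction codes into a chain starting at index."""
    return loop[index:] + loop[:index]


def chain_distance(chain_a, chain_b):
    """Hypothetical distance: number of positions where the codes differ."""
    return sum(1 for a, b in zip(chain_a, chain_b) if a != b)


# A clean character image and the same image with one stray pixel on top.
clean = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]
noisy = [[0, 0, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]

start_clean = find_start_point(clean)   # (1, 1)
start_noisy = find_start_point(noisy)   # (0, 2): the start point has moved

# Hypothetical closed loop of 4-direction codes for a reference character.
reference_chain = [0, 0, 3, 3, 2, 2, 1, 1]

# The same loop cut at the correct point and at a point shifted by one code.
same_cut = chain_distance(cut_loop(reference_chain, 0), reference_chain)
shifted_cut = chain_distance(cut_loop(reference_chain, 1), reference_chain)
```

In this sketch a single stray pixel moves the start point from (1, 1) to (0, 2), and cutting an otherwise identical loop one code away from the reference point raises the distance from 0 to 4, although the contour itself is unchanged.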
Additionally, the conventional method of extracting the feature quantities of the character contour portion cannot extract, by a simple process, topological feature quantities of a character, i.e., information on the shape of the character.
Moreover, another conventional method, which produces a histogram of direction codes to identify the unknown character, is very sensitive to variations in the shape of characters. Therefore, a number of dictionaries storing reference histograms must be prepared to cover the various shapes of each character. This requires a large memory capacity and increases the time required for character recognition.
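The sensitivity of a direction-code histogram to shape variation may be sketched as follows. This is an illustrative Python example only. The function names (`direction_histogram`, `histogram_distance`), the use of four direction codes, the two sample chains, and the sum of absolute differences used as the distance are assumptions and are not drawn from any specific conventional system.

```python
from collections import Counter


def direction_histogram(codes, num_directions=4):
    """Count how often each direction code (0 .. num_directions - 1) occurs."""
    counts = Counter(codes)
    return [counts.get(d, 0) for d in range(num_directions)]


def histogram_distance(hist_a, hist_b):
    """Hypothetical distance: sum of absolute differences between bins."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Hypothetical direction codes for one character in two shapes:
# a square form and a vertically elongated form of the same character.
square_form = [0, 0, 3, 3, 2, 2, 1, 1]
tall_form = [0, 0, 3, 3, 3, 3, 2, 2, 1, 1, 1, 1]

hist_square = direction_histogram(square_form)   # [2, 2, 2, 2]
hist_tall = direction_histogram(tall_form)       # [2, 4, 2, 4]

distance = histogram_distance(hist_square, hist_tall)   # 4
```

Under these assumptions, the two forms of the same character yield histograms that differ by 4, so a single reference histogram would not match both forms equally well, which is why a separate reference histogram would be needed for each shape.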