1. Field of the Invention
The present invention relates to a recognition apparatus that extracts a feature quantity from an input pattern, such as an image pattern (for example, a typed or hand-written pattern) or a speech signal pattern (for example, a word feature quantity pattern), and recognizes the pattern, and to a method for extracting the feature quantity.
2. Description of the Related Art
A feature quantity is extracted from an unknown image pattern (such as a typed or hand-written pattern) or an unknown speech signal pattern (such as a time-varying pattern of the frequency envelope of an unknown word), and the unknown input pattern is recognized on the basis of that feature quantity. A generally known method matches this feature quantity with the feature quantities of previously stored learning dictionary patterns and determines the dictionary pattern with the highest similarity to the input pattern to be the recognition result.
A prior art method for matching one unknown input pattern with a plurality of learning dictionary patterns is as follows. A set of feature quantities is extracted from the unknown input pattern, and, for each pattern to be recognized, a set of feature quantities representing the whole feature of that pattern is stored as a dictionary pattern. The set of feature quantities of the unknown input pattern is matched in turn with the set of feature quantities of every dictionary pattern. The dictionary patterns are then arranged in order from the highest to the lowest similarity to provide recognition candidates. This is the "usual pattern matching method".
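The usual pattern matching method described above can be illustrated with a minimal sketch. The Euclidean distance, the function names and the three-entry dictionary are assumptions made only for illustration; the text does not specify a particular similarity measure or feature dimension.

```python
import math


def euclidean_distance(a, b):
    """Distance between two equal-length feature vectors (smaller = more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_candidates(input_features, dictionary):
    """Match the input against every dictionary pattern and rank the results.

    dictionary maps a label to its stored set of feature quantities.
    Returns the labels ordered from most similar to least similar.
    """
    scored = [(euclidean_distance(input_features, feats), label)
              for label, feats in dictionary.items()]
    scored.sort()  # exhaustive matching, then sort by distance
    return [label for _, label in scored]


# Hypothetical dictionary with three-dimensional feature vectors.
dictionary = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.9, 0.2, 0.0],
}
candidates = rank_candidates([1.0, 0.1, 0.0], dictionary)
```

In this sketch every dictionary entry is visited once per input, so both the stored data and the matching time grow in proportion to the number of dictionary patterns.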
In the above prior art, when a large amount of image data is to be recognized, a large amount of data is required to enable feature quantities to be extracted from one pattern. Thus, the quantity of data required to express a pattern is very large. For example, image data for about 3000 character types are required to recognize Japanese type. Thus, a large memory capacity is required to store the feature quantities of all the Japanese type image data, resulting in high cost.
Moreover, because the feature quantities of an unknown input pattern are matched with the feature quantities of all the dictionary patterns of the 3000 types, the prior art requires an extremely long recognition time and has a poor response.
On the other hand, where a single unknown input pattern is matched with a learning dictionary pattern to recognize a word, it is important to compare appropriate feature quantities of the unknown input pattern and learning dictionary pattern, because the respective patterns comprise a plurality of feature quantities.
The first prior art for this matching is as follows. The time sequence pattern of feature quantities of an unknown input word is made to correspond to the time sequence pattern of feature quantities of a learning word, starting from the beginning of each word, and the similarities (distances) between corresponding feature quantities are calculated. The similarities are summed over the whole time sequence pattern to give the similarity between the words. This calculation is performed for all the learning words, and the learning word having the highest similarity (i.e. the smallest distance) is provided as the recognition result.
Generally, words have different lengths as the time taken to pronounce words is not constant. Therefore, as the above first prior art does not consider differences in word length, it cannot achieve a high recognition ratio.
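The first prior art and its weakness with respect to word length can be sketched as follows. The city-block frame distance, the handling of unequal lengths (frames beyond the shorter sequence are ignored) and the toy words "ab" and "aa" are assumptions made only for illustration.

```python
def frame_distance(a, b):
    """City-block distance between the feature vectors of two frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def linear_match_distance(input_seq, learned_seq):
    """Sum the frame distances, pairing frames one-to-one from the beginning.

    Assumption: frames beyond the end of the shorter sequence are ignored
    (zip stops at the shorter sequence).
    """
    return sum(frame_distance(a, b) for a, b in zip(input_seq, learned_seq))


def recognize_linear(input_seq, learned_words):
    """Return the learning word with the smallest summed distance."""
    return min(learned_words,
               key=lambda w: linear_match_distance(input_seq, learned_words[w]))


# Hypothetical learning words, one scalar feature per frame.
learned_words = {
    "ab": [[0.0], [5.0], [5.0], [5.0]],
    "aa": [[0.0], [0.0], [0.0], [0.0]],
}
normal_ab = [[0.0], [5.0], [5.0], [5.0]]   # "ab" at the learned speed
slow_ab = [[0.0]] * 4 + [[5.0]] * 12       # "ab" pronounced four times slower

result_normal = recognize_linear(normal_ab, learned_words)
result_slow = recognize_linear(slow_ab, learned_words)
```

With this toy data the normally paced input is matched to "ab", whereas the slowed-down input is matched to "aa", because only its first four frames take part in the comparison.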
A DP (dynamic programming) method is provided as the second prior art. This method expands or compresses, nonlinearly in the time direction, the time sequence pattern of feature quantities of the unknown input word and the time sequence pattern of feature quantities of each learning word. It then repeatedly calculates the distance between corresponding feature quantities and obtains the accumulated distance for each word. This calculation is applied to all the learning words, and the learning word having the smallest accumulated distance is determined to be the recognition result.
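A minimal sketch of DP matching, in the form commonly known as dynamic time warping, is given below. The step pattern (horizontal, vertical and diagonal moves with equal weight), the city-block frame distance and the toy words are assumptions made only for illustration and are not taken from the prior art itself.

```python
def frame_distance(a, b):
    """City-block distance between the feature vectors of two frames."""
    return sum(abs(x - y) for x, y in zip(a, b))


def dp_match_distance(input_seq, learned_seq):
    """Accumulated distance along the best nonlinear time alignment."""
    n, m = len(input_seq), len(learned_seq)
    inf = float("inf")
    # acc[i][j]: smallest accumulated distance aligning the first i input
    # frames with the first j learned frames.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = frame_distance(input_seq[i - 1], learned_seq[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # stretch the learned word
                                   acc[i][j - 1],      # stretch the input word
                                   acc[i - 1][j - 1])  # advance both
    return acc[n][m]


def recognize_dp(input_seq, learned_words):
    """Return the learning word with the smallest accumulated distance."""
    return min(learned_words,
               key=lambda w: dp_match_distance(input_seq, learned_words[w]))


# Hypothetical learning words, one scalar feature per frame.
learned_words = {
    "ab": [[0.0], [5.0], [5.0], [5.0]],
    "aa": [[0.0], [0.0], [0.0], [0.0]],
}
slow_ab = [[0.0]] * 4 + [[5.0]] * 12  # "ab" pronounced four times slower

result_slow = recognize_dp(slow_ab, learned_words)
```

In this sketch the slowed-down input is matched to "ab" with an accumulated distance of zero. The table `acc` has one cell per pair of input frame and learned frame, and it is filled once for every learning word.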
The above second prior art considers differences in word length and can achieve a high recognition ratio. However, it requires a large amount of calculation because it must expand and compress the time axis non-linearly for each word. In particular, when this calculation is repeated for all the learning words, the total amount of calculation is extremely large, making it difficult to perform real-time processing.
Both the first and the second prior arts described above require matching against all the learning words, and therefore share the basic problem that a large amount of calculation is required.
To solve this problem, a method has been proposed in which the unknown input word is first roughly classified to narrow down the word candidates, and fine recognition is then performed on the limited number of word candidates. However, the prior art does not provide a recognition method that narrows down the word candidates sufficiently quickly and accurately. As a result, a quick word recognition method with a high recognition ratio cannot easily be realized.
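The two-stage approach of rough classification followed by fine recognition can be sketched as follows. The rough classifier used here (difference of mean feature values), the fine matcher (DP matching on scalar frames), the shortlist size and the toy words are all hypothetical; the text does not state how the rough classification is performed.

```python
def coarse_score(seq_a, seq_b):
    """Hypothetical cheap comparison: difference of the mean feature values."""
    return abs(sum(seq_a) / len(seq_a) - sum(seq_b) / len(seq_b))


def dp_match_distance(a, b):
    """Fine comparison: accumulated distance along the best time alignment."""
    inf = float("inf")
    acc = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    acc[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[len(a)][len(b)]


def recognize_two_stage(input_seq, learned_words, shortlist_size=2):
    """Narrow the candidates with the cheap score, then match them finely."""
    shortlist = sorted(
        learned_words,
        key=lambda w: coarse_score(input_seq, learned_words[w]),
    )[:shortlist_size]
    best = min(shortlist,
               key=lambda w: dp_match_distance(input_seq, learned_words[w]))
    return best, shortlist


# Hypothetical learning words, one scalar feature per frame.
learned_words = {
    "low": [0.0, 0.0, 1.0, 0.0],
    "rise": [0.0, 2.0, 4.0, 6.0],
    "high": [6.0, 6.0, 5.0, 6.0],
    "fall": [6.0, 4.0, 2.0, 0.0],
}
input_word = [0.0, 0.0, 2.0, 2.0, 4.0, 4.0, 6.0, 6.0]  # slow "rise"

best, shortlist = recognize_two_stage(input_word, learned_words)
```

Here the expensive DP matching runs on two candidates instead of four. The mean value ignores the order of the frames, so "rise" and "fall" receive the same rough score and both reach the fine stage, which illustrates how a weak rough classifier limits the narrowing that can be achieved.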
Extraction of a feature quantity constitutes the basis of pattern matching. A feature quantity expressing the feature of the pattern must be extracted efficiently from a type image pattern or a word signal pattern (which is obtained by expressing a time-varying pattern of the frequency envelope in the form of an image) to enable recognition of the pattern.
In the first prior art for extracting a feature quantity, the density or direction of strokes in many directions is extracted.
However, this prior art has the problem that it cannot easily reflect the structure of a complex pattern (such as a complex Chinese character in Japanese) by the density of strokes. For example, it cannot express a pattern having a structure of many short strokes by the density of strokes.
The second prior art is a so-called structure segmentation method, in which the pattern is recognized by separating it into local structure segments. For example, the "hen" and the "tsukuri", which constitute parts of a composite Chinese character, are separated from each other for recognition.
However, the above second prior art needs to examine all the structure segments and it takes time to extract the feature quantity. It also needs to separate and recognize respective structure segments and thus it is not effective when noise is present. It cannot be applied to patterns which have single structure segments, namely, patterns such as the in Chinese characters in Japanese. Further, in some characters the structure segments have the same shape but different size, for example, in and . This makes it difficult to form a pattern with a standard feature quantity.
Therefore, the second prior art cannot provide a pattern recognition system with high capability.