This invention relates generally to a speech recognition apparatus and method, and more particularly to a speech recognition apparatus and method using phoneme recognition.
Apparatus for and methods of speech recognition, wherein spoken words are automatically recognized, are extremely useful for supplying computers and other devices with data and instructions. In the prior art, pattern matching is frequently used for word recognition. According to the pattern-matching method, standard patterns for all words to be recognized are prepared and prestored in a memory. The degree of similarity between an unknown input pattern and each of the standard patterns is computed to determine the stored standard pattern having the greatest similarity to the input pattern. In this pattern-matching method, it is necessary to prepare standard patterns for all words to be recognized. Hence, new standard patterns must be supplied to and stored in the apparatus whenever the apparatus is to recognize words spoken by a different person. If several hundred words are to be recognized, time-consuming and troublesome operations must be performed to register all of these words as spoken by each speaker. Furthermore, the memory used for storing such spoken words is required to have an extremely large capacity. Moreover, when this method is used for a large number of words, a long time period is required to match an input pattern against the standard patterns.
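The whole-word pattern-matching method described above can be illustrated by the following minimal sketch. It assumes that each pattern is a fixed-length feature vector and that cosine similarity is the similarity measure; neither assumption is stated in the prior art description, and the names cosine_similarity, recognize_word and standard_patterns, as well as all numeric values, are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def recognize_word(input_pattern, standard_patterns):
    """Return the stored word whose standard pattern is most similar to the input."""
    best_word, best_score = None, float("-inf")
    # One similarity calculation per stored word: cost grows with vocabulary size.
    for word, pattern in standard_patterns.items():
        score = cosine_similarity(input_pattern, pattern)
        if score > best_score:
            best_word, best_score = word, score
    return best_word, best_score


# Hypothetical standard patterns registered for one speaker.
standard_patterns = {
    "akai": [0.9, 0.1, 0.3, 0.7],
    "aoi": [0.2, 0.8, 0.6, 0.1],
}

word, score = recognize_word([0.8, 0.2, 0.3, 0.6], standard_patterns)
```

Because one standard pattern is stored and compared per word and per speaker, both the memory capacity and the matching time grow with the size of the vocabulary, as noted above.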
Another method of obtaining the similarity between an input and the words prestored in a word dictionary uses phonemes. Input sounds are recognized as a combination of phonemes. In phoneme matching, the capacity of the memory used as the word dictionary is small, the time required for the pattern-matching comparison is short, and the contents of the word dictionary can be readily changed. For instance, since the sound "AKAI" can be expressed in the simple form "a k a i" by combining three different phonemes /a/, /k/ and /i/, a large number of words spoken by unspecified speakers can be easily handled.
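A word dictionary of this kind can be sketched as follows. The sketch assumes that a preceding stage has already converted the input sound into a phoneme sequence, and it uses edit distance as the word-level comparison; the comparison measure, the names edit_distance, lookup_word and word_dictionary, and the second entry "AOI" are illustrative assumptions rather than part of the described prior art.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (assumed measure)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Each word is stored only as a short phoneme string, so the dictionary is
# small and an entry can be added or changed without recording any speech.
word_dictionary = {
    "AKAI": ["a", "k", "a", "i"],
    "AOI": ["a", "o", "i"],  # hypothetical second entry
}


def lookup_word(phonemes, dictionary):
    """Return the dictionary word whose phoneme string is closest to the input."""
    return min(dictionary, key=lambda w: edit_distance(phonemes, dictionary[w]))


exact = lookup_word(["a", "k", "a", "i"], word_dictionary)
noisy = lookup_word(["a", "g", "a", "i"], word_dictionary)  # /k/ misrecognized as /g/
```

In this sketch a single misrecognized phoneme still resolves to the intended word, because the comparison is made on short symbol strings rather than on stored acoustic patterns for each word.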
In speech recognition for unspecified speakers, the characteristics of sounds change drastically depending on the sex and age of the speaker. A problem with prior art phoneme devices is how to generalize these varying sound characteristics so as to recognize words spoken by unspecified persons.
In the case of recognition in phoneme units, phoneme standard patterns are subject to a large dispersion due to differences in sex and age; for instance, in the case of the vowel /a/, the shape of the spectrum pattern in a spectrum diagram differs greatly between male and female speakers.
In prior art devices, this problem is addressed by preparing plural standard patterns for each phoneme; each pattern corresponds to the phoneme for plural speakers. The similarity between an input sound and every standard pattern is calculated to determine which standard pattern is most similar to the input sound. However, this conventional technique suffers from the following drawbacks:
(1) The speech recognition apparatus must be expensive, since a large number of similarity calculations must be performed at high speed.
(2) The recognition rate is somewhat low, since a phoneme is determined by finding the greatest similarity among all the standard patterns; the number of similar phonemes is therefore large, causing increased confusion between phonemes.
(3) The recognition rate is very low if a speaker utters sounds which do not correspond to any of the prepared standard patterns.
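The conventional technique and drawback (1) can be illustrated by the following minimal sketch. It assumes that each standard pattern is a short feature vector and that the squared Euclidean distance serves as an inverse similarity measure; the names squared_distance, classify_phoneme and templates, and all numeric values, are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance; a smaller value means greater similarity."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify_phoneme(frame, templates):
    """Compare the input against every standard pattern of every phoneme.

    Returns the best-matching phoneme label and the number of similarity
    calculations that were needed to find it.
    """
    best_label, best_dist = None, float("inf")
    comparisons = 0
    for label, patterns in templates.items():
        for pattern in patterns:  # plural standard patterns per phoneme
            comparisons += 1
            dist = squared_distance(frame, pattern)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, comparisons


# Two hypothetical standard patterns per phoneme, e.g. one per speaker group.
templates = {
    "a": [[0.9, 0.2, 0.1], [0.7, 0.4, 0.2]],
    "i": [[0.1, 0.3, 0.9], [0.2, 0.5, 0.8]],
}

label, comparisons = classify_phoneme([0.8, 0.3, 0.1], templates)
```

The number of similarity calculations equals the number of phonemes multiplied by the number of standard patterns per phoneme, and the full set must be evaluated for every input sound, which is the calculation load referred to in drawback (1).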