The present invention relates to a speech recognition apparatus for recognizing spoken words or speech by dividing the spoken words into phonemes and recognizing the respective detected phonemes.
A keyboard has conventionally been used to input instructions and other data into a computer. However, a keyboard is difficult to handle and requires special training on the part of the operator. Accordingly, a data input method has recently been proposed in which instructions are given verbally, the spoken words or speech are recognized, and the instruction data corresponding to the recognized speech are entered into the computer. When instruction data are entered by means of a speech recognition apparatus, the workload on the operator is greatly reduced.
In a speech recognition apparatus of this type, the speech signal is first converted into phonemes, which are then matched against reference phonemes prepared in units of words or clauses. In this case, phoneme division is performed by separating a vowel-consonant sequence or a vowel-vowel sequence into its individual phonemes. For this reason, spoken words have conventionally been divided into phonemes by exploiting the facts that the acoustic power of a vowel is larger than that of a consonant and that different vowels have different acoustic powers. Another method of dividing speech into phonemes relies on differences between the acoustic patterns of vowels and consonants, or between different vowels, such as differences in the power spectrum, the zero-crossing interval, the linear predictive coefficients, and the vocal tract area function.
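As an illustration only, and not as part of the disclosed apparatus, the two simplest parameters named above (acoustic power and zero crossings) might be computed for each short analysis frame roughly as follows. The function name `frame_features` and the `frame_len`/`hop` framing parameters are hypothetical; linear predictive coefficients and the vocal tract area function are omitted from this sketch.

```python
def frame_features(samples, frame_len, hop):
    """Return a list of (power, zero_crossings) tuples, one per frame.

    Hypothetical helper: samples is a sequence of floats, and frame_len
    and hop are the frame length and frame step, both in samples.
    """
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Short-time acoustic power: mean squared amplitude of the frame.
        power = sum(x * x for x in frame) / frame_len
        # Zero-crossing count: sign changes between adjacent samples.
        zero_crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        features.append((power, zero_crossings))
    return features
```

A vowel-like frame would typically show higher power and fewer zero crossings than a fricative consonant, which is the kind of contrast the conventional methods rely on.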
However, neither of these methods has been able to provide precise phoneme division because, in practice, they detect only the difference between the speech parameters at two consecutive time points and divide the speech into phonemes solely on the basis of that difference.
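The limitation can be seen in a minimal sketch of the conventional scheme, assuming that a boundary is marked wherever the log power of one frame differs from that of the immediately preceding frame by more than a fixed threshold. The function name `naive_boundaries` and the `threshold_db` parameter are hypothetical and are not taken from the specification.

```python
import math


def naive_boundaries(powers, threshold_db):
    """Return frame indices at which a phoneme boundary is declared.

    Hypothetical sketch of the conventional approach: only the pair of
    adjacent frames (i - 1, i) is ever compared.
    """
    eps = 1e-12  # avoids log10(0) for silent frames
    levels = [10.0 * math.log10(p + eps) for p in powers]
    boundaries = []
    for i in range(1, len(levels)):
        # Declare a boundary only if this single step exceeds the threshold.
        if abs(levels[i] - levels[i - 1]) > threshold_db:
            boundaries.append(i)
    return boundaries
```

An abrupt 20 dB drop, for example `naive_boundaries([1.0, 1.0, 0.01, 0.01], 10.0)`, is reported as a boundary at frame 2. The same 20 dB drop spread over four gradual steps, `naive_boundaries([1.0, 0.3, 0.1, 0.03, 0.01], 10.0)`, yields no boundary at all, because no single pair of consecutive frames differs by more than the threshold.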