Typically, a speech recognition apparatus recognizes human utterance speech by converting the pronunciation of a recognized word stored in a recognized-word storing unit into a phoneme string and by generating a word model, as a standard pattern string, from the converted phoneme string. Specifically, the speech recognition apparatus converts the pronunciation of a recognized word into a phoneme string with reference to either a conversion rule between a pronunciation and a phoneme or a conversion rule between a pronunciation and a phoneme string. The speech recognition apparatus then generates a word model as a standard pattern string from the converted phoneme string, and calculates a similarity at each time between an inputted utterance speech and the generated word model. From the generated word models, the speech recognition apparatus extracts a word model whose similarity at each time is equal to or higher than a threshold value, and outputs as a recognition result the recognized word that corresponds to the extracted word model (see, for example, Japanese Laid-open Patent Publication No. 62-116999, Japanese Laid-open Patent Publication No. 63-5395, Japanese Laid-open Patent Publication No. 01-302295 or Japanese Laid-open Patent Publication No. 08-248979).
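The extraction step described above can be illustrated with a minimal sketch. Everything in it is assumed for illustration only: the two-dimensional feature vectors in STANDARD_PATTERNS, the words ‘aki’ and ‘aka’, the distance-based similarity function, and the simplification that the utterance has already been aligned to one frame per phoneme. None of these are taken from the cited publications.

```python
import math

# Hypothetical standard patterns: one 2-D feature vector per phoneme.
STANDARD_PATTERNS = {"a": (1.0, 0.0), "k": (0.0, 1.0), "i": (0.5, 0.5)}


def word_model(phoneme_string):
    """Build a word model as the string of standard patterns of its phonemes."""
    return [STANDARD_PATTERNS[p] for p in phoneme_string]


def similarity(frame, pattern):
    """Toy similarity in (0, 1]: 1.0 for identical vectors, lower as they diverge."""
    return 1.0 / (1.0 + math.dist(frame, pattern))


def recognize(frames, vocabulary, threshold):
    """Return the words whose model meets the threshold at every time step."""
    results = []
    for word, phonemes in vocabulary.items():
        model = word_model(phonemes)
        if len(model) != len(frames):
            continue  # simplification: no time alignment is attempted
        if all(similarity(f, p) >= threshold for f, p in zip(frames, model)):
            results.append(word)
    return results


# Recognized word -> phoneme string (assumed to be already converted).
VOCABULARY = {"aki": "aki", "aka": "aka"}
# Assumed input: one feature frame per phoneme, close to /a/, /k/, /i/.
UTTERANCE = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.6)]

RESULT = recognize(UTTERANCE, VOCABULARY, threshold=0.8)
print(RESULT)
```

In this sketch only the model for ‘aki’ stays at or above the threshold at every time step; the model for ‘aka’ falls below it on the last frame and is therefore not extracted.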
However, a human being does not always vocalize all phonemes clearly. That is, human utterance speech usually includes ambiguous phonemes. In particular, when a human being speaks fast, the utterance speech often includes ambiguous phonemes. For this reason, even when a human being vocalizes a recognized word, the speech recognition apparatus may fail to recognize the vocalization.
By way of example, suppose that a pronunciation ‘toyotomi’ of a recognized word has been stored in the recognized-word storing unit of the speech recognition apparatus. In this case, the speech recognition apparatus converts the pronunciation ‘toyotomi’ of the recognized word into a phoneme string /toyotomi/ in accordance with a conversion rule. Here, according to the conversion rule, a Japanese character ‘to’ corresponds to /to/ (hereinafter, this rule is expressed as ‘to’< >/to/), and similarly, ‘yo’< >/yo/ and ‘mi’< >/mi/. The speech recognition apparatus generates a word model of “toyotomi” as a standard pattern string from the converted phoneme string /toyotomi/. Now suppose that, in a human vocalization of the recognized word ‘toyotomi’, the ‘yo’ is vocalized ambiguously. The speech recognition apparatus then determines that the ‘yo’ (phoneme /yo/) in ‘toyotomi’ is ‘o’ (phoneme /o/), from which the phoneme /y/ is omitted, and hence that the utterance speech is ‘tootomi’. When the similarity at each time between ‘o’ in the utterance speech ‘tootomi’ and “yo” in the word model “toyotomi” is equal to or lower than a given threshold value, the speech recognition apparatus cannot recognize the utterance speech ‘tootomi’.
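The failure described above can be reproduced with a small sketch. The names RULES, convert, unit_similarity and is_recognized, the threshold value 0.6, and the overlap-based similarity are all hypothetical; a real apparatus compares acoustic patterns, not symbol sets. The pronunciation is assumed to be already split into its Japanese characters.

```python
# Conversion rules from the example: 'to'< >/to/, 'yo'< >/yo/, 'mi'< >/mi/.
RULES = {"to": "to", "yo": "yo", "mi": "mi"}


def convert(units, rules):
    """Map each pronunciation unit to its phoneme string."""
    return [rules[u] for u in units]


def unit_similarity(heard, expected):
    """Toy similarity: overlap of phoneme symbols (Jaccard index), 0.0 to 1.0."""
    a, b = set(heard), set(expected)
    return len(a & b) / len(a | b)


def is_recognized(heard_units, model_units, threshold):
    """True only if the similarity exceeds the threshold at every time step."""
    if len(heard_units) != len(model_units):
        return False  # simplification: no time alignment is attempted
    return all(
        unit_similarity(h, m) > threshold
        for h, m in zip(heard_units, model_units)
    )


# Word model "toyotomi", built from the phoneme string /toyotomi/.
MODEL = convert(["to", "yo", "to", "mi"], RULES)

CLEAR = ["to", "yo", "to", "mi"]      # every phoneme vocalized clearly
AMBIGUOUS = ["to", "o", "to", "mi"]   # /y/ omitted: heard as 'tootomi'

CLEAR_OK = is_recognized(CLEAR, MODEL, threshold=0.6)
AMBIGUOUS_OK = is_recognized(AMBIGUOUS, MODEL, threshold=0.6)
print(CLEAR_OK, AMBIGUOUS_OK)
```

With these assumed values, the toy similarity between ‘o’ and “yo” is 0.5, which is below the threshold of 0.6, so the ambiguous utterance is rejected even though the speaker intended ‘toyotomi’.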
To solve such problems, in a typical speech recognition apparatus, a phoneme string including a phoneme that tends to be vocalized ambiguously in the pronunciation of a recognized word is added in advance to the conversion rules. In the above example, a rule of ‘toyo’< >/too/ is added to the conversion rules including ‘to’< >/to/, ‘yo’< >/yo/ and ‘mi’< >/mi/. Thereby, the speech recognition apparatus converts the pronunciation ‘toyotomi’ of the recognized word into a phoneme string /toyotomi/ and a phoneme string /tootomi/. The speech recognition apparatus generates a word model of “toyotomi” as a standard pattern string from the converted phoneme string /toyotomi/ and a word model of “tootomi” as a standard pattern string from the converted phoneme string /tootomi/. In this manner, even when the speech recognition apparatus determines that the utterance speech is ‘tootomi’, since the similarity at each time between the utterance speech ‘tootomi’ and the word model “tootomi” is higher than the given threshold value, the utterance speech ‘tootomi’ can be recognized as ‘toyotomi’.
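As a rough illustration of how one added rule yields two phoneme strings for the same recognized word, the following sketch enumerates every way the conversion rules can cover the pronunciation. The function name phoneme_strings and the lookup table are hypothetical, and an exact string match stands in for the similarity comparison against each word model.

```python
# Conversion rules as (pronunciation unit, phoneme string) pairs.
# The last entry is the added rule 'toyo'< >/too/.
RULES = [("to", "to"), ("yo", "yo"), ("mi", "mi"), ("toyo", "too")]


def phoneme_strings(pronunciation, rules):
    """Return every phoneme string obtainable by covering the pronunciation
    from left to right with conversion rules."""
    if not pronunciation:
        return {""}
    results = set()
    for unit, phonemes in rules:
        if pronunciation.startswith(unit):
            remainder = pronunciation[len(unit):]
            for rest in phoneme_strings(remainder, rules):
                results.add(phonemes + rest)
    return results


RECOGNIZED_WORD = "toyotomi"
VARIANTS = phoneme_strings(RECOGNIZED_WORD, RULES)

# One word model per phoneme string, each pointing back to the recognized word.
MODEL_TO_WORD = {variant: RECOGNIZED_WORD for variant in VARIANTS}

# The ambiguous utterance now matches the word model "tootomi".
RECOGNIZED = MODEL_TO_WORD.get("tootomi")
print(sorted(VARIANTS), RECOGNIZED)
```

Both word models map back to the same recognized word, which is why the utterance ‘tootomi’ is reported as ‘toyotomi’. Note that each added rule can multiply the number of phoneme strings, and therefore of word models, generated per recognized word.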