Conventionally, it has been proposed to apply speech recognition technology to a user-friendly input front end for controlling a device. Speech recognition generally compares the uttered speech with the standard pattern of each word defined in a speech recognition dictionary and regards the most similar word pattern as the recognition result, as described in Non-patent Reference 1.
However, since the user of the device does not always remember all the words covered by speech recognition, he/she may utter a word that is not covered. Under the basic framework described above, the most similar word registered in the speech recognition dictionary is returned as the result even in such a case, so the utterance is inevitably misrecognized. To address this problem, methods have been devised for detecting a user's utterance of a word which is not included in the speech recognition dictionary (an unregistered word).
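The forced-choice behavior described above can be illustrated with a minimal sketch. It assumes that each word's standard pattern is a single fixed-length feature vector and that similarity is cosine similarity; the dictionary contents, the feature vectors, and the names `DICTIONARY`, `cosine_similarity`, and `recognize` are hypothetical and are not taken from Non-patent Reference 1.

```python
import math

# Hypothetical toy "standard patterns": one feature vector per registered word.
DICTIONARY = {
    "play": [1.0, 0.0, 0.0],
    "stop": [0.0, 1.0, 0.0],
    "record": [0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recognize(features, dictionary=DICTIONARY):
    """Return (word, similarity) for the most similar registered pattern.

    There is no rejection step, so some registered word is always returned.
    """
    return max(
        ((word, cosine_similarity(features, pattern))
         for word, pattern in dictionary.items()),
        key=lambda pair: pair[1],
    )


# An utterance close to a registered word is recognized as that word.
in_vocabulary = recognize([0.9, 0.1, 0.0])

# An utterance that resembles no registered word still yields a registered word.
out_of_vocabulary = recognize([0.4, 0.5, 0.45])
```

In this sketch the second input is not close to any pattern, yet `recognize` still returns a registered word, which corresponds to the misrecognition discussed above.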
For example, Patent Reference 1 describes a method in which the similarity between input speech and each word in the speech recognition dictionary is calculated, the similarity of each word is corrected based on the reference similarity calculated from a pattern which is a concatenation of unit standard patterns, and the user's utterance of the word is regarded as an unregistered word when the corrected similarity is less than a predetermined threshold value.
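The thresholding scheme of this kind can be sketched as follows. The sketch assumes that similarities are scores for which higher means more similar and that the correction is a simple subtraction of the reference similarity; the actual correction used in Patent Reference 1 is not stated here, and the function name `detect_unregistered` and all numeric values are hypothetical.

```python
def detect_unregistered(word_similarities, reference_similarity, threshold):
    """Return (word, corrected_similarity); word is None if judged unregistered.

    word_similarities: dict mapping each dictionary word to its similarity
        to the input speech (higher means more similar).
    reference_similarity: similarity of the input speech to a pattern built by
        concatenating unit standard patterns.
    threshold: minimum corrected similarity for accepting a dictionary word.
    """
    # Assumed correction: subtract the reference similarity from each word score.
    corrected = {
        word: score - reference_similarity
        for word, score in word_similarities.items()
    }
    best_word = max(corrected, key=corrected.get)
    best_score = corrected[best_word]
    if best_score < threshold:
        return None, best_score  # regarded as an unregistered word
    return best_word, best_score


# Hypothetical log-likelihood-like scores.
accepted = detect_unregistered({"play": -10.0, "stop": -25.0}, -12.0, -5.0)
rejected = detect_unregistered({"play": -30.0, "stop": -25.0}, -12.0, -5.0)
```

Here `accepted` names a dictionary word because its corrected similarity is at or above the threshold, while `rejected` carries `None` because even the best dictionary word falls below the threshold after correction.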
Patent Reference 2 describes a method for detecting an unregistered word with a small amount of processing and with high accuracy, using a phoneme Hidden Markov Model (HMM) and a Garbage HMM.
It can be easily conceived that, when the user's utterance of an unregistered word is detected, the device sounds a warning such as a beep to the user or returns a response such as “sore wa arimasen (it is not found)”, in which the uttered word is replaced with a pronoun (it).
However, merely returning such a response is not enough for the user, because the response does not clearly indicate whether the uttered word simply happened not to be recognized or is an unregistered word.
Therefore, the user has no other choice but to accept the situation or to repeat the utterance, paying more attention to the pronunciation, until he/she gives up. This problem decreases the convenience of controlling the device by voice input.
To address this problem, Patent Reference 3 describes a method for presenting, to the user, a list of words which can be accepted by the device depending on the situation, when the user's utterance of an unregistered word is detected. According to this method, even if the user has no idea about the words which can be recognized by the device, a list of words he/she can utter in the situation is presented every time he/she utters the unregistered word. Therefore, the user does not need to repeat the utterance of the same word over and over, and thus can make the device operate as he/she intends.
Patent Reference 4 describes a method in which speech recognition is performed using two speech recognition dictionaries: an internal dictionary corresponding to a conventional speech recognition dictionary, and an external dictionary containing many words which are regarded as unregistered in the conventional speech recognition dictionary. When the recognition result is a word contained in the external dictionary, the fact that the word is an unregistered one is presented together with the result. According to this method, for example, when a user utters “Matsushita-Taro” in a situation where the word “Matsushita-Taro” is contained in the external dictionary, it is possible to return a response such as “Matsushita-Taro wa orimasen (Matsushita-Taro is not present)”.
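The two-dictionary approach can be sketched as follows, assuming that a recognizer has already produced a similarity score for every word in both dictionaries. The dictionary contents (other than “Matsushita-Taro”), the scores, the function names `recognize_with_external` and `respond`, and the wording of the accepted response are hypothetical illustrations, not the implementation of Patent Reference 4.

```python
# Hypothetical dictionary contents.
INTERNAL_WORDS = {"Matsushita-Hanako", "Matsushita-Jiro"}
EXTERNAL_WORDS = {"Matsushita-Taro", "Matsushita-Saburo"}


def recognize_with_external(scores, internal=INTERNAL_WORDS, external=EXTERNAL_WORDS):
    """Pick the best-scoring word over both dictionaries.

    scores: dict mapping words to similarity scores (higher is more similar).
    Returns (word, is_unregistered), where is_unregistered is True when the
    best word comes from the external dictionary.
    """
    candidates = {w: s for w, s in scores.items() if w in internal or w in external}
    best_word = max(candidates, key=candidates.get)
    return best_word, best_word in external


def respond(word, is_unregistered):
    """Build a response that names the word even when it is unregistered."""
    if is_unregistered:
        return f"{word} wa orimasen ({word} is not present)"
    return f"{word}: accepted"


# Hypothetical scores for one utterance.
word, unregistered = recognize_with_external(
    {"Matsushita-Hanako": 0.41, "Matsushita-Taro": 0.93, "Matsushita-Jiro": 0.38}
)
message = respond(word, unregistered)
```

Because the external dictionary lets the recognizer identify which unregistered word was uttered, the response can name that word instead of replacing it with a pronoun.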
Patent Reference 1: Japanese Patent No. 2808906
Patent Reference 2: Japanese Patent No. 2886117
Patent Reference 3: Japanese Patent No. 3468572
Patent Reference 4: Japanese Laid-open Patent Application No. 09-230889
Non-patent Reference 1: Kiyohiro Shikano, Satoshi Nakamura, and Shiro Ise, “Digital Signal Processing Series 5: Speech/Acoustic Information Digital Signal Processing” Shoko-do, Nov. 10, 1997, pp. 45 and 53.