To make driving more convenient, voice interfaces have been developed that allow a user to operate an in-vehicle device, such as a car navigation system, by voice. When using such a voice interface to initiate an operation of the in-vehicle device, the user usually performs the voice operation by vocalizing a predefined command word.
However, when the user does not remember the command words, or when the environment or conditions of vocalization cause voice recognition to fail, the user may repeatedly vocalize a word other than the command words. It is therefore desirable to detect that the user is repeatedly vocalizing the same word and to notify the user of this. An example of a conventional technology for detecting a repetition of the same vocalization is described below.
One conventional technology uses a large word dictionary with a vocabulary of 1700 or more words. For each piece of voice information vocalized by a user, it determines the dictionary word most similar to that voice information, and it then compares the words determined for the individual vocalizations to detect a repetition of the same vocalization.
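The dictionary-based approach can be illustrated with a minimal sketch. It is not the method of the cited publications: text strings stand in for voice information, the standard-library `difflib.get_close_matches` stands in for acoustic similarity, and the names `nearest_word`, `is_repetition`, and the `cutoff` value are hypothetical.

```python
from difflib import get_close_matches


def nearest_word(utterance, dictionary, cutoff=0.6):
    """Return the dictionary word most similar to the utterance, or None.

    Assumption: string similarity stands in for acoustic similarity.
    """
    matches = get_close_matches(utterance, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


def is_repetition(utterances, dictionary):
    """Return True if the last two utterances map to the same dictionary word."""
    if len(utterances) < 2:
        return False
    previous_word = nearest_word(utterances[-2], dictionary)
    latest_word = nearest_word(utterances[-1], dictionary)
    # An utterance with no similar dictionary word cannot be compared.
    return latest_word is not None and latest_word == previous_word
```

With a toy dictionary such as `["navigate", "radio", "telephone", "temperature"]`, the utterances `"navigat"` and `"navigate"` both resolve to `"navigate"` and are reported as a repetition, whereas an utterance with no similar dictionary entry resolves to no word and cannot be compared.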
However, this conventional technology assumes the use of a large word dictionary, and installing such a dictionary is not appropriate for an apparatus such as an in-vehicle device, which uses only a small set of command words. Moreover, if a system based on this technology is built with a small word dictionary, it becomes difficult to determine a word similar to the voice information vocalized by the user. For this reason, attempts have been made to detect a repetition of the same vocalization without using a large word dictionary.
Another conventional technology repeatedly detects, from voice information vocalized by a user, the feature parameters of voice information that does not match the word dictionary, and registers those feature parameters. The feature parameters of newly detected voice information that does not match the word dictionary are then compared with the already registered feature parameters by dynamic programming (DP) matching, thereby detecting a repetition of the same vocalization. Mel-frequency cepstral coefficients (MFCCs), for example, are used as the feature parameters of the voice information.
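The registration and DP matching steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the cited publications: each utterance is assumed to arrive as a list of per-frame feature vectors (for example MFCCs computed elsewhere), the Euclidean frame distance, the normalization by sequence length, and the `threshold` value are illustrative choices, and the names `dp_matching_distance` and `RepetitionDetector` are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_matching_distance(seq_a, seq_b):
    """Length-normalized DP matching (dynamic time warping) distance."""
    if not seq_a or not seq_b:
        raise ValueError("feature sequences must be non-empty")
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best accumulated cost aligning seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


class RepetitionDetector:
    """Registers features of out-of-dictionary utterances and flags repeats."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed tuning parameter
        self.registered = []

    def observe(self, features):
        """Return True if features match any registered utterance, then register them."""
        repeated = any(
            dp_matching_distance(features, earlier) <= self.threshold
            for earlier in self.registered
        )
        self.registered.append(features)
        return repeated
```

Because DP matching aligns frames non-linearly in time, a slower or faster vocalization of the same word can still yield a small distance, so the comparison does not require the two utterances to have the same number of frames.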
As examples of the related art, Japanese Laid-open Patent Publication No. 62-173498 and Japanese Laid-open Patent Publication No. 2002-6883 are known.