1. Field of the Invention
The present invention relates to a voice recognition device, a voice recognition method, and a voice recognition program that recognize voice input from a user to obtain information for controlling an object based on the recognition result.
2. Description of the Related Art
In recent years, in systems in which a user operates an apparatus or the like, voice recognition devices have been used that recognize voice input from the user to obtain the information necessary for operating the apparatus. When such a voice recognition device recognizes the voice (speech) input from the user, it responds to the user based on the recognition result to prompt the user's next speech, thereby interacting with the user. It then obtains the information necessary for operating the apparatus from the result of recognizing this interaction. In doing so, the device uses, for example, a recognition dictionary in which word entries serving as recognition objects are registered in advance, and recognizes the voice by comparing the acoustic features of the input voice with the acoustic features of the word entries registered in the recognition dictionary.
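By way of illustration only, the dictionary-based comparison described above can be sketched as a nearest-entry search. The three-dimensional feature vectors, the entry "volume up", and the function names are hypothetical and chosen for readability; an actual device would use far richer acoustic features and matching models.

```python
import math

# Hypothetical recognition dictionary: word entry -> toy acoustic feature vector.
# The numbers are invented for illustration and do not come from a real device.
RECOGNITION_DICTIONARY = {
    "map display": (0.9, 0.1, 0.3),
    "POI search": (0.2, 0.8, 0.5),
    "volume up": (0.4, 0.4, 0.9),
}


def feature_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(input_features, dictionary):
    """Return (word entry, distance) for the entry closest to the input."""
    if not dictionary:
        raise ValueError("recognition dictionary is empty")
    scored = ((word, feature_distance(input_features, features))
              for word, features in dictionary.items())
    return min(scored, key=lambda pair: pair[1])


# Toy features of an utterance that lies close to "map display".
word, dist = recognize((0.85, 0.15, 0.35), RECOGNITION_DICTIONARY)
print(word)  # map display
```

The smaller the distance to the best entry relative to the other entries, the more reliable the recognition result can be expected to be in this simplified model.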
The voice recognition device is mounted on a vehicle, for example, and the user operates a plurality of apparatuses mounted on the vehicle, such as an audio system, a navigation system, and an air conditioner. These apparatuses are advanced in function; for example, the navigation system is provided with a plurality of functions, including “map display” and “POI (Point of Interest) search”, each of which is operated by the user. When there are such a large number of control objects, however, the number of word entries for operating them inevitably increases. As the number of word entries to be recognized increases, there are more cases in which the acoustic features of word entries are similar to each other, which may increase the possibility of recognition error. Accordingly, a technology has been proposed that performs voice recognition processing while restricting the recognition dictionary to be used in accordance with the input voice, in order to improve the recognition accuracy (see, e.g., Japanese Patent Laid-Open No. 2001-034292 (hereinafter referred to as “Patent Document 1”)).
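The relationship between dictionary size and confusable entries can be sketched as follows. This is a toy model only: the word entries, the two-dimensional feature vectors, and the similarity threshold of 0.2 are all assumptions made for illustration.

```python
import math
from itertools import combinations


def confusable_pairs(dictionary, threshold):
    """List the pairs of word entries whose toy features lie closer than threshold."""
    return [
        (word_a, word_b)
        for (word_a, feat_a), (word_b, feat_b) in combinations(dictionary.items(), 2)
        if math.dist(feat_a, feat_b) < threshold
    ]


# Hypothetical small dictionary covering a single apparatus.
small_dictionary = {
    "play": (0.1, 0.9),
    "stop": (0.9, 0.1),
}

# Hypothetical larger dictionary after entries for further functions are added.
large_dictionary = dict(small_dictionary)
large_dictionary.update({
    "display": (0.15, 0.85),  # acoustically close to "play" in this toy model
    "top": (0.85, 0.15),      # acoustically close to "stop" in this toy model
    "map": (0.5, 0.5),
})

print(len(confusable_pairs(small_dictionary, 0.2)))  # 0
print(len(confusable_pairs(large_dictionary, 0.2)))  # 2
```

In this sketch, adding three entries raises the number of confusable pairs from zero to two, which is the effect that restricting the recognition dictionary is intended to counter.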
In the voice recognition device (word sequence recognition device) of Patent Document 1, the input voice is compared with the recognition dictionary data in first-time voice recognition processing to recognize and extract a keyword (a word having a prescribed attribute). The device then confirms a topic based on the extracted keyword, reconstructs the recognition dictionary data so that its word entries are restricted according to the confirmed topic, and carries out second-time voice recognition processing based on the reconstructed recognition dictionary data to recognize another word. The device repeats this processing, in which the recognition dictionary data is reconstructed based on the topic confirmed from the recognized word and another word is then recognized, as many times as required, thereby recognizing the voice input from the user through multi-stage processing.
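The multi-stage processing described above can be sketched as follows. This is a simplified illustration and not the implementation of Patent Document 1: it assumes the input voice has already been split into per-word feature vectors, that every word entry carries a topic label, and that the topic is taken directly from the most recently recognized word. All entries, feature values, and function names are hypothetical.

```python
import math

# Hypothetical recognition dictionary data: word entry -> (topic, toy features).
FULL_DICTIONARY = {
    "navigation": ("navigation", (0.9, 0.1)),
    "audio": ("audio", (0.1, 0.9)),
    "map display": ("navigation", (0.7, 0.3)),
    "POI search": ("navigation", (0.8, 0.6)),
    "volume up": ("audio", (0.3, 0.7)),
    "next track": ("audio", (0.75, 0.35)),
}


def nearest(features, dictionary):
    """Return the word entry whose toy features are closest to the input."""
    return min(dictionary, key=lambda word: math.dist(features, dictionary[word][1]))


def restrict(dictionary, topic):
    """Reconstruct the dictionary so that only entries of the given topic remain."""
    return {word: entry for word, entry in dictionary.items() if entry[0] == topic}


def recognize_multistage(segments, dictionary):
    """Recognize each segment, restricting the dictionary after every stage."""
    active = dictionary  # the first stage uses the unrestricted dictionary
    recognized = []
    for features in segments:
        word = nearest(features, active)
        recognized.append(word)
        topic = dictionary[word][0]           # topic confirmed from the word
        active = restrict(dictionary, topic)  # used by the next stage
    return recognized


segments = [(0.88, 0.12), (0.74, 0.33)]
print(recognize_multistage(segments, FULL_DICTIONARY))
# ['navigation', 'map display']

# Without restriction, the second segment matches an entry of another topic.
print(nearest(segments[1], FULL_DICTIONARY))  # next track
```

In this toy data, the second segment lies slightly closer to "next track" than to "map display", so the result of the second stage depends entirely on the topic confirmed in the first stage.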
With the above-described voice recognition device, however, the recognition dictionary data is not restricted in the first-time voice recognition processing, so many word entries have similar acoustic features and the possibility of recognition error is high. If the keyword extracted from the input voice in the first-time voice recognition processing is the result of such erroneous recognition, the topic confirmed based on that keyword will deviate considerably from the actual topic. When the second-time voice recognition processing is then carried out based on recognition dictionary data reconstructed to contain only word entries far from the actual topic, the word entries matching the user's speech would not be recognized, which hinders correct recognition of the user's speech. Furthermore, the voice recognition device carries out the second-time and subsequent voice recognition processing in the same manner as the first-time voice recognition processing, so the features of the input voice are captured in the same way in each pass. Thus, even if a correct keyword is extracted in the first-time voice recognition processing and the recognition dictionary data is reconstructed appropriately, the possibility of recognition error within the reconstructed recognition dictionary data is not eliminated, and it remains likely that a correct recognition result will not be reached.