The voice recognition technique used in a conventional voice recognition device is specialized according to the required recognition rate, computation amount and hardware resources. Here, voice recognition refers to the identification, recognition and understanding of speech. For example, an in-vehicle voice recognition device adopts voice recognition specialized for vehicles (local recognition), which has the advantages of high noise resistance and responsiveness. Further, for example, a voice recognition device of a server, which recognizes voice data received from outside via a network, adopts voice recognition specialized for servers (server recognition). This voice recognition device has the advantages that it can use a dictionary containing a large or newly added vocabulary and can perform recognition that requires a high computation amount.
In this regard, as usages have diversified in recent years, configurations that combine local recognition and server recognition to obtain the advantages of both have been studied. However, in a configuration that uses a plurality of voice recognition units, the recognition methods of the respective voice recognition engines differ, and the dictionaries used for recognition (recognition dictionaries) also differ between the engines. Therefore, there is a problem that the respective recognition results cannot simply be compared.
More specifically, each of the plurality of voice recognition units determines, as a voice recognition result, a candidate character string corresponding to the input voice (a character string, such as a vocabulary entry, that is highly likely to match the input voice). Each voice recognition unit also calculates a score value indicating the accuracy of each candidate character string (the probability that the candidate character string matches the input voice). However, when the score values for candidate character strings differ between the voice recognition units, there is a problem that the score values cannot simply be compared across the units.
Various techniques have therefore been proposed to address this problem. For example, Patent Document 1 proposes a technique of statistically processing the score values, which differ between the plurality of voice recognition units, normalizing them into score values that can be compared across the units, and outputting the candidate character string with the highest score value as the overall recognition result.
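The normalization idea can be illustrated with a minimal sketch. The source does not say which statistical processing Patent Document 1 uses, so this example assumes z-score normalization against per-engine statistics estimated from past scores. The function names, engine labels and all score values are hypothetical.

```python
from statistics import mean, pstdev

def fit_stats(history):
    """Estimate (mean, standard deviation) of one engine's past raw scores."""
    return mean(history), pstdev(history)

def normalize(score, stats):
    """Convert a raw score to a z-score; assumed stand-in for the statistical processing."""
    mu, sigma = stats
    return (score - mu) / sigma if sigma > 0 else 0.0

def select_best(candidates_by_engine, stats_by_engine):
    """Return (text, normalized score, engine) with the highest normalized score."""
    best = None
    for engine, candidates in candidates_by_engine.items():
        for text, raw in candidates:
            z = normalize(raw, stats_by_engine[engine])
            if best is None or z > best[1]:
                best = (text, z, engine)
    return best

# Hypothetical score histories: the two engines use unrelated scales.
stats_by_engine = {
    "local": fit_stats([0.2, 0.4, 0.6, 0.8]),
    "server": fit_stats([2000.0, 4000.0, 6000.0, 8000.0]),
}

# Hypothetical candidates for one utterance.
candidates_by_engine = {
    "local": [("call home", 0.9), ("call Tom", 0.7)],
    "server": [("call Tom", 7000.0), ("coal tone", 5500.0)],
}

best = select_best(candidates_by_engine, stats_by_engine)
print(best)
```

Comparing the raw values directly would always favor the server engine, simply because its scale is larger. After normalization, each score expresses how far a candidate lies above its own engine's typical score, so the local candidate can be selected when it is the stronger one.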
Further, for example, Patent Document 2 proposes a technique of causing a first voice recognition unit to recognize an input voice by using a plurality of recognition dictionaries, storing the candidate character strings with the higher score values among the resulting voice recognition results in a secondary decision dictionary, and causing a second voice recognition unit to recognize the input voice by using the secondary decision dictionary.
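The two-stage flow can be sketched as follows. This is a toy illustration, not the method of Patent Document 2: the input voice is represented by a noisy text string, and the two recognition units are replaced by assumed string-similarity functions (character-set overlap for the first unit, `difflib.SequenceMatcher` for the second). All names and dictionary contents are hypothetical.

```python
from difflib import SequenceMatcher

def coarse_score(utterance, entry):
    """Assumed first-stage score: Jaccard overlap of the character sets."""
    a, b = set(utterance), set(entry)
    return len(a & b) / len(a | b) if a | b else 0.0

def fine_score(utterance, entry):
    """Assumed second-stage score: sequence similarity ratio."""
    return SequenceMatcher(None, utterance, entry).ratio()

def recognize(utterance, dictionary, score_fn, top_n):
    """Return the top_n (entry, score) pairs of a dictionary, best first."""
    scored = [(entry, score_fn(utterance, entry)) for entry in dictionary]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

def two_stage(utterance, dictionaries, top_n=2):
    """First stage fills a secondary decision dictionary; second stage decides from it."""
    secondary = []
    for dictionary in dictionaries:
        for entry, _ in recognize(utterance, dictionary, coarse_score, top_n):
            if entry not in secondary:
                secondary.append(entry)
    final = recognize(utterance, secondary, fine_score, 1)[0]
    return final, secondary

# Hypothetical recognition dictionaries and a noisy stand-in for the input voice.
places = ["tokyo station", "kyoto station", "osaka castle"]
commands = ["stop navigation", "go home"]

final, secondary = two_stage("tokyo staton", [places, commands])
print(secondary)
print(final)
```

The second stage only searches the short secondary list, so a more expensive scoring method can be applied to the few candidates that survived the first stage.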