1. Field of the Invention
The present invention relates to a speech recognition apparatus and a speech recognition method for executing speech recognition on input speech, and also relates to a computer-readable recording medium having recorded thereon a program for realizing the apparatus and method.
2. Description of Related Art
Conventionally, dictionaries for speech recognition called language models are used in speech recognition. Currently, the recognition accuracy of speech recognition tends to fall as the number of parameters in the learning texts used to create a dictionary rises. Accordingly, instead of using general-purpose dictionaries that allow recognition with any language, the area in which a dictionary is to be applied is limited to a certain extent, and language models are created and used with a focus on a certain field.
In order to improve recognition accuracy under these circumstances, JP 2001-100783A, JP 2002-091484A, and JP 2010-170137A disclose techniques for performing speech recognition using multiple language models.
For example, according to the technique disclosed in JP 2001-100783A, firstly, multiple language models are created by adding example phrases regarding specific topics to the respective language models. Next, speech recognition is performed using those language models, and then the most likely recognition result is selected from among the recognition results.
Also, according to the technique disclosed in JP 2002-091484A, firstly, tree-structure clustering is performed on learning text data, and thus the learning text data is divided into multiple clusters such that each cluster has linguistically analogous properties. Next, a language model is created for each cluster, speech recognition is performed using each language model, and then the word string (recognition result) that has the highest likelihood is output.
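The final selection step of such a cluster-based approach can be illustrated with a minimal sketch. It is not the implementation of JP 2002-091484A: the clusters are assumed to be given rather than produced by tree-structure clustering, each language model is reduced to an add-one smoothed unigram model, and the per-model hypotheses stand in for the output of a real recognizer. All names (`train_unigram`, `log_likelihood`, `select_best`) and the sample texts are hypothetical.

```python
import math
from collections import Counter


def train_unigram(texts):
    """Count word occurrences in one cluster's learning texts."""
    return Counter(word for text in texts for word in text.split())


def log_likelihood(words, counts, vocab_size):
    """Add-one smoothed unigram log-likelihood of a word string."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size)) for w in words
    )


def select_best(hypotheses, models, vocab_size):
    """Score each hypothesis under the model that produced it and
    return the cluster name, word string, and all scores."""
    scored = {
        name: log_likelihood(text.split(), models[name], vocab_size)
        for name, text in hypotheses.items()
    }
    best = max(scored, key=scored.get)
    return best, hypotheses[best], scored


# Assumed clusters of learning text (toy data, already divided by field).
clusters = {
    "medical": ["the patient has a fever", "the doctor examined the patient"],
    "finance": ["the stock price fell", "the bank raised the interest rate"],
}
models = {name: train_unigram(texts) for name, texts in clusters.items()}
vocab = {w for texts in clusters.values() for t in texts for w in t.split()}

# Assumed word strings that a recognizer produced with each language model.
hypotheses = {
    "medical": "the patient has a fever",
    "finance": "the patient has a fee",
}

best, text, scored = select_best(hypotheses, models, len(vocab))
print(best, "->", text)
```

Note that every language model must be run on the same input before the comparison can be made, which is why the field of the speech has to be covered by one of the prepared clusters.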
Furthermore, according to the technique disclosed in JP 2010-170137A, firstly, speech recognition is performed using multiple language models that are different from each other, and a confidence is calculated in units of utterances. Next, the recognition result having the highest confidence is selected, and the selected recognition result is output.
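Selection by confidence in units of utterances can be sketched as follows. This is only an illustration under stated assumptions, not the method of JP 2010-170137A: the confidence values are made-up numbers rather than the output of an actual recognizer, and the `Result` type and `select_per_utterance` function are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class Result:
    model: str         # language model that produced the result
    text: str          # recognized word string
    confidence: float  # confidence of this result for the utterance


def select_per_utterance(results_by_utterance):
    """For each utterance, keep the result with the highest confidence."""
    return [
        max(results, key=lambda r: r.confidence)
        for results in results_by_utterance
    ]


# Two utterances, each recognized with two different language models
# (assumed values for illustration only).
results_by_utterance = [
    [Result("medical", "the patient has a fever", 0.91),
     Result("finance", "the patient has a fee", 0.40)],
    [Result("medical", "the stalk price fell", 0.35),
     Result("finance", "the stock price fell", 0.88)],
]

selected = select_per_utterance(results_by_utterance)
for result in selected:
    print(result.model, "->", result.text)
```

Because every utterance is recognized once per language model, the amount of recognition work in this sketch grows in proportion to the number of language models.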
However, the technique disclosed in JP 2001-100783A and the technique disclosed in JP 2002-091484A have the problem that they are only useful with speech whose field is known in advance. For this reason, when the field of input speech is unknown, it is necessary for someone to recognize the field in advance by listening to the speech and then prepare a language model or learning text data in the corresponding field. Also, in order to implement speech recognition using batch processing when there is a large number of speech files in different fields, it is necessary to classify the speech files by field in advance, and then prepare corresponding language models.
On the other hand, with the technique disclosed in JP 2010-170137A, it is conceivable that speech whose field is not known in advance could be handled if as many language models as possible are prepared. However, as the number of language models rises, so does the number of speech recognition engines that operate at the same time. This can result in an excessive rise in the processing load borne by the system during speech recognition.
Although it is conceivable to solve the above-described problems by automatically identifying the field of input speech and selecting a language model in the appropriate field, the reality is that a technique for automatically identifying the field of input speech does not exist.