(1) Field of the Invention
The present invention relates to a speech recognition apparatus and a speech recognition method for recognizing speech by using language models.
(2) Description of the Related Art
Language models intended for use in speech recognition and the like are obtained as follows: a large collection of example sentences corresponding to the target task of speech recognition is prepared; the prepared example sentences are pre-processed, for example by deleting unnecessary symbols and the like; the pre-processed example sentences are subjected to morphological analysis; and word concatenation information is statistically modeled. In general, 2-gram and 3-gram models are used as language models.
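The steps above can be sketched for the 2-gram case as follows. This is a minimal illustration, not the procedure of any cited reference: whitespace splitting stands in for morphological analysis, probabilities are plain relative frequencies without smoothing, and the function names and example sentences are hypothetical.

```python
import re
from collections import Counter

def preprocess(sentence):
    """Delete symbols (anything other than word characters and whitespace) and lowercase."""
    return re.sub(r"[^\w\s]", "", sentence).lower()

def tokenize(sentence):
    """Whitespace split, used here as a stand-in for morphological analysis."""
    return ["<s>"] + sentence.split() + ["</s>"]

def train_bigram(sentences):
    """Count word concatenations (bigrams) and their left-hand histories."""
    history_counts = Counter()
    bigram_counts = Counter()
    for s in sentences:
        words = tokenize(preprocess(s))
        for prev, cur in zip(words, words[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts

def bigram_prob(history_counts, bigram_counts, prev, cur):
    """Relative-frequency estimate of P(cur | prev); 0.0 for an unseen history."""
    if history_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / history_counts[prev]

# Hypothetical example sentences for a small target task.
examples = ["Play the next song!", "Play the news.", "Stop the song."]
hist, bi = train_bigram(examples)
p_the_given_play = bigram_prob(hist, bi, "play", "the")
p_song_given_the = bigram_prob(hist, bi, "the", "song")
```

With so few sentences most word pairs are never observed, which illustrates why a great number of example sentences is normally required.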
Conventionally, the cost of generating such language models has been enormous, because a great number of example sentences must be collected for each task to which speech recognition is to be applied. Therefore, the following have been considered: reducing the number of example sentences that must be collected; and generating language models that are adapted to the topic of the utterance to be recognized (for example, refer to Patent Reference 1 and Patent Reference 2. Patent Reference 1: Japanese Patent Publication No. 2003-36093. Patent Reference 2: Japanese Patent Publication No. 10-198395).
Patent Reference 1 discloses a topic adaptation technique for speech recognition language models as a method for generating language models such as those described above.
FIG. 1 is a flow chart showing the operation of a speech input search system that employs the conventional topic adaptation technique disclosed in Patent Reference 1.
As shown in FIG. 1, in response to a search request uttered by a user, the speech input search system performs speech recognition using acoustic models 1012 and language models 1014 (Step S1016), and generates a transcription of the speech (Step S1018). Here, the language models 1014 are generated based on text databases 1020. Next, the speech input search system executes a text search using the transcribed search request (Step S1022), and outputs the search results in a predetermined order of relevance degree (Step S1024). Next, the speech input search system obtains information from the documents in the search results in descending order of relevance degree, performs modeling based on that information (Step S1026), and refines the language models 1014 for speech recognition. Additionally, the speech input search system displays the search results on a display unit such as the display screen of a personal computer (Step S1028).
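The search, ranking, modeling, and refinement flow described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of Patent Reference 1: the transcription is assumed to be already available, relevance is approximated by word overlap, unigram models stand in for the language models, refinement is a fixed-weight linear interpolation, and all names and documents are hypothetical.

```python
from collections import Counter

def unigram_model(texts):
    """Relative-frequency unigram model over whitespace-split words."""
    counts = Counter(w for t in texts for w in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def search(query, documents):
    """Rank documents by how many distinct query words they contain (assumed relevance)."""
    q = set(query.lower().split())
    scored = [(len(q & set(d.lower().split())), d) for d in documents]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return [d for _, d in scored]

def refine(base_model, ranked_docs, top_k=2, weight=0.5):
    """Interpolate the base model with a model built from the top-ranked documents."""
    if not ranked_docs:
        return dict(base_model)
    topic_model = unigram_model(ranked_docs[:top_k])
    vocab = set(base_model) | set(topic_model)
    return {
        w: (1 - weight) * base_model.get(w, 0.0) + weight * topic_model.get(w, 0.0)
        for w in vocab
    }

# Hypothetical text database and transcribed search request.
documents = [
    "jazz concert tickets on sale",
    "weather forecast for tomorrow",
    "jazz festival lineup announced",
    "stock market report",
]
base = unigram_model(documents)
transcript = "jazz concert tickets"
ranked = search(transcript, documents)
refined = refine(base, ranked)
```

In this sketch, words that occur in the top-ranked documents receive more probability mass in `refined` than in `base`, which is the intended effect of adapting to the topic of the request.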
In addition, Patent Reference 2 discloses an invention that generates language models for a specified target task by using information obtainable from existing language models (language models generated from text data of other tasks), instead of collecting a great number of text databases.
FIG. 2 is an illustration showing the processing operation performed by the language model generation unit of the speech recognition apparatus in Patent Reference 2.
This language model generation unit calculates language probabilities (probabilities of word appearance) by using a distribution of concatenation frequencies (a posteriori knowledge) and concatenation frequencies (a priori knowledge). The former is obtainable from existing language models (language models generated from the text data of other tasks), and the latter is obtainable from a collection of example sentences containing thousands of words related to the specified target task (the text data of the specified task) (Patent Reference 2, page 11, column 19, lines 3 to 5). In this way, the language model generation unit generates language models corresponding to the specified task. After that, the speech recognition apparatus of Patent Reference 2 performs speech recognition using the language models generated by this language model generation unit.
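One common way to combine information from an existing language model with concatenation frequencies counted in a small task-specific collection is sketched below. The formula is an assumption chosen for illustration and is not taken from Patent Reference 2: the task counts are augmented with `tau` pseudo-counts distributed according to the existing model, where `tau`, the function names, and all numbers are hypothetical.

```python
from collections import Counter

def adapted_prob(word, history, task_bigrams, task_histories, general_prob, tau=4.0):
    """Estimate P(word | history) from task counts plus pseudo-counts from a general model.

    P = (c_task(history, word) + tau * P_general(word | history)) / (c_task(history) + tau)
    A larger tau trusts the general model more; a smaller tau trusts the task data more.
    """
    c_hw = task_bigrams[(history, word)]
    c_h = task_histories[history]
    return (c_hw + tau * general_prob(history, word)) / (c_h + tau)

# Hypothetical general model, built from text data of other tasks.
GENERAL = {("set", "alarm"): 0.1, ("set", "timer"): 0.2, ("set", "table"): 0.7}

def general_prob(history, word):
    """Look up P_general(word | history); unseen pairs get 0.0 in this sketch."""
    return GENERAL.get((history, word), 0.0)

# Hypothetical concatenation frequencies from the task's example sentences.
task_bigrams = Counter({("set", "alarm"): 3, ("set", "timer"): 1})
task_histories = Counter({"set": 4})

p_alarm = adapted_prob("alarm", "set", task_bigrams, task_histories, general_prob)
p_timer = adapted_prob("timer", "set", task_bigrams, task_histories, general_prob)
p_table = adapted_prob("table", "set", task_bigrams, task_histories, general_prob)
```

Here "alarm" is frequent in the task data, so its adapted probability rises well above its general-model value, while "table", never observed in the task data, keeps a reduced but nonzero probability.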