1. Field of the Invention
The present invention relates to a speech recognition apparatus for recognizing speech based on a speech signal of an uttered speech with reference to a statistical language model, and in particular, to a speech recognition apparatus equipped with means for removing erroneous candidates of speech recognition.
2. Description of the Prior Art
In continuous speech recognition apparatuses, a statistical language model based on the statistical method known as the N-gram is widely used (see, for example, prior art document 1, L. R. Bahl et al., "A Maximum Likelihood Approach to Continuous Speech Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 179-190, 1983). A continuous speech recognition apparatus using an N-gram aims to improve the speech recognition rate by training in advance, on large-scale training data, the transition probability from the preceding N-1 words to the next word, and by using the trained transition probability at the time of speech recognition to predict the word to be connected next. Generally speaking, the larger N becomes, the more the prediction accuracy for the next word improves; however, the number of kinds of word concatenation also becomes large, so that large amounts of training data are required to obtain reliable transition probabilities. Thus, under existing circumstances, N is in many cases set to about 2 (bi-gram) or 3 (tri-gram) for practical use. However, analysis of the results of continuous speech recognition using a word bi-gram or word tri-gram shows that, even if the local word concatenations within 2 or 3 words are natural, a mis-recognized sentence that is unnatural when viewed as a whole is frequently output. Thus, it is considered that more general language restrictions are necessary.
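The training and prediction steps described above can be illustrated for the bi-gram case (N = 2). The following is only a minimal sketch: the function names `train_bigram` and `predict_next`, the sentence boundary markers, and the toy corpus are assumptions made for illustration, and the transition probabilities are plain relative frequencies without the smoothing that a practical apparatus would need.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Estimate P(next | prev) by relative frequency over the training data."""
    pair_counts = defaultdict(Counter)
    for words in sentences:
        # Hypothetical boundary markers so that sentence starts and ends are modeled.
        padded = ["<s>"] + list(words) + ["</s>"]
        for prev, nxt in zip(padded, padded[1:]):
            pair_counts[prev][nxt] += 1
    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        model[prev] = {word: count / total for word, count in counter.items()}
    return model


def predict_next(model, prev):
    """Return the most probable next word after `prev`, or None if `prev` is unseen."""
    candidates = model.get(prev, {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy training data (an assumption for illustration only).
corpus = [
    ["i", "want", "a", "room"],
    ["i", "want", "a", "ticket"],
    ["i", "need", "a", "room"],
]
bigram = train_bigram(corpus)
```

With this toy corpus, `bigram["i"]["want"]` is 2/3 and `predict_next(bigram, "a")` returns `"room"`. Because the model conditions only on the single preceding word, it cannot by itself reject a sentence whose adjacent word pairs are all plausible but which is unnatural as a whole.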
There have been language models that enable more general restrictions with the use of grammars, such as context-free grammar, and dependency relationships among words. However, taking into account the structure of natural utterance sentences and the various kinds of dependency relationships, it is difficult to build such rules and dependency relationships, and the amount of processing becomes remarkably larger. On the other hand, a method for solving the ambiguity of syntactic construction by an example-led approach was proposed in prior art document 2, Eiichiro Sumida et al., "An Example-Led Solution of Ambiguities on Destinations of Prepositive Words and Phrases in English", The Transactions of the Institute of Electronic Information and Communication Engineers of Japan (D-II), J77-D-II, No. 3, pp. 557-565, March 1994 (hereinafter referred to as a prior art example). The method of this prior art example includes the steps of extracting examples from a corpus, calculating semantic distances between the expression of an input sentence and the examples according to a thesaurus, and selecting the sentence construction for which the final semantic distance is minimized. The effects of this method were also confirmed for an equivalent-word decision process or the like (see prior art document 3, Furuse et al., "transform-led machine translation utilizing empirical knowledge", Transactions of the Information Processing Society of Japan, Vol. 35, No. 3, pp. 414-423, March 1994).
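The three steps of the prior art example (examples taken from a corpus, thesaurus-based semantic distances, selection of the construction with the minimum distance) might be sketched as follows. Everything concrete here is an assumption made for illustration and is not taken from the cited documents: the toy thesaurus, its fixed depth of three levels, the distance formula based on the shared category prefix, and the names `word_distance`, `expression_distance` and `select_construction`.

```python
# Hypothetical thesaurus: each word maps to a path of categories from the root.
THESAURUS = {
    "see": ("action", "perception", "vision"),
    "look": ("action", "perception", "vision"),
    "telescope": ("concrete", "tool", "optical"),
    "binoculars": ("concrete", "tool", "optical"),
    "man": ("concrete", "human", "person"),
    "girl": ("concrete", "human", "person"),
    "hat": ("concrete", "clothing", "headwear"),
}
DEPTH = 3  # every path in the toy thesaurus has three levels


def word_distance(a, b):
    """Distance in [0, 1]: smaller when the words share more thesaurus levels."""
    if a == b:
        return 0.0
    path_a, path_b = THESAURUS.get(a), THESAURUS.get(b)
    if path_a is None or path_b is None:
        return 1.0  # assumed: a word missing from the thesaurus is maximally distant
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (DEPTH - shared) / DEPTH


def expression_distance(expression, example):
    """Sum of word distances between an input expression and one example."""
    if len(expression) != len(example):
        return float("inf")
    return sum(word_distance(a, b) for a, b in zip(expression, example))


def select_construction(candidates, examples):
    """Pick the candidate construction whose nearest example is closest."""
    scored = {}
    for name, expression in candidates.items():
        distances = (expression_distance(expression, ex) for ex in examples.get(name, []))
        scored[name] = min(distances, default=float("inf"))
    best = min(scored, key=scored.get)
    return best, scored[best]


# Two readings of an ambiguous phrase, each reduced to a (head, modifier) pair.
candidates = {
    "verb-attachment": ("see", "telescope"),
    "noun-attachment": ("man", "telescope"),
}
# Examples assumed to have been extracted from a corpus, grouped by construction.
examples = {
    "verb-attachment": [("look", "binoculars")],
    "noun-attachment": [("girl", "hat")],
}
best, distance = select_construction(candidates, examples)
```

In this toy setting the verb-attachment reading is selected with a distance of 0.0, because both of its words fall in the same thesaurus categories as the stored example, whereas the noun-attachment reading has a distance of 2/3.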
However, a speech recognition apparatus using the method of the above-mentioned prior art example has had the problem that, for example, inputting a sentence whose construction is unnatural with respect to the trained examples results in large distances from all of the examples, so that the resulting speech recognition rate is relatively low.