A speech recognition device sometimes outputs, as a result of a speech recognition process, a so-called ungrammatical sentence, that is, a word sequence that is grammatically incorrect or that has no normal meaning. In the related art, a method has been proposed that judges whether a recognition result is correct or erroneous based on a confidence measure, in order to detect the recognition errors that cause such ungrammatical sentences. This method is described, for example, in Patent Document 1 listed below.
The technology described in Patent Document 1 performs rescoring using a confidence measure model obtained by integrating plural confidence measures that originate from the speech recognizing unit used to obtain the recognition result. As shown in FIG. 6, in a speech recognition system using this technology, a sentence hypothesis generating unit generates plural sentence hypotheses, each including characteristic amounts, from a characteristic amount-included word network that includes the recognition result of an inputted speech and the characteristics of that recognition result. A confidence measure calculating unit then calculates the confidence measure of each sentence hypothesis based on the generated sentence hypotheses and the confidence measure model, and a rescoring unit outputs, as the result of speech recognition, the sentence hypotheses reordered based on the confidence measures.
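For illustration only, the rescoring flow described above may be sketched as follows. This is a minimal sketch and not the implementation of Patent Document 1: it assumes that the confidence measure model is a logistic combination of per-word characteristic amounts and that the confidence measure of a sentence hypothesis is the mean of its word confidences. The names `Word`, `Hypothesis`, `word_confidence`, `sentence_confidence`, and `rescore`, the feature names, and all weights and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    surface: str
    features: dict  # characteristic amounts: feature name -> value (hypothetical)


@dataclass
class Hypothesis:
    words: list  # list of Word


def word_confidence(word, weights, bias=0.0):
    """Integrate several characteristic amounts into one confidence in (0, 1).

    Assumption: a logistic model stands in for the confidence measure model.
    """
    z = bias + sum(weights.get(name, 0.0) * value
                   for name, value in word.features.items())
    return 1.0 / (1.0 + math.exp(-z))


def sentence_confidence(hyp, weights, bias=0.0):
    """Assumption: sentence confidence is the mean of the word confidences."""
    if not hyp.words:
        return 0.0
    total = sum(word_confidence(w, weights, bias) for w in hyp.words)
    return total / len(hyp.words)


def rescore(hypotheses, weights, bias=0.0):
    """Return the hypotheses reordered by descending sentence confidence."""
    return sorted(hypotheses,
                  key=lambda h: sentence_confidence(h, weights, bias),
                  reverse=True)


# Illustrative values only; they are not taken from any real recognizer.
weights = {"posterior": 4.0, "lm_score": 1.0}
hyp_a = Hypothesis([
    Word("recognize", {"posterior": 0.9, "lm_score": 0.5}),
    Word("speech", {"posterior": 0.8, "lm_score": 0.4}),
])
hyp_b = Hypothesis([
    Word("wreck", {"posterior": 0.3, "lm_score": -0.5}),
    Word("beach", {"posterior": 0.2, "lm_score": -0.4}),
])
ranked = rescore([hyp_b, hyp_a], weights)
```

In this sketch, `ranked[0]` is the hypothesis that would be outputted as the result of speech recognition; a different integration of the confidence measures would change only `word_confidence` and `sentence_confidence`.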
Meanwhile, in a speech recognition process, speech is converted into data using a language model that functions as a reference, and the following technologies have been proposed to raise the accuracy of the language model. Examples include a method of using relationships between words that are far apart from each other, as described in Non-patent Document 1 listed below; a method of optimizing over the entire document using topic information, as described in Non-patent Document 2; and a method of using corpora obtainable from the WWW to estimate the appearance probability of a word, as described in Non-patent Document 3.
Patent Document 1: Japanese unexamined patent application publication No. 2006-85012
Non-patent Document 1: R. Lau, R. Rosenfeld, S. Roukos, “Trigger-Based Language Models: A Maximum Entropy Approach”, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing proceedings, (U.S.), IEEE (Institute of Electrical and Electronics Engineers), 1993, Vol. 2, pp. 45-48
Non-patent Document 2: D. Gildea, T. Hofmann, “Topic-Based Language Models Using EM”, Sixth European Conference on Speech Communication and Technology (EUROSPEECH '99) proceedings, ISCA (International Speech Communication Association), 1999, pp. 2167-2170
Non-patent Document 3: A. Berger and R. Miller, “Just-in-time Language Modeling”, 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing proceedings, (U.S.), IEEE (Institute of Electrical and Electronics Engineers), 1998, Vol. 2, pp. 705-708
Non-patent Document 4: J. Lafferty et al., “Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data”, 18th International Conference on Machine Learning proceedings, 2001, pp. 282-289