Examples of a speech recognition system relating to the present invention are disclosed in Patent Document 1 and Non-Patent Document 1.
As shown in FIG. 7, the speech recognition system according to the prior art includes a speech input section 501, an utterance label input section 502, an acoustic model storage section 503, a recognition dictionary storage section 504, a speech recognition section 505, an utterance variation data calculating section 506, an utterance variation data storage section 507, a recognition dictionary extending section 508, an extended recognition dictionary storage section 509, a speech input section 510, a speech recognition section 511, and a recognition result output section 512.
The speech recognition system having the above configuration operates as follows.
First, a learning step of an extended recognition dictionary of a speaker p will be described. Learning speech of the speaker p is input through the speech input section 501 and is then recognized by the speech recognition section 505 using an acoustic model stored in the acoustic model storage section 503 and a recognition dictionary stored in the recognition dictionary storage section 504. Then, the utterance variation data calculating section 506 compares a recognition result phoneme sequence output from the speech recognition section 505 with an utterance label that is input through the utterance label input section 502 and includes a correct phoneme sequence corresponding to the learning speech of the speaker p, and calculates a correspondence between the correct phoneme sequence and the recognition result phoneme sequence. The calculated correspondence is stored in the utterance variation data storage section 507. Further, in the recognition dictionary extending section 508, standard phoneme sequences of words included in the recognition dictionary stored in the recognition dictionary storage section 504 are replaced with the utterance variation phoneme sequences stored in the utterance variation data storage section 507 to generate an extended recognition dictionary including a plurality of phoneme sequences. The generated extended recognition dictionary is stored in the extended recognition dictionary storage section 509.
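The learning step described above can be illustrated with a minimal sketch. It is not the implementation of Patent Document 1 or Non-Patent Document 1; it assumes that phoneme sequences are plain lists of strings, that the correspondence is obtained with a generic sequence alignment (Python's `difflib.SequenceMatcher` is swapped in here for whatever alignment the prior art uses), and that only substituted spans are recorded. The function names `align_variations`, `learn_variations`, and `extend_dictionary`, as well as the sample phonemes, are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher


def align_variations(correct, recognized):
    """Align a correct phoneme sequence with a recognition result and
    return the (correct_span, recognized_span) pairs that differ.
    Only substitutions are kept; insertions and deletions are ignored
    in this simplified sketch."""
    matcher = SequenceMatcher(a=correct, b=recognized, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            pairs.append((tuple(correct[i1:i2]), tuple(recognized[j1:j2])))
    return pairs


def learn_variations(utterances):
    """Accumulate utterance variation data over all learning utterances.
    `utterances` is an iterable of (correct, recognized) phoneme lists."""
    counts = Counter()
    for correct, recognized in utterances:
        counts.update(align_variations(correct, recognized))
    return counts


def extend_dictionary(dictionary, variations):
    """Build an extended dictionary: each word keeps its standard phoneme
    sequence and gains one variant per matching variation."""
    extended = {}
    for word, phonemes in dictionary.items():
        standard = tuple(phonemes)
        pronunciations = [standard]
        for src, dst in variations:
            n = len(src)
            for start in range(len(standard) - n + 1):
                if standard[start:start + n] == src:
                    variant = standard[:start] + dst + standard[start + n:]
                    if variant not in pronunciations:
                        pronunciations.append(variant)
        extended[word] = pronunciations
    return extended


if __name__ == "__main__":
    # Hypothetical learning data: the speaker's "ch" is recognized as "t".
    learning_data = [
        (["k", "o", "N", "n", "i", "ch", "i", "w", "a"],
         ["k", "o", "N", "n", "i", "t", "i", "w", "a"]),
    ]
    variations = learn_variations(learning_data)
    print(extend_dictionary({"chikai": ["ch", "i", "k", "a", "i"]}, variations))
```

In this sketch the counter returned by `learn_variations` plays the role of the utterance variation data, and the dictionary returned by `extend_dictionary` plays the role of the extended recognition dictionary holding a plurality of phoneme sequences per word.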
Next, a recognition step of speech of the speaker p will be described. The speech of the speaker p input through the speech input section 501 is recognized by the speech recognition section 511 using the acoustic model stored in the acoustic model storage section 503 and the extended recognition dictionary that has learned the utterance variation of the speaker p which is stored in the extended recognition dictionary storage section 509. A recognition result of the speech recognition section 511 is output from the recognition result output section 512.
Patent Document 1: JP-A-08-123470
Non-Patent Document 1: “Phoneme Candidate Re-entry Modeling Using Recognition Error Characteristics over Multiple HMM States” written by Wakita and two others, Transactions of the Institute of Electronics, Information and Communication Engineers D-II, Vol. J79-D-II, No. 12, p. 2086-2095, December 1996
Non-Patent Document 2: “Pattern Recognition and Learning from the perspective of statistical science: Section I—Pattern Recognition and Learning” written by Hideki Asoh, Iwanami-Shoten, 2003, p. 58-61
Non-Patent Document 3: “Information Processing of Characters and Sounds”, written by Nagao and five others, Iwanami-Shoten, January 2001, p. 34-35
Non-Patent Document 4: “A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER)” written by Jonathan G. Fiscus, Proc. IEEE ASRU Workshop, p. 437-352, 1997