It is well known that speech recognition by a computer inevitably produces recognition errors. As mishearing in everyday conversation shows, even a human being cannot recognize speech with 100 percent accuracy. This is because human speech includes utterances that are easily mistaken for other words, utterances containing homonyms, and unclear utterances. Between human beings, such erroneous recognition (mishearing) is easily resolved through spoken dialogue. Between a computer and a human being, however, it is difficult to carry out a dialogue as flexible as one between human beings. No matter how far speech recognition techniques are improved to raise the recognition rate, the rate will never reach 100 percent, because it is extremely difficult for a human being to keep producing clear and unambiguous utterances at all times. Accordingly, in order to build a speech recognition system that can be used routinely, it is essential that the erroneous recognition that will inevitably occur somewhere can be corrected easily.
Various techniques for correcting recognition results have therefore been proposed. In commercially available dictation software, for example, when a user looks at the displayed text of a recognition result and discovers erroneous recognition, the user can specify the erroneously recognized segment with a mouse operation or by voice input. Other candidates for that segment are then displayed, and the user can select the correct candidate to correct the segment. The technique disclosed in Nonpatent Document 1 develops this approach: after the utterance is completed, the recognition result is displayed with word boundary lines separating the words. The word boundaries can then be shifted with a mouse, much as word segmentation is modified in kana-kanji conversion. This increases the possibility that the correct candidate can be obtained. However, it also increases the time and effort the user must spend on correction, such as specifying the location of the erroneous recognition, changing a word boundary, and selecting a candidate. The technique disclosed in Nonpatent Document 2, on the other hand, implements a practical recognition error correction system for subtitling broadcast news programs using speech recognition. This technique, however, assumes a division of labor between two persons: one person must discover and mark a location of erroneous recognition, and another person must type the correct word into that location. Accordingly, an individual cannot use this technique to correct his or her own speech input.
As described above, the conventional techniques all require time and effort: the user must first discover and point out a location of erroneous recognition, and must then either examine and select another candidate for that location or correct it by typing.
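The two-step correction flow described above can be illustrated with a minimal sketch. It is not taken from any of the cited documents; the `Segment` structure, the function names, and the sample words are all hypothetical and serve only to show that the user must first point out a segment and then choose among its candidates.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One recognized word plus the recognizer's other candidates (hypothetical structure)."""
    best: str
    alternatives: List[str] = field(default_factory=list)


def list_candidates(result: List[Segment], index: int) -> List[str]:
    """Step 1: the user points at segment `index`; return its other candidates."""
    return result[index].alternatives


def apply_correction(result: List[Segment], index: int, choice: int) -> str:
    """Step 2: the user picks candidate number `choice`; return the corrected text."""
    candidates = list_candidates(result, index)
    corrected = [seg.best for seg in result]
    corrected[index] = candidates[choice]
    return " ".join(corrected)


# Hypothetical recognition result in which the second word was misrecognized.
result = [
    Segment("recognize", ["recognise"]),
    Segment("beach", ["speech", "peach"]),
]

print(apply_correction(result, 1, 0))
```

In this sketch every correction costs the user two separate actions, specifying `index` and then specifying `choice`, which is the burden noted above.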
Patent Document 1 (Japanese Patent Publication No. 2002-287792) discloses a technique in which a speech recognition result is corrected by voice input. Patent Document 2 (Japanese Patent Publication No. 2004-309928) discloses an electronic dictionary system that, when speech recognition yields a plurality of output word candidates, displays the candidates on a display portion and instructs the speaker to select the desired word from among them. Patent Document 3 (Japanese Patent Publication No. 2002-297181) and Patent Document 4 (Japanese Patent Publication No. 06-301395) disclose techniques that use a confusion matrix in order to improve the recognition rate of speech recognition.
Nonpatent Document 1: Endo and Terada: “Candidate selecting interface for speech input”, Proceedings of Interaction 2003, pp. 195-196, 2003.
Nonpatent Document 2: Ando et al.: “A Simultaneous Subtitling System for Broadcast News Programs with a Speech Recognizer”, The Transactions of the Institute of Electronics, Information and Communication Engineers, vol. J84-D-II, No. 6, pp. 877-887, 2001.
Patent Document 1: Japanese Patent Publication No. 2002-287792
Patent Document 2: Japanese Patent Publication No. 2004-309928
Patent Document 3: Japanese Patent Publication No. 2002-297181
Patent Document 4: Japanese Patent Publication No. 11-311599