To prepare proceedings of meetings or lectures, speech data have to be transformed into text (hereinafter, transformation of speech into text is called “writing-out”). However, because the writing-out operation requires enormous human costs, improvements in operating efficiency are required. Meanwhile, with the advancement of speech recognition techniques in recent years, the accuracy of speech recognition for free speech has improved. In this situation, support for the writing-out operation using speech recognition techniques has been considered.
Patent Document 1 describes an example of an existing speech-to-text system using speech recognition. The speech-to-text system described in Patent Document 1 is a system in which speech data to be written out is speech-recognized and is automatically transformed into text, and then errors in the recognition result text are edited by a human to thereby complete writing-out. As shown in FIG. 10, this conventional speech-to-text system 200 includes a speech storing unit 212, a speech recognition unit 211, a recognition result storing unit 213, an editor unit 222, an edit position storing unit 225, an edit result storing unit 226, a speech playback unit 221, a speech playback time storing unit 224, and a synchronization unit 223.
The existing speech-to-text system 200 having such a configuration operates in the following manner. The speech storing unit 212 stores speech data to be written out. The speech recognition unit 211 reads in the speech data from the speech storing unit 212, recognizes the speech, converts it into recognition result text information, and outputs that information to the recognition result storing unit 213. In this process, link information for associating each word in the recognition result text information with a part of the speech data is output together with the text. The link information includes time information based on the playback time of the speech data corresponding to each word. The recognition result text information and the link information are stored in the recognition result storing unit 213. Thereby, the recognition result text information and the speech data can be associated with each other.
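As an illustration only, the link information described above could be represented as one record per recognized word carrying the playback time of the corresponding speech. The following is a minimal sketch; the names `LinkedWord`, `start_sec`, `end_sec`, and `speech_span`, as well as the sample words and times, are assumptions made for this example and do not appear in Patent Document 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedWord:
    """One recognized word plus its (hypothetical) link information."""
    text: str         # recognition result text for this word
    start_sec: float  # playback time at which the word starts
    end_sec: float    # playback time at which the word ends


# Illustrative contents of a recognition result store (made-up values).
recognition_result = [
    LinkedWord("speech", 0.00, 0.42),
    LinkedWord("recognition", 0.42, 1.10),
    LinkedWord("result", 1.10, 1.55),
]


def speech_span(words, index):
    """Return the (start, end) playback times linked to the word at `index`."""
    word = words[index]
    return (word.start_sec, word.end_sec)
```

With such records, each word in the text can be traced back to the part of the speech data from which it was recognized.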
The editor unit 222 reads in the recognition result text information stored in the recognition result storing unit 213, edits errors in the recognition result text according to edit instructions from the writing-out operator, and outputs the edited text to the edit result storing unit 226. Similar to a general text editor, the editor unit 222 positions an edit cursor on the text, and the text at which the edit cursor is located is edited. The position of the edit cursor is stored in the edit position storing unit 225.
Meanwhile, the speech playback unit 221 plays back the speech data stored in the speech storing unit 212 in accordance with speech playback instructions from the writing-out operator. At this point, the time of the speech being played back is stored in the speech playback time storing unit 224. The writing-out operator proceeds with editing the errors in the recognition result text while listening to the speech being played back.
The synchronization unit 223 synchronizes the position of the edit cursor stored in the edit position storing unit 225 with the speech playback time stored in the speech playback time storing unit 224 in accordance with synchronization instructions from the operator. This is realized by referring to the link information stored in the recognition result storing unit 213, which associates the recognition result text information with the speech data. That is, in the case of synchronizing the speech playback time with the position of the edit cursor, it is only necessary to set the speech playback time to the time corresponding to the recognition result word at which the edit cursor is positioned. By synchronizing the speech playback time with the position of the edit cursor in this way, the writing-out operator can promptly listen to and check the speech corresponding to the position of the edit cursor during the editing operation.
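The synchronization of the playback time with the cursor position can be sketched as a lookup from a character offset to the start time of the word under the cursor. This is a hedged illustration, not the method of Patent Document 1: it assumes that the recognition result is a list of `(text, start_sec)` pairs, that words are displayed separated by single spaces, and that the cursor position is a character offset. The function name `playback_time_for_cursor` is hypothetical.

```python
def playback_time_for_cursor(words, cursor_offset):
    """Return the start time of the word at which the cursor is positioned.

    `words` is a list of (text, start_sec) pairs, assumed to be displayed
    joined by single spaces. `cursor_offset` is a character offset into
    that displayed text.
    """
    position = 0  # character offset at which the current word starts
    for text, start_sec in words:
        end = position + len(text)
        if cursor_offset <= end:
            return start_sec
        position = end + 1  # skip the separating space
    # Cursor beyond the last word: fall back to the last word, if any.
    return words[-1][1] if words else 0.0


# Made-up example data: displayed text is "speech recognition result".
words = [("speech", 0.0), ("recognition", 0.42), ("result", 1.10)]
seek_to = playback_time_for_cursor(words, 9)  # cursor inside "recognition"
```

A playback component would then seek to `seek_to` so that the operator hears the speech linked to the word being edited.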
Conversely, in the case of synchronizing the position of the edit cursor with the speech playback time, the edit cursor can be moved onto the recognition result text corresponding to the part of the speech data which is being played back. By synchronizing the position of the edit cursor with the speech playback time in this way, the writing-out operator can place the edit cursor at the position on the text corresponding to an incorrectly recognized part as soon as he/she hears the speech that was recognized incorrectly.
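The reverse direction, moving the edit cursor to the word being played back, can be sketched as a search over word start times. The sketch below assumes that the start times taken from the link information are sorted in ascending order. The function name `word_index_at_time` and the sample times are hypothetical, and binary search is merely one convenient way to perform the lookup.

```python
from bisect import bisect_right


def word_index_at_time(start_times, playback_sec):
    """Return the index of the word being spoken at `playback_sec`.

    `start_times` holds the start time of each recognized word in
    ascending order. The word in progress is the last one whose start
    time is not later than the playback time.
    """
    index = bisect_right(start_times, playback_sec) - 1
    return max(index, 0)  # before the first word, stay on the first word


# Made-up start times for three recognized words.
start_times = [0.0, 0.42, 1.10]
cursor_word = word_index_at_time(start_times, 0.8)  # second word (index 1)
```

The editor would then place the edit cursor on the word at index `cursor_word`.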
As described above, in the conventional speech-to-text system, the position of the edit cursor placed on the recognition result text and the speech playback time can be synchronized with each other by using the correspondence between the recognition result text and the speech data, and the efficiency of the writing-out operation is thereby improved.
Patent Document 1: JP Patent Laid-Open Publication No. 2004-530205