As speech recognition technology has progressed and its precision has improved, its field of application has expanded greatly, and speech recognition has begun to be used for preparing documents by dictation, including business documents, medical charts, legal documents, and subtitles for television broadcasting. Moreover, it has been conceived to introduce transcription technology using speech recognition in courts, conferences and the like, so that the proceedings are recorded and the recording is transcribed into text in order to prepare the record of an investigation in a court or the minutes of a conference.
The following documents are considered:
    (Non-Patent Document 1) “Onsei-ninshiki ni yoru kakiokoshi-shisutemu no goshokai (Introduction of Rewrite System by Speech Recognition),” [online], Advanced Media, Inc., [searched on Nov. 25, 2003], Internet <url: http://www.advanced-media.co.jp/event/news/AmiVoice_Rewriter.pdf>
    (Non-Patent Document 2) E. W. Brown et al., “Toward speech as knowledge resources,” IBM Sys. Journal, Vol. 40, No. 4, pp. 985-1001, 2001, Internet <url: http://www.research.ibm.com/journal/sj/404/brown.pdf>
Conventionally, transcription technologies of this type using speech recognition have included: one which recognizes, channel by channel, speeches recorded on multiple channels, manages the speeches along a time axis by adding a channel (that is, speaker) ID to each channel, and displays a text transcript (for example, refer to Non-Patent Document 1); and one which associates the entire text obtained by the speech recognition with the original speeches and makes it possible to reproduce the associated speeches by designating a portion of the text (for example, refer to Non-Patent Document 2).
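For illustration only, the channel-based management described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Non-Patent Document 1: the names `Utterance`, `merge_channels` and `format_transcript` are hypothetical, each channel is assumed to carry exactly one speaker, and the recognizer is assumed to have already produced time-stamped text for each channel.

```python
from dataclasses import dataclass
import heapq


@dataclass(frozen=True)
class Utterance:
    """One recognized utterance (hypothetical structure for illustration)."""
    channel_id: int  # assumed to identify the speaker, one speaker per channel
    start: float     # seconds from the start of the recording
    end: float
    text: str


def merge_channels(per_channel):
    """Merge per-channel utterance lists into a single timeline.

    Each input list is assumed to be sorted by start time already.
    """
    return list(heapq.merge(*per_channel, key=lambda u: u.start))


def format_transcript(timeline):
    """Render the timeline as transcript lines tagged with time and channel ID."""
    return [f"[{u.start:07.2f}] ch{u.channel_id}: {u.text}" for u in timeline]


# Made-up sample data: channel 1 is a questioner, channel 2 an answerer.
ch1 = [Utterance(1, 0.0, 2.0, "Where were you?"),
       Utterance(1, 5.0, 6.5, "At what time?")]
ch2 = [Utterance(2, 2.5, 4.5, "At home."),
       Utterance(2, 7.0, 8.0, "Around nine.")]

timeline = merge_channels([ch1, ch2])
for line in format_transcript(timeline):
    print(line)
```

Because every utterance keeps its start and end times, the same structure could also serve to reproduce the original speech for a designated portion of the text, as in Non-Patent Document 2.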
As described above, speech recognition technology has been utilized for transcribing spoken speeches into text, and in some cases the original speeches are reproduced and compared with the transcribed text for purposes such as studying or verifying the contents. In order to cope with such cases, each of the conventional technologies described above has a function to manage the prepared text and the original speeches in association with each other.
However, when spoken speeches are transcribed into text in order to prepare the record of an investigation in a court or the minutes of a conference, the speeches in the court or conference are free speeches. Accordingly, unlike written language or the recitation of a prepared draft read aloud, the transcribed text consists of spoken language, includes many restatements, repetitions and the like, and thus is difficult to read. Hence, when this type of transcription into text is performed, it is necessary to modify the prepared text into a form that is easy to read.
On the other hand, even when the modified text is being read, there are conceivable cases where the reader wishes to listen to the original spoken speeches again, for example in order to infer the state of mind of the speaker. For this purpose, it is advantageous that the modified text be associated with the original speeches in a proper unit, such as for each speaker.
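As a rough illustration of the association described above, the following sketch keeps the modified text and the time span of the original speech together in one unit per speaker turn. It is only a sketch under stated assumptions: the names `Segment`, `readable_text` and `audio_span_for` are hypothetical, and the unit of association is assumed to be a single uninterrupted turn by one speaker.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One speaker turn (hypothetical structure for illustration)."""
    speaker_id: str
    start: float            # seconds from the start of the recording
    end: float
    recognized_text: str    # raw output of the speech recognition
    modified_text: str = ""  # edited, easy-to-read version, if any

    def readable_text(self):
        """Return the modified text when present, else the recognized text."""
        return self.modified_text or self.recognized_text


def audio_span_for(segments, index):
    """Return the (start, end) span of the original speech for a segment."""
    seg = segments[index]
    return (seg.start, seg.end)


# Made-up sample data containing a restatement in the first turn.
segments = [
    Segment("witness", 2.5, 6.0, "I was, uh, I was at home, at home"),
    Segment("counsel", 6.5, 8.0, "At what time"),
]

# Editing the text does not disturb the link to the original speech.
segments[0].modified_text = "I was at home."

print(segments[0].readable_text(), audio_span_for(segments, 0))
```

Storing the time span on the segment rather than on the words means the text can be rewritten freely while the link to the original speech stays intact.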
As described above, each of the conventional technologies described in Non-Patent Documents 1 and 2 and the like has the function to manage the text prepared through the speech recognition and the original speeches in association with each other. However, the principal object of this function is to reproduce the speeches so that they can be listened to again for the purpose of studying and verifying the contents. The function does not take into account associating the modified text with the original speeches.
Moreover, the speeches in a court and a conference have the features described below.
    Questions and answers occupy the major part of a dialogue, and the questioner and the answerer do not exchange roles from one utterance to the next.
    One speaker speaks at a time, except for sudden utterances such as heckling and hooting, and therefore the speeches seldom overlap one another.
    The order of questioners is determined, and it is rare for a plurality of answerers to be questioned at the same time. Therefore, in many cases, answers regarding the same topic are dispersed in various portions of the speech data.
However, because the aforementioned conventional technologies have been designed to be generally applicable to various speech circumstances, their techniques of managing and outputting data have not been suited to the special circumstances described above.