The field of the invention is transcribing speech data using speech-to-text conversion techniques. More particularly, the invention relates to transcribing speech data from multiple files into a single document using speech-to-text conversion techniques.
Transcription is an old art that, up until the relatively recent past, was performed by manually typing a recorded message into an electronic or physical document. More recently, speech-to-text conversion techniques have been employed to automatically convert recorded speech into text.
A difficulty arises with manual or automatic transcription techniques when multiple speakers are recorded onto a single recording (e.g., as in a recorded meeting or court proceeding). In most cases, it is desirable to identify which of the multiple speakers uttered the various phrases being transcribed. This is particularly true in court proceedings, for example, where an attorney may utter some phrases, a witness may utter others, and a judge may utter still others.
In order to automatically associate an individual with a phrase, it would be necessary to couple speaker recognition technology with the speech-to-text conversion software. Typically, however, speaker recognition technology requires the speaker recognition software to be trained by each of the speakers. Training is not always feasible, and the necessity for training would limit the usefulness of the transcription system.
Therefore, other methods of separating each speaker's uttered phrases are desirable. In some prior art techniques, each speaker is provided with a separate microphone, and the signals are combined into a single recording. A transcriber then listens to the recording and attempts to type the speakers' statements in sequential order. However, this solution is non-optimal, because it requires the transcriber to differentiate between multiple speakers whose voices may not be distinctive, or who may be talking over each other at the same time. In addition, the solution has not been successfully integrated with automated techniques of speech-to-text conversion and speaker recognition. Thus, the solution is inefficient because it relies on the use of a human transcriber.
What is needed is a method and apparatus for transcribing recordings of multiple simultaneous speakers. What is further needed is a method and apparatus for transcribing such recordings in an automated manner which takes into account the issues of recording synchronization and speaker identification.
The present invention includes an apparatus and method for transcribing speech originating from multiple speakers.
A general object of the invention is to automatically transcribe speech from multiple speakers in such a manner that each speaker is identified in the transcription, but without the use of speaker recognition technology.
Another object of the invention is to automatically and accurately transcribe speech from multiple speakers who are talking simultaneously, while identifying the speakers in the transcription.
The method for transcribing the speech accesses multiple files of digitized speech data, which represent multiple speech recordings that were recorded within a recording session. The multiple files are then transcribed by applying a speech-to-text conversion technique to phrases within the multiple files, resulting in textual representations of the phrases. The textual representations are stored in a sequential order, resulting in a single sequence of textual representations of the digitized phrases from the multiple files.
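The ordering step can be illustrated with a minimal Python sketch. It assumes, for illustration only, that each file has already been passed through some speech-to-text engine (not shown), that each file was recorded from a single speaker's microphone so the speaker label can be taken from the file itself, and that each phrase carries a start time measured from the beginning of its own file. The names `Phrase`, `SpeechFile`, and `merge_transcripts` are hypothetical and are not taken from the invention.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    start: float  # seconds from the beginning of this phrase's own file
    text: str     # textual representation from a speech-to-text engine (assumed)


@dataclass
class SpeechFile:
    speaker: str           # hypothetical: one file per speaker/microphone
    offset: float          # seconds between session start and file start
    phrases: list[Phrase]


def merge_transcripts(files):
    """Return (session_time, speaker, text) tuples in a single sequential order."""
    streams = []
    for f in files:
        # Convert each file-relative start time into a session-relative time,
        # then sort so every per-file stream is already in order.
        streams.append(sorted((f.offset + p.start, f.speaker, p.text) for p in f.phrases))
    # heapq.merge interleaves the already-sorted streams in one pass.
    return list(heapq.merge(*streams))


if __name__ == "__main__":
    files = [
        SpeechFile("Attorney", 0.0, [Phrase(1.0, "Where were you?"), Phrase(9.5, "Thank you.")]),
        SpeechFile("Witness", 2.0, [Phrase(1.5, "At home.")]),
    ]
    for t, who, text in merge_transcripts(files):
        print(f"{t:6.2f}  {who}: {text}")
```

Because the speaker label travels with each phrase from its source file, the merged sequence identifies who spoke without any speaker recognition step, which is the object stated above. A plain `sorted()` over all phrases would yield the same order; `heapq.merge` simply exploits the fact that each file is already chronological.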
Audio representations of the digitized phrases can be output to a loudspeaker, and offset times for each of the files can be adjusted, where the offset times indicate time differences between the beginning of the recording session and the beginnings of the files.
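The effect of an offset adjustment on the transcript order can be shown with a short Python sketch. It assumes, for illustration, that a phrase's position in the session equals its file's offset plus the phrase's start time within that file, and that offsets are kept in a simple table keyed by a file identifier. The functions `order_phrases` and `adjust_offset` and the file identifiers are hypothetical.

```python
def order_phrases(offsets, phrases):
    """Sort phrases by session time.

    offsets: {file_id: seconds from session start to file start}
    phrases: list of (file_id, start_within_file, text)
    """
    # Session time = file offset + phrase start within the file;
    # file_id breaks ties so the order is deterministic.
    return sorted(phrases, key=lambda p: (offsets[p[0]] + p[1], p[0]))


def adjust_offset(offsets, file_id, delta):
    """Return a new offset table with one file's offset shifted by delta seconds."""
    updated = dict(offsets)  # leave the caller's table unchanged
    updated[file_id] += delta
    return updated


if __name__ == "__main__":
    offsets = {"judge": 0.0, "witness": 0.0}
    phrases = [("judge", 5.0, "Please answer."), ("witness", 4.0, "Yes, Your Honor.")]
    print([p[2] for p in order_phrases(offsets, phrases)])
    # Suppose the witness recorder actually started 2 s into the session.
    fixed = adjust_offset(offsets, "witness", 2.0)
    print([p[2] for p in order_phrases(fixed, phrases)])
```

With both offsets at zero, the witness's reply appears before the judge's question; once the witness file's offset is corrected by two seconds, the phrases fall into their true order. Playing the phrases back through a loudspeaker is one way a user might notice such a misordering before adjusting the offset.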
The method can be executed by a machine that executes a plurality of code sections of a computer program that is stored on a machine-readable storage.
The method is carried out by a transcription apparatus which includes at least a processor and a memory device. The processor accesses the multiple files, transcribes the phrases, and stores textual representations of the phrases in a sequential order in a combined file. The memory device stores the textual representations.
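Storing the sequence in a combined file could look like the following Python sketch. The line format (a `[mm:ss.ss]` timestamp followed by the speaker label and the text) is a hypothetical choice made for illustration; the source does not specify how the combined file is laid out. The entries are assumed to arrive already in sequential order, as produced by the ordering step.

```python
from pathlib import Path


def format_timestamp(seconds):
    """Render session time in seconds as mm:ss.ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def render_combined(entries):
    """entries: iterable of (session_time, speaker, text) in sequential order."""
    return "".join(
        f"[{format_timestamp(t)}] {speaker}: {text}\n" for t, speaker, text in entries
    )


def write_combined_file(path, entries):
    """Write the combined transcript to a UTF-8 text file (hypothetical format)."""
    Path(path).write_text(render_combined(entries), encoding="utf-8")


if __name__ == "__main__":
    print(render_combined([(1.0, "Attorney", "Where were you?"), (63.5, "Witness", "At home.")]), end="")
```

Keeping the rendering separate from the file write lets the same text be shown on screen or saved to the memory device without duplicating the formatting logic.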