1. Field of the Invention
The invention relates to speech transcription, and more particularly to improving speech transcription through a human-machine collaboration system.
2. Description of Related Art
Quickly and reliably producing written speech transcripts is an essential component of many enterprises, from language acquisition research to new tools for spoken note taking. Transcription tools have been in use for decades, but unfortunately their development has not kept pace with the progress in recording and storage systems. It is easier and cheaper than ever to collect a massive multimedia corpus, but as the size of the dataset grows, so does the challenge of producing high-quality, comprehensive annotations. Speech transcripts, among other annotations, are critical for navigating and searching many multimedia datasets.
Speech transcription technologies can be divided into two basic categories: entirely manual and entirely automatic. Manual speech transcription relies on a human to listen to audio and produce a written transcript. Entirely automatic methods replace the human with software that can process an audio stream into a textual output. Automatic speech recognition technologies are finding their way into some everyday applications, such as telephone-based menu systems, but accuracy and robustness are still limitations. For example, a telephone-based system for checking airline flight status can be structured to limit the range of spoken responses by a user, and can be successful while only recognizing a few words. This is a very different challenge from recognizing spontaneous speech in a natural, person-to-person dialog. In addition, contextual knowledge and other cues that enable a person to resolve ambiguities and accurately transcribe speech are often missing from current speech recognition systems. If accurate transcription of natural speech is the goal, entirely manual systems hold many advantages, but their disadvantage is the high price in terms of human labor. Entirely manual transcription is surprisingly time-consuming, and it is not uncommon for the human effort to take an order of magnitude longer than the actual audio duration. Especially for today's massive corpora, an improved methodology is needed. The invention addresses this need.