Before our invention, there were many situations in which it was necessary to create a transcription of a multimedia file that was synchronized with the original multimedia file. This situation was particularly relevant in fields pertaining to the transcription and/or translation of multimedia video data, the maintenance of media databases, and the preparation of caption data for televised programming.
Presently, transcripts of multimedia data are created using automatic speech recognition (ASR) and/or automatic translation tools. Unfortunately, initial draft transcriptions generated by ASR often need to be edited in order to provide a correct textual representation of the original media data stream file. Typically, the editing process destroys the time-alignment between the various media streams and the edited transcribed/translated text. Therefore, there exists a need for a cost-effective, standard user-based methodology for editing time-aligned transcripts, annotations to the time-aligned transcripts, and translations of the transcripts.
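The alignment problem described above can be sketched concretely. The following is a hypothetical illustration, not the claimed invention: a time-aligned transcript stores each word together with the start and end times of the audio it covers, and a naive text-editing workflow that flattens the transcript to a plain string discards that timing information. The data structure and function names here are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class AlignedWord:
    """One transcript word with its timing in the media stream."""
    text: str
    start: float  # seconds from the start of the media stream
    end: float

# Draft transcript as an ASR engine might produce it, with per-word timing.
asr_draft = [
    AlignedWord("the", 0.00, 0.18),
    AlignedWord("quick", 0.18, 0.52),
    AlignedWord("brow", 0.52, 0.90),   # ASR error: should be "brown"
    AlignedWord("fox", 0.90, 1.25),
]

def to_plain_text(words):
    """Flatten the aligned transcript into editable plain text."""
    return " ".join(w.text for w in words)

# A typical editing workflow corrects the text as a plain string...
edited_text = to_plain_text(asr_draft).replace("brow", "brown")

# ...but the corrected string no longer carries any timestamps:
# the word-level alignment to the media stream has been lost.
print(edited_text)  # the text is fixed, yet no timing information remains
```

Preserving the correspondence between the corrected words and their original time intervals, rather than editing a flattened string, is precisely what a time-aligned editing methodology must provide.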