Media presentations typically comprise data used to render one or more components of the presentation, such as the video and audio components of a multimedia clip. The presentation may be associated with additional data commonly referred to as metadata. Metadata may describe the data component(s) comprising the presentation and/or may provide information about the contents of the presentation.
For example, metadata may comprise text representing a transcript of an audio component of a presentation. The transcript may include text describing the contents of the audio component, such as words, music, and/or other sounds. When rendering a presentation, the metadata may be used in addition to the data comprising the presentation. Returning to the example of transcript metadata, the transcript may be used to render subtitles or closed-caption text as part of or alongside the media presentation. Alternatively, the transcript could be provided separately for later review.
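By way of illustration only, the following minimal sketch shows how transcript metadata might be turned into subtitle text. It assumes that the transcript is available as timed cues and that SubRip-style (SRT) text is an acceptable output; the `Cue` type and the `format_timestamp` and `to_srt` helpers are hypothetical names introduced for this example and do not correspond to any particular authoring tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One timed piece of transcript metadata (hypothetical structure)."""
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str     # words or a description of a sound, e.g. "[music]"


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT-style timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues):
    """Render cues, ordered by start time, as SRT-style subtitle text."""
    blocks = []
    for index, cue in enumerate(sorted(cues, key=lambda c: c.start), start=1):
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}")
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Illustrative cues only; not taken from any real presentation.
    print(to_srt([Cue(0.0, 2.5, "Hello."), Cue(2.5, 4.0, "[music]")]))
```

The same cue list could instead be written out as plain text for later review, since the timing information is only needed when the text is rendered alongside the presentation.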
Media presentations can be created and/or edited using various types of authoring tools. One example of an authoring tool is ADOBE® PREMIERE PRO®, available from Adobe Systems Inc. of San Jose, Calif. Using an authoring tool, video, audio, and/or other components from different sources can be combined into a composite presentation. If desired, effects can be added. As one example, a group of video clips with associated audio can be combined into a composite presentation such as a newscast that features transitions between scenes, voice-overs, credits, and other effects that tie the components into a unified media product.
Multiple video or audio components are often stacked in layers on different tracks so that the components overlap the same time locations in the program to create composited effects, cutaways from the main subject, etc. These tracks may contain multiple audio clips that are mixed at different volumes so the audience (ideally) hears the intended primary audio track. For example, a documentary on education may contain a segment with a video clip of a classroom lecture with a professor talking to students while a narrator describes the class via a different audio track. At the same time, a music clip may also be playing that contains lyrics from a song. This segment is mixed so the narration can be clearly heard over the other audio elements playing at the same time.
Each of the separate audio clips may contain or may be associated with a text transcript of the spoken words recorded in that clip. However, if multiple audio components on different tracks play at the same time in the composite presentation, a transcript automatically generated from all of the tracks may be confusing or unusable because the text from the different components is intermingled.
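The intermingling can be illustrated with a minimal sketch, assuming that each clip's transcript is available as a list of words with start times. The `Word` type, the `naive_merged_transcript` helper, and the sample words and timings are all hypothetical and serve only to show the problem, not any particular tool's behavior.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its start time (hypothetical structure)."""
    start: float  # start time in seconds on the program timeline
    text: str
    track: str    # name of the track the word came from


def naive_merged_transcript(tracks):
    """Combine the words of every track purely by time, ignoring the track."""
    words = [word for track in tracks for word in track]
    words.sort(key=lambda w: w.start)
    return " ".join(w.text for w in words)


# Three overlapping audio components, loosely following the documentary example.
lecture = [Word(0.0, "Today", "lecture"), Word(0.6, "we", "lecture"),
           Word(1.2, "study", "lecture"), Word(1.8, "fractions", "lecture")]
narration = [Word(0.3, "This", "narration"), Word(0.9, "class", "narration"),
             Word(1.5, "meets", "narration"), Word(2.1, "daily", "narration")]
lyrics = [Word(0.1, "Oh", "lyrics"), Word(1.0, "yeah", "lyrics")]

merged = naive_merged_transcript([lecture, narration, lyrics])
print(merged)
# prints: Today Oh This we class yeah study meets fractions daily
```

Each track reads sensibly on its own ("Today we study fractions", "This class meets daily"), but ordering all of the words by time alone yields text in which no speaker's sentence survives intact.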