1. Field of the Invention
This invention pertains to technologies for translating multi-track, multi-speaker audio conversations, in both real time and non-real time, for applications such as live conferences, movies, television broadcasts, multimedia presentations, streaming audio, streaming video, and the like.
2. Background of the Invention
There are many scenarios in which multiple speakers may speak simultaneously, and in which the audio of one or more of the speakers must be translated into one or more secondary languages. These scenarios may be divided into two main categories: (a) real-time or live translation, and (b) post or non-real-time translation.
Real-time translations are required during live broadcasts or live meetings, such as a United Nations plenary session. During these sessions, speakers of hundreds of languages may be present, such that when one speaker is talking in a primary language, live translators interpret that speaker's phrases and provide translated audio to speakers of other languages (i.e., listeners) in real time, as speeches are given.
Non-real-time or post translations are translations which may be made after the fact, or after the complete delivery of a speech. Movie audio tracks, and audio tracks of previously recorded streaming video, are two such scenarios, in which there may be less demand for speed of translation, but more demand for synchronization of the translated audio to other events, such as scenes in video. However, in some other post-processing scenarios, there may be a near-real-time demand, such as the translation of podcasts following a live online event, or following the uploading of a podcast in a primary language.