Recent advances in speech-to-speech translation and automatic multi-media transcription have led to functioning, deployed systems for translating and transcribing lectures and multi-media presentations. Such systems can be deployed via a client-server architecture or as local installations. They provide automatic transcription and translation of lectures, speeches, and multi-media presentations either in real time, as simultaneous interpretation systems, or as a post-hoc processing step after a presentation has been recorded and archived. They permit an audience to search, retrieve, read, translate, and generally better discover lecture, multi-media, or spoken material that was formerly inaccessible because of its spoken form. The output is presented to an audience acoustically or textually on various devices, either locally or via the internet in a browser on a listener's personal device or PC.
As listeners follow a lecture or multi-media presentation in a language that they do not understand, additional forms of support become desirable. For example, in addition to understanding the spoken content of a lecture or multi-media presentation, a user also wishes to understand the presenter's visual presentation materials and to relate what the presenter is saying to those materials.