Telepresence is popular among high-end users because it gives a true sense of being on the scene. Auditory positioning and life-size imaging are key technical indicators for evaluating a telepresence system.
To achieve life-size imaging and auditory positioning in a telepresence conference system, each display in a terminal meeting room with multiple displays may show an image of the conferees in a remote spatial area according to the direction from which the audio is output. That is, in the meeting room, the display showing a speaker's image corresponds to the direction from which that speaker's audio is output. For example, in a three-display meeting room, if a speaker seated on the left speaks, the sound should be heard from the left. Therefore, the position where a remote image is output has to be determined according to a negotiated strategy, and strict synchronization among the multiple displays is required to achieve a realistic communication effect.
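The correspondence between image position and audio direction described above can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a method taken from this document: the three-display layout `DISPLAYS`, the `Output` type, the `route_speaker` function, and the equal-width split of remote seats into zones are all hypothetical.

```python
from dataclasses import dataclass

# Assumed layout: three displays, each paired with a loudspeaker on the same side.
DISPLAYS = ("left", "center", "right")


@dataclass(frozen=True)
class Output:
    display: str        # display that shows the speaker's image
    audio_channel: str  # loudspeaker that plays the speaker's audio


def route_speaker(seat_index: int, seat_count: int) -> Output:
    """Map a remote seat (0 = leftmost) to a local display and audio channel."""
    if not 0 <= seat_index < seat_count:
        raise ValueError("seat_index out of range")
    # Assumption: remote seats are split into equal-width zones, one per display.
    zone = seat_index * len(DISPLAYS) // seat_count
    side = DISPLAYS[zone]
    # Image and audio are routed to the same side so that sound and picture agree.
    return Output(display=side, audio_channel=side)
```

With six remote seats, for instance, `route_speaker(0, 6)` selects the left display and the left audio channel, which matches the three-display example in the text.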
The inventors have found that in a telepresence conference system, because it is difficult to strictly synchronize multiple video streams that are encoded and transmitted separately, it is hard to meet the strict real-time requirement of a video conference. In addition, there is no solution for determining the position where a video image is output.