Telepresence refers to an advanced remote videoconferencing system that is popular among high-end users because it gives a true sense of being present on the scene. In a telepresence system, auditory positioning, life-size imaging, and eye contact directly determine whether users have an immersive experience, and are therefore key technical indicators in evaluating the telepresence system.
In a traditional videoconferencing system, a videoconferencing terminal, in addition to providing a stream auxiliary to the video, generally serves to encode and send an audio stream and/or a video stream, and to receive, decode, and output an audio stream and/or a video stream. Since there is only one sound input source and one sound output, a user cannot sense from which direction in the meeting room a sound comes. Since there is only one video input source and one video output, the whole meeting room has to be captured and encoded in a single frame at the local end; in a multipoint conference, one can choose to watch only the picture of one meeting room or a mosaic picture of multiple remote meeting rooms. Neither the sent video nor the received video can meet the requirement for life-size display.
In contrast, the user experience required of a telepresence system calls for multiple audio and video streams, information on the direction from which each audio stream comes so as to achieve auditory positioning, and display of a life-size image of a remote conferee based on a projected requirement. A single meeting room therefore generally needs to be provided with multiple video inputs and multiple video outputs. At present, some telepresence terminals are obtained by integrating traditional videoconferencing terminals: multiple videoconferencing terminals are deployed in a single meeting room, each connected to its own audio-video input/output devices, and auditory positioning and life-size display are then substantially achieved through techniques for deploying and assembling those input/output devices. However, when multiple videoconferencing terminals are deployed in a single meeting room, each terminal generally has to be called separately, so with such integration it is difficult to call a single conference ID number, to implement stream synchronization, or the like. What is more, integrating multiple terminals complicates system deployment, which then requires personnel specialized in integration and deployment; any minor problem arising in use requires on-site maintenance by such personnel, which poses a major obstacle to the promotion of a high-end application such as telepresence. Moreover, not all of the videoconferencing terminals in the integrated system have their functions fully used, which wastes resources to some extent. In addition, such a complicated and non-standardized integration solution makes it very difficult for telepresence systems deployed by different manufacturers to intercommunicate.