A videoconferencing service is a multimedia communication service that uses television equipment and a communications network to hold a conference, and can provide image, voice, and data interaction between two sites or among multiple sites. A videoconferencing system generally includes video terminal equipment, a transmission network, and a multipoint control unit (MCU). The video terminal equipment mainly includes a video input/output device, an audio input/output device, a video codec, an audio codec, an information communications device, and a multiplexing/signal distribution device. Its basic functions are to compress and encode the image signals captured by a local camera and the audio signals picked up by a microphone, send the encoded signals over the transmission network to a remote conference site, and, at the same time, receive the digital signals sent from the remote conference site and decode them to restore analog image and audio signals.
The videoconferencing service provides long-distance audio and video communication. As the technology has developed, telepresence systems have emerged, which make remote communication feel like face-to-face communication. Current telepresence systems use videoconferencing technologies to transmit images and sounds remotely and integrate a full set of peripherals: for example, large LCD televisions display participants at “true-to-life dimensions”, camera-based processing technologies enable “eye to eye” communication, and a complete decoration solution for the conference room is used to achieve a highly realistic telepresence effect.
Current telepresence systems achieve a comparatively realistic effect. However, in newly emerged double-row or multi-row telepresence, there is a certain distance between the front row and the back row. Current telepresence systems can only map an image direction to a sound direction on a single plane. That is, sounds from both the front row and the back row are reproduced from the same plane. Without seeing the image, a listener cannot distinguish whether a sound comes from the front row or the back row. Therefore, the on-site feeling of the sound is not realistic.
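The single-plane limitation described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the sound direction is rendered by constant-power stereo panning driven only by the talker's horizontal position in the image; the text does not state which rendering method existing systems use. The function names `planar_pan_gains` and `render_gains` and the `depth` parameter are hypothetical.

```python
import math
from typing import Tuple


def planar_pan_gains(x: float) -> Tuple[float, float]:
    """Constant-power pan (an assumed rendering method, for illustration).

    x is the talker's horizontal position in the image,
    from 0.0 (far left) to 1.0 (far right).
    Returns (left_gain, right_gain).
    """
    x = min(max(x, 0.0), 1.0)      # clamp to the valid range
    angle = x * math.pi / 2        # 0 .. pi/2
    return math.cos(angle), math.sin(angle)


def render_gains(x: float, depth: float) -> Tuple[float, float]:
    """Hypothetical single-plane renderer.

    depth (distance of the talker's row from the camera) is accepted
    but deliberately ignored: only the horizontal position affects
    the output, which is the limitation being illustrated.
    """
    return planar_pan_gains(x)


# Two talkers at the same horizontal position, one per row.
front_row = render_gains(0.25, depth=0.0)
back_row = render_gains(0.25, depth=1.5)

# The loudspeaker gains are identical, so the listener cannot tell
# the rows apart by sound alone.
print(front_row == back_row)
```

Because the row distance never enters the gain computation, a front-row talker and a back-row talker at the same horizontal position produce the same output, which matches the problem stated above.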