With advances in technology and growing user demands, the screen area used for video display, such as a projector or video wall employed in a multimedia communication conference system, is becoming larger and larger. As a result, the images of the conference participants can move within a larger region of the screen. However, the orientation information of the sound played in existing multimedia communication systems does not correspond to the displayed image(s). Here, the orientation information of the sound refers to the direction from which the sound arrives as perceived by a listener, i.e. the location information of the acoustic source. Accordingly, when the position of the image of a speaking participant changes on the screen, the direction of the sound of the speaking participant does not change with it. The position of the speaking participant in the picture therefore does not match the direction of the sound. In other words, the sound heard by the listener does not appear to come from the position of the image of the speaking participant on the screen. This results in a lack of realism in the multimedia (including audio and video) communication.
U.S. Patent Publication No. 2003/0048353 discloses a method for solving the above problem. In this solution, a bar is disposed atop a television. The bar includes multiple microphones, multiple speakers and a video camera. By processing the sound signals collected by the microphones, an audio signal and the orientation information of a speaking participant with respect to the bar, i.e. the location information of the acoustic source, may be obtained. The transmitting end of the video communication system transmits the obtained audio signal and the location information of the acoustic source to the receiving end via network(s). The receiving end selects one or more speakers according to the received location information of the acoustic source. In this way, the location information of the speaking participant may be reproduced at the receiving end.
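The speaker selection at the receiving end can be illustrated with a minimal sketch. The cited publication is not quoted here on how the selection is performed, so everything in the example is an assumption made for illustration: the function name `select_speakers`, the representation of the location information as a single horizontal angle in degrees relative to the bar, and the nearest-angle selection rule are all hypothetical.

```python
def select_speakers(source_angle_deg, speaker_angles_deg, count=1):
    """Return the indices of the `count` speakers closest in angle to the source.

    Assumptions (not taken from the cited publication):
    - the received location information is one horizontal angle in degrees,
      measured relative to the bar (0 = directly in front, positive = right);
    - each speaker of the receiving bar has a known angle in the same convention.
    """
    # Rank speaker indices by angular distance to the reported source angle.
    ranked = sorted(
        range(len(speaker_angles_deg)),
        key=lambda i: abs(speaker_angles_deg[i] - source_angle_deg),
    )
    # Keep the closest `count` speakers, returned in index order.
    return sorted(ranked[:count])


# Hypothetical layout: three speakers at the left, center and right of the bar.
speakers = [-30.0, 0.0, 30.0]
nearest = select_speakers(20.0, speakers)
nearest_two = select_speakers(-10.0, speakers, count=2)
```

With this hypothetical layout, a source reported at 20 degrees selects the right-hand speaker, and a source at -10 degrees with `count=2` selects the left and center speakers.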
In the above existing solution, the location information of the acoustic source collected by the transmitting end is expressed relative to the bar. This may cause the following problem. When the lens of the video camera is initially pointed straight ahead of the bar, a speaking participant directly in front of the bar appears in the center of the picture, and the collected orientation information of the sound of the speaking participant also indicates the direction directly in front of the bar. When the lens of the video camera rotates by an angle from its initial position, the speaking participant directly in front of the bar deviates from the center of the picture, or even leaves the picture. At this time, however, the collected orientation information of the sound of the speaking participant still indicates the direction directly in front of the bar. The collected orientation information of the sound of the speaking participant therefore no longer matches the position of the speaking participant in the picture, i.e. the collected location information of the acoustic source does not match the position of the acoustic source in the picture. This may impair the sense of presence in the multimedia communication.
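The mismatch described above can be made concrete with a small numerical sketch. It is illustrative only and rests on assumptions that are not part of the cited solution: an ideal pinhole camera, a hypothetical horizontal field of view of 60 degrees, and a sign convention in which positive angles lie to the right of the bar. The function name `picture_position` is likewise hypothetical.

```python
import math


def picture_position(source_angle_deg, camera_pan_deg, hfov_deg=60.0):
    """Normalized horizontal position of the source in the picture.

    Returns a value from 0.0 (left edge) to 1.0 (right edge), or None when
    the source lies outside the field of view.

    Assumptions (illustrative, not from the cited publication):
    - ideal pinhole camera with horizontal field of view `hfov_deg`;
    - angles in degrees relative to the bar, positive to the right;
    - `camera_pan_deg` is the rotation of the lens from its initial position.
    """
    # Angle of the source as seen from the rotated camera.
    relative = source_angle_deg - camera_pan_deg
    half_fov = hfov_deg / 2.0
    if abs(relative) > half_fov:
        return None  # the participant has left the picture
    # Pinhole projection onto the image plane, scaled to the range [0, 1].
    return 0.5 + math.tan(math.radians(relative)) / (
        2.0 * math.tan(math.radians(half_fov))
    )


# The participant stays directly in front of the bar (0 degrees) throughout.
centered = picture_position(0.0, 0.0)   # lens in its initial position
shifted = picture_position(0.0, 20.0)   # lens rotated 20 degrees to the right
outside = picture_position(0.0, 40.0)   # lens rotated past the field of view
```

Under these assumptions the participant moves from the center of the picture toward its left edge and then out of view as the lens rotates, while the angle reported relative to the bar remains 0 degrees in all three cases.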