Multipoint videoconferencing typically involves a number of conferees or endpoints. An endpoint can provide speech; speech and video; or speech, data, and video. In order to present two or more conferees simultaneously, a multipoint control unit (MCU) that conducts the videoconference composes the video images coming from two or more locations into a single layout that is transferred to the different participants. Such a composed layout is also referred to as a continuous presence (CP) layout. The MCU receives several media channels from access ports. According to certain criteria, the MCU processes audiovisual and data signals and distributes them to the connected channels. Examples of MCUs include the MGC-100, which is available from Polycom Inc. Additional information about the MGC-100 can be found at the website www.polycom.com, which is incorporated herein by reference. A more thorough definition of an endpoint (terminal) and an MCU can be found in the International Telecommunication Union (“ITU”) standards such as, but not limited to, the H.320, H.324, and H.323 standards, which are incorporated herein by reference. (The ITU is the United Nations Specialized Agency in the field of telecommunications. Additional information regarding the ITU can be found at the website address www.itu.int, which is incorporated herein by reference.)
Usually the location of the participants in a CP display changes during a conference, depending on the dynamics of the conference. FIG. 1 illustrates different snapshots of 2×2 layouts during different periods of the conference. A 2×2 layout is a layout in which up to four participants out of the total number of current participants are displayed. The number of current participants can be four or more, but at any given moment a maximum of four conferees can be displayed. Which conferees are displayed at a given time depends on selection criteria that can be defined when reserving or establishing the conference. For example, one criterion may be that the four currently loudest conferees are displayed.
The mixed audio transmitted with a 2×2 layout can include the audio of the four displayed participants. Because the four loudest conferees can vary, the locations of the conferees on the display change with the dynamics of the conference.
For example, layout 100 is a snapshot during the time in which conferees A, B, C, and D are the loudest conferees and are therefore displayed on the display. Layout 110 is a snapshot of another period in the same conference in which conferee E is louder than B and therefore conferee B is removed from the layout and conferee E replaces her/him. Layout 110 includes conferees A, E, C, and D. Layout 120 is a snapshot in which conferee B is louder than C and therefore conferee C is removed from the layout and conferee B replaces her/him. The above three layouts demonstrate the dynamics of a conference.
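The layout changes described above can be illustrated with a minimal sketch. It assumes that the selection criterion is the four loudest conferees and that a newly selected conferee takes the position vacated by the conferee being removed, as the snapshots of FIG. 1 suggest. The function name update_layout and the loudness values are hypothetical and are used only for illustration; an actual MCU may apply other criteria.

```python
def update_layout(layout, loudness, size=4):
    """Return a new layout in which conferees who are no longer among
    the `size` loudest are replaced, in place, by newly selected ones.

    layout   -- list of currently displayed conferees, one per position
    loudness -- dict mapping each conferee to a loudness measure
                (hypothetical values; higher means louder)
    """
    # Rank all conferees from loudest to quietest and keep the top `size`.
    ranked = sorted(loudness, key=loudness.get, reverse=True)
    selected = ranked[:size]
    selected_set = set(selected)

    # Conferees that were selected but are not yet displayed.
    newcomers = [c for c in selected if c not in layout]

    # Pad with empty positions if fewer than `size` are displayed.
    slots = list(layout) + [None] * (size - len(layout))

    new_layout = []
    for current in slots:
        if current in selected_set:
            # Still among the loudest: keep the same position.
            new_layout.append(current)
        else:
            # Vacated or empty position: fill it with the next newcomer.
            new_layout.append(newcomers.pop(0) if newcomers else None)
    return new_layout


layout_100 = ["A", "B", "C", "D"]
# Conferee E becomes louder than B.
layout_110 = update_layout(layout_100, {"A": 9, "B": 2, "C": 7, "D": 6, "E": 8})
print(layout_110)  # ['A', 'E', 'C', 'D']
# Conferee B becomes louder than C.
layout_120 = update_layout(layout_110, {"A": 9, "B": 7, "C": 1, "D": 6, "E": 8})
print(layout_120)  # ['A', 'E', 'B', 'D']
```

Keeping the remaining conferees in their positions, rather than re-sorting the whole layout by loudness, limits how much of the display changes at each update.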
In common multipoint conferencing systems the mixed audio is mono and cannot deliver any impression of the location of the image of its source on the screen. However, in order to improve the user experience, it is desirable to be able to associate the direction from which a participant's voice is heard with the location of the participant on the display.
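One generic way to relate a perceived direction to a horizontal screen position is a constant-power pan law, sketched below purely for illustration. This is a common audio technique and is not asserted to be the method of any particular conferencing system or prior art reference. The function name pan_gains and the normalized coordinate x_center (0.0 at the left edge of the display, 1.0 at the right edge) are assumptions made for this sketch.

```python
import math


def pan_gains(x_center):
    """Return (left_gain, right_gain) for a source whose image is
    centered at horizontal position x_center on the display.

    x_center is an assumed normalized coordinate: 0.0 is the left
    edge of the display and 1.0 is the right edge.
    """
    # Clamp to the visible range so out-of-range input stays valid.
    x = min(max(x_center, 0.0), 1.0)
    # Constant-power pan law: left^2 + right^2 == 1 for every x,
    # so the overall loudness does not depend on the position.
    angle = x * math.pi / 2
    return math.cos(angle), math.sin(angle)


# In a 2x2 layout the left column is centered at 0.25 and the
# right column at 0.75 of the display width.
left_column = pan_gains(0.25)
right_column = pan_gains(0.75)
print(left_column, right_column)
```

With such gains, the voice of a conferee displayed in the left column would be weighted toward the left loudspeaker and the voice of a conferee in the right column toward the right loudspeaker.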
There are a few prior art references teaching methods and/or systems for creating synthetic stereo audio that relates to a virtual location. For example, U.S. Pat. No. 6,408,327, the entire contents of which are incorporated herein by reference, discloses a method and system for facilitating synthetic stereo audio conferencing of a plurality of users over a local or wide area network. However, the prior art does not provide a videoconferencing system in which the mixed stereophonic audio is a function of the location of the talkers in the current conference layout. Therefore, there is a need to improve the experience of multimedia multipoint users by associating a conferee's voice with the location of the conferee on the display.