The present invention relates generally to video conferencing and more specifically to a system and associated methodology for presenting several participants located at different endpoints on a single monitor using a dynamic layered multi-site video conferencing system.
In a conventional video conferencing system, participants are represented in a video stream displayed in its entirety on a video endpoint. When more than two sites are communicating with each other, the sites are either shown one after another, such as in voice switching, or in matrix form spread over one or multiple monitors. However, participants in such video conferences frequently experience issues that prevent video conferencing from becoming a standard form of communication.
For example, each participant is displayed in a separate ‘window’ rather than being displayed as in the same room. Participants are also scaled differently, so that participants sharing a single site, such as when several participants are located in a single meeting room, are displayed at a smaller scale than participants that do not share a common site, such as those joining from a personal endpoint. Additionally, the appearances of the participants are confined within the borders of their respective video streams displayed on the monitor, making all participants appear smaller than the monitor screen would potentially allow. The above-described problems are exacerbated as more sites are presented on the display. Together, these limitations also make it harder to identify the participant who is currently speaking.
The experienced quality of a video conference is defined by the degree of “natural communication” or tele-presence. This includes optimal eye contact, the sensation of being in the same room, life-size representation of participants, and being focused on the same discussion. Eye contact, for example, plays a large role in conversational turn-taking, perceived attention and intent, and other aspects of group communication. However, video conferencing systems may give the incorrect impression that the remote interlocutor is avoiding eye contact.
Further, when more than one site (endpoint) is introduced on a monitor at the same time, the immersive sensation of the video is diminished since the participants displayed on the monitor are often represented differently. Hence, a traditional multisite video conference has poor quality in terms of experienced natural communication or tele-presence.