Participants in face-to-face meetings benefit from paralinguistic cues, such as expression and gesturing, that facilitate communication between humans. Conventional video conferencing systems provide video images and audio of meeting participants, but attenuate or fail to capture these cues. The quality of the resulting communication invariably suffers.
Some conventional video conferencing systems employ a video “grid” approach as illustrated in FIG. 1. While this approach enables each participant to see every other participant, it suffers from a number of drawbacks. The video grid is an unnatural visual arrangement for a meeting. The video grid also accentuates the fact that the participants are physically and geographically distributed, rather than attempting to minimize this effect. Furthermore, the video grid does not allow a participant to direct eye contact and gestures toward a particular participant, but instead broadcasts such gestures to all participants.
FIG. 2 illustrates an expensive, dedicated room-based video conferencing system that attempts to emulate a face-to-face, table-style meeting. Such room-based systems require similarly configured rooms at all participating locations, each equipped with multiple high-end cameras, flat panel displays, and speakers, and require high-bandwidth connectivity between the locations.
A recent trend in remote conferencing is to have each conference participant control an animated avatar that represents the participant at a specific location in a three-dimensional virtual world. While these virtual environments offer several advantages, such as freedom from physical constraints and the perception of meeting in the same location, they also suffer from a number of disadvantages, such as failing to provide a capability for each participant to see each other participant's facial expressions, reactions, gestures, enthusiasm, and interest or lack of interest.