Virtual conferencing in the form of video conferencing has become widely available in the past decade. Video conferencing provides a convenient way for participants to “meet” without traveling to be physically together. In addition to saving the time and cost associated with traveling, video conferencing is environmentally friendly, as it can help avoid unnecessary driving and flying. In spite of these advantages, video conferencing is under-utilized today, and people still travel long distances for face-to-face meetings. This is because many people find video conferencing to be a poor substitute for face-to-face meetings.
One of the reasons video conferencing is unsatisfactory is the loss of eye contact and gaze information. Studies have shown that spatial distortions of eye contact have a negative impact on effective communication in video conferencing. Conference participants like knowing who is focusing on whom and whether anyone is focusing on them, and the lack of this information makes video conferencing impersonal, uncomfortable, and ineffective for many people. Moreover, the absence of eye gaze information can even lead to miscommunication. For example, in a video conference with multiple people, it is sometimes difficult to tell exactly whom the speaker is talking to. When the speaker asks, “Could you handle that?” at the end of a long job description, multiple people could assume that they are each being asked to handle the job. The possibility of this type of miscommunication leads people to avoid conducting important communications via video conference, forcing them to travel.
Ideally, a video conference system should allow participants to interact with one another, select whom or what they want to focus on, and know who is interacting with whom. However, most existing video conferencing systems do not offer such features. Instead, the existing video conferencing systems typically deliver videos the same way to each participant, usually at the maximum allowable resolution and frame rate. In particular, the existing systems do not allow participants to customize their interactions with other participants, or view the interactions between other participants. As a result, interaction among the participants is limited in existing video conferencing systems.
Although some existing video conferencing systems can deliver videos of participants based on the participants' activity level (e.g., detecting a certain voice level and subsequently delivering video of that speaker to the participants), it is still the video conferencing systems, rather than the participants, that determine the source of the videos and how those videos are delivered. Furthermore, confusion can arise when several participants speak at the same time, because the video conferencing systems may not be able to determine to which individuals the various communications are directed. This makes it difficult for participants to determine who is talking to whom (or who is focusing on whom), or what another participant is focusing on. For example, when a first participant says “hello,” the same “hello” video will be delivered to the terminals of the other participants and displayed the same way on their screens. None of the other participants can be sure to whom the first participant is actually speaking. This confusion makes video conferences less natural because participants often need to guess the direction of communications, which limits the level of interaction among the participants during the video conference.
As such, there is a need for a virtual conferencing system that is capable of conveying accurate gaze information to the participants.