When meetings are held among people who are not in the same location, a fluid back-and-forth conversation is usually difficult because of added problems such as latency, sound location, and poor fidelity. Moreover, certain forms of communication, such as subtle visual or aural cues, may be missed during a videoconference or teleconference because they occur out of the range of a camera or a microphone. For example, a participant may make a quiet sigh or tap his fingers, signaling impatience or boredom, and those cues may not be captured or conveyed during the conference to other participants.
In teleconferencing, some aural cues can break into an ongoing flow of a conversation, and participants may spend two or more seconds exchanging confusing requests and answers, such as “What? . . . ” “No . . . ” “Did someone ask something?” “No, you go ahead . . . ” Such exchanges can be all the more frustrating for conferences having three or more remote locations. Moreover, having one or more participants talking on cellular phones during the conversation can aggravate these problems due to the higher latency and lower-fidelity audio of cellular phones.
The most commonly accepted solutions to enhance a videoconference use full-duplex audio and good quality video. In full-duplex audio, both sides of a two-way conference can speak at the same time. Although this may make participants aware of an interruption, the audio does not effectively indicate which participant spoke or vocalized. As meetings grow to three or more parties, full-duplex audio rapidly becomes less effective.
With point-to-point sessions, participants can see each other and tell when one is signaling, raising an eyebrow, or opening their mouth. However, in multipoint sessions using composite displays of multiple participants, participants may not be able to easily tell which participant is doing what. Additionally, in a switched multipoint video session where the display switches between showing different locations of the multipoint session, the switching between views of participants can take considerable time and add confusion. Moreover, a built-in delay between switching may be used during the session so that the system does not switch views between locations unless audio of a particular length comes from participants at a location other than the one being currently displayed.
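By way of illustration only, the built-in switching delay described above can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation of any particular conferencing system: the class name `VoiceSwitcher`, the method `on_audio`, and the two-second threshold `min_talk_seconds` are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoiceSwitcher:
    """Hypothetical voice-activated view switcher with a hold-off delay."""

    min_talk_seconds: float = 2.0     # assumed threshold, chosen for illustration
    displayed: Optional[str] = None   # location currently shown
    candidate: Optional[str] = None   # other location that is currently talking
    candidate_since: float = 0.0      # time at which the candidate started talking

    def on_audio(self, timestamp: float, location: Optional[str]) -> Optional[str]:
        """Feed one audio observation and return the location to display.

        `location` is the site whose audio is active at `timestamp`,
        or None when no site is talking.
        """
        if self.displayed is None and location is not None:
            # Nothing is shown yet: display the first site that talks.
            self.displayed = location
            self.candidate = None
        elif location is None or location == self.displayed:
            # Silence, or the displayed site is talking: drop any pending switch.
            self.candidate = None
        elif location != self.candidate:
            # A different site started talking: start timing it.
            self.candidate = location
            self.candidate_since = timestamp
        elif timestamp - self.candidate_since >= self.min_talk_seconds:
            # The other site has talked long enough: switch the view.
            self.displayed = location
            self.candidate = None
        return self.displayed


switcher = VoiceSwitcher(min_talk_seconds=2.0)
frames = [(0.0, "A"), (1.0, "B"), (1.5, "B"), (2.0, "A"),
          (3.0, "B"), (4.0, "B"), (5.0, "B"), (5.5, "B")]
views = [switcher.on_audio(t, loc) for t, loc in frames]
print(views)
```

In this sketch, the short interjection from location B between 1.0 and 1.5 seconds never causes a switch, so viewers at other locations would not see who interjected; the view changes to B only after B has talked continuously for the assumed two seconds.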
While a good video and audio connection can help, even a good connection does not necessarily cure the problems noted above. Further, the very best video connections (such as immersive telepresence) are usually unavailable to most participants.
The subject matter of the present disclosure is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.