Typically, a camera in a videoconference captures a view that fits all the participants. Unfortunately, far-end participants may lose much of the value in the video because the near-end participants may appear too small when displayed at the far end. In some cases, the far-end participants cannot see the facial expressions of the near-end participants and may have difficulty determining who is actually speaking. These problems give the videoconference an awkward feel and make it hard for the participants to have a productive meeting.
To deal with poor framing, participants have to intervene and perform a series of operations to pan, tilt, and zoom the camera to capture a better view. As expected, manually directing the camera with a remote control can be cumbersome. Sometimes, participants just do not bother adjusting the camera's view and simply use the default wide shot. Of course, when a participant does manually frame the camera's view, the procedure has to be repeated if participants change positions during the videoconference or use a different seating arrangement in a subsequent videoconference.
Voice-tracking cameras having microphone arrays can help direct cameras during a videoconference toward participants who are speaking. Although these types of cameras are very useful, they can encounter some problems. When a speaker turns away from the microphones, for example, the voice-tracking camera may lose track of the speaker. In a very reverberant environment, the voice-tracking camera may point at a reflection point rather than at an actual sound source. Typical reflections can be produced when the speaker turns away from the camera or when the speaker sits at an end of a table. If the reflections are troublesome enough, the voice-tracking camera may be guided to point to a wall, a table, or other surface instead of the actual speaker.
One solution, as disclosed in U.S. Pat. No. 8,248,448, which is hereby incorporated by reference, was to use two different cameras, one for a wide shot and one for speaker shots. The speaker view was aimed based on voice-tracking, while the wide shot remained fixed. The wide shot was used while the speaker view camera was transitioning between speakers. Once the speaker view camera had been re-aimed at the new speaker, the speaker view camera image was used again. This wide view/speaker view arrangement allowed the displayed view to change from one speaker to another without distracting camera motion, but it required the use of two cameras.
For these reasons, it is desirable during a videoconference to be able to tailor the view of participants dynamically based on the meeting environment, arrangement of participants, and the persons who are actually speaking. The subject matter of the present disclosure is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.