1. Field of the Invention
The present invention relates to video teleconference technology. In particular, the present invention relates to voice-activated tracking by a camera of a speaking participant of a video teleconference.
2. Discussion of the Related Art
One feature desired in video teleconference equipment is the ability to automatically steer the camera to a participant when he or she speaks. Clearly, before the camera can be steered, it is necessary to locate the speaking participant (“speaker”) based on detection of his or her voice, while rejecting noise resulting, for example, from multiple paths and interference from other sounds in the environment.
Speaker location is typically achieved by processing the sound received at a large number of microphones, such as disclosed in U.S. Pat. No. 5,737,431. One conventional method is based on estimations of “time delays of arrival” (TDOA) of the same sound at the microphones, modeling the sound source as a point source with circular wavefronts. A second method is based upon a TDOA estimation at each pair of microphones, modeling the sound source as a far field source with planar wavefronts. In that second method, each TDOA estimate provides the direction of sound with respect to a pair of microphones, such as described in U.S. Pat. No. 5,778,082. Typically, regardless of the method used, to accurately determine the location of the speaker, a large number of microphones must be employed to allow an optimization step (e.g., a least-square optimization) to estimate the location of the speaker. Under the prior art methods, four microphones are insufficient to reliably estimate the speaker location.
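As an illustration of the far field model mentioned above, the following minimal sketch converts one TDOA estimate at a microphone pair into a direction angle using the textbook planar-wavefront relation sin(theta) = c * tau / d. It is not taken from the cited patents; the function name `far_field_angle`, the speed-of-sound constant, and the convention of measuring the angle from the broadside of the pair are assumptions made for illustration.

```python
import math

# Assumed speed of sound in air at about 20 degrees C, in metres per second.
SPEED_OF_SOUND_M_PER_S = 343.0


def far_field_angle(tdoa_s: float, mic_spacing_m: float,
                    speed_m_per_s: float = SPEED_OF_SOUND_M_PER_S) -> float:
    """Return the direction of arrival in radians, measured from broadside.

    Hypothetical helper: under a planar-wavefront model, the extra path
    length to the farther microphone is d * sin(theta) = c * tau.
    """
    if mic_spacing_m <= 0:
        raise ValueError("microphone spacing must be positive")
    ratio = speed_m_per_s * tdoa_s / mic_spacing_m
    # A noisy TDOA estimate can push the ratio slightly outside [-1, 1];
    # clamp it so that asin stays defined.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Example: microphones 0.2 m apart, sound arriving 0.1 m "later" at one of them.
angle = far_field_angle(0.1 / SPEED_OF_SOUND_M_PER_S, 0.2)
print(math.degrees(angle))
```

A single pair yields only a direction, not a position, which is consistent with the observation above that such methods combine many microphones in an optimization step.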
Once the position of the speaker is determined, a camera is steered towards the location. Unfortunately, because of noise and the acoustics of the environment, the position determined can vary constantly, which can result in undesirable camera movements. One solution, which is described in a copending patent application entitled “Voice-activated Camera Preset Solution and Method of Operation”, by Joon Maeng, Ser. No. 08/647,225, filed on May 9, 1996, zooms out to cover a larger area when the speaker position is found to alternate between two adjacent regions. In addition, reflections from the ceiling, the floor, the walls, and table-tops also create false source locations. Camera shots of table-tops or the floor resulting from false source locations can be annoying.
The present invention provides a method for avoiding invalid positioning of a camera in a video conference. The method of the present invention includes: (a) establishing a boundary outside of which the camera is prohibited from being focused; (b) receiving a new position for focusing the camera, the new position corresponding to a position of an active speaker; (c) determining if said new position is outside of the boundary; and (d) directing the camera to the new position, when the new position is within the boundary, and directing the camera to an adjusted position within the boundary, when the new position is outside of the boundary.
In one embodiment, the boundary includes a maximum vertical extent and a minimum vertical extent corresponding respectively to expected maximum and minimum heights of a speaker. In that embodiment, when the new position is above the maximum vertical extent, the adjusted position is at or below the maximum vertical extent. Similarly, when the new position is below the minimum vertical extent, the adjusted position is at or above said minimum vertical extent. Further, a second boundary outside of said first boundary can be established. When the new position is outside the second boundary, the new position is disregarded, and the camera is not redirected.
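The two-boundary behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: it treats only the vertical coordinate, places the adjusted position exactly on the nearest vertical extent (one permitted choice among "at or below" and "at or above"), and uses hypothetical names (`VerticalBoundary`, `adjust_height`) and example heights in metres.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VerticalBoundary:
    """Minimum and maximum vertical extents, in any consistent unit."""
    minimum: float
    maximum: float

    def contains(self, height: float) -> bool:
        return self.minimum <= height <= self.maximum


def adjust_height(new_height: float,
                  first: VerticalBoundary,
                  second: VerticalBoundary) -> Optional[float]:
    """Return the height to aim the camera at, or None to leave it in place.

    `first` is the boundary the camera must stay within; `second` is the
    wider boundary beyond which a reported position is disregarded.
    """
    if not second.contains(new_height):
        return None  # outside the second boundary: do not redirect the camera
    # Inside the first boundary the position is used unchanged; otherwise it
    # is pulled back to the nearest vertical extent of the first boundary.
    return min(max(new_height, first.minimum), first.maximum)


# Assumed example values: speakers expected between 0.9 m and 2.0 m,
# positions beyond 0.3 m .. 2.6 m treated as false source locations.
first = VerticalBoundary(minimum=0.9, maximum=2.0)
second = VerticalBoundary(minimum=0.3, maximum=2.6)

for reported in (1.5, 2.3, 0.5, 3.0):
    print(reported, "->", adjust_height(reported, first, second))
```

Returning `None` for a disregarded position keeps the decision not to move the camera separate from the choice of target, so a caller can simply skip the steering command in that case.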
In one embodiment, the present invention is applied to a video conferencing system. The video conferencing system includes: (a) a number of microphones and a camera positioned in a predetermined configuration, each microphone providing an audio signal representative of sound received at the microphone; (b) a position determination module which provides, based on the audio signals of the microphones and the predetermined configuration, a new position of a sound source; and (c) a camera control module directing the camera towards the sound source using the new position of the sound source. In that system, when the new position corresponds to a position outside a predetermined boundary, the camera control module directs the camera to an adjusted position within said boundary.