This invention relates in general to the field of video conferencing, and more particularly to an automatic voice tracking camera system and method of operation.
In conventional video conferencing systems, infrared technology has been employed to track the position of a speaker in the video conference. This conventional method uses an IR transmitter and three IR receivers to triangulate the position of the IR transmitter which is carried by the speaker. This type of system may not work well in a conference room environment where a number of persons may talk at any given time.
A second conventional method for tracking a speaker is the use of touch-to-talk microphones. The position of each microphone is preset in order to direct a camera when a speaker touches a microphone to talk. The positions of the microphones are preloaded in the system so that the system knows where each speaker is to be located. This may be undesirable because it requires fixed positions of speakers, limits the movement of speakers, and is not easily portable.
Microphone array technology is being introduced in the video conferencing field in order to improve the reception of a sound and to allow location of the position of the source of the sound. This microphone array technology can be used in both conference room and classroom environments. The position information from such a microphone array is problematic if used to direct a camera because the position information changes continuously due to the movement of speakers and due to errors in locating the position of the speakers.
It is desirable in a video conferencing environment to provide automatic voice tracking of a speaker in order to control cameras such that there is natural camera movement in viewing a given speaker.
In accordance with the present invention, an automatic voice tracking camera system and method of operation are provided that substantially eliminate or reduce disadvantages and problems associated with previously developed video conferencing systems.
According to one embodiment of the present invention, an automatic voice tracking camera system is provided. The system includes a camera operable to receive control signals for controlling a view of the camera. A microphone array includes a plurality of microphones. The microphone array is operable to receive a voice of a speaker and to provide an audio signal representing the voice. A beamformer couples to the microphone array and is operable to receive the audio signal, to generate from the audio signal speaker position data representing a position of the speaker, and to provide the speaker position data. A camera controller couples to the beamformer and to the camera. The camera controller is operable to receive the speaker position data and to determine an appropriate responsive camera movement. The camera controller is further operable to generate camera control signals and to provide the camera control signals to the camera such that the view of the camera automatically tracks the position of the speaker.
According to another embodiment, the present invention provides a method for automatically controlling a camera to track a position of a speaker using the speaker's voice. The method includes the step of receiving the speaker's voice and generating an audio signal representing the speaker's voice. A next step is to process the audio signal to generate speaker position data representing a position of the speaker. Then, the method includes the step of determining an appropriately responsive camera movement from the speaker position data. The method then generates and provides camera control signals to a camera such that a view of the camera automatically tracks the position of the speaker.
A technical advantage of the present invention is the automation of tracking a speaker in a video conference such that the camera views the speaker using only the voice of the speaker to determine the speaker's position.
Another technical advantage of the present invention is the use of two cameras whereby a non-active camera can be used to find and view a new speaker prior to switching between the two cameras. In this manner, a switch to a new speaker does not include a scan between the two speakers.
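By way of illustration only, the two-camera switching described above can be sketched as follows. The class name `DualCameraDirector`, the camera labels "A" and "B", and the representation of a camera view as a single pan angle are assumptions made for this sketch and are not taken from the described system.

```python
class DualCameraDirector:
    """Illustrative sketch: aim the non-active camera first, then switch to it."""

    def __init__(self):
        # Pan target of each camera (hypothetical units: degrees).
        self.targets = {"A": None, "B": None}
        self.active = "A"  # camera whose view is currently being sent

    def standby(self):
        """Return the label of the non-active camera."""
        return "B" if self.active == "A" else "A"

    def switch_to_speaker(self, position):
        """Point the non-active camera at the new speaker, then make it active.

        The active camera is never moved here, so the outgoing video does
        not scan across the room between the two speakers.
        """
        standby = self.standby()
        self.targets[standby] = position  # find and view the new speaker off-air
        self.active = standby             # switch only after the aim is set
        return self.active
```

In this sketch the previously active camera keeps its last target after the switch and becomes the camera used to find the next new speaker.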
A further technical advantage of the present invention is the movement of a camera to a new view only if the speaker's position moves outside of a defined window. Thus, a minor position change is not translated into movement of the camera.
An additional technical advantage of the present invention is the use of a second defined window to determine whether a current camera or other camera should be used to view the speaker when the speaker's position moves outside of the first defined window.
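As a minimal sketch of the two defined windows, the following example classifies a speaker position against a first (inner) window and a second (outer) window. The one-dimensional pan-angle representation, the names `Window` and `decide_camera_action`, and the rule that a position inside the second window is handled by the current camera while a position outside it is handled by the other camera are all assumptions made for illustration; the description above states only that the second window is used to choose between the cameras.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    center: float      # window center (hypothetical units: degrees of pan)
    half_width: float  # half of the window extent

    def contains(self, position: float) -> bool:
        return abs(position - self.center) <= self.half_width


def decide_camera_action(position: float, first: Window, second: Window) -> str:
    """Classify a speaker position against the two windows.

    Assumed policy for this sketch:
      inside the first window   -> no camera movement
      inside the second window  -> move the current camera
      outside both windows      -> use the other camera
    """
    if first.contains(position):
        return "hold"
    if second.contains(position):
        return "move_current_camera"
    return "use_other_camera"
```

For example, with a first window of plus or minus 5 degrees and a second window of plus or minus 20 degrees about the current view, a position change of 2 degrees produces no movement, a change of 12 degrees moves the current camera, and a change of 35 degrees hands the speaker to the other camera.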
Another technical advantage of the present invention is the filtering of speaker position information to delay movement of the camera until a new position is verified. In this manner, insignificant noises that might otherwise result in a camera movement are filtered.
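One possible form of such filtering is sketched below, under the assumption that a new position is treated as verified once a number of consecutive position reports agree with one another within a tolerance. The class name `PositionVerifier`, the parameters `required_count` and `tolerance`, and the consecutive-agreement criterion itself are hypothetical and are given only as an example of delaying camera movement until a new position is verified.

```python
class PositionVerifier:
    """Illustrative filter: accept a new position only after repeated agreement."""

    def __init__(self, required_count=3, tolerance=2.0):
        self.required_count = required_count  # reports needed to verify (assumed)
        self.tolerance = tolerance            # allowed spread between reports (assumed)
        self.verified = None                  # last verified speaker position
        self._candidate = None                # unverified new position
        self._count = 0                       # consecutive reports near the candidate

    def update(self, position):
        """Feed one position report and return the current verified position."""
        # A report consistent with the verified position discards any candidate.
        if self.verified is not None and abs(position - self.verified) <= self.tolerance:
            self._candidate = None
            self._count = 0
            return self.verified

        # Count consecutive reports that agree with the candidate position.
        if self._candidate is not None and abs(position - self._candidate) <= self.tolerance:
            self._count += 1
        else:
            self._candidate = position
            self._count = 1

        # Promote the candidate once enough consecutive reports agree.
        if self._count >= self.required_count:
            self.verified = self._candidate
            self._candidate = None
            self._count = 0
        return self.verified
```

In this sketch a single stray report, such as a brief noise from another part of the room, starts a candidate but is discarded as soon as the next report returns to the verified position, so no camera movement results.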
A further technical advantage of the present invention is the use of zoning of a conference room. A conference room is divided into a number of zones each associated with one camera. Each camera is then controlled to view speakers within its associated zone.
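The zoning described above can be illustrated by a simple lookup from speaker position to camera. The zone boundaries, the camera names, and the use of a single pan angle to represent position are hypothetical values chosen for this sketch.

```python
from bisect import bisect_right

# Hypothetical layout: boundaries at -30 and +30 degrees divide the room
# into three zones, each associated with one camera.
ZONE_BOUNDARIES = [-30.0, 30.0]
ZONE_CAMERAS = ["camera_left", "camera_center", "camera_right"]


def camera_for_position(position: float) -> str:
    """Return the camera associated with the zone containing the position."""
    # bisect_right counts how many boundaries lie at or below the position,
    # which is the index of the zone the position falls in.
    zone_index = bisect_right(ZONE_BOUNDARIES, position)
    return ZONE_CAMERAS[zone_index]
```

A position exactly on a boundary is assigned to the zone on its upper side in this sketch; any consistent rule would serve equally well.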