Video teleconferencing systems are used to create virtual meetings between two or more people, or two or more groups of people, at separate locations.
Determining the direction between a detection point and an acoustic source is useful in video conferencing. Such a determination may, for example, be used for automatic camera pointing in a video conferencing system. A typical situation at a video conference end-point (the terminal equipment at a video conferencing site) is a meeting room with a number of conference participants sitting in front of or around a table watching the display device of the end-point, while a camera positioned near the display device captures a view of the meeting room. If there are many participants in the room, it may be difficult for those watching the view of the meeting room at the far end to identify the speaker or to follow a discussion between several speakers. Thus, it would be preferable to localize the active speaker in the room and automatically point the camera at that participant. In addition, the camera may be zoomed in order to obtain an appropriate view of the speaker.
One example of audio source localization in the background art is shown in U.S. Pat. No. 5,778,082, the entire contents of which are incorporated by reference. U.S. Pat. No. 5,778,082 describes, inter alia, a method and a system using a pair of spatially separated microphones to obtain the direction of an audio source. By detecting the beginning of the respective microphone signals representing the sound of the audio source, the time delay between the received audio signals may be determined, and the direction to the audio source may then be calculated.
This principle is illustrated in FIG. 1. Two microphones A and B arranged a distance D apart receive an acoustic signal from the acoustic source C. The angle of incidence θ represents the direction between a detection point (the mid-point between microphones A and B) and the source C. A time delay τ represents the difference between the times of arrival of the acoustic signal at microphones A and B. This time delay is calculated as the maximum point of the cross correlation of the signals provided by microphones A and B, respectively. Herein, “maximum point” refers to the argument, i.e., the time, corresponding to the maximum value of the cross correlation of the microphone signals.
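As a rough illustration of the "maximum point" of the cross correlation, the following Python sketch searches all lags of two sampled microphone signals and returns the lag with the largest correlation value, converted to seconds. The function name `estimate_delay`, the `sample_rate_hz` parameter, and the sign convention (positive when the sound reaches microphone A first) are assumptions made for this example; they are not taken from the cited patent.

```python
def estimate_delay(sig_a, sig_b, sample_rate_hz):
    """Estimate the time delay tau (in seconds) between two microphone signals.

    tau is the lag at which the cross correlation of the signals is largest.
    Assumed sign convention: tau > 0 means the sound reached microphone A
    before microphone B.
    """
    n = len(sig_a)
    if n == 0 or len(sig_b) != n:
        raise ValueError("signals must be non-empty and of equal length")

    best_lag, best_value = 0, float("-inf")
    for lag in range(-(n - 1), n):
        # Correlate sig_a[i] with sig_b[i + lag] over the overlapping samples.
        start = max(0, -lag)
        stop = min(n, n - lag)
        value = sum(sig_a[i] * sig_b[i + lag] for i in range(start, stop))
        if value > best_value:
            best_lag, best_value = lag, value

    # Convert the lag from samples to seconds.
    return best_lag / sample_rate_hz


# Hypothetical test signals: the same short pulse, arriving 3 samples later at B.
mic_a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
tau = estimate_delay(mic_a, mic_b, sample_rate_hz=48000)
```

This direct search costs time proportional to the square of the signal length; a practical system would more likely compute the correlation with an FFT, but the quantity being maximized is the same.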
The angle θ is then calculated as
$$\theta = \arcsin\left[\frac{c \times \tau}{D}\right]$$
wherein c is the sound velocity, τ is the calculated time delay, and D is the distance between the microphones.
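The arcsine relation can be evaluated directly, as in the minimal Python sketch below. The function name `angle_of_incidence`, the default sound velocity of 343 m/s (roughly dry air at 20 °C), and the clamping of the arcsine argument to [-1, 1] are assumptions of this example rather than details from the source. The clamp is included because a noisy delay estimate can make c × τ slightly exceed D, for which the arcsine is undefined.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value: dry air at about 20 degrees C


def angle_of_incidence(tau_s, mic_distance_m, c=SPEED_OF_SOUND_M_S):
    """Return the angle theta (radians) from theta = arcsin(c * tau / D)."""
    if mic_distance_m <= 0:
        raise ValueError("microphone distance D must be positive")
    ratio = c * tau_s / mic_distance_m
    # Assumption: clamp so a noisy delay with |c * tau| > D does not raise.
    ratio = max(-1.0, min(1.0, ratio))
    return math.asin(ratio)


# Hypothetical numbers: microphones 0.2 m apart, delay chosen so c * tau = D / 2.
D = 0.2
tau = 0.5 * D / SPEED_OF_SOUND_M_S
theta_deg = math.degrees(angle_of_incidence(tau, D))  # about 30 degrees
```

A zero delay gives θ = 0, meaning the source lies on the perpendicular through the mid-point between the microphones.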
The above-mentioned background art approach has certain drawbacks. In particular, noise generated by the microphones themselves has proven to have an adverse effect on the resulting angle determination. Hence, it has been necessary to use expensive, high-quality microphones in order to obtain a sufficiently accurate and reliable determination of the direction between the detection point and the active speaker.