Videoconferencing has become widespread, and many offices have rooms specially configured for videoconferencing sessions. Such rooms typically contain videoconferencing equipment, such as one or more moveable cameras and one or more microphones, the microphones typically being placed at locations around a table in the room for participants. Active Speaker Detection (ASD) is frequently used to select a camera, or to move (pan and/or tilt) a camera, to show the person in the room who is speaking and/or to select the microphone which will be active. When a remote person is speaking, their image and/or sound come out of an audio-video display in the room, such as a television (TV), monitor, or other type of display. This may cause the ASD to erroneously select the image of the remote person who is talking on the TV rather than the last local person who is or was talking.
Also, in multiple-location videoconferencing sessions, where three or more separate locations participate in a single videoconferencing session, several panels will typically be displayed, one panel being larger than the others and showing the person who is speaking, and the other panels showing a picture from a camera at each of the other locations. When erroneous ASD occurs, as mentioned above, the equipment in the room where the erroneous detection occurs will send a signal to the equipment at the other locations advising that a person at its location is speaking and that the main display should therefore be from its camera. When this happens, the larger panel may switch from showing a person who is actually speaking to showing a picture of a TV screen or an empty chair. Thus, a problem with ASD is that if the sound from the remote videoconferencing system is reflected, or is so loud that it triggers ASD, then the remote sound may be retransmitted back to the remote system and/or cause the local camera to focus on an empty chair or on the display screen showing the remote videoconferencing location.
One technique that has been used to eliminate such erroneous ASD selection is to detect the image scan line tracing on the TV to determine that the sound is coming from a TV rather than from a local person. High Definition TVs (HDTVs), however, have high (240 Hz or better) progressive scan rates and image resolutions that are equal to those of the cameras, so image scan line tracing is of limited use when an HDTV is involved. Additionally, ASD can often have trouble with sound echoing around a room. A sound-reflective surface, such as a window or a glass-covered picture, may reflect sound from the TV in such a manner that the sound appears to originate from a local person at the table, even if there is not actually a person sitting at that position at the table. Further, if a recording is made of the videoconference, it is up to a person to remember to label the recording accurately with at least, for example, the date of the videoconference. This is often forgotten and done later, sometimes with an erroneous or incomplete label. It is with respect to these considerations and others that the disclosure made herein is presented.