A conference call typically includes video equipment to capture an image of various participants in a room, and audio equipment to record speech from the participants. During the conference call, it may be desirable to focus a video camera on a given participant. For example, active speaker detection (ASD) techniques may be used to focus the video camera on an active speaker. This may be accomplished by identifying a source of human speech within the image, and automatically moving or focusing the video camera on the identified source. In some cases, however, additional objects within the room may interfere with ASD operations. This may result in reduced accuracy in the identification of a given speaker, and in the subsequent focusing of the video camera. Consequently, there may be a substantial need for improvements in ASD techniques to solve these and other problems.