Conferences and meetings commonly take place in venues such as conference rooms, convention centers and private offices, where one or more people may confer with others and may give presentations to other occupants in the room. Video cameras are commonly used during meetings to record them for future use or to transmit them over a network for virtual meetings. Typically, in such video conferences, it is desired to frame the room occupants or meeting participants in order to obtain a clear, focused and high-resolution view of the people in the room in the video frame.
Oftentimes, framing the room occupants using a pan, tilt, and zoom camera (PTZ camera) can be a time-consuming and tedious process, as the camera needs to be manually adjusted by a user to locate and zoom in on occupants. As a result, users often do not adjust the camera, and the video camera typically captures an entire view of the room, including the empty surrounding environment, rather than an optimal view in which the participants are the focus. Some automatic framing systems may enable a camera to automatically adjust its settings to frame the participants as the focus of the video frame using face detection methods. However, face detection methods may be inconsistent: they break down in low-light scenarios, and when only partial or occluded views of participants' faces are available during a meeting, because in those cases the system has trouble locating facial features.