A video conference system includes an endpoint that captures video of participants, for example participants seated in a room, and transmits the video to a conference server or to another endpoint. The video conference endpoint may detect the participants' faces in the captured video and use them to compose a periodically updated camera framing, i.e., a framing of the captured video that keeps the detected faces in view.
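As a rough illustration of face-based framing, the following is a minimal sketch that assumes each detected face is an axis-aligned bounding box `(x, y, width, height)` in pixels. The function name `frame_faces` and the `margin` parameter are hypothetical and not taken from any particular endpoint. The sketch computes the smallest rectangle enclosing all detected faces, pads it, and clamps it to the frame. An endpoint would call something like this on each detection cycle to refresh the framing.

```python
from typing import List, Optional, Tuple

# Hypothetical face box format: (x, y, width, height) in pixels.
Box = Tuple[int, int, int, int]


def frame_faces(faces: List[Box], frame_w: int, frame_h: int,
                margin: float = 0.2) -> Optional[Box]:
    """Return a crop rectangle enclosing all detected faces.

    The enclosing box is padded by `margin` (an assumed fraction of its
    width and height) and clamped to the frame boundaries.
    """
    if not faces:
        # No faces detected: return None so the caller can keep its
        # previous framing instead of cropping to nothing.
        return None
    # Smallest rectangle containing every face box.
    left = min(x for x, _, _, _ in faces)
    top = min(y for _, y, _, _ in faces)
    right = max(x + w for x, _, w, _ in faces)
    bottom = max(y + h for _, y, _, h in faces)
    # Pad the rectangle so faces are not cut off at the edges.
    pad_x = int((right - left) * margin)
    pad_y = int((bottom - top) * margin)
    # Clamp the padded rectangle to the captured frame.
    left = max(0, left - pad_x)
    top = max(0, top - pad_y)
    right = min(frame_w, right + pad_x)
    bottom = min(frame_h, bottom + pad_y)
    return (left, top, right - left, bottom - top)
```

Note that when no face is detected the sketch returns `None`, so how the endpoint interprets a lost detection is left to the caller.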
Participants tend to move during a long teleconference session. For example, a participant may turn away from the camera for a few seconds while remaining seated, leave the room, or move to another seat. In each case, the endpoint may no longer detect the face it detected before the movement, and may assume that the lost detection means the participant has left the room.