A video conference system includes, for example, an endpoint that captures video of participants in a room during a conference and then transmits the video to a conference server or to a “far-end” endpoint. During the conference, the participants may wish to show specific objects of interest to participants at the far end, such as when one of the participants moves to a whiteboard to explain and/or draw illustrations on the whiteboard. A camera in the endpoint may frame the entire room, thus rendering the object of interest, e.g., the whiteboard, too small to be read. Alternatively, the camera may point away from the object of interest and, therefore, fail to capture it. As a result, the participants may be forced to manually control pan, tilt, and/or zoom settings of the camera so that the camera points to the object of interest and captures it in sufficient detail to be viewable; however, such manipulation of the camera is time-consuming, cumbersome, and intrusive to the participants.