Modern video cameras and communication systems allow for real-time video to be transferred to remote locations over communication networks. For instance, videoconferences can be held with video cameras at endpoints capturing video of participants and transferring that video to other endpoints. This allows the participants to have a substantially face-to-face conversation. More sophisticated endpoint cameras may be capable of panning, tilting, or zooming to capture different areas of a location in which a camera is positioned. This is especially beneficial when more than just the face of a single participant can be captured at an endpoint.
For example, a location may include multiple participants and presentation materials, such as whiteboard drawings, physical demonstration models, or other types of objects that can be captured by a camera. While a camera may be positioned such that all of these items are captured in a single view, a particular item or items of interest in the view may not be captured and displayed in a manner best suited for viewing. That is, the item of interest may be relatively small within the view due to the angle and zoom level needed to capture all items. While manual pan, tilt, and zoom controls may be provided to a user, directing the camera to better view the item can be a tedious process.