Conventional video conferencing techniques typically employ a camera mounted at one location and directed at a user. The camera acquires an image of the user and the background, which is then rendered on the video display of another user. The rendered image typically depicts the user, miscellaneous objects, and any background within the field-of-view of the acquiring camera. For example, the camera may be mounted on the top edge of a video display in a conference room, with the user positioned to view the video display. The camera field-of-view may then encompass not only the user but also a conference table, chairs, artwork on the wall behind the user, and anything else within the field-of-view. In this typical technique, the image of the entire field-of-view is transmitted to the video display of a second user. Thus, much of the video display of the second user is filled with irrelevant, distracting, unappealing, or otherwise undesired information. Such information may diminish the efficiency, the efficacy, or simply the aesthetics of the video conference. Additionally, typical video conferencing techniques do not incorporate the user with the virtual content being presented. The traditional capture of the user and the surrounding environment would appear unnatural when juxtaposed against virtual content within a composite video. Such a display would be a significant departure from the familiar experience of a face-to-face interaction with a presenter discussing content on a whiteboard or projected on a screen. Also, typical techniques require that the user manipulate content using a keyboard.