The proliferation of networked computing devices equipped with cameras has enabled users to frequently participate in real-time video-enabled teleconferencing sessions, such as conventional videoconferencing sessions (VCSs). Some devices (e.g., conventional mobile devices, desktops, and laptops) include a front-facing camera (FFC) (i.e., a “selfie” camera) that is configured to capture image data of a user as the user views a display of the device. Other conventional computing devices may employ a standalone auxiliary camera, such as a web-enabled camera (i.e., a webcam), that is communicatively coupled to the computing device and faces the user. Such conventional FFCs and webcams enable users participating in a VCS to view one another while remotely communicating in real time. For example, users may each employ a large camera-equipped display screen to remotely conduct a business meeting. The display employed in such a conventional arrangement may be an interactive screen enabled to simultaneously display shared multimedia content and video of the participants. The screen may take up a significant portion of an office wall and be sized to provide video that at least approximates the real-world physical dimensions of the users and objects in their environment.
Such conventional VCSs have proven very useful in facilitating business meetings across distances, as well as personal communications between people. However, such conventional VCSs are limited in their ability to provide the realistic experience of face-to-face communication between the users. That is, a VCS employing conventional cameras and displays may fail to provide the users with a natural telepresence experience that at least simulates a realistic face-to-face interaction between the users.
For example, although both the camera and display may be facing in the same direction, regions near the display may be outside of the camera's field of view (FOV). As such, when a user is outside their camera's FOV (e.g., the user is too close to the display), the camera will not capture image data depicting the user. This often occurs when a user employs their hand and/or fingers to point to content rendered on the display, walks too close to the display, and the like. For these and other reasons, conventional VCSs often lack a realistic and natural telepresence experience for the users.
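The FOV limitation described above can be illustrated with a short geometric sketch. The following is not drawn from any particular camera or VCS; it assumes a hypothetical camera at the origin (e.g., mounted at the top center of a display) looking along the +z axis, with illustrative horizontal and vertical FOV angles. The function name, the coordinate convention, and the example positions are all assumptions made for illustration.

```python
import math

def in_camera_fov(point, h_fov_deg=90.0, v_fov_deg=60.0):
    """Return True if a 3D point (meters) lies inside the camera's FOV.

    Assumed setup (hypothetical): the camera sits at the origin and looks
    along +z, away from the display. x is horizontal, y is vertical.
    """
    x, y, z = point
    if z <= 0:
        # The point is at or behind the camera plane.
        return False
    # Angular offset of the point from the optical axis, per axis.
    h_angle = math.degrees(math.atan2(abs(x), z))
    v_angle = math.degrees(math.atan2(abs(y), z))
    return h_angle <= h_fov_deg / 2 and v_angle <= v_fov_deg / 2

# A hand pointing at content 0.5 m below the camera, 5 cm from the display.
hand_near_display = (0.0, -0.5, 0.05)
# A user standing 2 m back, with their face 0.3 m below the camera.
distant_user = (0.0, -0.3, 2.0)

print(in_camera_fov(hand_near_display))  # False
print(in_camera_fov(distant_user))       # True
```

Under these assumed values, the hand near the display sits roughly 84 degrees off the optical axis, well outside a 30 degree vertical half-angle, while the user 2 m away is only about 8.5 degrees off axis and is captured.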