As computing technology becomes increasingly integrated with daily life, computing applications increasingly make use of image data captured from a user's environment. For example, cameras may be used to capture images for a variety of purposes, such as video conferencing, live presentations, augmented reality, virtual reality, remote assistance, education, and research. In some cases, two cameras are used together to capture left and right images in order to provide "stereo vision". Stereo vision is particularly useful for determining depth, such as for sensing user input (e.g., through gestures) or for creating depth images, planar projections of three-dimensional point clouds (e.g., for overlaying three-dimensional graphics onto images of a user's environment), color-and-depth images, and the like.
Dedicated stereo or depth cameras are available, but these devices are often costly and are rarely integrated with user devices such as mobile phones. It is becoming more common, however, for user devices to include dual cameras. Smartphones, for example, are increasingly being equipped with dual cameras that face in the same direction. Because these dual cameras are not inherently configured to work together for stereo vision, the image data from each camera must be processed separately. For many applications, such as videoconferencing or augmented reality applications, image data is transmitted over a network to be processed, for example by a remote server, in order to provide various functionality. Transmitting image data requires a significant amount of bandwidth and can often result in dropped frames and/or timing issues. In particular, transmitting image data from two separate cameras over a network to a destination (e.g., a virtual desktop hosted in a datacenter) may result in data loss and synchronization issues, because images from the two cameras are unlikely to reach the destination simultaneously.