Advancements in electrical component design and fabrication have resulted in computing hardware becoming progressively smaller, allowing it to fit in more compact form factors. In turn, those smaller computing hardware components have been integrated into mobile computing devices, including smartphones, tablets, wearables, etc. Such components include touchscreen displays, various sensors (e.g., proximity sensors, light sensors, barometers, accelerometers, magnetometers, gyroscopes, etc.), cameras, wireless communication interfaces, etc.
As a result, mobile computing devices have become seemingly ubiquitous, and technologies have emerged to leverage the components of such devices. Two such technologies are computer-mediated reality and augmented reality. Computer-mediated reality manipulates the perception of a user's environment by adding information to or subtracting information from the environment through the use of a mobile computing device. Augmented reality blends a user's environment with digital information (e.g., virtual objects, computer generated content, etc.), generally in real time. In other words, the digital information is embedded in, or overlaid on, the actual environment.
However, placement of such computer generated content into an image of an augmented reality scene, such that it appears to be in direct contact with real physical planar surfaces, is non-trivial due to a lack of context with respect to the size, height, and distance of the computer generated content relative to the view of the camera that captured the image. Using data collected from a depth camera (e.g., an RGB-D camera, a ranging camera, a time-of-flight (ToF) camera, etc.), for example, in which RGB information is accompanied by per-pixel depth information, determining the intersection between computer generated content and physical objects can be relatively straightforward, since the depth and surface structure of the physical surroundings relative to the camera are explicitly known. Within standard RGB imagery, however, there is no inherent ground-truth information from which to determine the position and orientation of computer generated content relative to either the imaging camera or physical surfaces within the image itself.
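By way of illustration only, the difference between RGB-D and standard RGB imagery described above can be sketched under the assumption of a simple pinhole camera model. The intrinsic parameters (fx, fy, cx, cy), the pixel coordinates, and the function names below are hypothetical and are not drawn from any particular device or library. The sketch shows that a pixel with a measured depth corresponds to exactly one three-dimensional point, whereas the same pixel without depth is consistent with every point along a camera ray.

```python
from typing import Tuple

Point3 = Tuple[float, float, float]


def pixel_ray(u: float, v: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Direction (x, y, 1) of the camera ray passing through pixel (u, v)."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def back_project(u: float, v: float, depth: float,
                 fx: float, fy: float, cx: float, cy: float) -> Point3:
    """Camera-space 3D point for pixel (u, v) at the given depth along z."""
    x, y, z = pixel_ray(u, v, fx, fy, cx, cy)
    return (x * depth, y * depth, z * depth)


# Hypothetical intrinsics for a 640x480 image (illustrative values only).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
U, V = 420.0, 340.0  # hypothetical pixel of interest

# RGB only: depth is unknown, so these two candidate points (and every
# other point along the same ray) project to the same pixel.
near = back_project(U, V, 1.0, FX, FY, CX, CY)
far = back_project(U, V, 4.0, FX, FY, CX, CY)

# RGB-D: a measured per-pixel depth (assumed here to be 2.0 units)
# selects a single point on that ray.
measured = back_project(U, V, 2.0, FX, FY, CX, CY)
```

In this sketch, `near` and `far` differ only by a scale factor along the ray, which is the ambiguity that standard RGB imagery leaves unresolved; a depth measurement removes it by fixing that scale.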