Augmented reality renders virtual objects into an image or video of a real scene, “augmenting” it with additional information such as travel directions, game characters, or advertising. Typically, a user views the scene either through a head-mounted display or through a rendered video stream captured by a camera (e.g., on a phone), and the virtual objects are placed into the scene. Techniques such as computer vision are used to estimate the position and orientation of the viewer with respect to the scene, so that each virtual object is rendered at the correct location and from the correct viewpoint.
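To make the role of the estimated position and orientation concrete, the following is a minimal sketch of how a virtual object's anchor point might be mapped into the image once the viewer's pose is known. It assumes a simple pinhole camera model without lens distortion; the names project_point, yaw_rotation, focal_px, and principal_point are hypothetical and chosen only for illustration.

```python
import math


def project_point(point_world, rotation, translation, focal_px, principal_point):
    """Project a 3D world point to pixel coordinates (assumed pinhole model).

    rotation is a 3x3 matrix (list of rows) and translation a 3-vector that
    together map world coordinates into the camera frame. Both would come
    from the pose estimate; here they are simply passed in.
    """
    # World -> camera frame: X_c = R * X_w + t
    cam = [
        sum(rotation[r][c] * point_world[c] for c in range(3)) + translation[r]
        for r in range(3)
    ]
    if cam[2] <= 0:
        return None  # the point is behind the camera and cannot be drawn

    # Perspective division followed by a shift to the image centre.
    u = focal_px * cam[0] / cam[2] + principal_point[0]
    v = focal_px * cam[1] / cam[2] + principal_point[1]
    return (u, v)


def yaw_rotation(angle_rad):
    """Rotation about the vertical (y) axis, used here to mimic a head turn."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


# A virtual label anchored 5 units straight ahead of an untransformed camera.
anchor = (0.0, 0.0, 5.0)
no_motion = project_point(anchor, yaw_rotation(0.0), (0.0, 0.0, 0.0), 800.0, (320.0, 240.0))
after_turn = project_point(anchor, yaw_rotation(0.1), (0.0, 0.0, 0.0), 800.0, (320.0, 240.0))
```

When the pose changes, the same anchor projects to a different pixel, which is why the pose has to be re-estimated as the viewer moves if the virtual object is to stay attached to the scene.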
For example, a user may point the camera on their phone at a street scene and view the resulting video stream. The stream may be augmented to identify one or more landmarks in the scene, or to include reviews or comments on restaurants or other items of interest. These comments or reviews may be rendered in the video stream as virtual objects, using one or more icons, and may appear to be part of the street scene.
One problem with such an approach is how to handle occlusions that arise when the user “interacts” with one or more of the virtual objects. For example, if the user puts their hand or another occluder in front of the camera, any virtual object that lies behind the occluder should no longer be visible. Real objects in the street scene are naturally obscured by the occluder, but the virtual objects are not, because they are rendered over the captured video and therefore remain visible in front of the hand. Consequently, the illusion that the virtual objects are part of the scene is broken, leading to an unsatisfactory user experience.
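The effect can be illustrated with a small sketch that composites a virtual overlay onto a captured frame, once naively and once with an occluder mask. This is only an illustration of the problem, not a method taken from the text: the function composite is hypothetical, and the mask is assumed to be supplied by some other step (how the occluder would be detected is not specified here).

```python
def composite(frame, overlay, occluder_mask=None):
    """Draw overlay pixels onto a copy of frame.

    frame and overlay are 2D lists of the same size; overlay holds None where
    no virtual object is drawn. occluder_mask, if given, is a 2D list of
    booleans marking pixels covered by a real foreground object such as a
    hand (assumed to be provided by a separate step).
    """
    out = [row[:] for row in frame]
    for y in range(len(frame)):
        for x in range(len(frame[0])):
            if overlay[y][x] is None:
                continue  # nothing virtual at this pixel
            if occluder_mask is not None and occluder_mask[y][x]:
                continue  # a real occluder is in front: keep the camera pixel
            out[y][x] = overlay[y][x]
    return out


# Toy 2x3 "frame": a hand covers part of the street scene.
frame = [["street", "hand", "hand"],
         ["street", "hand", "street"]]
# A virtual icon spans the first two pixels of the top row.
overlay = [["icon", "icon", None],
           [None, None, None]]
hand_mask = [[False, True, True],
             [False, True, False]]

naive = composite(frame, overlay)             # icon painted over the hand
aware = composite(frame, overlay, hand_mask)  # icon hidden where the hand is
```

In the naive result the icon replaces a pixel that shows the hand, which is the broken illusion described above. With the mask, the hand stays in front and only the unoccluded part of the icon is drawn.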