Modern computing and display technologies have facilitated the development of systems for so-called “virtual reality” or “augmented reality” experiences, wherein digitally reproduced images or portions thereof are presented to a user in a manner where they seem to be, or may be perceived as, real. A virtual reality (VR) scenario typically involves presentation of digital or virtual image information without transparency to other actual real-world visual input, whereas an augmented reality (AR) scenario typically involves presentation of digital or virtual image information as an augmentation to visualization of the actual world around the user.
For example, referring to FIG. 1, an augmented reality scene 4 is depicted wherein a user of an AR technology sees a real-world park-like setting 6 featuring people, trees, buildings in the background, and a concrete platform 8. In addition to these items, the user of the AR technology also perceives that he “sees” a robot statue 10 standing upon the real-world platform 8, and a cartoon-like avatar character 12 flying by which seems to be a personification of a bumble bee, even though these elements 10, 12 do not exist in the real world. As it turns out, the human visual perception system is very complex, and producing a VR or AR technology that facilitates a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements is challenging.
VR and AR display systems can benefit from information regarding the head pose of a viewer or user (i.e., the orientation and/or location of the user's head).
For instance, head-worn displays (or helmet-mounted displays, or smart glasses) are at least loosely coupled to a user's head, and thus move when the user's head moves. If the user's head motions are detected by the display system, the data being displayed can be updated to take the change in head pose into account.
As an example, if a user wearing a head-worn display views a virtual representation of a three-dimensional (3D) object on the display and walks around the area where the 3D object appears, that 3D object can be re-rendered for each viewpoint, giving the user the perception that he or she is walking around an object that occupies real space. If the head-worn display is used to present multiple objects within a virtual space (for instance, a rich virtual world), measurements of head pose can be used to re-render the scene to match the user's dynamically changing head location and orientation and provide an increased sense of immersion in the virtual space.
Head-worn displays that enable AR (i.e., the concurrent viewing of real and virtual elements) can have several different types of configurations. In one such configuration, often referred to as a “video see-through” display, a camera captures elements of a real scene, a computing system superimposes virtual elements onto the captured real scene, and a non-transparent display presents the composite image to the eyes. Another configuration is often referred to as an “optical see-through” display, in which the user can see through transparent (or semi-transparent) elements in the display system to view directly the light from real objects in the environment. The transparent element, often referred to as a “combiner”, superimposes light from the display over the user's view of the real world.
In both video and optical see-through displays, detection of head pose can enable the display system to render virtual objects such that they appear to occupy a space in the real world. As the user's head moves around in the real world, the virtual objects are re-rendered as a function of head pose, such that the virtual objects appear to remain stable relative to the real world. At least for AR applications, placement of virtual objects in spatial relation to physical objects (e.g., presented to appear spatially proximate a physical object in two or three dimensions) may be a non-trivial problem. For example, head movement may significantly complicate placement of virtual objects in a view of an ambient environment. This is true whether the view is captured as an image of the ambient environment and then projected or displayed to the end user, or whether the end user perceives the view of the ambient environment directly. For instance, head movement will likely cause the field of view of the end user to change, which will likely require an update to where various virtual objects are displayed in the field of view of the end user. Additionally, head movements may occur within a large variety of ranges and speeds. Head movement speed may vary not only between different head movements, but within or across the range of a single head movement. For instance, head movement speed may initially increase (e.g., linearly or not) from a starting point, and may decrease as an ending point is reached, attaining a maximum speed somewhere between the starting and ending points of the head movement. Rapid head movements may even exceed the ability of the particular display or projection technology to render images that appear uniform and/or as smooth motion to the end user.
Head tracking accuracy and latency (i.e., the elapsed time between when the user moves his or her head and the time when the image gets updated and displayed to the user) have been problems for VR and AR systems. Especially for display systems that fill a substantial portion of the user's visual field with virtual elements, it is critical that the accuracy of head-tracking is high and that the overall system latency is very low from the first detection of head motion to the updating of the light that is delivered by the display to the user's visual system. If the latency is high, the system can create a mismatch between the user's vestibular and visual sensory systems, and generate motion sickness or simulator sickness. In the case of an optical see-through display, the user's view of the real world has essentially a zero latency while his or her view of the virtual objects has a latency that depends on the head-tracking rate, processing time, rendering time, and display frame rate. If the system latency is high, the apparent location of virtual objects will appear unstable during rapid head motions.
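The latency contributions listed above can be illustrated with a small worked calculation. The following is a minimal sketch only: the function names, the simple additive model, and all numeric values are assumptions chosen for illustration, not measurements of any particular system. It sums the four contributions into a worst-case latency and then estimates how far a world-fixed virtual object would appear displaced during a head rotation, using the approximation that angular error equals angular speed multiplied by latency.

```python
def motion_to_photon_latency_ms(tracking_hz, processing_ms, render_ms, display_hz):
    """Worst-case latency as a simple sum of the four contributions (assumed model)."""
    tracking_ms = 1000.0 / tracking_hz  # wait for the next head-tracking sample
    display_ms = 1000.0 / display_hz    # wait for the next display frame
    return tracking_ms + processing_ms + render_ms + display_ms


def angular_error_deg(head_speed_deg_per_s, latency_ms):
    """Apparent angular displacement of a virtual object during head rotation."""
    return head_speed_deg_per_s * latency_ms / 1000.0


# Hypothetical numbers: 1000 Hz tracker, 2 ms processing, 8 ms render, 100 Hz display.
latency = motion_to_photon_latency_ms(1000, 2.0, 8.0, 100)
print(latency)                            # total latency in milliseconds
print(angular_error_deg(100.0, latency))  # displacement at 100 degrees per second
```

Under these assumed values, the virtual object lags the real world by roughly two degrees during the rotation, which is the instability described above for optical see-through displays.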
In addition to head-worn display systems, other display systems can benefit from accurate and low latency head pose detection. These include head-tracked display systems in which the display is not worn on the user's body, but is, e.g., mounted on a wall or other surface. The head-tracked display acts like a window onto a scene, and as a user moves his head relative to the “window” the scene is re-rendered to match the user's changing viewpoint. Other systems include a head-worn projection system, in which a head-worn display projects light onto the real world.
Approaches to addressing head tracking accuracy and latency may include increasing the actual frame rate or effective frame rate, for instance via strobing or flashing or via other techniques. Predictive head tracking may be employed to reduce latency. Predictive head tracking may rely on any of a large variety of factors or approaches, including historical data or attributes for a specific end user. Also, blanking of the display or presentation may be effectively employed, for instance, blanking during rapid head movements.
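One simple form of predictive head tracking can be sketched as follows. This is an illustrative assumption, not a method taken from the description above: it extrapolates a single yaw angle at constant angular velocity from the two most recent samples. The function name, the sample format, and the numeric values are hypothetical, and a practical predictor could also draw on the historical data or user-specific attributes mentioned above.

```python
def predict_yaw_deg(samples, lookahead_s):
    """Extrapolate yaw at constant angular velocity (assumed model).

    samples: list of (time_s, yaw_deg) pairs, oldest first, at least two entries.
    lookahead_s: how far beyond the latest sample to predict, in seconds.
    """
    (t0, yaw0), (t1, yaw1) = samples[-2], samples[-1]
    velocity = (yaw1 - yaw0) / (t1 - t0)  # degrees per second
    return yaw1 + velocity * lookahead_s


# Hypothetical samples 10 ms apart; predict 20 ms past the latest sample.
history = [(0.00, 10.0), (0.01, 11.0)]
print(predict_yaw_deg(history, 0.02))
```

Rendering for the predicted pose rather than the last measured pose reduces the apparent latency as long as the head continues moving as predicted; when the prediction is wrong, the error must still be corrected later in the pipeline.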
Regardless of the type of display system used, the 3D objects are rendered from the current viewpoint or a predicted viewpoint at the time when the renders are displayed. In order to keep latency to a minimum, the rendered images are adjusted at the last moment to “chase the beam” in scanned displays. This is typically accomplished by warping the images; that is, the images are time warped to decrease the latency between the time the user moves his or her head and the time when the image gets updated. For example, assuming that images can only be presented to the user at 60 frames per second (FPS), an image rendering process that does not utilize time warping may determine the position of the user's head immediately after the previous image has been rendered and presented to the user, and may then render and display the next image to the user based on that head position. At that frame rate, each image may take as long as 16.7 ms from the time that the head position is determined to the time that the image is presented to the user, which is unacceptable. An image rendering process that utilizes time warping instead determines or estimates the head position at the last possible moment before the image is presented to the user, and warps an image previously rendered at an actual or estimated head position of the user to match that later head position.
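The 16.7 ms figure follows directly from the frame rate, and the benefit of time warping can be expressed with the same arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the 2 ms interval between the final head-pose sample and presentation is an assumed value, not one given above.

```python
def frame_period_ms(fps):
    """Time between successive displayed frames, in milliseconds."""
    return 1000.0 / fps


def pose_age_at_display_ms(fps, warp_lead_ms=None):
    """Worst-case age of the head pose at the moment the image is presented.

    Without time warping, the pose is sampled a full frame period before display.
    With time warping, the pose is re-sampled warp_lead_ms before display
    (warp_lead_ms is a hypothetical parameter for this sketch).
    """
    period = frame_period_ms(fps)
    if warp_lead_ms is None:
        return period
    return min(warp_lead_ms, period)


print(round(frame_period_ms(60), 1))                 # 16.7
print(pose_age_at_display_ms(60, warp_lead_ms=2.0))  # 2.0
```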
Typically, images are warped using parallax. That is, because objects that are closer to the viewer move faster than objects that are further away from the viewer as a point of view changes (i.e., as the user's head moves), the warping process utilizes three-dimensional data to perform a two-dimensional warp on the image. Because an image of a scene rendered at a particular point of view may not contain all of the three-dimensional data of the same scene from a different particular point of view (e.g., one object completely hidden behind another object in the rendered image may be only partially hidden or not hidden at all at the different point of view), the parallax warping process may introduce holes in the resulting image due to the differing displacement of objects of different depths.
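The way holes arise can be seen in a toy example. The following is a minimal sketch under stated assumptions, not the warping process of any particular system: it forward-warps a single scanline for a sideways shift of the viewpoint, displacing each pixel by a disparity inversely proportional to its depth. The function name, the parameters focal_px and camera_shift, and the scene values are all hypothetical.

```python
def parallax_warp_scanline(colors, depths, camera_shift, focal_px=1.0):
    """Forward-warp one scanline for a sideways viewpoint shift (toy model).

    Each pixel moves by disparity = focal_px * camera_shift / depth, so near
    pixels move farther than distant ones. Returns the warped scanline, with
    None marking holes where no source pixel landed.
    """
    width = len(colors)
    warped = [None] * width
    warped_depth = [float("inf")] * width
    for x in range(width):
        disparity = focal_px * camera_shift / depths[x]
        new_x = x - int(round(disparity))
        # Keep the nearest surface when two source pixels land on the same target.
        if 0 <= new_x < width and depths[x] < warped_depth[new_x]:
            warped[new_x] = colors[x]
            warped_depth[new_x] = depths[x]
    return warped


# Hypothetical scene: far background "B" (depth 10) with a near object "F" (depth 1).
colors = ["B", "B", "B", "B", "F", "F", "B", "B"]
depths = [10.0, 10.0, 10.0, 10.0, 1.0, 1.0, 10.0, 10.0]
warped = parallax_warp_scanline(colors, depths, camera_shift=2.0)
print(warped)              # ['B', 'B', 'F', 'F', None, None, 'B', 'B']
print(warped.count(None))  # 2
```

The near object moves two pixels while the background barely moves, so the two positions the object vacated receive no data: the original render never captured the background hidden behind it.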
There is thus a need to reduce the frequency and size of holes in a warped image that has been rendered in a virtual reality or augmented reality environment.