Modern computing and display technologies have facilitated the development of systems for so-called “virtual reality” (VR) or “augmented reality” (AR) experiences, wherein digitally reproduced images, or portions thereof, are presented to a user in a manner wherein the images seem to be, or may be perceived as, real. A VR scenario typically involves presentation of digital or virtual image information without transparency to other actual real-world visual input. An AR scenario typically involves presentation of digital or virtual image information as an augmentation to visualization of the actual world around the user.
For example, referring to Figure (FIG.) 1, an AR scene 4 is depicted wherein a user of an AR technology sees a real-world park-like setting 6 featuring people, trees, buildings in the background, and a concrete platform 8. In addition to these items, the user of the AR technology also perceives that they “see” a robot statue 10 standing upon the real-world concrete platform 8, and a cartoon-like avatar character 2 flying by, which seems to be a personification of a bumble bee, even though these elements (e.g., the avatar character 2 and the robot statue 10) do not exist in the real world. Due to the extreme complexity of the human visual perception and nervous system, it is challenging to produce a VR or AR technology that facilitates a comfortable, natural-feeling, rich presentation of virtual image elements amongst other virtual or real-world imagery elements.
One major problem concerns modifying the virtual image displayed to the user in response to user movement. For example, when the user moves their head, their area of vision (e.g., field of view) and the perspective of the objects within the area of vision may change. The overlay content that will be displayed to the user needs to be modified in real time, or close to real time, to account for the user movement and thereby provide a more realistic VR or AR experience.
A refresh rate of the system governs the rate at which the system generates content and displays (or sends for display) the generated content to a user. For example, if the refresh rate of the system is 60 Hertz, the system generates (e.g., renders, modifies, and the like) content and displays the generated content to the user approximately every 16 milliseconds (1/60 of a second). VR and AR systems may generate content based on a pose of the user. For example, the system may determine a pose of the user, generate content based on the determined pose, and display the generated content to the user, all within the 16 millisecond time window. The time between when the system determines the pose of the user and when the system displays the generated content to the user is known as “motion-to-photon latency.” The user may change their pose during this interval, and if the change is not accounted for, it may result in an undesired user experience. For example, the system may determine a first pose of the user and begin to generate content based on the first pose. The user may then move to a second pose before the system displays the generated content. Since the content is generated based on the first pose and the user now has the second pose, the generated content displayed to the user will appear misplaced with respect to the user because of the pose mismatch.
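The relationship between refresh rate, motion-to-photon latency, and pose mismatch can be illustrated with a short calculation. The following Python sketch is illustrative only: the function names and the head rotation speed of 100 degrees per second are assumptions chosen for the example, and the rotation is assumed to occur at a constant angular velocity over the latency interval.

```python
def frame_period_ms(refresh_rate_hz: float) -> float:
    """Return the time available per frame, in milliseconds."""
    return 1000.0 / refresh_rate_hz


def pose_mismatch_deg(angular_velocity_deg_per_s: float, latency_ms: float) -> float:
    """Return how far the head rotates between pose sampling and display.

    Assumes a constant angular velocity over the latency interval
    (a simplification made for this example).
    """
    return angular_velocity_deg_per_s * latency_ms / 1000.0


# Hypothetical values: a 60 Hz system and a head turning at 100 deg/s.
period = frame_period_ms(60.0)
mismatch = pose_mismatch_deg(100.0, period)
print(f"frame period: {period:.2f} ms")
print(f"pose mismatch: {mismatch:.2f} deg")
```

Under these assumed values, the frame period is about 16.67 ms and the content would be displayed roughly 1.67 degrees away from where the second pose requires it.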
The systems may apply a correction over an entire rendered image frame to account for the change in the user pose, for example, as a post-processing step operating on a buffered image. While this technique may work for panel displays that display an image frame by flashing/illuminating all pixels (e.g., in 2 ms) once all pixels are rendered, this technique may not work well with scanning displays that display image frames on a pixel-by-pixel basis (e.g., in 16 ms) in a sequential manner. In such scanning displays, the time between display of a first pixel and display of a last pixel can be up to a full frame duration (e.g., 16 ms for a 60 Hz display), during which the user pose may change significantly.
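The difference between the two display types can be made concrete by estimating how stale a single frame-wide correction becomes for each pixel. The following Python sketch is a simplified model, not a description of any particular display: it assumes pixels are scanned out uniformly in time over the frame duration, that the head rotates at a constant angular velocity, and that the correction is computed for the instant scan-out begins. The function names, the pixel count, and the 100 degrees per second figure are hypothetical.

```python
def pixel_display_offset_ms(pixel_index: int, num_pixels: int, frame_ms: float) -> float:
    """Time after the start of scan-out at which a given pixel is displayed.

    Assumes a uniform, sequential scan (an assumption of this sketch).
    """
    return frame_ms * pixel_index / num_pixels


def uncorrected_drift_deg(angular_velocity_deg_per_s: float, offset_ms: float) -> float:
    """Head rotation accumulated since the frame-wide correction was applied."""
    return angular_velocity_deg_per_s * offset_ms / 1000.0


# Hypothetical values: 1000 pixels, head turning at 100 deg/s.
NUM_PIXELS = 1000
HEAD_SPEED = 100.0

# Panel display: all pixels flash together within about 2 ms.
panel_drift = uncorrected_drift_deg(HEAD_SPEED, 2.0)

# Scanning display: the last pixel appears almost a full 16 ms frame later.
last_offset = pixel_display_offset_ms(NUM_PIXELS - 1, NUM_PIXELS, 16.0)
scan_drift = uncorrected_drift_deg(HEAD_SPEED, last_offset)

print(f"panel display drift: {panel_drift:.2f} deg")
print(f"scanning display drift at last pixel: {scan_drift:.2f} deg")
```

In this model the drift for the scanning display grows linearly from the first pixel to the last, which is why a single correction applied to the whole buffered frame leaves later pixels increasingly misplaced.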
Embodiments address these and other problems associated with VR or AR systems implementing conventional time warp.