Virtual reality (VR) systems present to the user computer-generated images that simulate the user's presence in real or imaginary worlds. In fully immersive VR systems, the user's view of their actual surroundings is completely replaced by the simulated surroundings, which may be real, artificial, or both. Another type of VR system combines images of the real world in the vicinity of the user with computer-generated images (CGI) that provide additional information to the user. This type of VR system is herein referred to as an augmented reality (AR) system. Unlike fully immersive VR systems, AR systems allow the user to see at least a portion of their actual surroundings, usually overlaid with CGI. AR systems may be divided into two categories: those in which the user directly sees their actual surroundings, referred to as “see-through” displays, and those where a camera captures images of their actual surroundings and presents the captured image to the user via a display screen, referred to as “opaque” displays.
FIG. 1A shows a conventional VR system which is fully immersive and presents artificial surroundings to the user. In FIG. 1A, a user 100 wears a head-mounted display 102 which presents to user 100 a view of a computer-simulated image, usually rendered by a computer-generated image (CGI) unit 104. An external pose tracker system (PT) 106 determines the pose of user 100, e.g., using cameras or lasers to detect special markers 108 worn by user 100 and using the detected markers to derive pose information. As used herein, the term “pose” refers to information about the location of the user's head in three-dimensional space, referred to as the “position”, as well as the direction in which the user's face is pointing, referred to as the “orientation”. Thus, pose includes both position and orientation. A rendering unit (RU) 110 combines scene information from CGI unit 104 with pose information from PT 106 and renders an artificial scene which is displayed by head-mounted display 102. The conventional system shown in FIG. 1A is fully immersive, i.e., it does not include or display local scene information, and thus head-mounted display 102 typically uses opaque display devices.
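By way of illustration only, the notion of pose defined above, i.e., a position together with an orientation, may be sketched as a small data structure. The class name Pose, the choice of yaw and pitch angles in radians, the axis convention, and the use of meters are assumptions made for this sketch and are not taken from any of the systems of FIGS. 1A through 1E.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical pose record: head position plus facing direction."""

    # Position: location of the user's head in 3-D space (meters assumed).
    x: float
    y: float
    z: float
    # Orientation: direction the face is pointing (yaw/pitch in radians assumed).
    yaw: float
    pitch: float

    def forward(self):
        """Unit vector along the facing direction.

        Assumed convention: z is vertical, yaw is measured about z from the
        +x axis, and pitch is measured upward from the horizontal plane.
        """
        cp = math.cos(self.pitch)
        return (cp * math.cos(self.yaw),
                cp * math.sin(self.yaw),
                math.sin(self.pitch))


if __name__ == "__main__":
    # A user standing at the origin, eyes 1.7 m high, looking along +x.
    print(Pose(0.0, 0.0, 1.7, 0.0, 0.0).forward())
```

A rendering unit such as RU 110 could, under these assumptions, use the position as the virtual camera location and the forward vector as the viewing direction.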
FIG. 1B shows a conventional VR system which is fully immersive and presents real, remote surroundings to the user. The conventional system in FIG. 1B is almost identical to the one in FIG. 1A, except that in FIG. 1B, scene information is provided by a scene acquisition (SA) device 112, such as a remote camera or endoscopic camera, which captures real images, rather than the artificial or simulated images generated by the system shown in FIG. 1A.
FIG. 1C shows a conventional AR system in which user 100 wears a see-through display 114, which allows user 100 to see the local scene information directly through the transparent lens of the display. Rendering unit 110 generates an augmented reality image which appears to the user to be overlaid with the local scene image. This overlay may be text or simple graphics. In some systems, a scene acquisition unit SA 112 may provide limited scene acquisition capability, such as gesture detection. Because these systems do not have any pose-tracking capability, their usefulness is limited to providing announcements to the user, e.g., to alert the user to an incoming call or text message or provide driving directions, or to allow the user to perform simple tasks, such as viewing email, using gestures instead of a mouse or keyboard.
FIG. 1D shows the conventional AR system of FIG. 1C, but with the addition of pose tracking information, which is provided by an external pose tracker 106. The addition of pose information allows rendering unit 110 to adjust the position of the virtual image based on the user's pose. This allows for more sophisticated AR effects; e.g., a virtual direction arrow shown in the user's display to indicate to the user the location of a restaurant, subway station, and so on, would rotate out of view when the user turns his or her head. However, conventional technologies still require an external pose tracker, which limits the usefulness of such a system.
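As a non-limiting sketch of the direction-arrow example above, the following shows one way a rendering unit might decide whether a pose-anchored overlay is currently within the user's view. The function names, the two-dimensional simplification, and the 20-degree half field of view are hypothetical and chosen only for illustration.

```python
import math


def relative_bearing(user_xy, user_yaw, target_xy):
    """Angle from the user's facing direction to the target, in [-pi, pi).

    user_yaw is the facing direction in radians, measured from the +x axis
    (an assumed convention).
    """
    dx = target_xy[0] - user_xy[0]
    dy = target_xy[1] - user_xy[1]
    bearing = math.atan2(dy, dx) - user_yaw
    # Wrap so that a target just to the left or right gives a small angle.
    return (bearing + math.pi) % (2.0 * math.pi) - math.pi


def arrow_in_view(user_xy, user_yaw, target_xy, half_fov=math.radians(20.0)):
    """True if an arrow anchored at target_xy falls inside the display's
    horizontal field of view (half_fov is an assumed value)."""
    return abs(relative_bearing(user_xy, user_yaw, target_xy)) <= half_fov


if __name__ == "__main__":
    station = (10.0, 0.0)  # hypothetical subway station location
    print(arrow_in_view((0.0, 0.0), 0.0, station))          # facing the station
    print(arrow_in_view((0.0, 0.0), math.pi / 2, station))  # head turned 90 degrees
```

In this sketch the arrow is drawn only while the facing direction lies within the assumed field of view of the target, so it leaves the display as the head turns away.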
FIG. 1E shows a conventional telepresence application being used by two users, user A 100A and user B 100B, who are in separate locations and each wearing a display unit (102A and 102B, respectively). Each location includes a scene acquisition device (112A and 112B, respectively), a pose tracker (106A and 106B, respectively), and a rendering unit (110A and 110B, respectively). User A is local to scene A and remote to scene B; user B is local to scene B and remote to scene A. Scene A information, including the image of user A 100A, is sent to user B's headset 102B, which also receives user B 100B pose information from PT 106B to generate an image of user A 100A in local scene B based on the current pose of user B 100B. Likewise, scene B information, including the image of user B 100B, is sent to user A's headset 102A, which also receives user A 100A pose information from PT 106A to generate an image of user B 100B in local scene A based on the current pose of user A 100A.
There are disadvantages to the conventional VR systems shown in FIGS. 1A through 1E. Not all of the systems in FIGS. 1A through 1E have both pose tracking and scene acquisition, and those that do require an external pose tracker 106 that is separate from the head-mounted display.
Accordingly, in light of these disadvantages associated with conventional VR systems, there exists a need for methods, systems, and computer readable media for unified scene acquisition and pose tracking in a wearable display.