While the advent of Head-Mounted Displays (HMDs) and affordable real-time computer graphics engines has given rise to much research in the field of Virtual Reality (VR), comparatively little work has been done in the field of Augmented Reality (AR). A VR system immerses the user in a totally synthetic computer-generated environment. An AR system, on the other hand, merges computer synthesized objects with the user's space in the real world. In an AR system, computer generated graphics enhance the user's interaction with, or perception of, the real world.
For AR systems to become truly beneficial, these systems should provide accurate registration between computer generated graphics and real objects. A virtual object should appear at its proper place in the real world; otherwise, it is difficult for the user to correctly determine spatial relationships. Furthermore, the registration of the computer generated graphics should be dynamic in that it can account for changes in the real world perspective. Dynamic registration is particularly important when the user moves around in the environment. The relative position between real and computer generated (synthetic) objects should be constant.
An AR system must also provide a reasonable image generation rate (10 Hz) and stereopsis. Both image generation rate and stereopsis are important for good depth perception. The lack of kinetic or stereoscopic depth cues greatly reduces the believability of an augmented environment.
An AR system should also be simple to set up and use. Users of AR applications should not have to be familiar with the specific techniques used in AR systems. Because many applications of augmented reality environments involve tasks carried out by users who are typically not versed in the intricacies of computer graphics systems, simple setup and use are important to the proliferation of AR systems.
The AR system should also put minimal constraints on user motion. In many applications the user wants to move without restriction.
Finally, an AR system should have minimal latency. There should be as little delay as possible between the user's movement and the display update. Reducing the latency between a movement and the reflection of that movement in the environment is generally required for smooth and effective interaction.
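The effect of latency on registration can be illustrated with a rough calculation. The following is a minimal sketch, not part of any described system: it assumes the user's head rotates at a constant rate during the delay and that the display behaves like an ideal pinhole camera. The function names, the head rotation rate, and the focal length in pixels are all hypothetical values chosen for illustration.

```python
import math


def latency_error_deg(head_rate_deg_per_s: float, latency_s: float) -> float:
    """Angle the head turns during the delay (constant-rate assumption)."""
    return head_rate_deg_per_s * latency_s


def latency_error_px(head_rate_deg_per_s: float, latency_s: float,
                     focal_px: float) -> float:
    """Approximate on-screen displacement of a virtual object near the
    image center, for an idealized pinhole display with focal length
    focal_px (an assumed parameter, in pixels)."""
    angle_rad = math.radians(latency_error_deg(head_rate_deg_per_s, latency_s))
    return focal_px * math.tan(angle_rad)


if __name__ == "__main__":
    # Hypothetical values: 50 deg/s head turn, 100 ms end-to-end delay,
    # 500 px focal length.
    print(latency_error_deg(50.0, 0.1))
    print(latency_error_px(50.0, 0.1, 500.0))
```

Under these assumed values, a 100 ms delay during a moderate head turn leaves the graphics about 5 degrees behind the real scene, which is why latency and registration are closely linked.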
Among the requirements for an effective AR system, the accurate registration of the computer generated graphics can have a significant impact on the perception of the augmented reality. To the best of the inventors' knowledge, typical existing AR systems do not convincingly meet this requirement. Typically, in current AR systems, a virtual object appears to swim about as the user moves, and often does not appear to rest at the same spot when viewed from several different positions. In current AR systems, most of these registration errors are due to the limitations of the tracking systems.
Conventional magnetic trackers may be subject to large amounts of error and jitter. An uncalibrated system can exhibit errors of 10 cm or more, particularly in the presence of magnetic field disturbances such as metal and electric equipment. Carefully calibrating a magnetic system typically does not reduce position errors to much less than about 2 cm. Despite their lack of accuracy, magnetic trackers are popular because they are robust and place minimal constraints on user motion.
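To give a sense of what such position errors mean on screen, the following minimal sketch projects a point through an ideal pinhole camera and measures how far the drawn position shifts when the tracked position is off by a given amount. The function names, the 500 pixel focal length, and the 1 m viewing distance are assumptions made for illustration; only the 2 cm and 10 cm error magnitudes come from the discussion above.

```python
import math


def project(point, focal_px):
    """Project a 3D point (x, y, z) in camera coordinates, in meters,
    onto the image plane of an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal_px * x / z, focal_px * y / z)


def misregistration_px(true_point, tracker_error, focal_px):
    """Pixel distance between where a virtual object should be drawn and
    where it is drawn when the tracked position is offset by tracker_error."""
    reported = tuple(p + e for p, e in zip(true_point, tracker_error))
    u0, v0 = project(true_point, focal_px)
    u1, v1 = project(reported, focal_px)
    return math.hypot(u1 - u0, v1 - v0)


if __name__ == "__main__":
    focal_px = 500.0           # assumed focal length in pixels
    target = (0.0, 0.0, 1.0)   # assumed: object 1 m in front of the camera
    for err_m in (0.02, 0.10):  # calibrated vs. uncalibrated error magnitudes
        shift = misregistration_px(target, (err_m, 0.0, 0.0), focal_px)
        print(f"{err_m * 100:.0f} cm lateral error -> {shift:.1f} px")
```

Under these assumptions, a 2 cm lateral error shifts the virtual object by about 10 pixels and a 10 cm error by about 50 pixels; the shift grows as the object gets closer to the viewer.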
Other existing AR systems have used mechanical or optical tracking systems. Both of these systems generally have better accuracy than magnetic trackers, but may be burdensome. Mechanical systems often tether the user and generally have a limited working volume. The optical tracker also generally requires four dedicated tracking cameras mounted on the user's HMD.
Another method of tracking is a vision-based tracking system which uses image recognition to track movement. In a video see-through AR system, video images of the user's view are available. However, recovering 3D information from 2D images is generally difficult. One common problem of utilizing image recognition to track movement and register computer generated graphics in an AR system is that an almost infinite number of possibilities may need to be considered for the images to be interpreted correctly. Model-based vision, which assumes a priori knowledge of the 3D geometry of visible objects, reduces the problem from shape recovery to mere camera motion tracking. However, even with the problem simplified in this way, model-based vision methods typically still extract object features from images. This generally involves special-purpose image processing hardware to achieve real-time updates. Despite the speed and complexity disadvantages of a vision-based system, nearly perfect registration can be achieved under certain conditions.
One possible problem of vision-based methods is their instability. To save computation cost, vision-based systems often make numerous assumptions about the working environment and the user's movements, but those assumptions may be impractical. For example, vision-based systems typically assume temporal coherence of camera movement in order to avoid frequent use of costly search algorithms that establish the correspondence between image features and model features. Thus, vision-based systems may be unable to keep up with quick, abrupt user movements. Furthermore, typical vision-based trackers can become unstable from the occlusion of features caused by deformable objects (e.g. hands). If a vision tracker's assumptions fail, the results can be catastrophic. Since image analysis and correspondence finding may be costly and error-prone, and because landmarks can be occluded, obscured, or may disappear from the camera's view at any time, it is generally impractical to attempt to continuously track a large number of features in real time.
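The temporal coherence assumption described above can be illustrated with a minimal sketch. It is a simplified illustration rather than any particular tracker's method: a feature is searched for only within a small window around its position in the previous frame, which is cheap but loses the feature when the camera moves abruptly or the feature is occluded. The function name and the `max_jump_px` parameter are hypothetical.

```python
import math


def track_with_coherence(prev_pos, detections, max_jump_px):
    """Return the detection nearest to prev_pos if it lies within
    max_jump_px of it; otherwise return None (the feature is lost and a
    costly full correspondence search would be needed)."""
    best = None
    best_dist = max_jump_px
    for det in detections:
        dist = math.hypot(det[0] - prev_pos[0], det[1] - prev_pos[1])
        if dist <= best_dist:
            best, best_dist = det, dist
    return best


if __name__ == "__main__":
    # Slow motion: the feature moved a few pixels and is found again.
    print(track_with_coherence((100, 100), [(103, 104), (300, 300)], 20))
    # Abrupt motion: the feature jumped outside the search window.
    print(track_with_coherence((100, 100), [(300, 300)], 20))
    # Occlusion: no detections at all.
    print(track_with_coherence((100, 100), [], 20))
```

The first call recovers the feature, while the second and third return `None`, corresponding to the quick-movement and occlusion failure cases discussed above.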
In view of the above, there exists a need for improvement in AR systems to allow for highly accurate registration of computer generated graphics while still providing acceptable performance in terms of frame rate, freedom of movement of the user, simplicity of setup and use and acceptable latency between motion and reflection of that motion in the augmented environment.