It is sometimes desirable to allow a person to combine virtual reality with a real world surface. For example, a user may point the camera of a computer, smartphone, or tablet at a wall, and the wall will appear on the display of the device. In addition, a virtual object will appear in the image as if it were part of the real world environment, for example a basketball hoop that appears to be affixed to the wall. This is referred to as Augmented Reality.
Augmented Reality is a process that combines Computer Vision and Computer Graphics to present augmented information on a view of the real world. It creates the illusion that the added information resides in the real world. This information can be helpful for applications such as navigation, education, entertainment, and games.
To achieve accurate registration of information, Computer Vision methods are used to register regions of interest between camera frames. Based on those registered regions the desired information is registered. The Computer Vision method will usually try to estimate the camera pose in real time.
When an application is triggered on a certain surface in order to see augmented information on that surface, it is possible to calculate the relative camera pose between the pose of the camera when the application was triggered and its poses in consecutive camera frames. When the information presented is 3D content, it is extremely important to correctly register the camera frames, which are 2D by nature. Small errors in the 2D registration will be reflected in large misalignments of the 3D content. Augmented Reality applications are real time applications by their nature. In real time applications, computation speed and efficient algorithms are extremely important.
Augmented Reality applications and applications for entertainment and games usually target a mass audience. In those cases it cannot be assumed that the user has been trained in any way to use the application. It follows that the registration should work on many different surfaces and in many different realistic scenarios.
The registration process used for Augmented Reality on planar surfaces is known as planar tracking or homography tracking. In the past, planar tracking or homography tracking has been done in contexts such as aligning different patches taken from space satellites. In Augmented Reality, the goal in many cases is to display 3D content registered in real time to a real world surface or environment. One prior art approach tries to identify strong local features in the image (such as corners) and to track those local features as the camera moves in order to register the image. With a sizable number of local features on the real world surface, it is possible to track the plane reliably and in real time. However, the local features approach can only work on surfaces that are well textured, which limits the usability of the application.
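As a minimal illustration of what homography tracking estimates, the sketch below maps the corners of a planar patch from one frame to another through a 3x3 homography. The function name `apply_homography` and the two example matrices (`SHIFT`, `TILT`) are assumptions made for illustration only; they are not taken from any particular prior art system.

```python
def apply_homography(H, point):
    """Map a 2D image point through a 3x3 homography H (nested lists, row-major)."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to return to pixel coordinates.
    return (u / w, v / w)


# Hypothetical homography: a pure image shift of (+5, -3) pixels.
SHIFT = [[1.0, 0.0, 5.0], [0.0, 1.0, -3.0], [0.0, 0.0, 1.0]]
# Hypothetical homography with a perspective term in the bottom row.
TILT = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.001, 0.0, 1.0]]

# Corners of a 100 x 100 pixel patch on the tracked plane.
corners = [(0.0, 0.0), (100.0, 0.0), (100.0, 100.0), (0.0, 100.0)]
shifted = [apply_homography(SHIFT, c) for c in corners]
tilted = [apply_homography(TILT, c) for c in corners]
```

Under `SHIFT` every corner moves by the same offset, whereas under `TILT` the corners at x = 100 are pulled toward the origin and the corners at x = 0 stay put, which is the kind of perspective distortion a planar tracker has to recover.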
Another approach (sometimes called the direct approach) tries to use all the pixels in the image and match them between frames. Methods using the direct approach tend to be computationally intensive and are typically unable to deal with significant illumination changes. In addition, the approach has been limited in the number of degrees of freedom (DOF) that are available.
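The direct approach can be sketched, in its simplest translation-only form, as an exhaustive search that compares every pixel of a template against the frame. The sum of squared differences (SSD) cost, the function names, and the synthetic 6 x 6 frame below are assumptions chosen for illustration, not a description of any specific existing method.

```python
def ssd(frame, template, dx, dy):
    """Sum of squared intensity differences with the template placed at (dx, dy)."""
    total = 0.0
    for row in range(len(template)):
        for col in range(len(template[0])):
            diff = frame[row + dy][col + dx] - template[row][col]
            total += diff * diff
    return total


def best_shift(frame, template):
    """Try every placement of the template and return the (dx, dy) with lowest SSD."""
    max_dy = len(frame) - len(template)
    max_dx = len(frame[0]) - len(template[0])
    candidates = [(dx, dy) for dy in range(max_dy + 1) for dx in range(max_dx + 1)]
    return min(candidates, key=lambda s: ssd(frame, template, s[0], s[1]))


# Synthetic 6 x 6 grayscale frame (values are arbitrary test data).
frame = [[(7 * r + 13 * c * c + r * c) % 50 for c in range(6)] for r in range(6)]
# A 3 x 3 template cut from the frame at dx = 2, dy = 1.
template = [row[2:5] for row in frame[1:4]]

found = best_shift(frame, template)

# Uniformly brightening the frame leaves a nonzero residual at the true
# alignment, which hints at why a plain SSD cost is sensitive to illumination.
brightened = [[value + 40 for value in row] for row in frame]
residual = ssd(brightened, template, 2, 1)
```

Even this two-parameter search touches every template pixel for every candidate placement, which suggests why the cost grows quickly as more degrees of freedom are added.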
Six degrees of freedom (6DOF) registration means that the relation between the camera and the planar surface on which information is being augmented covers practically the full range of motions one can expect, in particular: moving the camera up and down, left and right, and forward and backward, and tilting it both in rotation and at skewed angles with respect to the surface being imaged. The same applies the other way around, that is, when the surface is moved with respect to the camera. 2DOF registration accommodates only a limited set of motions, in particular up and down and left and right. Different degrees of freedom can be defined in between these two, but only 6DOF supports the full set of motions that can be performed in reality.
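The difference between 2DOF and 6DOF motion can be sketched by projecting a square on the plane Z = 0 through a pinhole camera. The pose layout `(rx, ry, rz, tx, ty, tz)`, the rotation order, the focal length of 500 pixels, and all function names are assumptions made for this illustration.

```python
import math


def rotation(rx, ry, rz):
    """Rotation matrix R = Rz * Ry * Rx from three angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def project(point, pose, focal=500.0):
    """Project a point (X, Y) lying on the plane Z = 0 for a 6-parameter pose."""
    rx, ry, rz, tx, ty, tz = pose
    R = rotation(rx, ry, rz)
    X, Y = point
    # Camera coordinates = R * (X, Y, 0) + t; the Z column of R is unused.
    xc = R[0][0] * X + R[0][1] * Y + tx
    yc = R[1][0] * X + R[1][1] * Y + ty
    zc = R[2][0] * X + R[2][1] * Y + tz
    return (focal * xc / zc, focal * yc / zc)


def side_lengths(corners):
    """Lengths of the four sides of a projected quadrilateral, in corner order."""
    n = len(corners)
    return [math.dist(corners[i], corners[(i + 1) % n]) for i in range(n)]


square = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]

# 2DOF motion: only tx and ty change, so the square shifts but keeps its shape.
pose_2dof = (0.0, 0.0, 0.0, 0.5, -0.2, 5.0)
shifted_lengths = side_lengths([project(p, pose_2dof) for p in square])

# Adding a tilt about the vertical axis turns the square into a trapezoid.
pose_tilted = (0.0, 0.5, 0.0, 0.5, -0.2, 5.0)
tilted_lengths = side_lengths([project(p, pose_tilted) for p in square])
```

With `pose_2dof` all four sides keep the same projected length, so a translation-only model is enough to register the frame. With `pose_tilted` the edge nearer the camera projects longer than the far edge, a change that a 2DOF model cannot represent.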
Only a few efforts have been made to register content to a plane in 6DOF using the direct approach, and most of the existing work registers with fewer than 6DOF. The existing 6DOF methods are usually sensitive to local minima, which means they fail for no visible reason as the camera moves around. Existing 6DOF methods are usually theoretical and have not matured into robust products that are stable and can support a system for the mass market. Existing 6DOF methods have not applied a model of gradually growing complexity (from 2DOF to 6DOF), so for a given surface they will either work or not depending on the appearance of the plane, which limits the number of surfaces around us that can be used to trigger the augmented experience.