Simultaneous localization and mapping (SLAM) is used in augmented reality systems and robot navigation to build a target from an environment or scene. Visual SLAM (VSLAM) uses camera or other visual sensor data, such as images, as input to build a target or model of the environment. When VSLAM is used in conjunction with an Augmented Reality (AR) system, virtual objects can be inserted into a user's view of the real world and displayed on a device (e.g., a mobile device, cell phone or similar).
One common prerequisite for VSLAM to track or determine camera position and orientation (pose) is a known reference. For example, a known or previously acquired reference can be a 3-Dimensional (3D) model of the environment or an artificial marker inserted into the real world. Traditional VSLAM may also require the first reference image to be a precise frontal view of a planar surface in the environment before initialization and tracking. Without a known reference or a precisely captured initial image, objects can appear at the wrong location or mapping of the environment may fail altogether.
A tracking system utilizing VSLAM with a single camera may also rely upon initializing a 3D target from two separate reference images captured by the single camera. Creating a 3D target using traditional techniques based on the two reference images is only possible if the camera motion between the two reference images is appropriate and the scenes in both images overlap sufficiently. Reference images may be deemed appropriate when there is a sufficient minimum translation between two specifically defined reference images.
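The two conditions described above (sufficient translation between the reference images, and enough overlap between them) can be illustrated with a minimal sketch. The function name, the ratio of baseline to median scene depth as a measure of "sufficient" translation, the matched-feature fraction as a measure of overlap, and both threshold values are assumptions made for illustration only; they do not come from any particular VSLAM implementation.

```python
import math


def is_suitable_reference_pair(translation, num_features_first, num_matched,
                               median_scene_depth,
                               min_baseline_ratio=0.05, min_overlap=0.5):
    """Hypothetical check of whether two images can seed a 3D target.

    translation         -- (x, y, z) camera translation between the two images
    num_features_first  -- features detected in the first reference image
    num_matched         -- of those, how many were matched in the second image
    median_scene_depth  -- rough median distance from camera to scene
    Thresholds are illustrative assumptions, not recommended values.
    """
    if num_features_first <= 0 or median_scene_depth <= 0:
        return False

    # Translation condition: the baseline must be large enough relative to
    # scene depth for triangulation to be well conditioned. A pure rotation
    # (zero baseline) always fails this test.
    baseline = math.sqrt(sum(c * c for c in translation))
    baseline_ratio = baseline / median_scene_depth

    # Overlap condition: enough features of the first image must still be
    # visible (matched) in the second image.
    overlap = num_matched / num_features_first

    return baseline_ratio >= min_baseline_ratio and overlap >= min_overlap


if __name__ == "__main__":
    # Sideways motion of 0.2 units at about 2 units depth, 75% of features matched.
    print(is_suitable_reference_pair((0.2, 0.0, 0.0), 200, 150, 2.0))
    # Pure rotation: no translation, so the pair is rejected.
    print(is_suitable_reference_pair((0.0, 0.0, 0.0), 200, 150, 2.0))
```

Under these assumptions, a pair is rejected either when the camera has barely translated (for example, when the user only rotates the device) or when the camera has moved so far that few features are shared between the two images.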
Traditional VSLAM implementations may also rely on direct user input to select the two reference images or to provide an additional visual target in order to record 6 Degrees of Freedom (6 DoF) camera motion before a 3D target can be initialized. For example, some tracking methods require the user to perform a specific, unintuitive motion sequence without visual feedback so that 3D reconstruction methods can be used to find a real plane in the environment and initialize the 3D target from this plane.
As a result of the above limitations of traditional VSLAM methods, the user experience of current augmented reality systems can often be frustrating and feel unnatural. Moreover, most users are unlikely to know or understand the camera motions necessary for traditional VSLAM initialization. Typical users are also frequently confused as to why they should have to perform the specific motions before an augmented reality system can display tracking updates for a scene.
Accordingly, improved VSLAM initialization and tracking are desirable.