Computer vision is a field that includes methods and systems for acquiring, analyzing, processing, and understanding images (e.g., real-world image captures) to provide an event or result. For example, one computer vision technique is Simultaneous Localization and Mapping (SLAM), which can process the input of a single camera and continuously build up a three-dimensional (3D) model (e.g., a reconstructed map) of an environment as the camera moves in Six Degrees of Freedom (6DOF). SLAM systems can track the pose of the camera with respect to the 3D model while simultaneously mapping the 3D model. Keyframe-based visual SLAM systems can process discretely selected frames from the incoming camera image stream or feed. Keyframe-based visual SLAM systems assume general camera motion and apply structure-from-motion techniques to create 3D feature maps.
Modern keyframe-based computer vision (e.g., SLAM) systems subdivide work into parallel tracking and mapping (PTAM) threads. The tracking and mapping threads may be processed in parallel, but asynchronously. The tracking thread may perform at a full frame rate, while mapping is typically more computationally intensive and thus slower. Scaling computer vision to large areas and letting multiple clients/users or robots participate in the processing of computer vision work create the need for stitching two or more separate map pieces/sections together. In general, stitching refers to discovering overlapping portions of two or more maps and determining the corresponding 7DOF similarity transform (composed of a 3DOF orientation, a 3DOF position, and a 1DOF scale). If one of the maps covers a much larger area than the other, this is sometimes called “place recognition.” After successful stitching or place recognition, map fusion may be performed. Map fusion, or simply “fusion,” typically describes the processing of data or information from separate maps to combine them into a single map. For example, fusion may be performed with a form of Structure from Motion (SfM) technique applied to the image information from the separate source maps.
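As an illustration of the 7DOF similarity transform mentioned above, the following minimal sketch maps a 3D point from one map's coordinate frame into another's. It is an assumption-laden example rather than a method taken from this document: the function names `rotation_from_axis_angle` and `apply_similarity` are hypothetical, the 3DOF orientation is assumed to be given as an axis-angle rotation vector, and the sketch only applies an already known transform (it does not discover overlapping map portions or estimate the transform).

```python
import math


def rotation_from_axis_angle(rx, ry, rz):
    """Convert a 3DOF rotation vector (axis * angle, radians) to a 3x3 matrix.

    Uses Rodrigues' rotation formula. The axis-angle parameterization is an
    assumption of this sketch; other orientation encodings work equally well.
    """
    theta = math.sqrt(rx * rx + ry * ry + rz * rz)
    if theta < 1e-12:
        # Negligible rotation: return the identity matrix.
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = rx / theta, ry / theta, rz / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [
        [c + x * x * v, x * y * v - z * s, x * z * v + y * s],
        [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
        [z * x * v - y * s, z * y * v + x * s, c + z * z * v],
    ]


def apply_similarity(point, scale, rotvec, translation):
    """Map a point from map B's frame into map A's frame.

    Computes p_A = scale * R * p_B + translation, where R is the rotation
    built from `rotvec`. The seven degrees of freedom are the three rotation
    components, the three translation components, and the single scale.
    """
    R = rotation_from_axis_angle(*rotvec)
    return tuple(
        scale * sum(R[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical example: map B is rotated 90 degrees about the z axis,
# is half the scale of map A, and is offset by one unit along x.
p_a = apply_similarity(
    point=(1.0, 0.0, 0.0),
    scale=2.0,
    rotvec=(0.0, 0.0, math.pi / 2),
    translation=(1.0, 0.0, 0.0),
)
```

Once such a transform has been determined for two overlapping maps, applying it to every 3D feature of one map expresses both maps in a common coordinate frame, which is a precondition for the fusion step described above.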
With respect to some types of computer vision techniques, the four tasks of tracking, mapping, stitching, and fusion may have increasing computational requirements as additional data or information is processed. To support many maps, a single, independent user/client may be unable to process all data associated with tracking, mapping, stitching, and fusion. However, offloading mapping to a server may cause clients to become reliant upon the server for content. Clients may rely upon the connection to the server to generate real-time content that depends on the local map. For example, such content may be used in Augmented Reality (AR) applications. Additionally, maps on servers are typically not scalable or well organized. Therefore, improved techniques are desirable.