In general, a Vision-aided Inertial Navigation System (VINS) fuses data from a camera and an Inertial Measurement Unit (IMU) to track the six-degrees-of-freedom (d.o.f.) position and orientation (pose) of a sensing platform through an environment. In this way, the VINS combines complementary sensing capabilities. For example, an IMU can accurately track dynamic motions over short time durations, while visual data can be used to estimate the pose displacement (up to scale) between consecutive views. For several reasons, VINS has gained popularity as a means of addressing GPS-denied navigation. During the past decade, VINS has been successfully applied to robots, spacecraft, automobiles, and personal localization (e.g., by use of smartphones or laptops), demonstrating real-time performance.
Creating an accurate 3D map within a GPS-denied area is required in many applications, such as human (or robot) indoor navigation and localization, augmented reality, and search and rescue. However, creating a complex map with a single mobile device, such as a mobile phone, tablet, or wearable computer, presents certain challenges. For example, the device used for recording data may not have sufficient resources (e.g., storage space or battery) to collect data covering a large area. Additionally, it may not be convenient for a single user to traverse an entire building in one session. Furthermore, any time a portion of the map changes (e.g., due to lighting conditions or building renovations), or is deemed of insufficient quality or accuracy, the mapping process must be repeated.