Bundle adjustment (BA) is an essential part of Structure from Motion (SfM) and Multi-View Stereo (MVS) 3D reconstruction. In aerial photogrammetry and computer vision, the camera poses must be refined before any further processing of the imagery data can be performed. BA is the most popular solution and the gold standard [1], [2] for obtaining precise camera poses. It takes initial estimates of the camera poses and refines them by minimizing a cost function, typically defined on the reprojection error [3]. Despite the extensive literature in this long-established area of research, BA remains a bottleneck in related applications.
In most cases, the initial camera poses (the inputs to BA) are obtained by applying a RANSAC-based model estimation algorithm (e.g., the Five-Point algorithm [4]-[6]). However, in current aerial imagery systems these parameters are often available a priori, since they can be measured directly with on-board sensors (GPS and IMU). Nevertheless, these measurements are too noisy [7] and must be refined before being used in downstream processing stages (e.g., 3D reconstruction).
Herein, the term ‘BA pipeline’ refers to an end-to-end BA (or SfM) system whose inputs are raw images and whose outputs are refined camera poses and a 3D point cloud. Likewise, the term ‘BA’ refers only to the optimization stage, in which initial camera poses and a point cloud are already available.
In the computer vision community, camera parameters are known as intrinsic and extrinsic parameters. In photogrammetry, the same parameters are known as interior and exterior parameters. Precise values of these parameters are crucial for the relevant applications (e.g., 3D reconstruction). BA is considered the gold standard for the refinement of camera parameters [1], [2], [11]. It is a classical and well-studied problem in computer vision and photogrammetry, dating back more than three decades [3], [11]. A comprehensive introduction to BA, covering a wide spectrum of the topics involved, can be found in [3]. Owing to recent interest in large-scale 3D reconstruction from consumer photos as well as aerial imagery, there has been renewed interest in making BA robust, stable and accurate [5], [12]-[15]. Several BA methods have been proposed recently, such as Sparse BA [16], [17], incremental BA [8] and Parallel BA [18], [19]. Several BA methods are compared in [13], which also proposes new methods that improve BA in terms of computation and convergence.
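To make the role of the intrinsic and extrinsic parameters concrete, the following is a minimal sketch of a pinhole projection and of the squared reprojection error that a BA cost function is commonly built on. It is an illustration only, not the method of any cited work: the function names `project` and `reprojection_error`, the single focal length `f`, the principal point `(cx, cy)`, the world-to-camera convention X_cam = R X + t, and the absence of lens distortion are all assumptions made for this example.

```python
def project(point, R, t, f, cx, cy):
    """Project a 3D world point to pixel coordinates (assumed pinhole model).

    Extrinsics: R (3x3 nested lists) and t (3-vector), with X_cam = R * X + t.
    Intrinsics: focal length f (pixels) and principal point (cx, cy).
    """
    # Transform the point from world coordinates into camera coordinates.
    xc = [sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division followed by the intrinsic mapping to pixels.
    u = f * xc[0] / xc[2] + cx
    v = f * xc[1] / xc[2] + cy
    return (u, v)


def reprojection_error(points, observations, R, t, f, cx, cy):
    """Sum of squared pixel residuals between projected and observed points."""
    total = 0.0
    for X, (u_obs, v_obs) in zip(points, observations):
        u, v = project(X, R, t, f, cx, cy)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


# Hypothetical example data: a camera at the origin looking down the z-axis.
R_true = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_true = (0.0, 0.0, 0.0)
f, cx, cy = 1000.0, 320.0, 240.0
points = [(0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)]
observations = [project(X, R_true, t_true, f, cx, cy) for X in points]

# A noisy pose (e.g., from GPS) yields a larger error than the true pose.
t_noisy = (0.1, 0.0, 0.0)
err_true = reprojection_error(points, observations, R_true, t_true, f, cx, cy)
err_noisy = reprojection_error(points, observations, R_true, t_noisy, f, cx, cy)
```

In this sketch the error is zero at the true pose and positive at the perturbed one; a BA optimizer would adjust the camera parameters (and usually the 3D points) to drive this quantity down over all cameras and observations.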
There have been many reports on the use of GPS and IMU measurements for refining camera parameters. However, to the best of our knowledge, such measurements have so far mostly been used only as complementary values, alongside other pose estimation methods such as essential matrix estimation (in computer vision) [8], [9] or resectioning (in photogrammetry). For example, in [8], [9], [20], [21], the available GPS and IMU measurements are fused with an SfM approach using an Extended Kalman Filter (EKF) and/or used as extra constraints in BA. An SfM method called Mavmap is proposed in [10]; it leverages the temporal consistency of aerial images and the availability of metadata to improve speed and robustness. In [10], VisualSFM [18] is regarded as the most advanced publicly available system for automated and efficient 3D reconstruction from images. However, as stated in [10], it has no integration of IMU priors.