Structure from motion (SfM) refers to a process of estimating three-dimensional structures from a plurality of two-dimensional images. Two images, taken from two spatially different positions and/or at different rotational angles, may provide different views of a scenery, each depicting corresponding points in the scenery. By analysing how the corresponding points are related, a three-dimensional structure of objects in the scenery may be formed. The information on the corresponding points also allows the relative positions and rotational angles of the cameras that acquired the images to be determined.
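As an illustration of how corresponding points yield three-dimensional structure, the following minimal sketch triangulates a single point from two views with a linear (DLT) method. It assumes the two 3x4 projection matrices are already known; the function names `triangulate` and `project`, the camera placement and the test point are hypothetical choices made for this example and are not taken from the cited works.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two views.

    P1, P2: 3x4 projection matrices (assumed known in this sketch).
    x1, x2: corresponding image points (2-vectors) in each view.
    """
    # Each view contributes two linear equations in the homogeneous point X.
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

def project(P, X):
    """Project a 3D point X with projection matrix P to image coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Hypothetical setup: camera 1 at the origin, camera 2 shifted one unit along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.5, 0.2, 4.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noise-free correspondences the estimate coincides with the true point up to numerical precision; with real image measurements the result is only approximate and is typically refined further.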
By adding more images depicting further related views, the three-dimensional structure can be created with higher accuracy. This may be done in different ways. Incremental SfM is discussed in e.g. N. Snavely, et al, “Photo tourism: Exploring photo collections in 3D”, ACM Transactions on Graphics (SIGGRAPH Proceedings), 25(3), 2006, 835-846.
Incremental SfM starts by reconstructing a model of the scenery from the views in two images and continues by adding images one by one to the reconstruction until no additional images can be added. By carefully ensuring that each image is added without errors, the method can be used to reconstruct large sceneries from thousands of images. The main issue with incremental SfM is drifting of the model as images are added, caused by outliers and error accumulation. Also, incremental SfM is computationally complex, since the complete reconstruction must be adjusted after each addition of a new image.
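The effect of error accumulation can be illustrated with a toy example. The sketch below is not an SfM implementation: it merely chains planar camera poses one after another, as an incremental method does, and assumes a small constant error in every estimated relative rotation. The helper names, the loop of 36 cameras and the 0.5 degree error are hypothetical values chosen for the illustration.

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def chain_poses(rel_angles, step=1.0):
    """Place each camera relative to the previous one, as in an incremental scheme."""
    R = np.eye(2)
    pos = np.zeros(2)
    positions = [pos.copy()]
    for a in rel_angles:
        R = R @ rot2d(a)                       # accumulate orientation
        pos = pos + R @ np.array([step, 0.0])  # move one step in the current heading
        positions.append(pos.copy())
    return np.array(positions)

n = 36
true_angles = np.full(n, 2 * np.pi / n)  # the camera walks a closed loop
bias = np.deg2rad(0.5)                   # hypothetical error in each relative rotation
est_angles = true_angles + bias

true_path = chain_poses(true_angles)
est_path = chain_poses(est_angles)

# Distance between estimated and true camera position after each added image.
drift = np.linalg.norm(est_path - true_path, axis=1)
```

In this toy setting the true path returns to its starting point, whereas the estimated path does not: the position error after the last camera is far larger than after the first, although every individual step is almost correct.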
Use of additional information about the images, such as a GPS position or compass data of a camera that acquired an image, may provide a more accurate model. However, since the images are added one by one, such additional information can only be used for already added images, and therefore does not fully solve the problem of error accumulation and drifting.
An alternative approach to creating three-dimensional structures of objects in a scenery may be referred to as non-incremental SfM, or batch SfM, wherein all available images for creating three-dimensional structures are handled simultaneously, or in batch. Such an approach is described e.g. in M. Havlena, et al, “Randomized structure from motion based on atomic 3D models from camera triplets”, IEEE Conference on Computer Vision and Pattern Recognition, 2009, 2874-2881.
Non-incremental SfM involves computing three-dimensional reconstructions of the scenery, or of parts of the scenery, based on pairs or triplets of images. Then, a complete three-dimensional reconstruction is made by finding the absolute positions of all cameras that captured the images, where the absolute positions are those most compatible with the computed three-dimensional reconstructions. Usually, the rotational angles of the cameras are determined first, and then the positions, or translations, of the cameras are determined. Finding the translations of the cameras is harder, since the relation between the cameras in a pair of images does not give any information about the actual distance between the cameras. Several methods suggest using point correspondences to fix the scale ambiguity. However, in such a case, it is hard to use additional information such as GPS information, since the rotational angles of the cameras are calculated first.
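The scale ambiguity mentioned above can be checked numerically. The following sketch assumes normalized (calibrated) image coordinates and a first camera placed at the origin; the function name `observe`, the rotation, the translation and the scene points are hypothetical values chosen for the illustration. Scaling both the scenery and the distance between the cameras by the same factor leaves all image points unchanged, so a pair of images alone cannot reveal the actual distance between the cameras.

```python
import numpy as np

def observe(R, t, X):
    """Image points of 3D points X (N x 3) in camera 1 = [I | 0] and camera 2 = [R | t]."""
    x1 = X[:, :2] / X[:, 2:3]
    X2 = X @ R.T + t              # points expressed in the frame of camera 2
    x2 = X2[:, :2] / X2[:, 2:3]
    return x1, x2

# Hypothetical relative pose: small rotation about the y axis plus a translation.
angle = 0.1
R = np.array([
    [np.cos(angle), 0.0, np.sin(angle)],
    [0.0, 1.0, 0.0],
    [-np.sin(angle), 0.0, np.cos(angle)],
])
t = np.array([1.0, 0.0, 0.1])

# Hypothetical scene points in front of both cameras.
X = np.array([
    [0.5, 0.2, 4.0],
    [-1.0, 0.3, 6.0],
    [0.2, -0.7, 5.0],
])

x1_a, x2_a = observe(R, t, X)
# Same rotation, but scene and camera distance are both three times larger.
x1_b, x2_b = observe(R, 3.0 * t, 3.0 * X)
```

Both configurations produce identical image points, which is why an additional constraint is needed to fix the scale.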