For decades, modeling the world from images has been a major goal of computer vision, enabling a wide range of applications including virtual reality, image-based localization, and autonomous navigation. One of the most diverse data sources for modeling is Internet photo collections, and over the last decade the computer vision community has made tremendous progress in large-scale structure from motion (LS-SfM) from Internet datasets. However, exploiting this wealth of information for LS-SfM remains challenging because of the ever-increasing amount of image data. For example, it is estimated that 10% of all photos have been taken in the last year alone [26]. In a short period of time, research in large-scale modeling has progressed from modeling with several thousand images [21, 22] to modeling from city-scale datasets of several million images [7]. The major research challenges that these approaches have focused on are:

Data Robustness: Enable modeling from unorganized and heterogeneous Internet photo collections.
Compute & Storage Scalability: Achieve the efficiency needed to handle the true scale of Internet photo collections.
Registration Comprehensiveness: Identify as many camera-to-camera associations as possible.
Model Completeness: Build 3D scene models that are as extensive and panoramic as possible.
In practice, existing LS-SfM frameworks [21, 22, 2, 1, 7] have prioritized these goals differently. The approach of Frahm et al. [7] emphasizes scalability to enable modeling from millions of images. While it achieves impressive city-scale models, this emphasis limits model completeness. In contrast, the approach of Agarwal et al. [2, 1] prioritizes model completeness but can only model from hundreds of thousands of images rather than millions.