Most existing structure from motion (SFM) approaches for reconstructing three-dimensional (3D) scene geometry from unordered images have serious difficulty handling multiple instances of the same structure in a scene. One problem that arises with duplicate structure is that large, self-consistent sets of geometrically valid pairwise (or triplet-wise) image matches between instances can in fact be incorrect. Previous work addressing this issue has primarily used geometric reasoning about the consistency of relative camera pose estimates. Such methods work better when there is relatively little ambiguity in the pairwise matches, or on datasets where the incorrect matches are random and not self-consistent.
One family of geometric reasoning approaches reasons about large-scale structure rather than only pairwise matches, in the hope that data association errors, which cause conflicting measurements at a global scale, can be discovered. For example, in FIGS. 1A-1D, looking only at small neighborhoods of matches leaves it unclear whether the matches across the two oat boxes are correct or whether the conflicting matches between the oat boxes and the square box are correct. Indeed, the former outnumber the latter. By looking at the measurements in their entirety, at least the conflict between these two sets of edges can be detected. When image pairs that contain different instances of a duplicate structure are matched based on visual similarity, the resulting pairwise geometric relations, as well as the correspondences inferred from such pairs, can be erroneous, which can lead to catastrophic failures in the reconstruction.
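As a minimal sketch of the consistency reasoning over relative camera pose estimates described above, the following Python fragment chains the relative rotations around a triplet of images and measures how far the result deviates from the identity. The function names (`rot_z`, `matmul`, `rotation_angle_deg`, `loop_error_deg`), the convention that `r_ij` maps camera i coordinates to camera j coordinates, and the example angles are assumptions made for illustration; they are not taken from any particular prior method. A check of this kind flags only matches that are not self-consistent, which is the limitation noted above for duplicate structure.

```python
import math


def rot_z(deg):
    """Return a 3x3 rotation about the z axis by `deg` degrees (illustrative helper)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotation_angle_deg(r):
    """Return the rotation angle of a 3x3 rotation matrix, in degrees."""
    trace = r[0][0] + r[1][1] + r[2][2]
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def loop_error_deg(r_ij, r_jk, r_ki):
    """Residual rotation after chaining i -> j -> k -> i.

    Assumed convention: r_ab maps camera a coordinates to camera b coordinates,
    so a consistent triplet satisfies r_ki * r_jk * r_ij = identity.
    """
    return rotation_angle_deg(matmul(r_ki, matmul(r_jk, r_ij)))


# A consistent triplet: 30 + 40 - 70 degrees closes the loop.
consistent = loop_error_deg(rot_z(30), rot_z(40), rot_z(-70))

# An inconsistent triplet: the last relative rotation disagrees with the other two.
inconsistent = loop_error_deg(rot_z(30), rot_z(40), rot_z(-25))
```

Under these assumptions, `consistent` is close to zero while `inconsistent` is about 45 degrees, so a threshold on the loop error would flag the second triplet. If all three relative rotations came from matches between two instances of a duplicate structure, the loop could still close, and the triplet would pass this local check despite being wrong.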