1. Technical Field
The invention is related to identifying corresponding points among multiple images of a scene, and more particularly, to a system and process that quickly extracts features and finds correspondences across a large number of partially overlapping images of the scene.
2. Background Art
Finding corresponding features in images, commonly referred to as image matching, is an essential component of almost any vision application that tries to extract information from more than one image. Early work in image matching fell into two camps: feature-based methods and direct methods. Feature-based methods attempted to extract salient features such as edges and corners, and then used a small amount of local information (e.g., correlation of a small image patch) to establish matches [8]. In contrast to the feature-based methods, which used only a small amount of the available image data, direct methods attempted to use all of the pixel values to iteratively align images [1, 9]. Other approaches to matching and recognition have used invariants to characterize objects, sometimes establishing canonical frames for this purpose [14, 15].
At the intersection of these approaches are invariant features, which use large amounts of local image data around salient features to form invariant descriptors for indexing and matching. The first work in this area was by Schmid and Mohr [16], who used a jet of Gaussian derivatives to form a rotationally invariant descriptor around a Harris corner. Lowe extended this approach to incorporate scale invariance [10, 11]. Other researchers have developed features that are invariant under affine transformations [3, 20, 5]. Interest point detectors range from standard feature detectors such as Harris corners or Difference-of-Gaussians (DOG) maxima to more elaborate methods such as maximally stable regions [12] and stable local phase structures [7].
Generally, interest point extraction and descriptor matching are treated as two basic steps, and there has been some progress in evaluating the various techniques with respect to interest point repeatability [17] and descriptor performance [13]. There have also been compelling applications to multi-view matching in the context of structure from motion [19] and panoramic imaging [6]. However, to date, none of these procedures provides the capability to quickly extract features and find correspondences across a large number of partially overlapping images of a scene.
It is noted that in the preceding paragraphs, as well as in the remainder of this specification, the description refers to various individual publications identified by a numeric designator contained within a pair of brackets. For example, such a reference may be identified by reciting “reference [1]” or simply “[1]”. Multiple references will be identified by a pair of brackets containing more than one designator, for example, [2, 3]. A listing of references including the publications corresponding to each designator can be found at the end of the Detailed Description section.