Knowing image relations between two images of a same scene or object is important in many applications, including but not limited to computer vision, image rectification, video compression, virtual reality, augmented reality, and 3D-TV content generation. An image relation is understood as a description or representation of position changes, from either a temporal or viewpoint perspective, of object points in the image planes. In applications, an image relation is conveniently represented by an image relation model, also referred to as an (image) motion model, examples of which include the fundamental matrix, the homography matrix, the essential matrix, and the affine transform.
For example, in a computer vision application the task may be to generate a three-dimensional (3D) representation of an object or scene from a video sequence of 2D images, such as can be obtained from a hand-held camera or camcorder, for instance, wherein different 2D images are taken from different camera positions relative to the same object or scene. If image relations of a suitable form between different video frames are obtained, this information can then be used, for example, for building a 3D graphical model from the image sequence, or to compress the video sequence of images. Furthermore, estimating the motion of an object in an image sequence is useful in background/foreground segmentation and video compression, as well as many other applications.
In recent years, advanced computer vision systems have become available also in hand-held devices. Modern hand-held devices are provided with high resolution sensors, making it possible to take pictures of objects with enough accuracy to process the images with satisfying results.
One known method for determining the image relations between two images is based on image point correspondences acquired from the two images. Image feature points can be identified either manually, or automatically by a feature detector, as described for example in C. Harris and M. Stephens, “A combined corner and edge detector,” In Proc. 4th Alvey Vision Conference, pages 147-151, 1988. Correspondences between feature points of two images can then be established either manually or automatically, for example by identifying similarities in the textures surrounding the feature points in the two images, such as by using a cross-correlation coefficient as a criterion. One example of a method for the determination of image relations between two images is disclosed in U.S. Pat. No. 7,359,526, which describes determining camera pose parameters from point correspondences, and which is incorporated herein by reference. U.S. Pat. No. 6,741,757, which is also incorporated herein by reference, discloses another exemplary method wherein correspondences between respective feature points of two images are established using an image pyramid.
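By way of illustration only, the use of a cross-correlation coefficient as a matching criterion can be sketched as follows. The sketch assumes that the texture around each feature point has already been extracted as a flat list of intensity values of equal length; the function names ncc and best_match are hypothetical and are not taken from any of the cited references.

```python
import math


def ncc(patch_a, patch_b):
    """Normalized cross-correlation coefficient of two equal-size patches.

    Each patch is a flat list of intensities (an assumption of this sketch).
    The result lies in [-1, 1]; 0.0 is returned for a constant patch.
    """
    n = len(patch_a)
    if n == 0 or n != len(patch_b):
        raise ValueError("patches must be non-empty and of equal size")
    mean_a = sum(patch_a) / n
    mean_b = sum(patch_b) / n
    num = sum((a - mean_a) * (b - mean_b) for a, b in zip(patch_a, patch_b))
    var_a = sum((a - mean_a) ** 2 for a in patch_a)
    var_b = sum((b - mean_b) ** 2 for b in patch_b)
    den = math.sqrt(var_a * var_b)
    return num / den if den > 0 else 0.0


def best_match(patch, candidates):
    """Return (index, score) of the candidate patch most correlated with patch."""
    scores = [ncc(patch, c) for c in candidates]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return best, scores[best]
```

In such a sketch, a feature point in the first image would be paired with the candidate in the second image that yields the highest coefficient, and the coefficient itself may serve as a matching score for that putative correspondence.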
Ideally, a relatively small number of known point correspondences can be used to reconstruct the image relation between two images. For example, an image taken from a hand-held device gives rise to rotations and perspective effects between consecutive images. In order to extract and interpret the desired information about the objects in the images, a projective transformation is needed. Such a projective transformation requires four different point correspondences where no three points are collinear. However, due to the noise introduced by image capturing and the errors originating from feature matching, different point correspondences may have different validity, and some may be mismatched and thus not be valid at all. Accordingly, having a technique for estimating the image relation model that is robust with respect to inaccurate and noisy data is essential to reduce negative effects of mismatched point correspondences, often referred to as outliers. Prior art robust estimation methods can be classified into one of three categories: the M-estimator, case deletion diagnostics, and the random sampling consensus paradigm (RANSAC). The M-estimator, as described for example in R. A. Maronna, “Robust M-estimators of multivariate location and scatter,” Ann. Stat. Vol. 4, No. 1, pp. 51-67, 1976, follows maximum-likelihood formulations by deriving optimal weighting for the data under non-Gaussian noise conditions. Outliers have their weights reduced rather than being rejected completely. The estimators minimize the sum of the weighted errors. The case deletion diagnostics method, as described for example in S. Chatterjee and A. S. Hadi, Sensitivity Analysis in Linear Regression, John Wiley, New York, March 1988, is based on influence measures. Small perturbations are introduced into parameters of the problem formulation and the consequent changes of the outcome of the analysis are assessed.
Based on the assessment, the method of the case deletion diagnostics monitors the effect on the analysis of removing the outlier. The RANSAC method, described in M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with application to image analysis and automated cartography,” Comm. of the ACM, Vol. 24, pp. 381-395, 1981, which is incorporated herein by reference, is a hypothesis and verification algorithm. It proceeds by repeatedly generating solutions estimated from minimal sets of correspondences gathered from the data and then testing each solution for support from the complete set of putative point correspondences to determine the consensus for the motion model to be estimated. A comparison study of these three strategies for robust estimation of image relations indicated advantages of the random sampling techniques of RANSAC, see P. H. S. Torr and D. W. Murray, “The development and comparison of robust methods for estimating the fundamental matrix,” Int. J. Computer Vision, Vol. 24, No. 3, pp. 271-300, 1997.
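By way of illustration only, the hypothesis and verification loop of a RANSAC-type method can be sketched as follows. To keep the sketch short, a 2D similarity transform w = a*z + b over complex-valued point coordinates, whose minimal set is two correspondences, is substituted here for the projective transformation discussed above, whose minimal set is four; this substitution, the function names, and the default parameter values are assumptions of the sketch and are not taken from the cited references.

```python
import random


def fit_similarity(pair1, pair2):
    """Estimate (a, b) of w = a*z + b from a minimal set of two correspondences.

    Each correspondence is a (z, w) tuple of complex image coordinates.
    Returns None for a degenerate sample (coincident source points).
    """
    (z1, w1), (z2, w2) = pair1, pair2
    if z1 == z2:
        return None
    a = (w1 - w2) / (z1 - z2)
    b = w1 - a * z1
    return a, b


def residual(model, pair):
    """Transfer error of one correspondence under the hypothesized model."""
    a, b = model
    z, w = pair
    return abs(a * z + b - w)


def ransac(pairs, threshold=1.0, iterations=200, seed=0):
    """Return the hypothesis with the largest consensus set and that set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        # Hypothesis generation: draw a minimal set uniformly at random.
        sample = rng.sample(pairs, 2)
        model = fit_similarity(*sample)
        if model is None:
            continue
        # Verification: count support over all putative correspondences.
        inliers = [p for p in pairs if residual(model, p) < threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

In this sketch each minimal set is drawn uniformly and independently of all earlier draws, which is the behavior of the conventional random sampling consensus method.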
The RANSAC approach employs a hypothesis scoring technique to evaluate each motion model hypothesis that is generated from a minimal set of putative point correspondences. The standard RANSAC algorithm counts the number of inliers for each generated motion model hypothesis by binarizing the errors with a given threshold. The MSAC (M-estimator sample consensus) estimator, described in P. H. S. Torr and A. Zisserman, “Robust computation and parameterization of multiple view relations,” Proc. Int'l Conf. Computer Vision (ICCV '98), pp. 727-732, Bombay, India, Jan. 1998, which is incorporated herein by reference, measures the quality of this hypothesis in such a way that outliers are given a fixed penalty while inliers are scored on how well they fit the data. The MLESAC algorithm, described in P. H. S. Torr and A. Zisserman, “MLESAC: a new robust estimator with application to estimating image geometry,” Computer Vision and Image Understanding, Vol. 78, No. 1, pp. 138-156, 2000, which is incorporated herein by reference, evaluates the likelihood of the model hypothesis instead of heuristic measures. It requires the estimation of a parameter representing the proportion of valid point correspondences and employs the expectation maximization (EM) algorithm. The aforementioned prior art methods assume equal constant validities of point correspondences. The Guided-MLESAC algorithm, disclosed in B. J. Tordoff and D. W. Murray, “Guided-MLESAC: fast image transform estimation by using matching priors,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1523-1535, 2005, which is incorporated herein by reference, extends the MLESAC algorithm by adding prior validity information for each individual point correspondence. The prior validities of point correspondences are, however, calculated only from the feature matcher and remain constant while the parameters of the motion model are estimated.
In the absence of meaningful matching scores, the performance of Guided-MLESAC is no better than that of the MLESAC algorithm.
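By way of illustration only, the difference between the inlier count of the standard RANSAC algorithm and an MSAC-style score can be sketched as follows. The sketch assumes that the per-correspondence errors of a hypothesis are already available as a list of residuals, and it uses a truncated quadratic cost as the MSAC-style measure; the function names are hypothetical.

```python
def ransac_score(residuals, threshold):
    """Standard RANSAC support: number of residuals below the threshold."""
    return sum(1 for r in residuals if abs(r) < threshold)


def msac_cost(residuals, threshold):
    """MSAC-style cost (lower is better): inliers contribute their squared
    error, outliers contribute the fixed penalty threshold**2."""
    t2 = threshold * threshold
    return sum(min(r * r, t2) for r in residuals)


# Two hypotheses with the same inlier count but different fit quality.
residuals_a = [0.1, 0.2, 0.1, 5.0]
residuals_b = [0.9, 0.8, 0.9, 5.0]
same_support = ransac_score(residuals_a, 1.0) == ransac_score(residuals_b, 1.0)
a_fits_better = msac_cost(residuals_a, 1.0) < msac_cost(residuals_b, 1.0)
```

Here the binarized count cannot distinguish the two hypotheses, whereas the truncated cost prefers the hypothesis whose inliers fit more closely.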
In addition, several techniques have also been proposed to speed up the verification phase of the standard RANSAC algorithm. For instance, Matas and Chum in O. Chum and J. Matas, “Optimal Randomized RANSAC,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 30, No. 8, pp. 1472-1482, August 2008, designed a randomized sequential sampling evaluation to enable the early termination of the hypothesis evaluation of the RANSAC. An article by D. Nister, “Preemptive RANSAC for live structure and motion estimation,” Proc. Int'l Conf. Computer Vision (ICCV '03), vol. 1, pp. 199-206, October 2003, presents a preemptive RANSAC method to efficiently select, with predefined confidence, the best hypothesis from the fixed number of generated hypotheses. Disadvantageously, if a good solution is not among the fixed number of the generated hypotheses that can be evaluated in the time available, the preemptive RANSAC method will fail to reach the correct hypothesis. In addition, the preemptive scoring is not very helpful in improving efficiency of the standard RANSAC when the scoring is computationally cheap compared to the hypothesis generation.
These prior art methods employing the random sampling consensus algorithm have at least the following two shortcomings. First, with respect to the conventional random sampling consensus method, a given sampling is independent from the previous samplings. No information from the previous samplings is analyzed and exploited. However, a single sampling can be viewed as a random event. Probability theory states that if an event is repeated many times, the sequences of the random events will exhibit certain statistical patterns, which can be studied and predicted. In other words, the statistical patterns that emerge in the previous samplings could be determined and should be further exploited to benefit the analysis of the subsequent sampling. Another shortcoming is that an outlier might not necessarily be an incorrect correspondence, but may simply disagree with the particular model that is to be estimated. The validity values for the point correspondences used in the aforementioned Guided-MLESAC approach are based on matching scores from a feature matcher and do not take into consideration the model hypothesis.
Recently, approaches have been suggested wherein the random sampling is adaptively guided by previous sampling results. For example, a so-called hill climbing (HC) algorithm, disclosed in Pylvanainen, T., Fan, L.: “Hill climbing algorithm for random sampling consensus methods,” ISVC 2007, Part I. LNCS, vol. 4841, pp. 672-681, Springer, Heidelberg (2007), which is incorporated herein by reference, attempts to improve upon the RANSAC method by utilizing guided sampling, wherein weights that are assigned to individual data points to guide the probability of their selection are updated during the execution of the algorithm based on a currently best sample with the largest number of inliers. In this method, the probability of selecting an inlier to the current model in a next sampling step is increased in proportion to the number of inliers for the current model. Since the number of inliers may be large, the HC algorithm may overly emphasize some data points over others based on the results for a single sample, and therefore may get stuck climbing a local maximum, missing the true optimal solution.
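By way of illustration only, guided sampling of the general kind described above can be sketched as a weighted random selection whose weights are revised after a hypothesis has been scored. The sketch below is not the update rule of the cited HC algorithm; the multiplicative gain, the renormalization, and the function names weighted_sample and boost_inliers are hypothetical choices made only to show the mechanism. All weights are assumed to be strictly positive.

```python
import random


def weighted_sample(weights, k, rng):
    """Draw k distinct indices; each draw is proportional to the weights of
    the indices not yet chosen. Assumes strictly positive weights, k <= len."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(k):
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


def boost_inliers(weights, inlier_indices, gain):
    """Hypothetical update: scale the weights of the inliers of the current
    best hypothesis by gain, then renormalize so the weights sum to one."""
    inliers = set(inlier_indices)
    boosted = [w * gain if i in inliers else w for i, w in enumerate(weights)]
    total = sum(boosted)
    return [w / total for w in boosted]
```

For example, starting from four equal weights and boosting indices 0 and 1 with a gain of 3 yields weights of 0.375, 0.375, 0.125 and 0.125. A gain that grows with the number of inliers would concentrate the selection probability on the inliers of a single sample much more strongly, which corresponds to the over-emphasis noted above.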
Accordingly, it is an object of the present invention to improve upon the prior art by providing an efficient method for determining an image relation model for two images that is free from at least some of the disadvantages of the prior art.
It is a further object of the present invention to provide an efficient method for determining an image relation model for two images of a same scene or object by randomly sampling a plurality of point correspondences using a weighted random selection algorithm wherein weight parameters used in selecting individual point correspondences are dynamically updated to assist in the selection of subsequent samplings using information obtained in previous samples.
It is another object of the present invention to provide a method for assessing the validity of individual point correspondences in the general framework of a RANSAC-like process for determining a motion model for two images.