This invention relates generally to the correspondence of features, such as pixels, between a pair of images, such as from a video sequence, and more particularly to such correspondence using an image pyramid.
An increasingly common computer application is computer vision. In one type of computer vision application, the three-dimensional (3D) structure, as well as other information, is desired to be obtained from a video sequence of images, such as can be obtained from a hand-held camcorder, for instance. This information can then be used, for example, for building a three-dimensional (3D) graphical model from the image sequence. This information can also be used to compress the video sequence of images.
One of the bottlenecks in such computer vision applications remains matching different images of a single scene, which is also referred to as feature correspondence between images. For example, in FIG. 2, between consecutive images 200 and 202 of a video sequence, a bird 204 has moved from the left to the right. Feature correspondence, as represented by the arrow 206 in FIG. 2, refers to the tracking generally of the bird 204 as it has moved between the images 200 and 202, and more particularly to the identification of corresponding pixels, or matching key pixels, of the images 200 and 202. That is, feature correspondence particularly refers to identification of the location of key pixels within the image 200, and their correspondence to pixels within the image 202, upon movement of the bird 204.
A pair of images from a single scene can generally be obtained in one of two manners. First, a single camera can be used to obtain sequential (from a time perspective) images of a scene, such as a hand-held camcorder recording a video sequence. This is shown in FIG. 4, where a single camera 402 is recording the scene 400 over time. Thus, different images over time are obtained by the camera 402. Second, two cameras can be used to obtain images from different perspectives of a scene. This is shown in FIG. 5, where a first camera 500 and a second camera 502 obtain images, such as snapshot images, of the scene 400, from different perspectives. The single camera case of FIG. 4 thus represents a two-view motion geometry, while the two camera case of FIG. 5 represents a two-view stereo geometry.
In general, to obtain the feature correspondence between two images, such as two consecutive images obtained from a single camera, or two images obtained from two different cameras at different perspectives, what is known in the art as a fundamental matrix must be obtained. The fundamental matrix encodes the only geometric constraint that can be estimated between the two images, which is known in the art as the epipolar (geometry) constraint. In particular, the epipolar geometry represents the fact that a given point in a given scene and the two optical centers of two given images of the scene lie on the same (epipolar) plane. This means that, given a point in one of the images, its corresponding point in the other image must lie on a known (epipolar) line.
The epipolar geometrical constraint between two images is shown in FIG. 6. For a first image 600 and a second image 602, there is a first optical center 604 and a second optical center 606, respectively, such that a baseline 608 connects the optical centers 604 and 606. For a three-dimensional point 610, the first image 600 has a corresponding point 612, while the second image 602 has a corresponding point 614. The plane defined by the optical centers 604 and 606, and the three-dimensional point 610 is known as the epipolar plane. Furthermore, for the first image 600, the line passing through the point 612 and the point 616 of intersection of the baseline 608 with the image 600 is known as an epipolar line; the point 616 itself is known as an epipole. Likewise, for the second image 602, the line passing through the point 614 and the point 618 of intersection of the baseline 608 with the image 602 is also known as an epipolar line; the point 618 is also itself known as an epipole. Thus, the epipolar geometrical constraint between two images arises because, for image points 612 and 614 of the images 600 and 602, respectively, that correspond to the same three-dimensional point 610, the points 612 and 614, the three-dimensional point 610, and the optical centers 604 and 606 are coplanar.
As has been mentioned, the fundamental matrix is used to represent the epipolar geometrical constraint between two images, for feature correspondence therebetween. In particular, where $\{(x_\alpha, y_\alpha)\}$ and $\{(x'_\alpha, y'_\alpha)\}$, $\alpha = 1, \ldots, N$, are image coordinates in pixels of two sets of N points of two different images (the image-coordinate system can be defined arbitrarily for each camera), then two vectors can be defined,

$$\mathbf{x}_\alpha = \begin{pmatrix} x_\alpha / f_0 \\ y_\alpha / f_0 \\ 1 \end{pmatrix}, \qquad \mathbf{x}'_\alpha = \begin{pmatrix} x'_\alpha / f_0 \\ y'_\alpha / f_0 \\ 1 \end{pmatrix},$$

where $f_0$ is a scale factor in pixels so that $x_\alpha / f_0$, $y_\alpha / f_0$, $x'_\alpha / f_0$, $y'_\alpha / f_0$ are each of order 1. The two sets of points are said to satisfy the epipolar constraint if there exists a matrix F of determinant 0 such that

$$(\mathbf{x}_\alpha, F \mathbf{x}'_\alpha) = 0, \qquad \alpha = 1, \ldots, N,$$
where the notation (a, b) specifies the inner product of vectors a and b. The matrix F is the fundamental matrix. The equation above is the necessary condition that the given corresponding points are images of the same points in the scene viewed from two cameras, or one moving camera.
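The constraint above can be checked numerically on synthetic data. The following is a minimal sketch, not part of the described method: it assumes an idealized pinhole camera whose focal length equals the scale factor f0, and a second view related to the first by a rotation R and translation t chosen arbitrarily for illustration. Under those assumptions, a matrix satisfying the constraint in the convention used here is the transpose of the product of the cross-product matrix of t with R. All names (`F0`, `project`, `scaled_vector`, the scene points) are hypothetical.

```python
import math

F0 = 600.0  # hypothetical scale factor f0, in pixels (also used as focal length)

def mat_vec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def project(point):
    """Idealized pinhole projection to pixel coordinates (assumed camera model)."""
    x, y, z = point
    return (F0 * x / z, F0 * y / z)

def scaled_vector(pixel):
    """Build the vector (x / f0, y / f0, 1) from pixel coordinates."""
    px, py = pixel
    return [px / F0, py / F0, 1.0]

# Assumed pose of the second view: a scene point X maps to R X + t.
angle = math.radians(20.0)
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [-1.0, 0.1, 0.2]
t_cross = [[0.0, -t[2], t[1]],
           [t[2], 0.0, -t[0]],
           [-t[1], t[0], 0.0]]

# With the convention (x, F x') = 0, F is the transpose of [t]x R.
F = transpose(mat_mul(t_cross, R))

scene = [(0.5, -0.2, 4.0), (-1.0, 0.3, 6.0), (0.2, 0.8, 5.0), (1.5, -0.6, 7.0)]
residuals = []
for X in scene:
    X2 = [r + ti for r, ti in zip(mat_vec(R, X), t)]
    x1 = scaled_vector(project(X))    # point in the first image
    x2 = scaled_vector(project(X2))   # corresponding point in the second image
    residuals.append(dot(x1, mat_vec(F, x2)))

print("residuals:", residuals)
print("det F:", det3(F))
```

Each residual vanishes up to floating-point rounding, and the determinant of F is zero up to rounding because the cross-product matrix is singular.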
Determining the fundamental matrix, therefore, is typically necessary in order to determine the feature correspondence between two images. Determining the fundamental matrix, however, can be very difficult, as it often must be estimated from image correspondences, which in turn can only be determined when the fundamental matrix is known (a chicken-and-egg problem). The prior art has particular difficulty with feature correspondence, and with determining the fundamental matrix, in two situations:
The first situation arises when the objects within an image have moved many pixels compared to their locations in the other image, or when their image positions are otherwise far apart between the two images. This is referred to as the wide baseline problem. An example of this is shown in FIG. 2, where the bird 204 has moved not incrementally from the image 200 to the image 202, but rather has moved significantly from the left side as shown in the image 200, to the right side as shown in the image 202.
The second situation in which the prior art has particular difficulty with feature correspondence arises when the objects within an image undergo large rotations (or other large image deformations, such as shear caused by turning the camera or divergence caused by the camera zooming) as compared to another image. This is referred to as the image deformation problem. An example of this is shown in FIG. 3, where the bird 204 has been rotated ninety degrees counter-clockwise from its original orientation in the image 300, to its ultimate orientation in the image 302. Feature correspondence in such an example, as represented by the arrow 306, is generally difficult to accomplish within the prior art. For these and other reasons, therefore, there is a need for the present invention.
The present invention relates to feature correspondence between images using an image pyramid. In one embodiment, a computer-implemented method is proposed for generating a fundamental matrix between a first and a second image. The method first generates an image pyramid that has a predetermined number of fineness levels, ordered in fineness from a coarsest to a finest level. Each of the images has significant features at each level of the pyramid. The method next generates a plurality of hypotheses, also referred to as particles, for the fundamental matrix at the coarsest level of the pyramid, based on the significant features of the images at the coarsest level that match one another.
In an iterative procession through the levels of the image pyramid, starting at a present (also referred to as the current) level initially set to the coarsest level and then subsequently advanced by one fineness level upon each iteration, the method first formulates an importance sampling function from the hypotheses. The method then generates the plurality of hypotheses at the next level of the image pyramid based on the function, and on the significant features of the images at this next level. The iteration is complete when the next level has reached the finest level of the image pyramid. The hypotheses generated at the finest level represent the probability distribution of likely fundamental matrices between the images, together with the most likely fundamental matrix.
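The coarse-to-fine procedure described above can be illustrated with a deliberately simplified stand-in. The sketch below does not estimate a fundamental matrix: each hypothesis (particle) is a single one-dimensional shift between two sets of feature positions, and the image pyramid is modeled by dividing the positions by a power of two. The importance sampling function is assumed here to be a weighted mixture of Gaussians centered on the previous level's hypotheses scaled to the next level, and each new hypothesis is weighted by its likelihood alone. The function names, the likelihood, and all parameter values are hypothetical choices for illustration.

```python
import math
import random

def likelihood(shift, feats_a, feats_b, tol=0.5):
    """Soft count of features of A that land near a feature of B after shifting."""
    total = 0.0
    for a in feats_a:
        d = min(abs(a + shift - b) for b in feats_b)
        total += math.exp(-0.5 * (d / tol) ** 2)
    return total

def coarse_to_fine(feats_a, feats_b, levels=4, n_particles=200, sigma=0.5, seed=0):
    rng = random.Random(seed)
    # Pyramid ordered from coarsest to finest; each level halves the scale divisor.
    scales = [2 ** k for k in range(levels - 1, -1, -1)]
    pyramid = [([a / s for a in feats_a], [b / s for b in feats_b]) for s in scales]

    # Coarsest level: one hypothesis per candidate feature match.
    coarse_a, coarse_b = pyramid[0]
    particles = [b - a for a in coarse_a for b in coarse_b]
    weights = [likelihood(p, coarse_a, coarse_b) for p in particles]

    for level_a, level_b in pyramid[1:]:
        # Importance function: pick a previous hypothesis in proportion to its
        # weight, scale it to the next level, and perturb it with Gaussian noise.
        centres = rng.choices(particles, weights=weights, k=n_particles)
        particles = [2.0 * c + rng.gauss(0.0, sigma) for c in centres]
        # Re-weight against the features available at the finer level.
        weights = [likelihood(p, level_a, level_b) for p in particles]

    best = max(zip(weights, particles))[1]
    return best, particles, weights

# Synthetic wide-baseline example: every feature has moved 37 pixels.
feats_a = [5, 18, 42, 60, 77, 95, 120, 133]
feats_b = [a + 37 for a in feats_a]
best, particles, weights = coarse_to_fine(feats_a, feats_b)
print("most likely shift:", best)
```

The weighted particles returned at the finest level play the role of the probability distribution over hypotheses, and the highest-weighted particle plays the role of the most likely hypothesis.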
Embodiments of the invention provide for advantages not found within the prior art. In particular, it has been demonstrated that obtaining the fundamental matrix in accordance with embodiments of the invention provides for feature correspondence in wide baseline and image deformation situations, either better as compared to the prior art, or where the prior art could not provide for feature correspondence. Thus, embodiments of the invention provide for feature correspondence in more situations as compared to the prior art. Embodiments of the invention can also include construction of a three-dimensional model, compression, or other processing as a result of the fundamental matrix generated.