1. Field of Invention
The present invention generally relates to matching pixel points in multiple images having different view angles, and more specifically relates to edge pixel matching in multiple images taken with catadioptric cameras.
2. Description of Related Art
Edge detection algorithms are part of many image manipulation operations. Edge detection is fundamental to image processing and computer vision, particularly in the areas of feature detection and feature extraction. Edge detection aims to identify points, i.e. pixels, that outline objects within an image. There are many edge detection algorithms, but generally they attempt to identify pixels at which discontinuities occur, i.e. where the image brightness changes sharply. In the ideal case, applying an edge detector to an image yields a set of connected curves that indicate the boundaries of objects, the boundaries of surface markings, and discontinuities in surface orientation. Once the boundaries have been identified, various image processing operations may be applied to the digital image.
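By way of illustration only, the notion of identifying pixels where the image brightness changes sharply may be sketched as a simple gradient-magnitude threshold. This is a minimal sketch and not any particular edge detector referred to herein; the function name detect_edges, the threshold value, and the small test image are hypothetical and chosen only for the example.

```python
def detect_edges(image, threshold):
    """Return a same-sized grid of 0/1 flags marking sharp brightness changes.

    `image` is a list of rows of grayscale values. Border pixels are left
    unmarked because their neighbors are incomplete.
    """
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central differences approximate the brightness gradient.
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0
            # Mark the pixel when the gradient magnitude reaches the threshold.
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[r][c] = 1
    return edges


# Hypothetical 5x6 image: dark left half (0), bright right half (100).
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edges = detect_edges(image, threshold=25.0)
```

In this example the marked pixels are the two interior columns that straddle the dark-to-bright transition. Practical edge detectors typically add smoothing and edge thinning, which are omitted here.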
For example, FIG. 1A shows a typical digital image, and FIG. 1B shows the results of applying edge detection to the image of FIG. 1A. Edge detection may be designed to identify thick or thin lines, or may be optimized to separately identify thick and thin lines. In the example of FIG. 1B, both thick and thin lines are separately identified, which permits them to be separately processed. This permits the processing of the digital image to be more specialized by adjusting the size of a pixel-processing window according to line thickness. As a result, application of a specific image processing algorithm, such as a bilateral filter, may be optimized along the edges of objects according to line thickness to achieve a sharper final image, as shown in FIG. 1C.
Another use of edge detection is feature detection. As an example, if one has a library of identifying features of a specific object, then one may search an input digital image for those identifying features in an effort to determine if an example of the specific object is present in the input digital image. When this is extended to multiple digital images of a common scene taken from different view angles, it is possible to index, i.e. match or correlate, feature points from one image to the other. This permits the combined processing of the multiple digital images.
For example in FIG. 2, images 2, 4, 6 and 8 each provide partial, and overlapping, views of a building in a real-world scene, but none provide a full view of the entire building. However, by applying edge detection and indexing (i.e. identifying matching pairs of) feature points in the four partial images 2, 4, 6 and 8 that correlate to the same real feature point in the real-world scene, it is possible to stitch together the four partial images (i.e. applying an image stitching tool) to create one composite image 10 of the entire building. The four partial images 2-8 of FIG. 2 are taken from the same view angle, but this approach may be extended to the field of correspondence matching, where images of a common scene are taken from different view angles.
In the field of computer vision, correspondence matching (or the correspondence problem) refers to the matching of objects (or object features or feature points) common to two, or more, images. Correspondence matching tries to determine which parts of a first image correspond to (i.e. are matched to) which parts of a second image, assuming that the second image was taken after the camera had moved, time had elapsed, and/or the pictured objects had moved. For example, the first image may be of a real-world scene taken from a first view angle with a first field of vision, FOV, and the second image may be of the same scene taken from a second view angle with a second FOV. Assuming that the first and second FOVs at least partially overlap, correspondence matching refers to the matching of common feature points in the overlapped portions of the first and second images.
Correspondence matching is an essential problem in computer vision, especially in stereo vision, view synthesis, and 3D reconstruction. Assuming that a number of image features, or objects, in two images taken from two view angles have been matched, epipolar geometry may be used to identify the positional relationship between the matched image features to achieve stereo vision, view synthesis, or 3D reconstruction.
Epipolar geometry is basically the geometry of stereo vision. For example in FIG. 3, two cameras 11 and 13 create 2D images 15 and 17, respectively, of a common 3D scene 10 consisting of a larger sphere 19 and a smaller sphere 21. 2D images 15 and 17 are taken from two distinct view angles 23 and 25. Epipolar geometry describes the geometric relations between points in 3D scene 10 (for example spheres 19 and 21) and their relative projections in 2D images 15 and 17. These geometric relationships lead to constraints between the image points, which are the basis for epipolar constraints, or stereo constraints, described more fully below.
FIG. 3 illustrates a horizontal parallax where, from the view point of camera 11, smaller sphere 21 appears to be in front of larger sphere 19 (as shown in 2D image 15), but from the view point of camera 13, smaller sphere 21 appears to be some distance to the side of larger sphere 19 (as shown in 2D image 17). Nonetheless, since both 2D images 15 and 17 are of a common 3D scene 10, both are truthful representations of the relative positions of larger sphere 19 and smaller sphere 21. The geometric positional relationships between camera 11, camera 13, smaller sphere 21 and larger sphere 19 thus establish geometric constraints on 2D images 15 and 17 that permit one to reconstruct the 3D scene 10 given only the 2D images 15 and 17, as long as the epipolar, or stereo, constraints are known.
Epipolar geometry is based on the pinhole camera model, a simplified representation of which is shown in FIG. 4. In the pinhole camera model, cameras are represented by a point, such as left point OL and right point OR, at each respective camera's focal point. Point P represents the point of interest in the 3D scene being imaged, which in the present example is represented by two crisscrossed lines.
Typically, the image plane (i.e. the plane on which a 2D representation of the imaged 3D scene is captured) is behind a camera's focal point and is inverted. For ease of explanation, and to avoid the complications of an inverted captured image, two virtual image planes, ImgL and ImgR, are shown in front of their respective focal points, OL and OR, to show non-inverted representations of captured images. Point PL is the 2D projection of point P onto left virtual image ImgL, and point PR is the 2D projection of point P onto right virtual image ImgR. This conversion from 3D to 2D may be termed a perspective projection, and is described by the pinhole camera model, as it is known in the art. It is common to model this projection operation by rays that emanate from a camera and pass through its focal point. Each modeled emanating ray would correspond to a single point in the captured image. In the present example, these emanating rays are indicated by dotted lines 27 and 29.
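By way of illustration only, the perspective projection of the pinhole camera model onto a virtual image plane in front of the focal point may be sketched as follows. The sketch assumes, purely for simplicity, that each camera's axes are aligned with the world axes and that the virtual image plane lies at a distance equal to the focal length along the z axis; the function name project_pinhole and all coordinate values are hypothetical.

```python
def project_pinhole(point, focal_point, focal_length):
    """Project a 3D point onto a virtual image plane in front of the camera.

    Assumes the camera looks along +z with axes aligned to the world axes.
    Returns 2D image coordinates (u, v) relative to the principal point.
    """
    x = point[0] - focal_point[0]
    y = point[1] - focal_point[1]
    z = point[2] - focal_point[2]
    if z <= 0:
        raise ValueError("point must lie in front of the focal point")
    # Similar triangles: the ray through the focal point meets the plane
    # z = focal_length at (f * x / z, f * y / z).
    return (focal_length * x / z, focal_length * y / z)


# Hypothetical focal points OL and OR and a 3D point of interest P.
O_L = (0.0, 0.0, 0.0)
O_R = (1.0, 0.0, 0.0)
P = (0.5, 0.2, 2.0)

P_L = project_pinhole(P, O_L, focal_length=1.0)  # projection onto ImgL
P_R = project_pinhole(P, O_R, focal_length=1.0)  # projection onto ImgR
```

Under these assumptions the same 3D point lands at different horizontal positions in the two virtual images, which is the parallax described with reference to FIG. 3.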
Epipolar geometry also defines the constraints relating the positions of each camera relative to each other. This may be done by means of the relative positions of focal points OL and OR. The focal point of a first camera would project onto a distinct point on the image plane of a second camera, and vice versa. In the present example, focal point OR projects onto image point EL on virtual image plane ImgL, and focal point OL projects onto image point ER on virtual image plane ImgR. Image points EL and ER are termed epipoles, or epipole points. The epipoles and the focal points they project from lie on a single line, i.e. line 31.
Line 27, from focal point OL to point P, is seen as a single point, PL, in virtual image plane ImgL, because point P is directly in front of focal point OL. This is similar to how image 15 of camera 11, in FIG. 3, shows smaller sphere 21 in front of larger sphere 19. However, from focal point OR, the same line 27 from OL to point P is seen as a displacement line 33 from image point ER to point PR. This is similar to how image 17 of camera 13, in FIG. 3, shows smaller sphere 21 displaced to the side of larger sphere 19. This displacement line 33 may be termed an epipolar line. Conversely, from focal point OR, line 29 is seen as a single point PR in virtual image plane ImgR, but from focal point OL, line 29 is seen as a displacement line, or epipolar line, 35 on virtual image plane ImgL.
Epipolar geometry thus forms the basis for triangulation. For example, assuming that the relative translation and rotation of cameras OR and OL are known, if projection point PL on left virtual image plane ImgL is known, then the epipolar line 33 on the right virtual image plane ImgR is known by epipolar geometry. Furthermore, point P must project onto the right virtual image plane ImgR at a point PR that lies on this specific epipolar line, 33. Essentially, for each point observed in one image plane, the same point must be observed in another image plane on a known epipolar line. This provides an epipolar constraint that corresponding image points on different image planes must satisfy.
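By way of illustration only, the epipolar constraint may be checked numerically using the essential-matrix formulation that is commonly used for pinhole cameras, which is an assumption of this sketch and is not recited above. The sketch assumes normalized image coordinates (unit focal length) and a relative pose defined such that a point with coordinates X_L in the left camera frame has coordinates R X_L + t in the right camera frame; the helper names and all numerical values are hypothetical.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [dot(row, v) for row in m]


def epipolar_line(x_left, R, t):
    """Coefficients (a, b, c) of the line a*u + b*v + c = 0 in the right image.

    Equivalent to multiplying x_left by the essential matrix E = [t]x R.
    """
    return cross(t, mat_vec(R, x_left))


# Assumed relative pose: a small rotation about the y axis plus a translation.
angle = 0.1
R = [[math.cos(angle), 0.0, math.sin(angle)],
     [0.0, 1.0, 0.0],
     [-math.sin(angle), 0.0, math.cos(angle)]]
t = [-1.0, 0.0, 0.1]

P_left = [0.5, 0.2, 2.0]  # point P expressed in the left camera frame
P_right = [p + q for p, q in zip(mat_vec(R, P_left), t)]  # P in the right frame

x_left = [c / P_left[2] for c in P_left]     # projection PL, homogeneous (u, v, 1)
x_right = [c / P_right[2] for c in P_right]  # projection PR, homogeneous (u, v, 1)

line = epipolar_line(x_left, R, t)
residual = dot(x_right, line)  # close to zero: PR lies on the epipolar line

# A candidate point shifted off the line does not satisfy the constraint.
x_wrong = [x_right[0], x_right[1] + 0.5, 1.0]
residual_wrong = dot(x_wrong, line)
```

A correspondence search may use such a residual to restrict candidate matches for PL to points near a single line in the right image rather than the whole image.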
Another epipolar constraint may be defined as follows. If projection points PL and PR are known, their corresponding projection lines 27 and 29 are also known. Furthermore, if projection points PL and PR correspond to the same 3D point P, then their projection lines 27 and 29 must intersect precisely at 3D point P. This means that the position of 3D point P can be calculated from the 2D coordinates of the two projection points PL and PR. This process is called triangulation.
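By way of illustration only, triangulation from two projection points may be sketched with the midpoint method, which returns the point halfway between the closest points of the two projection rays and therefore also tolerates rays that do not intersect exactly because of measurement noise. The midpoint method is one common choice and is an assumption of this sketch. The sketch further assumes axis-aligned cameras with unit focal length, so that an image point (u, v) corresponds to the ray direction (u, v, 1); the function name triangulate_midpoint and all values are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(origin_l, dir_l, origin_r, dir_r):
    """Estimate the 3D point nearest to two rays (midpoint method).

    Each ray is origin + s * direction. Returns the midpoint of the shortest
    segment joining the two rays.
    """
    w = [p - q for p, q in zip(origin_l, origin_r)]
    a, b, c = dot(dir_l, dir_l), dot(dir_l, dir_r), dot(dir_r, dir_r)
    d, e = dot(dir_l, w), dot(dir_r, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of the closest point on the left ray
    u = (a * e - b * d) / denom  # parameter of the closest point on the right ray
    closest_l = [o + s * k for o, k in zip(origin_l, dir_l)]
    closest_r = [o + u * k for o, k in zip(origin_r, dir_r)]
    return [(p + q) / 2.0 for p, q in zip(closest_l, closest_r)]


# Hypothetical focal points and projection points (unit focal length).
O_L = [0.0, 0.0, 0.0]
O_R = [1.0, 0.0, 0.0]
P_L = (0.25, 0.1)   # projection of P on the left virtual image plane
P_R = (-0.25, 0.1)  # projection of P on the right virtual image plane

# Each image point defines a projection ray through its camera's focal point.
ray_l = [P_L[0], P_L[1], 1.0]
ray_r = [P_R[0], P_R[1], 1.0]

P_est = triangulate_midpoint(O_L, ray_l, O_R, ray_r)
```

With these values the two rays intersect exactly, and the estimate recovers the 3D point (0.5, 0.2, 2.0) up to floating-point rounding.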
As is explained above, however, epipolar geometry and the stereo constraint are based on the pinhole camera model, and thus do not apply to cameras that do not adhere to the pinhole camera model. Consequently, the availability of correspondence matching tools for non-pinhole cameras has been limited.
Examples of cameras that do not adhere to the pinhole model are orthographic cameras, pushbroom cameras, cross-slit cameras and catadioptric cameras. Such cameras, however, typically provide a larger field of vision than is possible with pinhole cameras.
It would be desirable to facilitate the extension of correspondence matching to the larger field of vision available to non-pinhole cameras, and in particular to catadioptric cameras.