This section is intended to provide a background or context to the invention disclosed below. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented, or described. Therefore, unless otherwise explicitly indicated herein, what is described in this section is not prior art to the description in this application, and is not admitted to be prior art by inclusion in this section. Abbreviations that may be found in the specification and/or the drawing figures are defined below at the end of the specification, but prior to the claims.
Various embodiments relate to stereoscopic imaging, and particularly to view interpolation and to image in-painting from multiple views.
In view interpolation, two images of the same scene taken from two different locations are used to estimate a new camera view of the scene appearing as if it had been captured from a viewpoint between the two locations. The two images may be taken, for example, simultaneously by two cameras, or sequentially by one camera which is moved from one location to the next after the first of the two images is taken.
For example, FIG. 1 shows two views of a building taken from two different positions, a left view 102 and a right view 104, and a combined view 106 which provides a new view of the building from a viewpoint between those of the left view 102 and the right view 104. The combined view 106 is generated using elements taken from each of the two original views 102, 104.
By way of contrast, the goal of image in-painting from two images is to fill a region in one of the images with corresponding information from the other image. For example, referring to FIG. 2, the statue blocks part of the building in the left image 202, but that part of the building is visible in the middle image 204. By using image in-painting, information from the middle image 204, blocked by the statue in the left image 202, is transferred to the left image 202 to permit the removal of the statue and its replacement with information from the middle image to produce the right image 206.
View interpolation and image in-painting are related to traditional stereo imaging, in which two views are used to infer three-dimensional information about a scene. FIG. 3 illustrates the geometry of a stereo pair, also called epipolar geometry. Given a point X in the world, imaged at xL in the left view, the corresponding location xR of X in the right view is found, and triangulating this point correspondence gives the depth of X. The problem of finding the correspondence, that is, which point xR from the right view corresponds to xL, is central to stereo imaging. It is known from epipolar geometry that xR should lie along a specific line in the right view, the epipolar line, and this line can be computed from the relative geometry of both views, which, in turn, can be computed using a calibration procedure. This property is useful, since it implies that the search for xR across the right view can be simply conducted along the epipolar line, rather than looking for xR across the entire right view.
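As an illustrative sketch only (the function name, the units, and the assumption of a rectified pair with known focal length and baseline are examples, not part of the disclosure), triangulating a point correspondence in a rectified stereo pair reduces to the familiar relation Z = f·B/d, where d is the disparity between xL and xR:

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulate the depth of a point seen by a rectified stereo pair.

    disparity_px: horizontal offset xL - xR of the matched point, in pixels.
    focal_px:     focal length of the cameras, in pixels.
    baseline_m:   distance between the two camera centers, in meters.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

# For instance, a disparity of 8 px with a 700 px focal length and a
# 0.1 m baseline places the point at 700 * 0.1 / 8 = 8.75 m.
```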
Another useful procedure is called rectification. Referring to FIG. 4, given the geometry of the stereo pair in the top two pictures, both images are warped so that the epipolar lines become parallel and horizontal. In this way, the search for correspondences is reduced to traversing the pixels of a scanline of the image. In other words, the search for correspondences becomes a one-dimensional search along a horizontal scanline.
The aforementioned task of finding point correspondences between images is typically performed by minimizing a similarity measure between pixels. Assuming the image pair, given by images f1(x,y) and f2(x,y), has been rectified, corresponding points lie on the same scanline and therefore share the same y coordinate value, and the goal is to minimize a matching cost D(x1,x2,y) between pixels f1(x1,y) and f2(x2,y). Minimizing the matching cost, as measured by mathematical techniques known to those of ordinary skill in the art, yields the best match. For example, D may be defined as the sum of squared differences over a 3×3 pixel patch around f1(x1,y) and f2(x2,y). In practice, however, this process may become quite challenging due to several issues: occlusion, where a point is visible in one of the images but not in the other; textureless areas, where a point could be matched to several points having a similar appearance; and repeating structures, where, as with textureless areas, there may be multiple similar candidates for the match.
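The 3×3 sum-of-squared-differences cost mentioned above may be sketched as follows (an illustrative Python sketch; the function name ssd_cost, NumPy arrays indexed as image[y, x], and the patch-radius parameter are assumptions made for the example, not part of the disclosure):

```python
import numpy as np

def ssd_cost(f1, f2, x1, x2, y, half=1):
    """Matching cost D(x1, x2, y): sum of squared differences over a
    (2*half + 1) x (2*half + 1) patch (3x3 by default) centered at
    f1[y, x1] and f2[y, x2]. The images are assumed rectified, so both
    patches lie on the same scanline y."""
    p1 = f1[y - half:y + half + 1, x1 - half:x1 + half + 1].astype(np.float64)
    p2 = f2[y - half:y + half + 1, x2 - half:x2 + half + 1].astype(np.float64)
    return float(np.sum((p1 - p2) ** 2))
```

Identical patches give a cost of zero; the cost grows with the squared intensity differences, so minimizing it over candidate positions x2 selects the most similar patch.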
In typical prior-art stereo matching for three-dimensional reconstruction, the goal is to minimize the measured error against a ground-truth disparity, where disparity is defined as the distance in pixels, along the corresponding scanline, between the location of a point P in image 1 and its corresponding location in image 2.
A possible way to find a match for f1(x1,y) is to search the corresponding scanline of f2 in a neighborhood around (x1,y), that is, the neighborhood going from f2(x1,y) to f2(x1+k,y). The search looks for the pixel x2 that minimizes the matching cost D(x1,x2,y) over the pixels in the neighborhood.
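Such a windowed search can be sketched as follows (Python; the per-pixel squared-difference cost and the names pixel_cost and best_match are illustrative assumptions, and any patch-based cost such as the 3×3 SSD described above could be substituted):

```python
import numpy as np

def pixel_cost(f1, f2, x1, x2, y):
    """An illustrative per-pixel matching cost: squared intensity difference."""
    return (float(f1[y, x1]) - float(f2[y, x2])) ** 2

def best_match(f1, f2, x1, y, k, cost=pixel_cost):
    """Scan the same row y of f2 over the window x1 .. x1 + k and
    return the x2 that minimizes the matching cost D(x1, x2, y)."""
    x_max = min(x1 + k, f2.shape[1] - 1)
    return min(range(x1, x_max + 1), key=lambda x2: cost(f1, f2, x1, x2, y))
```

For example, matching pixel x1 = 1 of a scanline with intensities [10, 20, 30, 40] against a scanline [0, 0, 20, 0] with k = 2 returns x2 = 2, the position where the intensity difference vanishes.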
A better approach is to perform a global match along the scanline. This can be done in the following way:
Let P=((x1,x2)1; (x1,x2)2; (x1,x2)3; . . . ) be a sequence of correspondences along a scanline. These sequences are assumed to be monotonically increasing such that the correspondences are unique (no two pixels on one stereo image correspond to the same pixel in the other image) and in order (if an object is to the left of another in one stereo image, it is also to the left in the other image).
As an example, assume each image has 6 pixels on the scanline. A possible sequence of correspondences from the first pixel to the last is:
((1,1); (2,3); (3,4); (4,5); (6,6))
This path can be visualized as follows:
        1   2   3   4   5   6
    1   X
    2           X
    3               X
    4                   X
    5
    6                       X
Note that the coordinates denote the x coordinate on images 1 and 2, respectively; the images are assumed to be rectified so the y coordinate on both is the same.
Each unmatched pixel is penalized with a cost E; in general E>D(x1,x2,y). In this example, there are two unmatched pixels (pixel #2 on one image and pixel #5 on the other image).
As a consequence, the total difference along this sequence would be: D(1,1,y) + D(2,3,y) + D(3,4,y) + D(4,5,y) + D(6,6,y) + 2E.
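The total difference of such a sequence, including the penalty E for unmatched pixels, can be computed as in the following sketch (Python; pixel indices are 1-based to mirror the example, and the function name path_cost is an illustrative assumption):

```python
def path_cost(matches, n1, n2, y, D, E):
    """Total difference of a monotonic correspondence sequence.

    matches: list of (x1, x2) pairs, e.g. [(1, 1), (2, 3), (3, 4), (4, 5), (6, 6)]
    n1, n2:  number of pixels on the scanline in images 1 and 2
    D:       matching cost function D(x1, x2, y)
    E:       penalty charged for each unmatched pixel
    """
    matched1 = {x1 for x1, _ in matches}
    matched2 = {x2 for _, x2 in matches}
    unmatched = (n1 - len(matched1)) + (n2 - len(matched2))
    return sum(D(x1, x2, y) for x1, x2 in matches) + unmatched * E

# With the example sequence above, pixel 5 of image 1 and pixel 2 of
# image 2 are unmatched, so the total is the five D terms plus 2E.
```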
Let us define Dp(xstart1, xstart2, xend1, xend2, y) as the minimum total difference of a sequence from (xstart1, xstart2, y) to (xend1, xend2, y), that is, the minimum total difference sequence for the segments [(xstart1,y), . . . , (xend1,y)] in image 1 and [(xstart2,y), . . . , (xend2,y)] in image 2. If the sequence is limited to be monotonically increasing, the minimum sequence can be found efficiently using a technique called dynamic programming. This is known prior art in stereo matching on a scanline, as shown, for example, in Section 11.5.1 of Szeliski, Computer Vision: Algorithms and Applications, Springer 2010, the teachings of which are incorporated herein by reference. With this technique, the quality improves due to the ordering constraint imposed by the matching procedure, but visible artifacts are generally still present.
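The dynamic-programming recurrence for the minimum total difference can be sketched as follows (an illustrative Python sketch of the standard prior-art scanline technique, not the claimed invention; indices are 1-based to match the example, and the function and parameter names are assumptions):

```python
def scanline_dp(n1, n2, D, E):
    """Minimum total difference of a monotonic correspondence sequence
    between the n1 pixels of a scanline in image 1 and the n2 pixels of
    the corresponding scanline in image 2, found by dynamic programming.

    D(x1, x2) is the matching cost of pixel x1 in image 1 and pixel x2
    in image 2 (1-based); E is the penalty for leaving a pixel unmatched.
    """
    INF = float("inf")
    # C[i][j] = cost of the best sequence over the first i and j pixels.
    C = [[INF] * (n2 + 1) for _ in range(n1 + 1)]
    C[0][0] = 0.0
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            if i > 0 and j > 0:
                C[i][j] = min(C[i][j], C[i - 1][j - 1] + D(i, j))  # match i <-> j
            if i > 0:
                C[i][j] = min(C[i][j], C[i - 1][j] + E)            # pixel i unmatched
            if j > 0:
                C[i][j] = min(C[i][j], C[i][j - 1] + E)            # pixel j unmatched
    return C[n1][n2]
```

Monotonicity is enforced because each step either advances both indices (a match) or skips a single pixel (an occlusion), so no two correspondences can cross; this is the ordering constraint referred to above.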