Reconstructing a three-dimensional (3D) model from two-dimensional (2D) images is useful for situational awareness, for example, when tele-operating a robot at a distance. Several 2D images of an object can be taken from different angles, such as images 10, 12, 14, and 16 shown in FIG. 1, and the images can then be used to form a 3D reconstruction or 3D model, such as 3D reconstruction 18.
In the prior art, Shape-from-Motion (SFM) approaches create a 3D model by matching key features between camera views and solving for the camera poses, as described by N. Snavely, S. M. Seitz, and R. Szeliski in “Photo Tourism: Exploring image collections in 3D” in SIGGRAPH, 2006, which is incorporated herein by reference. These methods vary in which features are matched, how the pose and perspective problem is solved, and how points are filled in between matched key points. A common method is to extract Scale Invariant Feature Transform (SIFT) features, as described by D. G. Lowe in “Object recognition from local scale-invariant features” in the Seventh IEEE International Conference on Computer Vision, 1999, pp. 1150-1157, which is incorporated herein by reference. Model fitting by random sample consensus can then be used to find inlier matches between images, as described by M. A. Fischler and R. C. Bolles in “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24, pp. 381-395, 1981, which is incorporated herein by reference.
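The inlier-selection step can be illustrated with a minimal sketch of random sample consensus. The sketch rests on assumptions that are not part of the cited works: the putative key-point matches are taken as already available (for example, from a feature matcher), and the model is a pure 2D translation between images, which is far simpler than the epipolar or homography models used in practice. The function name ransac_translation and its parameters are hypothetical.

```python
import random


def ransac_translation(matches, threshold=2.0, iterations=200, seed=0):
    """Select inlier matches under an assumed 2D translation model.

    matches: list of ((x1, y1), (x2, y2)) putative key-point matches.
    Returns the refined translation (dx, dy) and the inlier indices.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by a single sampled match.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Count the matches that agree with this hypothesis.
        inliers = [
            i
            for i, ((ax, ay), (bx, by)) in enumerate(matches)
            if ((ax + dx - bx) ** 2 + (ay + dy - by) ** 2) ** 0.5 <= threshold
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit the model on the consensus set by averaging its translations.
    n = len(best_inliers)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return (dx, dy), best_inliers
```

Matches whose displacement disagrees with the consensus translation by more than the threshold are treated as outliers and discarded before pose estimation.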
A gradient method, such as that described by K. Levenberg in “A Method for the Solution of Certain Non-Linear Problems in Least Squares,” Quarterly of Applied Mathematics vol. 2, pp. 164-168, 1944, which is incorporated herein by reference, may be used to minimize errors in pose and find the transformation matrix that maps key points in one image to another. A series of images can then be bundled together with known camera poses and 3D-mapped key points. Based on the camera poses and key points, a sparse point cloud can be computed. The sparse point cloud is useful for matching camera viewpoints; however, it is not visually appealing.
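The damped least-squares minimization can be sketched as follows. This is an illustration under stated assumptions, not the method of any cited work: the transformation is assumed to be a 2D rigid motion (rotation theta plus translation tx, ty) between matched key points, the damping term is a multiple of the identity matrix in the manner of Levenberg, and all function names are hypothetical.

```python
import math


def residuals(params, pairs):
    """Stacked x/y errors of R(theta) * a + t - b over all matched pairs."""
    theta, tx, ty = params
    c, s = math.cos(theta), math.sin(theta)
    res = []
    for (ax, ay), (bx, by) in pairs:
        res.append(c * ax - s * ay + tx - bx)
        res.append(s * ax + c * ay + ty - by)
    return res


def jacobian(params, pairs):
    """Partial derivatives of each residual w.r.t. (theta, tx, ty)."""
    c, s = math.cos(params[0]), math.sin(params[0])
    rows = []
    for (ax, ay), _ in pairs:
        rows.append([-s * ax - c * ay, 1.0, 0.0])
        rows.append([c * ax - s * ay, 0.0, 1.0])
    return rows


def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def levenberg_fit(pairs, params=(0.0, 0.0, 0.0), damping=1e-3, iterations=50):
    """Minimize the summed squared residuals with a damped Gauss-Newton step."""
    params = list(params)
    cost = sum(r * r for r in residuals(params, pairs))
    for _ in range(iterations):
        r = residuals(params, pairs)
        j = jacobian(params, pairs)
        jtj = [[sum(row[a] * row[b] for row in j) for b in range(3)] for a in range(3)]
        jtr = [sum(row[a] * ri for row, ri in zip(j, r)) for a in range(3)]
        # Levenberg damping: add damping * identity to the normal equations.
        damped = [[jtj[a][b] + (damping if a == b else 0.0) for b in range(3)]
                  for a in range(3)]
        step = solve(damped, [-g for g in jtr])
        trial = [p + d for p, d in zip(params, step)]
        trial_cost = sum(x * x for x in residuals(trial, pairs))
        if trial_cost < cost:
            params, cost = trial, trial_cost
            damping /= 10.0  # step accepted: trust the quadratic model more
        else:
            damping *= 10.0  # step rejected: fall back toward gradient descent
    return params, cost
```

A small damping value makes each step behave like a Gauss-Newton step, while a large value shortens the step and turns it toward the negative gradient, which is why a rejected step increases the damping.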
Another method described by Y. Furukawa, B. Curless, S. M. Seitz, and R. Szeliski, “Towards Internet-scale Multi-view Stereo,” Computer Vision and Pattern Recognition (CVPR) conference, 2010, which is incorporated herein by reference, can be used to fill in stereo points between the key points as well as perform matching between 3D regions that SFM was unable to connect.
Probabilistic voxel methods are described by A. Miller, V. Jain, and J. L. Mundy in “A Heterogeneous Framework for Large-Scale Dense 3-d Reconstruction from Aerial Imagery,” IEEE Transactions on Parallel and Distributed Systems, vol. IN PRESS, 2013, and by M. I. Restrepo, B. A. Mayer, A. O. Ulusoy, and J. L. Mundy in “Characterization of 3-D Volumetric Probabilistic Scenes for Object Recognition,” IEEE Journal of Selected Topics in Signal Processing, vol. 6, pp. 522-537, 2012, which are incorporated herein by reference. Another method uses surface mapping as described by J.-M. Frahm, P. Georgel, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y.-H. Jen, E. Dunn, B. Clipp, S. Lazebnik, and M. Pollefeys, “Building Rome on a Cloudless Day,” presented at the European Conference on Computer Vision (ECCV), 2010, by B. Clipp, R. Raguram, J.-M. Frahm, G. Welch, and M. Pollefeys, “A Mobile 3D City Reconstruction System,” presented at the IEEE Vision Recognition (VR) workshop on Cityscapes, 2008, and by C. Wu, B. Clipp, X. Li, J.-M. Frahm, and M. Pollefeys in “3D model matching with viewpoint invariant patches (VIP),” in Computer Vision and Pattern Recognition (CVPR), 2008, which are incorporated herein by reference.
An issue that arises during the final step of reconstruction is that there are frequently holes in the 3D model. If the surface is visible, a hole is typically the result of low-feature or specular surfaces. However, any optical anomaly that prevents stereo feature matching can be the cause. When these holes are present in the data, the question becomes: when does the hole belong there, and when is the hole due to a failure in the 3D reconstruction software?
FIGS. 2A, 2B and 2C show an example of filling in a hole 20 in the 3D reconstruction shown in FIG. 2C. This may be accomplished by projecting a region 22 of FIG. 2B, as imaged by the camera 24 shown in FIG. 2A, onto the hole 20 in FIG. 2C. To determine which region 22 of FIG. 2B to use to fill the hole 20, classification and segmentation are used.
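The geometric relationship between an image region and a hole can be illustrated with a minimal sketch. The sketch assumes an ideal pinhole camera located at the origin and looking along the positive z axis, with a hypothetical focal length and principal point; it looks up the image pixel that each 3D hole point projects to. It does not perform the classification and segmentation that select the region, and the function names are hypothetical.

```python
def project_point(point, focal, cx, cy):
    """Project a 3D point (camera coordinates) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        return None  # the point is behind the camera and cannot be imaged
    return (focal * x / z + cx, focal * y / z + cy)


def fill_hole(hole_points, image, focal, cx, cy):
    """Pair each 3D hole point with the image pixel it projects onto.

    image: list of rows of pixel values, indexed as image[row][col].
    Points that project outside the image are left unfilled.
    """
    height, width = len(image), len(image[0])
    filled = []
    for p in hole_points:
        uv = project_point(p, focal, cx, cy)
        if uv is None:
            continue
        col, row = int(round(uv[0])), int(round(uv[1]))
        if 0 <= row < height and 0 <= col < width:
            filled.append((p, image[row][col]))
    return filled
```

In this sketch, only hole points that lie in front of the camera and project inside the image bounds receive a pixel value; the remaining points stay unfilled.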
It is important to be able to accurately identify the parts of an image, such as earth, manmade, or space, in order to form a 3D reconstruction, and to find and fill holes in the 3D reconstruction.
What is needed is an improved method for classification and segmentation of images that can be used to more accurately identify parts of an image and therefore construct a more accurate 3D model. The embodiments of the present disclosure address these and other needs.