1. Field of the Invention
This invention relates to computer vision and more particularly to the computation of the location and orientation (alternatively referred to as "pose") of an object in three dimensional (3D) space.
2. Prior Art
Apparatus for capturing 3-dimensional (henceforth, 3D) models of people's faces are known in the art. These apparatus simultaneously capture pictures of the faces in registration with the 3D models that are captured. The 3D models consist of the 3-dimensional coordinates of a large number of points on the surface of the face, typically on the order of 640×640, along with the color picture value at each point. This provides the possibility of realistic computer graphics rendering of the faces from any vantage point. It is envisioned that this device can be used to capture a large number of face models. In some applications, this number could reach on the order of one million faces or more.
One application of this device is face identification, i.e., given a query consisting of an arbitrary two-dimensional (2D) photograph of a face, to search a large database of previously captured 3D models of faces to find the model which best matches the 2D photograph.
There are numerous obstacles to be overcome to accomplish this.
In general, there is no knowledge or control of the conditions under which the query photograph was acquired. Therefore, the pose (i.e. the position and orientation of the subject) and the lighting conditions of the query photograph are unknown.
There have been numerous approaches to the problem of face recognition. We focus on one class of paradigms for matching a query to a model in this setting, which begins with the task of determining the pose of the face in the query, where the "pose" is defined as the 3D position and orientation of the face.
It is that task, pose determination, with which the present invention is concerned. There is considerable prior art concerned with finding the pose of 3D objects given a 2D photograph. The present invention addresses the case where a set of 3D models is available. The methods of the prior art for dealing with this problem can be divided into two classes:
1. Those based on "features" extracted from the 2D image which are compared to corresponding features in some 3D model; and
2. Those that use the image directly.
The methods of the present invention fall into the first class. Features that have been used by prior artisans range from hand-selected points, to automatically detected points such as "corners", extrema, lines, line junctions, and curves. The present invention deals only with the simplest of these methods, which takes as input two sets of points. The first set of points is marked by hand on a 3D model, resulting in 3D coordinates of the points. The second set of points is similarly marked by hand on a 2D image of a similar object. The correspondence between the 3D and 2D points is known, i.e., for each of the 3D points there is exactly one 2D point which is known to correspond to it, and vice versa. By saying that a 3D point "corresponds" to a 2D point, it is meant that the points represent the same feature of the object, e.g., the left corner of the left eye of some person's face, or the left rear corner of the left wing of an airplane. The problem to be solved is to compute the rigid motion of the 3D point set, all points moving identically together, such that the perspective projection (i.e., the camera image) of the 3D point set will be the 2D point set. This is equivalent to finding the position and orientation of a perspective projection camera so that the picture it takes of the 3D point set will be the 2D point set. This is sometimes called the "perspective n-point problem," where n is the number of points involved.
There have been many practitioners who have crafted solutions to the perspective n-point problem. These range from relatively straightforward least squares solutions, to statistical selection techniques, through what is now commonly called "recognition by alignment."
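The quantity that any solution to this problem tries to drive to zero can be illustrated with a minimal sketch. It assumes a simple pinhole camera with a hypothetical `focal_length` parameter, a rotation given as a 3x3 list of rows, and a translation given as a 3-tuple; none of these names or conventions come from the text, and the sketch measures how well a candidate rigid motion explains the marked 2D points rather than solving for that motion.

```python
def apply_rigid_motion(R, t, p):
    """Return R*p + t for a 3x3 rotation R (list of rows) and translation t."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))


def project(p, focal_length=1.0):
    """Pinhole projection: divide x and y by the depth z (assumes z > 0)."""
    x, y, z = p
    return (focal_length * x / z, focal_length * y / z)


def reprojection_error(R, t, points_3d, points_2d, focal_length=1.0):
    """Sum of squared image-plane distances between projected and marked points."""
    total = 0.0
    for p, q in zip(points_3d, points_2d):
        u, v = project(apply_rigid_motion(R, t, p), focal_length)
        total += (u - q[0]) ** 2 + (v - q[1]) ** 2
    return total
```

A rigid motion that solves the problem exactly gives an error of zero. The division by `z` in `project` is the nonlinearity of perspective projection.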
There are several problems with existing methods for this problem. The first problem is that when there are more than 3 pairs of corresponding points, it is not in general possible to obtain a closed form solution. For larger numbers of points, a least squares approach is required. In general, because of the nonlinear nature of the perspective projection (i.e., perspective contains a division by the z coordinate), and the nonlinear effects in the 2D image of rotations in 3D space, the least squares problem must be solved by iterative methods. Iterative methods are slow compared to closed form solutions, and in the applications of concern, it will eventually be required to solve this problem for up to hundreds of millions or more different sets of 3D points for each query, so that speed is important. It is further desirable that for graphic display purposes, the solution be so fast that it can be done in real time to provide a smooth graphic display of the effect of altering point positions on the resulting solution.
Therefore it is an object of the present invention to provide a method for computing the location and orientation of an object in three dimensional space which is fast as compared to the methods of the prior art.
It is a further object of the present invention to provide a method for computing the location and orientation of an object in three dimensional space which is not only fast as compared to the methods of the prior art but also robust.
It is yet a further object of the present invention to provide a method for computing the location and orientation of an object in three dimensional space which is well suited to applications in which a small number of manually marked points with known correspondences must be used to find pose.
It is yet a further object of the present invention to provide a method for computing the location and orientation of an object in three dimensional space which does not require knowledge of a camera model, or need to solve for a camera model.
It is still yet a further object of the present invention to provide a method for computing the location and orientation of an object in three dimensional space which is computationally simpler than the heuristics and iterative solutions of the prior art.
Accordingly, a method for computing the location and orientation of an object in three-dimensional space is provided. The method comprises the steps of: (a) marking a plurality of feature points on a three-dimensional model and corresponding feature points on a two-dimensional query image; (b) for all possible subsets of three two-dimensional feature points marked in step (a), computing the four possible three-dimensional rigid motion solutions of a set of three points in three-dimensional space such that after each of the four rigid motions, under a fixed perspective projection, the three three-dimensional points are mapped precisely to the three corresponding two-dimensional points; (c) for each solution found in step (b), computing an error measure derived from the errors in the projections of all three-dimensional marked points in the three-dimensional model which were not among the three points used in the solution, but which did have corresponding marked points in the two-dimensional query image; (d) ranking the solutions from step (c) based on the computed error measure; and (e) selecting the best solution based on the ranking in step (d).
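Steps (b) through (e) can be sketched as a hypothesize-and-rank loop. The sketch below is illustrative only: `rank_pose_hypotheses`, `solve_three_point`, and `point_error` are hypothetical names, and the three-point solver is passed in as a callable and is not implemented here. The toy usage at the end replaces the solver with a stand-in that proposes four fixed candidates, purely to show the control flow.

```python
from itertools import combinations


def rank_pose_hypotheses(points_3d, points_2d, solve_three_point, point_error):
    """Hypothesize poses from every triple, score each on the remaining points, rank."""
    n = len(points_3d)
    scored = []
    for triple in combinations(range(n), 3):
        held_out = [i for i in range(n) if i not in triple]
        tri_3d = [points_3d[i] for i in triple]
        tri_2d = [points_2d[i] for i in triple]
        # The solver is expected to return up to four candidate rigid motions.
        for pose in solve_three_point(tri_3d, tri_2d):
            # The error uses only the points that were not used to form the hypothesis.
            err = sum(point_error(pose, points_3d[i], points_2d[i]) for i in held_out)
            scored.append((err, triple, pose))
    scored.sort(key=lambda item: item[0])  # lowest error first
    return scored


# Toy usage: a "pose" here is only a depth offset, and the stand-in solver
# proposes four fixed candidates instead of solving the three-point problem.
def toy_error(depth, p3, p2):
    x, y, z = p3
    return (x / (z + depth) - p2[0]) ** 2 + (y / (z + depth) - p2[1]) ** 2


def toy_solver(tri_3d, tri_2d):
    return [1.0, 2.0, 3.0, 4.0]


model = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 1.0), (-1.0, 0.5, 2.0)]
image = [(x / (z + 2.0), y / (z + 2.0)) for x, y, z in model]
ranked = rank_pose_hypotheses(model, image, toy_solver, toy_error)
best_error, best_triple, best_pose = ranked[0]
```

With n marked points there are n-choose-3 triples and at most four solutions per triple, so the number of hypotheses stays small when only a handful of points is marked by hand.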
In a first variation of the method of the present invention, the method further comprises, after step (a), the step of computing a predetermined number of perturbed points derived from each of the feature points wherein the computation of step (b) is on all possible subsets of three two-dimensional points marked in the query image, and in addition on the corresponding perturbed points. Typically, the predetermined number of perturbed points are obtained by sampling from a spherical Gaussian distribution.
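One way to generate such perturbed points is sketched below, assuming that "spherical" means the same standard deviation along every coordinate axis. The function name `perturb_points` and the parameters `count` and `sigma` are hypothetical; the text does not specify how many perturbed points are drawn or how wide the distribution is.

```python
import random


def perturb_points(points, count, sigma, rng=None):
    """For each point, return `count` copies displaced by isotropic Gaussian noise."""
    rng = rng or random.Random()
    return [
        [tuple(c + rng.gauss(0.0, sigma) for c in point) for _ in range(count)]
        for point in points
    ]
```

Passing a seeded `random.Random` instance makes the perturbations reproducible, which can help when comparing rankings across runs.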
In a second variation of the method of the present invention, the method further comprises, after step (d), the steps of: (i) choosing a subset of the solutions from step (c) based on the ranking of step (d); (ii) computing a predetermined number of perturbed points derived from each of the subset of the solutions; (iii) repeating step (b) on the predetermined number of perturbed points; and (iv) repeating step (c) for each solution found in step (iii); wherein the ranking of step (d) is based on the error computed in both steps (c) and (iv). Preferably, the subset of solutions chosen is a predetermined portion of the ranked solutions, such as the top 10% of the ranked solutions.
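The selection and re-ranking in this variation can be sketched as follows, assuming each ranked solution is stored as a tuple whose first element is its error, sorted lowest error first. The helper names `select_top_fraction` and `merge_rankings` are hypothetical, and the default of 10% simply mirrors the example portion given above.

```python
import math


def select_top_fraction(ranked, fraction=0.10):
    """Keep the best-ranked share of solutions, always at least one if any exist."""
    if not ranked:
        return []
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


def merge_rankings(first_pass, second_pass):
    """Re-rank the solutions from both passes together, lowest error first."""
    return sorted(first_pass + second_pass, key=lambda item: item[0])
```

The solutions returned by `select_top_fraction` would seed the perturbation step, and `merge_rankings` combines the original and perturbed solutions into one final ranking.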
Still further provided is a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform the method steps of the methods and variations thereof of the present invention and a computer program product embodied in a computer-readable medium for carrying out the methods, and variations thereof, of the present invention. Preferably, the computer program has modules corresponding to the steps of the methods and variations thereof of the present invention.