The invention relates to digital image processing and, more particularly, to a system and method for recognizing three-dimensional objects in a two-dimensional image.
Computer vision includes the automatic machine recognition and localization of three-dimensional objects from two-dimensional images. FIG. 1 shows a computer vision system 100 with passive sensor 102, digitizer 104, recognition processor 106, and output 108. Passive sensor 102 may include a TV camera or an infrared imager for night vision; digitizer 104 may be a sampling analog-to-digital converter or, in the case of a CCD sensor, may be partially incorporated into sensor 102. Recognition processor 106 analyzes the digitized image from sensor 102 to determine the presence of certain target objects in the scene. Output 108 may be a display of recognized targets or may feed a flight controller, as in automatic target recognition in a smart missile. Recognition processor 106 may use any of various target recognition systems.
Known target recognition systems include recognition by global features such as Fourier transform descriptors, moments, silhouette-based features, and so forth. These systems presume an open target, that is, one that is fully visible. However, for images in which the target objects are partially occluded or have low signal-to-noise ratios, such global features may not be extractable.
An alternative to global feature recognition is local feature recognition. Huttenlocher and Ullman, Recognizing Solid Objects by Alignment with an Image, 5 Int'l. J. Comp. Vision 195 (1990), and Lowe, Three-Dimensional Object Recognition from Single Two-Dimensional Images, 31 Artif. Intell. 355 (1987), describe model-based recognition approaches using vertices and edges. The model-based approach matches stored geometric models against features extracted from an image. Recognition of an object within an image entails finding a transformation (rotation, translation, perspective projection) that maps a set of features of a model of the object onto a set of corresponding features extracted from the image. The larger the sets of corresponding model and image features, the more reliable the match.

Huttenlocher and Ullman use a weak perspective projection, in which the depth variation of an object is presumed small relative to its distance from the sensor, so that perspective projection is approximated by orthographic projection plus a single scale factor, common to all points of the object, that accounts for distance. They compute hypothesized transformations from sets of three pairs of model and image points (corners) and verify the transformations with edge contour matches, as follows. Let $(a_m, a_i)$, $(b_m, b_i)$, and $(c_m, c_i)$ be three pairs of points, where the image points (subscript "i") are in two-dimensional sensor coordinates and the model points (subscript "m") are in three-dimensional object coordinates. First, rotate and translate the model so that the new $a_m$ is at the origin $(0,0,0)$ and the new $b_m$ and $c_m$ are in the x-y plane. This operation is performed offline for each triple of model points.
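The offline first step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Huttenlocher and Ullman's code: the function `canonical_model_frame` and the small vector helpers are hypothetical names, and the particular frame (b_m on the positive x axis, the triangle's normal along z) is one assumed choice among the many rotations that satisfy the stated condition.

```python
import math


def sub(u, v):
    return tuple(x - y for x, y in zip(u, v))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(u):
    n = math.sqrt(dot(u, u))
    if n < 1e-12:
        raise ValueError("degenerate (near zero-length) vector")
    return tuple(x / n for x in u)


def canonical_model_frame(a_m, b_m, c_m):
    """Return a map that rotates and translates 3-D model points so that a_m
    goes to the origin and b_m, c_m land in the x-y plane.

    Assumed frame choice: b_m on the positive x axis, z along the normal of
    the triangle (a_m, b_m, c_m)."""
    e1 = normalize(sub(b_m, a_m))
    e3 = normalize(cross(e1, sub(c_m, a_m)))  # raises if points are collinear
    e2 = cross(e3, e1)                        # completes a right-handed basis

    def to_frame(p):
        d = sub(p, a_m)                       # translate a_m to the origin
        return (dot(d, e1), dot(d, e2), dot(d, e3))  # then rotate

    return to_frame
```

For example, `canonical_model_frame((1, 2, 3), (4, 2, 3), (1, 5, 7))` maps the second point to approximately (3, 0, 0) and the third to approximately (0, 5, 0). Because the map is a rigid motion, distances between model points are preserved, and collinear triples are rejected.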
Next, define the translation vector $b = -a_i$ and translate the image points by $b$, so that the new $a_i$ is at the origin $(0,0)$, the new $b_i$ is at the old $b_i - a_i$, and the new $c_i$ is at the old $c_i - a_i$.
Then, solve for the 2 by 2 linear transformation $L$ with matrix elements $L_{ij}$ such that $L b_m = b_i$ and $L c_m = c_i$, where $b_m$ and $c_m$ here denote the x and y coordinates of the transformed model points (their z coordinates are zero). The translation $b$ and linear transformation $L$ define a unique affine transformation $A$ as long as the three model points are not collinear.
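The image translation and the solution for $L$ reduce to a 2 by 2 linear system, sketched below in plain Python. The function names `translate_image_points` and `solve_linear_2x2` are hypothetical, and the sketch assumes the model points have already been moved into the frame of the first step so that only their x and y coordinates are passed in.

```python
def translate_image_points(a_i, b_i, c_i):
    """Translate the image points by b = -a_i so that a_i moves to the origin."""
    b = (-a_i[0], -a_i[1])

    def shift(p):
        return (p[0] + b[0], p[1] + b[1])

    return b, shift(b_i), shift(c_i)


def solve_linear_2x2(b_m, c_m, b_i, c_i):
    """Solve for the 2x2 matrix L with L b_m = b_i and L c_m = c_i.

    b_m, c_m: (x, y) of the model points in the frame where a_m is at the
    origin and b_m, c_m lie in the x-y plane.
    b_i, c_i: image points after translation by b = -a_i.
    Returns L as ((L11, L12), (L21, L22))."""
    (bx, by), (cx, cy) = b_m, c_m
    det = bx * cy - cx * by            # zero iff a_m, b_m, c_m are collinear
    if abs(det) < 1e-12:
        raise ValueError("model points are collinear; L is not unique")
    # L = P M^-1, where M has columns b_m, c_m and P has columns b_i, c_i.
    m_inv = ((cy / det, -cx / det),
             (-by / det, bx / det))
    rows = ((b_i[0], c_i[0]), (b_i[1], c_i[1]))
    return tuple((r[0] * m_inv[0][0] + r[1] * m_inv[1][0],
                  r[0] * m_inv[0][1] + r[1] * m_inv[1][1]) for r in rows)
```

The determinant check mirrors the non-collinearity condition stated above: with $a_m$ at the origin, the columns $b_m$ and $c_m$ are linearly dependent exactly when the three model points are collinear.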
Further, compute $c_1$ and $c_2$ as:

$$c_1 = \pm\sqrt{\frac{w + \sqrt{w^2 + 4q^2}}{2}}, \qquad c_2 = -\frac{q}{c_1},$$

where $w = L_{12}^2 + L_{22}^2 - (L_{11}^2 + L_{21}^2)$ and $q = L_{11}L_{12} + L_{21}L_{22}$.
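A minimal Python sketch of this step follows; `solve_c1_c2` and its `sign` parameter are hypothetical names. Two choices here are assumptions beyond the formulas as written: when $w < 0$ the sketch uses the algebraically equal form $c_1^2 = 2q^2/(\sqrt{w^2+4q^2} - w)$ to avoid cancellation, and in the degenerate case $q = 0$, $w \le 0$ (where the formula gives $c_1 = 0$ and $c_2 = -q/c_1$ is undefined) it takes $c_2 = \pm\sqrt{-w}$, which still satisfies $c_1^2 - c_2^2 = w$ and $c_1 c_2 = -q$.

```python
import math


def solve_c1_c2(L, sign=1.0):
    """Compute (c1, c2) from L = ((L11, L12), (L21, L22)).

    The result makes (L11, L21, c1) and (L12, L22, c2) orthogonal and of
    equal length. `sign` (+1 or -1) picks the root of c1; both are valid."""
    (l11, l12), (l21, l22) = L
    w = l12 ** 2 + l22 ** 2 - (l11 ** 2 + l21 ** 2)
    q = l11 * l12 + l21 * l22
    r = math.sqrt(w * w + 4 * q * q)
    # (w + r) / 2 and 2 q^2 / (r - w) are algebraically equal; the second form
    # avoids cancellation when w is negative.
    c1_sq = (w + r) / 2 if w >= 0 else 2 * q * q / (r - w)
    c1 = sign * math.sqrt(c1_sq)
    if abs(c1) > 1e-12:
        c2 = -q / c1
    else:
        # Assumed handling of the degenerate case q = 0, w <= 0, where the
        # formula gives c1 = 0 and c2 = -q/c1 is undefined: use c2^2 = -w.
        c2 = sign * math.sqrt(max(-w, 0.0))
    return c1, c2
```

As a sanity check, an object scaled by 2 and rotated 30 degrees about the vertical axis gives $L = \mathrm{diag}(2\cos 30^\circ, 2)$, for which the sketch returns $|c_1| = 2\sin 30^\circ = 1$ and $c_2 = 0$.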
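Lastly, form the 3 by 3 matrix $sR$ as:

$$sR = \begin{pmatrix} L_{11} & L_{12} & (c_2 L_{21} - c_1 L_{22})/s \\ L_{21} & L_{22} & (c_1 L_{12} - c_2 L_{11})/s \\ c_1 & c_2 & (L_{11} L_{22} - L_{21} L_{12})/s \end{pmatrix}$$

where $s = (L_{11}^2 + L_{21}^2 + c_1^2)^{1/2}$. The choice of $c_1$ and $c_2$ makes the first two columns orthogonal and of equal length $s$, and the third column is their cross product divided by $s$, so $sR$ is a rotation scaled by $s$; the two signs of $c_1$ give two candidate poses that are mirror images in depth. This yields the complete transformation with translation vector $b$ and scale and rotation $sR$. The image coordinates of a transformed model point $p$ (expressed in the model frame of the first step) are then given by the x and y coordinates of $p' = sRp - b = sRp + a_i$, where subtracting $b$ undoes the earlier translation of the image points.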
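The final assembly and projection can be sketched as below. The names `scaled_rotation` and `project` are hypothetical, and the sketch assumes $L$, $c_1$, and $c_2$ have already been computed as described above and that $p$ is given in the model frame with $a_m$ at the origin. The projection adds $a_i$ back so the result is in the original sensor coordinates.

```python
import math


def scaled_rotation(L, c1, c2):
    """Assemble sR from L = ((L11, L12), (L21, L22)) and (c1, c2).

    Columns 1 and 2 are (L11, L21, c1) and (L12, L22, c2); column 3 is their
    cross product divided by s, so every column has length s."""
    (l11, l12), (l21, l22) = L
    s = math.sqrt(l11 ** 2 + l21 ** 2 + c1 ** 2)
    sR = ((l11, l12, (c2 * l21 - c1 * l22) / s),
          (l21, l22, (c1 * l12 - c2 * l11) / s),
          (c1, c2, (l11 * l22 - l21 * l12) / s))
    return sR, s


def project(sR, p, a_i):
    """Image coordinates (x, y) of model point p, where p is expressed in the
    model frame with a_m at the origin. Adding a_i undoes the translation of
    the image points by b = -a_i, giving original sensor coordinates."""
    x = sum(sR[0][k] * p[k] for k in range(3)) + a_i[0]
    y = sum(sR[1][k] * p[k] for k in range(3)) + a_i[1]
    return (x, y)
```

Only the first two rows of $sR$ are needed for projection; the third row matters when the hypothesized pose is used for further checks, such as deciding which model edges face the sensor during edge contour verification.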
In contrast, Lowe uses full perspective projection and groupings of edges (parallelism, collinearity, and end-point proximity) to trigger Newton-Raphson computation of hypothesized transformations.
U.S. Pat. No. 5,173,946 (K. Rao) discloses a corner matching and distance array method of image matching.
The foregoing items are hereby incorporated by reference.