1. Field of the Invention
This invention pertains generally to a technique for recognizing and locating objects and, more specifically, to a technique for estimating the pose of an object from a range image containing the object.
2. Description of the Related Art
In recent decades, a wide variety of instruments have been built to obtain range images; a range image being a two-dimensional array of numbers which gives the depth of a scene along many directions from a central point in the instrument. Instead of measuring the brightness of many points in a scene, as in a television camera, these instruments measure where each point is in a three-dimensional space. Both range images and the more conventional intensity images from digital cameras have been used in the computer vision research community to determine the pose of observed objects. The term "object", as used herein, means a particular surface shape. "Pose" means a complete description of an object's position and orientation. For a rigid object this requires six numbers, such as X, Y, Z, pitch, yaw and roll, or six equivalent coordinates. The previous methods for pose estimation all suffer from either a lack of generality or from time inefficiency.
It is possible to do pose estimation using tripod operators (TOs). See Pipitone, TRIPOD OPERATORS FOR THE INTERPRETATION OF RANGE IMAGES, NRL Memorandum Report 6780, February 1991, for a crude and incomplete discussion. Tripod operators are a versatile class of feature extraction operators for surfaces. They are useful for recognition and/or localization (pose estimation) based on range or tactile data. They extract a few sparse point samples in a regimented way so that N surface points yield only N−3 independent scalar features containing all the pose-invariant surface shape information in these points and no other information. They provide a powerful index into sets of prestored surface representations. A TO consists of three points in 3-space fixed at the vertices of a triangle and a procedure for making several "depth" measurements in the coordinate frame of the triangle, which is placed on the surface like a surveyor's tripod. TOs can be embedded in a vision system in many ways and applied to almost any surface shape.
As stated above, a TO consists of three points in space fixed at the vertices of a triangle of fixed edge lengths and a procedure for making several depth measurements in the coordinate frame of the triangle, which is placed on the surface like a surveyor's tripod. These measurements take the form of arc-lengths along "probe curves" at which the surface is intersected. FIGS. 1a through 1c show three examples of TOs. FIG. 1a shows a very simple TO with one line probe fixed symmetrically with respect to the rigid triangle ABC. The single scalar feature is the distance from the plane of ABC at which the probe intersects the surface. This resembles a mechanical optician's tool called a spherometer. The number d of scalar features is called the order of the operator. FIGS. 1b and 1c show TOs that can be viewed as a set of equilateral triangles hinged together so that all d+3 points can be made to contact a surface. The angles φ of the d hinges are the features. This type, called the linkable TO, is preferred because of its symmetry and uniform sensitivity to noise. The application of this TO to a planar surface yields φ ≡ 0 for all the hinges. Many variations of these TOs could be constructed. Feature noise is related to range noise n by the approximate expression n_φ ≈ 51 × n/e, where n_φ is the feature error in degrees, and n is expressed in the same distance units as the edge length e.
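By way of illustration only, the hinge-angle feature and the noise expression above can be sketched as follows. The function names, the choice of shared edge AB, and the sign convention (positive when the neighboring vertex D lifts toward the normal AB × AC) are assumptions made for this sketch and are not taken from the cited report.

```python
import math

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])

def perpendicular_part(v, u):
    """Component of v perpendicular to the unit vector u."""
    along = dot(v, u)
    return tuple(a - along * b for a, b in zip(v, u))

def hinge_angle_deg(a, b, c, d):
    """Signed angle, in degrees, of the hinge AB joining triangles ABC and ABD.

    Returns 0 when the two triangles are coplanar with C and D on opposite
    sides of AB (the flat configuration). Sign convention is an assumption.
    """
    axis = sub(b, a)
    length = math.sqrt(dot(axis, axis))
    u = tuple(x / length for x in axis)
    pc = perpendicular_part(sub(c, a), u)
    pd = perpendicular_part(sub(d, a), u)
    flat = tuple(-x for x in pc)  # direction pd takes when the hinge is flat
    sin_t = dot(cross(pd, flat), u)
    cos_t = dot(pd, flat)
    return math.degrees(math.atan2(sin_t, cos_t))

def feature_noise_deg(range_noise, edge_length):
    """Approximate feature error in degrees, n_phi = 51 * n / e (from the text)."""
    return 51.0 * range_noise / edge_length
```

For example, with an edge length of 10 mm and range noise of 0.1 mm, the expression gives a feature error of roughly half a degree.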
From an N-point TO, the N sampled surface points yield only N−3 independent scalar features, and the order d is N−3. These features contain all the surface shape information in the 3N components of the points since they suffice to reconstruct the relative positions of the N points. They contain no other information. For example, they have complete six DOF invariance under rigid motions, the group R³ × SO(3). Thus, they depend upon where the tripod lies on the surface, but upon nothing else. A key property is that, for any dimensionality d of the feature vector, only a manifold of three or fewer dimensions in feature space can be generated from a given surface, since the tripod can be moved in only 3 DOF on a surface. This allows objects to be densely sampled with TOs at preprocessing time with a manageable number of operator applications, typically a few thousand, to obtain almost all of the possible feature vector values obtainable from any range image of the object. This set is a kind of invariant signature. For brevity, this is called the signature of the object or surface, with respect to a particular type of TO. It can be stored in an array of bins in feature space, e.g., of dimension 3 or 4, for later efficient access of near neighbors to TO features measured at recognition time. These bins can optionally contain precomputed probability densities, analytic expressions for distances to nearby signature manifolds, and partial or complete descriptions of the relative poses of tripods and models, all to serve various purposes in a recognition system.
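By way of illustration only, a signature stored in bins of feature space, with near-neighbor access, might be sketched as follows. The class name, the uniform bin width, and the payload slot (which could hold, e.g., a stored tripod pose) are hypothetical choices for this sketch; the search visits only the query's bin and its adjoining bins, so the result is guaranteed to be the true nearest neighbor only when it lies within one bin width of the query.

```python
import math
from collections import defaultdict
from itertools import product

class Signature:
    """Feature vectors stored in a sparse grid of bins (hypothetical layout)."""

    def __init__(self, bin_width):
        self.bin_width = bin_width
        self.bins = defaultdict(list)

    def _key(self, feature):
        # Integer bin index along each feature axis.
        return tuple(math.floor(f / self.bin_width) for f in feature)

    def add(self, feature, payload=None):
        """Store one training feature vector with optional associated data."""
        self.bins[self._key(feature)].append((tuple(feature), payload))

    def nearest(self, feature):
        """Return (distance, stored_feature, payload) or None if no neighbor.

        Only the query's bin and the bins adjoining it are examined.
        """
        key = self._key(feature)
        best = None
        for offset in product((-1, 0, 1), repeat=len(key)):
            neighbor = tuple(k + o for k, o in zip(key, offset))
            for stored, payload in self.bins.get(neighbor, ()):
                distance = math.dist(feature, stored)
                if best is None or distance < best[0]:
                    best = (distance, stored, payload)
        return best
```

At recognition time, the returned distance would be compared against a threshold before the payload is used.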
Since in some applications of the tripod operator the computation consists only of placement and a little indexing, the cost of placing the operator should be kept small. This can be done by efficiently implementing a procedure similar to the following. Consider placing the TOs of FIG. 1b or 1c on a dense range map. Point A can be chosen as any point on the image surface. Interpolation is done locally as needed, e.g., using piecewise triangular facets. Point B can be found by moving along a line at orientation α in image coordinates (pixel indices) until the 3D distance |AB| ≡ e. This can be done in logarithmic time, essentially constant here, using binary search. Then the circle of radius 0.5√3 e (about 0.87 e), oriented coaxially around the center of the segment AB, is searched, again using binary search, to find a point C close to the surface. Every point on this circle lies at distance e from both A and B, so the triangle ABC is equilateral. A similar circular search yields each remaining point. A key step in the circular search is the mapping, specific to a range scanner's geometry, from a point (x, y, z) to the two indices of the range pixel on whose ray (x, y, z) lies. This allows the front/behind decision required by the binary search. In the case of a sequential random access range scanner, it may be efficient to monotonically search elliptical paths in image coordinates until the two distances being enforced, e.g., |AC| and |BC|, are both correct. The ellipses here are the projections of the previously described circles onto image coordinates. Finally, in the case of a tactile TO, the computation is mechanical; the feature values are read from position transducers, e.g., from linear potentiometers by an analog-to-digital (A/D) converter.
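By way of illustration only, the placement of points A, B and C might be sketched as follows. The sketch assumes an idealized orthographic range map given as a height function z = height(x, y), with the sensor looking along −z, so that the mapping from (x, y, z) to image coordinates is simply (x, y); a real scanner would substitute its own mapping and interpolation. All function names are hypothetical.

```python
import math

def sub(p, q): return tuple(a - b for a, b in zip(p, q))
def add(p, q): return tuple(a + b for a, b in zip(p, q))
def scale(p, s): return tuple(a * s for a in p)
def dot(p, q): return sum(a * b for a, b in zip(p, q))
def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])
def unit(p): return scale(p, 1.0 / math.sqrt(dot(p, p)))

def surface_point(height, x, y):
    """3D surface point seen at image coordinates (x, y)."""
    return (x, y, height(x, y))

def find_b(height, a, alpha, e, iterations=60):
    """Bisect the step t along image direction alpha until |AB| = e."""
    lo, hi = 0.0, e  # |AB| is 0 at t = 0 and at least e at t = e
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        b = surface_point(height, a[0] + mid * math.cos(alpha),
                          a[1] + mid * math.sin(alpha))
        if math.dist(a, b) < e:
            lo = mid
        else:
            hi = mid
    return surface_point(height, a[0] + hi * math.cos(alpha),
                         a[1] + hi * math.sin(alpha))

def find_c(height, a, b, iterations=60):
    """Bisect along the circle coaxial with AB to find C on the surface."""
    radius = 0.5 * math.sqrt(3.0) * math.dist(a, b)
    center = scale(add(a, b), 0.5)
    axis = unit(sub(b, a))
    side = unit(cross((0.0, 0.0, 1.0), axis))  # roughly along the surface
    up = cross(axis, side)                     # roughly toward the sensor

    def on_circle(theta):
        return add(center, add(scale(side, radius * math.cos(theta)),
                               scale(up, radius * math.sin(theta))))

    def in_front(theta):
        # Front/behind decision: is the circle point nearer the sensor
        # than the surface seen along the same ray?
        p = on_circle(theta)
        return p[2] > height(p[0], p[1])

    lo, hi = -0.5 * math.pi, 0.5 * math.pi
    if in_front(lo) or not in_front(hi):
        raise ValueError("circle does not bracket the surface here")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if in_front(mid):
            hi = mid
        else:
            lo = mid
    return on_circle(hi)
```

Each search halves its interval at every step, which is the logarithmic-time behavior described above; the remaining points of a linkable TO would be found by repeating the circular search on the appropriate edges.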
The following are a few of the symmetry properties of TOs of the types of FIGS. 1b and 1c. 
Surfaces with one symmetry, such as extrusions, surfaces of revolution, and helical projections, produce only a 2-dimensional manifold in feature space (for FIGS. 1b and 1c). Cylinders, having two symmetries, produce only a nearly circular 1-dimensional curve, and spheres a single point. Scaling a TO by changing its edge length does not affect the signature of surfaces swept by a line with one point fixed, e.g., cones, planar n-hedral vertices, and planar dihedral edges. Regardless of the surface, an operator with a 3-fold symmetry, e.g., those in FIGS. 1b and 1c, produces signatures unchanged by cyclically permuting each triple of corresponding features. In FIG. 1c, the three 3-cycles (1, 2, 3), (4, 5, 6) and (7, 8, 9) show this property, for features φ1 through φ9, respectively. This allows a 3-fold storage reduction, e.g., by permuting the features so that φ1 is the largest. If the TO, in addition, has handedness symmetry, the signature can be modified by a procedure that allows recognition of the "other side" of any surface already recognizable. This is called inversion of a signature. It is done by transposing certain pairs of corresponding features, e.g., (7,5), (1,2), (4,8), and (6,9) in FIG. 1c, and replacing each feature value φ with −φ. Also, the signature of the opposite-handed (reflected) version of a surface can be found by performing those transpositions without negating the features.
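By way of illustration only, the cyclic-permutation, inversion and reflection operations for the nine-feature TO of FIG. 1c might be sketched as follows. The sketch assumes that features φ1 through φ9 are held in a tuple at indices 0 through 8, that the three triples are rotated together by the same amount, and that the largest of φ1, φ2, φ3 is moved to the first position; the function names are hypothetical.

```python
# Feature triples and transposition pairs as listed in the text (1-based there).
TRIPLES = ((0, 1, 2), (3, 4, 5), (6, 7, 8))
SWAPS = ((7, 5), (1, 2), (4, 8), (6, 9))

def rotate(features, k):
    """Cyclically permute every triple of features by k positions."""
    out = list(features)
    for triple in TRIPLES:
        values = [features[i] for i in triple]
        for j, i in enumerate(triple):
            out[i] = values[(j + k) % 3]
    return tuple(out)

def canonical(features):
    """Rotate so that phi1 is the largest of (phi1, phi2, phi3)."""
    k = max(range(3), key=lambda j: features[j])
    return rotate(features, k)

def reflect(features):
    """Signature point of the opposite-handed surface: transpose only."""
    out = list(features)
    for i, j in SWAPS:
        out[i - 1], out[j - 1] = out[j - 1], out[i - 1]
    return tuple(out)

def invert(features):
    """Signature point of the 'other side': transpose, then negate."""
    return tuple(-f for f in reflect(features))
```

Storing only canonical feature vectors, and canonicalizing each measured vector before lookup, is one way to realize the 3-fold storage reduction mentioned above.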
The signatures of order 3 operators, FIG. 1b, were rendered as a rotating cloud of points on a computer; selected 2D snapshots are shown in FIGS. 2a through 2d. In the special case of "smooth" surface regions, the signature is nearly a circular ring coaxial with the diagonal axis. The offset and radius of the ring can be readily used to compute estimates of the principal curvatures and other differential geometric parameters. Surfaces with C1 or C2 discontinuities tend to produce signatures with similar numbers and kinds of discontinuities, e.g., FIGS. 2c and 2d, and have roughly commensurate complexities of description. Thus, this umbrella-shaped 2-manifold can be well approximated with a few polynomials, whereas the discrete signature might need 20,000 points for thorough saturation.
This operator (in the invention's preferred form, the linkable TO) is essentially a 3D simulated structure consisting of several triangles hinged together. It is applied to a computer-represented surface, such as a range image, by moving it until all of its points lie on that surface. Then the "hinge angles" provide information about the shape of the surface.
In a simplified application of tripod operators, there are two major steps. The first is training the system on a new object so that the system will later be able to recognize (or estimate the pose of) that object when it is seen again in some range image. The second step is the actual recognition (or pose estimation).
The objective of this invention is to provide a technique for estimating the pose of surface shapes in six degrees of freedom from a range image containing an object possessing such a surface shape.
This and other objectives are accomplished by using a software procedure, with associated hardware, for estimating the pose of an object from a range image containing the object. A range image is a two-dimensional array of numbers which represent the distances from a reference point in the range imaging instrument to observed surface points in a scene. All six parameters of the pose of an object are estimated: three translational and three angular parameters. This technique involves combining the previously existing method called tripod operators (TOs) with a new technique known as "non-pose-distinctive placement removal" and with other new ideas. TOs are 3D geometric procedures which obtain small sparse sets of points from range images in a regimented way. They are useful for surface shape recognition and pose estimation. The technique is composed of two steps. The first is training the system on a new object so that it will later be able to estimate the pose of that object when it is seen again in some range image. The second is the actual pose estimation, where a TO is placed at a random location on a new range image containing the object of interest. Then the nearest neighbor (the nearpoint) in the TO feature space signature from the training data is computed. If the distance to the nearpoint is less than some appropriate threshold, then the surface is recognized and pose estimation proceeds by computing the six pose parameters of a central triangle of the new TO placement in the coordinate system of the range imaging instrument. Then the pose parameters associated with the nearpoint are retrieved.
An estimate of the pose of the surface shape in the new image is recovered using those two pose six-vectors: the pose of the central TO triangle in the new image and the retrieved pose of the central TO triangle in the training image are composed together to determine an estimate of where the object actually is with respect to the location of its original model used in training.
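By way of illustration only, the composition of the two triangle poses might be sketched as follows. For simplicity the sketch represents each pose as a rotation matrix and a translation rather than as a six-vector, and attaches a coordinate frame to a triangle with its origin at A, its x axis along AB, and its z axis along the triangle normal; these conventions and all function names are assumptions of the sketch. The estimate is the new-image triangle pose composed with the inverse of the training triangle pose, so that it carries model coordinates into the coordinate system of the range imaging instrument.

```python
import math

def sub(p, q): return tuple(a - b for a, b in zip(p, q))
def dot(p, q): return sum(a * b for a, b in zip(p, q))
def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])
def unit(p):
    length = math.sqrt(dot(p, p))
    return tuple(a / length for a in p)

def triangle_frame(a, b, c):
    """Pose (rotation rows, translation) of a frame fixed to triangle ABC."""
    x = unit(sub(b, a))
    z = unit(cross(x, sub(c, a)))
    y = cross(z, x)
    rotation = tuple((x[i], y[i], z[i]) for i in range(3))  # axes as columns
    return rotation, tuple(a)

def apply(pose, p):
    """Map point p through the rigid motion pose."""
    rotation, t = pose
    return tuple(dot(rotation[i], p) + t[i] for i in range(3))

def inverse(pose):
    rotation, t = pose
    transposed = tuple(tuple(rotation[j][i] for j in range(3)) for i in range(3))
    return transposed, tuple(-dot(transposed[i], t) for i in range(3))

def compose(first, second):
    """Rigid motion equivalent to applying second, then first."""
    r1, _ = first
    r2, t2 = second
    rotation = tuple(tuple(sum(r1[i][k] * r2[k][j] for k in range(3))
                           for j in range(3)) for i in range(3))
    return rotation, apply(first, t2)

def estimate_object_pose(training_triangle, new_triangle):
    """Motion carrying the training model onto the object in the new image."""
    return compose(triangle_frame(*new_triangle),
                   inverse(triangle_frame(*training_triangle)))
```

Applying the returned pose to any point of the training model gives the estimated location of that point in the new range image.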