Methods for the mechanized detection of three-dimensional objects and of their position in space are of fundamental importance in most applications of mechanized image processing and image interpretation, in fields such as robotics and production automation. In many of these applications, depth data is not available and also cannot be determined with the aid of stereo images or temporal image sequences, in which the object movement or the camera movement could be evaluated in order to obtain depth data. In such cases, the object identity and its position in space must be reconstructed from a single two-dimensional image of the object to be detected. For this purpose, it is generally necessary to use models of the objects to be detected and a model of the imaging process. For reasons of simplification, objects are frequently modelled with the aid of corner points and edge lines. The optical imaging of space into the image plane can be approximated in many cases with sufficient accuracy, for example by central projection. The technical problem in detecting three-dimensional objects in space and estimating their position parameters from a two-dimensional image, frequently also referred to as the n-point perspective problem, consists in determining the position and spatial arrangement of a three-dimensional object from n feature points which have been found with the aid of suitable preprocessing methods in a two-dimensional image of the object to be detected. Examples of such feature points are, inter alia, images of object corner points or images of other prominent points on the object surface.
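By way of illustration only, the corner-point and edge-line object model and its central projection into the image plane can be sketched as follows. The function name `project`, the focal length `f`, the cube-shaped example object and the camera convention (optical axis along z, image plane at distance `f`) are assumptions made for this sketch and are not taken from the cited literature.

```python
def project(point, f=1.0):
    """Central projection of a camera-frame point (X, Y, Z) onto the image plane.

    Assumed convention: projection centre at the origin, optical axis along +z,
    image plane at distance f. Points at or behind the centre are rejected.
    """
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the projection centre")
    return (f * X / Z, f * Y / Z)


# Hypothetical object model: the eight corner points of a cube of edge length 2,
# placed 5 units in front of the camera.
corners = [(x, y, z + 5.0)
           for x in (-1.0, 1.0)
           for y in (-1.0, 1.0)
           for z in (-1.0, 1.0)]

# Edge lines join corner points that differ in exactly one coordinate.
edges = [(i, j)
         for i in range(len(corners))
         for j in range(i + 1, len(corners))
         if sum(a != b for a, b in zip(corners[i], corners[j])) == 1]

# Feature points: the images of the corner points in the image plane.
feature_points = [project(c) for c in corners]

# Images of the edge lines, each given by its two projected end points.
image_edges = [(feature_points[i], feature_points[j]) for i, j in edges]
```

The n-point perspective problem described above is the inverse of this mapping: given only `feature_points` and the object model, the position and spatial arrangement of the object are to be recovered.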
The literature discloses various methods for the positional estimation of three-dimensional objects in space. Essentially, it is possible to distinguish two types of method. The first type comprises approaches to the analytical solution of the n-point perspective problem which concern themselves with the minimum number of correspondences required in order to solve the perspective problem of rigid objects, that is to say the problem of assignment between object points and feature points in the image plane. In these types of method, the relationship which models the imaging between points in space and points in the image plane, for example central projection, is used to set up a system of equations. Depending on the number of feature points in the image plane, this system of equations can be underdetermined, uniquely solvable or overdetermined. These methods therefore generally use a subset of all the available feature points in the image plane, chosen such that it leads to a uniquely solvable and well-conditioned system of equations. Such a method is described, for example, in the publication by R. M. Haralick, "Using Perspective Transformations in Scene Analysis", Computer Vision, Graphics and Image Processing 13, 1980, pages 191-221. Since the complete set of available feature points generally leads to an overdetermined or underdetermined system of equations, methods of this type require preprocessing of the image with the aim of selecting a suitable set of feature points, and they are generally not very robust with respect to disturbances.
A second type of method exploits the iterative solution of overdetermined systems of equations with the aid of nonlinear minimization methods. In this case, a cost function which is determined by means of correspondences previously found between image features and model parameters is minimized. An example of this second type of method is described in the publication by D. G. Lowe, "Three-Dimensional Object Recognition from Single Two-Dimensional Images", Artificial Intelligence 31, 1987, pages 355-395. This second type of method is admittedly far less sensitive to disturbances but, like the first type of method, has the disadvantage that a nonlinear analytical model of the optical imaging process is required and must be processed using numerical methods.
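The principle of this second type of method can be illustrated by the following minimal sketch, which is not the procedure of the cited publication. It assumes central projection with unit focal length, known correspondences between model corner points and feature points, and a simplified pose with only four parameters (a rotation angle about the y-axis and a translation). The cost function is the sum of squared distances between projected model points and feature points, and it is minimized here by a damped Gauss-Newton (Levenberg-Marquardt style) iteration with a numerically differentiated Jacobian. All names and numerical values are hypothetical.

```python
import math


def project(point, f=1.0):
    """Central projection of a camera-frame point (X, Y, Z)."""
    X, Y, Z = point
    return (f * X / Z, f * Y / Z)


def transform(point, pose):
    """Rotate by theta about the y-axis, then translate by (tx, ty, tz)."""
    theta, tx, ty, tz = pose
    x, y, z = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z + tx, y + ty, -s * x + c * z + tz)


def residuals(pose, model, image):
    """Differences between projected model points and observed feature points."""
    r = []
    for p, (u, v) in zip(model, image):
        pu, pv = project(transform(p, pose))
        r.extend((pu - u, pv - v))
    return r


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[piv] = M[piv], M[col]
        for row in range(col + 1, n):
            factor = M[row][col] / M[col][col]
            for k in range(col, n + 1):
                M[row][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def estimate_pose(model, image, pose0, iterations=50, eps=1e-6):
    """Minimize the squared reprojection error, starting from pose0."""
    pose = list(pose0)
    n = len(pose)
    lam = 1e-3  # damping factor, adapted after every trial step
    r = residuals(pose, model, image)
    cost = sum(e * e for e in r)
    for _ in range(iterations):
        # Forward-difference Jacobian, stored column by column.
        cols = []
        for j in range(n):
            shifted = pose[:]
            shifted[j] += eps
            rs = residuals(shifted, model, image)
            cols.append([(a - b) / eps for a, b in zip(rs, r)])
        JTJ = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(n)]
               for i in range(n)]
        JTr = [sum(a * b for a, b in zip(cols[i], r)) for i in range(n)]
        for i in range(n):
            JTJ[i][i] *= 1.0 + lam  # damping scaled by the diagonal
        step = solve(JTJ, [-g for g in JTr])
        trial = [p + d for p, d in zip(pose, step)]
        r_trial = residuals(trial, model, image)
        cost_trial = sum(e * e for e in r_trial)
        if cost_trial < cost:
            # Accept the step and trust the linearization more.
            pose, r, cost = trial, r_trial, cost_trial
            lam = max(lam / 10.0, 1e-12)
        else:
            # Reject the step and increase the damping.
            lam *= 10.0
    return pose, cost


# Hypothetical example: cube corner points observed under a known "true" pose.
model = [(x, y, z) for x in (-1.0, 1.0) for y in (-1.0, 1.0) for z in (-1.0, 1.0)]
true_pose = (0.3, 0.2, -0.1, 6.0)
image = [project(transform(p, true_pose)) for p in model]
estimate, cost = estimate_pose(model, image, pose0=(0.0, 0.0, 0.0, 5.0))
```

The sketch makes the stated disadvantage visible: the functions `project` and `transform` constitute an explicit nonlinear analytical model of the imaging process, and every iteration requires this model to be evaluated and differentiated numerically.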