1. Field of the Invention
The present invention relates to an image processing apparatus for detecting the position and attitude of an object. In particular, the present invention relates to an image processing apparatus for recognizing and detecting the attitude of randomly piled objects having an identical form, the positions and attitudes of which vary three-dimensionally.
2. Description of the Related Art
In automatic machines such as robots, an object (workpiece) that is not accurately positioned is typically handled by recognizing its position and attitude from a captured image of the object. However, it is extremely difficult to recognize the position and attitude of an object that can take an arbitrary three-dimensional position and attitude, for example when it is piled up with other objects.
In the technology described in Japanese Unexamined Patent Application 2000-288974, a plurality of images of an object captured from various directions are stored in advance as teaching model images, whereupon the stored teaching model images are compared with an input image captured upon detection of the position and attitude of the object in order to select the teaching model which most closely resembles the captured image. The position and attitude of the object are then determined on the basis of the selected teaching model. Then, on the basis of the determined position and attitude of the object, a visual sensor is caused to move to the position and attitude at which the object is to be recognized, and thus the position and attitude of the object are recognized accurately using this visual sensor.
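By way of illustration only, the selection of the most closely resembling teaching model described above can be sketched as follows. The cited application does not specify the similarity measure here, so zero-mean normalized correlation is assumed; the function names, the dictionary keys "image" and "direction", and the small pixel arrays are hypothetical and are not taken from the cited application.

```python
from math import sqrt


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equal-length pixel sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A flat (constant) image has zero variance; treat it as uncorrelated.
    return sum(x * y for x, y in zip(da, db)) / denom if denom > 0 else 0.0


def select_teaching_model(input_image, teaching_models):
    """Return (index, score) of the stored model most similar to the input.

    teaching_models is a list of dicts with the hypothetical keys
    "image" (flattened pixel list) and "direction" (capture direction).
    """
    scores = [normalized_correlation(input_image, m["image"])
              for m in teaching_models]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Three illustrative 3x3 teaching model images, flattened row by row.
teaching_models = [
    {"direction": (0, 0), "image": [0, 0, 9, 9, 0, 0, 9, 9, 0]},
    {"direction": (30, 0), "image": [9, 0, 0, 0, 9, 0, 0, 0, 9]},
    {"direction": (60, 0), "image": [0, 9, 0, 9, 9, 9, 0, 9, 0]},
]

# A captured image that is a slightly perturbed copy of the third model.
captured = [1, 8, 0, 9, 9, 8, 0, 9, 1]
index, score = select_teaching_model(captured, teaching_models)
print(index, teaching_models[index]["direction"])
```

In this sketch the capture direction stored with the selected model stands in for the coarse attitude estimate; the subsequent movement of the visual sensor and the accurate recognition step are not modeled.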
In the technology described in Japanese Unexamined Patent Application H8-153198, an image cutting recognition apparatus is constituted by an object learning apparatus and an object recognition apparatus. In the object learning apparatus, the region of an object is extracted from image data obtained by capturing images of the object from various directions using first image input means, and image processing data are obtained by normalizing the image data value of the extracted object region. By modifying the size of the image processing data in various ways, a learned image data set is obtained. The form of a manifold is then calculated from the learned image data set and characteristic vectors determined from the learned image data set. Meanwhile, in the object recognition apparatus, the region of the object is extracted from image data obtained using second image input means, whereupon a distance value is calculated from data obtained by normalizing the image data value of the extracted region, the aforementioned characteristic vectors, and the aforementioned manifold form, and thus the position, direction, and magnitude of the object are outputted.
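As a rough sketch of the kind of processing described above, the characteristic vectors can be obtained by principal component analysis of the learned image data set, and recognition can be performed by measuring distances in the resulting low-dimensional space. This is a simplification under stated assumptions: the manifold is represented only by its sampled points (no interpolation between them), region extraction and size variation are omitted, and the function names and synthetic one-dimensional "images" are hypothetical.

```python
import numpy as np


def learn_eigenspace(images, k):
    """Compute k characteristic vectors and the projected learned samples.

    images: (n, d) array with one normalized learned image per row.
    """
    X = np.asarray(images, dtype=np.float64)
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes of the centered data set.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = vt[:k]
    # Sampled manifold points: one k-dimensional point per learned image.
    manifold = (X - mean) @ basis.T
    return mean, basis, manifold


def recognize(image, mean, basis, manifold, poses):
    """Return the pose of the nearest manifold sample and its distance."""
    point = (np.asarray(image, dtype=np.float64) - mean) @ basis.T
    distances = np.linalg.norm(manifold - point, axis=1)
    nearest = int(np.argmin(distances))
    return poses[nearest], float(distances[nearest])


# Synthetic learned set: a 16-pixel pattern shifted by the viewing angle.
x = np.linspace(0.0, 2.0 * np.pi, 16, endpoint=False)
poses = list(range(0, 360, 30))
learned = np.array([np.cos(x + np.deg2rad(a)) for a in poses])

mean, basis, manifold = learn_eigenspace(learned, k=2)

# Query: the 90-degree view with a small amount of noise added.
rng = np.random.default_rng(0)
query = np.cos(x + np.deg2rad(90)) + 0.01 * rng.standard_normal(x.size)
pose, distance = recognize(query, mean, basis, manifold, poses)
print(pose)
```

Storing only the projected points keeps the comparison in k dimensions rather than in the full pixel space, which is the main attraction of this class of method.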
The technology described in Japanese Unexamined Patent Application 2000-288974 is applied in particular to an operation to pick up randomly piled objects of the same shape one by one. In this case, a plurality of teaching model images are obtained by capturing images of the objects from various directions. The teaching model images are then compared to input images obtained by capturing images of several of the piled objects, whereupon the teaching model image which most closely resembles the captured image is selected. The selected teaching model is then used to determine the three-dimensional position and attitude of the object. Then, on the basis of the determined three-dimensional position and attitude of the object, a visual sensor is moved to the position and attitude at which the object is to be recognized, and the position and attitude of the object are recognized accurately by the visual sensor. In this technology, the position and attitude of the object can be recognized with greater accuracy as the number of teaching models is increased.
When a large number of objects are piled up as in the case described above, the individual objects take various three-dimensional attitudes. To recognize the position and attitude of such objects with even greater accuracy, a larger number of teaching model images, sampled at finer intervals of attitude, must be prepared to cover the attitudes that the objects may take.
However, obtaining a large number of teaching model images requires time and larger storage means for storing the teaching model images. A further problem arises in that when attempts are made to recognize the position and attitude of an object, comparison with the teaching model images takes time due to the large number of teaching model images.
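The growth in storage and comparison cost noted above can be made concrete with a back-of-the-envelope count. The sketch below assumes, purely for illustration, that capture directions are sampled on a regular grid in azimuth (0 to 360 degrees) and elevation (0 to 90 degrees) with the same angular step, and that each teaching model image is stored uncompressed; the grid, the image size, and the function name are hypothetical.

```python
def teaching_model_cost(step_deg, width, height, bytes_per_pixel=1):
    """Return (number of views, raw storage in bytes) for a regular grid.

    Rough count only: coincident views at the top of the hemisphere and
    in-plane rotation of the object are not accounted for.
    """
    azimuths = 360 // step_deg
    elevations = 90 // step_deg + 1
    views = azimuths * elevations
    storage_bytes = views * width * height * bytes_per_pixel
    return views, storage_bytes


coarse = teaching_model_cost(30, 256, 256)
fine = teaching_model_cost(10, 256, 256)
print(coarse, fine)
```

Because one comparison is needed per stored image, both the storage and the number of comparisons per recognition grow in proportion to the number of views, which under these assumptions rises from 48 to 360 when the angular step is reduced from 30 to 10 degrees.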