In recent years, with advances in robotics, robots have begun to perform complex tasks that have conventionally been performed by hand, such as the assembly of industrial products. When such a robot holds and assembles parts using an end effector such as a hand, it becomes necessary to measure the relative position and orientation between the part to be held and the robot (hand).
The position and orientation of an object can be measured by model fitting, in which a three-dimensional model of the object is fitted to features detected in a two-dimensional image or to a range image. When model fitting is performed on a two-dimensional image, the position and orientation are estimated so that the image obtained by projecting the three-dimensional model onto the image, based on the position and orientation of the object, matches the detected features. When model fitting is performed on a range image, each point in the range image is converted into a point having three-dimensional coordinates, which yields a three-dimensional point group. The position and orientation are then estimated so that the three-dimensional model fits the three-dimensional point group in three-dimensional space.
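The two geometric operations described above, converting a range image into a three-dimensional point group and projecting model points onto the image, can be sketched as follows. This is a minimal illustration only. It assumes an ideal pinhole camera without lens distortion and a range image that stores depth along the optical axis, with zero marking invalid pixels. The intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function names are hypothetical and do not appear in the original text.

```python
import numpy as np

def range_image_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into an (N, 3) three-dimensional point group.

    Assumes a pinhole camera; pixels with depth <= 0 are treated as invalid.
    """
    v, u = np.indices(depth.shape)      # v: row index, u: column index
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

def project_points(points, R, t, fx, fy, cx, cy):
    """Project (N, 3) model points into the image under the pose (R, t)."""
    cam = points @ R.T + t              # model frame -> camera frame
    u = fx * cam[:, 0] / cam[:, 2] + cx
    v = fy * cam[:, 1] / cam[:, 2] + cy
    return np.stack([u, v], axis=1)
```

In model fitting on a two-dimensional image, the pose `(R, t)` would be adjusted until the output of `project_points` agrees with the detected features. In model fitting on a range image, it would be adjusted until the model agrees with the output of `range_image_to_points`.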
However, the detected positions of the features in the two-dimensional image and the three-dimensional coordinates of the point group contain errors. Such errors are caused by pixel quantization, blur, the limited accuracy of the feature detection algorithm, and errors in establishing correspondence between cameras. Processing is therefore performed to improve the measurement accuracy of the position and orientation, for example by averaging out the effect of the measurement errors over a plurality of pieces of measurement information (i.e., features of the image and points of the point group).
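The benefit of averaging over many pieces of measurement information can be illustrated with a small simulation. The sketch below assumes independent, zero-mean Gaussian measurement errors. That noise model is an assumption made here for illustration only, and the function name and its parameters are hypothetical.

```python
import numpy as np

def mean_estimate_error(n_measurements, sigma=1.0, trials=2000, seed=0):
    """Spread of an estimate formed by averaging n noisy measurements.

    The true value is taken to be 0, so each measurement is pure noise.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(trials, n_measurements))
    estimates = noise.mean(axis=1)      # one averaged estimate per trial
    return estimates.std()

# Under this noise model the spread shrinks roughly as 1 / sqrt(n).
print(mean_estimate_error(1), mean_estimate_error(100))
```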
The position and orientation of an object can also be measured with high accuracy, without explicitly performing feature detection, by estimating the position and orientation using gradients of an intensity image and a range image (Hiura, Yamaguchi, Sato, Ikenouchi, “Real-Time Tracking of Free-Form Objects by Range and Intensity Image Fusion”, Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J80-D-II, No. 11, November 1997, pp. 2904-2911). This method assumes that the brightness and the range vary smoothly when the object moves. An orientation parameter of the object is then calculated, based on a gradient method, from the change in brightness of the intensity image and the change in range of the range image. However, since the two-dimensional intensity image and the three-dimensional range image differ in dimension, it is difficult to fuse the two images effectively. Manual tuning thus becomes necessary to calculate the orientation parameter.
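The idea behind a gradient method can be sketched in a much simpler setting than the cited work. The example below is not the method of Hiura et al. It is a generic illustration that estimates a small two-dimensional image translation from intensity gradients alone, under the assumption that brightness varies smoothly and is preserved as the content moves. The function names and the synthetic test pattern are hypothetical.

```python
import numpy as np

def estimate_shift(img0, img1):
    """Estimate a small translation (dx, dy) of image content by a gradient method.

    Linearizes img1(x, y) = img0(x - dx, y - dy) as
    gx * dx + gy * dy = -(img1 - img0) and solves it by least squares
    over all pixels. No features are detected explicitly.
    """
    gy, gx = np.gradient(img0)          # derivatives along rows (y) and columns (x)
    it = img1 - img0                    # change in brightness between the images
    A = np.stack([gx.ravel(), gy.ravel()], axis=1)
    b = -it.ravel()
    shift, *_ = np.linalg.lstsq(A, b, rcond=None)
    return shift                        # array [dx, dy]

def pattern(xs, ys):
    """Smooth synthetic brightness pattern used only for this demonstration."""
    return np.sin(0.2 * xs) + np.cos(0.15 * ys)

ys, xs = np.mgrid[0:64, 0:64].astype(float)
img0 = pattern(xs, ys)
img1 = pattern(xs - 0.3, ys - 0.2)      # content moved by (0.3, 0.2) pixels
print(estimate_shift(img0, img1))       # should recover roughly (0.3, 0.2)
```

If rows derived from a range image were stacked into the same least-squares system, they would be expressed in units of length rather than brightness, so a relative weight between the two sets of rows would have to be chosen. This is one plausible way to picture the manual tuning mentioned above.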