In recent years, complex tasks conventionally performed by people have increasingly come to be performed by robots. A typical example of such a task is the assembly of industrial products. Such a robot grasps parts with an end effector, such as a hand, and assembles them. In order for the robot to grasp a part, it is necessary to measure the relative position and orientation between the part to be grasped and the robot (hand).
One generally known method of performing such position and orientation measurement is model fitting, in which a three-dimensional shape model of an object is fitted to image features detected in a two-dimensional image captured by a camera and to a depth map obtained from a range sensor.
Technology that employs edges as image features to be detected based on a two-dimensional image is disclosed in “Real-time visual tracking of complex structures” by T. Drummond and R. Cipolla, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002 (hereinafter, referred to as “Document 1”). With this technology, the shape of an object is represented by a set of three-dimensional line segments, and given that information indicating the approximate position and orientation of the object is known, a projection image of the three-dimensional line segments is model-fitted to edges detected in the two-dimensional image. Performing measurement that employs edges is suited for cases of measuring the position and orientation of an object in, for example, an environment that contains many texture-less straight-line-based artificial objects.
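The projection step described above can be illustrated with a minimal sketch. It assumes a simple pinhole camera without lens distortion, and the names `transform`, `project_point`, and `sample_projected_segment`, as well as the intrinsic parameters `fx`, `fy`, `cx`, `cy`, are hypothetical and do not come from Document 1. The sketch projects the two end points of one three-dimensional line segment using the approximate position and orientation, and samples control points along the projected segment, around which edges would then be searched for.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Vec2 = Tuple[float, float]


def transform(R: Sequence[Sequence[float]], t: Vec3, p: Vec3) -> Vec3:
    """Apply the approximate pose (rotation R, translation t) to a model point."""
    x, y, z = (sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
    return (x, y, z)


def project_point(p_cam: Vec3, fx: float, fy: float, cx: float, cy: float) -> Vec2:
    """Pinhole projection of a point given in camera coordinates (assumed model)."""
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def sample_projected_segment(p0: Vec3, p1: Vec3,
                             R: Sequence[Sequence[float]], t: Vec3,
                             fx: float, fy: float, cx: float, cy: float,
                             n: int) -> List[Vec2]:
    """Project a 3D line segment and return n evenly spaced 2D control points."""
    if n < 2:
        raise ValueError("need at least two control points")
    a = project_point(transform(R, t, p0), fx, fy, cx, cy)
    b = project_point(transform(R, t, p1), fx, fy, cx, cy)
    return [(a[0] + (b[0] - a[0]) * k / (n - 1),
             a[1] + (b[1] - a[1]) * k / (n - 1)) for k in range(n)]


# Hypothetical example: identity rotation, object 2 units in front of the camera.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
control_points = sample_projected_segment(
    (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0),
    IDENTITY, (0.0, 0.0, 2.0),
    fx=100.0, fy=100.0, cx=320.0, cy=240.0, n=3)
```

Sampling in the image rather than in three-dimensional space is a simplification chosen here for brevity. A projected straight segment remains straight under the assumed pinhole model, so the sampled points still lie on the projected line.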
Here, in order to accurately measure the position and orientation of an object using image features detected in a two-dimensional image, it is necessary to accurately associate the detected image features with geometric features in the three-dimensional shape model.
In the aforementioned Document 1, three-dimensional line segments are associated with edges detected in the vicinity of the positions at which the three-dimensional line segments are projected onto a two-dimensional image. In other words, with this method, the edge detected closest to the projection image of a three-dimensional line segment is considered to be the correctly associated edge. For this reason, if the closest detected edge is an edge that should not be associated with that line segment in the first place, it is impossible to accurately measure the position and orientation of the object, and the precision of measurement decreases. In particular, in the case where the approximate position and orientation of an object are inaccurate, or the case where the two-dimensional image is complex and a large number of edges are detected as association candidates, erroneous associations arise between edges and line segments in the three-dimensional shape model.
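The failure mode of closest-edge association can be illustrated with a minimal sketch. It assumes that luminance has already been sampled along a one-dimensional search line crossing the projected line segment, with the projected position at index `center`. The function names, the central-difference edge detector, and the threshold are hypothetical simplifications and are not the detector used in Document 1.

```python
from typing import List, Optional, Sequence


def detect_edges_1d(profile: Sequence[float], threshold: float) -> List[int]:
    """Return indices where the central-difference gradient magnitude
    is a local maximum and at least `threshold`."""
    n = len(profile)
    grads = [0.0] * n
    for i in range(1, n - 1):
        grads[i] = abs(profile[i + 1] - profile[i - 1]) / 2.0
    edges = []
    for i in range(1, n - 1):
        if grads[i] >= threshold and grads[i] >= grads[i - 1] and grads[i] > grads[i + 1]:
            edges.append(i)
    return edges


def nearest_edge_offset(edges: Sequence[int], center: int) -> Optional[int]:
    """Pick the candidate closest to the projected position (closest-edge rule)."""
    if not edges:
        return None
    return min((e - center for e in edges), key=abs)


# Hypothetical search-line luminance: two edges are present near the projection.
profile = [10, 10, 10, 10, 200, 200, 200, 200, 200, 50, 50]
center = 5  # index of the projected line segment on the search line
edges = detect_edges_1d(profile, threshold=30.0)
chosen = nearest_edge_offset(edges, center)
```

In this example the rule selects the candidate one sample away from the projected position. If the edge that truly corresponds to the model line segment were the one four samples away, for instance because the approximate position and orientation are offset, the closest-edge rule would silently choose the wrong candidate.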
In order to solve such a problem, a technique of improving precision in the association of line segments in a three-dimensional shape model with edges in a grayscale image by employing luminance values in the periphery of the edges is disclosed in “Adaptive line tracking with multiple hypotheses for augmented reality” by H. Wuest, F. Vial, and D. Stricker, Proc. The Fourth Int'l Symp. on Mixed and Augmented Reality (ISMAR05), pp. 62-69, 2005 (hereinafter, referred to as “Document 2”). Specifically, learning images are used to teach in advance what kind of luminance distribution each line segment in the three-dimensional shape model is detected with in an image. Degrees of similarity between the learned luminance distributions and the luminance distributions in the periphery of edges actually detected in the grayscale image are then calculated, and edges are associated based on the results of the calculation. This technology is useful in, for example, the case where a luminance distribution that can be uniquely identified is included as surface information of the target object.
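Association based on luminance similarity can be sketched as follows. This is a minimal illustration that assumes normalized cross-correlation as the similarity measure and short one-dimensional luminance profiles sampled across each edge. Both choices, as well as the names `ncc` and `best_candidate`, are hypothetical and are not necessarily what Document 2 uses.

```python
import math
from typing import List, Sequence, Tuple


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation of two equal-length luminance profiles."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if den == 0.0:
        return 0.0  # a flat profile carries no usable pattern
    return sum(x * y for x, y in zip(da, db)) / den


def best_candidate(learned: Sequence[float],
                   candidates: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """Return the index of the candidate most similar to the learned profile,
    together with all similarity scores."""
    if not candidates:
        raise ValueError("no candidate edges")
    scores = [ncc(learned, c) for c in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical data: the learned profile goes from dark to bright across the edge.
learned_profile = [10, 10, 200, 200]
candidate_profiles = [
    [200, 200, 10, 10],   # bright-to-dark edge (opposite polarity)
    [20, 25, 180, 190],   # dark-to-bright edge, similar to the learned profile
]
best_index, scores = best_candidate(learned_profile, candidate_profiles)
```

Here the second candidate is selected even if the first one lies closer to the projected line segment, because its luminance profile resembles the learned one. The selection is only as reliable as the learned profile is reproducible in the captured image.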
As described above, a technique of associating edges using luminance in a grayscale image is effective in the case where the apparent luminance distribution has high reproducibility, such as the case where the target object has uniquely identifiable surface color information.
However, luminance in a grayscale image varies widely depending on the surface information of the object, the state of the light source, and the viewpoint from which the object is observed. Therefore, depending on the surface color of the target object and the environmental conditions, there is a high possibility of erroneous association occurring between image features and geometric features in a three-dimensional shape model.