In recent years, along with advances in robot technology, robots have come to perform complicated tasks (for example, assembly processes for industrial products) that were, up until now, performed manually. Such robots assemble parts by gripping them using an end effector such as a hand.
In order to control the robot to grip a part, the relative position and orientation between the part to be gripped and the robot (hand) have to be measured (estimated). Such position and orientation measurements are used not only when the robot grips a part but also for various purposes such as self-position estimation, which is required for the robot to move autonomously, and registration between a real space and a virtual object in augmented reality.
For such position and orientation measurements, methods using a two-dimensional image captured by a camera and a range image obtained from a range sensor are known. For example, measurement by means of model fitting is known, in which a three-dimensional shape model of an object is fitted to features detected from the two-dimensional image or to the range image.
In model fitting to a two-dimensional image, the position and orientation are measured by fitting a projected image, obtained by projecting a three-dimensional shape model onto the image plane, to features detected from the two-dimensional image. In model fitting to a range image, the respective points of the range image are converted into a three-dimensional point group having three-dimensional coordinates, and a three-dimensional shape model is fitted to the three-dimensional point group in a three-dimensional space, thereby measuring the position and orientation.
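The two operations described above can be sketched, under the usual pinhole camera model with an assumed intrinsic matrix K (this sketch is illustrative and not taken from any of the cited references):

```python
import numpy as np

def project_points(points_3d, K):
    """Project 3D points (given in camera coordinates) onto the image
    plane using a pinhole camera model with intrinsic matrix K.
    This corresponds to projecting model geometry onto the 2D image."""
    uvw = (K @ points_3d.T).T          # homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective division

def range_pixel_to_3d(u, v, depth, K):
    """Back-project a range-image pixel (u, v) with measured depth into
    a 3D point in camera coordinates. This corresponds to converting a
    range image into a three-dimensional point group."""
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.array([x, y, depth])
```

The two functions are inverses of each other given the depth, which reflects the difference between the two fitting schemes: one projects the model into the image, the other lifts the image measurements into 3D space.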
As a method of measuring the position and orientation using a two-dimensional image, a method of measuring the position and orientation of a camera using edges is known (T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002. (to be referred to as reference 1 hereinafter)). With this method, the three-dimensional shape of an object is expressed by a set of line segments (a wire-frame model), and projected images of the three-dimensional line segments are fitted to edges detected in an image, thereby measuring the position and orientation of the object. More specifically, the three-dimensional line segments are projected onto the image based on approximate values of the position and orientation, and edges are detected in the vicinity of the projected line segments. Next, the position and orientation of the target object are measured by nonlinear optimization so as to minimize the sum total of the distances on the image between the projected line segments, based on the approximate values of the position and orientation, and the corresponding edges.
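The quantity minimized in this kind of edge-based fitting is, per correspondence, the perpendicular distance on the image between a detected edge point and the projected model line segment. A minimal sketch of that residual (an illustration, not the implementation of reference 1):

```python
import numpy as np

def edge_residual(p1, p2, edge_point):
    """Signed perpendicular distance on the image between a detected
    edge point and the infinite line through the projected model line
    segment (p1, p2). Residuals like this, one per correspondence, are
    summed (as squared distances) and minimized over the pose."""
    d = p2 - p1
    n = np.array([-d[1], d[0]])      # normal to the projected line
    n = n / np.linalg.norm(n)
    return float(n @ (edge_point - p1))
```

A nonlinear optimizer (e.g. Gauss-Newton) would update the six pose parameters so that these residuals shrink toward zero.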
On the other hand, as a method of measuring the position and orientation using a range image, a method using the ICP (Iterative Closest Point) algorithm is known (P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, 1992. (to be referred to as reference 2 hereinafter)). With this method, the position and orientation of an object are measured by fitting a three-dimensional shape model of the object to three-dimensional point group data converted from a range image. Processing for searching for the geometric features of the three-dimensional shape model closest to the three-dimensional points, based on approximate values of the position and orientation, and for updating the position and orientation so as to minimize the sum total of the distances between the points and the geometric features of the three-dimensional model, is iteratively executed.
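The iterate-match-update structure of point-to-point ICP can be sketched as follows (a minimal illustration using a brute-force closest-point search and an SVD-based rigid alignment; reference 2 matches points against model geometric features rather than another point set):

```python
import numpy as np

def best_fit_transform(src, dst):
    """Least-squares rigid transform (R, t) aligning src to dst via SVD,
    i.e. the pose-update step of one ICP iteration."""
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:         # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

def icp(src, dst, iterations=20):
    """Minimal point-to-point ICP: repeatedly match each source point to
    its closest destination point, then update the rigid pose."""
    cur = src.copy()
    for _ in range(iterations):
        # brute-force closest-point search: the heavy step the text
        # identifies as the bottleneck of range-image methods
        dists = np.linalg.norm(cur[:, None, :] - dst[None, :, :], axis=2)
        matched = dst[dists.argmin(axis=1)]
        R, t = best_fit_transform(cur, matched)
        cur = cur @ R.T + t
    return cur
```

The closest-point search is quadratic in the number of points here, which makes concrete why the association step dominates the cost and motivates the speed-up techniques discussed next.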
The method of measuring the position and orientation of an object using a range image requires heavy-load arithmetic processing when searching for the planes of the three-dimensional shape model corresponding to the respective points of the point group data. In order to cope with this, Japanese Patent Laid-Open No. 2006-202152 (to be referred to as reference 3 hereinafter) discloses a technique that speeds up the association search processing at the time of registration of a plurality of range images. In this method, small planes (meshes) assigned index values are fitted to a range image, the index values are converted into non-overlapping colors to serve as the colors of the meshes, and an index image is generated by rendering the meshes based on the imaging position and orientation of the range image. Then, a three-dimensional point group converted from another range image is projected onto the index image based on the position and orientation at the time of imaging, and the corresponding colors on the index image are acquired from the coordinate values of the projected three-dimensional points. After that, the index values corresponding to the acquired colors are recovered by inverse conversion, thereby associating the meshes with the three-dimensional points converted from the range image. In this manner, the association search processing is sped up.
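The key device in this scheme is a reversible mapping between mesh index values and non-overlapping colors, so that reading a rendered pixel immediately identifies the visible mesh. One way such a mapping could look, assuming 24-bit RGB colors (an illustrative sketch, not the encoding of reference 3):

```python
def index_to_color(idx):
    """Encode a mesh index as a unique 24-bit RGB color. Rendering the
    meshes with these colors yields an 'index image' whose pixel color
    identifies the mesh visible at each pixel."""
    return ((idx >> 16) & 0xFF, (idx >> 8) & 0xFF, idx & 0xFF)

def color_to_index(rgb):
    """Inverse conversion: recover the mesh index from a color sampled
    at the image coordinates of a projected three-dimensional point."""
    r, g, b = rgb
    return (r << 16) | (g << 8) | b
```

Because the lookup reduces to one pixel read plus this inverse conversion, the per-point association cost becomes constant instead of requiring a search over all meshes.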
In the aforementioned registration method (reference 3) using the index image, in order to register the positions of a plurality of range images, all of the three-dimensional points extracted from the range images are projected onto a two-dimensional image to associate them with each other. For this reason, when, for example, the position and orientation of a target object are measured from an image including an object other than the target object, all measurement points, including those of the object other than the target object, have to be densely projected, resulting in wasteful processing.
In the case of the method of reference 3, depending on the three-dimensional shape model, the model has to be re-meshed so as to prevent meshes from degenerating (being crushed) at the time of projection. For this reason, it is difficult to use a CAD model of a target object intact as the three-dimensional shape model. Furthermore, in order to render the meshes on a two-dimensional plane at high speed, dedicated hardware such as a GPU (Graphics Processing Unit) is required.
Furthermore, in the case of the method of reference 1, all line segments serving as geometric features of the three-dimensional shape model, including those on the back surface (a portion that is not measured) of the target object, have to be projected onto the image. Hence, the method of reference 1 also entails wasteful processing.
In general, the method using a two-dimensional image is suited to, for example, an environment containing many artificial objects composed of straight lines, whereas the method using a range image is suited to, for example, an object having a plurality of smooth planes.
Since the method using a two-dimensional image and that using a range image exploit different properties of the measured information, the position and orientation measurement precision is expected to be improved by combining the model fitting to a two-dimensional image with that to a range image. In the model fitting to a two-dimensional image, geometric features of the three-dimensional shape model, such as edges, are projected onto the two-dimensional image, and corresponding geometric features are searched for on the two-dimensional image, as described above. In the model fitting to a range image, measurement points are projected in place of the geometric features of the three-dimensional shape model, as described above. That is, these methods use different association methods.
For this reason, the association processing cannot be executed within a single common framework, and has to be executed independently for each method when measuring the position and orientation of a target object using both a two-dimensional image and a range image.