With the development of robot technologies in recent years, robots have come to perform complicated tasks which, up until now, were performed manually. A typical example of such complicated tasks is the assembly of industrial products. To perform assembly, a robot has to hold a component with an end effector such as a robot hand and fit that component to another component. For this purpose, the relative position and orientation between the robot and the component to be held or fitted have to be measured. Then, a motion plan for the robot hand has to be designed based on the measurement result so as to control the actuators required to actually drive the robot hand.
Conventionally, the position and orientation of the robot are measured using a camera or distance sensor mounted on the robot, and methods using a two-dimensional image or a range image are typical. In particular, the method using a range image has the following advantages over the method using a two-dimensional image: it can directly obtain position information in the image capturing direction, it can obtain geometric information even when the target object has poor texture information, and it is insensitive to the surrounding light source environment. For this reason, the method using a range image is becoming increasingly important in practical applications such as product assembly in a factory.
As a method of estimating the position and orientation of an object in a captured scene from a range image, fitting a geometric model of the object to the range image predominates, and many studies have been made on it. For example, non-patent reference 1 (P. J. Besl and N. D. McKay, “A method for registration of 3-D shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, no. 2, pp. 239-256, 1992.) discloses a method of measuring the position and orientation of an object by converting a range image into three-dimensional point cloud data and fitting a three-dimensional model to the point cloud data (model fitting method). That is, based on approximate values of the position and orientation, the method searches for the plane of the three-dimensional model nearest to each point of the point cloud data, and repeatedly optimizes the position and orientation so as to minimize the sum of distances between the points and their associated planes.
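The association step and the quantity being minimized can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of non-patent reference 1 as published: the model is reduced to infinite planes, the pose is limited to a rotation about the z axis plus a translation, squared distances are summed, and the iterative optimizer that updates the pose is omitted. All names (`transform`, `signed_distance`, `point_to_plane_cost`) and the sample data are hypothetical.

```python
import math

def transform(point, yaw, translation):
    """Apply an assumed pose (rotation about z by `yaw`, then translation) to a 3-D point."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    tx, ty, tz = translation
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)

def signed_distance(point, plane):
    """Signed distance from `point` to a plane given as (point_on_plane, unit_normal)."""
    origin, normal = plane
    return sum(n * (p - o) for n, p, o in zip(normal, point, origin))

def point_to_plane_cost(points, planes, yaw, translation):
    """Associate each transformed point with its nearest plane and sum the squared distances."""
    total = 0.0
    for p in points:
        q = transform(p, yaw, translation)
        # Nearest-plane association: keep the smallest squared distance.
        total += min(signed_distance(q, plane) ** 2 for plane in planes)
    return total

# Hypothetical model: the planes z = 0 and x = 0, each as (point_on_plane, unit_normal).
model_planes = [((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
                ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))]
# Hypothetical measured points lying 0.5 above the z = 0 plane.
measured = [(1.0, 2.0, 0.5), (3.0, 1.0, 0.5)]

# Cost at the approximate pose, and at a pose that moves the points onto the plane.
cost_initial = point_to_plane_cost(measured, model_planes, 0.0, (0.0, 0.0, 0.0))
cost_refined = point_to_plane_cost(measured, model_planes, 0.0, (0.0, 0.0, -0.5))
```

In a full model fitting method, the pose would be updated repeatedly, with the association recomputed at each iteration, until the cost stops decreasing.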
The aforementioned model fitting method searches for correspondences between measurement data and a model, and minimizes differences between corresponding pairs. However, obviously wrong correspondence pairs are often detected due to causes such as noise in the measurement data and errors in the initial estimated values of the position and orientation. To prevent such obvious errors (“outliers”) from adversely affecting the estimation result, a group of methods called robust estimation, which sets a small weight for outliers, is used. M-estimation is known as a typical robust estimation method.
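The weighting idea behind M-estimation can be sketched on a one-dimensional analogue. The example below uses Tukey's biweight function inside an iteratively reweighted mean; the choice of Tukey's function, the threshold `c`, the function names, and the sample values are all assumptions made for illustration, since the text does not specify a particular weight function.

```python
def tukey_weight(residual, c):
    """Tukey's biweight: close to 1 for small residuals, exactly 0 beyond the threshold c."""
    if abs(residual) > c:
        return 0.0
    u = residual / c
    return (1.0 - u * u) ** 2

def robust_mean(values, c, iterations=20):
    """Iteratively reweighted mean: a 1-D stand-in for robust pose estimation."""
    estimate = sorted(values)[len(values) // 2]  # start from a median-like value
    for _ in range(iterations):
        weights = [tukey_weight(v - estimate, c) for v in values]
        total = sum(weights)
        if total == 0.0:
            break  # every sample was rejected; keep the last estimate
        estimate = sum(w * v for w, v in zip(weights, values)) / total
    return estimate

# Hypothetical data: five consistent values and one obvious outlier.
values = [1.0, 1.1, 0.9, 1.05, 0.95, 10.0]
plain = sum(values) / len(values)       # pulled toward the outlier
robust = robust_mean(values, c=1.0)     # the outlier receives zero weight
```

The plain mean is dragged far from the consistent values by the single outlier, whereas the reweighted estimate stays near them because the outlier's residual exceeds `c`.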
Depending on the measurement principle of an image capturing apparatus which captures a range image, data may often be lost in the captured range image. For example, in a light-section method, light which is projected by an illumination apparatus onto a target object is observed by a camera from a direction different from the projection direction. At this time, a part of the region onto which the light is projected may be occluded by the target object itself and thus not be observed by the camera. In such a case, data of the range image are lost for the region on the object which cannot be observed from the camera. FIGS. 1A and 1B show an example in which data of a range image are lost. FIG. 1A shows a range image obtained by capturing an image of a target object, and FIG. 1B shows regions, for example concave portions as viewed from the surface of the target object, where data (distance values) are lost because depth values cannot be measured.
When a geometric model of the target object is to be fitted to such a range image which suffers from lost data, the range image and geometric model are often erroneously associated with each other. The concept of an association error will be described below with reference to FIG. 2. A sectional shape 201 is formed by cutting the geometric model by a plane parallel to the plane of the drawing. A sectional shape 202 is that of a surface defined by a point cloud in a three-dimensional space corresponding to pixels of the range image. Also, a spatial region 203 represents a spatial region where distance values cannot be measured due to occlusion from the camera. For the part of the three-dimensional point cloud that lies in this region (the portion indicated by the broken line of the sectional shape 202), depth data of the range image are lost. A case will be examined below wherein, for a point A of the measurement point cloud, the point on the model having the minimum distance from the point A is searched for as the corresponding point of the point A. Since the point A is located on the bottom surface of a concave portion of the sectional shape, it is desirably associated with a point on the bottom surface of the model such as a point B. However, in this case, the point on the model having the shortest distance from the point A is a point C, thus causing an association error. When the geometric model and measurement point cloud are erroneously associated with each other, the attempt to minimize distances between wrong correspondence pairs influences the position/orientation estimation calculations, thus lowering the position/orientation estimation accuracy.
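The association error can be reproduced with a small two-dimensional sketch. The coordinates below are hypothetical and are not taken from FIG. 2: the model section is a polyline with a rectangular concave portion, and the point A is assumed to lie on the measured bottom surface, which sits 0.6 above the model bottom because of an error in the approximate pose. The labels A, B, and C follow the text; all function and variable names are illustrative.

```python
import math

def closest_point_on_segment(p, a, b):
    """Return the point on segment a-b nearest to p (2-D)."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp so the result stays on the segment
    return (ax + t * dx, ay + t * dy)

# Hypothetical model cross-section with a concave portion (cf. sectional shape 201).
model_section = {
    "left top":   ((-2.0, 0.0), (-1.0, 0.0)),
    "left wall":  ((-1.0, 0.0), (-1.0, -1.0)),
    "bottom":     ((-1.0, -1.0), (1.0, -1.0)),
    "right wall": ((1.0, -1.0), (1.0, 0.0)),
    "right top":  ((1.0, 0.0), (2.0, 0.0)),
}

# Point A: measured on the bottom of the concave portion, offset by the pose error.
point_a = (0.8, -0.4)

candidates = []
for label, (a, b) in model_section.items():
    q = closest_point_on_segment(point_a, a, b)
    dist = math.hypot(point_a[0] - q[0], point_a[1] - q[1])
    candidates.append((label, q, dist))

# Nearest-point association selects point C, which lies on the wall.
label_c, point_c, dist_c = min(candidates, key=lambda item: item[2])
# The desired correspondence, point B, lies on the bottom surface and is farther away.
point_b = closest_point_on_segment(point_a, *model_section["bottom"])
dist_b = math.hypot(point_a[0] - point_b[0], point_a[1] - point_b[1])
```

With these assumed coordinates, the nearest model point lies on the right wall rather than on the bottom surface, so a purely distance-based search pairs the point A with the wrong surface.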
Use of the aforementioned robust estimation cannot always reduce the adverse influence of wrong correspondence pairs. This is because distances between wrong correspondence pairs due to measurement losses are not always sufficiently larger than distances between correct correspondence pairs, so that the wrong pairs are not given small weights and their degrees of contribution to the position/orientation estimation calculations are equivalent to those of the correct correspondence pairs.
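A short numerical sketch illustrates this limitation. It uses the Huber weight function, one common choice in M-estimation; the function choice, the threshold `k`, and all residual values are assumptions made up for illustration, not measured data.

```python
def huber_weight(residual, k):
    """Huber weight: 1 inside the threshold k, decaying as k/|r| outside it."""
    r = abs(residual)
    return 1.0 if r <= k else k / r

k = 0.5                                   # assumed threshold
correct_pairs = [0.15, 0.25, 0.20, 0.30]  # hypothetical distances of correct pairs
occlusion_pair = 0.20                     # wrong pair caused by lost data
gross_outlier = 5.0                       # obviously wrong pair

weights_correct = [huber_weight(r, k) for r in correct_pairs]
weight_occlusion = huber_weight(occlusion_pair, k)
weight_gross = huber_weight(gross_outlier, k)
```

Under these assumptions the gross outlier is down-weighted, but the wrong pair caused by lost data receives the same weight as the correct pairs because its distance falls within their range.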