In recent years, robots have come to perform complicated tasks that were previously performed manually. A representative example of such complicated tasks is the assembly of industrial products. Such a robot grips parts with an end effector such as a hand, so as to perform the assembly processes autonomously. In order to control the robot to grip a part, the relative position and orientation between the part to be gripped and the robot are measured. A motion plan then has to be designed based on the measurement result so as to control the actuators.
In mixed reality, the so-called MR technique, the position and orientation likewise have to be measured in order to seamlessly merge the real and virtual worlds in real time. The position and orientation are conventionally measured using a camera and a distance sensor. Representative methods of such measurement use a two-dimensional image and a range image. In the field of the MR technique, a technique for measuring the position and orientation of a head mounted display (to be abbreviated as HMD hereinafter) using an image captured by a camera mounted on the HMD has been studied.
D. G. Lowe, “Fitting parameterized three-dimensional models to images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 5, pp. 441-450, 1991 (to be referred to as reference 1 hereinafter) discloses a technique for measuring the position and orientation of an object by performing model fitting to a two-dimensional image without requiring sufficient texture information. With this technique, a three-dimensional geometric model of an object is fitted to line segments (edges) detected from a two-dimensional image, and the position and orientation of the object are thereby measured. More specifically, after edge detection is performed on the entire two-dimensional image, the detected edges are separated into line segments using local connection information. Starting from approximate values of the position and orientation of the object, the position and orientation are then calculated using a Gauss-Newton method so as to minimize the sum total of the distances on the image between the end points on both sides of each detected line segment and the corresponding line segments of the three-dimensional geometric model.
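The Gauss-Newton fitting step described above can be illustrated with a minimal sketch. It is not the implementation of reference 1: as simplifying assumptions, it works with a two-dimensional rigid pose (rotation theta and translation tx, ty) instead of a full three-dimensional pose with camera projection, it minimizes the sum of squared point-to-line distances, and it takes the correspondence between detected end points and model lines as given. The function names `solve3` and `fit_pose_2d` are hypothetical.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_pose_2d(model_lines, endpoints, pose, iterations=10):
    """Refine an approximate 2D pose (theta, tx, ty) by Gauss-Newton.

    model_lines[i] = (px, py, nx, ny): a point on model line i and its unit normal.
    endpoints[i]   = (qx, qy): a detected end point assumed to belong to line i.
    """
    theta, tx, ty = pose
    for _ in range(iterations):
        c, s = math.cos(theta), math.sin(theta)
        JtJ = [[0.0] * 3 for _ in range(3)]
        Jtr = [0.0] * 3
        for (px, py, nx, ny), (qx, qy) in zip(model_lines, endpoints):
            # Normal of the model line after rotation, and its derivative in theta.
            rnx, rny = c * nx - s * ny, s * nx + c * ny
            dnx, dny = -s * nx - c * ny, c * nx - s * ny
            dx, dy = qx - tx, qy - ty
            # Signed distance from the end point to the transformed model line.
            r = rnx * dx + rny * dy - (nx * px + ny * py)
            # Partial derivatives of r with respect to (theta, tx, ty).
            J = [dnx * dx + dny * dy, -rnx, -rny]
            for a in range(3):
                Jtr[a] += J[a] * r
                for b in range(3):
                    JtJ[a][b] += J[a] * J[b]
        # Gauss-Newton update: solve (J^T J) step = -(J^T r).
        step = solve3(JtJ, [-v for v in Jtr])
        theta, tx, ty = theta + step[0], tx + step[1], ty + step[2]
    return theta, tx, ty
```

Because each iteration linearizes the distances around the current estimate, the result depends on the quality of the approximate pose and on the correctness of the assumed correspondences.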
Also, T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002 (to be referred to as reference 2 hereinafter) discloses a technique for conducting an edge search only in the vicinity of line segments of a three-dimensional geometric model, based on approximate values of the position and orientation of an object. With this technique, since edges are not detected from the entire two-dimensional image, the position and orientation of the object can be measured quickly. Reference 2 also describes a technique for calculating reliabilities of the respective edges detected from a two-dimensional image according to color contrasts and distances from other edges, and weighting the edges based on the calculated reliabilities. The contribution ratios of the respective edges in the position/orientation calculation processing are changed according to these weights.
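How reliability weights change the contribution of each edge can be sketched as follows. The formula in `edge_reliability` is purely hypothetical and is not the one used in reference 2; the reference values `contrast_ref` and `dist_ref` are likewise assumptions chosen for illustration. The function `weighted_offset` stands in for the position/orientation calculation by estimating a single shared offset from per-edge residuals.

```python
def edge_reliability(contrast, dist_to_other_edge, contrast_ref=50.0, dist_ref=5.0):
    """Hypothetical reliability in [0, 1].

    The score grows with color contrast and with the distance to the nearest
    other edge, and saturates at the (assumed) reference values.
    """
    c = min(max(contrast / contrast_ref, 0.0), 1.0)
    d = min(max(dist_to_other_edge / dist_ref, 0.0), 1.0)
    return c * d


def weighted_offset(residuals, weights):
    """Weighted least-squares estimate of one offset shared by all edges."""
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all edges have zero reliability")
    return sum(w * r for w, r in zip(weights, residuals)) / total
```

An edge with zero reliability has no influence on the estimate, whereas an unweighted average would be pulled toward it.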
Furthermore, Japanese Patent Laid-Open No. 2007-20751 (to be referred to as reference 3 hereinafter) discloses a technique, relating to generation of the three-dimensional geometric model used in model fitting, for generating the model by leaving only edges which have high probabilities of being observed. With this technique, a plurality of images are generated by observing a three-dimensional geometric model of an object from different view-points, and only edges which are commonly observed in the plurality of view-point images are left. A model with a high edge observability is thereby generated.
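The selection of commonly observed edges can be sketched as follows. This is an illustration under stated assumptions and not the procedure of reference 3: each view-point is represented simply as a set of identifiers of the edges observed from it, and the function name `select_observable_edges` and the parameter `min_ratio` are hypothetical. With `min_ratio=1.0`, only edges observed from every view-point are left.

```python
def select_observable_edges(views, min_ratio=1.0):
    """Return the sorted identifiers of edges observed in at least
    min_ratio of the view-points.

    views: list of collections of edge identifiers, one per view-point.
    """
    counts = {}
    for observed in views:
        for edge in set(observed):  # count each edge at most once per view
            counts[edge] = counts.get(edge, 0) + 1
    n = len(views)
    if n == 0:
        return []
    return sorted(e for e, k in counts.items() if k / n >= min_ratio)
```

For example, with the three view-points `[{1, 2, 3}, {2, 3}, {3, 4}]`, only edge 3 is observed from all of them, so a strict selection leaves a single edge out of four.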
The technique of reference 1 is premised on the assumption that edges are correctly detected from an image, and that the correspondence between the edges detected from the image and those of a three-dimensional geometric model is correct. For this reason, the position and orientation of an object can be measured with high precision as long as edges that clearly indicate texture and shape changes of a target object are always stably detected.
However, when the imaging view-point is moved while an object is being imaged, it is difficult to stably detect edges from a captured image. Since the appearance of the colors of the object in the image changes depending on the imaging view-point, the edges detected in the image normally change accordingly. In such a case, the correspondence between the edges detected from the image and those of a three-dimensional geometric model is prone to errors, and the precision of the position/orientation calculation processing in the technique of reference 1 drops.
Also, with the technique of reference 2, since edge reliabilities are calculated in real time, the processing load of the position/orientation calculation processing is heavy. Furthermore, since edges are also detected from a background region other than the measurement object in an image, there is a risk of matching wrong edges.
Furthermore, with the technique of reference 3, as described above, a model having only edges with high observabilities is generated by leaving edges observed in a plurality of view-point images. This technique is premised on the assumption that edges with high observabilities are uniformly and sufficiently distributed over the entire three-dimensional geometric model.
However, in practice, the edge observabilities vary greatly from view-point to view-point. For this reason, when only edges with high observabilities at a plurality of view-points are left, the number of edges obtained may be insufficient. Even when a large number of edges with high observabilities are left, these edges may be concentrated in one portion of the model. In such a case, the precision of the position/orientation calculation processing drops due to the insufficiency or bias of the edges.