1. Field of the Invention
The present invention relates to a position and orientation measurement device and a position and orientation measurement method for measuring a position and orientation of an object to be measured.
2. Description of the Related Art
In recent years, with advances in robot technology, robots have begun to perform complex tasks, such as the assembly of industrial products, that were previously performed by human beings. Arm-type robots are mainly used for such tasks; the robot grips a part with an end effector, such as a hand attached to the tip of the arm, and performs the assembly. For the robot to grip the part appropriately, the relative position and orientation between the part and the robot (end effector) must be measured accurately. When robotic assembly is applied to an actual assembly operation, the measurement of the position and orientation of a part needs to be both quick and accurate. Beyond the assembly of industrial products by robots, such position and orientation measurement techniques are required in various fields, such as self-position estimation for autonomously moving robots and the creation of three-dimensional model data from actual measurement data.
To measure the position and orientation of a part at a manufacturing site for industrial products, a grayscale (color) image obtained from a camera and a distance image obtained from a noncontact distance sensor are mainly used. The position and orientation of each object is generally measured by fitting a three-dimensional shape model of the object to the measurement data (grayscale image or distance image). In T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002 (hereinafter referred to as Non-Patent Document 1), a method is disclosed in which a three-dimensional shape model of an object is represented by a wireframe model that is a set of line segments, and a projection image of the line segments is fitted to edges on a grayscale image, so that the position and orientation of the object is measured. In the method using the edges on the grayscale image, the position and orientation of the object is measured so as to minimize “a distance on a two-dimensional image plane” between an edge on the image and the projection image of a portion of the model that appears as an edge on the image, such as a contour portion of the object or a boundary between surfaces. Therefore, the measurement accuracy is high for components of the position and orientation that largely change the distance on the two-dimensional image plane, but is not necessarily high for the other components. Specifically, the measurement accuracy of the position components in directions perpendicular to the optical axis of the camera and of the orientation component around the optical axis is high. On the other hand, the measurement accuracy of the position component (depth) in the direction of the optical axis of the camera is low.
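The difference in accuracy between the lateral and depth components can be illustrated with a small numerical sketch. The following is not taken from Non-Patent Document 1; it assumes an ideal pinhole camera, and the focal length of 1000 pixels, the point location, and the function names `project` and `image_shift` are hypothetical values chosen only for illustration.

```python
def project(point, focal_length=1000.0):
    """Pinhole projection of a camera-frame point (x, y, z) to image coordinates in pixels."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def image_shift(point, delta, focal_length=1000.0):
    """Distance in pixels that the projection moves when the point is translated by delta."""
    u0, v0 = project(point, focal_length)
    moved = tuple(p + d for p, d in zip(point, delta))
    u1, v1 = project(moved, focal_length)
    return ((u1 - u0) ** 2 + (v1 - v0) ** 2) ** 0.5


# Hypothetical point: 5 cm off the optical axis, 1 m in front of the camera.
point = (0.05, 0.0, 1.0)
lateral = image_shift(point, (0.01, 0.0, 0.0))  # 1 cm perpendicular to the optical axis
depth = image_shift(point, (0.0, 0.0, 0.01))    # 1 cm along the optical axis
print(f"lateral shift: {lateral:.2f} px, depth shift: {depth:.2f} px")
```

Under these assumptions, a 1 cm lateral translation moves the projection by about 10 pixels, whereas a 1 cm translation in depth moves it by about 0.5 pixels, so an error measured on the image plane constrains depth far more weakly.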
In D. A. Simon, M. Hebert, and T. Kanade, “Real-time 3-D pose estimation using a high-speed range sensor,” Proc. 1994 IEEE International Conference on Robotics and Automation (ICRA'94), pp. 2235-2241, 1994 (hereinafter referred to as Non-Patent Document 2), a method is disclosed in which a three-dimensional shape model (polygon model) of an object is fitted to three-dimensional point group data on the surface of the object obtained from a distance image, so that the position and orientation of the object is measured. The method using a distance image as disclosed in Non-Patent Document 2 directly minimizes a distance “in a three-dimensional space” between the point group and the model, so that the measurement accuracy of the position and orientation is generally high. However, in an active stereo method disclosed in Satoh, Iguchi, “Distance image input by space code” Journal of Electronics, Information and Communication, vol. J68-D, no. 3, pp. 369-375, 1985, the distance to the contour portion of the object may not be measured stably, so that, depending on the shape and the observation direction of the object, a position and orientation at which the contour portion correctly corresponds cannot be measured.
In view of the characteristics of the method using a grayscale image and the method using a distance image described above, the information obtained from the grayscale image and the information obtained from the distance image are complementary in the estimation of the position and orientation. Therefore, measuring the position and orientation so that the three-dimensional shape model fits both the grayscale image and the distance image can improve the measurement accuracy. In the methods described above, which are disclosed in Non-Patent Document 1 and Non-Patent Document 2, the sum of squares of errors between the model projection image and edges on a two-dimensional image plane and the sum of squares of errors between the model and the point group in a three-dimensional space are respectively minimized as an evaluation function. The scale of the distance on the two-dimensional image plane is different from that of the error in the three-dimensional space, so that a simple method that estimates the position and orientation by minimizing the sum of the two evaluation functions has the problem that one of the two evaluation functions dominates the result. Conventionally, a measurement method has been proposed in which the information of the grayscale image and the information of the distance image are used complementarily by collectively evaluating errors of different scales on a common scale. Here, as one such common scale, errors in the two-dimensional image plane and errors in the three-dimensional space are each represented by their probabilities of occurrence (likelihoods) under the probability distributions of these errors, and a highly accurate measurement of the position and orientation is performed by maximizing the product of the likelihoods.
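The effect of a common scale can be sketched numerically. The following sketch assumes that both kinds of error follow zero-mean Gaussian distributions, which the text above does not specify; the standard deviations, the sample errors, and all function names are hypothetical. Under that assumption, maximizing the product of the likelihoods is equivalent to minimizing the sum of squared errors after each error is divided by its standard deviation.

```python
import math


def gaussian_likelihood(error, sigma):
    """Probability density of a zero-mean Gaussian error with standard deviation sigma."""
    return math.exp(-0.5 * (error / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def negative_log_likelihood(errors_2d, errors_3d, sigma_2d, sigma_3d):
    """Negative logarithm of the product of the likelihoods of all 2D and 3D errors."""
    total = 0.0
    for e in errors_2d:
        total -= math.log(gaussian_likelihood(e, sigma_2d))
    for e in errors_3d:
        total -= math.log(gaussian_likelihood(e, sigma_3d))
    return total


def normalized_sum_of_squares(errors_2d, errors_3d, sigma_2d, sigma_3d):
    """Sum of squared errors after dividing each error by its standard deviation."""
    return (sum((e / sigma_2d) ** 2 for e in errors_2d)
            + sum((e / sigma_3d) ** 2 for e in errors_3d))


# Hypothetical errors: image-plane errors in pixels, 3D errors in metres.
errors_2d = [2.0, -1.0]
errors_3d = [0.002, -0.001]
sigma_2d, sigma_3d = 1.0, 0.001  # assumed standard deviations (1 pixel, 1 mm)

naive = sum(e ** 2 for e in errors_2d) + sum(e ** 2 for e in errors_3d)
normalized = normalized_sum_of_squares(errors_2d, errors_3d, sigma_2d, sigma_3d)
print(f"naive sum: {naive:.6f}, normalized sum: {normalized:.2f}")
```

With these sample values, the naive sum is almost entirely determined by the image-plane errors, whereas the normalized sum receives equal contributions from both sources. The appropriate standard deviations depend on the sensors actually used.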
Time constraints are strict in a manufacturing process of industrial products, so that the measurement of the position and orientation of a part, which is one step of the manufacturing process, needs to be performed as fast as possible. There are many other cases in which the measurement of a position and orientation needs to be performed quickly, such as self-position estimation of a robot. The estimation of the position and orientation by fitting the three-dimensional shape model to the grayscale image and the distance image includes two steps: (1) associating the model with the image and (2) calculating the position and orientation based on a result of the association. To estimate the position and orientation with high accuracy, these steps are generally repeated a plurality of times. Of the two steps described above, the calculation time taken to associate the model with the image often becomes a problem.
In the method disclosed in Non-Patent Document 1 described above, line segments in the three-dimensional shape model are projected onto the image on the basis of an initial value of the position and orientation, and corresponding edges are searched for near the projected line segments on the image, so that the process of associating the model with the image is sped up. On the other hand, when associating the distance image with the three-dimensional shape model, it is necessary to search for a nearest point in the three-dimensional space.
In Non-Patent Document 2 described above, the nearest point is searched for by using a kd-tree. However, this method requires a calculation of order O(N log M) (N is the number of points in the data, and M is the number of points in the model), so that it takes time to process data having a large number of measurement points, such as a distance image in which each pixel is a measurement point.
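A nearest-point search with a kd-tree can be sketched as follows. This is a minimal illustration and not the implementation of Non-Patent Document 2; the tuple-based tree layout and the function names `build_kdtree`, `squared_distance`, and `nearest` are assumptions made for this example. Each query descends the tree and prunes subtrees that cannot contain a closer point, which is what yields roughly log M work per query on average.

```python
def build_kdtree(points, depth=0):
    """Recursively build a kd-tree over 3D model points as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 3  # cycle through the x, y, and z axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(node, query, depth=0, best=None):
    """Return the model point closest to query, pruning subtrees that cannot contain it."""
    if node is None:
        return best
    point, left, right = node
    if best is None or squared_distance(query, point) < squared_distance(query, best):
        best = point
    axis = depth % 3
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the current best.
    if diff ** 2 < squared_distance(query, best):
        best = nearest(far, query, depth + 1, best)
    return best


# Hypothetical model points and measured points.
model_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
measured_points = [(0.9, 0.1, 0.0), (0.1, 0.1, 0.8)]
tree = build_kdtree(model_points)
matches = [nearest(tree, q) for q in measured_points]
print(matches)
```

The search is repeated once for each of the N measured points, so the total cost grows with the number of pixels in the distance image.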
On the other hand, in the same manner as the method of Non-Patent Document 1 described above, a method has conventionally been proposed in which the model is projected onto the image to speed up the association. Specifically, a geometric feature in the three-dimensional shape model is projected onto the image on the basis of an initial value of the position and orientation, and the position of the geometric feature on the image is calculated. Next, the geometric feature is associated with a pixel in the distance image on the basis of the position of the geometric feature on the image, so that a quick association is realized.
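The projection-based association described above can be sketched as follows. This sketch makes several assumptions that are not stated in the text: an ideal pinhole camera with hypothetical intrinsic parameters `fx`, `fy`, `cx`, and `cy`; model points already transformed into the camera frame using the initial value of the position and orientation; a distance image stored as rows of depth values along the optical axis; and a value of 0 marking pixels without a measurement. The function names are likewise hypothetical.

```python
def project_to_pixel(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point to integer pixel coordinates (column, row)."""
    x, y, z = point
    return (int(round(fx * x / z + cx)), int(round(fy * y / z + cy)))


def back_project(col, row, depth, fx, fy, cx, cy):
    """Recover the 3D point measured at a pixel of the distance image."""
    return ((col - cx) * depth / fx, (row - cy) * depth / fy, depth)


def associate(model_points, distance_image, fx, fy, cx, cy):
    """Pair each model point with the 3D measurement at the pixel it projects to."""
    height, width = len(distance_image), len(distance_image[0])
    pairs = []
    for point in model_points:
        if point[2] <= 0:
            continue  # behind the camera
        col, row = project_to_pixel(point, fx, fy, cx, cy)
        if not (0 <= col < width and 0 <= row < height):
            continue  # projects outside the image
        depth = distance_image[row][col]
        if depth <= 0:
            continue  # no valid measurement at this pixel (assumed encoding)
        pairs.append((point, back_project(col, row, depth, fx, fy, cx, cy)))
    return pairs


# Hypothetical 5x5 distance image of a flat surface 2 m from the camera.
distance_image = [[2.0] * 5 for _ in range(5)]
model_points = [(0.0, 0.0, 1.9), (0.02, 0.0, 2.0), (1.0, 0.0, 1.0), (0.0, 0.0, -1.0)]
pairs = associate(model_points, distance_image, fx=100.0, fy=100.0, cx=2.0, cy=2.0)
print(len(pairs), "of", len(model_points), "model points associated")
```

Each model point requires only one projection and one pixel lookup, with no search in three-dimensional space. The pairing is only meaningful, however, when the projected model overlaps the image of the actual object.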
However, to associate the image with the three-dimensional shape model on the basis of the projection of the three-dimensional shape model onto the image, it is required that the projection image of the three-dimensional shape model and the image of the actual object sufficiently overlap each other. Therefore, if a shift between the initial value of the position and orientation used for projection and the actual position and orientation is large, a large number of association errors occur and the calculation of the position and orientation in a later process fails. When associating edges in the grayscale image, explicit features (edges) are used, so that the association is relatively robust against a shift of initial value.
On the other hand, when associating the distance image, explicit features such as the edges in the grayscale image are not used, so that the association is not very robust against a shift of initial value and the calculation of the position and orientation may fail. Conversely, if the shift of initial value is small and the distance image can be correctly associated with the model, the position and orientation can be measured with a high degree of accuracy by using the distance image. As described above, the robustness against the shift of initial value differs between the grayscale image and the distance image. Conventionally, this difference has not been taken into account; the robustness against the shift of initial value can be further improved by selectively using, step by step, the measurement data with favorable characteristics.