1. Field of the Invention
The present invention relates to a position and orientation measurement apparatus and method that measure the position and orientation of an imaging apparatus relative to an observation object, using an image of the observation object captured by the imaging apparatus and three-dimensional model data that represents the surface shape of the observation object.
2. Description of the Related Art
A conventional technique has been discussed in which the position and orientation of an imaging apparatus, such as a camera that captures a real space, relative to an observation object is measured using images captured by the imaging apparatus. Such a position and orientation measurement technique is highly beneficial for a mixed reality system, which integrates and displays the real space and a virtual space, and for measuring the position and orientation of a robot.
A method is known for acquiring the position and orientation of the imaging apparatus based on the correspondence between edges in an image and a three-dimensional model of the observation object. A region of the captured image in which the observed luminance changes discontinuously is referred to as an edge. Since an edge is invariant to scale and observation direction, positioning using edges is characterized by high accuracy. The positioning using edges discussed in Literature 1 (formal citation provided below) can be implemented by processing (1) to (3).
(1) The three-dimensional line segment model is projected onto the image based on the position and orientation of the camera in the previous frame and the intrinsic parameters of the camera, which have been calibrated in advance.
(2) Each projected line segment is divided at a constant interval on the image, and dividing points are set. An edge is searched for on a line segment (also referred to as a “searching line”) that passes through each dividing point and is oriented along the normal of the projected line segment. The point that has an extreme luminance gradient on the searching line and is closest to the dividing point is detected as the corresponding edge.
(3) Correction values of the position and orientation of the camera are calculated so as to minimize the sum of the distances on the image between the corresponding edge detected at each dividing point and the projected line segment. The position and orientation of the camera are corrected with the correction values.
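Steps (1) and (2) above can be illustrated with a minimal Python sketch. It is not the implementation of Literature 1: the function names, the central-difference gradient, and the min_gradient threshold are assumptions introduced only for illustration. The first function places dividing points at a constant spacing along a projected segment. The second scans luminance samples taken along one searching line and returns the gradient extremum closest to the dividing point.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def dividing_points(p0: Point, p1: Point, spacing: float) -> List[Point]:
    """Place points at a constant spacing along the projected segment p0 -> p1."""
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return [p0]
    count = int(length // spacing)
    return [
        (p0[0] + dx * (k * spacing / length), p0[1] + dy * (k * spacing / length))
        for k in range(count + 1)
    ]


def find_corresponding_edge(
    luminance: List[float], center: int, min_gradient: float = 10.0
) -> Optional[int]:
    """Return the index of the luminance-gradient extremum closest to `center`.

    `luminance` holds samples taken along one searching line, and `center` is
    the index of the dividing point on that line. `min_gradient` is a
    hypothetical threshold that rejects weak gradients; it is not from the source.
    """
    n = len(luminance)
    # Central-difference gradient; the two end samples keep a gradient of 0.
    grad = [0.0] * n
    for i in range(1, n - 1):
        grad[i] = (luminance[i + 1] - luminance[i - 1]) / 2.0

    best: Optional[int] = None
    for i in range(1, n - 1):
        g = abs(grad[i])
        if g < min_gradient:
            continue
        # Keep only local extrema of the gradient magnitude.
        if g >= abs(grad[i - 1]) and g >= abs(grad[i + 1]):
            if best is None or abs(i - center) < abs(best - center):
                best = i
    return best
```

When a searching line crosses two edges, the function returns whichever lies nearer to the dividing point. This is also how an edge that is not part of the model can be picked up as an erroneous correspondence.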
In positioning using the edge, an edge included in the three-dimensional line segment model may not be detected from the captured image. Conversely, if many edges that are not registered in the three-dimensional line segment model are detected from the captured image, wrong points can be detected as the corresponding points, resulting in erroneous correspondence. The erroneous correspondence can cause the iterative calculation for optimizing the position and orientation to fall into a local solution or fail to converge. Thus, the accuracy of the position and orientation of the imaging apparatus may be degraded.
To address this problem, Literature 2 (formal citation provided below) discusses a method that uses M-estimation. More specifically, when minimizing a sum of weighted errors, Literature 2 puts less weight on data in which the distance between the corresponding point and the line segment is long and more weight on data in which the distance is short. Thus, the effect of the erroneous correspondence can be suppressed.
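The weighting idea can be sketched in Python as follows. The text above does not state which weight function Literature 2 uses, so the Tukey biweight shown here is an assumed, commonly used choice. The constant c (in pixels) and both function names are likewise hypothetical.

```python
from typing import Iterable


def tukey_weight(distance: float, c: float) -> float:
    """Tukey biweight: near 1 for short distances, 0 at or beyond c.

    `c` is an assumed tuning constant (in pixels), not a value from the source.
    """
    r = abs(distance)
    if r >= c:
        return 0.0
    t = 1.0 - (r / c) ** 2
    return t * t


def weighted_error_sum(distances: Iterable[float], c: float) -> float:
    """Sum of weighted squared distances between corresponding edges and segments."""
    return sum(tukey_weight(d, c) * d * d for d in distances)
```

With such weights, a correspondence whose distance reaches c contributes nothing to the sum. A gross mismatch therefore does not pull the estimated position and orientation toward itself.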
Literature 4 (formal citation provided below) improves the accuracy of correspondence between the three-dimensional line segment model and the edges in the captured image by retaining information about the appearance of the area around each line segment on the image. The information about the appearance of the edge corresponding to the three-dimensional line segment model in the captured image is updated as needed to suppress the effect of erroneous correspondence caused by a change of illumination or a change of viewing location.
Furthermore, according to Literature 3 (formal citation provided below), to help prevent erroneous positioning using the edge, information about point features extracted from the captured image of the observation object is used in addition to the information about the three-dimensional line segment model.
The position and orientation of the camera in the current frame are calculated by an iterative operation based on the correspondence between point features extracted from the captured images of successive frames, and the calculated position and orientation are integrated with the result of positioning using the edge to improve the stability of estimating the position and orientation.
Positioning using the edge requires a three-dimensional line segment model of the observation object, described as a set of three-dimensional line segments. According to Literatures 1, 2, 3, and 4, the three-dimensional line segment model of the observation object is generated manually from the actual observation object.
However, selecting the three-dimensional edges that will be observed on the actual observation object and generating them manually is comparatively complicated and highly time consuming. Furthermore, an edge based on a contour formed by a curved surface is difficult to describe off line in the three-dimensional line segment model.
In contrast, Japanese Patent Application Laid-Open No. 2007-207251 discusses a method for estimating the position and orientation with the edge by generating the three-dimensional line segment model from three-dimensional model data that represents the shape of the observation object, such as a computer-aided design (CAD) model. According to this method, to generate the three-dimensional line segment model, the three-dimensional model is drawn in advance and the edges based on the geometric information of the three-dimensional model data are detected from the drawing result.
Further, to generate a three-dimensional line segment model, Literature 5 (formal citation provided below) draws the three-dimensional model data of the observation object in advance and extracts the edges based on the geometric information from the depth values of the image drawn from a viewing location. According to the methods described above, the three-dimensional model of the observation object is drawn in advance and the edges are extracted from the drawn image to generate the three-dimensional line segment model of the observation object. Therefore, the labor of manually generating the three-dimensional line segment model of the observation object can be reduced.
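Extracting geometric edges from the depth values of a drawn image can be sketched in Python as below. The text above states only that the edges are extracted from depth values. The neighbor-difference test, the threshold parameter, and the function name are therefore assumptions made for illustration, not the procedure of Literature 5.

```python
from typing import List


def depth_edges(depth: List[List[float]], threshold: float) -> List[List[bool]]:
    """Mark pixels where the drawn depth jumps by more than `threshold`.

    `depth` is a row-major depth buffer obtained by drawing the model from one
    viewing location. A pixel is marked when the depth difference to its right
    or lower neighbor exceeds `threshold` (a hypothetical tuning parameter).
    """
    rows, cols = len(depth), len(depth[0])
    edges = [[False] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            d = depth[y][x]
            if x + 1 < cols and abs(depth[y][x + 1] - d) > threshold:
                edges[y][x] = True
            if y + 1 < rows and abs(depth[y + 1][x] - d) > threshold:
                edges[y][x] = True
    return edges
```

A test of this kind responds only to geometry. Edges caused by surface color or by illumination do not appear in the depth buffer at all.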
However, the methods described above cannot be applied when the appearance of the observation object drawn in advance is not always the same as its appearance on line, for example, when an object having a curved surface is viewed from various directions. To address this problem, Literature 6 (formal citation provided below) discusses a method for estimating the position and orientation by drawing, on line, a three-dimensional model in which texture is set and generating the three-dimensional line segment model from the drawn image as needed.
According to this method, the three-dimensional model data in which the texture is set is drawn as needed so as to have an appearance similar to that of the observation object in the real space. Thus, the method can be applied even when the observation object is viewed from various directions.
On the other hand, Literature 7 (formal citation provided below) discusses a method for extracting features and calculating their three-dimensional positions from only the captured image, without using three-dimensional model data, and estimating the position and orientation while generating the three-dimensional line segment model on line. According to this method, the position and orientation can be estimated by generating a three-dimensional line segment model from only the images captured in time series, without using a three-dimensional model of the observation object.
The conventional technique discussed in Japanese Patent Application Laid-Open No. 2007-207251 generates the three-dimensional line segment model based on the geometric information of the three-dimensional model data that represents the surface shape of the observation object. However, a method that generates the three-dimensional line segment model by drawing the three-dimensional model data off line in advance provides no guarantee that the features corresponding to the generated three-dimensional line segment model are always detected from the captured image.
This is because whether a feature corresponding to the three-dimensional line segment model can be detected from the captured image depends on the color of the surface of the observation object and on the illumination environment when the image is captured. When a feature corresponding to a generated line segment cannot be detected from the captured image, that line segment is highly likely to cause erroneous correspondence. Thus, the accuracy of estimating the position and orientation can be degraded.
Further, since the method discussed in Literature 5 draws the three-dimensional model in advance and generates the three-dimensional line segment model from the depth values of the drawn image, the features corresponding to the generated three-dimensional line segment model cannot always be detected from the actual captured image.
To address the problem of erroneous correspondence, the correspondence could be made more robust by using M-estimation as discussed in Literature 2. However, since that method assumes a three-dimensional line segment model formed of only edges that are consistently detected from the captured image, a line segment registered in the three-dimensional line segment model is merely eliminated as an outlier when it is not observed in the captured image.
Therefore, if many features must be eliminated as erroneous correspondence, for example, if the three-dimensional line segment model itself is inaccurate, the position and orientation cannot be stably estimated. Furthermore, the method for improving the robustness of the correspondence by using the appearance of the area around the line segment, as discussed in Literature 4, also merely eliminates a feature as an outlier when the feature corresponding to the three-dimensional line segment model is not detected from the captured image. Thus, even if the method of Literature 4 is used, the position and orientation cannot be stably estimated when the three-dimensional line segment model is inaccurate.
If the observation object has a feature that cannot be detected geometrically but can be detected as a region of discontinuous color, for example a design on the surface of the object, the conventional methods cannot use that color-based feature. On the contrary, the feature can cause erroneous correspondence. The method discussed in Literature 6 uses three-dimensional model data in which texture data is set, draws an appearance similar to that actually viewed, and generates a three-dimensional line segment model that includes the color-based features in order to estimate the position and orientation.
However, the method of Literature 6 presupposes that the three-dimensional model data in which the texture data is set can be drawn with an appearance similar to that of the observation object as actually viewed.
Thus, if the appearance of the three-dimensional model data differs from that of the actual observation object, for example when the setting of the three-dimensional model data is inaccurate or a light source changes in the actual environment, the generated three-dimensional line segment model is inaccurate. Therefore, the three-dimensional model data needs to be generated so as to have an appearance similar to that of the observation object. However, the method provides no means of coping with changes in the actual environment.
The method discussed in Literature 3 extracts the feature based on the color as needed from the captured image on which the observation object is captured, adds the feature to the three-dimensional line segment model, and estimates the position and orientation using the extracted feature.
The method presupposes that the features extracted from the captured image are used together with a three-dimensional line segment model formed of only edges that are reliably observed as features. Accordingly, when the three-dimensional line segment model is inaccurate, erroneous correspondence frequently occurs between the three-dimensional line segment model and the edges in the captured image, as with the methods described above. Thus, the position and orientation cannot be stably estimated.
Further, the method discussed in Literature 7, which estimates the position and orientation by generating the three-dimensional line segment model from only the captured image, does not estimate the position and orientation with sufficient accuracy. This is because the three-dimensional line segment model is less accurate than one generated using three-dimensional model data.
Formal citations of Literatures 1 to 7 are as follows:
(Literature 1) T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002.
(Literature 2) L. Vacchetti, V. Lepetit, and P. Fua, “Combining edge and texture information for real-time accurate 3D camera tracking,” Proc. The 3rd IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR04), pp. 48-57, 2004.
(Literature 3) E. Rosten and T. Drummond, “Fusing points and lines for high performance tracking,” Proc. The 10th IEEE International Conference on Computer Vision (ICCV'05), pp. 1508-1515, 2005.
(Literature 4) H. Wuest, F. Vial, and D. Stricker, “Adaptive line tracking with multiple hypotheses for augmented reality,” Proc. The Fourth Int'l Symp. on Mixed and Augmented Reality (ISMAR05), pp. 62-69, 2005.
(Literature 5) G. Bleser, H. Wuest, and D. Stricker, “Online camera pose estimation in partially known and dynamic scenes,” Proc. The 5th IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR06), pp. 56-65, 2006.
(Literature 6) G. Reitmayr and T. W. Drummond, “Going out: robust model-based tracking for outdoor augmented reality,” Proc. The 5th IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR06), pp. 109-118, 2006.
(Literature 7) E. Eade and T. Drummond, “Edge landmarks in monocular SLAM,” Proc. BMVC06, pp. 7-16, 2006.