1. Field of the Invention
The present invention relates to a measurement apparatus and a control method, in which a model of a measurement object is used to measure the position and orientation of an image-capturing apparatus as the image-capturing apparatus captures an image of the measurement object.
2. Description of the Related Art
Measurement of the position and orientation of an image-capturing apparatus, such as a camera capturing an image of real space (hereafter referred to as a camera as appropriate), is required, for instance, in mixed reality systems providing a combined display of real space and virtual space.
In the past, a method employing markers having known 3D positions has been used for measuring the position and orientation of an image-capturing apparatus. In this method, the error is defined as the distance, within the captured image, between the positions of the markers detected in the captured image and the projected positions obtained by projecting the 3D positions of the markers onto an imaging plane based on the general position and orientation of the image-capturing apparatus, and the position and orientation are estimated by optimizing a target function so as to minimize this error. Moreover, the markers employed are often provided with special easy-to-detect geometric or hue-related features.
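The marker-based error described above can be sketched as follows. This is a minimal illustration only, assuming an ideal pinhole camera and a simplified pose made up of a rotation about the vertical axis plus a translation; the function names, the pose parameterization, and the intrinsic values (`focal_px`, `cx`, `cy`) are hypothetical and do not come from any cited method.

```python
import math

def world_to_camera(point_world, pose):
    """Apply a hypothetical pose: yaw about the y axis, then translation."""
    yaw, tx, ty, tz = pose
    x, y, z = point_world
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x + s * z + tx, y + ty, -s * x + c * z + tz)

def project(point_cam, focal_px, cx, cy):
    """Ideal pinhole projection of a camera-space point onto the imaging plane."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must lie in front of the camera")
    return (focal_px * x / z + cx, focal_px * y / z + cy)

def marker_error(markers_world, detections_px, pose,
                 focal_px=500.0, cx=320.0, cy=240.0):
    """Sum of squared pixel distances between detected and projected markers."""
    total = 0.0
    for marker, (du, dv) in zip(markers_world, detections_px):
        u, v = project(world_to_camera(marker, pose), focal_px, cx, cy)
        total += (u - du) ** 2 + (v - dv) ** 2
    return total

# Assumed example data: two markers 2 m in front of the camera.
markers = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
detections = [(321.0, 240.0), (570.0, 242.0)]
error_at_guess = marker_error(markers, detections, (0.0, 0.0, 0.0, 0.0))
```

An estimator would adjust the pose so as to reduce the value returned by `marker_error`; the optimization itself is omitted here.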
Furthermore, there has been disclosed a method utilizing a model of the measurement object for position and orientation measurement, in which the structure of the measurement object is employed and the boundaries of planes constituting the measurement object are viewed as edges. It should be noted that the term “edge”, as used herein, refers to a region in which considerable changes in density are observed in a captured image.
A method in which the position and orientation are estimated with the help of line segments (called measurement line segments) as the geometric features of a measurement object has been described by Tom Drummond and Roberto Cipolla in “Real-time visual tracking of complex structures”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 932-946, 2002 (hereafter referred to as Document 1). Under this method, the three-dimensional positions of measurement line segments are projected onto an imaging plane, as viewed from the general position and orientation of an image-capturing apparatus, and the position and orientation are estimated by utilizing the distances between edges detected in the captured image and the projected measurement line segments as a target function.
An outline of this technique is provided below.
Using a general estimated position and orientation, measurement line segments from a measurement object are projected onto a captured image.
The positions of the edge regions, in which density undergoes local changes, are obtained by searching pixels in the captured image in the vicinity of the projected measurement line segments.
Optimization is carried out such that the distance between the projected measurement line segments and the position of the edge regions is reduced.
The value of the estimated position and orientation is updated.
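The four steps outlined above can be sketched with a deliberately reduced example. The sketch assumes a single unknown (the horizontal position of a vertical model line), a grayscale image stored as a list of rows, and an averaging update in place of a full nonlinear optimization; all names and parameters are hypothetical, and this is not the implementation of Document 1.

```python
def find_edge_along_row(image, row, center_col, search_radius):
    """Search near center_col for the largest horizontal intensity change."""
    best_col, best_grad = None, 0
    lo = max(1, center_col - search_radius)
    hi = min(len(image[row]) - 1, center_col + search_radius)
    for col in range(lo, hi + 1):
        grad = abs(image[row][col] - image[row][col - 1])
        if grad > best_grad:
            best_col, best_grad = col, grad
    return best_col  # None when no intensity change lies in the search range

def track_vertical_line(image, initial_col, sample_rows,
                        search_radius=5, iterations=3):
    """Refine the column of a projected vertical line from nearby edges."""
    estimate = float(initial_col)            # step 1: projected position
    for _ in range(iterations):
        residuals = []
        for row in sample_rows:
            edge_col = find_edge_along_row(  # step 2: local edge search
                image, row, round(estimate), search_radius)
            if edge_col is not None:
                residuals.append(edge_col - estimate)
        if not residuals:
            break                            # no edge found near the projection
        # steps 3 and 4: reduce the mean distance and update the estimate
        estimate += sum(residuals) / len(residuals)
    return estimate

# Assumed test image: 10 rows, 20 columns, with a step edge at column 12.
image = [[0] * 12 + [255] * 8 for _ in range(10)]
refined = track_vertical_line(image, 9, range(2, 8))
```

Because the search covers only a few pixels around the projection, a starting estimate that is too far from the true edge finds nothing and is left unchanged, which is the price paid for the shorter processing time.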
Such position and orientation estimation based on measurement line segments present in measurement objects has a wide range of applications because any object can serve as the measurement object of the measurement apparatus so long as its 3D model is known. In addition, during the above-mentioned estimation, the range of edge search within a captured image is limited to the image around the projected measurement line segments. Accordingly, this provides the advantage that processing time can be shortened in comparison with a method in which edges are first detected by image processing over the entire captured image and the distance to the model is then obtained. For this reason, it has been used for image-capturing apparatus alignment requiring real-time processing in mixed reality, such as head position estimation and the like.
In Document 1, the measurement object has a relatively simple shape, and the distance between the image-capturing apparatus and the measurement object does not change much either. For this reason, little change occurs in the observed edges even when slight changes take place in the orientation of the measurement object model in the field of view, which makes estimation of position and orientation possible.
In real environments, shadows and the like of measurement objects are often observed as edges, which often makes position and orientation estimation unreliable. In L. Vacchetti, V. Lepetit and P. Fua, “Combining edge and texture information for real-time accurate 3D camera tracking”, Proceedings of International Symposium on Mixed and Augmented Reality, pp. 48-57, 2004 (hereafter referred to as Document 2), one of multiple observed edges is associated with a measurement line segment and optimization calculations are carried out such that the distance between the associated edge and the measurement line segment projected on the imaging plane is minimized. In accordance with Document 2, in environments comprising edges other than the edges to be measured, robust position and orientation estimation can be performed by repeatedly applying the above-described association and converging so as to minimize the error.
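As a rough illustration of associating one of several observed edges with a projected measurement line segment, the sketch below simply picks the candidate nearest to the projection. The nearest-candidate rule and the function name are assumptions made for illustration; the actual association and robust estimation used in Document 2 are not reproduced here.

```python
def associate_nearest(projected_pos, candidate_positions):
    """Return the candidate edge position closest to the projected position."""
    if not candidate_positions:
        return None
    return min(candidate_positions, key=lambda c: abs(c - projected_pos))

# Assumed positions (in pixels) along one search line: a shadow edge,
# the edge to be measured, and an unrelated edge.
chosen = associate_nearest(10.0, [4.0, 11.5, 17.0])
```

In an iterative scheme, the association would be recomputed after each update of the estimated position and orientation, so that an initially wrong choice can be corrected as the estimate converges.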
Methods that have been proposed so far work well when the relative positional relationship between the measurement object and the image-capturing apparatus that views it does not change very much. However, problems arise when the image-capturing apparatus is combined with human motion, that is, when a person is holding the image-capturing apparatus itself and moving for the purpose of navigation, etc. These problems are due to the fact that considerable changes take place in the relative position of the measurement object and the image-capturing apparatus when a person walks around a building or outdoors.
Here, as an example, FIGS. 2A-2H illustrate a situation in which a person holding an image-capturing apparatus is walking along an indoor corridor. FIGS. 2A, 2C, 2E, and 2G represent edges of building structures, as viewed from the viewpoint position of a person walking inside. FIGS. 2B, 2D, 2F, and 2H are figures showing the indoor environment from above, wherein the black dot shows the position of the walking person and the triangle attached thereto shows the direction of gaze of the walking person. Overhead views obtained in the respective positions in FIGS. 2A, 2C, 2E, and 2G correspond to FIGS. 2B, 2D, 2F, and 2H.
If we look at the edge of the door in FIG. 2A and the edge of the door in FIG. 2C, we can see that although the observed object is the same door, the configuration of the image geometric features that can be observed has changed due to the difference in the relative position of the viewpoint and the object. Furthermore, FIG. 2E and FIG. 2G illustrate a situation in which the walking person quickly looks back in the direction from which the person came. As can be seen, at such a time, the object viewed by the walking person abruptly changes from the edges of a proximate door to the edges of a corridor stretching away. Thus, the viewpoint changes illustrated in FIGS. 2A, 2C, 2E, and 2G frequently occur in applications in which the measurement object is an indoor or outdoor structure and the image-capturing apparatus is carried around.
When the relative position of the observed object and the image-capturing apparatus changes, problems arise in projecting the geometric features of the measurement object onto a captured image. Namely, when viewing detailed geometric features located relatively far from the image-capturing apparatus, the spacing between projection geometric features proximate each other on the projection image narrows, and, in certain situations, multiple geometric features may be projected within less than 1 pixel. In order to handle such circumstances, it is conceivable to switch geometric features in accordance with the relative positional relationship of the geometric features and the image-capturing apparatus, but this requires that the relationship between the relative position of the geometric features and the image-capturing apparatus be set in advance in a systematic fashion.
However, in environments such as corridors, where a person may glance back, delay in switching between multiple models becomes a problem. Moreover, it is fundamentally difficult to detect segments smaller than one pixel in captured images. For this reason, when geometric features proximate each other are far away, it becomes difficult to detect, within a captured image, the image geometric feature corresponding to each of the projection geometric features, which affects the results of estimation. At the same time, efficient processing becomes more difficult to implement because the image geometric feature search regions obtained from the proximate projection geometric features overlap.
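The narrowing of projected spacing with distance can be made concrete with a small calculation. The sketch assumes an ideal pinhole camera and two parallel features lying at the same depth in a plane parallel to the imaging plane; the focal length of 800 pixels and the 1 cm separation are hypothetical values chosen only for illustration.

```python
def projected_spacing_px(separation_m, depth_m, focal_px):
    """Image-plane spacing, in pixels, of two features separated by separation_m."""
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    return focal_px * separation_m / depth_m

def max_resolvable_depth_m(separation_m, focal_px, min_spacing_px=1.0):
    """Depth beyond which the projected spacing falls below min_spacing_px."""
    return focal_px * separation_m / min_spacing_px

# Assumed values: features 1 cm apart, focal length of 800 pixels.
near_spacing = projected_spacing_px(0.01, 2.0, 800.0)    # viewed from 2 m
far_spacing = projected_spacing_px(0.01, 10.0, 800.0)    # viewed from 10 m
limit_depth = max_resolvable_depth_m(0.01, 800.0)
```

Under these assumptions, the two features are about four pixels apart when viewed from 2 m but less than one pixel apart when viewed from 10 m, at which point they can no longer be detected as separate image geometric features.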
Furthermore, nonlinear calculations have to be repeatedly carried out during the optimization calculations used to obtain the position and orientation. For this reason, when a lot of time is spent on processing such as image geometric feature detection and the like, iteration is stopped before sufficient accuracy is reached. This is undesirable in mixed reality technology, which requires both real-time processing at the video frame rate and accurate position and orientation.
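The trade-off between detection time and the number of optimization iterations can be illustrated with simple arithmetic. The frame time, detection times, and per-iteration cost below are hypothetical figures, and the function name is an assumption.

```python
def iterations_available(frame_ms, detection_ms, per_iteration_ms):
    """Whole optimization iterations that fit in the time left in one frame."""
    remaining = frame_ms - detection_ms
    if remaining <= 0:
        return 0
    return int(remaining // per_iteration_ms)

# Assumed budget: 33 ms per frame, 2 ms per optimization iteration.
fast_detection = iterations_available(33, 5, 2)    # detection takes 5 ms
slow_detection = iterations_available(33, 25, 2)   # detection takes 25 ms
```

With these assumed figures, raising the detection time from 5 ms to 25 ms cuts the available iterations from 14 to 4, so the optimization may have to stop before it has converged.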