1. Field of the Invention
The present invention relates to a position and orientation measurement technique for measuring the position and orientation of an image capturing apparatus based on the distance between a position where a model of a measurement object is projected onto a captured image, and the position of the measurement object observed from the captured image.
2. Description of the Related Art
Measurement of the position and orientation of an image capturing apparatus that captures an image of a physical space, such as a camera (to be also referred to as “camera” hereinafter as needed), is required in, for example, a mixed reality system which blends and displays the physical space and a virtual space.
In general, as a method of measuring the position and orientation of an image capturing apparatus, a method of attaching indices, whose three-dimensional (3D) positions are known, to a measurement object has been disclosed. In a position and orientation measurement method of this type, the position and orientation are estimated so as to minimize the differences between the projected positions, obtained by projecting the 3D positions of the indices onto an image capturing frame according to the estimated position and orientation, and the image positions of the indices detected from a captured image. As the indices serving as targets, those which have geometric features or those having specific colors or luminance values are used, since such indices can be detected easily by image processing.
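The quantity minimized in an index-based method of this type can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion. The function names, the intrinsic parameters (`focal`, `cx`, `cy`), and the representation of the pose as a rotation matrix `R` and translation `t` are illustrative assumptions and do not come from the text above.

```python
from typing import List, Sequence, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]


def transform(R: Sequence[Sequence[float]], t: Sequence[float], p: Sequence[float]) -> Vec3:
    """Map a 3D point from object coordinates to camera coordinates (R * p + t)."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def project(point_cam: Sequence[float], focal: float, cx: float, cy: float) -> Vec2:
    """Project a camera-space point with an ideal pinhole model (assumed, no distortion)."""
    x, y, z = point_cam
    return (focal * x / z + cx, focal * y / z + cy)


def reprojection_error(R, t, points_3d: List[Vec3], detections: List[Vec2],
                       focal: float, cx: float, cy: float) -> float:
    """Sum of squared distances between projected indices and detected indices."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_3d, detections):
        u, v = project(transform(R, t, p), focal, cx, cy)
        total += (u - u_obs) ** 2 + (v - v_obs) ** 2
    return total


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    indices = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
    # Synthetic detections generated from a known pose (hypothetical values).
    detected = [project(transform(identity, (0, 0, 5), p), 500, 320, 240) for p in indices]
    print(reprojection_error(identity, (0.1, 0, 5), indices, detected, 500, 320, 240))
```

A pose estimator would search for the `R` and `t` that make this error as small as possible; the sketch only evaluates the error for a given pose.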
However, when use over a broad range is assumed, the indices used to detect the position and orientation cannot always be attached to a measurement object. For example, when a measurement object is an architectural structure such as a building, it is difficult to attach giant indices onto its wall surface.
As a method of solving this problem, many methods of estimating the position and orientation of a physical camera using the ridge lines of planes that configure an object have been proposed. With these methods, edges of a measurement object are extracted from a captured image, and the position and orientation of an image capturing apparatus are calculated based on correlation between the extracted edges and an edge image obtained by rendering a CAD model by CG.
If a measurement object is designed by CAD, it is easy to create a reference image by CG. However, even in this case, since the possible positions and orientations need to be calculated exhaustively, preparation processes are required in advance.
As one of these methods, Japanese Patent Laid-Open No. 2004-326314 (to be referred to as reference 1 hereinafter) discloses a method using a CAD model. According to reference 1, edges are detected from 3D model projected images observed from a plurality of viewpoints, line segments of the edges are approximated using parameters and held, and correlations between an actually observed captured image and the line segments reconstructed based on the parameters are calculated. The position and orientation of the virtual camera of the 3D model projected image having the highest correlation are acquired as those of the image capturing apparatus.
With this method, reference images observed from a plurality of viewpoints are generated in advance, and image correlations with edges observed on a captured image are calculated. For this reason, the processing time increases enormously as the number of reference images increases. In order to precisely calculate the position and orientation, a sufficient number of reference images need to be prepared. Hence, the method of reference 1 can only be applied to a case in which a long processing time is available after image capturing.
On the other hand, in mixed reality or object recognition of a robot, the position and orientation estimation processing of an image capturing apparatus is required to be completed within the update interval of captured images. The update interval of captured images in this case is about 33 msec for an NTSC video signal. For this reason, it is difficult to apply a method that calculates the image correlations between the edges of many reference images and those of a captured image, as in reference 1.
As another problem, when a measurement object is configured by a plurality of structural objects, and some of these structural objects are free to move, reference images cannot be acquired in advance since the motion of each movable object is unknown. As another method of solving these problems, Tom Drummond and Roberto Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 7, pp. 932-946, 2002 (to be referred to as reference 2 hereinafter) discloses a method of estimating the position and orientation of an image capturing apparatus online based on the edges in a captured image captured by the image capturing apparatus. Unlike in reference 1, this method does not require pre-acquisition of reference images, and can be applied to the estimation of the position and orientation of an image capturing apparatus in real time. Since this method does not require any correlation calculations between reference images and a captured image, the processing time is short.
In the method disclosed in reference 2, the position and orientation of a camera are estimated by projecting a 3D model onto a captured image each time the position and orientation are updated, and searching for edges only in the image region around each search point. Furthermore, as will be described later, this method is designed to shorten the processing time associated with image processing. As for movement of the measurement object, since the position and orientation of each object can be individually estimated, the method is not limited in terms of motions.
An outline of the processing sequence of this method is as follows.
[1] Measurement line segments of a measurement object are projected onto an image capturing frame using a roughly estimated position and orientation.
[2] The positions of edge regions where the density changes locally are acquired by searching pixels around the projected measurement line segments.
[3] The position and orientation are optimized to minimize the distances between the positions of the edge regions and the projected measurement line segments.
[4] The values of the estimated position and orientation are updated by the optimized position and orientation.
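The control flow of steps [1] to [4] can be sketched as a simple loop. The three callables (`project_segments`, `search_edges`, `optimize_pose`) are hypothetical placeholders for the projection, edge search, and optimization stages, and the fixed iteration count is an assumption. This is not the implementation of reference 2, only an outline of how the steps feed into one another.

```python
from typing import Any, Callable


def track_pose(initial_pose: Any,
               project_segments: Callable[[Any], Any],
               search_edges: Callable[[Any], Any],
               optimize_pose: Callable[[Any, Any, Any], Any],
               iterations: int = 5) -> Any:
    """Repeat steps [1]-[4] a fixed number of times and return the final pose."""
    pose = initial_pose
    for _ in range(iterations):
        projected = project_segments(pose)                  # step [1]: project the model
        edges = search_edges(projected)                     # step [2]: find nearby edges
        optimized = optimize_pose(pose, projected, edges)   # step [3]: reduce distances
        pose = optimized                                    # step [4]: update the estimate
    return pose


if __name__ == "__main__":
    # Toy one-dimensional stand-in: the "pose" is a horizontal offset, the
    # observed edge sits at x = 4.0, and each optimization step moves the
    # estimate half of the remaining distance (all values are hypothetical).
    result = track_pose(
        0.0,
        project_segments=lambda pose: pose,
        search_edges=lambda projected: 4.0,
        optimize_pose=lambda pose, projected, edge: pose + 0.5 * (edge - projected),
    )
    print(result)
```

Passing the stages as callables keeps the loop independent of how projection, edge search, and optimization are actually carried out.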
Since the 3D model of the measurement line segments to be projected uses only line segments that allow easy detection of edges, the position and orientation of an image capturing apparatus can be calculated with higher precision and at higher speed. In practice, since searching all pixels around the projected measurement line segments takes much time, points obtained by sampling the projected measurement line segments at appropriate intervals are used as search points.
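Sampling a projected measurement line segment into search points can be sketched as follows. The function name `sample_search_points` and the choice of evenly spaced points that include both endpoints are assumptions made for illustration; the text above only states that appropriate intervals are used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def sample_search_points(p0: Point, p1: Point, spacing: float) -> List[Point]:
    """Return evenly spaced points on the segment p0-p1, endpoints included.

    The actual interval is at least `spacing` pixels, except that a segment
    shorter than `spacing` still yields its two endpoints.
    """
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    length = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    n = max(1, int(length // spacing))  # number of intervals along the segment
    return [
        (p0[0] + (p1[0] - p0[0]) * i / n, p0[1] + (p1[1] - p0[1]) * i / n)
        for i in range(n + 1)
    ]


if __name__ == "__main__":
    for point in sample_search_points((0, 0), (10, 0), 2.5):
        print(point)
```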
With this method, a finite one-dimensional search is conducted at each search point in the direction perpendicular to the projected measurement line segment. The edge to be searched for is assumed to be a region having a local density change in the captured image, and a region in which the density gradient is locally large is detected using an edge detection image filter such as a Laplacian filter.
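A one-dimensional edge search of this kind can be sketched as below. For brevity the sketch uses a central-difference gradient along the search line rather than the Laplacian filter mentioned above, samples the nearest pixel, and treats the image as a list of rows of gray levels. The function name, the `min_gradient` threshold, and the search range are illustrative assumptions.

```python
import math
from typing import List, Optional, Sequence, Tuple


def search_edge(image: Sequence[Sequence[float]], point: Tuple[float, float],
                direction: Tuple[float, float], half_range: int,
                min_gradient: float = 10.0) -> Optional[int]:
    """Search along the normal of a projected segment for the strongest edge.

    Returns the signed pixel offset along the normal, or None when no gradient
    exceeds `min_gradient` (an assumed threshold) within the search range.
    """
    dx, dy = direction
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("direction must be non-zero")
    nx, ny = -dy / length, dx / length  # unit normal to the segment
    height, width = len(image), len(image[0])

    def sample(k: int) -> Optional[float]:
        # Nearest-pixel sample at offset k along the normal; None outside the image.
        x = int(round(point[0] + k * nx))
        y = int(round(point[1] + k * ny))
        if 0 <= x < width and 0 <= y < height:
            return float(image[y][x])
        return None

    # Profile covers offsets -half_range-1 .. half_range+1 so that every
    # offset in the search range has both neighbours for the difference.
    profile: List[Optional[float]] = [
        sample(k) for k in range(-half_range - 1, half_range + 2)
    ]
    best_offset, best_gradient = None, min_gradient
    for i in range(1, len(profile) - 1):
        before, after = profile[i - 1], profile[i + 1]
        if before is None or after is None:
            continue
        gradient = abs(after - before) / 2.0  # central difference
        if gradient > best_gradient:
            best_gradient = gradient
            best_offset = i - 1 - half_range
    return best_offset


if __name__ == "__main__":
    # Synthetic image with a vertical step edge between columns 11 and 12.
    step = [[0 if x < 12 else 100 for x in range(20)] for _ in range(20)]
    print(search_edge(step, (10, 10), (0, 1), half_range=5))
```

Restricting the search to a short line across the segment is what keeps the per-frame cost low: only a few pixels per search point are examined instead of a two-dimensional neighbourhood.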
As the estimated position and orientation become closer to the actual position and orientation, the distances between the edge positions found in the captured image and the measurement line segments projected onto the image become smaller. However, an actual captured image includes many noise components. Factors that influence the estimation precision include erroneous detection and non-detection of edges. For this reason, the measurement line segments should be determined in consideration of their correspondence with edges observed on the captured image.
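The distance that shrinks as the estimate improves can be written as the perpendicular distance from a found edge position to the projected line. The sketch below uses a plain sum of squared distances as the cost; the function names and the unweighted cost are assumptions for illustration, and the handling of erroneously detected edges is not modelled.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def point_to_line_distance(point: Point, a: Point, b: Point) -> float:
    """Signed perpendicular distance from `point` to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment endpoints must differ")
    return ((point[0] - a[0]) * dy - (point[1] - a[1]) * dx) / length


def alignment_cost(segment: Tuple[Point, Point], edge_points: Iterable[Point]) -> float:
    """Sum of squared distances between found edge points and a projected segment."""
    a, b = segment
    return sum(point_to_line_distance(p, a, b) ** 2 for p in edge_points)


if __name__ == "__main__":
    projected = ((0.0, 0.0), (10.0, 0.0))
    found = [(5.0, 3.0), (2.0, -4.0)]  # hypothetical edge positions
    print(alignment_cost(projected, found))
```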
In this method, processing for excluding, from the edge search points, a line segment on a region hidden by another object, a line segment located on the side facing away from the line of sight, and the like is executed. With this processing, the estimation of position and orientation can be applied to a case in which a plurality of measurement objects overlap in the line of sight direction or to a non-flat, complicated shape.
On the other hand, in computer graphics, a hidden-line removal technique that distinguishes lines hidden by a plane from visible lines at the time of rendering is known. By applying this technique, lines hidden by a plane are removed, so that a measurement line segment which is not observed on the captured image can be excluded from the edge search targets. Transformation of a 3D model onto a captured image uses the estimated position and orientation of an image capturing apparatus. The positions of the configuration planes and line segments of the 3D model undergo coordinate transformation to those observed from the line of sight direction corresponding to the estimated position and orientation, and the distance in the depth direction at that time is calculated. The distance in the depth direction is held as a range image corresponding to the captured image. Each calculated distance in the depth direction is compared with the value of the range image, and when the calculated distance is smaller, the value of the range image is updated.
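The range image update described above can be sketched as a per-pixel minimum. The sketch assumes that the configuration planes have already been rasterized into `(u, v, depth)` samples; the rasterization itself is omitted, and the function names and the use of infinity as the initial value are illustrative assumptions.

```python
from typing import Iterable, List, Tuple


def new_range_image(width: int, height: int) -> List[List[float]]:
    """Create a range image in which every pixel starts infinitely far away."""
    return [[float("inf")] * width for _ in range(height)]


def update_range_image(range_image: List[List[float]],
                       fragments: Iterable[Tuple[int, int, float]]) -> None:
    """Keep, for every pixel, the smallest depth seen so far.

    `fragments` are (u, v, depth) samples of the configuration planes,
    assumed to be rasterized elsewhere. Samples outside the image are ignored.
    """
    height, width = len(range_image), len(range_image[0])
    for u, v, depth in fragments:
        if 0 <= u < width and 0 <= v < height and depth < range_image[v][u]:
            range_image[v][u] = depth


if __name__ == "__main__":
    image = new_range_image(4, 4)
    # Three planes cover pixel (1, 1) at different depths (hypothetical values).
    update_range_image(image, [(1, 1, 5.0), (1, 1, 3.0), (1, 1, 4.0)])
    print(image[1][1])
```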
In reference 2, a range image is generated from a plurality of planes which configure a measurement object. Next, the measurement line segments undergo coordinate transformation to have a rough position and orientation of the image capturing apparatus as a viewpoint. For each edge search point candidate of the measurement line segments, its value in the depth direction is compared with the value of the range image based on the configuration planes. If the value of the edge search point candidate is equal to or smaller than the value of the range image, edge search processing is then executed using that candidate as an edge search point.
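The comparison of edge search point candidates against the range image can be sketched as a simple filter. The `tolerance` parameter is an assumption added to absorb rounding differences between the two depth computations; the text above only specifies an "equal to or smaller" comparison. The function name and the `(u, v, depth)` candidate layout are likewise illustrative.

```python
from typing import Iterable, List, Sequence, Tuple

Candidate = Tuple[int, int, float]  # pixel column, pixel row, depth


def visible_search_points(range_image: Sequence[Sequence[float]],
                          candidates: Iterable[Candidate],
                          tolerance: float = 1e-3) -> List[Candidate]:
    """Keep candidates whose depth does not exceed the range image value.

    Candidates that fall outside the image are dropped.
    """
    height, width = len(range_image), len(range_image[0])
    visible = []
    for u, v, depth in candidates:
        if not (0 <= u < width and 0 <= v < height):
            continue
        if depth <= range_image[v][u] + tolerance:
            visible.append((u, v, depth))
    return visible


if __name__ == "__main__":
    # 3x3 range image with a near plane covering the centre pixel (hypothetical).
    ranges = [[10.0, 10.0, 10.0], [10.0, 2.0, 10.0], [10.0, 10.0, 10.0]]
    points = [(1, 1, 5.0), (1, 1, 2.0), (0, 0, 9.0)]
    print(visible_search_points(ranges, points))
```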
Furthermore, in this method, in order to improve processing efficiency, rendering of the configuration planes of the measurement object is sped up using a rendering function of graphics rendering hardware. Information of the configuration planes of the measurement object is important in determining whether or not line segments to be measured on an image are hidden by another measurement object. Unless it is checked whether or not a line segment is hidden by another plane, a line segment which is not observed on the captured image may be selected as a measurement target, and the position and orientation estimation precision cannot be improved. For example, when measurement line segments located on the back side of a measurement object are used, they are not actually observed on the captured image in some cases. In such cases, other edges are erroneously detected as the corresponding edges, and such line segments are therefore handled as points having large errors.
To apply the aforementioned method, data of the measurement line segments, and data of the configuration planes of the measurement object used to test whether or not each line segment is actually observed, need to be prepared. If the measurement object is designed using CAD or the like, that design can be used for the configuration planes and line segments.
In the case of a 3D model configured by combining simple planes, since the edges to be observed coincide with the boundaries of the configuration planes, the configuration plane data can be the same as the data of the measurement line segments used in registration. However, in this case, since the measurement object needs to be a simple 3D model such as a rectangular parallelepiped or a rectangle, the range of application to actual scenes is narrow. In general, when handling a measurement object having a complicated shape, it is difficult to make the information of the configuration planes and that of the measurement line segments the same.
Even for a measurement object having a complicated shape, a complicated curved surface or object shape can be described using simple polygons such as triangles or rectangles. However, such polygons are problematic when used as measurement line segments.
The conditions required for line segments to be observed as edges on a captured image include discontinuity of the boundaries of configuration planes from adjacent configuration planes, and discontinuity of the physical properties of configuration planes with respect to light from adjacent configuration planes. A complicated plane, however, is normally configured by a plurality of triangular or rectangular polygons having an identical normal. Not all line segments of the polygons which configure such a complicated plane are observed as edges on an actual captured image. In the case of planes which configure a curved surface, even when the planes are continuous, they have different overall reflection angles with respect to the illumination position depending on the observation position, and they may consequently be observed as edges when observed from a distance.
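The geometric part of this condition can be illustrated with a small sketch that examines a triangle mesh and reports which polygon edges separate faces with different normals. This is only an illustration of why polygon edges inside a flat region do not appear as image edges. It is not a method proposed in the text, and it ignores the discontinuity of optical properties and the curved-surface case described above. The function names, the angle threshold, and the assumptions of consistently wound, non-degenerate triangles are all hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Edge = Tuple[int, int]


def face_normal(vertices: Sequence[Sequence[float]], face: Sequence[int]) -> List[float]:
    """Unit normal of a triangle (assumed non-degenerate)."""
    a, b, c = (vertices[i] for i in face)
    u = [b[k] - a[k] for k in range(3)]
    v = [c[k] - a[k] for k in range(3)]
    n = [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]
    norm = math.sqrt(sum(x * x for x in n))
    return [x / norm for x in n]


def geometric_edges(vertices: Sequence[Sequence[float]],
                    faces: Sequence[Sequence[int]],
                    min_angle_deg: float = 1.0) -> List[Edge]:
    """Return mesh edges on the boundary or between faces whose normals differ.

    Edges shared by two faces with (nearly) identical normals are dropped.
    Edges shared by more than two faces are skipped in this sketch.
    """
    edge_faces: Dict[Edge, List[int]] = defaultdict(list)
    for index, face in enumerate(faces):
        for i in range(3):
            edge = tuple(sorted((face[i], face[(i + 1) % 3])))
            edge_faces[edge].append(index)
    normals = [face_normal(vertices, face) for face in faces]
    kept = []
    for edge, adjacent in sorted(edge_faces.items()):
        if len(adjacent) == 1:
            kept.append(edge)  # boundary edge of the mesh
        elif len(adjacent) == 2:
            n0, n1 = normals[adjacent[0]], normals[adjacent[1]]
            cosine = max(-1.0, min(1.0, sum(p * q for p, q in zip(n0, n1))))
            if math.degrees(math.acos(cosine)) > min_angle_deg:
                kept.append(edge)
    return kept


if __name__ == "__main__":
    # A flat square split into two triangles: the diagonal (0, 2) is dropped.
    square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    print(geometric_edges(square, [(0, 1, 2), (0, 2, 3)]))
```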
In this manner, when the line segments of the configuration planes are used as measurement line segments, even for line segments located on the front side in the line of sight direction, if one plane is configured by a plurality of polygons, a line segment sandwiched between polygons of the 3D model does not normally correspond to an edge observed on the image capturing frame. Therefore, the configuration planes of a 3D model cannot be used as-is as information of measurement line segments for registration.
Therefore, the measurement line segments and the line segments used to form the configuration planes of a 3D model may be held as independent data. However, such a configuration suffers from the following problem. It is difficult to maintain consistency between the information of the configuration planes of the measurement object and the actual measurement line segments, and a flexible change of the measurement object cannot be coped with. In other words, no means has been proposed for using a list of a huge number of vertices and line segments that express shapes, together with data sets of measurement line segments as a subset of such a list, while maintaining consistency. For example, in the studies of reference 2 or the like, a measurement object for verification is given, and information of the configuration planes and information of the measurement line segments, which are adjusted in advance, are independently held, thus implementing the position and orientation registration of the given measurement object.
A technique for estimating the position and orientation by reducing the differences between line segments obtained by projecting a 3D model of a measurement object onto a captured image, and the corresponding edge positions on the captured image, is known. In this technique, in order to handle a complicated measurement object, line segments of a model to be measured are associated with actually observed edges, thus improving the precision. However, since there is no procedure for applying the aforementioned position and orientation estimation method while maintaining consistency between the data sets of the configuration planes of the objects used there and the data sets used in measurement, such a position and orientation estimation apparatus is inconvenient to use.