Field of the Invention
The present invention relates to a technique capable of obtaining position and orientation information of an imaging apparatus based on an image captured by the imaging apparatus.
Description of the Related Art
There is a conventional method for measuring the position and orientation of a physical object moving in a space. The method includes capturing an image of the physical object with a camera and detecting, on the image, the position of a projection image (referred to as an “image feature”) of a geometric shape or a color feature (referred to as a “geometric feature”). The method further includes capturing the image of the geometric feature in the space and estimating a position and an orientation of the camera in the space. The above-described position/orientation measurement method requires preparing three-dimensional location information of each geometric feature involved in an image (hereinafter, referred to as “feature location information” or “location information”).
There is a simple method using an appropriate measurement device or a ruler to measure location information of each geometric feature in a physical space. However, such a method is not practical, since handling such a measurement device or a ruler is troublesome for a user and the expected measurement accuracy may not be obtained. To solve this problem, as discussed in Bolan Jiang, Ulrich Neumann: “Extendible Tracking by Line Auto-Calibration,” IEEE and ACM International Symposium on Augmented Reality (ISAR'01), pp. 97-106, 2001 (hereinafter, referred to as “literature 1”), location information of a geometric feature in a space can be automatically calculated based on coordinates of an image feature detected in images captured by a plurality of cameras whose positions and orientations are known beforehand.
When the method discussed in the literature 1 is used to automatically generate location information, location information including a large error may be registered. Even if the location information is accurately generated, it may not be suitable for estimating camera position/orientation. For example, if a moving object is present in a space and geometric information on the object is registered, a geometric feature may deviate from a reference coordinate system and may not serve as a reference in obtaining the camera position/orientation.
Therefore, the moving object influences an estimation value of the camera position/orientation and causes an error. Further, if a registered geometric feature has a repetitive pattern that is difficult to discriminate (e.g., a blind or a floor design), the geometric feature tends to be erroneously identified, and accuracy in estimating the camera position/orientation deteriorates. In addition, if the number of unnecessary geometric features increases, processing speed decreases.
Match moving technique is usable to generate a moving path of a camera based on image features in a moving image captured by a moving camera (referred to as “background art 1”). The background art 1 can automatically obtain feature location information by tracking a plurality of characteristic points on a moving image. However, according to the background art 1, if an object moving in a space is automatically extracted and tracked, an appropriate camera moving path may not be generated. To solve this problem, a user interface is available to manually select image feature(s) that disturb generation of the camera moving path. For example, the interface enables a user to designate a two-dimensional closed region encompassing an image feature which is unnecessary to track on each captured still image, so that no tracking is performed in the designated region.
Further, in the background art 1, when a moving image includes consecutively captured images of an object (or region) to be removed, it is possible to designate a specific region on a plurality of still images (key frames) of the moving image and remove the related regions from a tracking target in the moving image by performing interpolation between the images. However, once a selected region has left the frame image, work efficiency deteriorates because it is necessary to select the region again.
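The key-frame interpolation described above can be illustrated with a minimal sketch. It assumes that each key frame stores the removal region as a polygon with the same number of vertices in the same order, that interpolation is linear, and that the region is held constant outside the key-frame range. The function name `interpolate_region` and the data layout are hypothetical and are not taken from the background art 1.

```python
def interpolate_region(key_frames, frame):
    """Linearly interpolate a removal polygon for an arbitrary frame index.

    key_frames: dict mapping a frame index to a list of (x, y) vertices.
    Assumption: every polygon has the same vertex count and ordering.
    """
    indices = sorted(key_frames)
    # Outside the key-frame range, hold the nearest designated region.
    if frame <= indices[0]:
        return list(key_frames[indices[0]])
    if frame >= indices[-1]:
        return list(key_frames[indices[-1]])
    # Find the two key frames that bracket the requested frame.
    for lo, hi in zip(indices, indices[1:]):
        if lo <= frame <= hi:
            t = (frame - lo) / (hi - lo)
            return [
                (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
                for (x0, y0), (x1, y1) in zip(key_frames[lo], key_frames[hi])
            ]
```

With this layout, a user designates the region only on the key frames, and the regions of the intermediate frames follow from the interpolation.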
FIG. 2 is a block diagram illustrating a functional configuration of an apparatus capable of estimating position/orientation of a camera in each frame of a moving image using the match moving technique discussed in the background art 1. As illustrated in FIG. 2, a desk 100A (physical object) is located in a physical space. In the physical space, a video camera (hereinafter, simply referred to as a “camera”) 110 can freely move to capture a moving image. The moving image captured by the camera 110 is stored in a moving image storage unit 210 via an image input unit 120.
The following processing is basically applied to a moving image (an assembly of still images) recorded in the moving image storage unit 210, if not specifically mentioned. A moving image input unit 220 reads a moving image as a processing target from the moving image storage unit 210. The moving image input unit 220 transmits the readout moving image to an image feature detection unit 130B.
The image feature detection unit 130B detects an image feature (a projection image of a geometric feature (corner points of the desk 100A, setup points of legs of the desk 100A, etc.) on a captured image) from each frame constituting the moving image. An image feature storage unit 230 stores, for each frame, image coordinates of the image feature detected by the image feature detection unit 130B. The image feature detection unit 130B can detect image features, which are associated with each other between consecutive frames, using, for example, a conventional image processing technique referred to as a “Harris operator” that is capable of detecting corners of an object on an image.
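As an illustration of the corner detection mentioned above, the following is a minimal sketch of a Harris corner response computed on a small grayscale image. The 3x3 unweighted window, the central-difference gradients, and the constant `k = 0.04` are assumptions chosen for brevity; the background art 1 does not specify these details, and the association of features between consecutive frames is not shown.

```python
def harris_response(image, k=0.04):
    """Return a Harris corner response map for a 2-D list of gray levels."""
    h, w = len(image), len(image[0])
    # Central-difference gradients; the one-pixel border is left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0
    response = [[0.0] * w for _ in range(h)]
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            # Accumulate the structure tensor over a 3x3 window.
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            # Response = det(M) - k * trace(M)^2.
            response[y][x] = sxx * syy - sxy * sxy - k * (sxx + syy) ** 2
    return response
```

A positive response indicates a corner, a negative response indicates a straight edge, and a response near zero indicates a flat region. A detector would typically keep local maxima above a threshold as image features.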
A display unit 250 displays a composite image in which a frame image read from the moving image storage unit 210 is overlaid, at the image coordinates stored in the image feature storage unit 230, with a display representing the image feature detected at those image coordinates. By displaying such a composite image, the display unit 250 enables a user to determine whether a normal camera moving path can be generated.
If a moving image includes a moving object (an object moving in a physical space), a user is required to operate an image feature selection input unit 225 to remove an image feature on the moving object from a frame image. For example, a removal method includes searching for a frame image including an image feature to be removed and designating a region of the image feature on the found frame image as a removal target. A removal region designation method includes, for example, setting a polygonal region surrounding a removal target region in all frame images including the removal target region, removing an image feature in the polygonal region of each frame image, and deleting image coordinates representing the removed image feature from the image feature storage unit 230.
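The removal of image features inside a designated polygonal region can be sketched as follows. The sketch assumes that image features are stored as (x, y) image coordinates and uses an even-odd ray casting test. The function names are hypothetical, and the background art 1 does not state which inside/outside test is used.

```python
def point_in_polygon(point, polygon):
    """Even-odd ray casting test; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y0 > y) != (y1 > y):
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def remove_features_in_region(features, polygon):
    """Keep only the image features lying outside the removal polygon."""
    return [p for p in features if not point_in_polygon(p, polygon)]
```

Applying `remove_features_in_region` to the stored coordinates of each frame corresponds to deleting the removed image features from the image feature storage unit 230.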
An imaging unit position/orientation estimation unit 240 tracks image coordinates remaining in the image feature storage unit 230. The imaging unit position/orientation estimation unit 240 calculates a three-dimensional position (location information) of each geometric feature in the physical space which corresponds to each image feature having been tracked and a moving path of the camera 110 using a conventional method (referred to as a “bundle adjustment method”).
The bundle adjustment method includes optimizing the estimated three-dimensional position of each geometric feature and the position/orientation of the camera 110 so as to minimize the difference between the position at which the estimated three-dimensional position of the geometric feature is projected onto an imaging plane and the image coordinates of the actually detected image feature.
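The quantity minimized by the bundle adjustment method can be sketched as a sum of squared reprojection errors. The sketch assumes an ideal pinhole camera with a single focal length, a principal point at the image origin, and a camera pose given as a rotation matrix and translation vector mapping reference coordinates to camera coordinates. These conventions and the function names are assumptions for illustration only. The optimizer that adjusts the points and poses is not shown.

```python
def project(point, rotation, translation, focal_length):
    """Project a 3-D point with a pinhole camera: x_cam = R * X + t."""
    xc, yc, zc = (
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )
    return (focal_length * xc / zc, focal_length * yc / zc)


def reprojection_error(points, cameras, observations, focal_length):
    """Sum of squared distances between projected and detected features.

    points: list of estimated 3-D positions of geometric features.
    cameras: list of (rotation, translation) pairs, one per frame.
    observations: list of (camera_index, point_index, (u, v)) tuples.
    """
    total = 0.0
    for cam_idx, pt_idx, (u, v) in observations:
        rotation, translation = cameras[cam_idx]
        pu, pv = project(points[pt_idx], rotation, translation, focal_length)
        total += (pu - u) ** 2 + (pv - v) ** 2
    return total
```

A bundle adjustment implementation would pass this cost, or its per-observation residuals, to a non-linear least-squares solver that updates both `points` and `cameras`.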
The display unit 250 displays three-dimensional position/orientation of the obtained moving path of the camera 110 to let a user confirm a result.
The background art 1 can automatically generate location information even if the image feature is unknown. In the background art 1, when a moving image includes consecutively captured images of an object (or region) to be removed, it is possible to designate a specific region on a plurality of still images (key frames) of the moving image and remove the related regions from a tracking target in the moving image by interpolating between the images.
However, a method for removing an unnecessary image feature (which may cause an estimation error) from a frame image is limited to selecting a two-dimensional image feature in a two-dimensional image. Therefore, as described above, after the selected region is once out of the frame image, it is necessary to re-designate the region of the image feature to be removed if the selected region is necessary for tracking. Thus, work efficiency deteriorates.
FIG. 3 is a block diagram illustrating a functional configuration of an apparatus capable of automatically generating location information of an unknown geometric feature, which is discussed in the literature 1, and estimating position/orientation of the camera 110 (referred to as “background art 2”). In FIG. 3, components similar to those discussed in FIG. 2 are denoted by the same reference numerals.
In FIG. 3, the desk 100A is located in a physical space and a marker 100D is located on the desk 100A. According to the example illustrated in FIG. 3, as discussed in the literature 1, the marker 100D is a square marker whose location information is known beforehand. The marker 100D is, for example, a marker discussed in Kato et al: “An Augmented Reality System and its Calibration based on Marker Tracking”, Journal of the Virtual Reality Society of Japan, Vol. 4, No. 4, pp. 607-616 (1999) (hereinafter, referred to as “literature 2”).
The camera 110 captures a moving image in the physical space, and successively transmits frame images (images of captured frames) to an image feature detection unit 130C and a display unit 190 via the image input unit 120. The following processing is performed on the frame image of each frame, if not specifically mentioned.
The image feature detection unit 130C performs binarization processing on the frame image received from the image input unit 120 and generates a binary image. The image feature detection unit 130C detects the marker 100D from the generated binary image. Further, the image feature detection unit 130C detects ridges of the desk 100A and line segments based on color changes. For example, a method discussed in the literature 2 is usable to detect the marker 100D. More specifically, the method includes detecting a rectangular region from the binary image and recognizing the marker 100D based on a pattern in the detected rectangular region.
When the object is a line feature (a line segment connecting two points on the frame image), the image feature detection unit 130C extracts the line segment from the frame image and records image coordinates of two endpoints of the extracted line segment in the location information generation unit 140. If at least one of the endpoints is present on an edge of the frame image, namely when geometric information of the line feature is not completely within the screen, the line feature is discarded. The location information generation unit 140 stores the image coordinates representing the marker 100D and the extracted line feature(s).
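The rule for discarding line features that are not completely within the screen can be sketched as follows. The sketch assumes pixel coordinates ranging from 0 to width - 1 and 0 to height - 1. The optional `margin` parameter and the function names are hypothetical additions.

```python
def touches_frame_edge(point, width, height, margin=0):
    """Return True if an endpoint lies on (or within margin of) the frame edge."""
    x, y = point
    return (
        x <= margin
        or y <= margin
        or x >= width - 1 - margin
        or y >= height - 1 - margin
    )


def keep_complete_segments(segments, width, height):
    """Discard every line segment having at least one endpoint on the edge."""
    return [
        seg for seg in segments
        if not any(touches_frame_edge(p, width, height) for p in seg)
    ]
```

Only the segments returned by `keep_complete_segments` would have the image coordinates of their two endpoints recorded.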
The location information generation unit 140 outputs the image coordinates of the marker 100D to an imaging unit position/orientation estimation unit 185 to obtain initial position/orientation of the camera 110. First, the imaging unit position/orientation estimation unit 185 obtains the position/orientation of the camera 110 based on the image coordinates of the marker 100D received from the location information generation unit 140 and “location information of marker 100D” stored beforehand in a location information storage unit 310.
A conventional non-linear optimization calculation can be used to obtain a relative position/orientation relationship between the camera 110 and the marker 100D, as discussed in the literature 2. More specifically, the imaging unit position/orientation estimation unit 185 can obtain the position/orientation of the camera 110 by repetitively performing non-linear optimization to minimize errors between the image coordinates of four vertices of the detected marker 100D and image coordinates obtained when location information of the four vertices of the marker 100D is projected on an imaging plane.
Such processing can be referred to as “non-linear optimization of projection errors” which requires, as initial values, the position/orientation of the camera 110. For example, the position/orientation of the camera 110 estimated in the processing for a preceding frame is usable as the initial values. Further, a reference coordinate system in the physical space defines the location information of the marker 100D. Therefore, by performing coordinate transformation, the imaging unit position/orientation estimation unit 185 can obtain the position/orientation of the camera 110 in the reference coordinate system.
Next, the location information generation unit 140 acquires, from the imaging unit position/orientation estimation unit 185, the position/orientation of the camera 110 obtained by the imaging unit position/orientation estimation unit 185. Then, the location information generation unit 140 obtains a plane including the position of the camera 110 and two points of the line feature on the imaging plane in the reference coordinate system (referred to as a “line candidate plane”), and stores the obtained plane data.
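The construction of a line candidate plane can be sketched as follows. The sketch assumes a pinhole camera with focal length and principal point given in pixels, a camera orientation given as a camera-to-reference rotation matrix, and a plane represented as a pair (normal, offset) satisfying normal . X = offset. These representations and the function names are assumptions and are not taken from the literature 1.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def back_project(pixel, focal_length, principal_point, cam_to_ref):
    """Viewing-ray direction of a pixel, expressed in reference coordinates."""
    u, v = pixel
    cx, cy = principal_point
    ray_cam = ((u - cx) / focal_length, (v - cy) / focal_length, 1.0)
    return tuple(
        sum(cam_to_ref[r][c] * ray_cam[c] for c in range(3)) for r in range(3)
    )


def line_candidate_plane(camera_position, cam_to_ref, endpoints,
                         focal_length, principal_point):
    """Plane through the camera position and both endpoint viewing rays."""
    d0 = back_project(endpoints[0], focal_length, principal_point, cam_to_ref)
    d1 = back_project(endpoints[1], focal_length, principal_point, cam_to_ref)
    normal = cross(d0, d1)
    return normal, dot(normal, camera_position)
```

The plane contains the camera position and every point on the two viewing rays, so the three-dimensional line that produced the detected segment must lie in it.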
Further, the location information generation unit 140 refers to one or more line candidate planes stored for past frame images. The planes referred to are those generated from viewpoints at which a line segment similar to the line segment in the frame image of the present frame was captured and whose difference from the position/orientation of the camera 110 in the present frame exceeds a threshold. Then, the location information generation unit 140 generates location information of the line feature based on the line of intersection of the plurality of line candidate planes. The literature 1 describes the line feature generation processing in detail.
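The intersection of two line candidate planes can be sketched as follows. The sketch assumes that each plane is represented as a pair (normal, offset) satisfying normal . X = offset, and it returns one point on the intersection line together with the line direction. The representation, the tolerance `eps`, and the function names are assumptions. How the literature 1 combines more than two planes is not shown.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def intersect_planes(plane_a, plane_b, eps=1e-9):
    """Intersect two planes given as (normal, offset), normal . X = offset.

    Returns (point_on_line, direction), or None if the planes are nearly
    parallel and no stable intersection line exists.
    """
    (n1, d1), (n2, d2) = plane_a, plane_b
    direction = cross(n1, n2)
    norm_sq = dot(direction, direction)
    if norm_sq < eps:
        return None
    # Point on the line closest to the origin of the reference coordinates.
    a = cross(n2, direction)
    b = cross(direction, n1)
    point = tuple((d1 * a[i] + d2 * b[i]) / norm_sq for i in range(3))
    return point, direction
```

Planes observed from nearly the same viewpoint are almost parallel, which makes the intersection unstable. This is consistent with using only past viewpoints that differ from the present one by more than a threshold.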
As described above, the location information generation unit 140 can generate the location information of the line feature from the frame image including the line feature. Through such processing, even if the location information of the line feature is unknown, the location information generation unit 140 can generate the location information of the unknown line feature referring to the past information.
The location information storage unit 310 stores the location information of any unknown line feature generated by the location information generation unit 140. The imaging unit position/orientation estimation unit 185 can use the location information stored in the location information storage unit 310 to obtain the position/orientation of the camera 110 based on the frame image of the next frame.
The imaging unit position/orientation estimation unit 185 can estimate the position/orientation of the camera 110 based only on a line feature, using a method similar to the non-linear optimization of projection errors applied to the marker 100D. More specifically, the imaging unit position/orientation estimation unit 185 updates the position/orientation of the camera 110 so as to minimize a distance between the image coordinates of projected location information of the line feature and the line feature on the frame image. If the marker 100D is detected on the frame image, the imaging unit position/orientation estimation unit 185 need not use the line feature, or can estimate the position/orientation of the camera 110 by combining the marker 100D and the line feature.
The display unit 190 displays a composite image including an edge model resembling the line feature at a position corresponding to the position/orientation of the camera 110 obtained by the imaging unit position/orientation estimation unit 185 on the frame image entered via the image input unit 120. A user can identify which line feature is used while confirming contents displayed on the display unit 190.
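One possible form of the distance between a projected line feature and a detected line feature is sketched below. The sketch assumes that the error is the sum of squared perpendicular distances from the two detected endpoints to the infinite line through the two projected endpoints. This error definition and the function names are assumptions; the literature 1 may define the distance differently.

```python
import math


def line_through(p, q):
    """Normalized implicit line a*x + b*y + c = 0 through two image points."""
    a, b = q[1] - p[1], p[0] - q[0]
    norm = math.hypot(a, b)
    c = -(a * p[0] + b * p[1])
    return a / norm, b / norm, c / norm


def line_feature_error(projected_endpoints, detected_endpoints):
    """Sum of squared distances from detected endpoints to the projected line."""
    a, b, c = line_through(*projected_endpoints)
    return sum((a * x + b * y + c) ** 2 for x, y in detected_endpoints)
```

A pose update would re-project the stored location information of the line feature with a candidate position/orientation and select the pose that reduces this error.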
As described above, the method discussed in the literature 1 can use the line feature which is not registered beforehand as information usable for estimation of the camera position/orientation. In other words, inputting location information of a line feature is unnecessary. Accordingly, work efficiency can be improved. However, an automatically registered line feature may include erroneous location information due to an accumulated error when it is positioned far from a marker or known information, or when it is erroneously detected in an image capturing operation. According to the method discussed in the literature 1, the erroneous location information cannot be removed. Estimation accuracy deteriorates if the erroneous location information is present. The position/orientation of the camera 110 cannot be accurately estimated.