1. Field of the Invention
The present invention relates to an image processing apparatus, a processing method therefor, and a non-transitory computer-readable storage medium.
2. Description of the Related Art
In recent years, for practical applications to mixed reality and automatic traveling of robots, techniques that measure the position and orientation of a camera relative to a physical object based on a three-dimensional geometric model of the physical object are being studied. In particular, techniques that represent a three-dimensional geometric model as a set of line segments, and measure the position and orientation of a camera so as to fit projected images of the line segments to edges in an image captured by the camera are under extensive study (for example, T. Drummond and R. Cipolla, “Real-time visual tracking of complex structures,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 932-946, 2002 (to be referred to as reference 1 hereinafter)).
To measure the position and orientation of a camera based on a three-dimensional geometric model, a three-dimensional geometric model of a target physical object must be prepared in advance. With a known method, a three-dimensional geometric model of a physical object is generated based on, for example, the corresponding relationship between image features in a plurality of images obtained by capturing the physical object. Z. Zhang, “Estimating Motion and structure from correspondences of line segments between two perspective images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 12, pp. 1129-1139, 1995 (to be referred to as reference 2 hereinafter) and C. J. Taylor and D. J. Kriegman, “Structure and motion from line segments in multiple images,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 11, pp. 1021-1032, 1995 (to be referred to as reference 3 hereinafter) disclose techniques of detecting line segments such as the contour of a physical object from two, or three or more, images obtained by capturing the physical object at different angles, and generating a three-dimensional geometric model which represents the physical object based on the corresponding relationships of the line segments between the images. In this case, the corresponding relationships of the line segments between the captured images must be obtained using an appropriate method. When the number of line segments is relatively small, for example, the user may manually associate the line segments in the plurality of images displayed on the screen with each other. However, as the number of line segments increases, it becomes more difficult to manually associate these line segments.
To perform this association automatically and efficiently, line segment-specific information, which is independent of the observation position, must be assigned to each line segment detected in the image as a feature of this line segment.
To meet this requirement, Japanese Patent Laid-Open No. 2004-334819 discloses a technique of generating, as features of line segments detected in a plurality of images, data streams of pieces of information on pixels in the vicinities of these line segments, and comparing the data streams with each other, thereby associating the line segments detected from the respective images with each other. However, because the pixel information is affected by, for example, image noise and changes in illumination, it is difficult for this technique to perform the association accurately.
C. Schmid and A. Zisserman, “Automatic line matching across views,” Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 1997, pp. 666-671, 1997 (to be referred to as reference 4 hereinafter) discloses a technique of associating a point on a line segment in one image with a point on a line segment in another image using the epipolar geometry between the images, and comparing these points, assuming that this epipolar geometry is known. In this technique, the epipolar geometry between the images must be estimated in advance from information (for example, feature points with high distinguishability) other than line segments.
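Purely for illustration, the point association that known epipolar geometry makes possible can be sketched as follows. The sketch assumes that the epipolar geometry is given as a 3x3 fundamental matrix F, and that a point on a line segment in the first image is transferred to a candidate line segment in the second image by intersecting its epipolar line with the line through that segment. The names transfer_point and F_RECTIFIED are hypothetical, and the sketch is not presented as the exact procedure of reference 4.

```python
def cross(a, b):
    """Cross product of two 3-vectors (homogeneous points or lines)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transfer_point(F, point, seg_start, seg_end):
    """Transfer `point` (image 1) onto the line through a segment in image 2.

    F is an assumed, already-known fundamental matrix mapping a point in
    image 1 to its epipolar line in image 2. Returns None when the epipolar
    line is parallel to the segment, so that no intersection exists.
    """
    epiline = mat_vec(F, (point[0], point[1], 1.0))
    seg_line = cross((seg_start[0], seg_start[1], 1.0),
                     (seg_end[0], seg_end[1], 1.0))
    x, y, w = cross(epiline, seg_line)
    if abs(w) < 1e-12:
        return None
    return (x / w, y / w)


# Assumed example: fundamental matrix of an ideal rectified stereo pair,
# for which every epipolar line is the image row of the original point.
F_RECTIFIED = ((0.0, 0.0, 0.0),
               (0.0, 0.0, -1.0),
               (0.0, 1.0, 0.0))
```

With F_RECTIFIED, a point at row 5 in the first image is transferred to the point at row 5 on the candidate line in the second image. The sketch does not work without F, which is why the technique depends on estimating the epipolar geometry beforehand.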
H. Bay, V. Ferrari, and L. Van Gool, “Wide-baseline stereo matching with line segments,” Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition 2005, vol. 1, pp. 329-336, 2005 (to be referred to as reference 5 hereinafter) discloses a technique of associating the detected line segments based on the color distributions on the two sides of each of these line segments. In this technique, color histograms of pixel groups spaced apart from the detected line segment by several pixels are generated on the two sides, respectively, of this line segment and used to compare this line segment with other line segments. In this case, preliminary information such as the epipolar geometry between the images is unnecessary.
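As a rough illustration of such a two-sided color feature, the following minimal sketch samples pixels along a line segment, shifted by a few pixels along the normal on each side, and builds one normalized color histogram per side. The image layout (rows of (r, g, b) tuples), the function names, the parameters offset, bins, and samples, and the use of histogram intersection for comparison are all assumptions made for this example and are not taken from reference 5.

```python
import math


def side_histograms(image, p0, p1, offset=3, bins=4, samples=50):
    """Return two normalized color histograms, one per side of segment p0-p1.

    `image` is assumed to be a list of rows of (r, g, b) tuples with 8-bit
    channels; p0 and p1 are (x, y) endpoints. Sample points falling outside
    the image are skipped.
    """
    height, width = len(image), len(image[0])
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("segment endpoints must differ")
    nx, ny = -dy / length, dx / length  # unit normal of the segment
    hists = []
    for sign in (1, -1):  # one pass per side of the segment
        hist = [0.0] * (bins ** 3)
        count = 0
        for i in range(samples):
            t = i / (samples - 1)
            x = int(round(p0[0] + t * dx + sign * offset * nx))
            y = int(round(p0[1] + t * dy + sign * offset * ny))
            if 0 <= x < width and 0 <= y < height:
                r, g, b = image[y][x]
                # Quantize each channel into `bins` levels and flatten.
                idx = (((r * bins) // 256) * bins
                       + (g * bins) // 256) * bins + (b * bins) // 256
                hist[idx] += 1
                count += 1
        if count:
            hist = [v / count for v in hist]
        hists.append(hist)
    return tuple(hists)


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] between two normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Two line segments would then be compared by computing the similarity between their corresponding side histograms. Since only color statistics are retained, two different segments bordered by similar colors produce nearly identical features.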
In the method disclosed in reference 5, the detected line segments are associated with each other using the color distributions on the two sides of each of these line segments as their features, so the association may fail if line segments with similar color distributions are present.
Moreover, in the method disclosed in reference 5, color histograms of pixels spaced apart from the detected line segment by several pixels are used as features of this line segment, so the association may fail if line segments are adjacent to each other or are densely distributed.