One example of a conventional object pose estimating and matching system is disclosed in Shimada et al., “Method of constructing a dictionary for personal identification independent of face orientation,” IEICE TRANSACTIONS D-II, Vol. J78-D-II, No. 11, pages 1639-1649, 1995 (hereinafter referred to as “first prior art”). As shown in FIG. 1, the object pose estimating and matching system according to the first prior art has image input unit 10, normalizer 15, matching and pose selecting unit 41, and pose-specific reference image storage unit 85.
The conventional object pose estimating and matching system thus constructed operates as follows: Pose-specific reference image storage unit 85 stores at least one pose-specific reference image captured of one or more objects under one or various pose conditions. Each pose-specific reference image is generated from one image or an average of images captured for each pose. Image input unit 10 is implemented by a camera or the like, and stores a captured input image in a memory (not shown). Input images may alternatively be read from a recorded file or acquired through a network. Normalizer 15 aligns an input image using feature points extracted from the object, and generates a normalized image. In the illustrated system, normalizer 15 aligns an input image by detecting, as feature points, the positions of characteristic parts, e.g., an eye and a mouth. The pose-specific reference images are also normalized before being stored. Normalized images often use features obtained by a feature extracting process. Matching and pose selecting unit 41 calculates distance values (or similarity degrees) between the normalized image and the pose-specific reference images of respective objects obtained from pose-specific reference image storage unit 85, and selects, for each object, the reference image whose distance value is the smallest (whose similarity degree is the largest), thereby estimating an optimum pose. The distance values are calculated by using the normalized correlation or Euclidean distance, for example. If an input image is matched against one object (one-to-one matching), then the minimum distance value is compared with a threshold value to determine whether the input image is the same as the object or not. If the one of a plurality of objects (reference images) that is closest to an input image is searched for (one-to-N matching), then the object having the smallest of the minimum distance values determined for the respective objects is extracted.
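The matching and pose selecting step described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the first prior art: the function names, the toy three-element feature vectors, the pose labels, and the choice of a "less than or equal to" threshold test are all assumptions made for the example, and the Euclidean distance is used as one of the two distance measures the text mentions.

```python
import math

def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_object(normalized, pose_refs):
    """Return (best_pose, min_distance) over one object's pose-specific references.

    pose_refs maps a pose label to the normalized reference vector for that pose.
    """
    best_pose = min(pose_refs, key=lambda p: euclidean_distance(normalized, pose_refs[p]))
    return best_pose, euclidean_distance(normalized, pose_refs[best_pose])

def verify(normalized, pose_refs, threshold):
    """One-to-one matching: accept if the minimum distance is within the threshold.

    The '<=' comparison is an assumption; the text only says a threshold is used.
    """
    pose, dist = match_object(normalized, pose_refs)
    return dist <= threshold, pose, dist

def identify(normalized, database):
    """One-to-N matching: pick the object with the smallest minimum distance."""
    results = {obj: match_object(normalized, refs) for obj, refs in database.items()}
    best_obj = min(results, key=lambda o: results[o][1])
    pose, dist = results[best_obj]
    return best_obj, pose, dist

# Hypothetical data: two objects, two pose-specific references each.
database = {
    "A": {"frontal": [0.0, 0.0, 1.0], "left30": [1.0, 0.0, 0.0]},
    "B": {"frontal": [0.0, 1.0, 0.0], "left30": [1.0, 1.0, 0.0]},
}
query = [0.9, 0.1, 0.0]  # stands in for a normalized input image

best_obj, best_pose, best_dist = identify(query, database)
accepted_a, _, _ = verify(query, database["A"], threshold=0.5)
accepted_b, _, _ = verify(query, database["B"], threshold=0.5)
```

With this toy data the query is closest to the "left30" reference of object "A", so one-to-N matching returns that object and pose, and one-to-one matching against "A" is accepted while matching against "B" is rejected at the assumed threshold.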
Another example of a conventional object pose estimating and matching system is disclosed in JP-2003-58896A (hereinafter referred to as “second prior art”). As shown in FIG. 2, the conventional object pose estimating and matching system according to the second prior art has image input unit 10, comparative image generator 20, pose candidate determining unit 30, matching and pose selecting unit 41, and reference three-dimensional object model storage unit 55.
The conventional object pose estimating and matching system thus constructed operates as follows: Reference three-dimensional object model storage unit 55 registers therein reference three-dimensional object models of respective objects (three-dimensional shapes and object surface textures of the objects). Pose candidate determining unit 30 determines at least one pose candidate. Comparative image generator 20 generates, for each pose candidate, a comparative image having illuminating conditions close to those of the input image, based on the reference three-dimensional object models obtained from reference three-dimensional object model storage unit 55. Matching and pose selecting unit 41 calculates distance values (or similarity degrees) between the input image and the comparative images, and selects the comparative image (pose candidate) whose distance value to the model (object) is the smallest, thereby estimating an optimum pose.
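The flow of the second prior art (determine pose candidates, generate a comparative image per candidate, keep the candidate with the smallest distance) can be sketched as below. This is a minimal sketch under loud assumptions: `toy_render` is a hypothetical stand-in that merely rotates a one-dimensional list and does not model three-dimensional shape, texture, or illumination; `estimate_pose`, `squared_distance`, and the integer pose candidates are likewise names and data invented for the example.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length images."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def toy_render(model, pose):
    """Hypothetical stand-in for comparative image generation.

    The 'model' is a flat list and the 'pose' is an integer; rendering is a
    cyclic right rotation by 'pose' positions. A real generator would project
    a 3-D model and approximate the illumination of the input image.
    """
    k = pose % len(model)
    return model[-k:] + model[:-k] if k else list(model)

def estimate_pose(input_image, model, pose_candidates, render, distance):
    """Return (best_pose, min_distance) over all pose candidates."""
    best = None
    for pose in pose_candidates:
        comparative = render(model, pose)       # one comparative image per candidate
        d = distance(input_image, comparative)  # compare with the input image
        if best is None or d < best[1]:
            best = (pose, d)
    return best

# Hypothetical data: the input is the model "seen" at pose 2.
model = [1.0, 2.0, 3.0, 4.0]
input_image = [3.0, 4.0, 1.0, 2.0]
best_pose, best_dist = estimate_pose(
    input_image, model, [0, 1, 2, 3], toy_render, squared_distance
)
```

Passing the renderer and the distance function as arguments keeps the search loop independent of how comparative images are actually produced, which is the part this sketch does not attempt to reproduce.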
Still another example of a conventional object matching system is disclosed in Guo et al., “Human face recognition based on spatially weighted Hausdorff distance,” Pattern Recognition Letters, Vol. 24, pages 499-507, 2003 (hereinafter referred to as “third prior art”). As shown in FIG. 3, the conventional object matching system according to the third prior art has image input unit 10, normalizer 15, weighted matching unit 45, reference image storage unit 89, and weighting coefficient storage unit 99.
The conventional object matching system thus constructed operates as follows: Image input unit 10 and normalizer 15 operate in the same manner as the components denoted by the identical reference numerals according to the first prior art. Reference image storage unit 89 stores at least one reference image for each object. Weighting coefficient storage unit 99 stores weighting coefficients for pixels (or features) to be used for comparing a normalized image and reference images. Weighted matching unit 45 calculates distance values (or similarity degrees) between the normalized image and the reference images of respective objects obtained from reference image storage unit 89, and selects the reference image whose distance value is the smallest, thereby matching the input image. If the Euclidean distance, for example, is used for calculating the distances, then a weighted Euclidean distance is calculated according to $D = \sum_{r} w(r)\,\{x(r) - m(r)\}^{2}$, where x(r) represents the normalized image, m(r) the reference image, and w(r) the weighting coefficient (r represents a pixel or feature index).
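The weighted Euclidean distance above, and the selection of the closest reference image, can be illustrated with a short sketch. The function names, the three-element vectors, and the weight values are assumptions chosen for the example; only the formula itself comes from the description.

```python
def weighted_distance(x, m, w):
    """Weighted Euclidean distance D = sum_r w(r) * (x(r) - m(r))**2.

    x: normalized image, m: reference image, w: weighting coefficients,
    all indexed by the same pixel (or feature) index r.
    """
    if not (len(x) == len(m) == len(w)):
        raise ValueError("x, m and w must have the same length")
    return sum(wr * (xr - mr) ** 2 for xr, mr, wr in zip(x, m, w))

def weighted_match(x, references, w):
    """Return (label, distance) of the reference with the smallest distance."""
    label = min(references, key=lambda k: weighted_distance(x, references[k], w))
    return label, weighted_distance(x, references[label], w)

# Hypothetical data: one normalized image and two reference images.
x = [1.0, 2.0, 3.0]
references = {"A": [1.0, 0.0, 1.0], "B": [1.0, 2.0, 0.0]}

uniform = [1.0, 1.0, 1.0]    # unweighted case for comparison
weights = [1.0, 0.5, 0.25]   # assumed weights that de-emphasize later indices

match_uniform = weighted_match(x, references, uniform)
match_weighted = weighted_match(x, references, weights)
```

In this toy case the unweighted distance prefers reference "A" (8.0 against 9.0), while the assumed weights prefer reference "B" (2.25 against 3.0), which shows how strongly the outcome depends on the stored weighting coefficients.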
The conventional object matching systems described above have the following problems:
According to the first prior art and the second prior art, though a pose can be estimated and matched, the accuracy with which a pose is estimated and matched is lowered if a large local difference arises between an input image and the reference images or comparative images from the database, due to local deformations of the object or different image capturing conditions.
The reasons for the above problem are as follows. When the object is deformed, even if the pose of the object is generally aligned with those of the reference images or comparative images, the object has a local area that is not aligned with the reference images or comparative images, resulting in different pixel values (or features) in that local area. Even when the object is not deformed and its local areas are aligned, according to the first prior art, a local area having largely different pixel values arises if the input image and the reference images are captured under different conditions. For example, if the input image and the reference images are captured under different illuminating conditions, then shadows are produced in different areas. According to the second prior art, even if the comparative image closest to an input image is generated, the two images still differ in local areas because of observation errors in three-dimensional object measurement and the simplified process of generating comparative images.
The third prior art is problematic in that the matching accuracy is reduced if object poses and illuminating conditions in an input image and reference images are different from each other.
The reasons for the problem of the third prior art are that weighting coefficients are established for areas of an object, and if pose conditions are different, then the object has a misaligned area, making it impossible to perform proper weighted matching. Furthermore, when illuminating conditions are different, an area that is important for matching often changes. However, since the weighting coefficient remains the same, appropriate weighted matching cannot be performed.