1. Field of the Invention
The present invention relates to an object collation method and an object collation apparatus for collating an object with an image. More specifically, the invention relates to an object collation method and an object collation apparatus which are made robust against fluctuations in the photographic conditions, such as the location and position of the object to be collated on an image and the illumination conditions, by registering in advance information such as the three-dimensional shape of the object and the reflectivity and color information of the object surface.
2. Description of the Related Art
An object collation technique, or object recognition technique, determines what a first object suitably arranged in a three-dimensional space is by collating it with a second object registered in advance, using an input image (or a group of input images) obtained by imaging the first object with a photography device such as a camera, as shown in FIG. 1. In the following, the object registered in advance will be called the “registered object”, and the object to be collated with the registered object will be called the “target object”. The apparatus that performs the object collation is shown as the “collating apparatus” in FIG. 1.
The object collation process broadly consists of two procedures: a registration procedure for registering the registered object in advance; and a collation procedure for comparing and collating the target object with the registered object, to determine what appears in the input image (or the group of input images) of the target object. In each of these procedures, the taken image is either used as a two-dimensional image having a two-dimensional extent, or converted into a three-dimensional shape by a three-dimensional shape measuring apparatus and used in that form.
The object collation techniques of the prior art will now be described with reference to the References below.
Prior Art 1:
An object collation technique that registers a two-dimensional image in advance and uses a two-dimensional image as an input is exemplified in Reference 1, Japanese Patent No. 2872776. This technique assumes a human face as the object to be collated, and uses an apparatus having the construction shown in FIG. 2. In the registration phase, a two-dimensional image taken by camera 11 is stored in storage unit 12. In the collation phase, camera 13 takes a two-dimensional face image as the input image. Normalization unit 14 extracts from the input image, by an image processing technique, facial feature points that serve as references for position and size, such as the locations of the eyes and nose, and outputs a normalized image whose two-dimensional position and size on the image have been normalized with reference to the coordinate positions of the feature points. Finally, image comparison unit 15 compares the registered image read out from storage unit 12 with the normalized image by a pattern recognition technique, and outputs the collation result.
However, this purely two-dimensional image collation technique cannot cope with three-dimensional rotational fluctuations of the object to be collated, or with the apparent fluctuations on the two-dimensional image caused by fluctuations in the illumination conditions at the time the image is taken, so that its application range is seriously limited.
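As an illustration only, the position and size normalization performed by normalization unit 14 can be sketched as a two-dimensional similarity transformation that maps two detected feature points onto fixed reference coordinates. The choice of the two eye centers as feature points, the reference coordinates, and the function names similarity_from_eyes and apply_similarity are assumptions made for this sketch and are not taken from Reference 1.

```python
import math

def similarity_from_eyes(left_eye, right_eye,
                         target_left=(30.0, 40.0), target_right=(70.0, 40.0)):
    """Return (c, s, tx, ty) of the 2-D similarity transform that maps the
    detected eye positions onto the (assumed) reference eye positions."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tdx, tdy = target_right[0] - target_left[0], target_right[1] - target_left[1]
    scale = math.hypot(tdx, tdy) / math.hypot(dx, dy)   # size normalization
    angle = math.atan2(tdy, tdx) - math.atan2(dy, dx)   # in-plane rotation
    c, s = scale * math.cos(angle), scale * math.sin(angle)
    # Translation chosen so that the left eye lands exactly on its reference.
    tx = target_left[0] - (c * left_eye[0] - s * left_eye[1])
    ty = target_left[1] - (s * left_eye[0] + c * left_eye[1])
    return c, s, tx, ty

def apply_similarity(params, point):
    """Map one image coordinate through the similarity transform."""
    c, s, tx, ty = params
    x, y = point
    return (c * x - s * y + tx, s * x + c * y + ty)

params = similarity_from_eyes((100.0, 120.0), (140.0, 150.0))
mapped_right_eye = apply_similarity(params, (140.0, 150.0))
```

Because the transformation acts only within the image plane, it cannot compensate for rotation in depth or for changes in illumination, which is the limitation noted above.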
Prior Art 2:
An object collation technique of the prior art using a three-dimensional shape is exemplified by Reference 2, Japanese Patent Laid-Open Application No. 9-259271 (JP, 09259271, A). This technique employs an apparatus having the construction shown in FIG. 3. In the registration phase, the three-dimensional shape and the color information of a registered object are measured as face data by three-dimensional shape color information measuring unit 21 and are stored in storage unit 22. In the collation phase, too, the three-dimensional shape and the color information of a target object are measured as input data by three-dimensional shape color information measuring unit 23. Translation/rotation unit 24 generates a rotational face data group by translating the input data so that its center of gravity coincides with that of the face data and by applying small rotations to it. Minimum error calculation unit 25 determines the minimum of the error between the face data and the rotational face data group, thereby correcting the three-dimensional location and position, and executes the collation on the basis of the minimum error.
However, this collation technique needs the three-dimensional shape not only in the registration phase but also in the collation phase. The collation apparatus therefore requires a three-dimensional shape measuring apparatus, which makes it costly. This problem is especially serious where the input image is to be taken and collated at a place different from that of the registration, or at a plurality of places. Other problems remain: the object to be collated has to stand still until the measurement ends, and highly precise shape data cannot be obtained unless the object is placed in a darkroom or under dim lighting. These problems also limit the application range.
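The processing of translation/rotation unit 24 and minimum error calculation unit 25 can be sketched roughly as follows. The sketch assumes that the two point sets are already in point-to-point correspondence and, for brevity, searches small rotations about a single axis only. These simplifications, the angle range, and all function names are assumptions of the sketch and are not taken from Reference 2.

```python
import math

def centroid(points):
    """Center of gravity of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def rotate_z(point, angle):
    """Rotate one point about the z axis by angle (radians)."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def min_alignment_error(face_data, input_data, max_deg=10, step_deg=1):
    """Align the centers of gravity, try small rotations of the input data,
    and return (minimum mean squared error, best angle in degrees)."""
    cf, ci = centroid(face_data), centroid(input_data)
    reg = [tuple(p[i] - cf[i] for i in range(3)) for p in face_data]
    inp = [tuple(p[i] - ci[i] for i in range(3)) for p in input_data]
    best = (float("inf"), 0)
    for deg in range(-max_deg, max_deg + 1, step_deg):
        rotated = [rotate_z(p, math.radians(deg)) for p in inp]
        err = sum(sum((a[i] - b[i]) ** 2 for i in range(3))
                  for a, b in zip(reg, rotated)) / len(reg)
        if err < best[0]:
            best = (err, deg)
    return best
```

The minimum error returned by such a search would then serve as the collation measure; note that both inputs must be measured three-dimensional data, which is the cost issue described above.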
Prior Art 3:
In Reference 3, Japanese Patent Laid-Open Application No. 06-168317 (JP, 06168317, A), there is disclosed a collation technique that takes a two-dimensional image in each of the registration phase and the collation phase. This technique employs an apparatus having the construction shown in FIG. 4. In the registration phase, a camera 31 takes a two-dimensional image of a registered object, and a feature extraction unit 32 detects pixel locations of large luminance fluctuation and outputs them as feature point locations. These feature point locations are stored in a storage unit 33. In the collation phase, a camera 34 takes a two-dimensional image of a target object as an input image, and a feature extraction unit 35 detects pixel locations of large luminance fluctuation and outputs them as feature point locations. A collation execution unit 37 compares and collates the registered feature point locations with the feature point locations of the input image. In order to absorb the fluctuations in the location and position of the target object, a three-dimensional shape model of a standard object is prepared in a location/position normalization unit 36 and is used to normalize the location and position.
This method of detecting pixel positions of large luminance fluctuation is effective for building blocks of extremely large radius of three-dimensional curvature, or for objects with extremely large reflectivity fluctuations such as a black marker on a white board. However, it is known that this method is not suitable for the human face, as noted in Reference 3, and it is generally difficult with this method to detect the coordinate locations stably. In addition, although the position is corrected with the standard three-dimensional shape of the object group to be collated, the method cannot be applied where the similarity in shape between the individual objects of the group is not high.
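A minimal sketch of one way to detect pixel locations of large luminance fluctuation is given below. It assumes a central-difference gradient and a fixed threshold; Reference 3 may use a different detector, and the function name feature_points and the threshold parameter are hypothetical.

```python
def feature_points(image, threshold):
    """Return (row, col) of interior pixels whose luminance gradient
    magnitude exceeds the threshold. image is a list of rows of numbers."""
    points = []
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            gx = (image[r][c + 1] - image[r][c - 1]) / 2.0  # horizontal change
            gy = (image[r + 1][c] - image[r - 1][c]) / 2.0  # vertical change
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                points.append((r, c))
    return points

# A vertical step edge between columns 1 and 2 of a 5 x 5 image.
step_image = [[0, 0, 10, 10, 10] for _ in range(5)]
edge_pixels = feature_points(step_image, threshold=1.0)
```

On a sharp step edge such as this one the detected locations are stable, whereas on a smoothly shaded surface such as a face the gradient is small and sensitive to illumination, which is consistent with the difficulty noted above.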
Prior Art 4:
In order to correct not only the fluctuations in the location and position but also the fluctuations due to the illumination conditions, there is a technique using only ordinary two-dimensional images in both the registration procedure and the collation procedure, as disclosed in Reference 4, Hiroshi Murase and Shree K. Nayar, “Visual Learning and Recognition of 3-D Objects from Appearance”, Int. J. Computer Vision, vol. 14, pp. 5-24, 1995. This technique employs an object collation apparatus having the construction shown in FIG. 5. In the registration phase, a photography unit 41 takes a two-dimensional image group of the registered object covering all the positions and illumination conditions that can be expected in the input image of the target object. A manifold calculation unit 42 determines, by principal component analysis, a basis vector group capable of expressing the changes in the two-dimensional image group, generates a feature space whose coordinates are the correlations with the basis vector group, and determines the locus of the two-dimensional image group in the feature space as the manifold. This locus is stored in a storage unit 43. In the collation phase, a camera 44 takes a two-dimensional image of the target object as an input image. A distance calculation unit 45 calculates the distance in the feature space between the input image and the manifold, and the collation is executed by using the calculated distance as a measure. Thus, it is possible to collate input images taken at various locations and positions and under various illumination conditions.
However, when various illumination conditions, such as a plurality of light sources or extended light sources, are considered as the illumination conditions of the input image of the target object, this technique needs a large number of sample images of the registered object covering those conditions. In addition, since no assumption is made about the shape of the manifold in the feature space, the parameters of the photographic conditions have to be searched for when determining the distance from the input image. Therefore, there arises the problem that a large number of calculations are required.
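The collation phase of this technique can be sketched as follows, assuming that an orthonormal basis vector group and a mean image have already been obtained by principal component analysis, and that the manifold is approximated by the discrete projected points of the registered images. The function names and the discrete approximation of the manifold are assumptions of this sketch.

```python
def project(image, mean, basis):
    """Coordinates of an image (flat list of pixels) in the feature space
    spanned by the basis vectors, after subtracting the mean image."""
    centered = [p - m for p, m in zip(image, mean)]
    return [sum(b_i * c_i for b_i, c_i in zip(b, centered)) for b in basis]

def distance_to_manifold(image, mean, basis, manifold_samples):
    """Smallest Euclidean distance in the feature space between the
    projected input image and the stored manifold sample points."""
    q = project(image, mean, basis)
    return min(
        sum((a - b) ** 2 for a, b in zip(q, m)) ** 0.5
        for m in manifold_samples
    )

# Toy data: 3-pixel images and a 2-dimensional feature space.
mean = [0.0, 0.0, 0.0]
basis = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
registered = [[1.0, 0.0, 5.0], [0.0, 2.0, 5.0]]
manifold = [project(img, mean, basis) for img in registered]
d = distance_to_manifold([1.0, 1.0, 9.0], mean, basis, manifold)
```

The minimum is taken over every stored sample, which illustrates why the amount of calculation grows with the number of positions and illumination conditions that the registered image group must cover.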
Prior Art 5:
The change in the two-dimensional image caused by the illumination conditions, in the case where the location and position of the object are fixed, has been analyzed in detail in Reference 5, Peter N. Belhumeur and David J. Kriegman, “What Is the Set of Images of an Object under All Possible Illumination Conditions?”, Int. J. Computer Vision, vol. 28, pp. 245-260, 1998. If the location and position of the object are fixed, the image under arbitrary illumination conditions can be decomposed into, and expressed as, the sum of images each taken under one point light source. Therefore, the image under an arbitrary number of light sources can be expressed as a linear sum of the images under the individual light sources, with the intensity of each light source as its coefficient.
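The superposition described above can be written as follows. The symbols are introduced here for illustration only: I_k denotes the image under the k-th point light source at unit intensity, a_k denotes the intensity of that light source, and n is the number of light sources.

```latex
I = \sum_{k=1}^{n} a_k I_k, \qquad a_k \ge 0 \quad (k = 1, \dots, n)
```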
On the basis of this analysis, Reference 5 has proposed a method called the “Illumination Subspace Method” (hereinafter referred to as “Method 1”). This method utilizes an object collation apparatus having the construction shown in FIG. 6. In the registration phase, a photography unit 51 takes a two-dimensional image group of a registered object under three or more different illumination conditions, set so that as few pixels as possible are shadowed. A normal calculation unit 52 determines, by principal component analysis of the two-dimensional image group, a vector group in which each vector corresponds to the product of the reflectivity and the normal vector of the object surface at the point corresponding to an individual pixel of the image. Subsequently, an image generation unit 53 generates an image group called the “extreme rays”, that is, the images obtained when the illumination lies in the direction given by the cross product of any two vectors of the vector group. This image group is stored in a storage unit 54. In the collation phase, a camera 55 takes a two-dimensional image of the target object as an input image. Where the object surface is a perfectly diffuse reflector and the object has a convex shape, the image under an arbitrary illumination condition can be expressed as a linear sum of the extreme ray group with nonnegative coefficients, so that the coefficient group can be calculated by the least square method under nonnegativity constraints. An illumination correction unit 56 performs the least square calculation and, using the determined coefficient group, generates as a linear sum of the extreme ray group a comparison image of the target object under the same illumination conditions as those of the input image. An image comparison unit 57 collates the comparison image with the input image by calculating their similarity.
According to this method, a very large number of calculations may be required to calculate the extreme rays, depending on the complexity of the shape. According to Reference 5, the number of extreme rays is m(m−1) at the maximum where there are m linearly independent normal vectors on the object surface. Unless the object shape is as simple as that of building blocks, therefore, a large number of images have to be calculated, and the number of calculations needed to obtain all the extreme rays becomes a problem for a general object of complicated shape. Furthermore, the method cannot be applied as it is to the case in which the object shape is not convex, so that shadows are cast by other portions of the object blocking the light source. Moreover, where the location or position of the object changes, there arises another problem that it is necessary to take images of the object at the new location or position and to calculate all the extreme rays over again.
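The generation of the extreme rays and the nonnegative least square calculation can be sketched as follows for a convex object with a perfectly diffuse (completely scattering) surface. Each element of b_vectors is assumed to be the product of the reflectivity and the normal vector for one pixel. The projected gradient solver is a simple stand-in chosen for this sketch, not the solver of Reference 5, and all function names are hypothetical.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def extreme_rays(b_vectors):
    """One image per ordered pair (i, j), i != j, lit from s = b_i x b_j.
    Pixel value is max(0, b . s), so surfaces facing away stay dark."""
    rays = []
    for i, bi in enumerate(b_vectors):
        for j, bj in enumerate(b_vectors):
            if i == j:
                continue
            s = cross(bi, bj)
            if all(abs(c) < 1e-12 for c in s):
                continue  # parallel vectors define no illumination direction
            rays.append([max(0.0, sum(bk * sk for bk, sk in zip(b, s)))
                         for b in b_vectors])
    return rays

def nnls(basis_images, target, iterations=2000):
    """Minimize |sum_j x_j * basis_j - target|^2 subject to x_j >= 0,
    by projected gradient descent with a fixed step size."""
    k = len(basis_images)
    x = [0.0] * k
    # Sum of squared entries bounds the largest eigenvalue of A^T A.
    lipschitz = sum(v * v for img in basis_images for v in img)
    if lipschitz == 0:
        return x
    for _ in range(iterations):
        fitted = [sum(x[j] * basis_images[j][p] for j in range(k))
                  for p in range(len(target))]
        residual = [f - t for f, t in zip(fitted, target)]
        grad = [sum(r * v for r, v in zip(residual, img)) for img in basis_images]
        x = [max(0.0, xj - g / lipschitz) for xj, g in zip(x, grad)]
    return x

def comparison_image(basis_images, coefficients):
    """Linear sum of the basis images with the fitted coefficients."""
    return [sum(c * img[p] for c, img in zip(coefficients, basis_images))
            for p in range(len(basis_images[0]))]
```

The double loop over the vector group makes the m(m−1) growth in the number of extreme rays directly visible, and every extreme ray would have to be regenerated whenever the location or position of the object changes.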
Prior Art 6:
In Reference 6, A. S. Georghiades, “Illumination Cones for Recognition Under Variable Lighting: Faces”, Proc. IEEE Int. Conf. CVPR, pp. 52-58, 1998, there is disclosed a method (hereinafter referred to as “Method 2”). In Method 2, when the extreme rays of Method 1 are calculated, a computer graphics (CG) technique such as ray tracing is used to determine, from the three-dimensional shape of the object, which pixels are shadowed, and the shadows are added accordingly. As a result, Method 1 can also be applied to an object having a non-convex shape.
According to Method 2, however, where the location or position of the object changes, the image of the object has to be taken again in conformity with the new location or position, and all the extreme rays have to be calculated again. In this method in particular, the calculation of the extreme rays includes the shadowing of the images of the object, and this shadowing treatment requires many calculations such as ray tracing, which raises the problem that collation takes a long time.
Prior Art 7:
Reference 6 has also proposed the “Sampling Method” (hereinafter referred to as “Method 3”), which uses an object collation apparatus having the construction shown in FIG. 7. It is troublesome to calculate all the extreme rays, as in Method 1. In the registration phase, therefore, a photography unit 61 takes a two-dimensional image group by setting a suitable number of illumination directions such that angles θ and φ of FIG. 8 cover the entire range at as equal a spacing as possible, and this two-dimensional image group is substituted for the extreme rays. Thereafter, the nonnegative least square method is applied as in Method 1 to correct the illumination and thereby recognize the object. FIG. 8 illustrates the angles indicating the directions of the illumination with respect to the object, which determine the illumination conditions.
For this method, however, it is necessary to take images while illuminating the object to be collated from many directions, so that a special illumination device is needed in the registration phase. Where the location or position of the object to be collated changes, the images of the object under the numerous illumination conditions have to be taken again at the new location or position. Therefore, it is necessary to take images at all the locations and positions expected for the input images. As a result, there arise the problems that the registration is troublesome and that an image taken at an unregistered location or position cannot be collated.
The object to be collated generally undergoes three-dimensional translations and rotations in front of an imaging device such as a camera, unless it is specifically fixed or adjusted. In addition, as is apparent from the moment-to-moment fluctuations of the illumination conditions outdoors, very large fluctuations appear in the two-dimensional image inputted for collation. In the object collation techniques of the prior art described thus far, these fluctuations in the location and position and in the illumination conditions cannot be sufficiently corrected, which raises the problem that the application range is seriously restricted.
In order to solve these problems, the Assignee of the present invention has proposed the following technique in Japanese Patent Application No. 2000-105399 (JP, 2000-105399), “Image Collation Apparatus, Image Collation Method, and Recording Medium Recorded with Programs Therefor” (Reference 7). In this technique, highly precise collation can be performed even where both the location and position and the illumination conditions change. The three-dimensional shape of a registered object is registered in advance; an illumination fluctuating texture group corresponding to the location and position of the target object in the input image is generated by computer graphics, even if the location and position change; and an image space enveloping the image group is determined and used for the illumination correction. According to this technique, even if the location and position of the target object in the input image change, images can be generated at the matching location and position by using the three-dimensional shape of the registered object. This makes it unnecessary to take the image group at all the necessary locations and positions in the registration phase. However, the step of generating the images under the numerous illumination conditions, with the location and position adjusted to the input image, requires numerous calculations including the shadowing treatment, so that the problem that collation takes a long time remains unsolved.