1. Field of the Invention
The present invention relates to a picture recognition apparatus for accumulating an object model converted from picture information of an object in a database, and consulting the database for picture recognition to recognize the object.
2. Description of the Related Art
With the advancement of computer networks such as the Internet, anybody can easily access various information, and the importance of a technique for confirming that a person accessing information is an authentic individual (i.e., an authentication technique) is increasing. This is because it is required to prevent a pretender from being mistaken for an authentic individual, and to minimize the probability of rejecting an authentic individual as a pretender.
One of the techniques receiving the most attention in this field in recent years is an authentication technique using a face picture, for the following reason: like a fingerprint or a voice print, a face is peculiar to an individual, and advances in picture processing techniques have made it usable as a standard for recognition.
As a method using a face picture for recognition, various methods have been disclosed in the past. For example, JP11(1999)-110020 discloses a technique in which an environment parameter value representing the state of a capturing environment and a target state parameter value representing the state of a target are estimated from an input picture, and based on the values, recognition is performed by using a “picture for matching” corrected in such a manner that the states of a capturing environment and a target of the input picture match with those of a capturing environment and a target of a registered picture.
Hereinafter, the above-mentioned picture recognition processing using an environment parameter and a target state parameter disclosed in the above publication will be described with reference to FIGS. 1 to 4. FIG. 1 shows a flow of processing in a registration phase with respect to a database in the picture recognition processing.
In FIG. 1, first, a picture to be a registration target is input (Operation 11). Herein, one face picture captured from the front direction may be used. However, in order to enhance a recognition precision, it is desirable to prepare face pictures captured in various directions in addition to the front picture.
Next, a face region is cut out from the input picture (Operation 12) to obtain a picture of a face region (Operation 13). More specifically, as shown in FIG. 2, a face region is cut out as a rectangular region on the picture to be a registration target.
Then, the picture of the face region thus obtained is considered as an N-dimensional vector having each pixel as an element. The vector is projected onto an n-dimensional (n≦N) partial space (Operation 14), and the projected point is represented as P. In FIG. 2, the vector is projected onto one point of “sashida”.
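The flattening and projection in Operation 14 can be sketched as follows. This is a minimal illustration only, not the procedure of the cited publication: the function names `flatten` and `project` are hypothetical, and the partial space is assumed to be given as n orthonormal basis vectors of length N (obtained beforehand by, for example, principal component analysis).

```python
def flatten(region):
    """Turn a 2-D face region (rows of pixel values) into one N-dimensional vector."""
    return [float(pixel) for row in region for pixel in row]


def project(vector, basis):
    """Project an N-dimensional vector onto the space spanned by `basis`.

    `basis` is assumed to hold n orthonormal vectors of length N (n <= N);
    the result is the n coordinates of the projected point P.
    """
    if any(len(b) != len(vector) for b in basis):
        raise ValueError("every basis vector must have the same length as the input vector")
    return [sum(b_i * v_i for b_i, v_i in zip(b, vector)) for b in basis]


# Toy 2x2 "face region" (N = 4) and a hypothetical 2-dimensional partial space.
region = [[10, 20], [30, 40]]
basis = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
P = project(flatten(region), basis)
print(P)
```

With an orthonormal basis, each coordinate of P is simply the inner product of the picture vector with one basis vector, so the projection costs n inner products of length N.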
Furthermore, an environment parameter value e representing the state of a capturing environment and a target state parameter value s representing the state of a target are estimated, and the estimated values and the projected point P are registered in a database as a pair (Operation 15). In the above-mentioned publication, there is no disclosure about a general method for estimating, from the picture, an environment parameter value e representing the state of a capturing environment and a target state parameter value s representing the state of a target.
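The pairing registered in Operation 15 might be held in a record such as the one below. This is a sketch under stated assumptions: the class name `RegisteredEntry`, the `label` field, the in-memory list used as a database, and the representation of e and s as tuples of numbers are all hypothetical, and the sample values are arbitrary.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RegisteredEntry:
    label: str            # identity of the registered picture (hypothetical field)
    e: Tuple[float, ...]  # estimated environment parameter value
    s: Tuple[float, ...]  # estimated target state parameter value
    P: Tuple[float, ...]  # projected point in the partial space


database: List[RegisteredEntry] = []


def register(label, e, s, P):
    """Store the estimated parameter values together with the projected point."""
    entry = RegisteredEntry(label, tuple(e), tuple(s), tuple(P))
    database.append(entry)
    return entry


# Arbitrary illustrative values; how e and s are estimated is not covered here.
entry = register("sashida", e=[0.8], s=[0.1], P=[50.0, -10.0])
```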
FIG. 3 shows a flow of processing in a recognition phase in the picture recognition processing. In FIG. 3, the operations from inputting a picture to cutting out a picture of a face region (Operations 31 to 33) are the same as those in the registration phase in FIG. 1 (Operations 11 to 13).
Thus, the vector is projected onto one point of “sashida” in a partial space as shown in FIG. 4.
On the other hand, an environment parameter value e representing the state of a capturing environment and a target state parameter value s representing the state of a target are estimated from an input picture. Then, the parameter values estimated from the input picture are adjusted so as to match with the environment parameter value e and the target state parameter value s of the previously registered picture. Because of this adjustment, a picture for matching is generated in such a manner that the states of the capturing environment and the target of the input picture match with those of the capturing environment and the target of the registered picture. The picture for matching is projected onto a partial space to obtain a projected point Q (Operation 34).
Consequently, the registered picture is compared with the picture for matching under the same conditions regarding the states of a capturing environment (e.g., illumination), a target's position, posture, and the like. However, there is no disclosure about a general method for adjusting parameter values to generate a picture for matching in such a manner that the states of a capturing environment and a target of an input picture match with the states of a capturing environment and a target of a registered picture.
Then, the distance between the registered point P and the point Q in the partial space is calculated (Operation 35). The distance is calculated in the same way for all the registered pictures to find the closest point Pm (Operation 36).
Finally, the registered picture corresponding to the closest point Pm is recognized as that corresponding to the input picture (Operation 37).
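The search for the closest point Pm can be sketched as a nearest-neighbor lookup. This is a minimal sketch that assumes the Euclidean distance; the text above does not specify which distance measure is used in the partial space. The function name `closest_registered`, the labels, and the coordinates are hypothetical.

```python
import math


def closest_registered(Q, registered):
    """Return (label, distance) of the registered point nearest to Q.

    `registered` maps a label to its projected point P.
    Euclidean distance is an assumption made for this sketch.
    """
    if not registered:
        raise ValueError("no registered pictures to compare against")
    label = min(registered, key=lambda name: math.dist(registered[name], Q))
    return label, math.dist(registered[label], Q)


# Hypothetical registered points and a projected input point Q.
registered = {
    "person_a": [50.0, -10.0],
    "person_b": [20.0, 5.0],
    "person_c": [47.0, -6.0],
}
Q = [48.0, -9.0]
label, distance = closest_registered(Q, registered)
print(label, distance)
```

A linear scan like this compares Q with every registered point, which matches the description of calculating the distance for all the registered pictures.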
However, although the above-mentioned method has the advantages that (1) an environment parameter value representing the state of a capturing environment and a target state parameter value representing the state of a target are estimated from a picture, and (2) parameter values are adjusted to generate a picture for matching in such a manner that the states of the capturing environment and the target of the input picture match with those of the capturing environment and the target of the registered picture, a general method for realizing these procedures is not known.
JP11(1999)-110020 proposes that an illumination parameter among environment parameters is estimated from a mean value, a variance, and a histogram of a brightness value of a face region picture, and that the resolution, focus, and exposure of a camera utilized for capturing are used as camera parameters among environment parameters. JP11(1999)-110020 also proposes that a target state parameter is estimated by using a skin color occupying area in a picture of a face region.
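The brightness statistics named above (mean value, variance, and histogram) can be computed as in the following sketch. It only computes the raw statistics; how they would be mapped to an illumination parameter value is not described here and is not attempted. The function name `brightness_statistics`, the number of histogram bins, and the 0 to 255 brightness range are assumptions.

```python
from statistics import fmean, pvariance


def brightness_statistics(region, bins=4, max_value=255):
    """Return the mean, population variance, and histogram of brightness values.

    `region` is a 2-D list of brightness values in the range 0..max_value.
    """
    values = [pixel for row in region for pixel in row]
    if not values:
        raise ValueError("face region is empty")
    histogram = [0] * bins
    bin_width = (max_value + 1) / bins
    for v in values:
        # Clamp to the last bin so that max_value itself stays in range.
        histogram[min(int(v / bin_width), bins - 1)] += 1
    return fmean(values), pvariance(values), histogram


# Toy 2x2 face region with arbitrary brightness values.
region = [[10, 20], [30, 40]]
mean, variance, histogram = brightness_statistics(region)
print(mean, variance, histogram)
```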
However, it is generally difficult to correctly estimate the above-mentioned parameter values. It is also difficult to model, from one or a few pictures, changes in a picture caused by the variations in these parameters. Thus, it is considered to be difficult to actually apply the above-mentioned method to recognition processing.
Furthermore, a face picture captured from the front direction is used for picture registration, so that an authentic individual may be mistaken for a pretender, or a pretender may be mistaken for an authentic individual, in the case where the direction of a face and/or illumination conditions vary when a picture to be a recognition target is input.