The present invention relates to an image processing apparatus and a method for executing predetermined processing on an input image including a face area of a person.
In an image pattern recognition apparatus for recognizing a reflected intensity image of an object, an image captured by light reflected from the object surface (the reflected intensity image) is input (image input processing). An image area containing the recognition object is extracted from the input image (pattern extraction processing). The image area is converted to a pattern of predetermined size (pattern normalization processing). This pattern is converted to predetermined input data (feature extraction processing). The input data is compared with previously registered dictionary data, and a similarity is calculated (similarity calculation processing).
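The five processing stages described above can be sketched end to end as follows. This is only a minimal illustration under stated assumptions, not the apparatus itself: all function names are hypothetical, the recognition area is assumed to be already known as a bounding box, and nearest-neighbour resampling, a unit-length flattened vector, and cosine similarity stand in for normalization, feature extraction, and similarity calculation, none of which the text specifies.

```python
import math

def extract_area(image, box):
    """Pattern extraction: crop the inclusive box (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

def normalize_pattern(area, size):
    """Pattern normalization: nearest-neighbour resample to size x size (assumed scheme)."""
    h, w = len(area), len(area[0])
    return [[area[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]

def extract_feature(pattern):
    """Feature extraction: flatten and scale to unit length (assumed feature)."""
    flat = [float(v) for row in pattern for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

def similarity(x, d):
    """Similarity calculation: cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(x, d))

def recognize(image, box, dictionary, size=2):
    """Run all stages and return (best_label, best_similarity)."""
    x = extract_feature(normalize_pattern(extract_area(image, box), size))
    return max(((label, similarity(x, d)) for label, d in dictionary.items()),
               key=lambda item: item[1])

# Toy input image and a two-entry dictionary of previously registered data.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
dictionary = {
    "diagonal": extract_feature([[1, 0], [0, 1]]),
    "anti_diagonal": extract_feature([[0, 1], [1, 0]]),
}
label, score = recognize(image, (0, 0, 3, 3), dictionary)
```

In this toy run the input pattern is compared with both dictionary entries and the entry with the larger similarity is reported as the recognition result.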
In the pattern extraction processing, a background subtraction method, a temporal subtraction method, and a template matching method are selectively used. In the background subtraction method, a difference between an image not including the recognition object (background image) and an image including the recognition object (input image) is calculated, and an area of large difference value is extracted as an area including the recognition object. In the temporal subtraction method, a difference between two images input at different times is calculated, and an area of large difference value is extracted as an area including the recognition object detected by its movement. In the template matching method, a template representing an image feature of the recognition object is scanned over the input image, and the area of largest correlation value is extracted as the area including the recognition object. The background subtraction method and the temporal subtraction method execute the pattern extraction processing more quickly than the template matching method.
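The three extraction methods can be illustrated on small grayscale images held as nested lists. The sketch below is an assumption-laden illustration: the helper names (`abs_diff`, `bounding_box`, `match_template`) are hypothetical, the threshold values are arbitrary, and zero-mean normalized cross-correlation is used as one possible "correlation value" since the text does not name a specific measure. Background subtraction and temporal subtraction share the same difference-and-threshold step and differ only in which two images are compared.

```python
import math

def abs_diff(img_a, img_b):
    """Pixel-wise absolute difference of two equally sized grayscale images."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]

def bounding_box(diff, threshold):
    """Inclusive box (top, left, bottom, right) of pixels above threshold, or None."""
    hits = [(r, c) for r, row in enumerate(diff)
            for c, v in enumerate(row) if v > threshold]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

def ncc(patch, template):
    """Zero-mean normalized cross-correlation (assumed correlation measure)."""
    p = [v for row in patch for v in row]
    t = [v for row in template for v in row]
    mp, mt = sum(p) / len(p), sum(t) / len(t)
    num = sum((a - mp) * (b - mt) for a, b in zip(p, t))
    den = math.sqrt(sum((a - mp) ** 2 for a in p) * sum((b - mt) ** 2 for b in t))
    return num / den if den > 0 else 0.0

def match_template(image, template):
    """Scan the template over the image; return ((row, col), score) of the best match."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, -2.0
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = ncc(patch, template)
            if score > best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score

# Background subtraction: input image versus an image without the object.
background = [[0] * 5 for _ in range(5)]
frame = [row[:] for row in background]
frame[1][2], frame[1][3] = 10, 200
frame[2][2], frame[2][3] = 200, 10
object_box = bounding_box(abs_diff(frame, background), threshold=5)

# Temporal subtraction: two images input at different times.
earlier = [[0] * 5 for _ in range(5)]
later = [[0] * 5 for _ in range(5)]
earlier[0][0] = 100
later[0][1] = 100
motion_box = bounding_box(abs_diff(later, earlier), threshold=50)

# Template matching: exhaustive scan, which is why it is the slowest of the three.
position, score = match_template(frame, [[10, 200], [200, 10]])
```

The difference-based methods touch each pixel once, whereas template matching evaluates the correlation at every candidate position, which is consistent with the speed comparison made above.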
In the similarity calculation processing, a distance evaluation method, a subspace method, and a mutual subspace method are selectively used. In the distance evaluation method, the input data and the dictionary data are each represented as a vector of the same dimension and the same feature; a distance between the two vectors is evaluated; and an object in the input data is recognized by this evaluation. In the subspace method, the dictionary data is represented as a dictionary subspace generated from a plurality of vectors; a distance between the input vector and the dictionary subspace is evaluated; and the object in the input data is recognized by this evaluation. In the mutual subspace method, the input data is also represented as an input subspace generated from a plurality of vectors; a distance between the input subspace and the dictionary subspace is evaluated; and the object in the input data is recognized by this evaluation. In each method, the distance between the input data and the dictionary data is converted to a similarity in order to recognize the object.
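The first two similarity calculations can be sketched in a few lines. The following is a hedged illustration, not the method of any cited reference: the function names are hypothetical, the mapping 1 / (1 + distance) is an assumed way of converting a distance to a similarity, and the subspace similarity is taken as the squared cosine of the angle between the input vector and the dictionary subspace, computed with a Gram-Schmidt basis. The mutual subspace method is not shown; it would compare two such bases through the angles between the subspaces, which requires an eigenvalue or singular value computation.

```python
import math

def dot(u, v):
    """Inner product of two vectors of equal dimension."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: orthonormal basis of the subspace spanned by the vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors that add no new direction
            basis.append([wi / norm for wi in w])
    return basis

def distance_similarity(x, d):
    """Distance evaluation method: Euclidean distance mapped to (0, 1] (assumed mapping)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, d)))
    return 1.0 / (1.0 + dist)

def subspace_similarity(x, dictionary_vectors):
    """Subspace method: squared cosine between x and the dictionary subspace."""
    basis = orthonormal_basis(dictionary_vectors)
    projected_sq = sum(dot(x, b) ** 2 for b in basis)
    return projected_sq / dot(x, x)

# Dictionary subspace: the plane spanned by two registered vectors.
dictionary_vectors = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
inside = subspace_similarity([3.0, 4.0, 0.0], dictionary_vectors)
outside = subspace_similarity([0.0, 0.0, 5.0], dictionary_vectors)
between = subspace_similarity([1.0, 0.0, 1.0], dictionary_vectors)
```

An input vector lying in the dictionary subspace scores highest, one orthogonal to it scores lowest, and other vectors fall in between, so a threshold on the similarity can be used for the recognition decision.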
However, in the background subtraction method and the temporal subtraction method, the following two problems are well known.
(1) If a plurality of objects are included in the input image, the area of the recognition object alone is not extracted from the input image. As a result, each difference area must be verified based on image features by using the template matching method.
(2) If the illumination environment changes because of weather variation or the passage of time, unexpected noise is mixed into the difference value. As a result, the area of the recognition object is not correctly extracted.
In order to solve these problems, the recognition object must yield a high difference value in the difference image. Specifically, the following two solutions are necessary.
(A) A camera means is controlled so that only the recognition object is captured in the input image.
(B) The difference value is calculated using an image representation not affected by illumination changes.
However, the prior art does not consider concrete means of realizing the two solutions (A) and (B) for the above-mentioned problems (1) and (2). As a result, image pattern recognition that quickly extracts the recognition object using the difference is difficult.
Furthermore, Japanese Patent Disclosure (Kokai) PH9-251534 discloses a person recognition method in which a person's face is the recognition object. In this method, pattern extraction processing by the template matching method is combined with similarity calculation processing by the mutual subspace method. The pattern extraction, the pattern normalization, and the similarity calculation are executed stably despite changes in facial direction and expression. In particular, a separability filter that is robust to changes in illumination is used to extract facial parts such as the pupils and nostrils. In this case, the pattern normalization is executed based on the locations of the facial parts so that the normalized pattern does not vary with changes in facial direction or expression. In this method, the nostrils are used as the facial parts.
Therefore, the camera (image input means) is located at the lower part of a display that the user faces in order to capture the nostrils of the user in the image. However, in this method, the following two problems exist.
(3) Concrete or detailed conditions for the location of the camera are not disclosed. Detection of the facial parts is not assured if the camera is located arbitrarily.
(4) No means is disclosed for actively keeping the user in a state in which the facial parts can be stably detected from the input image. As a result, the detection of the facial parts fails because of a caprice or whim of the user.
As mentioned above, in the image pattern recognition method of the prior art, the following two problems occur.
(1) A single recognition object alone is not captured in the image. As a result, a pattern of the recognition object is not correctly extracted by the difference processing only.
(2) A noise area other than the recognition object is included in the difference value because of noise sources such as illumination change. As a result, the pattern of the recognition object is not stably extracted by the difference processing only.
Furthermore, in the person identification method of the prior art, the following two problems occur.
(3) How to locate the camera means so as to assure the extraction of the facial parts is not apparent. As a result, the possibility of failing to extract the facial parts remains.
(4) A target means for leading the user so as to assure the extraction of the facial parts does not exist. As a result, the possibility of failing to extract the facial parts remains.
It is an object of the present invention to provide an image processing apparatus and a method in which the location of the camera means is devised so that the pattern extraction processing in image pattern recognition is executed simply.
It is another object of the present invention to provide an image processing apparatus and a method in which the location of the camera means is devised so that the facial part extraction processing in person identification is executed simply.
According to the present invention, there is provided an image processing apparatus, comprising: image input means for inputting an image of a face of a person to be recognized by using a camera; recognition area detection means for generating a difference image between the input image and a predetermined pattern and for detecting a recognition area whose value is above a threshold from the input image; input data generation means for converting the recognition area to a predetermined input data; and similarity calculation means for calculating a similarity by comparing the predetermined input data with a predetermined dictionary data; wherein a view position of the camera is located lower than a position of the face of the person, and a direction of optical axis of the camera represents an angle of elevation for a horizontal direction from the view position of the camera to the person.
Further in accordance with the present invention, there is also provided an image processing apparatus, comprising: image input means for inputting an image of a face of a person to be recognized by using a camera; face detection means for detecting a face area from the input image; facial part detection means for detecting a plurality of facial parts from the face area; and gaze direction detection means for detecting a gaze direction of the person from the plurality of facial parts; wherein a view position of the camera is located lower than a position of the face of the person, and a direction of optical axis of the camera represents an angle of elevation for a horizontal direction from the view position of the camera to the person.
Further in accordance with the present invention, there is also provided an image processing apparatus, comprising: image input means for inputting an image of a face of a person to be recognized by using a camera; face detection means for detecting a face area from the input image; facial part detection means for detecting a plurality of facial parts from the face area; person identification means for identifying the person by using a facial pattern consisting of the plurality of facial parts; and target means for leading at least one of a gaze direction and a facial position of the person to a predetermined direction or position; wherein a view position of the camera is located lower than a position of the face of the person, and a direction of optical axis of the camera represents an angle of elevation for a horizontal direction from the view position of the camera to the person.
Further in accordance with the present invention, there is also provided an image processing apparatus, comprising: first image input means for inputting a first image of a face of a person to be recognized by using a first camera; second image input means for inputting a second image of the face of the person by using a second camera; face detection means for detecting a face area from the first image; frontal face decision means for deciding whether the second image is a frontal face of the person by referring to the face area; open eyes detection means for detecting a state of open eyes from the face area; and image output means for outputting the second image inputted while the second image is decided to be the frontal face and the state of open eyes is detected; wherein a direction from a view position of the second camera to a center position of the face of the person is a facial front direction, and wherein a view position of the first camera is located lower than a position of the face of the person, and a direction of optical axis of the first camera represents an angle of elevation for a horizontal direction from the view position of the first camera to the person.
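The camera arrangement recited above, in which the view position is lower than the face and the optical axis is elevated from the horizontal, can be illustrated with simple geometry. The sketch below rests on assumptions not stated in the text: the function name and parameters are hypothetical, all lengths are in the same unit, and the optical axis is assumed to point at the face position.

```python
import math

def elevation_angle_deg(camera_height, face_height, horizontal_distance):
    """Angle of elevation of an optical axis aimed from the camera at the face.

    Assumes the camera view position is lower than the face and that
    horizontal_distance is the horizontal separation between camera and person.
    """
    if camera_height >= face_height:
        raise ValueError("camera view position must be lower than the face")
    if horizontal_distance <= 0:
        raise ValueError("horizontal distance must be positive")
    rise = face_height - camera_height
    return math.degrees(math.atan2(rise, horizontal_distance))

# Camera 1.0 above the floor, face 1.5 above the floor, person 0.5 away.
angle = elevation_angle_deg(1.0, 1.5, 0.5)
```

A camera placed closer to the person or further below the face yields a larger angle of elevation.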