In recent years, face detection techniques have been developed that detect the position and direction of a face, and the state of face parts such as the eyes and the mouth, in a captured still image or moving image. For example, in a vehicle, inattentive driving or dozing off while driving can be detected by detecting the driver's face, so that a predetermined action such as triggering an alarm can be performed. Such detection requires the face to be detected in real time, but inside a vehicle the intensity and direction of light tend to change, and the driver's face tends to move due to shaking of the car body. For this reason, face detection in a vehicle in particular is required both to allow processing in real time and to be resistant to noise caused by changes in light, in the state of the face, and the like.
Stan Z. Li, Anil K. Jain, “Handbook of Face Recognition”, Springer, 2011, pp. 124 to 133 (Non-patent Reference 1) discloses a face detection technique (active shape model: ASM, or active appearance model: AAM) in which a model of the face in an image is generated by fitting a statistical face shape model to the face in the image, that is, by performing model fitting using a steepest descent method or the like. With this technique, once a model of the face in an image has been generated, the model is continuously fitted to the face in subsequent images, that is, tracking is performed, so that the position and direction of the face or the state of a face part can be estimated over time.
JP 2008-192100A (Reference 1) discloses a face part detection technique in which changes in an image due to blinking are detected as moving regions by using a difference image between frames, and the moving regions having the largest area are specified as eye positions. In general methods of detecting eye positions, eyeglass frames or the eyebrows are frequently detected erroneously as eye positions. By using the technique of Reference 1, however, it is possible to prevent eyeglass frames or the eyebrows from being erroneously detected as eye positions and thus to detect eye positions with high accuracy.
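The frame-difference idea described for Reference 1 can be illustrated with a minimal sketch. It is not the implementation of Reference 1: the function names, the fixed threshold of 30, the use of 4-connectivity, and the representation of a frame as a list of rows of grayscale values are all assumptions made for illustration only.

```python
from collections import deque


def difference_mask(prev_frame, curr_frame, threshold):
    """Mark pixels whose absolute intensity change exceeds the threshold."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def moving_regions(mask):
    """Group marked pixels into 4-connected regions (lists of (row, col))."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    regions = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected region.
            seen[r][c] = True
            queue = deque([(r, c)])
            region = []
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def largest_region_centers(prev_frame, curr_frame, threshold=30, count=2):
    """Return centroids (row, col) of the `count` largest moving regions.

    Assumption for illustration: the largest regions are taken as
    eye-position candidates, with no further validation.
    """
    mask = difference_mask(prev_frame, curr_frame, threshold)
    regions = sorted(moving_regions(mask), key=len, reverse=True)[:count]
    return [
        (sum(y for y, _ in reg) / len(reg), sum(x for _, x in reg) / len(reg))
        for reg in regions
    ]
```

Because every pixel that changes between frames is marked, a change in illumination or a movement of the whole face produces large moving regions that are unrelated to blinking, and a sketch of this kind cannot distinguish them from the eyes.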
In the technique disclosed in Non-patent Reference 1, the accuracy of the model fitting is greatly influenced by the initial state of the model, that is, where the model is initially disposed in the image and to what angle and shape the model is set. If the initial state of the model differs widely from the actual state of the face, the model fitting calculation intended to generate a model fitted to the actual face may end at a local optimum solution, so that the model converges while deviating from the actual face. This is referred to as a fitting error, and if a fitting error occurs, the accuracy of the model of the face is reduced. In particular, the positions of the eyes of the model tend to converge erroneously on the positions of eyeglass frames or the eyebrows. In addition, a deviation may occur between the model and the actual face during tracking of the model, which also reduces the accuracy of the model. Non-patent Reference 1 does not suggest a method of correcting the model in this case.
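The dependence of a steepest descent calculation on its initial state can be seen in a one-dimensional toy example. This is only an illustrative sketch and is unrelated to the actual ASM or AAM cost function: the objective function, the step size, and the iteration count are assumptions chosen so that one global minimum lies near x = -1 and one local minimum lies near x = +1.

```python
def objective(x):
    """Toy cost with a global minimum near x = -1 and a local one near x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    """Derivative of the toy cost with respect to x."""
    return 4.0 * x * (x * x - 1.0) + 0.3


def steepest_descent(x0, step=0.01, iterations=2000):
    """Repeatedly move against the gradient, starting from x0."""
    x = x0
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


# An initial value on the same side as the global minimum reaches it.
good_fit = steepest_descent(-2.0)

# An initial value far from the global minimum settles in the local
# minimum instead, which corresponds to a fitting error.
fitting_error = steepest_descent(2.0)

print(good_fit, objective(good_fit))
print(fitting_error, objective(fitting_error))
```

Both runs stop at a point where the gradient is essentially zero, so the calculation itself gives no indication that the second result is the worse of the two.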
In the technique disclosed in Reference 1, since a difference image is easily influenced by noise caused by changes in the intensity and direction of light and by movement of the face, it is difficult to detect eye positions continuously at all times by using only the difference image. For this reason, in the technique disclosed in Reference 1, there is a concern that eye positions may not be detected while images containing a large amount of noise continue in succession, and that the accuracy of a model may not be maintained over time.