1. Field of the Invention
The present invention relates to a target object detecting method and apparatus for detecting a targeted object from a digital image. The present invention also relates to a program for causing a computer to execute the method.
2. Description of the Related Art
A matching-based method is widely used for detecting a predetermined target object from a digital image. In this method, a target object is detected by matching a model of the object to be detected (a template) against the target object in the digital image (template matching). Template matching, however, has many drawbacks. For example, because the object models are fixed, it may not tolerate variations (size, direction, deformation) of the target object in the digital image, as described, for example, in “Image Analysis Handbook”, Takagi and Shimoda, pp. 171-205, 1991, University of Tokyo Press. Consequently, in order to realize detection that is robust against such variations, several detecting methods have been proposed, as described, for example, in “Evaluation of Pattern Description By KL Expansion for Application to Face Image Discrimination”, Akamatsu, et al., NTT Human Interface Laboratory, and “One Method of Face Image Processing Using Hough's Conversion”, Hasegawa and Shimizu, Osaka City University, Shingihou, PR090-153. These methods use KL expansion or the Hough transform to project a digital image into a space in which the characteristics of the target object are handled more easily for detecting the target object. These methods, however, may not fully tolerate variations of the target object, and are therefore used only under certain predefined detection conditions.
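The fixed-model limitation of template matching described above may be illustrated by the following minimal sketch. It is not taken from any of the cited references; the function name `match_template`, the sum-of-squared-differences score, and the toy images are assumptions chosen purely for illustration.

```python
def match_template(image, template):
    """Slide `template` over `image` (both lists of rows of numbers) and
    return (row, col, score) of the window with the smallest sum of
    squared differences. A score of 0 means an exact match."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best


# Illustrative data: a 2x2 bright object and a template of the same size.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 9, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [[9, 9], [9, 9]]
print(match_template(image, template))  # exact match at row 1, column 2

# The same object at a smaller size: the fixed template no longer fits.
small = [
    [0, 0, 0],
    [0, 9, 0],
    [0, 0, 0],
]
print(match_template(small, template))  # best score is well above 0
```

In this sketch the template locates an object of the expected size exactly, whereas an object of a different size yields only a poor match score, which corresponds to the drawback noted above.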
Recently, as a method for solving the problems described above, a method based on a neural network modeled after the image processing of the human brain has been proposed. The neural network method is one of the research approaches to image processing known as the architectural method, within which many studies, such as those on the visual model, learning model, and associative memory model, have been conducted. The method creates an appropriate neural network model in view of known physiological facts and findings, examines the behavior and performance of the created model, and compares them with the actual behavior and performance of the human brain in order to understand the image processing principle of the human brain.
For example, as a cognitive model of the neural network that is robust against variations in the size and location of the target object, the so-called neocognitron is known, as described, for example, in “Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position-Neocognitron”, Kunihiko Fukushima, The Institute of Electronics, Information and Communication Engineers Article A, J62-A (10), pp. 658-665, October, 1979. The neocognitron is based on the principle that pattern matching is performed on small sections of the target object while the displacement is absorbed gradually, step by step, through a hierarchical structure.
As described above, in the neocognitron, the procedure for tolerating the displacement gradually, step by step, not only removes the displacement of an input pattern but also plays an important role in performing pattern recognition that is robust against deformations. That is, the adverse effects of the relative displacement of the local characteristics are gradually absorbed in the course of integrating the characteristics, and eventually an output that is not influenced by considerable deformations of the input pattern may be obtained.
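The stepwise absorption of displacement described above may be sketched in one dimension as follows. This is a toy illustration only and not the network of the cited reference: the functions `detect` and `pool`, the use of a maximum over non-overlapping windows, and the sample signals are all assumptions introduced here.

```python
def detect(signal, pattern):
    """Local matching: 1 where `pattern` occurs at that offset, else 0."""
    n = len(pattern)
    return [1 if signal[i:i + n] == pattern else 0
            for i in range(len(signal) - n + 1)]


def pool(responses, width):
    """Maximum over non-overlapping windows of `width` responses.
    Each stage discards a little positional detail."""
    return [max(responses[i:i + width])
            for i in range(0, len(responses), width)]


pattern = [1, 1]
a = [1, 1, 0, 0, 0, 0, 0, 0, 0]  # pattern at offset 0
b = [0, 1, 1, 0, 0, 0, 0, 0, 0]  # same pattern shifted by one
c = [0, 0, 0, 1, 1, 0, 0, 0, 0]  # same pattern shifted by three

# Stage 1 absorbs the small shift: a and b agree, c still differs.
stage1 = [pool(detect(s, pattern), 2) for s in (a, b, c)]
print(stage1)

# Stage 2 absorbs the larger shift: all three inputs now agree.
stage2 = [pool(s, 2) for s in stage1]
print(stage2)
```

In this sketch, the raw detector responses differ for every shift, the first pooling stage removes the shift of one position, and the second stage removes the shift of three positions. Because the windows do not overlap, the tolerated shift in this toy depends on window alignment; it is intended only to convey the idea of absorbing displacement hierarchically.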
As for the learning model, Kohonen's self-organization mapping is known from “Self-Organization and Associative Memory”, T. Kohonen, Springer-Verlag, 1984. Kohonen's self-organization mapping is a model in which a topological mapping is learned through self-organization. The topological mapping means, for example, the process of allocating a signal received by a human being from outside, i.e., a certain pattern, to neurons of the cortex in a manner that reflects the order of the patterns according to a certain rule.
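A learning rule of this general kind may be sketched as follows for a one-dimensional chain of nodes and scalar inputs. This is a minimal sketch under stated assumptions rather than the formulation of the cited book: the function names, the Gaussian neighbourhood, the linear decay schedules, and all parameter values are hypothetical choices made for illustration.

```python
import math
import random


def best_matching_unit(weights, x):
    """Index of the node whose weight is closest to the input x."""
    return min(range(len(weights)), key=lambda i: abs(weights[i] - x))


def train_som(data, n_nodes=5, epochs=50, lr0=0.5, sigma0=2.0, seed=0):
    """Train a 1-D chain of nodes on scalar inputs in [0, 1]."""
    rng = random.Random(seed)
    weights = [rng.random() for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # learning rate decays
        sigma = max(sigma0 * (1.0 - frac), 0.3)   # neighbourhood shrinks
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i in range(n_nodes):
                # Nodes near the winner on the chain are pulled more strongly.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                weights[i] += lr * h * (x - weights[i])
    return weights


data = [0.0, 0.25, 0.5, 0.75, 1.0]
weights = train_som(data)
print(weights)
print(best_matching_unit(weights, 0.9))
```

Each input pulls the winning node and its neighbours on the chain toward itself, so that no teaching signal is required; the allocation of inputs to nodes emerges from the data alone.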
As one example of a system that utilizes Kohonen's self-organization, an experiment in which binary images are learned by a hardware system is reported in “Self-organizing Optical Neural Network for Unsupervised Learning”, Taiwei Lu, et al., Optical Engineering, Vol. 29, No. 9, 1990.
Meanwhile, processes for detecting a predetermined target object are performed in many areas. One such area is identification photographs. In applying for a passport or certificate, or in providing a personal resume, submission of a photograph of the applicant's face having a predetermined size, i.e., an ID photograph, is often required. For this reason, automatic ID photograph creation systems have been used. Such a system has a photo studio in which the user sits on a chair to have his/her face photographed (facial photograph image), and an ID photograph sheet on which facial photograph images for identification are recorded is created automatically. Such a system, however, is large, and its installation sites are limited, so that the user must find an installation site of the automatic ID photograph creation system in order to obtain an ID photograph, which is inconvenient for the user.
A method for solving the problem described above is proposed as described, for example, in Japanese Unexamined Patent Publication No. 11 (1999)-341272. The method provides an ID photograph by the following steps. First, a facial photograph image to be used for the ID photograph is displayed on a display, such as a monitor. Then, the operator indicates the top of the head and the tip of the jaw of the facial photograph image on the screen to instruct a computer to create the ID photograph. The computer, in turn, obtains a scaling rate and the position of the face based on the two positions indicated by the operator and the output specification of the ID photograph, and enlarges/reduces the image accordingly. Then, the computer trims the enlarged/reduced image so that the face in the enlarged/reduced image is placed at a predetermined location. In this way, the user may ask DPE shops, which are more widely available than the automatic ID photograph creation systems, to create the ID photograph. In addition, the user may bring to a DPE shop a photographic film or a recording medium from his/her own stock on which a favorite photo image is recorded in order to have the ID photograph created from the favorite photo image.
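The scaling and trimming computation outlined above may be sketched as follows. The cited publication does not disclose its formulas here, so this is only an assumed arrangement: the function `trimming_area` and its parameters (`out_w`, `out_h`, `face_h`, `top_margin`, all in output pixels) are hypothetical, and the face is assumed to be centered horizontally on the two indicated points.

```python
def trimming_area(top, chin, out_w, out_h, face_h, top_margin):
    """top, chin: (x, y) points indicated by the operator, in source pixels.
    out_w, out_h: size of the ID photograph in output pixels (assumed spec).
    face_h: required head-top-to-jaw distance in output pixels (assumed spec).
    top_margin: required space above the head in output pixels (assumed spec).
    Returns (scale, (left, upper, right, lower)), the box in source pixels."""
    src_face_h = chin[1] - top[1]
    if src_face_h <= 0:
        raise ValueError("the jaw must lie below the top of the head")
    scale = face_h / src_face_h          # enlargement/reduction rate
    center_x = (top[0] + chin[0]) / 2.0  # horizontal center of the face
    crop_w = out_w / scale               # box size before scaling
    crop_h = out_h / scale
    left = center_x - crop_w / 2.0
    upper = top[1] - top_margin / scale
    return scale, (left, upper, left + crop_w, upper + crop_h)


# Illustrative values only.
scale, box = trimming_area(top=(500, 200), chin=(500, 600),
                           out_w=300, out_h=400, face_h=200, top_margin=50)
print(scale, box)  # 0.5 (200.0, 100.0, 800.0, 900.0)
```

In this sketch the trimming box is expressed in the coordinates of the original image; cutting out this box and then scaling it by the computed rate gives the same result as scaling first and trimming afterwards.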
However, in the method described above, the operator must perform the troublesome task of indicating the top of the head and the tip of the jaw of the facial photograph image displayed on a display screen. This is especially burdensome for an operator who handles ID photographs of many customers. Further, if the face region of the facial photograph image displayed on the display screen is small, or the resolution of the facial photograph image is coarse, the operator may be unable to indicate the top of the head and the tip of the jaw quickly and accurately, so that an appropriate ID photograph may not be provided promptly.
Consequently, many methods for setting the trimming area promptly and accurately to reduce the burden on the operator have been proposed. For example, an automatic trimming method is proposed in U.S. Patent Application Publication No. 20020085771. In this method, the top of the head and the eyes in a facial photograph image are located, and the trimming area is set by determining the position of the jaw based on the positions of the top of the head and the eyes. The most important process in automatic trimming is the detection of the regions used for setting the trimming area. These regions, i.e., the target objects for detection, may be, for example, the positions of the top of the head and the eyes, the entire face portion, both pupils, or a combination thereof, as described in U.S. Patent Application Publication No. 20020085771.
The various target object detection methods described above are used for detecting these regions.
In detecting a predetermined target object from a digital image, however, the target object included in the digital image is not always detected. For example, in detecting a face from a facial photograph image, a face having standard characteristics may readily be detected, but a face having a certain specific characteristic (a face with eyeglasses, a heavily whiskered face, a face with a unique hairstyle, etc.) is difficult to detect. The reason for this is that standard faces predominate among the faces found in the world, and face detection algorithms are therefore designed based on the standard faces.
On the other hand, if face detection algorithms capable of detecting faces having specific characteristics, including those described above, are incorporated in the face detection process in addition to a face detection algorithm for detecting standard faces, a huge amount of calculation is required and the detection accuracy for standard faces may be degraded, which is like “putting the cart before the horse”.