1. Field of the Invention
The present invention relates to image processing of an image from a camera or the like, and specifically to the field of image recognition for extracting a human face from an image.
2. Description of the Background Art
TV conference systems that allow a plurality of persons at remote sites to hold a conference over a communication link have been brought into practical use. Such a system, however, has the problem that transmitting the video itself increases the amount of transmitted data. To address this problem, techniques have been studied for extracting feature data such as eye direction, face direction, and facial expression of the target person at each remote site, and transmitting only the extracted data between the sites. At the receiving side, an image of a virtual human face is created from the data and displayed. The TV conference can thus be carried out efficiently with a reduced amount of transmitted data.
Further, such a technique for detecting a person from an image has also been widely studied as a technique essential to development in fields such as human-computer interaction, gesture recognition, and security.
These applications of the human detection technique require a stable system that satisfies three conditions: 1) a high detection rate, 2) robustness to variation in the illumination environment, and 3) real-time operation. Further, the need for real-time human detection in high-quality images (images having a large number of pixels per screen) is expected to increase, and therefore faster human detection algorithms will be required.
For human detection, an effective scheme is to detect the face first. The face carries important information such as expression, and once the face is detected, estimating and searching for the positions of the arms and legs become easier.
There have been many reports on face detection systems using skin-color information, as disclosed in Japanese Patent Laying-Open No. 2001-52176 and in the following References 1-4.
Reference 1: Shinjiro Kawato and Nobuji Tetsutani, “Real-time Detection of Between-the-Eyes with a Circle-Frequency Filter”, Journal of IEICE, Vol. J84-DII, No. 12, pp. 2577-2584, December 2001.
Reference 2: Shinjiro Kawato and Nobuji Tetsutani, “Two-step Approach for Real-time Eye Tracking”, Technical Reports of IEICE, PRMU2000-63, pp. 15-22, September 2000.
Reference 3: D. Chai and K. N. Ngan, “Face Segmentation Using Skin-Color Map in Videophone Applications”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 9, No. 4, pp. 551-564, June 1999.
Reference 4: J. Yang and A. Waibel, “A Real-time Face Tracker”, Proceedings of 3rd IEEE Workshop on Application of Computer Vision, pp. 142-147, December 1996.
According to these schemes, a skin-color region is extracted from an image to determine a face candidate region. Because the face candidate region is limited, the range to be processed is limited and the amount of computation can be reduced significantly, which makes it possible to build a fast system. Schemes using color information, however, are susceptible to variation in the illumination environment, and stable performance cannot be expected in a general environment.
On the other hand, as face detection schemes that do not use color information (but use brightness information), numerous schemes employing template matching or learning methods such as neural networks have been reported, as shown in References 5 and 6 below. These schemes are characterized by a high detection rate and robustness to the illumination environment. For example, the technique disclosed in Reference 5 applies a neural network to realize an extremely high detection rate.
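The idea of limiting the search to a skin-color region can be sketched as follows. This is only an illustration, not the method of any of the cited references: the chroma thresholds CB_RANGE and CR_RANGE, the full-range BT.601 conversion, and the function names are all assumptions chosen for the example.

```python
def rgb_to_cbcr(r, g, b):
    """Convert 8-bit RGB to Cb and Cr chroma (full-range BT.601 coefficients)."""
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


# Assumed chroma ranges for skin color; illustrative values only.
CB_RANGE = (77, 127)
CR_RANGE = (133, 173)


def is_skin(pixel):
    """Return True if an (r, g, b) pixel falls inside the assumed chroma ranges."""
    cb, cr = rgb_to_cbcr(*pixel)
    return CB_RANGE[0] <= cb <= CB_RANGE[1] and CR_RANGE[0] <= cr <= CR_RANGE[1]


def face_candidate_box(image):
    """Return (top, left, bottom, right) bounding all skin-colored pixels, or None.

    image is a list of rows, each row a list of (r, g, b) tuples.
    """
    coords = [
        (y, x)
        for y, row in enumerate(image)
        for x, pixel in enumerate(row)
        if is_skin(pixel)
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return min(ys), min(xs), max(ys), max(xs)
```

Later processing would then be applied only inside the returned box. Because the decision depends on fixed chroma ranges, a change in the color of the illumination can move real skin pixels outside those ranges, which is the weakness noted above.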
Reference 5: H. Rowley, S. Baluja, and T. Kanade, “Neural Network-Based Face Detection”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, January 1998.
Reference 6: E. Hjelmas and B. K. Low, “Face Detection: A Survey”, Journal of Computer Vision and Image Understanding, 83(3), pp. 236-274, 2001.
These schemes, however, must match the whole image against a template (a model) while varying the size, and hence involve the problem that the amount of computation is large. Because the amount of computation increases drastically as the number of pixels grows, building a real-time system is very difficult.
In the technique disclosed in Reference 7 shown below, a face is detected using bright-dark relations of mean brightness among segmented regions. The regions are distributed from the forehead to the chin in 16 segments, and the technique is therefore easily affected by the hairstyle or a beard.
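The growth in computation can be made concrete by counting how many template placements an exhaustive multi-scale search has to evaluate. The sketch below is a rough model only; the 20-pixel template, the 1.2 scale step, the one-pixel stride, and the function name are assumptions for illustration and are not taken from the cited references.

```python
def count_window_positions(width, height, template=20, scale_step=1.2, stride=1):
    """Count template placements over an image pyramid.

    The image is repeatedly shrunk by scale_step until it is smaller than
    the template; at each level the template is slid over every position
    allowed by the stride.
    """
    total = 0
    w, h = width, height
    while w >= template and h >= template:
        cols = (w - template) // stride + 1
        rows = (h - template) // stride + 1
        total += cols * rows
        # Move to the next, coarser pyramid level.
        w = int(w / scale_step)
        h = int(h / scale_step)
    return total


small = count_window_positions(320, 240)
large = count_window_positions(640, 480)
print(small, large)
```

Under these assumptions, doubling the image width and height multiplies the number of placements by more than three, and every placement requires a full template comparison.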
Reference 7: Brian Scassellati, “Eye Finding via Face Detection for a Foveated, Active Vision System”, Proceedings of AAAI-98, pp. 969-976, 1998.
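A detector based on bright-dark relations among region means can be sketched as below. The region layout and the list of relations are hypothetical and far simpler than the 16-segment arrangement described above; they are chosen only to show why a change in one region, such as hair over the forehead, can defeat the test.

```python
def mean_brightness(image, top, left, bottom, right):
    """Mean grayscale value over rows top..bottom-1 and columns left..right-1."""
    values = [v for row in image[top:bottom] for v in row[left:right]]
    return sum(values) / len(values)


def satisfies_relations(image, regions, relations):
    """Check that every (brighter, darker) pair of named regions holds.

    regions maps a name to (top, left, bottom, right);
    relations is a list of (brighter_name, darker_name) pairs.
    """
    means = {name: mean_brightness(image, *box) for name, box in regions.items()}
    return all(means[brighter] > means[darker] for brighter, darker in relations)


# Hypothetical layout on a 4x6 grayscale patch (illustration only).
REGIONS = {
    "forehead": (0, 0, 2, 6),
    "left_eye": (2, 0, 4, 2),
    "nose": (2, 2, 4, 4),
    "right_eye": (2, 4, 4, 6),
}
RELATIONS = [
    ("forehead", "left_eye"),
    ("forehead", "right_eye"),
    ("nose", "left_eye"),
    ("nose", "right_eye"),
]

eye_rows = [[50, 50, 180, 180, 50, 50] for _ in range(2)]
plain_face = [[200] * 6 for _ in range(2)] + eye_rows
hair_over_forehead = [[30] * 6 for _ in range(2)] + eye_rows

print(satisfies_relations(plain_face, REGIONS, RELATIONS))
print(satisfies_relations(hair_over_forehead, REGIONS, RELATIONS))
```

In this toy layout the second patch fails because the darkened forehead no longer exceeds the eye regions in mean brightness, even though the eye and nose regions are unchanged.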
The technique disclosed in the above-mentioned Japanese Patent Laying-Open No. 2001-52176 focuses on the middle point between the eyes (hereinafter referred to as Between-the-Eyes) as a stable feature point of the face. Specifically, the vicinity of Between-the-Eyes forms a pattern in which the forehead and the nose bridge are relatively bright, while the eyes and the eyebrows on either side are dark. A circle frequency filter is employed to detect this pattern.
The circle frequency filter, however, has two problems: pre-processing that extracts a skin-color region to limit the search region is required, and a face with hair covering the eyebrows cannot be detected, because the pattern described above does not appear in it.
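The bright and dark pattern around Between-the-Eyes alternates twice along a circle centered on that point (bright forehead, dark eye, bright nose bridge, dark eye). One plausible way to measure such a pattern is to sample brightness on the circle and take the strength of the Fourier component with two cycles per revolution. The sketch below follows that reading; the present text does not define the filter, so the sampling scheme, the parameters, and the function name are assumptions.

```python
import cmath
import math


def circle_frequency_response(image, cy, cx, radius, n_samples=36, frequency=2):
    """Squared magnitude of one Fourier coefficient of brightness on a circle.

    image is a list of rows of grayscale values. Samples are taken at the
    nearest pixel to each of n_samples equally spaced points on the circle
    of the given radius around (cy, cx). The caller must keep the circle
    inside the image.
    """
    coeff = 0j
    for k in range(n_samples):
        angle = 2 * math.pi * k / n_samples
        # Angle 0 points up (toward the forehead in an upright face).
        y = int(round(cy - radius * math.cos(angle)))
        x = int(round(cx + radius * math.sin(angle)))
        coeff += image[y][x] * cmath.exp(-1j * frequency * angle)
    return abs(coeff) ** 2


def synthetic_patch(size=21, bright=200, dark=50):
    """Toy patch: bright above and below the center, dark to the left and right."""
    c = size // 2
    return [
        [bright if abs(y - c) >= abs(x - c) else dark for x in range(size)]
        for y in range(size)
    ]


patch = synthetic_patch()
flat = [[100] * 21 for _ in range(21)]
print(circle_frequency_response(patch, 10, 10, 8))
print(circle_frequency_response(flat, 10, 10, 8))
```

On the toy patch the response is large, while on the uniform patch it is essentially zero. If hair darkens the upper part of the circle, the two-cycle alternation disappears, which is consistent with the limitation described above.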