1. Field of the Invention
This invention relates to image signal processing, and more particularly to an apparatus and method for interpreting and extracting the features of human faces represented in images input through a camera sensor or in video images, in order to detect the position of a human face within the images.
2. Description of the Related Art
Recently, in the field of artificial intelligence, attention and study have focussed on implanting the recognition capability of human beings into a computer in order to endow the computer or machine with intelligence. In particular, face recognition technology using the human vision system has been actively and widely studied throughout the fields related to computer vision and image processing, such as pattern recognition and facial expression investigation. A technique for detecting faces and facial areas is highly regarded in various applied fields such as facial expression research, driver drowsiness detection, entrance/exit control, and image indexing. Humans easily detect a facial area even in varied and dynamic environments, whereas computers find this difficult even in a relatively simple image environment.
Representative approaches among previously proposed facial area detection methods include a method using a neural network (U.S. Pat. No. 5,680,481), a method using the statistical features of facial brightness, such as a principal component analysis of brightness (U.S. Pat. No. 5,710,833), and a matching method proposed by T. Poggio (IEEE Transactions on Pattern Analysis and Machine Intelligence 20, 1998). In order to employ an extracted face candidate image as the input of a face recognition system, a means of detecting the exact position of facial components or facial features within the extracted face candidate region is required. In other words, in order to compare an input image with a model, position extraction and a size normalizing process, which compensate for differences in size, angle, and orientation between the facial image extracted from the input image and the facial image of the model template, are prerequisites for enhanced recognition and matching efficiency. In most face recognition systems, an eye area or the central area of a pupil is used as the reference facial component in the alignment and normalizing processes. This is because the features of the eye area remain relatively unchanged compared with those of other facial components, even if a change occurs in the size, expression, attitude, lighting, etc., of a facial image.
Many studies on detecting the eye area or the central position of the pupil in an image are ongoing. Conventional face recognition systems mainly adopt a pupil detection method. A representative pupil detection method makes eye templates of various sizes, forms a Gaussian pyramid of the input image, and applies normalized correlation at all locations within the input image. Furthermore, U.S. Pat. No. 5,680,481, Moghaddam (IEEE TPAMI 19, 1997), and T. Poggio (IEEE TPAMI 20, 1998) show a method in which eigen matrices for the eye, nose, and mouth areas are provided according to the size of a template, and features of interest are searched for through comparison with an input image in all areas within the template image. A problem common to both methods is that, since no information on the size, orientation, or location of eye or nose features is available in the input image, all areas of the image have to be searched with several model templates classified by size or orientation. This not only causes excessive computation, but also requires determining a threshold value for defining each area and produces excessive false alarms, which makes application to an actual system difficult.
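The cost of the exhaustive search described above can be illustrated with a minimal sketch of normalized correlation evaluated at every location of a grayscale image. This is an illustration only and is not taken from any of the cited references: the function names normalized_correlation and search_template are hypothetical, the image is assumed to be a list of rows of brightness values, and the Gaussian pyramid and the multiple template sizes are omitted. In a full system the double loop below would be repeated for every template size and every pyramid level.

```python
import math


def normalized_correlation(patch, template):
    """Normalized correlation coefficient of two equal-length flat lists."""
    n = len(template)
    mean_p = sum(patch) / n
    mean_t = sum(template) / n
    num = sum((p - mean_p) * (t - mean_t) for p, t in zip(patch, template))
    den = math.sqrt(sum((p - mean_p) ** 2 for p in patch)
                    * sum((t - mean_t) ** 2 for t in template))
    # A flat patch or flat template has no defined correlation; score it as 0.
    return num / den if den > 0 else 0.0


def search_template(image, template):
    """Slide the template over every position and keep the best score."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    flat_t = [v for row in template for v in row]
    best_score, best_pos = -2.0, None
    for y in range(ih - th + 1):          # every row offset
        for x in range(iw - tw + 1):      # every column offset
            patch = [image[y + dy][x + dx]
                     for dy in range(th) for dx in range(tw)]
            score = normalized_correlation(patch, flat_t)
            if score > best_score:
                best_score, best_pos = score, (y, x)
    return best_pos, best_score


# Toy data: a 2x2 pattern embedded at row 1, column 2 of a 5x5 image.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 10, 200, 0],
    [0, 0, 200, 10, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
template = [[10, 200], [200, 10]]
position, score = search_template(image, template)
```

Even in this toy case every one of the 16 candidate positions is scored, and a threshold on the score would still have to be chosen to decide whether the best match is an eye at all.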
U.S. Pat. No. 5,832,115 discloses that templates having two concentric ellipses may be used to detect a facial location by evaluating the size of the edge contours which encircle the face in the region between the two ellipses. However, even in this case, the same problem occurs in that the size and orientation of the elliptical template have to be determined and the template has to be searched over all areas within the image.
In order to overcome such problems in facial location detection, many recent studies have focussed on the use of color images. Based on the fact that, in most color images, the color value of a face or skin approximates a general statistical value, extracting candidate facial areas by detecting skin color has become the mainstream approach (see J. Rehg (COMPAQ TR CRL9811, 1998) and references therein). Recently, these studies have been successfully applied to color indexing and to facial tracking and extraction. However, facial position extraction by color is greatly affected by image acquisition conditions such as the camera which acquires the image, the illumination color, and the surface and state of the object. For example, two different cameras give different color values even in the same environment and for the same person, and in particular, a face or skin color value changes significantly depending on illumination. When the image acquisition conditions are unknown, it is difficult to determine the range of skin color values that identifies only the face color region. Furthermore, determining only the facial areas among the widely extracted regions of similar skin color, which include background regions, is not only a difficult task but also requires many subsequent processes.
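The sensitivity of color-based detection to illumination can be illustrated with a minimal sketch that classifies a pixel as skin when its normalized chromaticity falls inside a fixed range. Everything specific here is an assumption made for illustration: the function names, the chromaticity ranges, the sample pixel, and the per-channel illuminant gains are hypothetical and are not taken from the cited report or from any measured data.

```python
def chromaticity(rgb):
    """Return normalized (r, g) chromaticity of an (R, G, B) pixel."""
    total = sum(rgb)
    if total == 0:
        return 0.0, 0.0
    return rgb[0] / total, rgb[1] / total


def is_skin(rgb, r_range=(0.38, 0.50), g_range=(0.27, 0.35)):
    """Classify a pixel as skin using fixed, illustrative chromaticity ranges."""
    r, g = chromaticity(rgb)
    return r_range[0] <= r <= r_range[1] and g_range[0] <= g <= g_range[1]


def relight(rgb, gains):
    """Apply hypothetical per-channel illuminant gains to a pixel."""
    return tuple(round(c * k) for c, k in zip(rgb, gains))


# The same skin pixel under the original light and under a bluish light.
skin_pixel = (200, 140, 120)
bluish_pixel = relight(skin_pixel, (0.7, 1.0, 1.3))   # -> (140, 140, 156)

detected_original = is_skin(skin_pixel)
detected_bluish = is_skin(bluish_pixel)
```

With these assumed values the pixel is accepted under the original light and rejected under the bluish light, although it belongs to the same surface. Widening the ranges to cover both cases would admit more background pixels, which is the difficulty noted above.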
To solve the above problem, it is an objective of the present invention to provide an apparatus which is capable of accurately and quickly detecting a speaking person's eye and face position, and which is tolerant of image noise.
It is another objective of the present invention to provide a method of accurately and quickly detecting a speaking person's eye and face.
Accordingly, to achieve the above objective, an apparatus for detecting a speaking person's eye and face according to an embodiment of the present invention includes: an eye position detecting means for detecting pixels having a strong gray characteristic in an input red, green, and blue (RGB) image, and determining, among areas formed by the detected pixels, areas having locality and texture characteristics as eye candidate areas; a face position determining means for creating search templates by matching a model template to two areas extracted from the eye candidate areas, and determining an optimum search template among the created search templates by using a value obtained by normalizing the sum of a probability distance for the chromaticity of pixels within the area of a search template, and horizontal edge sizes calculated at the positions of the left and right eyes, a mouth, and a nose estimated by the search template; and an extraction position stabilizing means for forming a minimum boundary rectangle from the optimum search template, increasing count values corresponding to the minimum boundary rectangle area and reducing count values corresponding to the area other than the minimum boundary rectangle area, among count values of individual pixels stored in a shape memory, and outputting the area in which count values above a predetermined value are positioned as eye and face areas.
To achieve another objective of the present invention, a method of detecting a speaking person's eye and face includes the steps of: detecting pixels having a strong gray characteristic in an input red, green, and blue (RGB) image, and determining, among areas formed by the detected pixels, areas having locality and texture characteristics as eye candidate areas; creating search templates by matching a model template to two areas extracted from the eye candidate areas, and determining an optimum search template among the created search templates by using a value obtained by normalizing the sum of a probability distance for the chromaticity of pixels within the area of a search template, and horizontal edge sizes at the positions of the left and right eyes, a mouth, and a nose estimated by the search template, in the RGB image; and forming a minimum boundary rectangle from the optimum search template, increasing count values corresponding to the minimum boundary rectangle area and reducing count values corresponding to the area other than the minimum boundary rectangle area, among count values of individual pixels stored in a shape memory, and outputting the area in which count values above a predetermined value are positioned as eye and face areas.
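The count-value update performed on the shape memory may be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names update_shape_memory and stable_area, the unit increment and decrement, the clamping of counts to the range 0 to max_count, and the representation of the minimum boundary rectangle as inclusive (top, left, bottom, right) pixel coordinates are all hypothetical.

```python
def update_shape_memory(memory, mbr, step_up=1, step_down=1, max_count=10):
    """Raise counts inside the rectangle and lower counts everywhere else.

    memory: list of rows of per-pixel count values, modified in place.
    mbr: (top, left, bottom, right), inclusive pixel coordinates (assumed form).
    """
    top, left, bottom, right = mbr
    for y, row in enumerate(memory):
        for x, count in enumerate(row):
            if top <= y <= bottom and left <= x <= right:
                row[x] = min(count + step_up, max_count)   # clamp is an assumption
            else:
                row[x] = max(count - step_down, 0)         # clamp is an assumption
    return memory


def stable_area(memory, threshold):
    """Mark pixels whose count value is above the predetermined value."""
    return [[1 if count > threshold else 0 for count in row] for row in memory]


# Three frames agree on the same rectangle; a fourth frame yields a spurious one.
memory = [[0] * 4 for _ in range(4)]
for _ in range(3):
    update_shape_memory(memory, (1, 1, 2, 2))
update_shape_memory(memory, (0, 0, 0, 0))   # spurious single-pixel detection

mask = stable_area(memory, threshold=1)
```

In this toy run the consistently detected area keeps a count above the threshold after the spurious frame, while the spuriously detected pixel does not reach it, which is the kind of tolerance to image noise that the stabilizing step is intended to provide.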