The present invention pertains generally to the field of image processing, and in particular, the invention relates to a system and method for determining information related to features of an object in a digital image using models.
Systems and methods are known that analyze digital images and recognize, for example, human faces. Determining the existence of a human face in an image or extracting facial feature information has been used for various applications such as in automated/surveillance systems, monitoring systems, human interfaces to computers, television and video signal analysis.
Conventional detection systems use methods such as color tone detection, template matching or edge detection approaches. There are, however, numerous shortcomings to these types of conventional systems. In general, they lack robustness, e.g., due to variations in human races, facial expression and lighting conditions.
More particularly, in systems using template matching, facial templates are first determined based upon average positions of facial features (i.e., eyes, nose and mouth) for a particular sex or race. A digital image is then matched to a template to identify sex or race. One shortcoming of this type of system is that expressions, e.g., a smile, may cause the wrong template to be used, which leads to incorrect results. Another shortcoming of this method is that the exact positions of facial features such as the eyes and nose are not actually determined. Facial color tone detection and template matching typically only determine whether a human face is present in an image.
Conventional edge detection approaches, however, are known to locate the position of eyes. Edge detection approaches are effective in this application because the eyes typically have high edge density values. However, eyeglasses and facial hair such as a mustache may cause erroneous results. In addition, edge detection cannot typically be used to determine the position of other facial features such as a nose. Such edge detection approaches are also slow because a global search/operation must be performed on the entire image.
This delay or slow processing degrades, for example, video/image communication applications over the Internet or Public Switched Telephone Network (PSTN). In conventional video/image communication technology, a picture (in a JPEG, MPEG or GIF format) is captured and then transmitted over a transmission network. This approach, however, requires a large bandwidth because of the size (i.e., the amount of data) of the information.
Methods have been used to improve video/image communication and/or to reduce the amount of information required to be transmitted. One method is called model-based coding. Low bit-rate communication can be achieved by encoding and transmitting only representative parameters of an object in an image. At the remote site, the object is synthesized using the transmitted parameters.
One of the most difficult problems in model-based coding is providing feature correspondence quickly, easily and robustly. In sequential frames, the same features must be matched correctly. Conventionally, a block-matching process is used to compare pixels in a current frame and a next frame to determine feature correspondence. If the entire frame is searched for feature correspondence, the process is slow and may yield incorrect results due to mismatching of regions having the same gradient values. If only a subset of the frame is searched, the processing time may be improved. However, in this situation, the process may fail to determine any feature correspondence.
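The trade-off described above can be illustrated with a minimal sketch of block matching. It assumes grayscale frames stored as lists of rows and uses the sum of absolute differences (SAD) as the matching cost; the function names, the `radius` parameter and the SAD cost are illustrative assumptions, not details taken from this disclosure.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
    return total


def match_block(current, nxt, x, y, size, radius=None):
    """Find the block in `nxt` that best matches the block at (x, y) in `current`.

    radius=None searches the entire next frame (slow, exhaustive).
    An integer radius restricts the search to a window around (x, y),
    which is faster but can miss a feature that moved outside the window.
    Returns (best_x, best_y, cost); ties keep the first candidate found.
    """
    height, width = len(nxt), len(nxt[0])
    if radius is None:
        xs = range(0, width - size + 1)
        ys = range(0, height - size + 1)
    else:
        xs = range(max(0, x - radius), min(width - size, x + radius) + 1)
        ys = range(max(0, y - radius), min(height - size, y + radius) + 1)
    best = None
    for by in ys:
        for bx in xs:
            cost = sad(current, nxt, x, y, bx, by, size)
            if best is None or cost < best[0]:
                best = (cost, bx, by)
    return best[1], best[2], best[0]


def make_frame(width, height, bx, by):
    """Hypothetical test frame: black background with one 2 x 2 textured patch."""
    frame = [[0] * width for _ in range(height)]
    patch = [[10, 20], [30, 40]]
    for dy in range(2):
        for dx in range(2):
            frame[by + dy][bx + dx] = patch[dy][dx]
    return frame


current = make_frame(8, 8, 1, 1)   # feature at (1, 1)
nxt = make_frame(8, 8, 5, 4)       # same feature moved to (5, 4)
full = match_block(current, nxt, 1, 1, 2)            # exhaustive search
local = match_block(current, nxt, 1, 1, 2, radius=1) # restricted search
```

In this toy case the exhaustive search recovers the moved patch with zero cost, while the restricted search never visits the patch's new position and returns a non-zero-cost candidate, which mirrors the failure mode noted above for subset searches.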
There thus exists in the art a need for improved systems and methods for extraction of object features from images and feature correspondence matching in images.
One aspect of the present invention is to provide a feature extraction system that uses front-end models to define regions of interest in an image so that positions of specific features are determined quickly.
In another aspect of the present invention, an image processing device includes front-end modeling of an object in an image to improve accuracy and reduce processing time of feature determination, as well as reduce the amount of memory required to store information related to features of the object and overall image.
One embodiment of the invention relates to an image processing apparatus including an object detector arranged to determine whether an object is present in image data, at least one model of the object, and a feature extractor which identifies at least one feature of the object. The feature is identified in accordance with the model.
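The arrangement of detector, model and feature extractor can be sketched as follows. This is only an illustrative outline under stated assumptions: the detector is a placeholder that boxes all non-zero pixels, the model expresses each region of interest as integer percentages of the detected box, and the extractor simply picks the strongest pixel inside the region. The names `FACE_MODEL`, `detect_object`, `region_for`, `extract_feature` and `locate_features`, and the percentage values, are hypothetical and do not come from this disclosure.

```python
# Hypothetical model: each feature's region of interest given as
# (left, top, right, bottom) percentages of the detected object's box.
FACE_MODEL = {
    "eyes": (0, 20, 100, 50),
    "mouth": (20, 60, 80, 90),
}


def detect_object(image, threshold=1):
    """Placeholder detector: bounding box (x, y, w, h) of pixels >= threshold."""
    coords = [(x, y)
              for y, row in enumerate(image)
              for x, value in enumerate(row)
              if value >= threshold]
    if not coords:
        return None  # no object present in the image data
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1


def region_for(box, percentages):
    """Convert model percentages into absolute (left, top, right, bottom)."""
    x, y, w, h = box
    left, top, right, bottom = percentages
    return (x + w * left // 100, y + h * top // 100,
            x + w * right // 100, y + h * bottom // 100)


def extract_feature(image, region):
    """Search only inside the region; return (x, y) of the strongest pixel."""
    left, top, right, bottom = region
    best = None
    for y in range(top, bottom):
        for x in range(left, right):
            if best is None or image[y][x] > best[0]:
                best = (image[y][x], x, y)
    return None if best is None else (best[1], best[2])


def locate_features(image, model=FACE_MODEL):
    """Detect the object, then identify each feature in accordance with the model."""
    box = detect_object(image)
    if box is None:
        return {}
    return {name: extract_feature(image, region_for(box, pct))
            for name, pct in model.items()}


# Toy 10 x 10 image: uniform object with one bright pixel per feature.
image = [[1] * 10 for _ in range(10)]
image[3][2] = 9   # stands in for an eye
image[7][5] = 8   # stands in for the mouth
features = locate_features(image)
```

The point of the sketch is the control flow: the model narrows each search to a small region of the detected object, so the extractor never scans the whole image.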
Other embodiments of the invention relate to a memory medium and a method of determining positions of facial features in an image.
These and other embodiments and aspects of the present invention are exemplified in the following detailed disclosure.