The present invention relates to methods for tracking heads, faces, facial features, and other objects within complex images.
Although the principles of this invention are equally applicable in other contexts, the invention will be fully understood from the following explanation of its use in the context of locating heads and faces within still or moving pictures.
Various applications necessitate the design of a method for locating objects, such as heads and faces, within complex images. These applications include, for example, tracking people for surveillance purposes, model-based image compression for video telephony, intelligent computer-user interfaces, and other operations.
Many algorithms for recognizing faces in images, and for tracking individual facial features, have been described in the literature. A common drawback shared by these algorithms, however, is that they routinely fail in environments where conditions such as lighting or camera characteristics vary. This problem can be traced in part to the reliance of many of these algorithms on a single modality to represent the tracked data. For example, an algorithm which uses color as its single modality usually fails when the background colors of the tracked image are similar to skin colors. Likewise, an algorithm using shape as its sole classifier may falsely recognize extraneous background objects as heads or facial features.
In addition, existing tracking algorithms typically use classifiers that rely on a single type of representation. Some algorithms, for example, gather data constituting potential faces or facial features, and then represent these data exclusively in the form of binary bitmaps. The bitmaps are ultimately combined to form the tracked output. Particularly when conditions of the tracked environment vary (e.g., the person to be tracked has a light, unpronounced complexion), the final tracked result can be very poor. One reason for this result is that bitmaps in such algorithms are never evaluated against or compared with other types of representations. Thus, these methods provide little error-checking capability.
The problem of inaccurate tracking is exacerbated when the analysis relies on a single channel, or classifier, to produce its output, as in many existing algorithms. As an illustration, when a color-channel analysis yields a significant amount of tracking error due to insufficient skin contrast in the person to be tracked, the resulting representations usually contain erroneous data. The problem is compounded when the algorithm also relies on a single type of representation (e.g., a bitmap). In this case, the representation cannot be compared with other classifiers, or with other types of representations, for accuracy. Hence, the corrupting data cannot be filtered out of the analysis. All of these problems place a practical limit on the achievable accuracy and robustness of tracked images, especially when adverse environmental conditions, such as bad lighting, are encountered.
To overcome many of these disadvantages, the inventors described an algorithm in an application entitled "Multi-Modal System For Locating Heads And Faces." That application was pending at the date the instant application was filed and issued as U.S. Pat. No. 5,834,630, which is expressly incorporated by reference as if fully set forth herein. The algorithm combines several different channels, or classifiers, to evaluate objects in images. Using a combination of classifiers (e.g., motion, shape, color, texture, etc.) rather than just a single classifier increases the robustness of the tracked output by enabling the tracking system to compare the results of different channels. Thus, error checking is possible by periodically evaluating and comparing the representations obtained from different channels throughout the analysis.
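The cross-channel comparison described above can be illustrated with a minimal sketch. It is not the implementation of the referenced patent. It assumes, purely for illustration, that each channel reports one candidate head region as a bounding box and that agreement is measured by intersection-over-union; the names `Box`, `iou`, and `channel_agreement`, and the sample coordinates, are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical representation: (x0, y0, x1, y1) with x1 > x0 and y1 > y0.
Box = Tuple[int, int, int, int]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def channel_agreement(candidates: Dict[str, Box]) -> Dict[str, float]:
    """Mean overlap of each channel's candidate with every other channel's.

    A low score suggests that the channel disagrees with the rest and
    that its output may be erroneous (an assumed heuristic).
    """
    scores: Dict[str, float] = {}
    for name, box in candidates.items():
        others = [iou(box, other) for n, other in candidates.items() if n != name]
        scores[name] = sum(others) / len(others) if others else 0.0
    return scores


# Illustrative values only: the color channel has locked onto a
# skin-colored background region far from the true head position.
candidates = {
    "motion": (10, 10, 50, 60),
    "shape": (12, 8, 52, 58),
    "color": (100, 100, 140, 150),
}
agreement = channel_agreement(candidates)
```

In this example the motion and shape channels overlap substantially, while the color channel overlaps with neither, so its agreement score is zero and its output could be treated as suspect.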
After the channels have gathered data for a sufficient amount of time, the system controller determines whether a different combination of channels should be used for the remainder of the analysis. For example, channels which are not perceived as producing accurate or robust results are terminated, while other channels which are producing high-quality outputs are maintained. Thus, this method provides faster and more robust tracking by maintaining activity on only those channels which are producing effective outputs under the circumstances of the particular tracking application.
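The channel-selection step described above might be sketched as follows. This is an illustrative assumption rather than the controller of the referenced patent: it supposes that each channel accumulates a per-frame quality score between 0 and 1, and that the controller keeps a channel when its mean score reaches a threshold. The function `select_channels`, its parameters, and the sample scores are hypothetical.

```python
from statistics import mean
from typing import Dict, List


def select_channels(
    history: Dict[str, List[float]],
    min_frames: int = 5,
    threshold: float = 0.5,
) -> List[str]:
    """Return the channels to keep running, in alphabetical order.

    history maps a channel name to its per-frame quality scores
    (an assumed measure; the source does not define one).
    """
    if not history:
        return []
    # Too little data on some channel: keep everything running for now.
    if any(len(scores) < min_frames for scores in history.values()):
        return sorted(history)
    kept = [name for name, scores in history.items() if mean(scores) >= threshold]
    if kept:
        return sorted(kept)
    # Never terminate every channel: fall back to the best-scoring one.
    return [max(history, key=lambda name: mean(history[name]))]


# Illustrative values only.
history = {
    "motion": [0.8, 0.9, 0.8, 0.7, 0.8],
    "shape": [0.7, 0.6, 0.7, 0.8, 0.7],
    "color": [0.1, 0.2, 0.1, 0.1, 0.2],
}
active = select_channels(history)
```

Here the color channel is terminated and the motion and shape channels remain active. The fallback to the single best channel is a design choice of this sketch, made so that tracking never stops entirely.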
Nevertheless, because of the wide variety of conditions that may be encountered, and the practical limitations on training a tracking system by sampling different heads and faces, a need persists in the art for a tracking method which achieves higher-quality, more robust results and offers greater error-checking capability.
It is therefore an object of the invention to provide an improved multi-modal method for recognizing objects such as faces and facial features which provides a more flexible tracking strategy in the face of diverse camera and lighting conditions and other variables.
Another object of the invention is to provide an improved method for tracking heads, faces, and facial features which is capable of using both multiple classifiers and multiple types of representations.
Another object of the invention is to provide a more robust and accurate tracked output than existing methods.
Another object of the invention is to provide a method for tracking faces and facial features which selects a tracking strategy based on optimal speed and accuracy of the tracked output.
Another object of the invention is to provide a method for accurately tracking individual facial features, including mouths during speech.
Additional objects of the invention will be contemplated by those skilled in the art after perusal of the instant specification, claims, and drawings.