The present invention relates to dynamic facial feature sensing, and more particularly, to a vision-based motion capture system that allows real-time finding, tracking and classification of facial features for input into a graphics engine that animates an avatar.
Virtual spaces filled with avatars are an attractive way to allow for the experience of a shared environment. However, existing shared environments generally lack facial feature sensing of sufficient quality to allow for the incarnation of a user, i.e., the endowment of an avatar with the likeness, expressions or gestures of the user. Quality facial feature sensing is a significant advantage because facial gestures are a primordial means of communication. Thus, the incarnation of a user augments the attractiveness of virtual spaces.
Existing methods of facial feature sensing typically use markers that are glued to a person's face. The use of markers for facial motion capture is cumbersome and has generally restricted the use of facial motion capture to high-cost applications such as movie production. Accordingly, there exists a significant need for a vision-based motion capture system that implements convenient and efficient facial feature sensing. The present invention satisfies this need.
The present invention is embodied in an apparatus, and related method, for sensing a person's facial movements, features or characteristics. The results of the facial sensing may be used to animate an avatar image. The avatar apparatus uses an image processing technique based on model graphs and bunch graphs that efficiently represent image features as jets composed of wavelet transforms at landmarks on a facial image corresponding to readily identifiable features. The sensing system allows tracking of a person's natural characteristics without any unnatural elements to interfere with the person's natural characteristics.
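By way of illustration only, the notion of a jet at a landmark may be sketched as follows. The sketch assumes complex Gabor wavelets, which the text above does not specify, and the function names (gabor_kernel, make_kernel_bank, extract_jet, jet_similarity) and all parameter values are hypothetical. It shows one plausible way to collect wavelet responses at a single landmark into a jet and to compare two jets by the normalized dot product of their magnitudes.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor wavelet on a size x size grid (size must be odd).

    The Gabor form is an assumption made for this sketch.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    k = 2.0 * np.pi / wavelength
    kx, ky = k * np.cos(theta), k * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * (kx * x + ky * y))
    kernel = envelope * carrier
    # Subtract the mean so the kernel sums to zero and ignores constant brightness.
    return kernel - kernel.mean()

def make_kernel_bank(size=15, wavelengths=(4.0, 8.0), n_orientations=4):
    """Bank of kernels over a few scales and orientations (illustrative values)."""
    thetas = [np.pi * i / n_orientations for i in range(n_orientations)]
    return [gabor_kernel(size, w, t, sigma=0.5 * w)
            for w in wavelengths for t in thetas]

def extract_jet(image, row, col, kernels):
    """Jet: complex responses of every kernel centred on the landmark (row, col).

    The landmark must lie at least half a kernel width away from the image border.
    """
    half = kernels[0].shape[0] // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return np.array([np.sum(patch * np.conj(k)) for k in kernels])

def jet_similarity(jet_a, jet_b):
    """Normalized dot product of the jet magnitudes; 0.0 if either jet is all zero."""
    a, b = np.abs(jet_a), np.abs(jet_b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((64, 64))          # stand-in for a grayscale facial image
    bank = make_kernel_bank()
    jet_eye = extract_jet(frame, 20, 24, bank)
    jet_mouth = extract_jet(frame, 44, 32, bank)
    print(jet_similarity(jet_eye, jet_mouth))
```

Comparing magnitudes only makes the similarity tolerant of small displacements of the landmark. A phase-sensitive comparison is an alternative, but it is not shown here.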
The feature sensing process operates on a sequence of image frames, transforming each image frame using a wavelet transformation to generate a transformed image frame. Node locations associated with wavelet jets of a model graph are initialized on the transformed image frame by moving the model graph across the transformed image frame and placing the model graph at the location of maximum jet similarity between the wavelet jets at the node locations and the transformed image frame. The location of one or more nodes of the model graph is then tracked between image frames. A tracked node is reinitialized if the node's position deviates beyond a predetermined position constraint between image frames.
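The initialization and reinitialization steps described above may be sketched as follows, purely as an illustration under stated assumptions. The transformed image frame is assumed to be an array of shape (height, width, K) holding K non-negative wavelet response magnitudes per pixel. The model graph is assumed to be rigid, with non-negative integer node offsets from a graph origin. The position constraint is assumed to be a simple per-node displacement threshold. All function and parameter names are hypothetical.

```python
import numpy as np

def graph_similarity(transformed, origin, offsets, model_jets):
    """Mean jet similarity of the model graph placed at `origin`.

    transformed: (H, W, K) array of response magnitudes (assumed layout).
    offsets:     (N, 2) integer node offsets (row, col) from the graph origin.
    model_jets:  (N, K) jet magnitudes stored with the model graph.
    """
    rows = origin[0] + offsets[:, 0]
    cols = origin[1] + offsets[:, 1]
    jets = transformed[rows, cols]                      # (N, K) jets under the nodes
    num = np.sum(jets * model_jets, axis=1)
    den = np.linalg.norm(jets, axis=1) * np.linalg.norm(model_jets, axis=1)
    sims = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return float(sims.mean())

def place_model_graph(transformed, offsets, model_jets, step=1):
    """Move the graph across the frame and keep the origin of maximum similarity."""
    height, width = transformed.shape[:2]
    max_row = height - int(offsets[:, 0].max())         # keep every node inside the frame
    max_col = width - int(offsets[:, 1].max())
    best_origin, best_score = None, -np.inf
    for r in range(0, max_row, step):
        for c in range(0, max_col, step):
            score = graph_similarity(transformed, (r, c), offsets, model_jets)
            if score > best_score:
                best_origin, best_score = (r, c), score
    return best_origin, best_score

def nodes_to_reinitialize(prev_positions, new_positions, max_shift):
    """Flag tracked nodes whose frame-to-frame displacement exceeds `max_shift`.

    A Euclidean distance threshold stands in for the position constraint here.
    """
    shift = np.linalg.norm(new_positions - prev_positions, axis=1)
    return shift > max_shift

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.random((40, 40, 8))                     # stand-in transformed frame
    offsets = np.array([[0, 0], [0, 6], [5, 3]])        # three illustrative nodes
    model_jets = frame[12 + offsets[:, 0], 9 + offsets[:, 1]]
    print(place_model_graph(frame, offsets, model_jets))
    prev = np.array([[12.0, 9.0], [12.0, 15.0], [17.0, 12.0]])
    new = np.array([[12.5, 9.0], [12.0, 30.0], [17.0, 12.5]])
    print(nodes_to_reinitialize(prev, new, max_shift=4.0))
```

The exhaustive scan is chosen for clarity. A coarse-to-fine search using a larger step followed by local refinement would reduce the cost, at the risk of missing a narrow similarity peak.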
In one embodiment of the invention, the facial feature finding may be based on elastic bunch graph matching for individualizing a head model. Also, the model graph for facial image analysis may include a plurality of location nodes (e.g., 18) associated with distinguishing features on a human face.
Other features and advantages of the present invention should be apparent from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings, which illustrate, by way of example, the principles of the invention.