1. Field of Invention
This invention relates generally to animation producing methods and apparatuses, and more particularly is directed to a method for automatically animating lip synchronization and facial expression for three dimensional characters.
2. Description of the Related Art
Various methods have been proposed for animating lip synchronization and facial expressions of animated characters in animated products such as movies, videos, cartoons, CDs, and the like. Prior methods in this area have long failed to provide an economical means of animating lip synchronization and character expression in the production of animated products, owing to the extremely laborious and lengthy protocols of prior traditional and computer animation techniques. These shortcomings have significantly limited all prior lip synchronization and facial expression methods and apparatuses used for the production of animated products. Indeed, the limitations of cost, the time required to produce an adequate lip synchronization or facial expression in an animated product, and the inherent limitations of prior methods and apparatuses to satisfactorily provide lip synchronization or express character feelings and emotion, leave a significant gap in the potential of animated methods and apparatuses in the current state of the art.
Time aligned phonetic transcriptions (TAPTS) are phonetic transcriptions of a recorded text or soundtrack in which the occurrence in time of each phoneme is also recorded. A “phoneme” is defined as the smallest unit of speech, and corresponds to a single sound. There are several standard phonetic “alphabets” such as the International Phonetic Alphabet, and TIMIT created by Texas Instruments, Inc. and MIT. Such transcriptions can be created by hand, as they currently are in the traditional animation industry, where they are called “x” sheets or “gray sheets” in the trade. Alternatively, such transcriptions can be created by automatic speech recognition programs, or the like.
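By way of illustration only, a time aligned phonetic transcription may be held in memory as a list of timed phoneme records. The following is a minimal sketch, assuming each record stores a phoneme symbol with a start and end time in seconds. The names `TimedPhoneme` and `phoneme_at`, the TIMIT-style symbols, and the timings are hypothetical and are not taken from any standard or tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedPhoneme:
    """One entry of a time aligned phonetic transcription (hypothetical layout)."""
    phoneme: str   # phoneme symbol, here written in a TIMIT-like style
    start: float   # time at which the phoneme begins, in seconds
    end: float     # time at which the phoneme ends, in seconds


def phoneme_at(transcription: List[TimedPhoneme], t: float) -> Optional[str]:
    """Return the phoneme sounding at time t, or None if no entry covers t."""
    for entry in transcription:
        if entry.start <= t < entry.end:
            return entry.phoneme
    return None


# Illustrative timings for the word "hello"; the numbers are made up.
hello = [
    TimedPhoneme("hh", 0.00, 0.08),
    TimedPhoneme("eh", 0.08, 0.20),
    TimedPhoneme("l", 0.20, 0.28),
    TimedPhoneme("ow", 0.28, 0.50),
]
```

A record of this kind could equally be filled in by hand from an “x” sheet or by the output of a speech recognition program.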
The current practice for three dimensional computer generated speech animation is by manual techniques commonly using a “morph target” approach. In this practice a reference model of a neutral mouth position, and several other mouth positions, each corresponding to a different phoneme or set of phonemes, is used. These models are called “morph targets”. Each morph target has the same topology as the neutral model and the same number of vertices, and each vertex on each model logically corresponds to a vertex on each other model; for example, vertex #n on all models represents the left corner of the mouth. Although this is the typical case, such rigid correspondence may not be necessary.
The deltas of each vertex on each morph target relative to the neutral are computed as a vector from each vertex n on the reference to each vertex n on each morph target. These are called the delta sets. There is one delta set for each morph target.
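The delta set computation described above may be sketched as follows. This is a minimal illustration, assuming every model is stored as a list of (x, y, z) tuples in which vertex n of each morph target corresponds to vertex n of the neutral. The function name `compute_delta_set` and the two-vertex models are hypothetical.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def compute_delta_set(neutral: List[Vec3], target: List[Vec3]) -> List[Vec3]:
    """Return, for each vertex n, the vector from the neutral to the morph target."""
    if len(neutral) != len(target):
        raise ValueError("morph target must have the same vertex count as the neutral")
    return [
        (t[0] - n[0], t[1] - n[1], t[2] - n[2])
        for n, t in zip(neutral, target)
    ]


# Toy two-vertex "mouth"; the coordinates are made up for illustration.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets: Dict[str, List[Vec3]] = {
    "oh": [(0.0, -0.5, 0.0), (1.0, 0.5, 0.0)],
    "ee": [(-0.25, 0.0, 0.0), (1.25, 0.0, 0.0)],
}

# One delta set per morph target.
delta_sets = {name: compute_delta_set(neutral, verts) for name, verts in targets.items()}
```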
In producing animation products, a value usually from 0 to 1 is assigned to each delta set by the animator and the value is called the “morph weight”. From these morph weights, the neutral's geometry is modified as follows: Each vertex N on the neutral has the corresponding delta set's vertex multiplied by the scalar morph weight added to it. This is repeated for each morph target, and the result summed. For each vertex v in the neutral model:
$$|\text{result}| = |\text{neutral}| + \sum_{x=1}^{n} |\text{delta set}_x| \cdot \text{morph weight}_x$$
where the symbol |xxx| is used to indicate the corresponding vector in each referenced set. For example, |result| is the corresponding resultant vertex to vertex v in the neutral model |neutral|, and |delta set_x| is the corresponding vector for delta set x.
If the morph weight of the delta set corresponding to the morph target of the character saying, for example, the “oh” sound is set to 1, and all others are set to 0, the neutral would be modified to look like the “oh” target. If the situation were the same, except that the “oh” morph weight was 0.5, the neutral's geometry would be modified half way between the neutral and the “oh” morph target.
Similarly, if the situation was as described above, except the “oh” morph weight was 0.3 and the “ee” morph weight was 0.7, the neutral geometry is modified to have some of the “oh” model characteristics and more of the “ee” model characteristics. There also are prior blending methods including averaging the delta sets according to their weights.
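The weighted summation and the three weight settings discussed above may be sketched as follows. This is an illustration only, assuming delta sets stored as lists of (x, y, z) tuples keyed by morph target name. The function name `apply_morph_weights` and the two-vertex geometry are hypothetical, and the sketch implements the summation, not the averaging variant.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def apply_morph_weights(
    neutral: List[Vec3],
    delta_sets: Dict[str, List[Vec3]],
    weights: Dict[str, float],
) -> List[Vec3]:
    """result = neutral + sum over delta sets of (delta vector * morph weight)."""
    result = []
    for i, (x, y, z) in enumerate(neutral):
        for name, weight in weights.items():
            dx, dy, dz = delta_sets[name][i]
            x += weight * dx
            y += weight * dy
            z += weight * dz
        result.append((x, y, z))
    return result


# Toy geometry, made up for illustration.
neutral = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
delta_sets = {
    "oh": [(0.0, -1.0, 0.0), (0.0, 1.0, 0.0)],
    "ee": [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
}

full_oh = apply_morph_weights(neutral, delta_sets, {"oh": 1.0, "ee": 0.0})  # equals the "oh" target
half_oh = apply_morph_weights(neutral, delta_sets, {"oh": 0.5, "ee": 0.0})  # half way to "oh"
blend = apply_morph_weights(neutral, delta_sets, {"oh": 0.3, "ee": 0.7})    # some "oh", more "ee"
```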
Accordingly, to animate speech, the artist needs to set all of these weights at each frame to an appropriate value. Usually this is assisted by using a “keyframe” approach, where the artist sets the appropriate weights at certain important times (“keyframes”) and a program interpolates each of the channels at each frame. Such a keyframe approach is very tedious and time consuming, as well as inaccurate, due to the large number of keyframes necessary to depict speech.
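The keyframe approach may be illustrated by the following sketch, which fills in one morph weight channel at every frame from a handful of artist-set keyframes. Linear interpolation is an assumption made here for simplicity, since animation programs commonly offer other curve types as well. The function name `interpolate_channel` and the keyframe values are hypothetical.

```python
from typing import List, Tuple


def interpolate_channel(keyframes: List[Tuple[int, float]], frame: int) -> float:
    """Linearly interpolate a morph weight at `frame` from (frame, weight) keyframes.

    Assumes at least one keyframe and distinct keyframe frame numbers.
    """
    keys = sorted(keyframes)
    if frame <= keys[0][0]:
        return keys[0][1]    # hold the first key before it
    if frame >= keys[-1][0]:
        return keys[-1][1]   # hold the last key after it
    for (f0, w0), (f1, w1) in zip(keys, keys[1:]):
        if f0 <= frame <= f1:
            t = (frame - f0) / (f1 - f0)
            return w0 + t * (w1 - w0)
    raise AssertionError("unreachable: frame lies within the keyframe range")


# Three keyframes for an "oh" channel: closed, fully open, closed again.
oh_keys = [(0, 0.0), (4, 1.0), (8, 0.0)]
oh_channel = [interpolate_channel(oh_keys, f) for f in range(9)]
```

Even in this toy case the artist supplies three keyframes for a single sound on a single channel, which suggests how quickly the number of keyframes grows for a full line of dialogue across many channels.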
The present invention overcomes many of the deficiencies of the prior art and obtains its objectives by providing an integrated method embodied in computer software for use with a computer for the rapid, efficient lip synchronization and manipulation of character facial expressions, thereby allowing for rapid, creative, and expressive animation products to be produced in a very cost effective manner.
Accordingly, it is the primary object of this invention to provide a method for automatically animating lip synchronization and facial expression of three dimensional characters, which is integrated with computer means for producing accurate and realistic lip synchronization and facial expressions in animated characters. The method of the present invention further provides an extremely rapid and cost effective means to automatically create lip synchronization and facial expression in three dimensional animated characters.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the appended claims.
To achieve the foregoing objects, and in accordance with the purpose of the invention as embodied and broadly described herein, a method is provided for controlling and automatically animating lip synchronization and facial expressions of three dimensional animated characters using weighted morph targets and time aligned phonetic transcriptions of recorded text, and other time aligned data. The method utilizes a set of rules that determine the system's output, comprising a stream or streams of morph weight sets, when a sequence of timed phonemes or other timed data is encountered. Other timed data, such as pitch, amplitude, noise amounts, or emotional state data or emotemes such as “surprise”, “disgust”, “embarrassment”, “timid smile”, or the like, may be inputted to affect the output stream of morph weight sets.
The methodology herein described allows for automatically animating lip synchronization and facial expression of three dimensional characters in the creation of a wide variety of animation products, including but not limited to movies, videos, cartoons, CDs, software, and the like. The method and apparatuses herein described are operably integrated with computer software and hardware.
In accordance with the present invention there also is provided a method for automatically animating lip synchronization and facial expression of three dimensional characters for films, videos, cartoons, and other animation products, comprising configuring a set of default correspondence rules between a plurality of visual phoneme groups and a plurality of morph weight sets; and specifying a plurality of morph weight set transition rules for specifying durational data for the generation of transitionary curves between the plurality of morph weight sets, allowing for the production of a stream of specified morph weight sets to be processed by a computer animation system for integration with other animation, whereby animated lip synchronization and facial expression of animated characters may be automatically controlled and produced.
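One possible reading of the correspondence rules and transition rules may be sketched as follows. This is a simplified illustration under stated assumptions and is not the claimed implementation: the rule tables, the phoneme symbols, the durations, and the function name `weights_at` are all hypothetical; transitions are assumed to be linear; and each transition is assumed to finish before the next phoneme begins.

```python
from typing import Dict, List, Tuple

WeightSet = Dict[str, float]

# Hypothetical default correspondence rules: phoneme -> visual phoneme group -> morph weight set.
PHONEME_TO_GROUP: Dict[str, str] = {"ow": "oh", "iy": "ee", "sil": "rest"}
GROUP_TO_WEIGHTS: Dict[str, WeightSet] = {
    "oh": {"oh": 1.0, "ee": 0.0},
    "ee": {"oh": 0.0, "ee": 1.0},
    "rest": {"oh": 0.0, "ee": 0.0},
}
# Hypothetical transition rules: seconds taken to ramp into each group's weight set.
TRANSITION_SECONDS: Dict[str, float] = {"oh": 0.1, "ee": 0.05, "rest": 0.1}


def weights_at(events: List[Tuple[str, float]], t: float) -> WeightSet:
    """Morph weight set at time t for (phoneme, start_time) events sorted by start time."""
    previous = GROUP_TO_WEIGHTS["rest"]   # the character starts at rest
    current = previous
    active = None                         # (start, duration) of the latest transition begun
    for phoneme, start in events:
        if start > t:
            break
        group = PHONEME_TO_GROUP[phoneme]
        previous, current = current, GROUP_TO_WEIGHTS[group]
        active = (start, TRANSITION_SECONDS[group])
    if active is None:
        return dict(current)
    fraction = min(1.0, (t - active[0]) / active[1])   # linear ramp, clamped at the target
    return {k: previous[k] + fraction * (current[k] - previous[k]) for k in current}


# Timed phonemes (made-up timings), sampled at 10 frames per second for one second.
events = [("ow", 0.0), ("iy", 0.5)]
stream = [weights_at(events, frame / 10.0) for frame in range(10)]
```

The resulting `stream` holds one morph weight set per frame, which is the kind of output that could be handed to an animation system in place of hand-set keyframes.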