1. Field of the Invention
The present invention relates generally to lip sync animation tools and, more specifically, to a text-derived speech animation tool for producing simple, effective animations of digital media content that educate, entertain, and inform viewers by the presentation of speaking digital characters. The invention makes the creation of digital talking characters both easy and effective.
2. Description of the Prior Art
It can be appreciated that lip sync animation tools have been in use for years. Typical lip sync animation tools include Morph Magic (for 3D Studio Max), Morph Gizmo (for Lightwave), Shape Shifter (for Power Animator), Mimic (made by Lip Sinc Co.), Smirk (made by Lambsoft Co.), Famous Face (made by Famous Technology Co.), TalkMaster (made by Comhertz Co.), and Automatic Lipsync (made by Fluent Speech Technologies Co.). Existing products generally can be divided into three categories, and the problems with each are best described in relation to its category. The first category (A) consists of manual lip syncing products, which generally require digital artists to manually sync the animation of character lips to a pre-recorded voice track. Every new speech to be animated requires the same manual construction of animation information in an attempt to synchronize with the voice track. The second category (B) consists of voice-driven products, in which a lip sync character animation is automatically constructed from a processed analysis of the recorded speech of a real person. The third category (C) consists of text-driven speech animation programs, in which a digital artist enters text as dialogue for the characters and the product automatically generates both a speech animation of the character and a sound dialogue track of the character voice.
The main problem with conventional lip sync animation tools is the complexity of trying to sync lip motions to speech in a conscious manner when, in real speech, lip motion is an entirely unconscious and automatic result of the intent to vocalize words. Category (A) products are most prone to this problem. The user of these products must try to consciously and logically do something that is fundamentally subconscious, automatic, and never thought about in real life. The process is also time-consuming, and changes in the speech content require extensive effort to modify the animation accordingly.
Another problem with conventional lip sync animation tools is the separation of the processes for generating the voice recording and performing the facial animation. Both Category (A) and Category (B) products are prone to this problem. A voice talent is recorded speaking the desired dialogue. This is done in a recording studio with professional audio recording specialists, equipment, and facilities. The voice recording is then given to digital artists to use in creating the facial animation. If, at a later point in time, there is a desire or need to alter the dialogue content for any reason, the entire process of bringing voice talent into a recording studio must be repeated before the digital artist has a new sound track with which to produce the new animation sequence. Because the voice recording is a completely separate process requiring separate equipment, facilities, and skilled employees, the digital animation process is unnecessarily complicated.
Another problem with conventional lip sync animation tools is the structure of the synthesized speech. Category (C) products are most prone to this problem. Products in this category currently use an unnatural division of sound components to translate text into synthesized speech, which forms the audio portion of the resulting animation. Speech may be divided into phoneme modules (fundamental speech sounds) or syllabic modules (the syllables people actually use in real speech). A phoneme-based text translation process is simpler in that there are fewer phonemes than syllables in speech, but the result is unnatural because real speech is syllabic, and all dramatic quality and character of human speech derive from the modulation, emphasis, and pace of the syllables, not the phonemes.
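The distinction between the two divisions of speech can be illustrated with a minimal sketch. The lexicon below is a hypothetical toy table written for illustration only (the symbols follow the common ARPAbet convention, and the syllable boundaries are an assumption); it is not taken from any of the products named above. Note that a single word contains more phonemes than syllables, while the inventory of distinct phonemes in a language (a few dozen in English) is far smaller than the inventory of distinct syllables, which is the sense in which a phoneme-based process is simpler.

```python
# Hypothetical toy lexicon: each word maps to a list of syllables,
# and each syllable is a list of ARPAbet-style phoneme symbols.
# The entries and syllable boundaries are illustrative assumptions.
TOY_LEXICON = {
    "animation": [["AE", "N"], ["AH"], ["M", "EY"], ["SH", "AH", "N"]],
    "talking": [["T", "AO"], ["K", "IH", "NG"]],
}


def syllable_units(word):
    """Return the word as syllabic modules (one string per syllable)."""
    return ["-".join(syllable) for syllable in TOY_LEXICON[word]]


def phoneme_units(word):
    """Return the word as phoneme modules by flattening its syllables."""
    return [ph for syllable in TOY_LEXICON[word] for ph in syllable]


if __name__ == "__main__":
    for word in TOY_LEXICON:
        print(word, "phonemes:", phoneme_units(word))
        print(word, "syllables:", syllable_units(word))
```

In a syllabic division, properties such as emphasis and pace could be attached to each syllable unit as a whole, whereas a phoneme division offers no unit at which to attach them.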
While these devices may be suitable for the particular purpose which they address, they are not as suitable for producing simple, effective animations of digital media content that educate, entertain, and inform viewers by the presentation of speaking digital characters. The invention makes the creation of digital talking characters both easy and effective to produce.
In these respects, the text-derived speech animation tool according to the present invention substantially departs from the conventional concepts and designs of the prior art, and in so doing provides an apparatus primarily developed for the purpose of producing simple, effective animations of digital media content that educate, entertain, and inform viewers by the presentation of speaking digital characters. The invention makes the creation of digital talking characters both easy and effective to produce.