The present invention generally relates to text-to-visual speech (TTVS), and more particularly, to a messaging system for displaying emotions (e.g., happiness or anger) in an image of a face.
With the advent of the Internet and other networks, users at remote locations are able to communicate with each other in various forms, such as on-line chat (e.g., chat rooms) and e-mail. On-line chat is particularly useful in many situations since it allows users to communicate over a network in real-time by typing text messages to each other in a chat window that shows at least the last few messages from all participating users.
In early instant messaging systems, users personalized text messages by typing in “emoticons” to convey emotions. Examples of commonly used emoticons that can be produced using a standard QWERTY keyboard include :-) representing a smiling face, :-< representing sadness, :-( representing dismay or anger, and >:-< representing extreme anger. Unfortunately, even with the widespread use of such typed emoticons, on-line chat tends to be impersonal, and requires the user to read “between the lines” of a message in order to understand the emotional state of the sender. Newer instant messaging systems allow users to access a library of icons that provide expressions of emotion (for example, an icon for dismay or anger).
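By way of illustration only, the typed emoticons listed above can be viewed as a small lookup table from character sequences to emotion labels. The following minimal sketch assumes only the four emoticons named in the text; the names `EMOTICONS` and `find_emotions` are hypothetical and are not taken from any existing messaging system.

```python
# Hypothetical lookup table built only from the emoticon examples in the text.
EMOTICONS = {
    ">:-<": "extreme anger",
    ":-)": "smiling",
    ":-<": "sadness",
    ":-(": "dismay or anger",
}


def find_emotions(message):
    """Return emotion labels for the emoticons in message, in left-to-right order."""
    # Check longer emoticons first so that ">:-<" is not misread as ":-<".
    ordered = sorted(EMOTICONS, key=len, reverse=True)
    found = []
    i = 0
    while i < len(message):
        for emoticon in ordered:
            if message.startswith(emoticon, i):
                found.append(EMOTICONS[emoticon])
                i += len(emoticon)  # skip past the matched emoticon
                break
        else:
            i += 1  # no emoticon starts here; move to the next character
    return found


if __name__ == "__main__":
    print(find_emotions("Great news :-) but the build broke >:-<"))
```

Such a table recovers only the emotions the sender explicitly typed, which is consistent with the observation above that the reader must otherwise infer the emotional state of the sender.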
Mobile devices, such as cell phones with text messaging capabilities or personal digital assistants with communications capabilities, are becoming increasingly popular for electronic chats. Text-based chatting using such mobile devices is difficult, however, because the display screens in such devices are typically too small to display complex messages, such as messages including a number of emoticons in addition to the typed text. Users are forced to limit their use of emoticons in order to dedicate much of the screen to text. With currently available systems, if two users want to clearly convey emotional states during an electronic chat, the users must resort to video technology by using Web cameras or other networked video cameras. Chats conducted using such video technology consume a significant amount of network bandwidth and require the use of significant data processing resources.
“Text-to-visual speech” systems utilize a keyboard or an equivalent character input device to enter text, convert the text into a spoken message, and broadcast the spoken message along with an animated face image. One of the limitations of existing text-to-visual speech systems is that, because the author of the message is simply typing in text, the output (i.e., the animated face and spoken message) may not convey the emotions the sender would like to convey.