1. Field of the Invention
The present invention generally relates to robotic systems, and more particularly to a robotic system and a related method for reproducing a real person's facial expression and speech synchronously and simultaneously.
2. The Prior Arts
Recent robotics research has shifted from traditional autonomous robots, designed to operate as independently and remotely from humans as possible, to humanoid robots that can communicate in a manner that supports the natural communication modalities of humans, such as facial expression, body posture, gesture, gaze direction, and voice.
One such humanoid robot currently under development is the Kismet robot by the Robotics and Artificial Intelligence Laboratory of Massachusetts Institute of Technology. Kismet has a 15 degree-of-freedom robotic head whose ears, eyebrows, eyelids, lips, jaw, etc., are driven by actuators to display a wide assortment of facial expressions. For example, Kismet has four lip actuators, one at each corner of the mouth, so that the mouth can be curled upwards for a smile or downwards for a frown. Similarly, each eyebrow of Kismet can be lowered and furrowed in frustration, or raised in surprise. More details about Kismet can be found in the article “Toward Teaching a Robot ‘Infant’ using Emotive Communication Acts,” by Breazeal, C. and Velasquez, J., in Proceedings of 1998 Simulation of Adaptive Behavior, workshop on Socially Situated Intelligence, Zurich, Switzerland, pp. 25-40, 1998.
A similar research effort is the Tokyo-3 robot by the Hara Laboratory of Tokyo University of Science. The Tokyo-3 robotic head has a facial skin made of silicone, so its facial expression more closely resembles that of a real human. The actuators of the Tokyo-3 robotic head drive 18 characteristic points of the facial skin to imitate various human expressions such as happiness, anger, sadness, resentment, surprise, horror, etc. More details about the Tokyo-3 robot can be found in the article “Artificial Emotion of Face Robot through Learning in Communicative Interactions with Human,” by Fumio Hara, JST CREST International Symposium on Robot and Human Interactive Communication, Kurashiki Ivy Square, Kurashiki, Okayama, Japan, Sep. 20, 2004.
The focus of the foregoing research is to engage the robot in natural and expressive face-to-face interaction with humans. To achieve this goal, the robot usually perceives a variety of natural social cues from visual and auditory channels and, in response to these sensory stimuli, autonomously delivers social signals to the human through gaze direction, facial expression, body posture, and vocal babbles. On the other hand, research in seemingly unrelated areas such as pattern recognition and computer animation and modeling suggests an interesting application of the humanoid robotic head. For example, Pighin et al. (in the article “Synthesizing Realistic Facial Expressions from Photographs,” by Pighin, F., Hecker, J., Lischinski, D., Szeliski, R., and Salesin, D. in SIGGRAPH 98 Conference Proceedings, pp. 75-84, ACM SIGGRAPH, July 1998) present a technique for creating highly realistic face models and natural-looking animations. Pighin et al. generate a 3D face model of a person by deriving feature points on several 2D images of the person's face taken from different viewpoints and using the feature points to compute the positions of the remaining face mesh vertices. Separate face models corresponding to the person's different facial expressions can be produced in this way. Pighin et al. then create smooth transitions between different facial expressions by 3D shape morphing between these face models. It should be obvious that the technique of Pighin et al. could be readily adapted to the humanoid robotic head, for example, by locating the feature points where the face actuators are positioned and using 3D shape morphing to guide the operation of the actuators. The result would be a humanoid robotic head that, instead of generating generically human-like expressions, actually reproduces a specific real person's facial expression with a very high degree of resemblance.
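The adaptation suggested above can be illustrated with a minimal sketch, assuming the simplest form of shape morphing: linear interpolation between the 3D feature-point positions of two expression models. This is an illustration only, not the procedure of Pighin et al. or of any particular robotic head; the function names, the two-point face data, and the idea of sending the displacement from the neutral pose as the actuator command are all hypothetical.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def morph_feature_points(
    source: List[Point3D], target: List[Point3D], t: float
) -> List[Point3D]:
    """Linearly blend two feature-point sets: t=0 gives source, t=1 gives target."""
    if len(source) != len(target):
        raise ValueError("both expressions must use the same feature points")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return [
        tuple(a + t * (b - a) for a, b in zip(p, q))
        for p, q in zip(source, target)
    ]


def actuator_displacements(
    rest: List[Point3D], pose: List[Point3D]
) -> List[Point3D]:
    """Offset of each feature point from its rest position.

    Assumption: one actuator sits at each feature point and is commanded
    by the displacement of that point from the neutral face.
    """
    return [tuple(b - a for a, b in zip(p, q)) for p, q in zip(rest, pose)]


# Hypothetical data: two mouth-corner feature points, in millimetres.
neutral = [(-25.0, 0.0, 0.0), (25.0, 0.0, 0.0)]
smile = [(-28.0, 6.0, 0.0), (28.0, 6.0, 0.0)]

# Pose halfway through the transition from neutral to smile.
halfway = morph_feature_points(neutral, smile, 0.5)
commands = actuator_displacements(neutral, halfway)
```

Stepping `t` from 0 to 1 over successive control cycles would yield a smooth transition between the two expressions. A real head would additionally need a mapping from point displacements to actuator units, which is not specified here.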
Many similar facial expression interpretation techniques, such as those using neural networks or multiple-point integration, can be found in the literature.
Besides facial expressions, another social signal delivered by the humanoid robotic heads of recent research is the voice. For example, Kismet is equipped with a synthesizer that models the physiological characteristics of the human articulatory tract. By adjusting the parameters of the synthesizer, Kismet can convey speaker personality as well as add emotional qualities to the synthesized speech. Nevertheless, the humanoid robotic heads of recent research are still made to deliver a generically human-like voice, not a specific real person's voice. Following the idea of making a humanoid robotic head reproduce a specific person's facial expression, it would be an even more interesting application if the person's own voice were pre-recorded and then played synchronously with the humanoid robotic head's delivery of the person's facial expression.