As computers and electronic devices become increasingly prevalent in today's society, attempts have been made to develop human-computer interfaces that provide more personalization. One approach is the use of avatars, which are iconic representations of users drawn in two or three dimensions. Avatars are used in conventional instant messaging systems and kiosks. In these applications, users interact with avatar representations to communicate with other users or with the operating system.
In the case of instant messaging systems, a user selects his or her own avatar, which then appears to others when the user sends a message. It should be appreciated that some systems can be tailored to permit a recipient of data to choose the avatar to associate with each originator of the data. In either case, these avatars are relatively static, their motions and gestures selected from a small number of pre-created motion templates and not reflective of any actual motion of the person whom the avatar is representing.
Psychological and social studies have concluded that nonverbal communication cues, such as head gestures, play an important role in personal communication. Thus, in the context of real-time communication, it may be desirable to provide an avatar with motion and gestures that reflect the motion of the person whom the avatar is representing. Such simulated motion would provide the viewer with nonverbal communication information from the user. In order to simulate such motion, some type of head pose estimation or body movement estimation would need to be implemented.
Another use of avatars controlled directly by head gestures of a user is for interaction between a human and a device. For example, speech input of the word “Yes” to a mobile device may not be accurately detected in a noisy environment. However, if the user's nodding is also detected, the user's input can be more reliably determined.
Conventional methods for head pose simulation and/or body pose simulation typically involve tracking full rigid body motion employing three-dimensional models. These methods involve detection of face regions and face features, which require considerable computational power. These methods have the additional drawbacks of requiring specific sensors and/or a model initialization step. In addition, most of these complex models are not robust enough to handle noisy input or head poses outside a narrow range. Thus, conventional methods of head pose estimation and simulation are not practical for many applications, including use in animating avatars on mobile devices.
It may be desirable to provide methods and apparatus for head pose estimation that can be used to animate an avatar in low bandwidth applications and/or in low processing power applications (e.g., due to processor and/or power constraints), such as use on mobile devices. It may also be desirable to use a moving object, including but not limited to a real human head, in front of a camera to generate avatar control signals.