This invention relates to the field of artificial intelligence. More particularly, this invention is directed to the application of personality models and interaction with synthetic characters in a computing system.
Computer systems attempting to provide more “human-like” interfaces often employ such technologies as speech recognition and voice control as command input interfaces, and synthesized speech and animated characters as output interfaces. In other words, these computers provide interactions through use of simulated human speech and/or animated characters.
Potential applications for these input/output interfaces are numerous, and offer the possibility of allowing people who are not computer proficient to use a computer without learning the specifics of a particular operating system. For example, an application may include a personality within the computer to simulate a personal assistant, thus creating a different interface to databases and/or schedules. In entertainment applications, a system with these capabilities may implement role-playing in games, simulate interaction with historical figures for educational purposes, or simulate interaction with famous rock singers or movie stars.
Current systems focus on understanding speech content and reacting to the words. Although this is a challenging endeavor in itself, achieving a more natural interaction between humans and computers will also require interpreting other aspects of the interaction once some of these obstacles are overcome. Moreover, even if the state of the art in speech recognition improves dramatically, a combination of interfaces will increase the quality and accuracy of the interaction with the computer.
In one embodiment, an apparatus includes a video input unit and an audio input unit. The apparatus also includes a multisensor fusion/recognition unit coupled to the video input unit and the audio input unit, and a processor coupled to the multisensor fusion/recognition unit. The multisensor fusion/recognition unit decodes a combined video and audio stream containing a set of user inputs.
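The arrangement described in this embodiment can be illustrated with a minimal sketch. It is not the claimed implementation: the class names (`VideoSample`, `AudioSample`, `UserInput`, `FusionRecognitionUnit`), the timestamp-window pairing rule, and the gesture and word labels are all assumptions introduced here for illustration only. The sketch shows one plausible way a fusion/recognition unit could combine already-recognized video and audio samples into a set of user inputs for a downstream processor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VideoSample:
    timestamp: float  # seconds since start of capture
    gesture: str      # hypothetical label produced from the video input unit


@dataclass
class AudioSample:
    timestamp: float  # seconds since start of capture
    words: str        # hypothetical transcript from the audio input unit


@dataclass
class UserInput:
    timestamp: float
    words: str
    gesture: str


class FusionRecognitionUnit:
    """Hypothetical fusion step: pairs each audio sample with the
    closest video sample that falls within a time window."""

    def __init__(self, window: float = 0.5):
        self.window = window  # assumed pairing tolerance, in seconds

    def decode(self, video: List[VideoSample],
               audio: List[AudioSample]) -> List[UserInput]:
        inputs = []
        for a in audio:
            # Keep only video samples close enough in time to this utterance.
            nearby = [v for v in video
                      if abs(v.timestamp - a.timestamp) <= self.window]
            if nearby:
                closest = min(nearby,
                              key=lambda v: abs(v.timestamp - a.timestamp))
                gesture = closest.gesture
            else:
                gesture = "none"  # no visual cue accompanied the speech
            inputs.append(UserInput(a.timestamp, a.words, gesture))
        return inputs


video = [VideoSample(1.0, "nod"), VideoSample(3.0, "wave")]
audio = [AudioSample(1.2, "yes"), AudioSample(5.0, "hello")]
user_inputs = FusionRecognitionUnit(window=0.5).decode(video, audio)
```

In this sketch the processor would consume `user_inputs`, where each entry carries both the spoken words and any accompanying gesture. A real fusion unit would operate on raw streams rather than pre-labeled samples.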