1. Field of the Invention
The present invention relates to an information processing apparatus, an information processing method, and a program, and more particularly, to an information processing apparatus, an information processing method, and a program for performing various processes based on a user's speech or actions.
2. Description of the Related Art
When a user operates a PC, a TV, a recording/reproducing apparatus, or any of various other home appliances, the user manipulates an input unit, a remote controller, or the like provided with the apparatus to cause the apparatus to perform a desired process. For example, a keyboard or a mouse is often used as the input device of a PC. Similarly, a remote controller is commonly used with a TV, a recording/reproducing apparatus, or the like to perform various processes, for example, changing channels or selecting content to be reproduced.
Various studies have been conducted on systems that issue instructions to various apparatuses using user speech or actions (gestures). More specifically, such systems include a system that recognizes user speech by a voice recognition process and a system that recognizes user actions or gestures by an image process.
An interface that communicates with a user through a plurality of communication modes, such as voice recognition and image recognition, in addition to general input devices such as a remote controller, a keyboard, or a mouse, is referred to as a multi-modal interface. A multi-modal interface of the related art is disclosed in, for example, U.S. Pat. No. 6,988,072.
However, the voice recognition apparatuses and image recognition apparatuses used in multi-modal interfaces and the like have limited processing capability, so the range of user speech and actions they can understand is limited. Consequently, in the current state of the art, the user's intention is often not conveyed to the system.