The present disclosure relates to an information processing apparatus, an information processing method, and a program. In particular, the present disclosure relates to an information processing apparatus, an information processing method, and a program for performing various processes based on a user's speech or the like.
When using various home appliances, such as a PC, a television set, and a video recorder/player, a user operates an input unit, a remote controller, or the like provided with each device to cause the device to perform a desired process. For example, when a PC is used, a keyboard or a mouse is often used as an input device. Likewise, for a television set or a video recorder/player, a remote controller is often used to perform various processes, such as switching channels and selecting content to be reproduced.
Various studies have been conducted on systems that issue instructions to these devices by means of a user's speech and motion. Specifically, examples of such systems include a system that recognizes a user's speech by using voice recognition and a system that recognizes a user's actions and gestures by using image processing.
An interface that communicates with a user through a plurality of communication modes, including voice recognition and image recognition in addition to general input devices such as a remote controller, a keyboard, and a mouse, is called a multi-modal interface. An example of related art regarding the multi-modal interface is disclosed in U.S. Pat. No. 6,988,072.
However, a voice recognition apparatus and an image recognition apparatus for use in such a multi-modal interface or the like have limited performance, which restricts the user speech and motions that can be recognized. Therefore, under the present circumstances, the intention of the user is often not correctly conveyed to the system.
For an information processing apparatus to provide information to a user or respond to a user's request, there are various methods, such as displaying a message on a display unit and outputting voice and sound effects via a loudspeaker.
However, a description given by voice may be tediously long for some users, and may also be missed by some users. Moreover, when a description and help information are presented on a display unit, they are useless if the user is not watching the display unit.
The following documents are examples of related art disclosing structures for controlling a response from a system.
Japanese Unexamined Patent Application Publication No. 2004-333543 discloses a voice interaction system and a voice interaction method capable of changing the voice output on the system side according to the user's learning degree in using the voice interaction system.
Also, Japanese Unexamined Patent Application Publication No. 2005-202076 discloses a technology for smoother interaction according to the distance between a user and a system. Specifically, in the proposed technology, when a robot and a user are far apart from each other, there is a high possibility that voice produced by the robot is inaudible to the user, and therefore the volume of the robot's voice is turned up for smooth interaction.
However, the structures described in these documents each address only a single factor, such as the user's learning degree or the distance, and observation information from various points of view is not used.
Furthermore, Japanese Unexamined Patent Application Publication No. 2008-217444 discloses an apparatus, method, and program for interaction with a user. Specifically, a response is changed according to how closely the user is watching, so as to achieve natural interaction. When the user is far away or the user's line of sight is not directed to a television set, a response to a request from the user is given by voice. To do this, an infrared ray or a sound wave is used to detect the distance to the user and the direction of the line of sight. However, this structure has the disadvantage that the user has to wear some device.
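For illustration only, the response selection rule described for that related art (respond by voice when the user is far away or is not looking at the television set) might be sketched as follows. This is a minimal sketch and not the implementation of the cited publication: the names `UserObservation`, `ResponseMode`, and `choose_response_mode`, the 3.0 m threshold, and the choice of an on-screen response in the remaining case are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class ResponseMode(Enum):
    VOICE = "voice"
    DISPLAY = "display"


@dataclass
class UserObservation:
    """Hypothetical observation of the user; field names are assumptions."""
    distance_m: float          # estimated distance from the device to the user
    looking_at_screen: bool    # whether the line of sight is directed at the screen


# Hypothetical threshold; the source does not state a concrete value.
FAR_THRESHOLD_M = 3.0


def choose_response_mode(obs: UserObservation,
                         far_threshold_m: float = FAR_THRESHOLD_M) -> ResponseMode:
    """Use voice when the user is far away or not watching the screen.

    Falling back to an on-screen response otherwise is an assumption;
    the source only describes the voice case.
    """
    if obs.distance_m > far_threshold_m or not obs.looking_at_screen:
        return ResponseMode.VOICE
    return ResponseMode.DISPLAY


if __name__ == "__main__":
    print(choose_response_mode(UserObservation(1.5, True)))
    print(choose_response_mode(UserObservation(5.0, True)))
    print(choose_response_mode(UserObservation(1.5, False)))
```

In this sketch only two observations are combined, and how they are measured is left open; the cited structure obtains them with an infrared ray or a sound wave.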