Robots that have forms modeled after humans, animals, and the like and that are capable of communicating with humans, for example through conversation, are known. Such robots include a robot that detects a sound generated around it based on output from microphones mounted on the robot and, upon determining that the sound is a human voice, turns its face or body toward the direction in which the person is present and performs actions such as talking to the person or waving a hand.
Unexamined Japanese Patent Application Kokai Publication No. 2003-266351 discloses a robot that, when a sound having an amplitude equal to or larger than a threshold value is input to a microphone, detects that a sound event has occurred, estimates the sound source direction, and turns toward the estimated sound source direction.

[Non Patent Literature 1] Andrew J. Davison, “Real-Time Simultaneous Localization and Mapping with a Single Camera”, Proceedings of the 9th IEEE International Conference on Computer Vision, Volume 2, 2003, pp. 1403-1410

[Non Patent Literature 2] Richard Hartley, Andrew Zisserman, “Multiple View Geometry in Computer Vision”, Second Edition, Cambridge University Press, March 2004, chapter 9

[Non Patent Literature 3] G. Csurka, C. R. Dance, L. Fan, J. Willamowski, C. Bray, “Visual Categorization with Bags of Keypoints”, ECCV International Workshop on Statistical Learning in Computer Vision, 2004