1. Field of the Invention
The present invention relates to a technology for detecting a sound source and autonomously following the sound source.
2. Description of the Related Art
A robot device is known in the art that follows a sound source, such as a person, by detecting visual features of the sound source. The visual features are detected from an image of the sound source acquired by an image acquiring unit such as a camera. However, there are limitations on taking images of a person as a sound source. For example, an image of the person cannot be taken when the person steps out of the viewing field of the camera, when the person moves beyond the visually detectable range, or when the person is hidden behind an obstacle.
If the person is lost from the sight of the camera for some reason, one approach is to catch a sound generated by the person, locate the person from the sound, and turn the camera towards the person or move towards the person. An example of such a technology is a video conference system that turns a camera in the direction of a sound to catch sight of a speaker and frame the face of the speaker.
However, there can be other sound sources around the person. For example, there can be a sound-producing door or a television near the target person. In such a case, the sound that is caught is not necessarily the voice of the target person, and therefore the robot device cannot auditorily locate the person.
There is a demand for a function capable of distinguishing the voice of a target person from the voices of other persons and from other sounds. To realize such a function, the robot device needs to be able to identify auditory features as well as visual features.
JP-A 2002-307349 (KOKAI) discloses a robot device that detects an object by using visual and audio information of the sound source. A character string, acquired by recognizing a word vocalized by a person while a certain object is visible, is stored in combination with an image of the object. A user shows the object to the robot device and vocalizes the name of the object, whereby the robot device memorizes the name of the object. When the robot device later visually detects the object, the robot device vocalizes the name associated with the image using a synthetic voice.
However, this technology uses auditory detection only supplementarily to the visual detection of the object. Moreover, the sound associated with the image of the object is vocalized by the user who shows the object, and is not produced by the object itself. For these reasons, if the robot device loses sight of a certain sound source, there is a risk that the robot device can neither detect nor follow the sound source.