For such humanoid and animal-type robots, attention has in recent years been drawn to the active senses of vision and audition.
In active sensing, the sensing apparatus responsible for the robot's vision or audition is directed toward a target by using drive means to control the posture of the part that supports it, for example the head.
In active vision, at least one camera serving as the sensing apparatus keeps its optical axis directed toward the target through posture control by the drive means, automatically performs focusing, zooming in, zooming out and similar operations, and thereby captures images of the target; various studies have been made on this subject.
In active audition, on the other hand, at least one microphone serving as the sensing apparatus keeps its directivity toward the target through posture control by the drive means and collects sound from the target. A drawback of active audition is that the microphone picks up the operating noise of the drive means while the drive means is in operation; relatively loud noise is therefore mixed into the sound from the target, which can make that sound unrecognizable. To overcome this drawback, methods have been adopted that recognize the sound from the target accurately by, for example, determining the direction of the sound source with reference to visual information.
In active audition in general, the sound source is localized using the Interaural Phase Difference (IPD) and the Interaural Intensity Difference (IID) obtained from the Head-Related Transfer Function (HRTF). However, sound source localization based on the HRTF must take into account even the acoustic environment of the room, so its result is strongly affected by changes in that environment. In addition, because the HRTF is a measured function that is available only at discrete measurement points, values between the observed points must be interpolated. For these reasons, HRTF-based localization is not well suited to applications in real environments.
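To make the two cues concrete, the following Python sketch computes a per-frequency IPD and IID directly from one frame of left and right microphone signals. It is a minimal sketch under stated assumptions: the function names `dft` and `ipd_iid` are hypothetical, a naive DFT stands in for a proper FFT, and no HRTF is involved. In an HRTF-based system, observed values like these would be compared with the IPD and IID that the measured HRTF predicts for each candidate direction, which is where the room dependence and the interpolation between measured directions come in.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform; adequate for a short illustrative frame."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def ipd_iid(left, right, floor=1e-9):
    """Return (bin, IPD in radians, IID in dB) for each positive-frequency bin.

    Hypothetical helper. DC and Nyquist bins are skipped, as are bins where
    either channel's magnitude is below `floor` (no usable phase there).
    """
    spec_l = dft(left)
    spec_r = dft(right)
    result = []
    for k in range(1, len(left) // 2):
        if abs(spec_l[k]) < floor or abs(spec_r[k]) < floor:
            continue
        ratio = spec_l[k] / spec_r[k]
        ipd = cmath.phase(ratio)             # phase of left relative to right, in [-pi, pi]
        iid = 20.0 * math.log10(abs(ratio))  # positive when the left channel is louder
        result.append((k, ipd, iid))
    return result
```

For a 64-sample frame in which the left channel is a sinusoid at bin 4 advanced in phase by 0.5 rad and twice the amplitude of the right channel, bin 4 yields an IPD of 0.5 rad and an IID of about 6.02 dB, and all other bins are skipped.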
Sound source localization based on the epipolar geometry used in vision could be considered as an approach that does not depend on the HRTF. However, conventional epipolar-geometry localization is based on the triangle connecting the two ears and the target. One side of that triangle passes through the head, whereas the actual sound from the target does not pass through the head but travels along its surface; accurate sound source localization therefore could not be performed.
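The geometric objection can be quantified with a worked comparison. The sketch below assumes a rigid spherical head of radius 8.75 cm, a far-field source at azimuth θ measured from the front, and a speed of sound of 343 m/s; these values and the function names are illustrative assumptions. The straight-line path difference 2r sin θ corresponds to the triangle through the head, whereas r(θ + sin θ) is Woodworth's classic approximation for sound travelling around the head surface. Inverting the straight-line model on a delay that actually followed the surface misestimates the azimuth, increasingly so toward the side.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed spherical-head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed (air at roughly 20 degrees C)


def path_diff_straight(theta, r=HEAD_RADIUS_M):
    """Extra path to the far ear if sound crossed the head in a straight line."""
    return 2.0 * r * math.sin(theta)


def path_diff_surface(theta, r=HEAD_RADIUS_M):
    """Extra path if sound wraps around a spherical head (Woodworth far-field model)."""
    return r * (theta + math.sin(theta))


def itd_seconds(path_diff, c=SPEED_OF_SOUND_M_S):
    """Convert a path-length difference to an interaural time difference."""
    return path_diff / c


def azimuth_from_straight(path_diff, r=HEAD_RADIUS_M):
    """Invert the straight-line model; clamps to +/-90 degrees if the delay is too large."""
    s = max(-1.0, min(1.0, path_diff / (2.0 * r)))
    return math.asin(s)


if __name__ == "__main__":
    # True azimuth vs. what the straight-line (through-the-head) model would report.
    for deg in (0, 15, 30, 45, 60, 75, 90):
        surface = path_diff_surface(math.radians(deg))
        est = math.degrees(azimuth_from_straight(surface))
        print(f"true {deg:2d} deg  ITD {itd_seconds(surface) * 1e6:6.1f} us  "
              f"straight-line estimate {est:5.1f} deg")
```

Under these assumptions, a source truly at 60° would be reported at roughly 73°, and beyond roughly 63° the surface delay exceeds the largest delay the straight-line model can represent, so the estimate saturates at 90°.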
For sound source separation, moreover, one known method uses a so-called direction pass filter that selects the sub-bands whose IPD matches that of a specific direction. However, such a direction pass filter takes into account neither the direction-dependent differences in sensitivity nor the active motion of the robot, so accuracy deteriorates everywhere except in the frontal direction, where sensitivity is good; moreover, like the prior art described above, it relies on the HRTF, which is a measured function. It is therefore difficult for this method to cope with real environments and with dynamic changes in the environment, and it has the further problem that the HRTF must be interpolated to accommodate active motion.
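The direction pass filter idea can be sketched as follows. This is an illustration under explicit assumptions, not the prior-art implementation: the expected IPD for each direction comes from a simple free-field two-microphone model (microphone spacing 0.175 m, speed of sound 343 m/s) instead of an HRTF, and the function names and the fixed 0.2 rad tolerance are hypothetical. Under this model the IPD changes with azimuth in proportion to cos θ, so a fixed phase tolerance admits a much wider range of directions toward the sides than toward the front, which illustrates the direction-dependent sensitivity described in the preceding paragraph.

```python
import math


def wrap_phase(phi):
    """Wrap an angle to the interval [-pi, pi)."""
    return (phi + math.pi) % (2.0 * math.pi) - math.pi


def expected_ipd(freq_hz, theta, mic_distance=0.175, c=343.0):
    """IPD for a far source at azimuth theta (radians), assumed free-field model."""
    itd = mic_distance * math.sin(theta) / c
    return wrap_phase(2.0 * math.pi * freq_hz * itd)


def direction_pass_filter(subbands, theta, tolerance=0.2):
    """Keep sub-bands whose observed IPD is within `tolerance` rad of the IPD expected for theta.

    `subbands` is a list of (freq_hz, observed_ipd) pairs; the frequencies of
    the passing sub-bands are returned in their original order. The tolerance
    is the same for every direction, which is exactly what ignores the
    direction-dependent sensitivity.
    """
    passed = []
    for freq, ipd in subbands:
        if abs(wrap_phase(ipd - expected_ipd(freq, theta))) <= tolerance:
            passed.append(freq)
    return passed
```

In use, sub-bands from a source at 30° and sub-bands from a source at -40° can be separated by calling `direction_pass_filter` once per target direction; each call returns only the frequencies whose IPD matches that direction.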