To improve robustness in perception, it is essential to integrate and synthesize various types of information. Known examples include a temporal integration method for audiovisual information (see Document No. 1), the McGurk effect utilized in speech recognition (see Document No. 2), and an audiovisual integration method utilized in sound source localization (see Document No. 3). Furthermore, a method for carrying out sound source localization by means of a microphone built into a robot has been proposed (see Document No. 4), as has a method for carrying out sound source localization by means of a microphone array affixed to a ceiling or the like (see Document No. 5).
Document No. 1: Makoto Nagamori et al., “A Framework for Multi-Domain Conversational Systems,” Information Processing Society of Japan Research Report, 2000-SLP-31-7, 2000.
Document No. 2: Nobuo Kawaguchi et al., “Design and Evaluation of a Unified Management Architecture for Multi-Domain Spoken Dialogue,” Information Processing Society of Japan Research Report, 2001-SLP-36-10, 2001.
Document No. 3: I. O'Neill et al., “Cross Domain Dialogue Modeling: An Object-Based Approach,” In Proc. ICSLP, Vol. 1, 2004.
Document No. 4: Japanese Laid-Open Patent Publication No. 2004-198656
Document No. 5: Japanese Laid-Open Patent Publication No. 2006-121709
However, the aforementioned prior art techniques still leave room for improvement in the robustness and accuracy of sound source localization tracking.
The present invention addresses this problem by providing a sound source tracking system that aims to improve the robustness and accuracy of sound source localization tracking.