In speech recognition, for example, an acoustic signal is collected by a microphone array formed of a plurality of microphones, and sound source localization or sound source separation is performed on the collected acoustic signal. Sound source localization is a process in which the position of a sound source is estimated. Sound source separation is a process in which the signal of each sound source is extracted from a collected signal in which the signals of a plurality of sound sources are mixed. A feature quantity is then extracted from the data obtained by the sound source localization and the sound source separation, and speech recognition is performed on the basis of the extracted feature quantity. A transfer function from the sound source to each microphone of the microphone array is used in the sound source localization and the sound source separation. The transfer function is calculated by collecting, with the microphone, a measurement signal that is output from the sound source and obtaining an impulse response from the collected measurement signal. The impulse response can be obtained, for example, by outputting an impulse from the sound source and collecting the output impulse.
Two methods of generating the transfer function are known, namely, a theory-based method and an actual measurement-based method. In the theory-based method, the transfer function is obtained by calculation from a theoretical formula of sound propagation. In the actual measurement-based method, a speaker is provided at a sound source position, an impulse response is measured by transmitting a measurement signal such as a TSP (Time-Stretched Pulse; a frequency-swept signal) signal, and the transfer function is obtained by applying a Fourier transform to the impulse response.
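The two generation methods described above can be sketched as follows. This is a minimal illustration, not the method of any particular publication. It assumes a free-field point-source model, H(f) = exp(-j2πfr/c)/r, as the theoretical formula, a speed of sound of 343 m/s, and a synthetic impulse response consisting of a single delayed, attenuated impulse in place of a real measurement. The function names and all numeric values are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at room temperature


def theoretical_transfer_function(source_pos, mic_pos, freqs_hz):
    """Theory-based sketch: free-field point-source model (an assumption).

    Returns H(f) = exp(-j * 2*pi * f * r / c) / r, where r is the
    source-to-microphone distance in meters.
    """
    r = np.linalg.norm(np.asarray(mic_pos, float) - np.asarray(source_pos, float))
    return np.exp(-2j * np.pi * np.asarray(freqs_hz) * r / SPEED_OF_SOUND) / r


def measured_transfer_function(impulse_response, n_fft=None):
    """Measurement-based sketch: Fourier transform of a measured impulse response."""
    return np.fft.rfft(impulse_response, n=n_fft)


# Synthetic stand-in for a measured impulse response:
# a single impulse delayed by 8 samples with gain 0.5.
fs = 16000          # sampling rate in Hz (hypothetical)
n_fft = 512         # FFT length (hypothetical)
delay_samples = 8
impulse_response = np.zeros(n_fft)
impulse_response[delay_samples] = 0.5

freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
H_measured = measured_transfer_function(impulse_response, n_fft)

# Distance whose propagation time equals the 8-sample delay.
distance = delay_samples / fs * SPEED_OF_SOUND
H_theory = theoretical_transfer_function([0.0, 0.0, 0.0], [distance, 0.0, 0.0], freqs)
```

In this idealized case the two transfer functions have the same phase at every frequency and differ only in gain. A real measured impulse response would additionally contain the microphone characteristics and diffraction effects that the theoretical formula omits.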
The actual measurement-based transfer function is more accurate than the theory-based transfer function. This is because the actual measurement-based transfer function includes all of the influences of actual sound propagation, such as the characteristics of the microphone and diffraction by a tool. However, a very large amount of time and effort is required to generate, on the actual measurement basis, a database (hereinafter, also referred to as a TFDB) in which transfer functions to a plurality of microphones from sound sources in various directions are recorded. This is because a large number of transfer functions are required. For example, in order to perform the sound source localization with an accuracy of 5° for both the azimuth angle and the elevation angle, a TFDB that includes transfer functions in 2522 (=72×35+2) directions is required. Further, in order to perform the sound source localization with an accuracy of 1° for both the azimuth angle and the elevation angle, transfer functions in 64442 (=360×179+2) directions are required.
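The direction counts given above can be reproduced with a short calculation. The sketch below assumes a regular grid in which the azimuth angle covers 360° and the elevation angle covers the rings strictly between the two poles, with the two poles each counted once. The function name is hypothetical.

```python
def direction_count(step_deg: int) -> int:
    """Number of directions on a regular azimuth/elevation grid.

    Assumes step_deg divides both 360 and 180 evenly. The two poles
    (elevation +90 and -90 degrees) are each counted as one direction.
    """
    if step_deg <= 0 or 360 % step_deg or 180 % step_deg:
        raise ValueError("step_deg must be a positive divisor of 180")
    n_azimuth = 360 // step_deg              # e.g. 72 for a 5-degree step
    n_elevation_rings = 180 // step_deg - 1  # e.g. 35 rings between the poles
    return n_azimuth * n_elevation_rings + 2


count_5deg = direction_count(5)  # 72 * 35 + 2
count_1deg = direction_count(1)  # 360 * 179 + 2
```

Reducing the step from 5° to 1° multiplies the number of required measurements by roughly 25, which is why measuring every direction directly is impractical.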
For example, Japanese Unexamined Patent Application, First Publication No. 2010-171785 discloses a method in which a transfer function in an intermediate direction is obtained by interpolation from a small number of transfer functions measured in a limited number of directions. By using this technique, it is possible to obtain transfer functions at fine angular intervals without measuring a large number of transfer functions.
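The general idea of obtaining a transfer function in an intermediate direction can be illustrated with a simple sketch. The specific interpolation used in the cited publication is not described here, so the code below substitutes plain linear interpolation of the complex frequency-domain values between two neighboring measured directions. This is an assumption made purely for illustration, and the function name and sample values are hypothetical.

```python
import numpy as np


def interpolate_transfer_function(h_a, h_b, angle_a, angle_b, angle_target):
    """Linearly interpolate two transfer functions measured at angle_a and angle_b.

    h_a and h_b are complex frequency responses of equal length.
    Requires angle_a < angle_b and angle_a <= angle_target <= angle_b.
    """
    if not angle_a < angle_b:
        raise ValueError("angle_a must be smaller than angle_b")
    if not angle_a <= angle_target <= angle_b:
        raise ValueError("angle_target must lie between angle_a and angle_b")
    h_a = np.asarray(h_a, dtype=complex)
    h_b = np.asarray(h_b, dtype=complex)
    if h_a.shape != h_b.shape:
        raise ValueError("transfer functions must have the same shape")
    weight = (angle_target - angle_a) / (angle_b - angle_a)
    return (1.0 - weight) * h_a + weight * h_b


# Two hypothetical transfer functions measured 5 degrees apart (two frequency bins each).
h_0deg = np.array([1.0 + 0.0j, 0.0 + 2.0j])
h_5deg = np.array([3.0 + 0.0j, 0.0 + 0.0j])

# Estimate for the intermediate direction at 2.5 degrees.
h_mid = interpolate_transfer_function(h_0deg, h_5deg, 0.0, 5.0, 2.5)
```

Linear interpolation of complex values can distort the phase when the two measured responses differ by a large delay, so a practical method may interpolate magnitude and phase separately or work in the time domain.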