1. Field of the Invention
The present invention relates to a directional setting apparatus, a directional setting system, a directional setting method and a directional setting program that form the directivity of equipment by using a microphone array consisting of a plurality of microphones.
2. Related Background Art
Recently, as the performance of voice recognition techniques has improved, voice recognition engines have been actively put into practical use in real environments. Expectations for voice recognition are especially high where input devices are limited, e.g. car navigation systems and mobile devices.
In voice recognition processing, the sound input from a microphone is compared with the recognition target vocabulary to obtain a voice recognition result. In real environments there are various noise sources, so the sound signal captured by the microphone includes ambient noise, and the robustness of the processing against noise strongly affects recognition accuracy. For example, when voice recognition is carried out in a car, there are many noises: the car's engine, wind, oncoming cars, passing cars and the car stereo. These noises enter the voice recognition apparatus mixed with the speaker's voice and degrade the recognition rate.
As a solution to this noise problem, the microphone array technique, which suppresses noise by using a plurality of microphones, is known. In this technique, signal processing is performed on the sounds input from the microphones so that a sharp directivity is formed toward the objective sound; the objective sound is emphasized by lowering the sensitivity in other directions.
For example, in the delay sum type of microphone array (delay sum array) described in Chapter 7 of "Sound System and Digital Processing" (The Institute of Electronics, Information and Communication Engineers, 1995), the output signal Se(t) of the delay sum array is obtained by adding the signals Sn(t) (n = 1, ..., N), each shifted by nτ, where τ is a time difference that depends on the direction of arrival of the objective sound. That is, the emphasized sound signal Se(t) is obtained by the following equation (1).
$$S_e(t) = \sum_{n=1}^{N} S_n(t + n\tau) \qquad (1)$$
Here, the microphones are arranged at equal intervals in order of the index n.
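Equation (1) can be illustrated with a short sketch. It assumes sampled signals, a delay τ expressed as a whole number of samples, and zero for samples that fall outside the recording; these choices, and the function name delay_sum, are illustrative assumptions rather than part of the cited method.

```python
def delay_sum(signals, tau):
    """Delay sum output per equation (1): Se(t) = sum_{n=1}^{N} Sn(t + n*tau).

    signals: list of N equal-length sample lists; signals[n-1] holds Sn.
    tau: time difference between adjacent microphones, in whole samples
         (assumption; real systems may need fractional delays).
    Samples outside the recorded range are treated as zero (assumption).
    """
    length = len(signals[0])
    out = []
    for t in range(length):
        acc = 0.0
        for n, s in enumerate(signals, start=1):
            idx = t + n * tau  # shift Sn by n*tau before adding
            if 0 <= idx < length:
                acc += s[idx]
        out.append(acc)
    return out


# A unit pulse emitted at sample 2 reaches microphone n at sample 2 + 3n.
N, TAU, LENGTH = 4, 3, 20
mics = [[1.0 if i == 2 + n * TAU else 0.0 for i in range(LENGTH)]
        for n in range(1, N + 1)]
aligned = delay_sum(mics, TAU)     # correct delay: pulses add coherently
misaligned = delay_sum(mics, 2)    # wrong delay: pulses stay separate
```

With the correct delay, the four pulses line up at sample 2 and sum to 4; with a wrong delay they remain spread over separate samples, each of height 1.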
The delay sum array forms directivity toward the objective sound by exploiting the phase differences of the incoming signal. That is, the delay sum array applies to each microphone's input signal a delay that accounts for the time difference τ with which the incoming signal reaches successive microphones. Once these delays bring the sound signals arriving from the direction of the objective sound (including the objective signal) into phase, adding them emphasizes the objective signal. Noise arriving from a direction different from that of the objective signal, on the other hand, remains out of phase after the delays, so its components partially cancel one another.
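The directional behaviour described above can be sketched numerically. The sketch assumes a far-field plane wave, a uniform linear array with spacing d, an arrival angle θ measured from broadside so that τ = d·sin(θ)/c, a speed of sound of 343 m/s, and a single-frequency (narrowband) source; none of these values or the function names come from the source text.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumption)


def steering_delay(spacing_m, angle_deg, c=SPEED_OF_SOUND):
    """Delay tau (seconds) between adjacent microphones for a far-field
    plane wave arriving at angle_deg from broadside (assumed geometry)."""
    return spacing_m * math.sin(math.radians(angle_deg)) / c


def array_gain(freq_hz, n_mics, spacing_m, steer_deg, source_deg):
    """Normalized delay sum response (1.0 = all channels in phase) to a
    sinusoid from source_deg when the array is steered to steer_deg."""
    omega = 2 * math.pi * freq_hz
    dt = steering_delay(spacing_m, steer_deg) - steering_delay(spacing_m, source_deg)
    # Channel n carries the residual phase omega * n * dt after the delays.
    total = sum(cmath.exp(1j * omega * n * dt) for n in range(1, n_mics + 1))
    return abs(total) / n_mics


on_target = array_gain(1000.0, 8, 0.05, steer_deg=0.0, source_deg=0.0)
off_target = array_gain(1000.0, 8, 0.05, steer_deg=0.0, source_deg=60.0)
```

A 1 kHz source in the steered direction passes with gain 1, while the same tone arriving from 60 degrees lands near a null of this 8-microphone, 5 cm array and is strongly attenuated.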
In such a delay sum array, estimating the time difference τ that corresponds to the direction of arrival (DOA) of the objective sound is important. If τ is estimated incorrectly, the components of the objective sound are no longer in phase after the delays, the objective sound itself is suppressed, and performance deteriorates.
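The cost of a wrong τ can be made concrete with a narrowband sketch, assuming a single sinusoidal objective sound and a fixed error between the estimated and true inter-microphone delays; the function name and the 1 kHz, 8-microphone example are illustrative assumptions.

```python
import cmath
import math


def target_gain(freq_hz, n_mics, tau_error_s):
    """Normalized delay sum gain on the objective sound when the estimated
    delay differs from the true delay by tau_error_s (narrowband model)."""
    omega = 2 * math.pi * freq_hz
    # After delaying, channel n is off by n * tau_error_s seconds.
    total = sum(cmath.exp(1j * omega * n * tau_error_s) for n in range(1, n_mics + 1))
    return abs(total) / n_mics


gains = [target_gain(1000.0, 8, err_us * 1e-6) for err_us in (0, 25, 50, 100)]
null_gain = target_gain(1000.0, 8, 125e-6)
```

The gain falls steadily as the error grows, and an error of 125 µs, one eighth of the 1 kHz period, places the objective sound in the first null of an 8-microphone array so that it is cancelled almost completely.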
Thus, in techniques that suppress noise by using microphone directivity, DOA estimation is extremely important, and it is the subject of active research. As described in the above-mentioned document, various methods have been proposed, such as the linear predictive method, the minimum variance method and the MUSIC method.
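For orientation only, the sketch below estimates τ with a simple steered-response-power scan: it tries each candidate integer delay, forms the delay sum output, and keeps the delay that maximizes output power. This is a swapped-in, simpler technique, not the linear predictive, minimum variance or MUSIC methods cited above; the integer-sample delays and the function names are assumptions.

```python
def delay_sum_power(signals, tau):
    """Energy of the delay sum output Se(t) = sum_n Sn(t + n*tau)
    for an integer-sample tau (out-of-range samples count as zero)."""
    length = len(signals[0])
    power = 0.0
    for t in range(length):
        acc = 0.0
        for n, s in enumerate(signals, start=1):
            idx = t + n * tau
            if 0 <= idx < length:
                acc += s[idx]
        power += acc * acc
    return power


def estimate_tau(signals, max_tau):
    """Return the integer delay in [-max_tau, max_tau] whose delay sum
    output has the largest power (steered response power scan)."""
    return max(range(-max_tau, max_tau + 1),
               key=lambda tau: delay_sum_power(signals, tau))


# Unit pulse from a source whose true delay is 3 samples per microphone.
N, TRUE_TAU, LENGTH = 4, 3, 20
mics = [[1.0 if i == 2 + n * TRUE_TAU else 0.0 for i in range(LENGTH)]
        for n in range(1, N + 1)]
estimated = estimate_tau(mics, max_tau=5)
```

Only the correct delay makes the four pulses coincide, giving output power 16 instead of at most 4, so the scan recovers τ = 3.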
For example, Japanese Patent Publication Laid-Open No. 9794/1997 discloses a method in which the speaker's direction is detected successively with the microphones and tracked by updating the directivity of the microphones according to that direction, thereby suppressing distortion of the objective signal.
However, when a plurality of persons speak, the direction of a speaking person is not necessarily the direction of arrival of the objective sound. For example, only one of the persons may utter the objective sound, while the utterances of the others are noise. In this case, the direction of arrival (DOA) of the objective sound has to be set only to the direction of that particular person.
To make DOA estimation reliable, Japanese Patent Publication Laid-Open No. 9794/1997 discloses a method of setting sound source areas in advance and registering each one in association with a keyword. In this document, the locations of the speaking persons relative to the microphone array are registered together with keywords. When a keyword is recognized in the input voice, the table in which the speaker locations and keywords are registered is consulted, and the sound source area corresponding to the recognized keyword is identified. Thereafter, a sharp directivity is set toward this sound source area. This makes it possible to detect the DOA reliably and to improve voice recognition accuracy.
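The keyword-to-area lookup can be pictured as a small table. The sketch below is hypothetical: the keywords, the representation of each sound source area as a single angle from broadside of a uniform linear array, the speed of sound and the function name are all illustrative assumptions, not details of the cited publication.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumption)

# Hypothetical registration table: keyword -> direction (degrees from
# broadside) of the sound source area registered for that keyword.
KEYWORD_AREAS = {
    "driver": -30.0,
    "passenger": 30.0,
}


def steering_for_keyword(keyword, spacing_m, areas=KEYWORD_AREAS):
    """Look up the sound source area registered for a recognized keyword
    and return the delay tau (seconds) that steers a delay sum array
    toward it, or None if the keyword is not registered."""
    angle = areas.get(keyword)
    if angle is None:
        return None  # unregistered keyword: directivity is left unchanged
    return spacing_m * math.sin(math.radians(angle)) / SPEED_OF_SOUND


tau_passenger = steering_for_keyword("passenger", 0.05)
tau_driver = steering_for_keyword("driver", 0.05)
tau_unknown = steering_for_keyword("rear seat", 0.05)
```

Because the table holds only locations registered beforehand, a speaker at an unregistered position cannot be targeted, which is the limitation raised in the following paragraph.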
Although the method of Japanese Patent Publication Laid-Open No. 9794/1997 is effective for setting the DOA reliably, the DOAs that can be set, that is, the speaker locations, are fixed in advance. As a result, each fixed speaker location has to be registered and recorded together with a keyword.