The present invention relates to a sound collection apparatus and method, and can be applied to a sound collection apparatus that collects and emphasizes only sounds from a specific direction in an environment where a plurality of sound sources are present.
One technology for collecting and emphasizing only sounds from a specific direction in an environment where a plurality of sound sources are present is a beam former (hereinafter, called a “BF”) using microphone arrays. A BF forms a directionality by using the time differences between the signals arriving at a plurality of microphones (refer to Futoshi Asano (Author), “Sound technology series 16: Array signal processing for acoustics: localization, tracking and separation of sound sources”, The Acoustical Society of Japan Edition, Corona publishing Co. Ltd, publication date: Feb. 25, 2011).
BFs can be roughly divided into two types: an addition-type and a subtraction-type. A subtraction-type BF has the advantage of being able to form a directionality with fewer microphones than an addition-type BF.
FIG. 3 is a block diagram that shows a configuration of a sound collection apparatus PS in which a conventional subtraction-type BF is adopted. FIG. 3 illustrates a case where the sound collection apparatus PS includes two microphones M1 and M2.
When sounds present in a target direction (hereinafter, called “target sounds”) arrive at the microphones M1 and M2, a delayer DEL calculates the time difference between the signals arriving at the microphones M1 and M2, and adds a delay so that the phases of the target sounds match. The time difference is calculated by the following Formula (1).

$$\tau_L = \frac{d \sin\theta_L}{c} \qquad (1)$$
In Formula (1), d is the distance between the microphones M1 and M2, c is the speed of sound, and τL is the delay amount (time difference). Further, θL is the angle of the target direction measured from the direction perpendicular to the straight line connecting the microphones M1 and M2.
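As a minimal sketch of Formula (1), the following Python function computes τL from d, θL and c. The value c = 340 m/s and the 3 cm microphone spacing used in the example are assumptions chosen for illustration, not values taken from this document.

```python
import math

SPEED_OF_SOUND = 340.0  # assumed speed of sound in air, m/s

def delay_amount(d: float, theta_l: float, c: float = SPEED_OF_SOUND) -> float:
    """Formula (1): tau_L = d * sin(theta_L) / c.

    d       -- distance between microphones M1 and M2, in metres
    theta_l -- target direction in radians, measured from the perpendicular
               to the line connecting M1 and M2
    c       -- speed of sound, in m/s
    """
    return d * math.sin(theta_l) / c

# Hypothetical example: 3 cm spacing, target along the microphone axis (pi/2).
tau = delay_amount(0.03, math.pi / 2)
print(f"{tau * 1e6:.2f} us")  # prints "88.24 us"
```

The delay is largest in magnitude when θL = ±π/2 and zero when θL = 0, which is why these two cases produce the distinct patterns described below.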
Here, when the dead angle is located in the direction of the microphone M1 as seen from the center between the microphones M1 and M2, the delay process is applied to the input signal x1(t) of the microphone M1. The subtractor SUB then performs a subtraction process in accordance with Formula (2).

$$a(t) = x_2(t) - x_1(t - \tau_L) \qquad (2)$$
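The time-domain subtraction of Formula (2) can be sketched as follows, assuming that τL has been rounded to a whole number of samples D (a practical system might use fractional-delay filtering instead). The sampling rate, D, and the test tone are hypothetical. Under the sign convention implied by Formula (2), a sound from the direction θL reaches M2 τL later than M1, so it cancels in a(t), while a sound from another direction does not.

```python
import math

def subtract_beamformer(x1, x2, delay_samples):
    """Formula (2) with an integer-sample delay: a[n] = x2[n] - x1[n - D].

    Assumes tau_L has been rounded to D >= 0 samples; samples of x1 before
    the start of the signal are treated as zero.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    out = []
    for n in range(len(x2)):
        delayed = x1[n - delay_samples] if n >= delay_samples else 0.0
        out.append(x2[n] - delayed)
    return out

# Hypothetical signal: a 500 Hz tone sampled at 16 kHz, arriving from theta_L,
# so x2 is x1 delayed by D samples.
fs, D = 16000, 2
x1 = [math.sin(2 * math.pi * 500 * n / fs) for n in range(64)]
x2 = [0.0] * D + x1[:-D]
a = subtract_beamformer(x1, x2, D)
print(max(abs(v) for v in a))  # prints 0.0: the sound from theta_L is cancelled
```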
The subtraction process can also be performed in the frequency domain, in which case Formula (2) becomes the following.

$$A(\omega) = X_2(\omega) - e^{-j\omega\tau_L} X_1(\omega) \qquad (3)$$
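For a single frequency bin, Formula (3) reduces to complex arithmetic, sketched below with Python's cmath module. The bin frequency, spacing, and spectrum value X1 are hypothetical; the example models a sound arriving from θL by setting X2 to the M1 spectrum delayed by τL.

```python
import cmath
import math

def subtract_beamformer_freq(X1: complex, X2: complex, omega: float, tau_l: float) -> complex:
    """Formula (3) for one bin: A(w) = X2(w) - exp(-j*w*tau_L) * X1(w)."""
    return X2 - cmath.exp(-1j * omega * tau_l) * X1

omega = 2 * math.pi * 1000.0   # hypothetical 1 kHz bin, rad/s
tau_l = 0.03 / 340.0           # Formula (1) with d = 3 cm, theta_L = pi/2
X1 = 0.8 + 0.3j                # arbitrary spectrum value at M1
X2 = cmath.exp(-1j * omega * tau_l) * X1  # same sound, delayed by tau_L at M2

print(abs(subtract_beamformer_freq(X1, X2, omega, tau_l)))  # sound from theta_L cancelled
```

Multiplying by e^{-jωτL} in the frequency domain corresponds to the delay by τL in Formula (2), so the two formulas null the same direction.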
Here, when θL = ±π/2, the directionality formed by the microphones M1 and M2 is a cardioid-shaped unidirectionality, as shown in FIG. 4A. On the other hand, when θL = 0 or π, the directionality formed by the microphones M1 and M2 is a figure-8-shaped bi-directionality, as shown in FIG. 4B. Hereinafter, a filter that forms a unidirectionality from input signals will be called a unidirectional filter, and a filter that forms a bi-directionality will be called a bi-directional filter.
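The two directivity patterns can be checked numerically. The sketch below evaluates |A(ω)/X1(ω)| for a plane wave from angle θ at a single frequency, assuming (consistently with Formula (2)) that the wave reaches M2 d sin θ / c later than M1. The 3 cm spacing, 500 Hz frequency, and c = 340 m/s are assumptions. Because a two-microphone array cannot distinguish θ from π − θ (both give the same sin θ), angles from −90° to 90° cover every distinct response.

```python
import cmath
import math

C = 340.0  # assumed speed of sound, m/s

def response(theta, theta_l, d=0.03, freq=500.0, c=C):
    """|A(w) / X1(w)| for a plane wave from angle theta (radians).

    Assumes X2 = exp(-j*w*tau(theta)) * X1 with tau(theta) = d*sin(theta)/c,
    and A(w) computed by Formula (3) with tau_L from Formula (1).
    """
    w = 2 * math.pi * freq
    tau = d * math.sin(theta) / c
    tau_l = d * math.sin(theta_l) / c
    return abs(cmath.exp(-1j * w * tau) - cmath.exp(-1j * w * tau_l))

for name, theta_l in [("unidirectional (theta_L = pi/2)", math.pi / 2),
                      ("bi-directional (theta_L = 0)", 0.0)]:
    gains = [response(math.radians(deg), theta_l) for deg in range(-90, 91, 45)]
    print(name, " ".join(f"{g:.3f}" for g in gains))
```

With θL = π/2 the gain is zero at θ = π/2 and grows steadily toward θ = −π/2, as in a cardioid; with θL = 0 it is zero at θ = 0 and π and has equal maxima at θ = ±π/2, as in a figure-8.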
By using a spectral subtraction technique (hereinafter, called “SS”), the subtractor SUB can also form a sharp directionality toward the dead angle of the bi-directionality.
The subtractor SUB forms a directionality by SS in accordance with Formula (4), in which the input signal X1 of the microphone M1 is used. A similar effect can also be obtained when the input signal X2 of the microphone M2 is used instead.

$$|Y(\omega)| = |X_1(\omega)| - \beta\,|A(\omega)| \qquad (4)$$

Here, β is a coefficient for adjusting the strength of SS. If the result of the subtraction becomes negative, a flooring process is performed that replaces the negative value with 0 or with a reduced version of the original value. By extracting sounds other than those in the target direction (hereinafter, called “non-target sounds”) with the bi-directional filter, and subtracting the amplitude spectrum of the extracted non-target sounds from the amplitude spectrum of the input signal, this method can emphasize the target sounds.
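A per-bin sketch of Formula (4) with the flooring process follows. The text does not specify how far the "reduced" value is reduced, so the floor_ratio parameter (a fraction of |X1(ω)|) is a hypothetical choice; floor_ratio = 0 corresponds to replacing negative values with 0.

```python
def spectral_subtraction(mag_x1, mag_a, beta=1.0, floor_ratio=0.0):
    """Formula (4) per frequency bin: |Y| = |X1| - beta * |A|, with flooring.

    mag_x1      -- amplitude spectrum |X1(w)| of the microphone M1 input
    mag_a       -- amplitude spectrum |A(w)| of the bi-directional filter output
    beta        -- coefficient adjusting the strength of SS
    floor_ratio -- hypothetical flooring parameter: a negative result is
                   replaced by floor_ratio * |X1(w)| (0.0 gives 0)
    """
    out = []
    for x, n in zip(mag_x1, mag_a):
        y = x - beta * n
        if y < 0.0:
            y = floor_ratio * x  # flooring process
        out.append(y)
    return out

# Hypothetical three-bin example: the last bin is dominated by non-target sound.
print(spectral_subtraction([1.0, 0.5, 0.2], [0.2, 0.5, 0.4]))
print(spectral_subtraction([1.0, 0.5, 0.2], [0.2, 0.5, 0.4], floor_ratio=0.1))
```

Increasing beta removes more of the non-target component at the cost of more bins hitting the floor.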
Using the subtraction-type BF described above, a sharp directionality can be formed in the target sound direction.
However, when only sounds present within a certain specific area (hereinafter, called “target area sounds”) are to be collected, the subtraction-type BF has a problem: because its directionality extends linearly along the target direction, sounds from sound sources present in the same direction as the target area but outside it (hereinafter, called “non-target area sounds”) are also collected.
JP 2014-72708A proposes a technique that collects target area sounds by using a plurality of microphone arrays MA1 and MA2 to direct directionalities toward the target area from different directions, so that the directionalities intersect at the target area.
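Why intersecting directionalities help can be illustrated with plane geometry: a single array's directionality is a line through the target area, so any source along that line is collected, whereas two lines aimed from different positions share only their crossing point. The sketch below computes that crossing point for hypothetical array positions and steering angles. It only illustrates the geometry and does not reproduce how JP 2014-72708A actually extracts the target area sounds.

```python
import math

def beam_intersection(p1, angle1, p2, angle2):
    """Crossing point of two straight-line directionalities (2-D sketch).

    p1, p2         -- hypothetical (x, y) positions of arrays MA1 and MA2, in metres
    angle1, angle2 -- steering directions in radians, measured from the +x axis
    Beams are treated as infinite lines; returns None if they are parallel.
    """
    u1 = (math.cos(angle1), math.sin(angle1))
    u2 = (math.cos(angle2), math.sin(angle2))
    # Solve p1 + s*u1 = p2 + t*u2 for s using Cramer's rule.
    det = -u1[0] * u2[1] + u1[1] * u2[0]
    if abs(det) < 1e-12:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = (-dx * u2[1] + dy * u2[0]) / det
    return (p1[0] + s * u1[0], p1[1] + s * u1[1])

# Hypothetical layout: MA1 at the origin steered to 45 degrees,
# MA2 two metres away steered to 135 degrees.
print(beam_intersection((0.0, 0.0), math.radians(45), (2.0, 0.0), math.radians(135)))
```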