The present invention relates to a sound pick-up apparatus and method that are applicable, for example, when sounds in a specific area are to be emphasized and sounds in other areas are to be reduced.
One technology for collecting and separating only the sounds that arrive from a specific direction, in an environment in which a plurality of sound sources are present, is the beam former (which will be referred to as “BF”) using microphone arrays. The BF forms directionality by using the time differences between the signals arriving at the respective microphones (see Futoshi Asano (Author), “Sound technology series 16: Array signal processing for acoustics: localization, tracking and separation of sound sources,” The Acoustical Society of Japan Edition, Corona publishing Co. Ltd, publication date: Feb. 25, 2011). BFs roughly fall into two types: an addition type and a subtraction type. A subtraction-type BF has the advantage that it can form directionality with fewer microphones than an addition-type BF.
FIG. 6 is a block diagram illustrating the configuration of a sound pick-up apparatus PS to which the conventional subtraction-type BF including two microphones is applied. The sound pick-up apparatus PS first uses a delayer to calculate the time difference between the arrivals, at microphones M1 and M2, of sounds from a target direction (which will be referred to as “target sounds”), and then delays one of the signals by that amount so that the target sounds are in phase.
The sound pick-up apparatus PS calculates this time difference according to the following expression (1), in which d represents the distance between the microphones, c represents the speed of sound, τL represents the delay amount, and θL represents the angle of the target direction measured from the direction perpendicular to the straight line connecting the microphones.
$$\tau_L = \frac{d \sin\theta_L}{c} \qquad (1)$$
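Expression (1) can be evaluated directly. The following is a minimal Python sketch; the function name `steering_delay`, the assumed speed of sound of 343 m/s, and the 3 cm spacing and 16 kHz sampling rate in the usage part are illustrative assumptions, not values taken from the source.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degC


def steering_delay(d: float, theta_l: float, c: float = SPEED_OF_SOUND) -> float:
    """Delay tau_L = d * sin(theta_L) / c of expression (1), in seconds.

    d       -- microphone spacing in metres
    theta_l -- target angle in radians, measured from the direction
               perpendicular to the line joining the two microphones
    c       -- speed of sound in m/s
    """
    return d * math.sin(theta_l) / c


if __name__ == "__main__":
    fs = 16000  # hypothetical sampling rate in Hz
    tau = steering_delay(0.03, math.pi / 2)  # 3 cm spacing, end-fire target
    print(f"tau_L = {tau * 1e6:.1f} us = {tau * fs:.2f} samples at {fs} Hz")
```

With such a small spacing the delay is a fraction of a few samples, which suggests why a fractional delay, or the frequency-domain form of the subtraction, may be needed in practice.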
Here, if the dead angle (null) is to be formed in the direction of the microphone M1 as seen from the center between the microphones M1 and M2, the sound pick-up apparatus PS performs delay processing on the input signal x1(t) of the microphone M1. The sound pick-up apparatus PS then uses a subtractor to perform signal processing in accordance with expression (2).
$$m(t) = x_2(t) - x_1(t - \tau_L) \qquad (2)$$
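Expression (2) can be sketched in Python as below. This is a simplified illustration under stated assumptions: τL is taken to be already rounded to a non-negative whole number of samples (`delay_samples`), and x1 is treated as zero before its first sample. The function name `subtract_bf` is hypothetical.

```python
def subtract_bf(x1, x2, delay_samples):
    """Subtraction-type BF of expression (2): m(t) = x2(t) - x1(t - tau_L).

    Assumptions of this sketch: tau_L is rounded to a non-negative whole
    number of samples, and x1 is treated as zero before its first sample.
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    if len(x1) != len(x2):
        raise ValueError("x1 and x2 must have the same length")
    m = []
    for t in range(len(x2)):
        # Delayed copy of microphone M1's signal (zero before it starts).
        delayed = x1[t - delay_samples] if t >= delay_samples else 0.0
        m.append(x2[t] - delayed)
    return m


if __name__ == "__main__":
    # A source in the dead-angle direction reaches M1 first and M2 two
    # samples later, so the subtraction cancels it completely.
    x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 0.0, 0.0]
    x2 = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    print(subtract_bf(x1, x2, 2))  # all zeros
```

A sound arriving from the opposite side (at M2 first) is not cancelled, which is what gives the filter its directionality.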
The sound pick-up apparatus PS can similarly perform the subtraction processing in the frequency domain. In that case, expression (2) becomes the following expression (3).
$$M(\omega) = X_2(\omega) - e^{-j\omega\tau_L} X_1(\omega) \qquad (3)$$
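A bin-by-bin Python sketch of expression (3) follows. It assumes the complex spectra X1 and X2 and the angular frequency of each bin are already available (for example from a short-time Fourier transform, which is not shown); the function name `subtract_bf_freq` and the numbers in the usage part are illustrative.

```python
import cmath
import math


def subtract_bf_freq(X1, X2, omegas, tau_l):
    """Expression (3), bin by bin: M(w) = X2(w) - exp(-j*w*tau_L) * X1(w).

    X1, X2 -- complex spectra of microphones M1 and M2 (equal length)
    omegas -- angular frequency in rad/s of each bin (assumed to be supplied)
    tau_l  -- delay tau_L from expression (1), in seconds
    """
    return [x2 - cmath.exp(-1j * w * tau_l) * x1
            for x1, x2, w in zip(X1, X2, omegas)]


if __name__ == "__main__":
    tau_l = 0.03 / 343.0  # 3 cm spacing, end-fire target (assumed values)
    omegas = [2 * math.pi * f for f in (250.0, 1000.0, 4000.0)]
    X1 = [1 + 0.5j, -0.2 + 1j, 0.7 - 0.3j]
    # Sound from the dead angle: M2 receives M1's spectrum delayed by tau_L.
    X2 = [cmath.exp(-1j * w * tau_l) * x for w, x in zip(omegas, X1)]
    print([abs(m) for m in subtract_bf_freq(X1, X2, omegas, tau_l)])
```

Unlike the time-domain form, the phase factor applies a fractional delay exactly, without rounding τL to whole samples.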
If θL=±π/2, the sound pick-up apparatus PS forms cardioid unidirectionality as illustrated in FIG. 7A. Meanwhile, if θL=0 or π, the sound pick-up apparatus PS forms 8-shaped bidirectionality as illustrated in FIG. 7B. A filter that forms unidirectionality from input signals will be referred to as “unidirectional filter,” and a filter that forms bidirectionality will be referred to as “bidirectional filter.”
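The two patterns can be checked numerically. Assuming a far-field plane wave from angle θ, so that X2 = e^{-jω d sin θ / c} X1 (the same sign convention as expression (1)), the magnitude response of expression (3) is 2|sin(ω(τ − τL)/2)| with τ = d sin θ / c. At low frequencies this is roughly proportional to 1 − sin θ (a cardioid) for θL = π/2 and to |sin θ| (a figure-eight) for θL = 0. The function name and the default spacing and speed of sound below are illustrative assumptions.

```python
import math


def subtraction_bf_gain(theta, theta_l, freq, d=0.03, c=343.0):
    """Magnitude response |M / X1| of expression (3) for a plane wave from theta.

    Assumes a far-field plane wave with X2 = exp(-j*w*d*sin(theta)/c) * X1,
    the same sign convention as expression (1). Then
    |M / X1| = |exp(-j*w*tau) - exp(-j*w*tau_L)| = 2*|sin(w*(tau - tau_L)/2)|.
    The spacing d and speed of sound c are illustrative defaults.
    """
    w = 2 * math.pi * freq
    tau = d * math.sin(theta) / c
    tau_l = d * math.sin(theta_l) / c
    return 2 * abs(math.sin(w * (tau - tau_l) / 2))


if __name__ == "__main__":
    angles = [-90, -45, 0, 45, 90]  # degrees from the perpendicular direction
    for name, theta_l in (("cardioid, theta_L = pi/2", math.pi / 2),
                          ("figure-eight, theta_L = 0", 0.0)):
        gains = [subtraction_bf_gain(math.radians(a), theta_l, 1000.0) for a in angles]
        print(name, [round(g, 3) for g in gains])
```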
By using spectral subtraction (which will be referred to as “SS”), the sound pick-up apparatus PS can form strong directionality toward the dead angle of the bidirectional filter. The SS-based directionality is formed in all frequency bands or in a specified frequency band in accordance with expression (4). Expression (4) uses the input signal X1 of the microphone M1, but similar advantageous effects can be attained by using the input signal X2 of the microphone M2 instead. In expression (4), β represents a coefficient for adjusting the strength of SS. If the SS processing (subtraction processing) yields a negative value, the sound pick-up apparatus PS performs flooring processing, which replaces the negative value with 0 or with a value obtained by reducing the original value. With SS processing, the sound pick-up apparatus PS can emphasize target sounds by extracting sounds from directions other than the target direction (which will be referred to as “non-target sounds”) with the bidirectional filter, and subtracting the amplitude spectrum of the extracted non-target sounds from the amplitude spectrum of the input signal.
$$Y(n) = X_1(n) - \beta M(n) \qquad (4)$$
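Expression (4) with flooring can be sketched in Python as follows, operating on amplitude spectra bin by bin. The source only says a negative result is replaced with 0 or a reduced value, so the `floor_ratio` parameter (a fraction of X1 used as the floor) is a hypothetical choice for this sketch, as is the function name.

```python
def spectral_subtraction(X1_amp, M_amp, beta=1.0, floor_ratio=0.0):
    """Spectral subtraction of expression (4): Y(n) = X1(n) - beta * M(n),
    applied to amplitude spectra bin by bin.

    X1_amp      -- amplitude spectrum |X1| of one microphone
    M_amp       -- amplitude spectrum |M| of the bidirectional-filter output
                   (the non-target sounds)
    beta        -- strength of the subtraction
    floor_ratio -- flooring rule (assumed form): a negative result is replaced
                   by floor_ratio * X1_amp; 0.0 replaces it with zero
    """
    out = []
    for x, m in zip(X1_amp, M_amp):
        y = x - beta * m
        if y < 0:
            y = floor_ratio * x  # flooring of a negative SS result
        out.append(y)
    return out


if __name__ == "__main__":
    X1 = [1.0, 0.5, 2.0]  # |X1| per frequency bin (illustrative values)
    M = [0.2, 1.0, 0.5]   # |M| from the bidirectional filter
    print(spectral_subtraction(X1, M, beta=1.0))
```

The second bin, where the non-target estimate exceeds the input, is floored rather than allowed to go negative.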
If the conventional sound pick-up apparatus PS uses the subtraction-type BF alone to collect only the sounds in a specific area (which will be referred to as “target area sounds”), it is likely to also collect sounds from sound sources around that area (non-target area sounds).
JP 2014-072708A proposes an area sound pick-up apparatus that collects target area sounds by using a plurality of microphone arrays to direct directionalities at a target area from different directions so that the directionalities intersect in the target area. The area sound pick-up apparatus described in JP 2014-072708A first estimates the power ratio of the target area sounds included in the BF output of each microphone array, and then uses the power ratio as a correction coefficient. Taking as an example the case in which the apparatus uses two microphone arrays, the correction coefficients for the target area sound power are calculated according to the following expressions (5) and (6), or (7) and (8).
$$\alpha_1(n) = \operatorname{mode}\left(\frac{Y_{2k}(n)}{Y_{1k}(n)}\right), \quad k = 1, 2, \ldots, N \qquad (5)$$
$$\alpha_2(n) = \operatorname{mode}\left(\frac{Y_{1k}(n)}{Y_{2k}(n)}\right), \quad k = 1, 2, \ldots, N \qquad (6)$$
$$\alpha_1(n) = \operatorname{median}\left(\frac{Y_{2k}(n)}{Y_{1k}(n)}\right), \quad k = 1, 2, \ldots, N \qquad (7)$$
$$\alpha_2(n) = \operatorname{median}\left(\frac{Y_{1k}(n)}{Y_{2k}(n)}\right), \quad k = 1, 2, \ldots, N \qquad (8)$$
In expressions (5) to (8), Y1k(n) and Y2k(n) represent the amplitude spectra of the BF outputs of the first and second microphone arrays, respectively, N represents the total number of frequency bins, and k represents the frequency bin index. α1(n) and α2(n) represent the power correction coefficients for the respective BF outputs. Further, mode denotes the mode (most frequent value) and median denotes the median value.
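Expressions (5) to (8) compute a statistic of the per-bin amplitude ratios within one frame. In the Python sketch below, the median follows the expressions directly. The ratios are continuous values, so taking their mode requires some quantization; the `bin_width` used for that, the `eps` guard against division by zero, and the function name are assumptions of this sketch, not details from JP 2014-072708A.

```python
import statistics


def correction_coefficient(Y_num, Y_den, method="median", bin_width=0.05, eps=1e-12):
    """Power correction coefficient per expressions (5)-(8).

    alpha_1(n) = stat(Y_2k(n) / Y_1k(n)) and alpha_2(n) = stat(Y_1k(n) / Y_2k(n))
    over k = 1..N, where stat is the mode or the median.

    Y_num, Y_den -- amplitude spectra of one frame (length N each);
                    pass (Y2, Y1) for alpha_1 and (Y1, Y2) for alpha_2
    bin_width    -- assumption: ratios are quantized to this width before
                    taking the mode, since exact repeats are unlikely
    eps          -- assumption: guards against division by zero
    """
    ratios = [yn / max(yd, eps) for yn, yd in zip(Y_num, Y_den)]
    if method == "median":
        return statistics.median(ratios)
    if method == "mode":
        quantized = [round(r / bin_width) for r in ratios]
        return statistics.mode(quantized) * bin_width
    raise ValueError("method must be 'median' or 'mode'")


if __name__ == "__main__":
    Y1 = [1.0, 2.0, 4.0, 1.0, 3.0]  # illustrative amplitude spectra of one frame
    Y2 = [0.5, 1.0, 2.0, 2.0, 0.3]
    print(correction_coefficient(Y2, Y1))                 # alpha_1 by median: 0.5
    print(correction_coefficient(Y1, Y2, method="mode"))  # alpha_2 by mode
```

Using the mode or median rather than the mean makes the estimate robust to the few bins dominated by non-target sounds (the fourth and fifth bins above). On Python 3.8 and later, `statistics.mode` returns the first value encountered when there is a tie.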
The area sound pick-up apparatus described in JP 2014-072708A then corrects each BF output with the correction coefficient and performs SS, thereby extracting the non-target area sounds present in the target area direction. It can extract the target area sounds by further subtracting, by SS, the extracted non-target area sounds from each BF output. To extract the non-target area sound N1(n) in the target area direction as seen from the first microphone array, the apparatus subtracts, by SS, the BF output Y2(n) of the second microphone array multiplied by the power correction coefficient α2(n) from the BF output Y1(n) of the first microphone array, as shown in the following expression (9). Likewise, it extracts the non-target area sound N2(n) in the target area direction as seen from the second microphone array according to expression (10).
$$N_1(n) = Y_1(n) - \alpha_2(n)\,Y_2(n) \qquad (9)$$
$$N_2(n) = Y_2(n) - \alpha_1(n)\,Y_1(n) \qquad (10)$$
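Expressions (9) and (10) can be sketched per frequency bin of one frame as follows. Flooring negative results to zero mirrors the flooring described for expression (4), but it is an assumption here, since the source does not state how negative values are handled at this step. The function name and the example values are hypothetical.

```python
def non_target_area_sounds(Y1, Y2, alpha1, alpha2):
    """Expressions (9) and (10), applied per frequency bin of one frame:
    N1(n) = Y1(n) - alpha2(n) * Y2(n),  N2(n) = Y2(n) - alpha1(n) * Y1(n).

    Y1, Y2 are the amplitude spectra of the two BF outputs.
    Assumption of this sketch: negative results are floored to zero.
    """
    N1 = [max(y1 - alpha2 * y2, 0.0) for y1, y2 in zip(Y1, Y2)]
    N2 = [max(y2 - alpha1 * y1, 0.0) for y1, y2 in zip(Y1, Y2)]
    return N1, N2


if __name__ == "__main__":
    # Bin 0 holds only target-area sound (Y2/Y1 = 0.5, as alpha1 expects);
    # bin 1 holds extra sound that reaches only the first array's beam.
    N1, N2 = non_target_area_sounds([1.0, 3.0], [0.5, 0.5], alpha1=0.5, alpha2=2.0)
    print(N1, N2)  # [0.0, 2.0] [0.0, 0.0]
```

Because the target area sound is common to both beams, the corrected subtraction removes it, leaving only the sound that one beam picks up and the other does not.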
The area sound pick-up apparatus described in JP 2014-072708A then subtracts, by SS, the non-target area sounds from the respective BF outputs in accordance with expressions (11) and (12) to extract the target area sounds. In expressions (11) and (12), γ1(n) and γ2(n) represent coefficients for changing the strength of SS.
$$Z_1(n) = Y_1(n) - \gamma_1(n)\,N_1(n) \qquad (11)$$
$$Z_2(n) = Y_2(n) - \gamma_2(n)\,N_2(n) \qquad (12)$$
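The final SS step of expressions (11) and (12) can be sketched per frequency bin as below. As in the previous steps, flooring negative values to zero is an assumption of this sketch, and the function name and example values are hypothetical.

```python
def target_area_sounds(Y1, Y2, N1, N2, gamma1=1.0, gamma2=1.0):
    """Expressions (11) and (12), per frequency bin of one frame n:
    Z1(n) = Y1(n) - gamma1(n) * N1(n),  Z2(n) = Y2(n) - gamma2(n) * N2(n).

    Y1, Y2 -- amplitude spectra of the two BF outputs
    N1, N2 -- non-target area sounds from expressions (9) and (10)
    Assumption of this sketch: negative results are floored to zero.
    """
    Z1 = [max(y - gamma1 * noise, 0.0) for y, noise in zip(Y1, N1)]
    Z2 = [max(y - gamma2 * noise, 0.0) for y, noise in zip(Y2, N2)]
    return Z1, Z2


if __name__ == "__main__":
    Y1, Y2 = [1.0, 3.0], [0.5, 0.5]
    N1, N2 = [0.0, 2.0], [0.0, 0.0]  # e.g. outputs of expressions (9) and (10)
    print(target_area_sounds(Y1, Y2, N1, N2))  # ([1.0, 1.0], [0.5, 0.5])
```

Raising γ1 or γ2 suppresses the non-target area sounds more aggressively, at the risk of also removing some target area sound.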