First, the basic signal-processing framework will be described.
It is assumed that an array formed of M microphones is used, where M is an integer equal to or larger than 2. For example, M is on the order of 2 to 4, although M may also be on the order of 100. An observation signal Xm(ω, τ) (m=1, 2, . . . , M) at a frequency ω and a frame time τ contains one target sound S0(ω, τ), K interference noises Sk(ω, τ) (k=1, 2, . . . , K) that are coherent and non-stationary, and an incoherent stationary noise Nm(ω, τ). K is a predetermined positive integer. m is the index of each microphone, and the observation signal Xm(ω, τ) is a signal obtained by converting a time domain signal collected using the microphone m into the frequency domain.
A target sound is a sound coming from a predetermined target area. A target area is an area in which a sound source desired to be collected is included. The number of the sound sources desired to be collected and the position of the sound source desired to be collected in the target area may be unknown. For example, it is assumed that an area in which six speakers and three microphones are arranged is divided into three areas (an area 1, an area 2, and an area 3), as illustrated in FIG. 6. When the sound source desired to be collected is included in the area 1, the area 1 is to be the target area.
The target sound may contain a reflected sound from a sound source outside the target area. For example, when the target area is the area 1, a sound that is generated by a sound source in the area 2 or the area 3 and that reaches a microphone from the direction of the area 1 due to reflection may be contained in the target sound.
The target area may be an area within a predetermined distance from the microphone. In other words, the target area may be an area including a finite area. Furthermore, a plurality of target areas may be present. FIG. 7 is a diagram illustrating an example in which two target areas are present.
An area including a sound source generating a noise is also referred to as a noise area. In the example in FIG. 6, when a sound source generating a noise is included in each of the area 2 and the area 3, each of the area 2 and the area 3 is to be a noise area. Although each of the area 2 and the area 3 is a noise area in this example, an area including the area 2 and the area 3 may be a noise area. A noise area including a sound source generating an interference noise is particularly referred to as an interference noise area. The noise area is set so as to be different from the target area.
When the transfer characteristic from the target sound S0(ω, τ) to the m-th microphone is denoted as Am,0(ω) and the transfer characteristic from the k-th interference noise to the m-th microphone is denoted as Am,k(ω), the observation signal Xm(ω, τ) is modeled as below.
\[
X_m(\omega,\tau) = A_{m,0}(\omega)\,S_0(\omega,\tau) + \sum_{k=1}^{K} A_{m,k}(\omega)\,S_k(\omega,\tau) + N_m(\omega,\tau) \tag{1}
\]
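The observation model of formula (1) can be illustrated with a minimal sketch for a single frequency ω. The array sizes, the random transfer characteristics, the noise level, and the helper `crandn` are illustrative assumptions and do not appear in the description above.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, T = 3, 4, 50  # microphones, interference noises, frames (illustrative values)

def crandn(*shape):
    """Complex standard-normal samples (hypothetical helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

# Transfer characteristics A[m, k] at one frequency; column 0 belongs to the target sound.
A = crandn(M, K + 1)
# Source spectra S[k, tau]; row 0 is the target sound, rows 1..K are the interference noises.
S = crandn(K + 1, T)
# Incoherent stationary noise N[m, tau], drawn independently for each microphone.
N = 0.1 * crandn(M, T)

# Formula (1) for all microphones m and frames tau at once.
X = A @ S + N
```

Each row of `X` corresponds to one microphone, so `X[:, tau]` plays the role of the observation vector for frame τ.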
When the number of microphones is small, that is, M<K, for example, a framework in which a minimum variance distortionless response (MVDR) beamforming approach and a post-filter are combined is thought to be effective for suppressing noises (see Non-patent Literature 1, for example). FIG. 1 illustrates a processing flow of a post-filter type array. A filter coefficient w0(ω)=[W0,1(ω), . . . , W0,M(ω)]T that is designed for emphasis of a target sound is calculated as below.
\[
w_0(\omega) = \frac{R^{-1}(\omega)\,h_0(\omega)}{h_0^{H}(\omega)\,R^{-1}(\omega)\,h_0(\omega)} \tag{2}
\]
With x being an arbitrary vector or matrix, xT represents the transpose of x and xH represents the complex conjugate transpose of x. h0(ω)=[H0,1(ω), . . . , H0,M(ω)]T is the array manifold vector in the target sound direction. The array manifold vector h0(ω) is the vector whose elements are the transfer characteristics H0,m(ω) from the sound source to the microphones. The transfer characteristic H0,m(ω) may be, for example, a transfer characteristic that assumes only the direct sound and is calculated theoretically from the positions of the sound source and the microphone, a transfer characteristic that is actually measured, or a transfer characteristic estimated by computer simulation such as a mirror image method or a finite element method. When it is assumed that the source signals are uncorrelated with each other, a spatial correlation matrix R(ω) can be modeled as below.
\[
R(\omega) = \sum_{k=1}^{K} h_k(\omega)\,h_k^{H}(\omega) \tag{3}
\]
Here, hk(ω) is the array manifold vector of the k-th interference noise. An output signal Y0(ω, τ) of the beamforming is obtained with the formula below.
\[
Y_0(\omega,\tau) = w_0^{H}(\omega)\,x(\omega,\tau) \tag{4}
\]
Here, x(ω, τ)=[X1(ω, τ), . . . , XM(ω, τ)]T holds. To suppress the noise signal remaining in Y0(ω, τ), Y0(ω, τ) is multiplied by a post-filter G(ω, τ).
\[
Z(\omega,\tau) = G(\omega,\tau)\,Y_0(\omega,\tau) \tag{5}
\]
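Formulas (2) to (5) can be sketched as follows for one frequency. The free-field uniform linear array used to generate the array manifold vectors, the directions, the fixed gain value, and the diagonal loading term are assumptions made only for illustration; the loading keeps R(ω) invertible when fewer than M interference noises are modeled and is not part of the description above.

```python
import numpy as np

def mvdr_weights(h0, H_int, loading=1e-3):
    """Formula (2), with R built from formula (3).

    h0:      (M,) array manifold vector of the target sound direction.
    H_int:   (M, K) matrix whose columns are the interference vectors h_k.
    loading: diagonal loading factor (an assumption, not in the description).
    """
    M = h0.shape[0]
    R = H_int @ H_int.conj().T                           # formula (3)
    R = R + loading * np.trace(R).real / M * np.eye(M)   # regularization (assumption)
    Rinv_h0 = np.linalg.solve(R, h0)                     # R^{-1} h0 without an explicit inverse
    return Rinv_h0 / (h0.conj() @ Rinv_h0)               # formula (2)

def steering(M, theta, spacing=0.5):
    """Free-field manifold vector of a uniform linear array (hypothetical geometry)."""
    return np.exp(-2j * np.pi * spacing * np.arange(M) * np.sin(theta))

M = 3
h0 = steering(M, 0.0)
H_int = np.stack([steering(M, t) for t in (-1.0, 0.6)], axis=1)
w0 = mvdr_weights(h0, H_int)

x = 2.0 * h0          # a frame that contains only the target sound, with S0 = 2
Y0 = w0.conj() @ x    # formula (4)
G = 0.8               # an arbitrary post-filter gain for this bin
Z = G * Y0            # formula (5)
```

Because the filter satisfies the distortionless constraint w0H(ω)h0(ω)=1, the target-only frame passes through the beamformer unchanged, and only the post-filter gain scales it.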
Finally, Z(ω, τ) is subjected to an inverse fast Fourier transform (IFFT), whereby the output signal in the time domain is obtained.
Next, a post-filter designing method based on Non-patent Literature 2 will be described.
Non-patent Literature 2 proposes a method of designing a post-filter based on the power spectrum density (PSD) of each area estimated using multiple beamforming. Hereinafter, this method is referred to as an LPSD method (local PSD-based post-filter design). FIG. 2 is used to describe the processing flow of the LPSD method.
When the post-filter is designed based on a Wiener method, G(ω, τ) is calculated as below.
\[
G(\omega,\tau) = \frac{\phi_S(\omega,\tau)}{\phi_S(\omega,\tau) + \phi_N(\omega,\tau)} \tag{6}
\]
φS(ω, τ) represents the power spectrum density of the target area and φN(ω, τ) represents the power spectrum density of the noise area. The power spectrum density of a certain area means the power spectrum density of a sound coming from that area. More specifically, the power spectrum density of a target area is the power spectrum density of a sound coming from the target area, for example, and the power spectrum density of a noise area is the power spectrum density of a sound coming from the noise area. Although there are various methods of estimating φS(ω, τ) and φN(ω, τ) from Xm(ω, τ), the LPSD method is used because it is assumed that the observation signal contains an interference noise.
With the LPSD method, it is assumed that the observation signal contains a target sound and an interference noise, which are sparse in the time-frequency domain. To analyze the power spectrum density of each area positioned in various directions, L+1 beamforming filters wu(ω) (u=0, 1, . . . , L) are designed. The relation among a sensitivity |Duk(ω)|2 in the direction of the k-th area of a filter wu(ω), the power |Yu(ω, τ)|2 of the u-th output signal, and the power spectrum density |Sk(ω, τ)|2 of each area can be modeled as below. For |Duk(ω)|2, |Duk(ω)|2=|wuH(ω)hk(ω)|2 holds, for example. As |Duk(ω)|2, a measured value may be used.
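\[
\underbrace{\begin{bmatrix} |Y_0|^2 \\ |Y_1|^2 \\ \vdots \\ |Y_L|^2 \end{bmatrix}}_{\Phi_Y(\omega,\tau)}
=
\underbrace{\begin{bmatrix}
|D_{0,0}|^2 & |D_{0,1}|^2 & \cdots & |D_{0,K}|^2 \\
|D_{1,0}|^2 & |D_{1,1}|^2 & \cdots & |D_{1,K}|^2 \\
\vdots & \vdots & \ddots & \vdots \\
|D_{L,0}|^2 & |D_{L,1}|^2 & \cdots & |D_{L,K}|^2
\end{bmatrix}}_{D(\omega)}
\underbrace{\begin{bmatrix} |S_0|^2 \\ |S_1|^2 \\ \vdots \\ |S_K|^2 \end{bmatrix}}_{\Phi_S(\omega,\tau)}
\tag{7}
\]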
The arguments of each symbol are omitted here. More specifically, Yu=Yu(ω, τ), Duk=Duk(ω), and Sk=Sk(ω, τ) hold. Furthermore, ΦY(ω, τ)=[|Y0(ω, τ)|2, |Y1(ω, τ)|2, . . . , |YL(ω, τ)|2]T and ΦS(ω, τ)=[|S0(ω, τ)|2, |S1(ω, τ)|2, . . . , |SK(ω, τ)|2]T hold.
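The relation between the filter sensitivities and the beamforming output powers can be illustrated with a minimal sketch for one frequency. The random filters and array manifold vectors below are placeholders rather than designed beamformers, and the function names are hypothetical. The example checks the case in which only one area is active in a time-frequency bin, which is the sparsity assumption under which the output powers equal the corresponding column of D(ω) scaled by that area's power.

```python
import numpy as np

def sensitivity_matrix(W, H):
    """D[u, k] = |w_u^H h_k|^2 at one frequency.

    W: (M, L+1) matrix whose columns are the beamforming filters w_u.
    H: (M, K+1) matrix whose columns are the array manifold vectors h_k.
    """
    return np.abs(W.conj().T @ H) ** 2

def beam_powers(W, X):
    """Phi_Y[u, tau] = |w_u^H x(tau)|^2 for observations X of shape (M, T)."""
    return np.abs(W.conj().T @ X) ** 2

rng = np.random.default_rng(1)
M, L, K = 3, 4, 3
W = rng.standard_normal((M, L + 1)) + 1j * rng.standard_normal((M, L + 1))
H = rng.standard_normal((M, K + 1)) + 1j * rng.standard_normal((M, K + 1))
D = sensitivity_matrix(W, H)   # shape (L+1, K+1); can be computed in advance

# One time-frequency bin in which only the area k=2 is active.
s = 0.7 - 0.2j
x = (H[:, 2] * s)[:, None]     # observation vector, shape (M, 1)
Phi_Y = beam_powers(W, x)[:, 0]
```

When several areas are active in the same bin, cross terms between the sources appear in the output powers, so the linear relation holds only approximately.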
For example, the power spectrum density of each area is calculated by solving the inverse problem of formula (7).
\[
\hat{\Phi}_S(\omega,\tau) = D^{+}(\omega)\,\Phi_Y(\omega,\tau) \tag{8}
\]
With b being an arbitrary matrix, b+ represents the pseudo-inverse matrix of b. A local PSD estimation unit 11 uses the observation signal Xm(ω, τ) (m=1, 2, . . . , M) as an input to output a local power spectrum density ^ΦS(ω, τ) defined by formula (8), for example. “^” indicates that the value is an estimate.
Here, “local” refers to an area. In the example in FIG. 6, each of the area 1, the area 2, and the area 3 is local. The local PSD estimation unit 11 estimates the power spectrum density ^ΦS(ω, τ) of each area and outputs the estimated power spectrum density ^ΦS(ω, τ).
A target area/noise area PSD estimation unit 12 uses the local power spectrum density ^ΦS(ω, τ) estimated based on formula (8) for each frequency ω and frame τ as an input to calculate ^φS(ω, τ) and ^φN(ω, τ), which are defined by the formulas below.
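The pseudo-inverse solution of formula (8) can be sketched as follows. The numerical values of D(ω) and of the area powers are invented for illustration, the function name is hypothetical, and clipping negative estimates to zero is an added assumption that is not stated in the description above.

```python
import numpy as np

def estimate_local_psd(D, Phi_Y):
    """Least-squares estimate of the area PSDs, following formula (8).

    D:     (L+1, K+1) sensitivity matrix at one frequency.
    Phi_Y: (L+1,) or (L+1, T) powers of the beamforming outputs.
    """
    Phi_S_hat = np.linalg.pinv(D) @ Phi_Y
    # A power cannot be negative; the clipping is an assumption of this sketch.
    return np.maximum(Phi_S_hat, 0.0)

# Four beamforming filters (L = 3) observing three areas (K = 2).
D = np.array([[1.0, 0.1, 0.2],
              [0.1, 1.0, 0.3],
              [0.2, 0.1, 1.0],
              [0.5, 0.5, 0.5]])
Phi_S_true = np.array([1.0, 0.5, 0.0])   # powers of area 0 (target), area 1, area 2
Phi_Y = D @ Phi_S_true                   # noise-free instance of formula (7)
Phi_S_hat = estimate_local_psd(D, Phi_Y)
```

Because D(ω) does not depend on the frame τ, its pseudo-inverse can be computed once per frequency and reused for every frame.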
\[
\hat{\phi}_S(\omega,\tau) = \left|\hat{S}_0(\omega,\tau)\right|^2 \tag{9}
\]
\[
\hat{\phi}_N(\omega,\tau) = \sum_{k=1}^{K} \left|\hat{S}_k(\omega,\tau)\right|^2 \tag{10}
\]
Finally, a Wiener gain calculation unit 13 uses ^φS(ω, τ) and ^φN(ω, τ) as an input to calculate the post-filter G(ω, τ) defined by formula (6) and outputs the calculated post-filter G(ω, τ). Specifically, the Wiener gain calculation unit 13 substitutes ^φS(ω, τ) and ^φN(ω, τ) for φS(ω, τ) and φN(ω, τ) in formula (6) to calculate G(ω, τ) and outputs the calculated G(ω, τ).
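The processing of the target area/noise area PSD estimation unit 12 and the Wiener gain calculation unit 13 can be sketched together as follows. The function name and the array layout, in which row 0 holds the target area, are assumptions of this sketch, as is the small constant `eps` that avoids division by zero when both densities vanish.

```python
import numpy as np

def lpsd_post_filter(Phi_S_hat, eps=1e-12):
    """Post-filter gain from estimated local PSDs.

    Phi_S_hat: (K+1, ...) non-negative local PSD estimates; row 0 is the target area.
    eps:       guard against a zero denominator (an assumption of this sketch).
    """
    phi_S = Phi_S_hat[0]                    # formula (9): PSD of the target area
    phi_N = Phi_S_hat[1:].sum(axis=0)       # formula (10): sum over the noise areas
    return phi_S / (phi_S + phi_N + eps)    # formula (6): Wiener gain

# Two time-frequency bins and three areas (K = 2), with illustrative values.
Phi_S_hat = np.array([[3.0, 0.0],
                      [1.0, 2.0],
                      [0.0, 2.0]])
G = lpsd_post_filter(Phi_S_hat)
```

In the first bin the target area dominates and the gain is close to 1, whereas in the second bin only the noise areas are active and the gain is 0.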
Two main advantages of the LPSD method are described below. (i) The relation between the outputs of beamforming and each sound source is formulated in the power spectrum domain, whereby flexibility of control surpassing the number of microphones can be achieved and noises can thus be effectively suppressed. (ii) By calculating in advance the L+1 beamforming filters wu(ω) (u=0, 1, . . . , L) and D(ω) of formula (7), the merit of (i) can be implemented with low computational complexity.