A beamformer (the technique is also called beamforming) is a widely known conventional art for extracting a particular signal with multiple sensors while suppressing the other signals (for example, see Non-patent literature 1). However, the beamformer requires information about the direction of the target signal and is therefore difficult to use in situations in which such information cannot be obtained or estimated.
One newer art is Blind Signal Separation (BSS) (for example, see Non-patent literature 2). BSS is advantageous in that it does not require the information that the beamformer requires, and it is expected to find application in various situations. Signal separation using BSS will be described below.
[Blind Signal Separation]
First, BSS is formulated. It is assumed here that all signals are sampled at a certain sampling frequency fs and are discretely represented. It is also assumed that N signals are mixed and observed by M sensors. In the following description, a situation is dealt with in which signals are attenuated and delayed with the distance from the signal sources to sensors and a distortion in the transmission channels can occur due to reflections of the signals by objects such as walls. Signals mixed in such a situation can be expressed, using the impulse responses hqk(r) from sources k to sensors q (where q is the sensor's number [q=1, . . . , M] and k is the source's number [k=1, . . . , N]), as a convolutive mixture
[Formula 1]
$$x_q(t) = \sum_{k=1}^{N} \sum_{r=0}^{\infty} h_{qk}(r)\, s_k(t-r) \qquad (1)$$
where t denotes the sampling time, sk(t) denotes the source signal originating from signal source k at sampling time t, xq(t) denotes the signal observed by sensor q at sampling time t, and r is a sweep variable.
A typical impulse response hqk(r) has a strong pulse-like response after a certain time lapse and then attenuates with time. The purpose of blind signal separation is to obtain separated signals y1(t), . . . , yN(t), each corresponding to one of the source signals s1(t), . . . , sN(t), only from the observed signals (hereinafter referred to as “mixed signals”), without the aid of information about the source signals s1(t), . . . , sN(t) or the impulse responses h11(r), . . . , h1N(r), . . . , hM1(r), . . . , hMN(r).
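The convolutive mixture of Equation (1) can be sketched numerically as follows. This is only an illustration under stated assumptions: the infinite sum over r is truncated to a finite number of taps R, the sources are taken to be zero before t=0, and the function name `convolutive_mixture` and the toy impulse responses are hypothetical, not part of the described art.

```python
import numpy as np

def convolutive_mixture(sources, impulse_responses):
    """Mix N sources into M sensor signals following Equation (1).

    sources: array of shape (N, T), holding s_k(t)
    impulse_responses: array of shape (M, N, R), holding h_qk(r) truncated to R taps
    returns: array of shape (M, T), holding x_q(t)
    """
    n_sources, n_samples = sources.shape
    n_sensors = impulse_responses.shape[0]
    mixed = np.zeros((n_sensors, n_samples))
    for q in range(n_sensors):
        for k in range(n_sources):
            # Full convolution, truncated to the observation length T.
            mixed[q] += np.convolve(sources[k], impulse_responses[q, k])[:n_samples]
    return mixed

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))   # N = 2 toy sources
h = np.zeros((3, 2, 8))              # M = 3 sensors, 8-tap toy responses
h[:, :, 0] = 1.0                     # direct path from every source to every sensor
h[:, 1, 3] = 0.5                     # delayed, attenuated reflection of the second source
x = convolutive_mixture(s, h)
```

Each row of `x` is one mixed signal; a blind method would be given `x` only, without `s` or `h`.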
[Frequency Domain]
A process of conventional BSS will be described below.
Operations for separation are performed in the frequency domain. Therefore, an L-point Short-Time discrete Fourier Transformation (STFT) is applied to the mixed signal xq(t) at a sensor q to obtain a time-series signal at each frequency.
[Formula 2]
$$X_q(f,\tau) = \sum_{r=-L/2}^{(L/2)-1} x_q(\tau+r)\, g(r)\, e^{-j 2\pi f r} \qquad (2)$$
Here, f is one of the frequencies, which are discretely sampled as f=0, fs/L, . . . , fs(L−1)/L (where fs is the sampling frequency), τ is discrete time, j is the imaginary unit, and g(r) is a window function. The window function may be a window that has the center of power at g(0), such as a Hanning window.
[Formula 3]
$$g(r) = \frac{1}{2}\left(1 + \cos\frac{2\pi r}{L}\right)$$
In this case, Xq(f, τ) represents a frequency characteristic of the mixed signals xq(t) centered at time t=τ. It should be noted that Xq(f, τ) includes information about L samples and Xq(f, τ) does not need to be obtained for all τ. Therefore, Xq(f, τ) is obtained at τ with an appropriate interval.
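The transform of Equation (2) with the window of Formula 3 can be sketched as below. The sketch assumes that L is even, that the product f·r in the exponent is read with f normalized by fs (so that bin m corresponds to f = m·fs/L), and that frames are taken only where the whole window fits inside the signal. The function name `stft` and the `hop` parameter, which sets the interval between successive τ, are hypothetical.

```python
import numpy as np

def stft(x, frame_len, hop):
    """Short-time DFT following Equation (2), using the window of Formula 3.

    x: 1-D real signal x_q(t)
    frame_len: L (assumed even)
    hop: spacing between successive frame centres tau
    returns: (X, taus), where X has shape (L, number of frames)
    """
    half = frame_len // 2
    r = np.arange(-half, half)                       # r = -L/2, ..., L/2 - 1
    window = 0.5 * (1.0 + np.cos(2.0 * np.pi * r / frame_len))
    bins = np.arange(frame_len)                      # bin m <-> f = m * fs / L
    # DFT kernel e^{-j 2 pi m r / L}, i.e. f * r with f normalised by fs.
    kernel = np.exp(-2j * np.pi * np.outer(bins, r) / frame_len)
    taus = np.arange(half, len(x) - half + 1, hop)   # centres whose window fits in x
    frames = np.stack([x[tau - half: tau + half] * window for tau in taus], axis=1)
    return kernel @ frames, taus
```

Choosing `hop` smaller than L gives overlapping frames, which is one way of obtaining Xq(f, τ) "at τ with an appropriate interval".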
By performing the processing in the frequency domain, the convolutive mixture in the time domain expressed by Equation (1) can be approximated as a simple mixture at each frequency as
[Formula 4]
$$X_q(f,\tau) = \sum_{k=1}^{N} H_{qk}(f)\, S_k(f,\tau) \qquad (3)$$
Thus, operations for separation are simplified. Here, Hqk(f) is the frequency response from source k to sensor q, and Sk(f, τ) is obtained by applying a Short-Time Discrete Fourier Transformation to the source signal sk(t) according to an equation similar to Equation (2). With a vector notation, Equation (3) can be written as
[Formula 5]
$$X(f,\tau) = \sum_{k=1}^{N} H_k(f)\, S_k(f,\tau) \qquad (4)$$
where X(f, τ)=[X1(f, τ), . . . , XM(f, τ)]T is a mixed-signal vector and Hk(f)=[H1k(f), . . . , HMk(f)]T is the vector consisting of the frequency responses from source k to the sensors. Here, [*]T represents the transposed vector of [*].
[Signal Separation using Independent Component Analysis]
One approach to the blind signal separation is signal separation using Independent Component Analysis (ICA). In the approach using ICA, a separation matrix W(f) of N rows and M columns and a separated signal vector
$$Y(f,\tau) = W(f)\, X(f,\tau) \qquad (5)$$
are calculated solely from the mixed-signal vector X(f, τ). Here, the separation matrix W(f) is calculated such that the elements (separated signals) Y1(f, τ), . . . , YN(f, τ) of the separated signal vector Y(f, τ)=[Y1(f, τ), . . . , YN(f, τ)]T are independent of each other. For this calculation, an algorithm such as the one described in Non-patent literature 4 may be used.
In ICA, separation is made by exploiting the independence of signals. Accordingly, the order of the obtained separated signals Y1(f, τ), . . . , YN(f, τ) is ambiguous. This is because the independence of signals is retained even if the order of the signals changes. This order ambiguity, known as the permutation problem, is an important problem in signal separation in the frequency domain. The permutation problem must be solved in such a manner that the suffix p of the separated signals Yp(f, τ) corresponding to the same source signal Sk(f, τ) is the same at all frequencies f.
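The order ambiguity can be illustrated at a single frequency bin with a small numerical sketch. It is not an ICA implementation: as a stand-in for an ICA solution, the separation matrix is taken to be the inverse of a known toy mixing matrix (so this is not blind), and all data are hypothetical. The point is only that a row-permuted separation matrix yields equally independent outputs in a different order.

```python
import numpy as np

rng = np.random.default_rng(1)
n_src, n_frames = 2, 500

# Hypothetical source spectra S_k(f, tau) and mixing matrix H(f) at one bin f.
S = rng.laplace(size=(n_src, n_frames)) + 1j * rng.laplace(size=(n_src, n_frames))
H = rng.standard_normal((n_src, n_src)) + 1j * rng.standard_normal((n_src, n_src))
X = H @ S                        # Equation (4) at this bin

W = np.linalg.inv(H)             # stand-in for an ICA solution (uses H, so not blind)
P = np.array([[0, 1], [1, 0]])   # permutation that swaps the two outputs

Y = W @ X                        # Equation (5): outputs in source order
Y_perm = (P @ W) @ X             # same signals, opposite order
```

If `W` is used at one frequency and `P @ W` at another, output 1 contains source 1 at the first frequency and source 2 at the second, which is the situation the permutation problem describes.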
Examples of conventional approaches to solving the permutation problem include the one described in Non-patent literature 5. In that approach, information about the position of a signal source (the direction and the distance ratio) is estimated with respect to the positions of two selected sensors (a sensor pair). The estimates at multiple sensor pairs are combined to obtain more detailed positional information. These positional estimates are clustered, and the estimates that belong to the same cluster are considered to correspond to the same source, thereby solving the permutation problem.
[Signal Separation Using Time-Frequency Masking]
Another approach to blind signal separation is a method using time-frequency masking. This approach is a signal separation and extraction method effective even if the relation between the number N of sources and the number M of sensors is such that M<N.
In this approach, the sparseness of signals is assumed. Signals are said to be “sparse” if they are null at most discrete times τ. The sparseness of signals can be observed, for example, in speech signals in the frequency domain. The assumption of the sparseness and independence of signals makes it possible to assume that the probability that multiple coexisting signals are observed to overlap one another at a time-frequency point (f, τ) is low. Accordingly, it can be assumed that the mixed signal at each time-frequency point (f, τ) at each sensor consists of only one signal sp(f, τ) that is active at that time-frequency point (f, τ). Therefore, the mixed-signal vectors are clustered by an appropriate feature quantity, a time-frequency mask Mk(f, τ) is generated for extracting the mixed signals X(f, τ) that correspond to the member time-frequency points (f, τ) of each cluster Ck, and each signal is separated and extracted according to
$$Y_k(f,\tau) = M_k(f,\tau)\, X_{Q'}(f,\tau).$$
Here, XQ′(f, τ) is one of the mixed signals and Q′∈{1, . . . , M}.
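The extraction step can be sketched as follows, assuming the clustering has already assigned each time-frequency point to exactly one cluster. The function name `apply_cluster_masks` and the array layout are hypothetical, and clusters are indexed from 0 in the code rather than from 1 as in the text.

```python
import numpy as np

def apply_cluster_masks(labels, X_ref, n_clusters):
    """Extract one signal per cluster with binary time-frequency masks.

    labels: integer array of shape (F, T); labels[f, tau] = k if the point
            (f, tau) is a member of cluster C_k
    X_ref: complex array of shape (F, T); the chosen mixed signal X_{Q'}(f, tau)
    returns: complex array of shape (n_clusters, F, T) holding Y_k(f, tau)
    """
    # M_k(f, tau) is 1 on the members of cluster k and 0 elsewhere.
    masks = np.stack([(labels == k).astype(float) for k in range(n_clusters)])
    return masks * X_ref   # broadcasts X_ref over the cluster axis
```

Because every point belongs to exactly one cluster here, the extracted signals sum back to `X_ref`; each `Y_k` keeps only the points attributed to source k.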
The feature quantity used for the clustering may be obtained, for example, as follows. The phase difference between the mixed signals at two sensors (a sensor q and a reference sensor Q (hereinafter Q is referred to as the reference value and the sensor that corresponds to the reference value Q is denoted as the reference sensor Q)) is calculated as
[Formula 6]
$$\phi(f,\tau) = \angle \frac{X_q(f,\tau)}{X_Q(f,\tau)} \qquad (8)$$
and, from the phase difference, the Direction of Arrival (DOA)
[Formula 7]
$$\theta(f,\tau) = \cos^{-1} \frac{\phi(f,\tau)\cdot c}{2\pi\cdot f\cdot d} \qquad (9)$$
can be calculated as the feature quantity used for the clustering (for example, see Non-patent literature 3). Here, “d” is the distance between sensor q and reference sensor Q and “c” is the signal transmission speed. Also, the k-means method (for example, see Non-patent literature 6) may be used for the clustering. The time-frequency mask Mk(f, τ) used may be generated by calculating the averages $\tilde{\theta}_1, \tilde{\theta}_2, \ldots, \tilde{\theta}_N$ of the members of each cluster Ck and obtaining
[Formula 8]
$$M_k(f,\tau) = \begin{cases} 1 & \tilde{\theta}_k - \Delta \le \theta(f,\tau) \le \tilde{\theta}_k + \Delta \\ 0 & \text{otherwise} \end{cases} \qquad (k = 1, \ldots, N)$$
Here, Δ gives the range in which signals are extracted.
In this method, as Δ is reduced, the separation and extraction performance increases but the nonlinear distortion increases; on the other hand, as Δ is increased, the nonlinear distortion decreases but the separation performance degrades.
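The chain from Equation (8) to the mask of Formula 8 can be sketched as below. Several things here are assumptions rather than part of the described method: the default c=340.0 (sound in air, in m/s), the requirement that all frequencies are non-zero and low enough that the phase difference does not wrap, the clipping before the inverse cosine, and the function names. The clustering is a plain Lloyd-style k-means on scalar DOA values with a quantile initialization, chosen for brevity.

```python
import numpy as np

def doa_feature(X_q, X_Q, freqs, d, c=340.0):
    """DOA estimate theta(f, tau) in radians from Equations (8) and (9).

    X_q, X_Q: complex arrays of shape (F, T); freqs: array of shape (F,) in Hz.
    """
    # angle(X_q * conj(X_Q)) equals angle(X_q / X_Q) and avoids division by zero.
    phi = np.angle(X_q * np.conj(X_Q))                     # Equation (8)
    arg = phi * c / (2.0 * np.pi * freqs[:, None] * d)     # argument of Equation (9)
    return np.arccos(np.clip(arg, -1.0, 1.0))              # clip guards against noise

def kmeans_1d(values, n_clusters, n_iter=50):
    """Lloyd iterations on scalar values; returns the sorted cluster averages."""
    centroids = np.quantile(values, (np.arange(n_clusters) + 0.5) / n_clusters)
    for _ in range(n_iter):
        labels = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        for k in range(n_clusters):
            if np.any(labels == k):
                centroids[k] = values[labels == k].mean()
    return np.sort(centroids)

def doa_masks(theta, centroids, delta):
    """Binary masks M_k(f, tau): 1 where theta lies within delta of centroid k."""
    return (np.abs(theta[None] - centroids[:, None, None]) <= delta).astype(float)
```

A typical call sequence would be `theta = doa_feature(X_q, X_Q, freqs, d)`, then `centroids = kmeans_1d(theta.ravel(), N)`, then `masks = doa_masks(theta, centroids, delta)`. A smaller `delta` admits fewer time-frequency points per source, which is the trade-off between separation performance and nonlinear distortion noted above.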
Another feature quantity that can be used for the clustering may be the phase difference between the mixed signals at two sensors (sensor q and reference sensor Q) (Equation (8)) or the gain ratio between the two sensors
[Formula 9]
$$\alpha(f,\tau) = \left| \frac{X_q(f,\tau)}{X_Q(f,\tau)} \right|$$
Non-patent literature 1: B. D. Van Veen and K. M. Buckley, “Beamforming: a versatile approach to spatial filtering,” IEEE ASSP Magazine, pp. 4-24, April 1988
Non-patent literature 2: S. Haykin, ed., “Unsupervised Adaptive Filtering,” John Wiley & Sons, 2000, ISBN 0-471-29412-8
Non-patent literature 3: S. Araki, S. Makino, A. Blin, R. Mukai, and H. Sawada, “Underdetermined blind separation for speech in real environments with sparseness and ICA,” in Proc. ICASSP 2004, vol. III, May 2004, pp. 881-884
Non-patent literature 4: A. Hyvarinen, J. Karhunen, and E. Oja, “Independent Component Analysis,” John Wiley & Sons, 2001, ISBN 0-471-40540
Non-patent literature 5: R. Mukai, H. Sawada, S. Araki, and S. Makino, “Frequency Domain Blind Source Separation using Small and Large Spacing Sensor Pairs,” in Proc. of ISCAS 2004, vol. V, pp. 1-4, May 2004
Non-patent literature 6: R. O. Duda, P. E. Hart, and D. G. Stork, “Pattern Classification,” Wiley Interscience, 2nd edition, 2000