Hitherto, blind source separation (BSS) has been known as a technique for separating and extracting the original source signals from a mixed signal consisting of a mixture of a plurality of source signals (e.g., audio signals), without using any prior knowledge about the source signals or the mixing process. FIG. 27A shows a block diagram that illustrates the concept of this blind source separation technique.
As this figure shows, a plurality of (in this case, N) signal sources 701 emit source signals si (i=1, . . . , N), which are mixed together and observed with a plurality of (in this case, M) sensors 702. Under these conditions, the separated signals yk (k=1, . . . , N), estimated to correspond to the source signals, are extracted from these observed signals xj (j=1, . . . , M). Here, the process that takes place between the emission of the source signals si from signal sources 701 and their observation by sensors 702 is referred to as the “mixing process”, and the process whereby the separated signals are extracted from the observations of sensors 702 is called the “separation process”.
To start with, the observed signals and the separation problem are formulated as follows.
A Model of Mixed Signals (Observed Signals) in Real Environments
First, the mixing process is modeled as follows.
Here, N is the number of signal sources 701, M is the number of sensors 702, si is the signal (source signal) emitted from the i-th signal source 701 (signal source i), and hji is the impulse response from signal source i to the j-th sensor 702 (sensor j). The signal xj observed at sensor j is modeled by the convolutive mixtures of these source signals si and impulse responses hji as follows:
FORMULA 1

$$x_j(t) = \sum_{i=1}^{N} \sum_{p=1}^{P} h_{ji}(p)\, s_i(t-p+1) \qquad (1)$$

Here, the term “convolution” means that the signals are added together after being delayed and multiplied by specific coefficients in the signal propagation process. It is assumed that all the signals are sampled at a certain sampling frequency and represented by discrete values.
In Formula (1), P represents the length of the impulse response, t represents the sampling time, and p represents a sweep variable (“sweep” being an operation whereby different coefficients are applied to each sample value of a time-shifted signal). The N signal sources 701 are assumed to be statistically mutually independent, and each signal is assumed to be sufficiently sparse. Here, “sparse” means that the signal has a value of zero at most times t; speech signals, for example, exhibit sparsity.
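The convolutive mixture of Formula (1) can be illustrated with a minimal Python sketch. The function name `convolutive_mix`, the 0-based indexing, and the treatment of samples before the start of a signal as zero are assumptions made for this illustration only.

```python
def convolutive_mix(sources, h):
    """Mix N source signals into M sensor signals following Formula (1).

    sources[i][t] is s_i at sample t (0-based here; the text is 1-based).
    h[j][i][q] is the impulse response from source i to sensor j at lag q,
    so h[j][i][0] plays the role of h_ji(1) in the text.
    Samples before the start of a source are treated as zero (an assumption).
    """
    n_sensors = len(h)
    n_samples = len(sources[0])
    x = [[0.0] * n_samples for _ in range(n_sensors)]
    for j in range(n_sensors):
        for i, s in enumerate(sources):
            for q, coeff in enumerate(h[j][i]):
                # Delay source i by q samples, weight it, and accumulate.
                for t in range(q, n_samples):
                    x[j][t] += coeff * s[t - q]
    return x


# Two sparse sources observed by two sensors with 2-tap impulse responses.
s = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 0.0]]
h = [[[1.0, 0.5], [0.5, 0.0]],   # sensor 1: from source 1, from source 2
     [[0.5, 0.0], [1.0, 0.5]]]   # sensor 2: from source 1, from source 2
x = convolutive_mix(s, h)
```

Each observed signal is a sum of delayed, weighted copies of every source, which is what makes the separation problem harder than simple instantaneous mixing.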
The aim of BSS is to obtain separated signals yk from the observed signals xj by estimating a separation system (W) 703 without any prior knowledge of the source signals si or impulse responses hji.
Since convolutive mixing problems are complicated to address and the assumption of sparsity holds better in the time-frequency domain, an effective way of addressing the above problem involves first applying a short-time discrete Fourier transform (DFT) to the abovementioned Formula (1) to transform the signals into the time-frequency domain. In the time-frequency domain, the abovementioned Formula (1) becomes
X(f,m)=H(f)S(f,m)
where f is the frequency, and m represents the timing of the DFT frames. H(f) is an (M×N) matrix whose ji element is the frequency response Hji(f) from signal source i to sensor j, and is referred to as the mixing matrix. Also, S(f,m)=[S1(f,m), . . . , SN(f,m)]T and X(f,m)=[X1(f,m), . . . , XM(f,m)]T are the DFT results obtained for the source signals and observed signals, respectively. Here, the notation [α]T denotes the transpose of α. Furthermore, S(f,m) and X(f,m) are vectors.
Hereafter, explanations are given in the time-frequency domain.
Model of the Separation Process
The separation process is modeled as follows.
First, let W(f,m) be an (N×M) matrix whose jk element is the frequency response Wjk(f,m) from the observed signal at sensor j to the separated signal yk. This matrix W(f,m) is called the separation matrix. Using the separation matrix, the separated signals can be obtained in the time-frequency domain as follows:
Y(f,m)=W(f,m)X(f,m)
Here, Y(f,m)=[Y1(f,m), . . . , YN(f,m)]T represents the separated signals in the time-frequency domain, and subjecting this to a short-time inverse discrete Fourier transform (IDFT) yields the separated signals yk, i.e., the results of estimating the source signals. Note that the separated signals yk are not necessarily ordered in the same way as the source signals si. That is, it is not necessarily the case that k=i. Also, Y(f,m) is a vector.
Estimating the Separation Matrix W(f,m)
In BSS, the separation matrix W(f,m) is estimated by using solely the observed signals.
Known conventional methods for estimating the separated signals Y(f,m) include: (a) methods based on independent component analysis, (b) methods that utilize the sparsity of the signals, and (c) methods in which the mixing matrix is estimated based on the signal sparsity. These methods are discussed in turn below.
Conventional Method 1: Independent Component Analysis
Independent component analysis (ICA) is a technique in which signals that have been combined by linear mixing as in Formula (1) above are separated based on the statistical independence of the signals. FIG. 27B shows a block diagram of an ICA separation process for the case where N=M=2. In the time-frequency domain ICA, we perform successive learning with the learning rule W(f)=W(f)+ΔW(f) to find a separation matrix W(f,m) at each frequency so that the elements of the output signal Y(f,m) become mutually independent. Here, the estimation unit 705 of the ICA separation matrix might determine ΔW(f) by the following rule, for example:
ΔW=μ[I−<φ(Y(f,m))Y(f,m)H>]  (2)
Here, the notation [α]H denotes the conjugate transpose of α. Also, I represents a unit matrix, < . . . > represents time averaging, φ represents a nonlinear function, and μ represents the update coefficient. Separation systems obtained by ICA are time-invariant linear systems. Various forms of the ICA algorithm have been introduced, including the one mentioned in Non-Patent Reference 1.
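As a rough illustration of Formula (2), the sketch below computes ΔW for one frequency bin exactly as the rule is written above, with the time average replaced by a mean over the available frames. The names `phi` and `ica_update`, the choice of a polar sign nonlinearity, and the sample data are assumptions for this example.

```python
def phi(y):
    """Polar nonlinearity sign(|Y|) * exp(j * angle(Y)); zero maps to zero."""
    magnitude = abs(y)
    return y / magnitude if magnitude > 0 else 0j


def ica_update(Y, mu):
    """Return Delta W = mu * (I - <phi(Y) Y^H>) for one frequency bin.

    Y[m] is the length-N vector of separated signals in frame m, and the
    time average <.> is approximated by the mean over the given frames.
    """
    n = len(Y[0])
    frames = len(Y)
    delta = [[0j] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            avg = sum(phi(y[a]) * y[b].conjugate() for y in Y) / frames
            identity = 1.0 if a == b else 0.0
            delta[a][b] = mu * (identity - avg)
    return delta


# Outputs that are already uncorrelated under phi: the update vanishes.
Y_independent = [[1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j],
                 [1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j]]
delta_independent = ica_update(Y_independent, mu=0.1)

# Fully correlated outputs: the off-diagonal terms drive a nonzero update.
Y_correlated = [[1 + 0j, 1 + 0j], [1 + 0j, 1 + 0j]]
delta_correlated = ica_update(Y_correlated, mu=0.1)
```

In the learning rule W(f)=W(f)+ΔW(f), the result of `ica_update` would be added to the current separation matrix at each iteration until the update becomes negligible.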
In ICA, since separation is performed by concentrating on the independence of the signals, the matrix Y′(f,m)=[Y1′(f,m), . . . , YN′(f,m)]T obtained from the relationship Y′(f,m)=W(f,m)X(f,m) using this separation matrix W(f,m) is indeterminate with respect to the ordering and scaling of the separated signals. This is because independence between the separated signals is preserved even when the ordering and scaling of the signals change.
The process of resolving this indeterminacy of ordering is referred to as permutation resolution, and results in a separated signal Yi(f,m) where the separated signal components corresponding to the same source signal si have the same subscript i at all frequencies. Methods for achieving this include a method in which the estimated arrival directions of signals obtained using the inverse matrix of the separation matrix (the Moore-Penrose pseudo-inverse matrix for cases where N≠M) are verified, and the rows of the separation matrix W(f,m) are replaced so that the estimated arrival direction corresponding to the i-th separated signal becomes the same at each frequency, and a method in which the rows of the separation matrix W(f,m) are replaced so as to maximize the correlation between the absolute values |Yi(f,m)| of the i-th separated signal between different frequencies. In this example, a permutation/scaling solving unit 706 resolves these permutations while feeding back the separated signals Yi(f,m).
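The correlation-based approach to permutation resolution can be sketched as follows for two frequency bins. This is a simplified illustration only: the helper names `correlation` and `best_permutation`, the exhaustive search over permutations, and the envelope values are assumptions, and practical systems typically combine several frequencies and criteria.

```python
from itertools import permutations


def correlation(u, v):
    """Pearson correlation of two equal-length real sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / (var_u * var_v) ** 0.5


def best_permutation(ref_env, env):
    """Return `order` so that env[order[i]] best matches ref_env[i].

    ref_env[i][m] and env[i][m] hold |Y_i(f, m)| at a reference frequency
    and at the frequency being aligned, respectively.
    """
    n = len(ref_env)
    return max(
        permutations(range(n)),
        key=lambda order: sum(
            correlation(ref_env[i], env[order[i]]) for i in range(n)
        ),
    )


# Hypothetical amplitude envelopes; the outputs at the second frequency
# are swapped relative to the reference frequency.
ref_env = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0]]
env = [[0.0, 1.0, 0.0, 3.0], [2.0, 0.0, 1.0, 0.0]]
order = best_permutation(ref_env, env)

# Reorder the rows of the separation matrix at that frequency accordingly.
W = [[1.0, 2.0], [3.0, 4.0]]
W_aligned = [W[k] for k in order]
```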
The process of resolving the indeterminacy of magnitude is referred to as scaling resolution. Permutation/scaling solving unit 706 performs this scaling resolution by, for example, calculating the inverse matrix W−1(f,m) (the Moore-Penrose pseudo-inverse matrix for cases where N≠M) of the separation matrix W(f,m) obtained after permutation resolution, and then scaling each row wi(f,m) of the separation matrix W(f,m) as follows:
wi(f,m)←[W−1(f,m)]jiwi(f,m)
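The row scaling can be checked with a small sketch restricted to the square case N=M=2, where an ordinary inverse suffices. The helper names and the choice of j=0 as the reference sensor are assumptions for this example; the pseudo-inverse case is not covered.

```python
def inverse_2x2(W):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = W
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def resolve_scaling(W, j=0):
    """Scale row i of W by [W^-1]_{ji}, where j is a reference sensor."""
    W_inv = inverse_2x2(W)
    return [[W_inv[j][i] * w for w in row] for i, row in enumerate(W)]


W = [[1.0, 1.0], [1.0, -1.0]]
W_scaled = resolve_scaling(W)

# After scaling, row j of the inverse of the rescaled matrix is all ones,
# i.e. each separated signal is scaled as it would appear at sensor j.
check = inverse_2x2(W_scaled)
```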
The separated signals at each frequency can then be obtained from Y(f,m)=W(f,m)X(f,m) by using the separation matrix W(f,m) in which the indeterminacy of ordering and magnitude have been resolved.
With regard to the abovementioned learning rule, it is possible to use a function like
φ(Y)=φ(|Y|)·exp(j·∠(Y))
φ(x)=sign(x)
as the nonlinear function in Formula (2). Also, as mentioned above, it is possible to use any permutation resolution method such as the signal arrival direction estimation method or the method that utilizes the similarity in the frequency components of the separated signals, or a combination of such methods, details of which can be found in Patent Reference 1 and Non-Patent Reference 2. Furthermore, a requirement of ICA is that the number of signal sources N and the number of sensors M obey the relationship M≥N.
Conventional Method 2: The Sparsity Method
In cases where the number of signal sources N and the number of sensors M obey the relationship M<N, separation can be achieved by methods based on the signal sparsity (e.g., Non-Patent Reference 3).
By assuming the signals to be sparse and mutually independent, even when a plurality of signals are present at the same time, it can be assumed that at the sample level there is a low probability of observing overlapping signals at the same timing. That is, it can be assumed that there is no more than one signal contained in the observed signal at any one time. Accordingly, the signals can be separated by using a separation system W(f,m) consisting of a function (binary mask) that uses some method to estimate which signal source emitted the signal observed at each timing and extracts only the signals at this timing. This is the sparsity method.
FIG. 28 (conventional method 2) shows a block diagram to illustrate this sparsity method.
The following method is generally used to estimate the signal source at each timing. If each signal source is assumed to be spatially separate, then between the signals observed by the plurality of sensors there will exist phase differences and amplitude ratios determined by the relative positions of the signal sources and sensors. From the assumption that there is at most one signal contained in the observed signal at each timing, the phase differences and amplitude ratios of the observed signal at this timing correspond to the phase difference and amplitude ratio of the one signal contained in the observed signal at this timing. Accordingly, the phase differences and amplitude ratios of the observed signal in each sample can be subjected to a clustering process, and each source signal can be estimated by reconstituting the signals belonging to each cluster.
This is described in more specific detail as follows. First, observed signal relative value calculation unit 751 calculates the phase differences and/or amplitude ratios between the observed signals X(f,m) to obtain relative values z(f,m) as follows:
FORMULA 2

Phase difference:
$$z_1(f,m) = \angle \frac{X_i(f,m)}{X_j(f,m)} \quad (i \neq j)$$

Amplitude ratio:
$$z_2(f,m) = \frac{|X_i(f,m)|}{|X_j(f,m)|} \quad (i \neq j)$$

Alternatively, instead of using the phase difference itself, it is also possible to use the signal arrival directions derived from the phase differences as relative values z(f,m).
Next, the distribution of the relative values z(f,m) is checked and clustered into N clusters by clustering unit 752. An example of such a distribution is shown in FIG. 29. In this example, a mixed signal comprising three signals (N=3) is observed by sensor 1 (j=1) and sensor 2 (j=2). FIG. 29A shows the distribution obtained using the phase difference alone, and FIG. 29B shows the distribution obtained using both the phase difference and the amplitude ratio. As this figure shows, sparsity allows these distributions to be classified into N=3 clusters 801-803 or 811-813.
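The two relative values can be computed for a single time-frequency point with a short sketch like the following. The function name `relative_values` is an assumption, and the sketch presumes that the reference observation X_j(f,m) is nonzero.

```python
import cmath


def relative_values(x_i, x_j):
    """Return (z1, z2) = (angle(X_i / X_j), |X_i| / |X_j|) for one (f, m).

    x_i and x_j are complex DFT coefficients observed at two different
    sensors; x_j must be nonzero.
    """
    z1 = cmath.phase(x_i / x_j)   # phase difference in radians
    z2 = abs(x_i) / abs(x_j)      # amplitude ratio
    return z1, z2


# Sensor i leads sensor j by a quarter cycle and is twice as strong.
z1, z2 = relative_values(2j, 1 + 0j)
```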
Next, the representative values (peak, mean, median, etc.) of these N clusters are obtained in representative value calculation unit 753. In the following discussion, for the sake of convenience, these are numbered a1, a2, . . . , aN in ascending order (in FIG. 29 they are numbered a1, a2 and a3).
Next, in binary mask preparation unit 754, a binary mask Mk(f,m) is prepared as follows:
FORMULA 3

$$M_k(f,m) = \begin{cases} 1 & a_k - \varepsilon \le z(f,m) \le a_k + \varepsilon \\ 0 & \text{otherwise} \end{cases} \quad (k = 1, \ldots, N) \qquad (3)$$

Here, ε is a parameter that determines the width of the binary mask.
Next, in signal extraction unit 755, the k-th separated signal is obtained by performing the calculation Yk(f,m)=Mk(f,m)Xj(f,m), where j is an arbitrary sensor number.
That is, the method based on sparsity described in this example results in a nonlinear system with a time-varying separation matrix W(f,m):
Wjk(f,m)=Mk(f,m) for j∈{1, . . . , M}
Wkl(f,m)=0 for l≠j (l=1, . . . , M)
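The mask preparation of Formula (3) and the subsequent extraction Yk(f,m)=Mk(f,m)Xj(f,m) can be sketched as below. The cluster representative values are taken as given here (the clustering step itself is not shown), and all names and numeric values are assumptions for illustration.

```python
def binary_mask(z, a_k, eps):
    """Formula (3): 1 where a_k - eps <= z(f, m) <= a_k + eps, else 0."""
    return [[1 if a_k - eps <= value <= a_k + eps else 0 for value in row]
            for row in z]


def extract(mask, X_j):
    """Y_k(f, m) = M_k(f, m) * X_j(f, m) for one chosen sensor j."""
    return [[m * x for m, x in zip(mask_row, x_row)]
            for mask_row, x_row in zip(mask, X_j)]


# One frequency bin, four frames; z holds hypothetical phase differences.
z = [[-1.0, 0.05, 1.1, -0.9]]
X_j = [[1 + 1j, 2 + 0j, 3j, 4 + 0j]]
centroids = [-1.0, 0.0, 1.0]   # assumed representative values a_1..a_3
eps = 0.2

masks = [binary_mask(z, a_k, eps) for a_k in centroids]
Y = [extract(mask, X_j) for mask in masks]
```

Because each time-frequency point is either kept or zeroed, the resulting separation system varies with m and is not linear in the observations.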
Conventional Method 3: Estimating the Mixing Matrix Based on Sparsity
In this method, as a signal separation technique for cases where the number of signal sources N and the number of sensors M obey the relationship M=N, the sparsity of the signals is used to estimate the mixing matrix H(f), and the inverse matrix thereof is used to separate the signals (see, e.g., Non-Patent Reference 4 and Non-Patent Reference 5).
FIG. 28 (conventional method 3) shows a block diagram illustrating this method for estimating the mixing matrix based on sparsity.
The mixed signal X(f,m) is expressed in terms of the mixing matrix H(f) as follows:
FORMULA 4

$$
\begin{aligned}
\begin{bmatrix} X_1(f,m) \\ X_2(f,m) \\ \vdots \\ X_N(f,m) \end{bmatrix}
&= \begin{bmatrix} H_{11}(f) & \cdots & H_{1N}(f) \\ H_{21}(f) & \cdots & H_{2N}(f) \\ \vdots & \ddots & \vdots \\ H_{N1}(f) & \cdots & H_{NN}(f) \end{bmatrix}
\begin{bmatrix} S_1(f,m) \\ S_2(f,m) \\ \vdots \\ S_N(f,m) \end{bmatrix} && (4) \\
&= \begin{bmatrix} 1 & \cdots & 1 \\ H_{21}(f)/H_{11}(f) & \cdots & H_{2N}(f)/H_{1N}(f) \\ \vdots & \ddots & \vdots \\ H_{N1}(f)/H_{11}(f) & \cdots & H_{NN}(f)/H_{1N}(f) \end{bmatrix}
\begin{bmatrix} H_{11}(f)\,S_1(f,m) \\ H_{12}(f)\,S_2(f,m) \\ \vdots \\ H_{1N}(f)\,S_N(f,m) \end{bmatrix} && (5) \\
&\equiv \hat{H}(f)\,\hat{S}(f,m) && (6)
\end{aligned}
$$

Thus, if Ĥ(f) can be estimated, then the separated signals Y(f,m) can be estimated from

$$Y(f,m) = \hat{S}(f,m) = \hat{H}(f)^{-1} X(f,m) \qquad (7)$$

This procedure for obtaining the separated signals Y(f,m) from the estimated Ĥ(f) is described below.
In the following, the notation α^ is equivalent to the notation $\hat{\alpha}$ (α with a circumflex above it).
First, signals at timings where only one signal is present are obtained by applying the same procedure as in [Conventional method 2] in observed signal relative value calculation unit 751, clustering unit 752, representative value calculation unit 753, binary mask preparation unit 754 and signal extraction unit 755:
Formula 5
$$\hat{X}(f,m) = M_k(f,m)\,X(f,m)$$
Here, binary masks Mk(f,m) are applied to the observed signals X(f,m)=[X1(f,m), . . . , XM(f,m)]^T of all the sensors. At this time, the masked observation at a timing mi at which only source signal Si(f,m) is active, for example, can be expressed as follows:
Formula 6
$$\hat{X}_j(f,m_i) = M_i(f,m_i)\,X_j(f,m_i) \approx H_{ji}(f)\,S_i(f,m_i) \tag{8}$$
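The masking step of Formula (8) can be sketched roughly as follows for a single frequency bin. The mask is simply given here; how it is derived by the clustering of [Conventional method 2] is not reproduced. The function name `apply_mask`, the matrix `H`, the sources `S` and all numerical values are hypothetical, and the sources are assumed to be ideally sparse (never active in the same frame).

```python
# Sketch: apply a binary mask M_i(f, m) to the observations of all sensors
# at a single frequency bin. Values are made up; the mask is assumed given.

def apply_mask(mask, X):
    """X[j][m] is sensor j at frame m; mask[m] is 0 or 1. Returns X_hat."""
    return [[mask[m] * xj[m] for m in range(len(mask))] for xj in X]

# Two sources, two sensors, four frames. Source 1 is active only in frames
# 0 and 2, source 2 only in frames 1 and 3 (an idealized sparse case).
H = [[1.0 + 0.0j, 0.9 + 0.1j],      # H[j][i]: source i to sensor j
     [0.6 - 0.3j, 1.0 + 0.2j]]
S = [[1 + 1j, 0, 2 - 1j, 0],        # S[i][m]
     [0, 3 + 0j, 0, -1 + 2j]]

frames = range(4)
X = [[sum(H[j][i] * S[i][m] for i in range(2)) for m in frames]
     for j in range(2)]

mask_1 = [1, 0, 1, 0]               # M_1(f, m): 1 where only source 1 is active
X_hat = apply_mask(mask_1, X)       # X_hat[j][m] = H[j][0] * S[0][m] in frames 0, 2
```

In this idealized case the approximation in Formula (8) holds exactly; with real signals, frames in which several sources overlap would make it only approximate.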
The separated signals X^j(f,mi) obtained in this way are sent to mixing process calculation unit 756, where H^(f) is estimated by performing the following calculation:
$$\hat{H}_{ji}(f) = E\!\left[\frac{M_k(f,m_i)\,X_j(f,m_i)}{M_k(f,m_i)\,X_1(f,m_i)}\right] = E\!\left[\frac{\hat{X}_j(f,m_i)}{\hat{X}_1(f,m_i)}\right] = E\!\left[\frac{H_{ji}(f)\,S_i(f,m_i)}{H_{1i}(f)\,S_i(f,m_i)}\right] = E\!\left[\frac{H_{ji}(f)}{H_{1i}(f)}\right] \tag{9}$$
where E[ . . . ] denotes averaging over mi. The matrix Ĥ(f) obtained in this way is sent to inverse matrix calculation unit 757, where its inverse matrix $\hat{H}(f)^{-1}$ is obtained. Then, in signal separation unit 758, the calculation shown in Formula (7) above yields the estimated separated signals Y(f,m).
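The averaging in Formula (9) can be illustrated with a minimal sketch for one frequency bin, assuming noise-free masked observations in which exactly one source is active per selected frame. The function `estimate_column`, the matrix `H` and the source values are hypothetical names and numbers introduced only for this example.

```python
# Sketch of Formula (9): estimate H_hat_ji(f) by averaging X_hat_j / X_hat_1
# over the frames m_i in which only source i is active.

def estimate_column(X_hat, frames_i):
    """Return [H_hat_1i, ..., H_hat_Mi] for one source i at one bin.

    X_hat[j][m] is the masked observation of sensor j at frame m;
    frames_i lists the frames m_i selected by the mask for source i.
    """
    return [sum(xj[m] / X_hat[0][m] for m in frames_i) / len(frames_i)
            for xj in X_hat]

# Hypothetical 2-sensor, 2-source setup at one frequency bin.
H = [[2.0 + 0.0j, 0.5 + 0.5j],          # H[j][i]: source i to sensor j
     [1.0 - 1.0j, 1.0 + 0.0j]]
S1 = {0: 1 + 2j, 2: -3 + 1j}            # source 1 alone in frames 0 and 2
S2 = {1: 2 - 1j, 3: 0.5 + 4j}           # source 2 alone in frames 1 and 3

# Masked observations X_hat_j(f, m_i) = H_ji(f) S_i(f, m_i), as in Formula (8),
# taken here as exact (no noise, no overlap between sources).
X_hat_1 = [{m: H[j][0] * s for m, s in S1.items()} for j in range(2)]
X_hat_2 = [{m: H[j][1] * s for m, s in S2.items()} for j in range(2)]

col_1 = estimate_column(X_hat_1, list(S1))
col_2 = estimate_column(X_hat_2, list(S2))

# Assemble H_hat(f) with H_hat[j][i]; each entry should equal H_ji / H_1i.
H_hat = [[col_1[j], col_2[j]] for j in range(2)]
```

Because the source term Si(f,mi) cancels in each ratio, the estimate does not depend on the source values. With noisy or partially overlapping frames the individual ratios would scatter, which is presumably why the average over mi is taken.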
Note that since this procedure uses the inverse matrix of Ĥ(f), it can only be applied in cases where the number of signal sources N and the number of sensors M obey the relationship M=N.
[Patent Reference 1] Japanese Unexamined Patent Publication No. 2004-145172
[Non-Patent Reference 1] A. Hyvaerinen, J. Karhunen and E. Oja, “Independent Component Analysis,” John Wiley & Sons, 2001, ISBN 0-471-40540
[Non-Patent Reference 2] H. Sawada, R. Mukai, S. Araki and S. Makino, “A Robust and Precise Method for Solving the Permutation Problem of Frequency-Domain Blind Source Separation,” in Proc. the 4th International Symposium on Independent Component Analysis and Blind Signal Separation (ICA 2003), 2003, pp. 505-510
[Non-Patent Reference 3] S. Rickard, R. Balan, and J. Rosca, “Real-Time Time-Frequency Based Blind Source Separation,” 3rd International Conference on Independent Component Analysis and Blind Source Separation (ICA2001), San Diego, December, 2001, pp. 651-656
[Non-Patent Reference 4] F. Abrard, Y. Deville, P. White, “From blind source separation to blind source cancellation in the underdetermined case: a new approach based on time-frequency analysis,” Proceedings of the 3rd International Conference on Independent Component Analysis and Signal Separation (ICA2001), pp. 734-739, San Diego, Calif., December 2001
[Non-Patent Reference 5] Y. Deville, “Temporal and time-frequency correlation-based blind source separation methods,” in Proc., ICASSP2003, April 2003, pp. 1059-1064