1. Field of the Invention
This invention relates to a speech signal separation apparatus and method for separating a speech signal in which a plurality of signals are mixed into the individual signals using independent component analysis (ICA).
2. Description of the Related Art
Independent component analysis (ICA), a technique for separating and reconstructing a plurality of original signals from a signal in which the original signals are mixed linearly with unknown coefficients, using only their statistical independence, has attracted attention in the field of signal processing. By applying the independent component analysis, a speech signal can be separated and reconstructed even where, for example, a speaker and a microphone are located at places spaced away from each other and the microphone picks up sound other than the speech of the speaker.
Here, separation of a speech signal in which a plurality of signals are mixed into the individual signals using the independent component analysis in the time-frequency domain is considered.
It is assumed that, as seen in FIG. 7, different sounds are emitted individually from N sound sources and are observed using n microphones. Sound (an original signal) emitted from a sound source is subject to time delay, reflection and so forth before it reaches a microphone. Therefore, the signal (observation signal) xk(t) observed by the kth microphone (1≦k≦n) is represented, as in the expression (1) given below, by the sum over all sound sources of the convolution of each original signal with the corresponding transfer function. Further, where the observation signals of all microphones are represented by a single expression, it is given as the expression (2) below. In the expressions (1) and (2), x(t) and s(t) are column vectors which include xk(t) and sk(t) as elements thereof, respectively, and A represents an n×N matrix which includes elements aij(t). It is to be noted that, in the following description, it is assumed that N=n.
$$x_k(t) = \sum_{j=1}^{N} \sum_{\tau=0}^{\infty} a_{kj}(\tau)\, s_j(t-\tau) = \sum_{j=1}^{N} \{ a_{kj} * s_j(t) \} \tag{1}$$

$$x(t) = A * s(t) \tag{2}$$

where

$$s(t) = \begin{bmatrix} s_1(t) \\ \vdots \\ s_N(t) \end{bmatrix}, \quad x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \quad A(t) = \begin{bmatrix} a_{11}(t) & \cdots & a_{1N}(t) \\ \vdots & \ddots & \vdots \\ a_{n1}(t) & \cdots & a_{nN}(t) \end{bmatrix}$$
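By way of illustration only, the mixing model of the expressions (1) and (2) may be sketched in Python as follows. The sketch assumes NumPy and truncates each transfer function to a finite number of taps, whereas the expression (1) sums to infinity. The names convolutive_mix, sources and filters, as well as the filter values, are hypothetical and do not appear in the description.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix sources as in expression (1): x_k(t) = sum_j (a_kj * s_j)(t).

    sources: shape (N, num_samples), one row per original signal s_j.
    filters: shape (n, N, taps); filters[k, j] is a_kj truncated to a finite
             number of taps (an assumption made for this sketch).
    Returns the observation signals, shape (n, num_samples).
    """
    n, N, _ = filters.shape
    num_samples = sources.shape[1]
    observations = np.zeros((n, num_samples))
    for k in range(n):
        for j in range(N):
            # Full convolution, cut back to the length of the source signal.
            observations[k] += np.convolve(sources[j], filters[k, j])[:num_samples]
    return observations

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))   # two hypothetical original signals
a = np.zeros((2, 2, 3))           # n = N = 2, three taps per transfer function
a[0, 0] = [1.0, 0.0, 0.0]         # source 1 reaches microphone 1 directly
a[0, 1] = [0.0, 0.5, 0.0]         # source 2 reaches microphone 1 one sample late, attenuated
a[1, 0] = [0.0, 0.0, 0.3]         # source 1 reaches microphone 2 two samples late, attenuated
a[1, 1] = [1.0, 0.0, 0.0]         # source 2 reaches microphone 2 directly
x = convolutive_mix(s, a)         # observation signals x_1(t) and x_2(t)
```

Each row of x then contains delayed and attenuated copies of both sources, which is the situation the separation process has to undo.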
In the independent component analysis in the time-frequency domain, A and s(t) are not estimated directly from x(t) of the expression (2) given above; instead, x(t) is converted into a signal in the time-frequency domain, and signals corresponding to A and s(t) are estimated from that signal. In the following, a method of the estimation is described.
Where results of short-time Fourier transform of the signal vectors x(t) and s(t) through a window of the length L are represented by X(ω, t) and S(ω, t), respectively, and the result of a similar short-time Fourier transform of the matrix A(t) is represented by A(ω), the expression (2) in the time domain can be represented as the expression (3) in the time-frequency domain given below. It is to be noted that ω represents the frequency bin number (1≦ω≦M), and t represents the frame number (1≦t≦T). In the independent component analysis in the time-frequency domain, S(ω, t) and A(ω) are estimated.
$$X(\omega, t) = A(\omega)\, S(\omega, t) \tag{3}$$

where

$$X(\omega, t) = \begin{bmatrix} X_1(\omega, t) \\ \vdots \\ X_n(\omega, t) \end{bmatrix}, \quad S(\omega, t) = \begin{bmatrix} S_1(\omega, t) \\ \vdots \\ S_n(\omega, t) \end{bmatrix}$$
It is to be noted that the number of frequency bins is originally equal to the length L of the window, and the frequency bins individually represent the frequency components obtained where the range from −R/2 to R/2 is divided into L portions. Here, R is the sampling frequency. A negative frequency component is the complex conjugate of the corresponding positive frequency component and can be represented by X(−ω)=conj(X(ω)) (conj(•) denotes the complex conjugate). Therefore, in the present specification, only the non-negative frequency components from 0 to R/2 (the number of frequency bins is L/2+1) are taken into consideration, and the numbers from 1 to M (M=L/2+1) are applied to the frequency components.
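The relationship between the window length L and the number M of frequency bins may be checked with a short Python sketch such as the following. It assumes NumPy; the Hann window, the hop size and the function name stft are assumptions made for illustration and are not specified in the description. Note that the code numbers the bins from 0 to M−1, whereas the description numbers them from 1 to M.

```python
import numpy as np

L = 8                                  # window length (hypothetical small value)
rng = np.random.default_rng(0)
frame = rng.standard_normal(L)         # one real-valued frame of length L

full = np.fft.fft(frame)               # L bins, negative and positive frequencies
half = np.fft.rfft(frame)              # non-negative frequencies only
M = L // 2 + 1                         # number of bins kept, M = L/2 + 1

# Index L - w of the full transform holds frequency -w: X(-w) = conj(X(w)).
symmetric = all(np.allclose(full[L - w], np.conj(full[w])) for w in range(1, L // 2))

def stft(signal, window_length, hop):
    """Short-time Fourier transform of a real signal, returned as an (M, T) array."""
    window = np.hanning(window_length)             # assumed window shape
    starts = range(0, len(signal) - window_length + 1, hop)
    frames = np.stack([signal[i:i + window_length] * window for i in starts], axis=1)
    return np.fft.rfft(frames, axis=0)             # rows: bins, columns: frames

X1 = stft(np.arange(32.0), window_length=L, hop=4)  # spectrogram of one channel
```

Because the discarded negative-frequency bins are redundant for a real signal, keeping only M bins loses no information.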
In order to estimate S(ω, t) and A(ω) in the time-frequency domain, for example, such an expression as the expression (4) given below is considered. In the expression (4), Y(ω, t) represents a column vector which includes results Yk(ω, t) of short-time Fourier transform of the separation signals yk(t) through a window of the length L, and W(ω) represents an n×n matrix (separation matrix) whose elements are wij(ω).
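$$Y(\omega, t) = W(\omega)\, X(\omega, t) \tag{4}$$

where

$$Y(\omega, t) = \begin{bmatrix} Y_1(\omega, t) \\ \vdots \\ Y_n(\omega, t) \end{bmatrix}, \quad W(\omega) = \begin{bmatrix} w_{11}(\omega) & \cdots & w_{1n}(\omega) \\ \vdots & \ddots & \vdots \\ w_{n1}(\omega) & \cdots & w_{nn}(\omega) \end{bmatrix}$$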
Then, W(ω) is determined such that Y1(ω, t) to Yn(ω, t) become statistically independent of each other (in practice, such that their independence is maximized) when t is varied while ω is fixed. As hereinafter described, since the independent component analysis in the time-frequency domain exhibits instability in permutation, solutions other than W(ω)=A(ω)⁻¹ also exist. If Y1(ω, t) to Yn(ω, t) which are statistically independent of each other are obtained for all ω, then the separation signals y(t) in the time domain can be obtained by inverse Fourier transforming them.
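The bin-by-bin separation and the instability in permutation may be illustrated with the following Python sketch. It assumes NumPy and uses synthetic source spectrograms and randomly drawn mixing matrices; all array names and sizes are hypothetical. Bins are indexed from 0 in the code.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, T = 2, 4, 500                    # channels, frequency bins, frames

# Hypothetical independent source spectrograms S(w, t) and mixing matrices A(w).
S = rng.laplace(size=(M, n, T)) + 1j * rng.laplace(size=(M, n, T))
A = rng.standard_normal((M, n, n)) + 1j * rng.standard_normal((M, n, n))
X = A @ S                              # expression (3), applied to every bin at once

W_ideal = np.linalg.inv(A)             # W(w) = A(w)^-1 for every bin
Y = W_ideal @ X                        # expression (4): recovers S in every bin

# Swapping the rows of W in a single bin still yields independent outputs there,
# but the two channels come out in the opposite order in that bin only.
P = np.array([[0, 1], [1, 0]])
W_perm = W_ideal.copy()
W_perm[2] = P @ W_ideal[2]
Y_perm = W_perm @ X
```

Since independence within a bin cannot tell W_ideal from W_perm, a method that treats each bin separately has no way to guarantee a consistent channel order across bins.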
An outline of conventional independent component analysis in the time-frequency domain is described with reference to FIG. 8. Original signals which are emitted from n sound sources and are independent of each other are represented by s1 to sn, and a vector which includes the original signals s1 to sn as elements thereof is represented by s. An observation signal x observed by the microphones is obtained by applying the convolution and mixing arithmetic operation of the expression (2) given hereinabove to the original signal s. An example of the observation signal x where the number n of microphones is two, that is, where the number of channels is two, is illustrated in FIG. 9A. Then, short-time Fourier transform is applied to the observation signal x to obtain a signal X in the time-frequency domain. Where elements of the signal X are represented by Xk(ω, t), each Xk(ω, t) takes a complex value. A chart which represents the absolute values |Xk(ω, t)| of Xk(ω, t) in the form of the intensity of the color is referred to as a spectrogram. An example of the spectrogram is shown in FIG. 9B. In FIG. 9B, the axis of abscissa indicates t (frame number) and the axis of ordinate indicates ω (frequency bin number). Then, each frequency bin of the signal X is multiplied by W(ω) to obtain such separation signals Y as seen in FIG. 9C. Then, the separation signals Y are inverse Fourier transformed to obtain such separation signals y in the time domain as seen in FIG. 9D.
It is to be noted that, in the following description, Yk(ω, t) and Xk(ω, t) themselves, which are the signals handled in the independent component analysis, are each also referred to as a “spectrogram”.
Here, as a scale for representing the independence of signals in the independent component analysis, the Kullback-Leibler information amount (hereinafter referred to as “KL information amount”), the kurtosis and so forth are available. The KL information amount is used here as an example.
Attention is paid to a certain frequency bin as seen in FIG. 10. Where Yk(ω, t) with the frame number t varied within the range from 1 to T is represented by Yk(ω), the KL information amount I(Y(ω)), which is a scale representative of the independence of the separation signals Y1(ω) to Yn(ω), is defined as represented by the expression (5) given below. In particular, the value obtained when the simultaneous entropy H(Y(ω)) of all channels for the frequency bin ω is subtracted from the sum total of the entropies H(Yk(ω)) of the individual channels for the frequency bin ω is defined as the KL information amount I(Y(ω)). A relationship between H(Yk(ω)) and H(Y(ω)) where n=2 is illustrated in FIG. 11. H(Yk(ω)) in the expression (5) is rewritten into the first term of the expression (6) given below in accordance with the definition of entropy, and H(Y(ω)) is developed into the second and third terms of the expression (6) in accordance with the expression (4). In the expression (6), PYk(ω)(Yk(ω, t)) represents the probability density function (PDF) of Yk(ω, t), and H(X(ω)) represents the simultaneous entropy of the observation signal X(ω).
$$I(Y(\omega)) = \sum_{k=1}^{n} H(Y_k(\omega)) - H(Y(\omega)) \tag{5}$$

$${} = \sum_{k=1}^{n} E_t\left[ -\log P_{Y_k(\omega)}(Y_k(\omega, t)) \right] - \log\left| \det(W(\omega)) \right| - H(X(\omega)) \tag{6}$$

where

$$Y_k(\omega) = \begin{bmatrix} Y_k(\omega, 1) & \cdots & Y_k(\omega, T) \end{bmatrix}, \quad Y(\omega) = \begin{bmatrix} Y_1(\omega) \\ \vdots \\ Y_n(\omega) \end{bmatrix}, \quad X(\omega) = \begin{bmatrix} X(\omega, 1) & \cdots & X(\omega, T) \end{bmatrix}$$
Since the KL information amount I(Y(ω)) exhibits a minimum value (ideally zero) where Y1(ω) to Yn(ω) are independent of each other, the separation process determines a separation matrix W(ω) with which the KL information amount I(Y(ω)) is minimized.
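As a rough illustration, the part of the expression (6) that depends on W(ω) may be evaluated for one frequency bin as in the following Python sketch. It assumes NumPy and, purely for illustration, a fixed density proportional to 1/cosh(|Yk(ω, t)|), so that the negative log-density equals log cosh(|Yk(ω, t)|) up to a constant. The term H(X(ω)) is omitted because it does not depend on W(ω). The function name kl_cost and the test data are hypothetical.

```python
import numpy as np

def kl_cost(W, X):
    """W-dependent part of expression (6) for a single frequency bin.

    W: (n, n) separation matrix; X: (n, T) observations for that bin.
    Assumes -log P(Y_k) = log cosh(|Y_k|) up to an additive constant.
    """
    Y = W @ X
    mag = np.abs(Y)
    # log cosh(a) computed stably as logaddexp(a, -a) - log 2.
    neg_log_pdf = np.logaddexp(mag, -mag) - np.log(2.0)
    marginal_term = neg_log_pdf.mean(axis=1).sum()   # sum_k E_t[-log P(Y_k)]
    return marginal_term - np.log(np.abs(np.linalg.det(W)))

rng = np.random.default_rng(2)
n, T = 2, 5000
S = rng.laplace(size=(n, T)) + 1j * rng.laplace(size=(n, T))   # hypothetical sources
A = np.array([[1.0, 0.6], [0.4, 1.0]], dtype=complex)          # hypothetical A(w)
X = A @ S

W_separating = np.linalg.inv(A)
# A rotation of the separating matrix: same |det|, but the outputs stay mixed.
c = np.sqrt(0.5)
W_rotated = np.array([[c, c], [-c, c]]) @ W_separating

cost_separated = kl_cost(W_separating, X)
cost_rotated = kl_cost(W_rotated, X)
```

For super-Gaussian sources such as speech, one would expect cost_separated to be the smaller of the two values, although the sketch itself does not verify this.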
The most basic algorithm for determining the separation matrix W(ω) is to update the separation matrix by a natural gradient method as given by the expressions (7) and (8) below. Details of the deriving process of the expressions (7) and (8) are described in Noboru MURATA, “Introduction to the independent component analysis”, Tokyo Denki University Press (hereinafter referred to as Non-Patent Document 1), particularly in “3.3.1 Basic Gradient Method”.
$$\Delta W(\omega) = \left\{ I_n + E_t\left[ \varphi(Y(\omega, t))\, Y(\omega, t)^H \right] \right\} W(\omega) \tag{7}$$

$$W(\omega) \leftarrow W(\omega) + \eta \cdot \Delta W(\omega) \tag{8}$$

where

$$Y(\omega, t) = W(\omega)\, X(\omega, t), \quad \varphi(Y(\omega, t)) = \begin{bmatrix} \varphi_1(Y_1(\omega, t)) \\ \vdots \\ \varphi_n(Y_n(\omega, t)) \end{bmatrix}$$

$$\varphi_k(Y_k(\omega, t)) = \frac{\partial}{\partial Y_k(\omega, t)} \log P_{Y_k(\omega)}(Y_k(\omega, t)) \tag{9}$$
In the expression (7) above, In represents an n×n unit matrix, and Et[·] represents an average in the frame direction. The superscript “H” represents the Hermitian transpose (the vector is transposed and each element is replaced by its complex conjugate). The function φ is the derivative of the logarithm of a probability density function and is called a score function (or “activation function”). Further, η in the expression (6) above represents a learning coefficient which has a very small positive value.
It is to be noted that the probability density function used in the expression (7) above is known not to need to reflect the true distribution of Yk(ω, t); a fixed function may be used instead. Examples of the probability density function are given by the following expressions (10) and (12), and the corresponding score functions are given by the following expressions (11) and (13), respectively.

$$P_{Y_k(\omega)}(Y_k(\omega,t)) = \frac{1}{\cosh(|Y_k(\omega,t)|)} \tag{10}$$

$$\varphi_k(Y_k(\omega,t)) = -\tanh(|Y_k(\omega,t)|)\,\frac{Y_k(\omega,t)}{|Y_k(\omega,t)|} \tag{11}$$

$$P_{Y_k(\omega)}(Y_k(\omega,t)) = \exp(-|Y_k(\omega,t)|) \tag{12}$$

$$\varphi_k(Y_k(\omega,t)) = -\frac{Y_k(\omega,t)}{|Y_k(\omega,t)|} \tag{13}$$
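The two score functions of the expressions (11) and (13) can be sketched in Python for a single complex value as follows. The function names `score_sech` and `score_laplace` are hypothetical, and returning 0 at the origin (where Yk/|Yk| is undefined) is an assumption made only for this illustration.

```python
import math

def score_sech(y: complex) -> complex:
    """Score function of expression (11): -tanh(|y|) * y / |y|."""
    r = abs(y)
    if r == 0.0:
        return 0j  # assumption: define the score as 0 where y/|y| is undefined
    return -math.tanh(r) * y / r

def score_laplace(y: complex) -> complex:
    """Score function of expression (13): -y / |y|."""
    r = abs(y)
    if r == 0.0:
        return 0j  # same assumption as above
    return -y / r
```

Both functions keep the phase of the input and change only its magnitude: the expression (13) always yields a value of magnitude 1, while the expression (11) yields a magnitude of tanh(|Yk|).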
According to the natural gradient method, a modification value ΔW(ω) of the separation matrix W(ω) is determined in accordance with the expression (7) given hereinabove, W(ω) is then updated in accordance with the expression (8) given above, and the updated separation matrix W(ω) is used to produce a separation signal in accordance with the expression (9). If the loop processes of the expressions (7) to (9) are repeated many times, the elements of W(ω) finally converge to certain values, which are the estimated values of the separation matrix. The result of performing the separation process with this separation matrix is the final separation signal.
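The loop described above can be sketched for a single frequency bin as follows. This is a minimal illustration, not the implementation of the invention: it assumes the commonly used natural-gradient form ΔW = (In + Et[φ(Y)Y^H])W for the modification value, uses the score function of the expression (13), and introduces hypothetical names (`natural_gradient_step`, `natural_gradient_ica`), an identity initial matrix, and a small guard against division by zero.

```python
import numpy as np

def score_laplace(Y):
    """Element-wise score function of expression (13): -Y / |Y|."""
    return -Y / np.maximum(np.abs(Y), 1e-12)  # the 1e-12 guard is an added assumption

def natural_gradient_step(W, X, eta=0.1, score=score_laplace):
    """One pass of the loop for one frequency bin.

    W: (n, n) complex separation matrix.
    X: (n, T) complex observation frames for this bin.
    """
    n, T = X.shape
    Y = W @ X                          # separation result with the current W
    C = score(Y) @ Y.conj().T / T      # frame average E_t[phi(Y) Y^H]
    dW = (np.eye(n) + C) @ W           # assumed natural-gradient modification value
    return W + eta * dW                # update with the small positive step eta

def natural_gradient_ica(X, eta=0.1, n_iter=200):
    """Repeat the loop n_iter times and return (W, Y)."""
    W = np.eye(X.shape[0], dtype=complex)  # assumption: start from the unit matrix
    for _ in range(n_iter):
        W = natural_gradient_step(W, X, eta)
    return W, W @ X
```

In this sketch the update stops changing W once Et[φ(Y)Y^H] equals −In, which is the fixed point of the loop; a fixed iteration count is used here instead of a convergence test for simplicity.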
However, the simple natural gradient method described above has a problem in that a large number of loop iterations is required before W(ω) converges. Therefore, in order to reduce the number of iterations, a method has been proposed wherein a pre-process (hereinafter described) called non-correlating (decorrelation) is applied to an observation signal, and the separation matrix is searched for among orthogonal matrices only. An orthogonal matrix is a square matrix which satisfies the condition defined by the expression (14) given below. If the orthogonality restriction (the condition that, when W(ω) is an orthogonal matrix, W(ω)+η·ΔW(ω) is also an orthogonal matrix) is applied to the expression (7) given hereinabove, then the expression (15) given below is obtained. Details of the derivation of the expression (15) are disclosed in Non-Patent Document 1, particularly in “3.3.2 Gradient method restricted to an orthogonal matrix”.
$$W(\omega)\,W(\omega)^H = I_n \tag{14}$$

$$\Delta W(\omega) = E_t\!\left[\varphi(Y(\omega,t))\,Y(\omega,t)^H - Y(\omega,t)\,\varphi(Y(\omega,t))^H\right] W(\omega) \tag{15}$$
In the gradient method with an orthogonality restriction, a modification value ΔW(ω) of the separation matrix W(ω) is determined in accordance with the expression (15) above, and W(ω) is updated in accordance with the expression (8). If the loop processes of the expressions (15), (8) and (9) are repeated many times, the elements of W(ω) finally converge to certain values, which are the estimated values of the separation matrix. The result of performing the separation process with this separation matrix is the final separation signal. Since the method which uses the expression (15) given above involves the orthogonality restriction, convergence is reached with a smaller number of loop iterations than where the expression (7) given hereinabove is used.
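One step of the gradient method with the orthogonality restriction can be sketched as follows for a single frequency bin. This is an illustration under stated assumptions: the names `skew_term` and `orthogonal_gradient_step` are hypothetical, the score function of the expression (13) is used, the observation X is assumed to have already undergone the non-correlating pre-process, and the division guard is an addition.

```python
import numpy as np

def score_laplace(Y):
    """Element-wise score function of expression (13): -Y / |Y|."""
    return -Y / np.maximum(np.abs(Y), 1e-12)  # the 1e-12 guard is an added assumption

def skew_term(W, X, score=score_laplace):
    """Bracketed frame average of expression (15): E_t[phi(Y) Y^H - Y phi(Y)^H]."""
    T = X.shape[1]
    Y = W @ X                        # separation result with the current W
    C = score(Y) @ Y.conj().T / T    # E_t[phi(Y) Y^H]
    return C - C.conj().T            # subtracting C^H gives E_t[... - Y phi(Y)^H]

def orthogonal_gradient_step(W, X, eta=0.1, score=score_laplace):
    """Modification value of expression (15) followed by the step W + eta * dW."""
    dW = skew_term(W, X, score) @ W
    return W + eta * dW
```

The bracketed term D is skew-Hermitian (D^H = −D), so for a W(ω) that satisfies the expression (14) the updated matrix satisfies (W + ηDW)(W + ηDW)^H = In − η²D². The deviation from the expression (14) is therefore only of second order in η, which is small when η is a very small positive value.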