1. Field of the Invention
This invention relates to an apparatus and a method for separating the component signals of an audio signal, which is a mixture of a plurality of component signals, by means of independent component analysis (ICA).
2. Description of the Related Art
The technique of independent component analysis (ICA), which separates and restores a plurality of original signals that have been linearly mixed with unknown coefficients by relying only on their statistical independence, has been attracting attention in the field of signal processing. By applying independent component analysis, it is possible to separate and restore an audio signal even in a situation where a speaker and a microphone are located apart from each other and the microphone picks up sounds other than the voice of the speaker.
Now, how the component signals of an audio signal that is a mixture of a plurality of component signals are separated and restored by means of independent component analysis in the time-frequency domain will be discussed below.
Assume a situation where N different sounds are emitted from N audio sources and are observed by n microphones, as illustrated in FIG. 1 of the accompanying drawings. Since the sounds (original signals) emitted from the audio sources undergo time lags and reflections before they reach the microphones, the signal (observation signal) xk(t) observed at the k-th microphone (1≦k≦n) is expressed by formula (1) shown below as the total sum of the convolutions of the original signals and the transfer functions. The observation signals of all the microphones are then expressed by the single formula (2) shown below. Note that, in the formulas (1) and (2), x(t) and s(t) respectively represent column vectors having xk(t) and sk(t) as elements and A represents a matrix of n rows and N columns having aij(t) as elements. Also note that N=n is assumed in the following description.
[FORMULA 1]

$$x_k(t) = \sum_{j=1}^{N} \sum_{\tau=0}^{\infty} a_{kj}(\tau)\, s_j(t-\tau) = \sum_{j=1}^{N} \{ a_{kj} * s_j(t) \} \qquad (1)$$
$$x(t) = A * s(t) \qquad (2)$$

where

$$s(t) = \begin{bmatrix} s_1(t) \\ \vdots \\ s_N(t) \end{bmatrix}, \quad x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \quad A(t) = \begin{bmatrix} a_{11}(t) & \cdots & a_{1N}(t) \\ \vdots & \ddots & \vdots \\ a_{n1}(t) & \cdots & a_{nN}(t) \end{bmatrix}$$
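As an illustration of formulas (1) and (2), the following minimal sketch mixes two signals with finite-length impulse responses. The function name `convolutive_mix`, the 8-tap filter length, and the random test data are assumptions made for this example only; the infinite sum over τ is replaced by a finite one.

```python
import numpy as np

def convolutive_mix(sources, filters):
    """Mix sources per formula (1): x_k(t) = sum_j sum_tau a_kj(tau) s_j(t - tau).

    sources: array of shape (N, T), one row per original signal s_j(t).
    filters: array of shape (n, N, K); filters[k, j] holds a_kj(0..K-1),
             a finite-length stand-in for the infinite sum over tau.
    Returns the observation signals, shape (n, T).
    """
    n, N, K = filters.shape
    T = sources.shape[1]
    x = np.zeros((n, T))
    for k in range(n):
        for j in range(N):
            # Full convolution cut to the first T samples, which treats
            # the sources as zero before t = 0.
            x[k] += np.convolve(filters[k, j], sources[j])[:T]
    return x

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 1000))        # two hypothetical source signals
a = rng.normal(size=(2, 2, 8)) * 0.5   # hypothetical 8-tap impulse responses
x = convolutive_mix(s, a)              # observation signals, shape (2, 1000)
```

Each row of `x` corresponds to one microphone and contains contributions from every source, which is why the sources cannot be recovered by a simple per-sample matrix inverse in the time domain.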
In independent component analysis in the time-frequency domain, A and s(t) are not estimated directly. Instead, x(t) is transformed into a signal in the time-frequency domain and the signals that correspond to A and s(t) are estimated in that domain. The technique to be used for the analysis will be described below.
The signal vectors x(t) and s(t) are subjected to short-time Fourier transformation with a window of length L to produce X(ω, t) and S(ω, t). Similarly, the matrix A(t) is subjected to short-time Fourier transformation to produce A(ω). Then, the above formula (2) for the time domain can be expressed by formula (3) below. Note that ω represents the frequency bin number (1≦ω≦M) and t represents the frame number (1≦t≦T). With independent component analysis in the time-frequency domain, S(ω, t) and A(ω) are estimated in the time-frequency domain:
[FORMULA 2]

$$X(\omega, t) = A(\omega)\, S(\omega, t) \qquad (3)$$

where

$$X(\omega, t) = \begin{bmatrix} X_1(\omega, t) \\ \vdots \\ X_n(\omega, t) \end{bmatrix}, \quad S(\omega, t) = \begin{bmatrix} S_1(\omega, t) \\ \vdots \\ S_n(\omega, t) \end{bmatrix}$$
Strictly speaking, the number of frequency bins is equal to the length L of the window, and each frequency bin represents one of the frequency components that are produced when the span between −R/2 and R/2 (where R is the sampling frequency) is divided equally into L parts. However, since the negative frequency components are the complex conjugates of the corresponding positive frequency components, that is, X(−ω)=conj(X(ω)) (where conj(·) denotes the complex conjugate), only the non-negative frequency components from 0 to R/2 are considered (the number of frequency bins being equal to L/2+1), and the numbers from 1 to M (M=L/2+1) are assigned to these frequency components.
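The bin count M=L/2+1 and the conjugate relationship can be checked numerically with a short sketch. The window length L=512, the Hann window, and the random test frame are assumptions chosen for illustration. Note that NumPy indexes the bins from 0 to M−1, whereas the text numbers them from 1 to M.

```python
import numpy as np

L = 512                                     # hypothetical window length
rng = np.random.default_rng(0)
frame = rng.normal(size=L) * np.hanning(L)  # one windowed, real-valued frame

full = np.fft.fft(frame)    # all L bins, covering -R/2 .. R/2
half = np.fft.rfft(frame)   # only the non-negative frequencies 0 .. R/2

M = L // 2 + 1
assert half.shape[0] == M

# X(-w) = conj(X(w)): bin L - w of the full FFT is the conjugate of bin w.
w = np.arange(1, L // 2)
assert np.allclose(full[L - w], np.conj(full[w]))

# The first M bins of the full FFT are exactly the ones that rfft keeps.
assert np.allclose(full[:M], half)
```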
When estimating S(ω, t) and A(ω) in the time-frequency domain, formula (4) shown below is first taken into consideration. In the formula (4), Y(ω, t) represents the column vector having elements Yk(ω, t) that are obtained by short-time Fourier transformation of yk(t) with a window of length L and W(ω) represents a matrix (separation matrix) of n rows and n columns having elements wij(ω).
[FORMULA 3]

$$Y(\omega, t) = W(\omega)\, X(\omega, t) \qquad (4)$$

where

$$Y(\omega, t) = \begin{bmatrix} Y_1(\omega, t) \\ \vdots \\ Y_n(\omega, t) \end{bmatrix}, \quad W(\omega) = \begin{bmatrix} w_{11}(\omega) & \cdots & w_{1n}(\omega) \\ \vdots & \ddots & \vdots \\ w_{n1}(\omega) & \cdots & w_{nn}(\omega) \end{bmatrix}$$
Then, W(ω) that makes Y1(ω, t) through Yn(ω, t) statistically independent (more accurately, that maximizes their independence) is determined by varying t while holding ω at a fixed value. Due to the permutation and scaling ambiguities that arise in independent component analysis in the time-frequency domain, as will be described in greater detail hereinafter, solutions other than W(ω)=A(ω)⁻¹ can exist. Once Y1(ω, t) through Yn(ω, t) that are statistically independent are obtained for all the values of ω, it is possible to obtain isolated signals (component signals) y(t) by subjecting them to inverse Fourier transformation.
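The per-bin multiplication of formula (4), and the reason why solutions other than the inverse of A(ω) exist, can be sketched as follows. The array sizes, the random complex data standing in for S(ω, t) and A(ω), and the particular permutation and scaling matrices `P` and `D` are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, T = 4, 2, 50   # hypothetical numbers of bins, channels, and frames

# Random complex stand-ins for S(w, t) and the mixing matrices A(w).
S = rng.normal(size=(M, n, T)) + 1j * rng.normal(size=(M, n, T))
A = rng.normal(size=(M, n, n)) + 1j * rng.normal(size=(M, n, n))

# Formula (3): X(w, t) = A(w) S(w, t), applied independently in every bin.
X = np.einsum('wij,wjt->wit', A, S)

# Formula (4) with the ideal separation matrix W(w) = A(w)^-1.
W = np.linalg.inv(A)
Y = np.einsum('wij,wjt->wit', W, X)
assert np.allclose(Y, S)

# Swapping and rescaling the rows of W(w) gives another valid solution:
# the outputs are still separated, only reordered and rescaled.
P = np.array([[0, 1], [1, 0]])   # permutation of the two channels
D = np.diag([2.0, -0.5j])        # arbitrary nonzero scaling
W_alt = P @ D @ W                # broadcasts over the bin axis
Y_alt = np.einsum('wij,wjt->wit', W_alt, X)
assert np.allclose(Y_alt[:, 0], -0.5j * S[:, 1])
assert np.allclose(Y_alt[:, 1], 2.0 * S[:, 0])
```

Because the outputs of `W_alt` are exactly as independent as those of `W`, an independence criterion evaluated in a single frequency bin cannot distinguish between the two.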
FIG. 2 of the accompanying drawings schematically illustrates the prior art independent component analysis in the time-frequency domain. Assume that the original signals that are emitted from n audio sources and are independent of each other are s1 through sn and that the vector having them as elements is s. The observation signals x that are observed at the respective microphones are obtained by performing the convolution and mixing operations of the above formula (2). FIG. 3A of the accompanying drawings shows example observation signals that are obtained when the number of microphones n is equal to 2 and hence the number of channels is equal to 2. Then, the observation signals x are subjected to short-time Fourier transformation to obtain signals X in the time-frequency domain. If the elements of X are expressed by Xk(ω, t), each Xk(ω, t) takes a complex value. A graphic expression of the absolute value |Xk(ω, t)| of Xk(ω, t), using shades of color, is referred to as a spectrogram. FIG. 3B of the accompanying drawings shows example spectrograms. In FIG. 3B, the horizontal axis represents t (frame number) and the vertical axis represents ω (frequency bin number). In the following description, a signal itself in the time-frequency domain (a signal before being expressed by an absolute value) is also referred to as “spectrogram”. Subsequently, isolated signals Y as shown in FIG. 3C are obtained by multiplying each frequency bin of the signal X by W(ω). Isolated signals y in the time domain as shown in FIG. 3D are obtained by subjecting the isolated signals Y to inverse Fourier transformation.
Many variations exist for both the measure used to express independence and the algorithm used to maximize it. As an example, in the following description independence is expressed by means of the Kullback-Leibler information quantity (to be referred to as “KL information quantity” hereinafter) and the natural gradient method is used as the algorithm for maximizing independence.
Take a frequency bin as shown in FIG. 4. If Yk(ω, t) with the frame number t varied from 1 to T is denoted by Yk(ω), the KL information quantity I, which is the measure of the independence of the isolated signals Y1(ω) through Yn(ω), is defined by formula (5) below. In other words, the KL information quantity I is defined as the value obtained by subtracting the simultaneous (joint) entropy H(Y(ω)) of the frequency bin ω for all the channels from the total sum of the entropies H(Yk(ω)) of the frequency bin ω for the individual channels. FIG. 5 shows the relationship between H(Yk(ω)) and H(Y(ω)) when n=2. In the formula (5), H(Yk(ω)) can be rewritten as the first term of formula (6) below by the definition of entropy, while H(Y(ω)) can be expanded into the second and third terms of the formula (6) by using the above formula (4). In the formula (6), PYk(ω)(·) expresses the probability density function of Yk(ω, t) and H(X(ω)) expresses the simultaneous entropy of the observation signals X(ω).
[FORMULA 4]

$$I(Y(\omega)) = \sum_{k=1}^{n} H(Y_k(\omega)) - H(Y(\omega)) \qquad (5)$$

$$= \sum_{k=1}^{n} E_t\left[ -\log P_{Y_k(\omega)}(Y_k(\omega, t)) \right] - \log\left| \det(W(\omega)) \right| - H(X(\omega)) \qquad (6)$$

where

$$Y_k(\omega) = \begin{bmatrix} Y_k(\omega, 1) & \cdots & Y_k(\omega, T) \end{bmatrix}, \quad Y(\omega) = \begin{bmatrix} Y_1(\omega) \\ \vdots \\ Y_n(\omega) \end{bmatrix}, \quad X(\omega) = \begin{bmatrix} X(\omega, 1) & \cdots & X(\omega, T) \end{bmatrix}$$
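The step from formula (5) to formula (6) relies on the identity H(Y) = log|det(W)| + H(X) for Y = WX. A minimal numerical check is sketched below under simplifying assumptions that are not part of the description above: the signals are taken to be real-valued, zero-mean and Gaussian, purely because Gaussian entropies have a closed form. The covariance `cov_x`, the matrix `W`, and the helper names are hypothetical. The sketch illustrates the quantity I only and is not a separation procedure.

```python
import numpy as np

def gaussian_entropy(cov):
    """Differential entropy (in nats) of a real zero-mean Gaussian vector."""
    cov = np.atleast_2d(cov)
    d = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

def kl_information(cov_y):
    """Formula (5): sum of the marginal entropies minus the joint entropy."""
    marginals = sum(gaussian_entropy(v) for v in np.diag(cov_y))
    return marginals - gaussian_entropy(cov_y)

# Hypothetical covariance of two correlated observation channels.
cov_x = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
W = np.array([[1.0, -0.3],
              [0.5,  2.0]])
cov_y = W @ cov_x @ W.T                 # covariance of Y = W X

# Identity used between (5) and (6): H(Y) = log|det W| + H(X).
lhs = gaussian_entropy(cov_y)
rhs = np.log(abs(np.linalg.det(W))) + gaussian_entropy(cov_x)
assert np.isclose(lhs, rhs)

I_mixed = kl_information(cov_y)         # positive: outputs are still dependent

# A W that decorrelates the channels. For Gaussians, uncorrelated means
# independent, so I drops to zero up to rounding error.
_, eigvecs = np.linalg.eigh(cov_x)
W_sep = eigvecs.T
I_sep = kl_information(W_sep @ cov_x @ W_sep.T)
```

In this toy setting `I_mixed` is strictly positive while `I_sep` vanishes up to rounding error, matching the property that I is smallest when the outputs are independent.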
The KL information quantity I(Y(ω)) becomes minimal (ideally equal to 0) when Y1(ω) through Yn(ω) are independent. The natural gradient method is used as the algorithm for determining the separation matrix W(ω) that minimizes the KL information quantity I(Y(ω)). With the natural gradient method, the direction for minimizing I(Y(ω)) is determined by means of formula (7) below and W(ω) is gradually changed in that direction, as shown by formula (9) below, until it converges. In the formula (7), W(ω)^T denotes the transposed matrix of W(ω). In the formula (9), η represents a learning coefficient (a very small positive value).
                    [                  FORMULA          ⁢                                          ⁢          5                ]                                                                      Δ          ⁢                                          ⁢          W          ⁢                                          ⁢                      (            ω            )                          =                              -                                                            ∂                  I                                ⁢                                                                  ⁢                                  (                                      Y                    ⁢                                                                                  ⁢                                          (                      ω                      )                                                        )                                                                              ∂                  W                                ⁢                                                                  ⁢                                  (                  ω                  )                                                              ⁢          W          ⁢                                          ⁢                                    (              ω              )                        T                    ⁢          W          ⁢                                          ⁢                      (            ω            )                                              (        7        )                                                          ⁢                  =                                                    -                                  {                                                                                    E                        t                                            ⁡                                              [            
\[
{}-\phi\bigl(Y(\omega,t)\bigr)\,X(\omega,t)^{T}\Bigr] - \bigl(W(\omega)^{T}\bigr)^{-1}\Bigr\}\,W(\omega)^{T}\,W(\omega)
= \Bigl\{ I_{n} + E_{t}\bigl[\phi\bigl(Y(\omega,t)\bigr)\,Y(\omega,t)^{T}\bigr] \Bigr\}\,W(\omega)
\tag{8}
\]
\[
W(\omega) \leftarrow W(\omega) + \eta\cdot\Delta W(\omega)
\tag{9}
\]
where,
\[
Y(\omega,t) = \begin{bmatrix} Y_{1}(\omega,t) \\ \vdots \\ Y_{n}(\omega,t) \end{bmatrix},
\qquad
\phi\bigl(Y(\omega,t)\bigr) = \begin{bmatrix} \phi_{1}\bigl(Y_{1}(\omega,t)\bigr) \\ \vdots \\ \phi_{n}\bigl(Y_{n}(\omega,t)\bigr) \end{bmatrix}
\]
\[
\phi_{k}\bigl(Y_{k}(\omega,t)\bigr)
= \frac{\partial}{\partial Y_{k}(\omega,t)} \log P_{Y_{k}(\omega)}\bigl(Y_{k}(\omega,t)\bigr)
= \frac{\dfrac{\partial}{\partial Y_{k}(\omega,t)}\,P_{Y_{k}(\omega)}\bigl(Y_{k}(\omega,t)\bigr)}{P_{Y_{k}(\omega)}\bigl(Y_{k}(\omega,t)\bigr)}
\]
The above formula (7) can be rewritten as formula (8) above. In formula (8), Et[·] represents the average in the temporal direction and φ(·) represents the derivative of the logarithm of a probability density function, which is referred to as a score function (or “activation function”). While a score function includes the probability density function of Yk(ω), it is known that it is not necessary to use the real probability density function for the purpose of determining the smallest value of the KL information quantity. Probability density functions of two different types as shown in Table 1 can be used in a switched manner depending on whether the distribution of Yk(ω) is super-gaussian or sub-gaussian.
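The last equality in formula (8) can be checked term by term. The following is a sketch that assumes formula (4) has the form Y(ω, t) = W(ω)X(ω, t), and that uses the fact that W(ω) does not depend on t and can therefore be moved inside the temporal average:

\[
\bigl(W(\omega)^{T}\bigr)^{-1} W(\omega)^{T} W(\omega) = W(\omega)
\]
\[
E_{t}\bigl[\phi\bigl(Y(\omega,t)\bigr)\,X(\omega,t)^{T}\bigr]\,W(\omega)^{T} W(\omega)
= E_{t}\bigl[\phi\bigl(Y(\omega,t)\bigr)\,\bigl(W(\omega)X(\omega,t)\bigr)^{T}\bigr]\,W(\omega)
= E_{t}\bigl[\phi\bigl(Y(\omega,t)\bigr)\,Y(\omega,t)^{T}\bigr]\,W(\omega)
\]

Adding the two lines gives {I_n + Et[φ(Y(ω, t)) Y(ω, t)^T]} W(ω), so the inverse of the transposed separation matrix no longer has to be computed.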
TABLE 1
distribution of Yk(ω) | score function | probability density function
super-gaussian | −tanh[Yk(ω, t)] | h/cosh[Yk(ω, t)]
sub-gaussian | −Yk(ω, t)^3 | h exp[−Yk(ω, t)^4/4]
Alternatively, probability density functions of two different types as shown in Table 2 may be used in a switched manner, as in the extended infomax method.
TABLE 2
distribution of Yk(ω) | score function | probability density function
super-gaussian | −[Yk(ω, t) + tanh[Yk(ω, t)]] | h exp[−Yk(ω, t)^2/2]/cosh[Yk(ω, t)]
sub-gaussian | −[Yk(ω, t) − tanh[Yk(ω, t)]] | h exp[−Yk(ω, t)^2/2] cosh[Yk(ω, t)]
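As a worked check of how a score function follows from its probability density function, the super-gaussian row of Table 2 can be differentiated directly. Here Y is shorthand for Yk(ω, t), treated as a real variable for illustration:

\[
\log P(Y) = \log h - \frac{Y^{2}}{2} - \log\cosh Y
\quad\Longrightarrow\quad
\phi(Y) = \frac{\partial}{\partial Y}\log P(Y) = -\bigl[\,Y + \tanh Y\,\bigr]
\]

The constant h disappears on differentiation, so the score function can be evaluated without knowing its value. The other rows of the tables follow in the same way.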
In Tables 1 and 2, h represents a constant for making the value of the integral of the probability density function in the interval between −∞ and +∞ equal to 1. Whether the distribution of Yk(ω) is super-gaussian or sub-gaussian is determined according to whether the value of the cumulant of the fourth degree κ4 (=Et[Yk(ω, t)^4]−3Et[Yk(ω, t)^2]^2) is positive or negative. It is super-gaussian when κ4 is positive and sub-gaussian when κ4 is negative.
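The switching rule can be sketched in a few lines of Python with NumPy. This is a minimal illustration rather than the implementation of the invention: the function names are hypothetical, the samples are assumed to be real-valued and zero-mean, and the case κ4 = 0, which the text does not address, is arbitrarily treated as sub-gaussian.

```python
import numpy as np

def fourth_cumulant(y):
    # kappa_4 = E_t[Y^4] - 3 * E_t[Y^2]^2, as given in the text.
    # Assumption: y is a real-valued, zero-mean 1-D array of samples over t.
    return np.mean(y ** 4) - 3.0 * np.mean(y ** 2) ** 2

def score_table1(y):
    # Table 1: -tanh(Y) for super-gaussian, -Y^3 for sub-gaussian.
    # Assumption: kappa_4 == 0 falls into the sub-gaussian branch.
    if fourth_cumulant(y) > 0:
        return -np.tanh(y)
    return -(y ** 3)

def score_table2(y):
    # Table 2 (extended infomax): -(Y + tanh Y) or -(Y - tanh Y).
    if fourth_cumulant(y) > 0:
        return -(y + np.tanh(y))
    return -(y - np.tanh(y))
```

For example, samples drawn from a Laplace distribution have a positive κ4 and select the super-gaussian row, while samples drawn from a uniform distribution have a negative κ4 and select the sub-gaussian row.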
FIG. 6 is a flowchart of a separation process using the above formulas (8) and (9). Referring to FIG. 6, firstly in Step S101, a separation matrix W(ω) is prepared for each frequency bin and set to an initial value (e.g., the unit matrix). Then, in the next step, or Step S102, it is determined whether W(ω) has converged for all the frequency bins. The process is terminated if it has converged, and proceeds to Step S103 if it has not. In Step S103, Y(ω, t) is computed according to the above formula (4) and, in Step S104, the direction for minimizing the KL information quantity I(Y(ω)) is determined by means of the above formula (8). Then, in the next step, or Step S105, W(ω) is updated in the direction for minimizing the KL information quantity I(Y(ω)) according to the above formula (9), and the process returns to Step S102. The processing operations in Steps S102 through S105 are repeated until the level of independence of Y(ω) is sufficiently raised for each frequency bin and W(ω) substantially converges.
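The loop of FIG. 6 can be sketched for a single frequency bin as follows. This is an illustrative sketch under stated assumptions, not the implementation of the invention: formula (4) is assumed to be Y(ω, t) = W(ω)X(ω, t), the data are real-valued toy signals rather than complex spectrogram values, the super-gaussian score function of Table 1 is used for every channel, and the function name `separate_bin`, the step size, the iteration cap and the convergence test are all hypothetical choices.

```python
import numpy as np

def separate_bin(X, eta=0.1, max_iter=1000, tol=1e-6):
    """Estimate a separation matrix W for one frequency bin.

    X: array of shape (n, T), one row per microphone, one column per frame.
    Assumption: X is real-valued; the text's formulas are applied as written.
    """
    n, T = X.shape
    W = np.eye(n)                          # Step S101: start from the unit matrix
    for _ in range(max_iter):              # Steps S102 to S105
        Y = W @ X                          # Step S103: assumed form of formula (4)
        phi = -np.tanh(Y)                  # score function, Table 1, super-gaussian row
        # Step S104: formula (8), with E_t[.] taken as the mean over the T frames
        delta_W = (np.eye(n) + (phi @ Y.T) / T) @ W
        W = W + eta * delta_W              # Step S105: formula (9)
        # Step S102: hypothetical convergence test on the relative update size
        if np.linalg.norm(eta * delta_W) < tol * np.linalg.norm(W):
            break
    return W
```

In a full system this function would be called once per frequency bin. Because each bin is solved independently, the order and scale of the separated outputs may differ from bin to bin.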