1. Field of the Invention
The present invention relates to a signal processing apparatus, a signal processing method, and a program. More particularly, the present invention relates to a signal processing apparatus, a signal processing method, and a program for separating a mixture signal of plural sounds into individual sound sources by ICA (Independent Component Analysis), and for using the separation signals, i.e., the separation results, to analyze sound signals at an arbitrary position, for example, the sound signals that would be collected by microphones installed at respective arbitrary positions (i.e., projection-back to individual microphones).
2. Description of the Related Art
ICA (Independent Component Analysis) is a technique for separating the individual source signals included in a mixture signal of plural sounds. The ICA is one type of multivariate analysis, and it separates multi-dimensional signals based on the statistical properties of the signals. See, e.g., “NYUMON DOKURITSU SEIBUN BUNSEKI (Introduction—Independent Component Analysis)” (Noboru Murata, Tokyo Denki University Press) for details of the ICA per se.
The present invention relates to a technique for separating a mixture signal of plural sounds per (sound) source by the ICA, and for performing, e.g., projection-back to individual microphones installed at respective arbitrary positions by using separation signals, i.e., separation results. Such a technique can realize, for example, the following processes.
(1) The ICA is performed based on sounds collected by directional microphones, and separation signals obtained as the results of separating the collected sounds are projected back to omnidirectional microphones.
(2) The ICA is performed based on sounds collected by microphones which are arranged to be adapted for source separation, and separation signals obtained as the results of separating the collected sounds are projected back to microphones which are arranged to be adapted for DOA (Direction of Arrival) estimation or source position estimation.
The ICA for sound signals, in particular, the ICA in the time-frequency domain, will be described with reference to FIG. 1.
Assume a situation where, as illustrated in FIG. 1, a number N of sound sources are active to generate different sounds and a number n of microphones are used to observe those sounds. There are time delays and reflections until the sounds (source signals) generated from the sound sources arrive at the microphones. Accordingly, a signal (observation signal) observed by a microphone k can be expressed as the following formula [1.1] by summing, over all the sound sources, the convolutions of the source signals with the corresponding transfer functions. Such mixtures are called “convolutive mixtures” hereinafter.
Also, observation signals of all the microphones can be expressed by the following single formula [1.2].
$$x_k(t) = \sum_{j=1}^{N} \sum_{l=0}^{L} a_{kj}(l)\, s_j(t-l) = \sum_{j=1}^{N} \{ a_{kj} * s_j \} \tag*{[1.1]}$$

$$x(t) = A^{[0]} s(t) + \dots + A^{[L]} s(t-L) \tag*{[1.2]}$$

where,

$$s(t) = \begin{bmatrix} s_1(t) \\ \vdots \\ s_N(t) \end{bmatrix}, \quad x(t) = \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix}, \quad A^{[l]} = \begin{bmatrix} a_{11}(l) & \dots & a_{1N}(l) \\ \vdots & \ddots & \vdots \\ a_{n1}(l) & \dots & a_{nN}(l) \end{bmatrix} \tag*{[1.3]}$$
In the above formulae, x(t) and s(t) are column vectors having elements xk(t) and sk(t), respectively, and A[l] is an (n×N) matrix having elements akj(l). Note that n=N is assumed in the following description.
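The convolutive mixture of the formulae [1.1] and [1.2] can be illustrated by a short numerical sketch. The following Python code is not part of the related art; it is a minimal illustration which assumes random mixing filters, random source signals, and source signals equal to zero for t<0. The function name `convolutive_mix` and all array shapes are chosen here for illustration only.

```python
import numpy as np

def convolutive_mix(A, s):
    """Mix sources per formula [1.2]: x(t) = sum over l of A[l] @ s(t - l).

    A: array of shape (L + 1, n, N), where A[l][k, j] = a_kj(l)
    s: array of shape (N, T), source signals (assumed zero for t < 0)
    Returns x of shape (n, T), the observation signals.
    """
    n_taps, n_mics, _ = A.shape
    T = s.shape[1]
    x = np.zeros((n_mics, T))
    for l in range(n_taps):
        # Delay every source by l samples, zero-padding at the start.
        delayed = np.zeros_like(s)
        delayed[:, l:] = s[:, :T - l]
        x += A[l] @ delayed
    return x

rng = np.random.default_rng(0)
N = n = 2          # number of sources and microphones (n = N as in the text)
L = 3              # maximum delay, i.e. L + 1 filter taps
T = 16             # number of samples

A = rng.standard_normal((L + 1, n, N))   # hypothetical mixing filters
s = rng.standard_normal((N, T))          # hypothetical source signals
x = convolutive_mix(A, s)

# Cross-check against the per-microphone form of formula [1.1]:
# x_k = sum over j of (a_kj * s_j), with * denoting convolution.
x_ref = np.zeros_like(x)
for k in range(n):
    for j in range(N):
        x_ref[k] += np.convolve(A[:, k, j], s[j])[:T]
```

Both computations give the same observation signals, which reflects that the formula [1.2] is merely the formula [1.1] written for all the microphones at once.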
It is known that the convolutive mixtures in the time domain can be expressed as instantaneous mixtures in the time-frequency domain. The ICA in the time-frequency domain utilizes such a feature.
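This property can be checked numerically. The following Python sketch is an illustration under stated assumptions, not a part of the related art: it assumes a Hann window of 1024 samples, a hop of 256 samples, white-noise source signals, and mixing filters of only four taps. The helper `stft` is a hypothetical, simplified short-time Fourier transform written for this example. The instantaneous-mixture form holds only approximately, and the approximation is good when the filters are much shorter than the window.

```python
import numpy as np

WIN_LEN = 1024   # assumed window length
HOP = 256        # assumed hop size

def stft(x, win_len=WIN_LEN, hop=HOP):
    """Hann-windowed short-time Fourier transform; returns (bins, frames)."""
    window = np.hanning(win_len)
    starts = range(0, len(x) - win_len + 1, hop)
    frames = np.stack([window * x[i:i + win_len] for i in starts], axis=1)
    return np.fft.rfft(frames, axis=0)

rng = np.random.default_rng(0)
N = n = 2
n_taps = 4       # L + 1 taps, much shorter than the window
T = 16384

a = rng.standard_normal((n, N, n_taps))   # a[k, j, l] = a_kj(l)
s = rng.standard_normal((N, T))

# Time domain: convolutive mixture of each microphone signal.
x = np.stack([
    sum(np.convolve(a[k, j], s[j])[:T] for j in range(N))
    for k in range(n)
])

# Time-frequency domain: transform observations, sources, and filters.
X = np.stack([stft(x[k]) for k in range(n)])      # (n, bins, frames)
S = np.stack([stft(s[j]) for j in range(N)])      # (N, bins, frames)
A = np.fft.rfft(a, n=WIN_LEN, axis=-1)            # (n, N, bins)

# Instantaneous mixture in each frequency bin: X(w, t) ~ A(w) S(w, t).
X_inst = np.einsum('kjw,jwt->kwt', A, S)
rel_err = np.linalg.norm(X - X_inst) / np.linalg.norm(X)
```

Under these assumptions `rel_err` is small compared with 1. The residual comes from the window being shifted by the filter delays, so it grows as the filters become longer relative to the window.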
Regarding the time-frequency domain ICA per se, see “19.2.4. Fourier Transform Method in ‘Detailed Explanation: Independent Component Analysis’”, Japanese Unexamined Patent Application Publication No. 2006-238409, “APPARATUS AND METHOD FOR SEPARATING AUDIO SIGNALS”, etc.
The following description is made primarily about points related to embodiments of the present invention.
By subjecting both sides of the formula [1.2] to the short-time Fourier transform, the following formula [2.1] is obtained.
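$$X(\omega, t) = A(\omega)\, S(\omega, t) \tag*{[2.1]}$$

$$X(\omega, t) = \begin{bmatrix} X_1(\omega, t) \\ \vdots \\ X_n(\omega, t) \end{bmatrix} \tag*{[2.2]}$$

$$A(\omega) = \begin{bmatrix} A_{11}(\omega) & \dots & A_{1N}(\omega) \\ \vdots & \ddots & \vdots \\ A_{n1}(\omega) & \dots & A_{nN}(\omega) \end{bmatrix} \tag*{[2.3]}$$

$$S(\omega, t) = \begin{bmatrix} S_1(\omega, t) \\ \vdots \\ S_N(\omega, t) \end{bmatrix} \tag*{[2.4]}$$

$$Y(\omega, t) = W(\omega)\, X(\omega, t) \tag*{[2.5]}$$

$$Y(\omega, t) = \begin{bmatrix} Y_1(\omega, t) \\ \vdots \\ Y_n(\omega, t) \end{bmatrix} \tag*{[2.6]}$$

$$W(\omega) = \begin{bmatrix} W_{11}(\omega) & \dots & W_{1n}(\omega) \\ \vdots & \ddots & \vdots \\ W_{n1}(\omega) & \dots & W_{nn}(\omega) \end{bmatrix} \tag*{[2.7]}$$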
In the above formula [2.1], ω is the index of the frequency bin (ω=1 to M, where M is the total number of frequency bins), and t is the index of the frame (t=1 to T, where T is the total number of frames).
If ω is assumed to be fixed, the formula [2.1] can be regarded as representing instantaneous mixtures (i.e., mixtures without time delays). To separate the observation signals, therefore, the formula [2.5] for calculating the separation signals Y, i.e., the separation results, is prepared, and the separation matrix W(ω) is determined such that the individual components of the separation results Y(ω,t) are as independent of one another as possible.
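The per-bin separation of the formula [2.5] can be sketched as follows. This Python code is an illustration only: the function name `separate` and the random test data are assumptions, and the mixing matrices A(ω) are treated as known so that W(ω) can be set to their inverses. In an actual ICA the mixing matrices are unknown, and W(ω) is estimated from the observation signals alone.

```python
import numpy as np

def separate(W, X):
    """Apply formula [2.5] in every frequency bin: Y(w, t) = W(w) X(w, t).

    W: array of shape (M, n, n), one separation matrix per frequency bin
    X: array of shape (M, n, T), observation signals in the
       time-frequency domain
    Returns Y of shape (M, n, T), the separation signals.
    """
    return np.einsum('wij,wjt->wit', W, X)

rng = np.random.default_rng(0)
M, n, T = 5, 2, 100   # frequency bins, channels, frames

# Hypothetical complex-valued sources and mixing matrices.
S = rng.standard_normal((M, n, T)) + 1j * rng.standard_normal((M, n, T))
A = rng.standard_normal((M, n, n)) + 1j * rng.standard_normal((M, n, n))

# Instantaneous mixture in each bin, formula [2.1].
X = np.einsum('wij,wjt->wit', A, S)

# Idealized case: W(w) is the exact inverse of A(w).
W = np.linalg.inv(A)
Y = separate(W, X)
```

In this idealized case Y coincides with S. Because each frequency bin is processed with its own matrix W(ω), an estimate obtained independently per bin can order the separated components differently from bin to bin.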
The time-frequency domain ICA according to the related art has been accompanied by the so-called “permutation problem”, i.e., the problem that which component is separated into which channel is not consistent among the frequency bins. However, the permutation problem has been substantially solved by the approach disclosed in Japanese Unexamined Patent Application Publication No. 2006-238409, “APPARATUS AND METHOD FOR SEPARATING AUDIO SIGNALS”, which is a patent application made by the same inventor as in this application. Because the related-art approach is also used in embodiments of the present invention, the approach for solving the permutation problem, disclosed in Japanese Unexamined Patent Application Publication No. 2006-238409, will be briefly described below.
In Japanese Unexamined Patent Application Publication No. 2006-238409, calculations of the following formulae [3.1] to [3.3] are iteratively executed until the separation matrix W(ω) converges (or for a predetermined number of times), for the purpose of obtaining the separation matrix W(ω):
$$Y(\omega, t) = W(\omega)\, X(\omega, t) \quad (t = 1, \dots, T;\ \omega = 1, \dots, M) \tag*{[3.1]}$$

$$\Delta W(\omega) = \left\{ I + \left\langle \varphi_\omega(Y(t))\, Y(\omega, t)^H \right\rangle_t \right\} W(\omega) \tag*{[3.2]}$$

$$W(\omega) \leftarrow W(\omega) + \eta\, \Delta W(\omega) \tag*{[3.3]}$$

$$Y(t) = \begin{bmatrix} Y_1(1, t) \\ \vdots \\ Y_1(M, t) \\ \vdots \\ Y_n(1, t) \\ \vdots \\ Y_n(M, t) \end{bmatrix} = \begin{bmatrix} Y_1(t) \\ \vdots \\ Y_n(t) \end{bmatrix} \tag*{[3.4]}$$

$$\varphi_\omega(Y(t)) = \begin{bmatrix} \varphi_\omega(Y_1(t)) \\ \vdots \\ \varphi_\omega(Y_n(t)) \end{bmatrix} \tag*{[3.5]}$$

$$\varphi_\omega(Y_k(t)) = \frac{\partial}{\partial Y_k(\omega, t)} \log P(Y_k(t)) \tag*{[3.6]}$$

P(Yk(t)): probability density function (PDF) of Yk(t)

$$P(Y_k(t)) \propto \exp\left( -\gamma \left\| Y_k(t) \right\|_2 \right) \tag*{[3.7]}$$

$$\left\| Y_k(t) \right\|_m = \left\{ \sum_{\omega=1}^{M} \left| Y_k(\omega, t) \right|^m \right\}^{1/m} \tag*{[3.8]}$$

$$\varphi_\omega(Y_k(t)) = -\gamma\, \frac{Y_k(\omega, t)}{\left\| Y_k(t) \right\|_2} \tag*{[3.9]}$$

W = [ W11(1) 0 ⋱ 0
                                                                                              W                          11                                                ⁡                                                  (                          M                          )                                                                                                                                                …                                                                                                                                                          W                                                      1                            ⁢                            n                                                                          ⁡                                                  (                          1                          )                                                                                                                                                                                                                        0                                                                                                                                                                                                              ⋱                                                                                                                                                                                                              0                                                                                                                                                                                                                            W                                                      1                            ⁢                            n                                                                          ⁡                                                  (                          M            
              )                                                                                                                                                                        ⋮                                            ⋱                                            ⋮                                                                                                                                                                                  W                                                      n                            ⁢                                                                                                                  ⁢                            1                                                                          ⁡                                                  (                          1                          )                                                                                                                                                                                                                        0                                                                                                                                                                                                              ⋱                                                                                                                                                                                                              0                                                                                                                                                                                                                            W                                                      n                            ⁢                                                                                                                  ⁢                            1                                                           
               ⁡                                                  (                          M                          )                                                                                                                                                …                                                                                                                                                          W                          nn                                                ⁡                                                  (                          1                          )                                                                                                                                                                                                                        0                                                                                                                                                                                                              ⋱                                                                                                                                                                                                              0                                                                                                                                                                                                                            W                          nn                                                ⁡                                                  (                          M                          )                                                                                                                                                  ]                                    [        3.10        ]                                          X          ⁡                      (            t            )                          =                  [         
                                                                                 X                    1                                    ⁡                                      (                                          1                      ,                      t                                        )                                                                                                      ⋮                                                                                                          X                    1                                    ⁡                                      (                                          M                      ,                      t                                        )                                                                                                      ⋮                                                                                                          X                    n                                    ⁡                                      (                                          1                      ,                      t                                        )                                                                                                      ⋮                                                                                                          X                    n                                    ⁡                                      (                                          M                      ,                      t                                        )                                                                                ]                                    [        3.11        ]                                          Y          ⁡                      (            t            )                          =                  WX          ⁡                      (            t            )                               
               [        3.12        ]            
Those iterated executions are referred to as “learning” hereinafter. Note that the calculations of the formulae [3.1] to [3.3] are executed for all the frequency bins, and the calculation of the formula [3.1] is executed for all frames of the accumulated observation signals. In the formula [3.2], t represents a frame number and <·>t represents a mean over the frames within a certain interval. The superscript H attached to Y(ω,t) represents a Hermitian transpose, i.e., taking the transpose of a vector or a matrix and replacing each element with its complex conjugate.
The separation signals Y(t), i.e., the separation results, are expressed by a formula [3.4] and are represented in the form of a vector including the elements of all channels and all frequency bins of the separation results. Also, φω(Y(t)) is a vector expressed by a formula [3.5]. Each element φω(Yk(t)) of that vector is called a score function, which is a logarithmic differential (formula [3.6]) of a multi-dimensional (multi-variate) probability density function (PDF) of Yk(t). For example, a function expressed by a formula [3.7] can be used as the multi-dimensional PDF. In that case, the score function φω(Yk(t)) can be expressed by a formula [3.9]. In the formula [3.9], ∥Yk(t)∥2 represents the L2 norm of the vector Yk(t) (i.e., the square root of the sum of the squared absolute values of all the elements). The Lm norm of Yk(t), i.e., the generalized expression of the L2 norm, is defined by a formula [3.8]. Also, γ in the formulae [3.7] and [3.9] is a term for adjusting the scale of Yk(ω,t), and a proper positive constant, e.g., sqrt(M) (the square root of the number of frequency bins), is assigned to γ. Further, η in the formula [3.3] is called a learning rate or a learning coefficient and is a small positive value (e.g., about 0.1). The learning rate is used to reflect ΔW(ω), which is calculated based on the formula [3.2], upon the separation matrix W(ω) little by little.
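The score function of the formula [3.9] can be illustrated with a minimal NumPy sketch. The function name, the array layout (frequency bins by frames for one channel), and the small guard value eps are assumptions introduced only for this illustration; they are not taken from the formulae themselves.

```python
import numpy as np

def score_function(Y_k, gamma=None, eps=1e-12):
    """Sketch of the score function of formula [3.9] for one channel k.

    Y_k   : complex array of shape (M, T), i.e. M frequency bins by T frames
            (hypothetical layout chosen for this example).
    gamma : scale term; defaults to sqrt(M) as suggested in the text.
    eps   : assumed guard against division by zero for silent frames.
    """
    M = Y_k.shape[0]
    if gamma is None:
        gamma = np.sqrt(M)
    # L2 norm over all frequency bins for each frame (formula [3.8] with m = 2).
    norm = np.sqrt(np.sum(np.abs(Y_k) ** 2, axis=0, keepdims=True))
    # phi_omega(Y_k(t)) = -gamma * Y_k(omega, t) / ||Y_k(t)||_2
    return -gamma * Y_k / np.maximum(norm, eps)
```

Because every bin of a frame is divided by the same norm, the score of one frequency bin depends on all the bins of that channel, which is what makes the PDF of the formula [3.7] multi-variate.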
Although the formula [3.1] represents separation for one frequency bin (see FIG. 2A), separation for all the frequency bins can be expressed by one formula (see FIG. 2B).
To that end, the separation results Y(t) for all the frequency bins, which are expressed by the formula [3.4], observation signals X(t) expressed by a formula [3.11], and a separation matrix W for all the frequency bins, which is expressed by a formula [3.10], are used. Thus, by using those vectors and matrix, the separation can be expressed by a formula [3.12]. In the explanation of embodiments of the present invention, the formulae [3.1] and [3.12] are selectively used as appropriate.
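The equivalence between the per-bin separation and the single-formula expression [3.12] can be checked numerically. The following is a minimal sketch; the helper names and the array layouts are assumptions made for this example, and indices are 0-based whereas the formulae are 1-based.

```python
import numpy as np

def stack_separation_matrix(W_bins):
    """Build the (n*M, n*M) matrix W of formula [3.10].

    W_bins : array of shape (M, n, n), the separation matrix W(omega) of each
             of the M frequency bins (hypothetical layout for this sketch).
    The (k, i) block of the result is diag(W_ki(1), ..., W_ki(M)).
    """
    M, n, _ = W_bins.shape
    W = np.zeros((n * M, n * M), dtype=W_bins.dtype)
    for k in range(n):
        for i in range(n):
            W[k * M:(k + 1) * M, i * M:(i + 1) * M] = np.diag(W_bins[:, k, i])
    return W

def stack_channels(X_bins):
    """Arrange one frame as the vector of formula [3.11].

    X_bins : array of shape (M, n) with X_bins[w, i] = X_i(w, t).
    Returns a vector of length n*M ordered channel by channel.
    """
    return X_bins.T.reshape(-1)

def separate_per_bin(W_bins, X_bins):
    """Apply W(omega) to each frequency bin separately; returns shape (M, n)."""
    return np.einsum("wki,wi->wk", W_bins, X_bins)
```

With these helpers, stack_separation_matrix(W_bins) @ stack_channels(X_bins) gives the same values as stack_channels(separate_per_bin(W_bins, X_bins)), so either form can be used depending on which is more convenient.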
Representations denoted by X1 to Xn and Y1 to Yn in FIGS. 2A and 2B are called spectrograms in each of which the results of the short-time Fourier transform (STFT) are arranged in a direction of the frequency bin and in a direction of the frame. The vertical direction indicates the frequency bin, and the horizontal direction indicates the frame. In the formulae [3.4] and [3.11], lower frequencies are put on the upper side. Conversely, in the spectrograms, lower frequencies are put on the lower side.
The time-frequency domain ICA further has the problem called “scaling problem”. Namely, because scales (amplitudes) of the separation results differ from one another in individual frequency bins, balance among frequencies differs from that of source signals when re-converted to waveforms, unless the scale differences are properly adjusted. “Projection back to microphones”, described below, has been proposed to solve the problem of “scaling”.
[Projection Back to Microphones]
Projecting the separation results of the ICA back to microphones means determining respective components attributable to individual source signals from the collected sound signals, through analyzing sound signals collected by the microphones each set at a certain position. The respective components attributable to the individual source signals are equal to respective signals observed by the microphones when only one sound source is active.
For example, it is assumed that one separation signal Y1 obtained as the signal separation result corresponds to a sound source 1 illustrated in FIG. 1. In that case, projecting the separation signal Y1 back to the microphones 1 to n is equivalent to estimating the signals observed by the individual microphones when only the sound source 1 is active. The signals after the projection-back include influences of, e.g., phase delays, attenuations, and reverberations (echoes) upon the source signals and hence differ from one another per microphone as a projection-back target.
In a configuration where a plurality of microphones 1 to n are set as illustrated in FIG. 1, there are plural (n) projection-back targets for one separation result. Such a signal providing a plurality of outputs for one input is called the SIMO (Single Input, Multiple Outputs) type. In the setting illustrated in FIG. 1, for example, because a number N of separation results exist corresponding to the number N of sources, there are (N×n) signals in total after the projection-back. However, when the only aim is to solve the scaling problem, it is sufficient to project the separation results back to any one microphone or to project Y1 to Yn back to the microphones 1 to n, respectively.
By projecting the separation results back to the microphone(s) as described above, signals having similar frequency scales to those of the source signals can be obtained. Adjusting the scales of the separation results in such a manner is called “rescaling”.
SIMO-type signals are also used in applications other than the rescaling. For example, Japanese Unexamined Patent Application Publication No. 2006-154314 discloses a technique for obtaining separation results with a sense of sound localization by separating signals, which are observed by each of two microphones, into two SIMO signals (two stereo signals). Japanese Unexamined Patent Application Publication No. 2006-154314 further discloses a technique for enabling the separation results to follow changes of the sound sources at intervals shorter than the update interval of the separation matrix in the ICA by applying another type of source separation, i.e., a binary mask, to the separation results provided as the stereo signals.
Methods for producing the SIMO-type separation results and projection-back results will be described below. With one method, the algorithm of the ICA is itself modified so as to directly produce the SIMO-type separation results. Such a method is called “SIMO ICA”. Japanese Unexamined Patent Application Publication No. 2006-154314 discloses that type of process.
With another method, after obtaining the ordinary separation results Y1 to Yn, the results of projection-back to the individual microphones are determined by multiplying the separation results by appropriate coefficients. Such a method is called “Projection-back SIMO”. In the following, the latter Projection-back SIMO, which is more closely related to embodiments of the present invention, will be described.
See the following references, for example, regarding general explanations of the Projection-back SIMO:
Noboru Murata and Shiro Ikeda, “An on-line algorithm for blind source separation on speech signals”, In Proceedings of 1998 International Symposium on Nonlinear Theory and its Applications (NOLTA'98), pp. 923-926, Crans-Montana, Switzerland, September 1998 (http://www.ism.ac.jp/˜shiro/papers/conferences/nolta1988.pdf), and
Murata et al., “An approach to blind source separation based on temporal structure of speech signals”, Neurocomputing, pp. 1-24, 2001 (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.8460&rep=rep1&type=pdf).
The Projection-back SIMO more closely related to embodiments of the present invention is described below.
The result of projecting a separation result Yk(ω,t) back to a microphone i is written as Yk[i](ω,t). A vector made up of Yk[1](ω,t) to Yk[n](ω,t), which are the results of projecting the separation result Yk(ω,t) back to the microphones 1 to n, can be expressed by the following formula [4.1]. The second factor on the right-hand side of the formula [4.1] is a vector produced by setting all elements of Y(ω,t), expressed by the formula [2.6], other than the k-th element to 0, and it represents the situation in which only the sound source corresponding to Yk(ω,t) is active. The inverse matrix of the separation matrix represents a spatial transfer function. Consequently, the formula [4.1] corresponds to a formula for obtaining the signals observed by the individual microphones in the situation in which only the sound source corresponding to Yk(ω,t) is active.
\begin{bmatrix} Y_k^{[1]}(\omega,t) \\ \vdots \\ Y_k^{[n]}(\omega,t) \end{bmatrix} = W(\omega)^{-1} \begin{bmatrix} 0 \\ \vdots \\ Y_k(\omega,t) \\ \vdots \\ 0 \end{bmatrix} \qquad [4.1]

= \begin{bmatrix} B_{1k}(\omega) \\ \vdots \\ B_{nk}(\omega) \end{bmatrix} Y_k(\omega,t) \qquad [4.2]

W(\omega)^{-1} = B(\omega) = \begin{bmatrix} B_{11}(\omega) & \cdots & B_{1n}(\omega) \\ \vdots & \ddots & \vdots \\ B_{n1}(\omega) & \cdots & B_{nn}(\omega) \end{bmatrix} \qquad [4.3]

\begin{bmatrix} Y_1^{[k]}(\omega,t) \\ \vdots \\ Y_n^{[k]}(\omega,t) \end{bmatrix} = \operatorname{diag}\left( B_{k1}(\omega), \dots, B_{kn}(\omega) \right) Y(\omega,t) \qquad [4.4]
The formula [4.1] can be rewritten as the formula [4.2]. In the formula [4.2], Bik(ω) represents each element of B(ω), which is the inverse matrix of the separation matrix W(ω) (see a formula [4.3]).
Also, diag(•) represents a diagonal matrix having elements in the parenthesis as diagonal elements.
On the other hand, a formula expressing the projection-back of the separation results Y1(ω,t) to Yn(ω,t) to a microphone k is given by a formula [4.4]. Thus, the projection-back can be performed by multiplying the vector Y(ω,t) representing the separation results by a coefficient matrix diag(Bk1(ω), . . . , Bkn(ω)) for the projection-back.
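The projection-back of the formulae [4.1] to [4.4] can be sketched for one frequency bin as follows. The function names and array layouts are assumptions made for this illustration, and the indices are 0-based whereas the formulae are 1-based.

```python
import numpy as np

def project_back_source(W, Y, k):
    """Formulae [4.1]/[4.2]: project separation result k back to all microphones.

    W : (n, n) separation matrix W(omega) of one frequency bin.
    Y : (n, T) separation results of that bin over T frames.
    Returns an (n, T) array whose row i is Y_k^[i] = B_ik * Y_k.
    """
    B = np.linalg.inv(W)                      # formula [4.3]
    return B[:, k][:, None] * Y[k][None, :]   # k-th column of B times Y_k

def project_back_to_mic(W, Y, k):
    """Formula [4.4]: project all separation results back to microphone k.

    Returns an (n, T) array whose row j is Y_j^[k] = B_kj * Y_j.
    """
    B = np.linalg.inv(W)
    return B[k, :][:, None] * Y               # diag(B_k1, ..., B_kn) Y
```

In this sketch, multiplying any row of W by an arbitrary non-zero constant leaves the projected signals unchanged, because the corresponding column of B is divided by the same constant. That cancellation is the reason the projection-back removes the scale ambiguity of the separation results.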
[Problems in Related-Art]
However, the above-described projection-back process in accordance with the formulae [4.1] to [4.4] is the projection-back to the microphones used in the ICA and is not adaptable for the projection-back to microphones not used in the ICA. Accordingly, there is a possibility that problems may occur when the microphones used in the ICA and the arrangement thereof are not optimum for other processes. The following two points will be discussed below as examples of the problems.
(1) Use of directional microphones
(2) Combined use with DOA (Direction of Arrival) estimation and source position estimation
(1) Use of Directional Microphones
The reason why a plurality of microphones are used in the ICA is to obtain a plurality of observation signals in which a plurality of sound sources are mixed with one another at different degrees. The larger the difference in the mixing degrees among the microphones, the more convenient it is for the separation and the learning. In other words, a larger difference in the mixing degrees among the microphones is more effective not only in increasing the ratio of an objective signal to the interference sounds that remain in the separation results without being erased (i.e., Signal-to-Interference Ratio: SIR), but also in making the learning process for obtaining the separation matrix converge in a smaller number of iterations.
A method using directional microphones has been proposed to obtain the observation signals having the larger difference in the mixing degrees. See, e.g., Japanese Unexamined Patent Application Publication No. 2007-295085. More specifically, the proposed method is intended to make the mixing degrees differ from one another by using microphones each having high (or low) sensitivity in a particular direction.
However, a problem arises when the ICA is performed on signals observed by directional microphones and the separation results are projected back to the directional microphones. In other words, because directivity of each directional microphone differs depending on frequency, there is a possibility that sounds of the separation results may be distorted (or may have frequency balance differing from that of the source signals). Such a problem will be described below with reference to FIG. 3.
FIG. 3 illustrates an exemplary configuration of a simple directional microphone 300. The directional microphone 300 includes two sound collection devices 301 and 302 which are arranged at a device interval d between them. One of signal streams observed by the sound collection devices, e.g., a stream observed by the sound collection device 302 in the illustrated example, is caused to pass through a delay processing module 303 for generating a predetermined delay (D) and a mixing gain control module 304 for applying a predetermined gain (a) to the passing signal. The delayed signals and the signals observed by the sound collection device 301 are mixed with each other in an adder 305, whereby an output signal 306 can be generated which has sensitivity differing depending on direction. With such a configuration, for example, the directional microphone 300 realizes the so-called directivity, i.e., sensitivity increased in a particular direction.
By setting the delay D=d/C (C is the sound velocity) and the mixing gain a=−1 in the configuration of the directional microphone 300 illustrated in FIG. 3, a directivity is formed so as to cancel sounds coming from the right side of the directional microphone 300 and to intensify sounds coming from the left side thereof. FIG. 4 illustrates the results of plotting the directivity (i.e., the relationship between an incoming direction and an output gain) for each of four frequencies (100 Hz, 1000 Hz, 3000 Hz, and 6000 Hz) on condition of d=0.04 [m] and C=340 [m/s]. In FIG. 4, a scale is adjusted per frequency such that output gains for sounds coming from the left side are all just 1. Also, it is assumed that sound collection devices 401 and 402 illustrated in FIG. 4 are respectively the same as the sound collection devices 301 and 302 illustrated in FIG. 3.
As illustrated in FIG. 4, the output gains are all just 1 for sounds (sounds A) incoming from the left side (front side of the directional microphone) as viewed in the direction in which the two sound collection devices 401 and 402 are arrayed at the interval, while the output gains are all just 0 for sounds (sounds B) incoming from the right side (rear side of the directional microphone) as viewed in the direction in which the two sound collection devices 401 and 402 are arrayed at the interval. In the other directions, however, the output gains differ with changes of frequency.
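The behavior described for FIG. 4 can be reproduced approximately with a minimal sketch. It assumes an idealized model that the text does not state explicitly: plane waves, two identical sound collection devices, the delay D = d/C and the gain a = −1, with the angle measured from the front (left) direction. The function name and parameters are hypothetical.

```python
import cmath
import math


def directivity_gain(freq_hz, angle_deg, d=0.04, c=340.0, a=-1.0):
    """Output gain of the modeled two-element microphone, scaled so the front is 1.

    angle_deg = 0 is sound from the front (left side in FIG. 4),
    angle_deg = 180 is sound from the rear (right side).
    """
    delay = d / c  # delay D applied to the rear element (assumed D = d/C)

    def response(theta_rad):
        # A plane wave reaches the rear element (d/c)*cos(theta) later than the
        # front element; the delay module then adds D on top of that lag.
        lag = (d / c) * math.cos(theta_rad) + delay
        return abs(1.0 + a * cmath.exp(-2j * math.pi * freq_hz * lag))

    # Note: the front response itself vanishes at some frequencies (4250 Hz
    # for the default d and c), where this normalization is undefined.
    return response(math.radians(angle_deg)) / response(0.0)


# Gains at the four plotted frequencies for a few incoming directions.
for freq in (100, 1000, 3000, 6000):
    print(freq, [round(directivity_gain(freq, ang), 3) for ang in (0, 60, 120, 180)])
```

Under this model the front and rear gains are fixed at 1 and 0 for every frequency, while the gain in an oblique direction changes with frequency, which is the source of the frequency imbalance discussed here.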
Further, when the sound wavelength corresponding to a frequency is shorter than twice the device interval d (i.e., at a frequency of 4250 [Hz] or higher on condition of d=0.04 [m] and C=340 [m/s]), a phenomenon called “spatial aliasing” occurs. As a result, a direction in which sensitivity is low is additionally formed other than the right side. Looking at the plot of the directivity at 6000 Hz in FIG. 4, for example, the output gain also becomes 0 for a sound incoming from an oblique direction, such as that denoted by “SOUNDS C”. Thus, an observation region where a sound of a particular frequency is not detected is generated in addition to the particular direction.
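The 4250 Hz threshold follows directly from the stated values of d and C. The second and third lines below are only a sketch: they assume an idealized model of the FIG. 3 configuration (plane waves, identical sound collection devices, D = d/C, a = −1), with θ measured from the front (left) direction and k an integer. The resulting angle is a model value, not a value read from FIG. 4.

```latex
\lambda = \frac{C}{f} = 2d
\quad\text{at}\quad
f = \frac{C}{2d} = \frac{340}{2 \times 0.04} = 4250\ \mathrm{Hz}

\left| 1 - e^{-j 2\pi f \frac{d}{C}\,(1 + \cos\theta)} \right| = 0
\iff
f\,\frac{d}{C}\,(1 + \cos\theta) = k \qquad (k = 0, 1, 2, \dots)

f = 6000\ \mathrm{Hz},\ k = 1:\qquad
\cos\theta = \frac{340}{6000 \times 0.04} - 1 \approx 0.417,
\qquad \theta \approx 65^{\circ}
```

In this model k = 0 gives the rear null (θ = 180°) at every frequency, and k = 1 first yields a valid direction at 4250 Hz, which matches the onset of spatial aliasing stated above.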
The presence of a null beam in the rightward direction in FIG. 4 causes the following problem. In the case of obtaining the observation signals by using a plurality of directional microphones each illustrated in FIG. 3 (namely, two sound collection devices being regarded as one microphone), separating the observation signals with the ICA, and projecting the separation results back to the directional microphones, the projection-back results become substantially null for the separation result corresponding to the sound source (sounds B) present on the right side of the directional microphone.
Further, a large difference in gain in the direction of the sounds C depending on frequency causes the following problem. When the separation result corresponding to the sounds C is projected back to the directional microphone illustrated in FIG. 4, signals are produced such that a component of 3000 Hz is intensified in comparison with components of 100 Hz and 1000 Hz, while a component of 6000 Hz is suppressed.
With the method described in Japanese Unexamined Patent Application Publication No. 2007-295085, the problem of distortion in frequency components is avoided by radially arranging microphones each having directivity in the forward direction, and by selecting in advance the microphone that is oriented closest to the direction toward each sound source. In order to simultaneously minimize the influence of the distortion and obtain observation signals that differ greatly in the mixing degree, however, microphones each having a sharp directivity in the forward direction must be installed in as many directions as possible.
(2) Combined Use with DOA (Direction of Arrival) Estimation and Source Position Estimation
The DOA (Direction of Arrival) estimation is to estimate from which direction sounds arrive at each microphone. Also, specifying the position of each sound source in addition to the DOA is called “source position estimation”. The DOA estimation and the source position estimation have in common with the ICA that they use a plurality of microphones. However, the microphone arrangement optimum for those estimations is not always the same as that optimum for the ICA. For that reason, a dilemma may occur in the microphone arrangement in a system aiming to perform both the source separation and the DOA estimation (or the source position estimation).
The following description covers methods for executing the DOA estimation and the source position estimation, and then the problems that occur when those estimations are combined with the ICA.
A method of estimating the DOA after projecting the separation result of the ICA back to individual microphones will be described with reference to FIG. 5. This method is the same as a method described in Japanese Patent No. 3881367.
Consider an environment in which two microphones 502 and 503 are installed at an interval (distance) d between them. It is assumed that a separation result Yk(ω,t) 501, illustrated in FIG. 5, represents the separation result for one sound source, which has been obtained by executing a separation process on mixture signals from a plurality of sound sources. The results of projecting the separation result Yk(ω,t) 501 back to the microphone i (denoted by 502) and the microphone i′ (denoted by 503) illustrated in FIG. 5 are assumed to be Yk[i](ω,t) and Yk[i′](ω,t), respectively. When the distance between the sound source and each microphone is much larger than the distance dii′ between the microphones, sound waves can be regarded as plane waves, and the difference between the distance from the sound source Yk(ω,t) to the microphone i and the distance from the same source to the microphone i′ can be expressed by dii′ cos θkii′. That distance difference is the path difference 505 illustrated in FIG. 5. Note that θkii′ represents the DOA, namely it is an angle 504 formed by a segment interconnecting both the microphones and a segment extending from the sound source to a midpoint between the two microphones.
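As a brief sketch of how this path difference turns into a phase difference between the two projection-back results, the steps below convert it to a time delay and then to a phase at one frequency bin. Two points are assumptions not stated in the text: F is taken to be the sampling frequency, and the bins ω = 1, ..., M are taken to span 0 to F/2 uniformly. The symbols τ and f_ω are introduced here for illustration only.

```latex
\tau = \frac{d_{ii'} \cos\theta_{kii'}}{C}
\qquad \text{(time delay caused by the path difference)}

f_{\omega} = \frac{\omega - 1}{M - 1} \cdot \frac{F}{2}
\qquad \text{(assumed center frequency of bin } \omega \text{)}

2\pi f_{\omega}\,\tau
  = \pi\,\frac{\omega - 1}{M - 1}\,\frac{d_{ii'} \cos\theta_{kii'}}{C}\,F
\qquad \text{(phase difference at bin } \omega \text{)}
```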
The DOA θkii′ can be determined by obtaining the phase difference between Yk[i](ω,t) and Yk[i′](ω,t) which are the projection-back results. The relationship between Yk[i](ω,t) and Yk[i′](ω,t), i.e., the projection-back results, is expressed by the following formula [5.1]. Formulae for calculating the phase difference are expressed by the following formulae [5.2] and [5.3].
$$Y_k^{[i']}(\omega,t) = \exp\!\left(-j\pi\,\frac{\omega-1}{M-1}\,\frac{d_{ii'}\cos\theta_{kii'}}{C}\,F\right) Y_k^{[i]}(\omega,t) \tag{5.1}$$

t: frame number
ω: frequency bin index
M: total number of frequency bins
j: imaginary unit

$$\mathrm{angle}\!\left(\frac{Y_k^{[i]}(\omega,t)}{Y_k^{[i']}(\omega,t)}\right) = \mathrm{angle}\!\left(Y_k^{[i]}(\omega,t)\,\overline{Y_k^{[i']}(\omega,t)}\right) = \pi\,\frac{\omega-1}{M-1}\,\frac{d_{ii'}\cos\theta_{kii'}}{C}\,F \tag{5.2}$$

$$\theta_{kii'}(\omega) = \mathrm{acos}\!\left(\frac{(M-1)\,C}{\pi(\omega-1)\,d_{ii'}\,F}\,\mathrm{angle}\!\left(Y_k^{[i]}(\omega,t)\,\overline{Y_k^{[i']}(\omega,t)}\right)\right) \tag{5.3}$$

$$\phantom{\theta_{kii'}(\omega)} = \mathrm{acos}\!\left(\frac{(M-1)\,C}{\pi(\omega-1)\,d_{ii'}\,F}\,\mathrm{angle}\!\left(B_{ik}(\omega)\,\overline{B_{i'k}(\omega)}\right)\right) \tag{5.4}$$
In the above formulae, angle( ) represents the phase of a complex number, and acos( ) represents the inverse function of cos( ).
As long as the projection-back is performed by using the above-described formula [4.1], the phase difference is given by a value not depending on the frame number t, but depending on only the separation matrix W(ω). Therefore, the formula for calculating the phase difference can be expressed by a formula [5.4].
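The calculation of formula [5.3] for a single frequency bin can be sketched as follows. This is an illustration only: the function name and arguments are hypothetical, F is assumed to be the sampling frequency (the text does not define it), and the clamping step is an added safeguard that is not part of the formula.

```python
import cmath
import math


def doa_from_phase(y_i, y_i2, omega, num_bins, d, fs, c=340.0):
    """Evaluate formula [5.3] for one frequency bin and return degrees.

    y_i, y_i2 : complex projection-back results for microphones i and i'
    omega     : 1-based frequency bin index (omega = 1 is DC)
    num_bins  : total number of frequency bins M
    d         : microphone interval d_ii' in metres
    fs        : ASSUMED meaning of F (sampling frequency in Hz)
    """
    if omega <= 1:
        raise ValueError("the DC bin carries no phase difference")
    # angle(Y[i] * conj(Y[i'])) from formula [5.2]
    phase = cmath.phase(y_i * y_i2.conjugate())
    scale = (num_bins - 1) * c / (math.pi * (omega - 1) * d * fs)
    # Clamp so that noise cannot push the argument outside the domain of acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, scale * phase))))


# Synthetic check: build Y[i'] from Y[i] with formula [5.1], then recover the DOA.
M, F, D_MIC, C_SOUND, OMEGA = 257, 16000.0, 0.04, 340.0, 65
true_deg = 60.0
phi = (math.pi * (OMEGA - 1) / (M - 1)
       * D_MIC * math.cos(math.radians(true_deg)) / C_SOUND * F)
y_i = 0.8 + 0.3j
y_i2 = cmath.exp(-1j * phi) * y_i
print(doa_from_phase(y_i, y_i2, OMEGA, M, D_MIC, F))
```

Because angle( ) is only defined within ±π, the sketch is meaningful only while the true phase difference stays inside that range, i.e., below the spatial aliasing frequency for the given microphone interval.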
On the other hand, Japanese Patent Application No. 2008-153483, which has been previously filed by the same applicant as in this application, describes a method of calculating the DOA without using an inverse matrix. A covariance matrix Σxy(ω) between the observation signals X(ω,t) and the separation results Y(ω,t) has properties analogous to those of the inverse of the separation matrix, i.e., W(ω)^{-1}, in terms of calculating the DOA. Accordingly, by calculating the covariance matrix Σxy(ω) as expressed in the following formula [6.1] or [6.2], the DOA θkii′ can be calculated based on the following formula [6.4]. In the formula [6.4], σik(ω) represents each component of Σxy(ω) as seen from a formula [6.3]. By using the formula [6.4], calculations of the inverse matrix are no longer necessary. Further, in a system running in real time, the DOA can be updated at a shorter interval (frame by frame at minimum) than in the case using the separation matrix based on the ICA.
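$$\Sigma_{XY}(\omega) = \left\langle X(\omega,t)\,Y(\omega,t)^{H} \right\rangle_{t} \tag{6.1}$$

$$\phantom{\Sigma_{XY}(\omega)} = \left\langle X(\omega,t)\,X(\omega,t)^{H} \right\rangle_{t}\,W(\omega)^{H} \tag{6.2}$$

$$\Sigma_{XY}(\omega) = \begin{bmatrix} \sigma_{11}(\omega) & \cdots & \sigma_{1n}(\omega) \\ \vdots & \ddots & \vdots \\ \sigma_{n1}(\omega) & \cdots & \sigma_{nn}(\omega) \end{bmatrix} \tag{6.3}$$

$$\theta_{kii'}(\omega) = \mathrm{acos}\!\left(\frac{(M-1)\,C}{\pi(\omega-1)\,d_{ii'}\,F}\,\mathrm{angle}\!\left(\sigma_{ik}(\omega)\,\overline{\sigma_{i'k}(\omega)}\right)\right) \tag{6.4}$$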
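A minimal sketch of the covariance-based calculation is given below for one frequency bin. It is not the implementation of the cited application: the function names are hypothetical, F is assumed to be the sampling frequency, indices are 0-based in the code (1-based in the formulae), and the synthetic data assume perfectly separated, mutually uncorrelated sources.

```python
import cmath
import math


def cross_covariance(x_frames, y_frames):
    """Sigma_XY = <X(t) Y(t)^H>_t for one frequency bin (formula [6.1])."""
    rows, cols = len(x_frames[0]), len(y_frames[0])
    sigma = [[0j] * cols for _ in range(rows)]
    for x, y in zip(x_frames, y_frames):
        for i in range(rows):
            for k in range(cols):
                sigma[i][k] += x[i] * y[k].conjugate()
    return [[value / len(x_frames) for value in row] for row in sigma]


def doa_from_covariance(sigma, i, i2, k, omega, num_bins, d, fs, c=340.0):
    """Evaluate formula [6.4] (degrees); fs is the ASSUMED meaning of F."""
    phase = cmath.phase(sigma[i][k] * sigma[i2][k].conjugate())
    scale = (num_bins - 1) * c / (math.pi * (omega - 1) * d * fs)
    # Clamping is an added safeguard, not part of the formula.
    return math.degrees(math.acos(max(-1.0, min(1.0, scale * phase))))


# Synthetic example: two uncorrelated sources observed by two microphones.
M, F, D_MIC, C_SOUND, OMEGA = 257, 16000.0, 0.04, 340.0, 65


def steering_phase(deg):
    return (math.pi * (OMEGA - 1) / (M - 1)
            * D_MIC * math.cos(math.radians(deg)) / C_SOUND * F)


source_doas = [60.0, 120.0]
# Frames chosen so that the two source sequences are exactly orthogonal.
y_frames = [[1 + 0j, 2j], [1 + 0j, -2j], [1 + 0j, 2j], [1 + 0j, -2j]]
x_frames = [
    [s[0] + s[1],
     s[0] * cmath.exp(-1j * steering_phase(source_doas[0]))
     + s[1] * cmath.exp(-1j * steering_phase(source_doas[1]))]
    for s in y_frames
]
sigma_xy = cross_covariance(x_frames, y_frames)
for k in range(2):
    print(k, doa_from_covariance(sigma_xy, 0, 1, k, OMEGA, M, D_MIC, F))
```

No matrix inversion appears anywhere in the sketch; only a running average over frames is needed, which is why the estimate can in principle be refreshed frame by frame.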
A method of estimating the source position from the DOA will be described below. Basically, once the DOA is determined for each of plural microphone pairs, the source position is also determined based on the principle of triangulation. See Japanese Unexamined Patent Application Publication No. 2005-49153, for example, regarding the source position estimation based on the principle of triangulation. The source position estimation will be described in brief below with reference to FIG. 6.
Microphones 602 and 603 are the same as the microphones 502 and 503 in FIG. 5. It is assumed that the DOA θkii′ is already determined for each microphone pair 604 (including 602 and 603). Considering a cone 605 having an apex that is positioned at a midpoint between the microphones 602 and 603 and having an apical angle half of which is equal to θkii′, the sound source exists somewhere on the surface of the cone 605. The source position can be estimated by obtaining respective cones 605 to 607 for the microphone pairs in a similar manner, and by determining a point of intersection of those cones (or a point where the surfaces of those cones come closest to one another). The foregoing is the method of estimating the source position based on the principle of triangulation.
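The idea of finding the point where the cone surfaces come closest to one another can be sketched as follows. The sketch is deliberately simplified and is not the method of the cited publication: it works in two dimensions (so each cone reduces to a pair of rays), uses a brute-force grid search, and all names and the microphone layout are hypothetical.

```python
import math


def cone_mismatch(point, pairs):
    """Sum of squared differences between measured and implied cos(DOA).

    Each entry of pairs is (midpoint, unit axis vector, measured DOA in degrees),
    where the DOA is the angle between the pair axis and the direction from
    the pair midpoint to the source.
    """
    total = 0.0
    for (mx, my), (ux, uy), theta_deg in pairs:
        dx, dy = point[0] - mx, point[1] - my
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            return math.inf  # the direction is undefined at the midpoint itself
        cos_implied = (dx * ux + dy * uy) / dist
        total += (cos_implied - math.cos(math.radians(theta_deg))) ** 2
    return total


def locate_source(pairs, x_range, y_range, step=0.05):
    """Return the grid point that best agrees with all measured DOAs."""
    nx = int(round((x_range[1] - x_range[0]) / step))
    ny = int(round((y_range[1] - y_range[0]) / step))
    best_point, best_cost = None, math.inf
    for ix in range(nx + 1):
        for iy in range(ny + 1):
            candidate = (x_range[0] + ix * step, y_range[0] + iy * step)
            cost = cone_mismatch(candidate, pairs)
            if cost < best_cost:
                best_point, best_cost = candidate, cost
    return best_point


# Hypothetical layout: three microphone pairs and a source at (1.0, 1.5) metres.
true_source = (1.0, 1.5)
layout = [((0.0, 0.0), (1.0, 0.0)), ((2.0, 0.0), (0.0, 1.0)), ((0.0, 2.0), (1.0, 0.0))]
pairs = []
for (mx, my), (ux, uy) in layout:
    dx, dy = true_source[0] - mx, true_source[1] - my
    doa = math.degrees(math.acos((dx * ux + dy * uy) / math.hypot(dx, dy)))
    pairs.append(((mx, my), (ux, uy), doa))
print(locate_source(pairs, (0.0, 3.0), (0.0, 3.0)))
```

With a single pair every point on the two rays has zero mismatch, so the position stays ambiguous; the additional pairs are what make the minimum a single point.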
Problems with the microphone arrangement in both the ICA and the DOA estimation (or the source position estimation) will be described below. The problems primarily reside in the following three points.
a) Number of microphones
b) Interval between microphones
c) Microphone changing in its position
a) Number of Microphones
Comparing the computational cost of the DOA estimation or the source position estimation with that of the ICA, the latter is much higher. Also, because the computational cost of the ICA is proportional to the square of the number n of microphones, the number of microphones may be restricted in some cases in view of an upper limit of the computational cost. As a result, the number of microphones necessary for the source position estimation, in particular, is not available in some cases. When the number of microphones is 2, for example, it is possible to separate two sound sources at most, and to estimate that each sound source exists on the surface of a particular cone. However, it is difficult to specify the source position.
b) Interval Between Microphones
To estimate the source position with high accuracy in the source position estimation, it is desired that the microphone pairs are positioned away from each other, for example, by a distance on substantially the same order as the distance between the sound source and the microphones. Conversely, the two microphones constituting each microphone pair are desirably positioned so close to each other that a plane-wave assumption is satisfied.
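The plane-wave requirement within one pair can be illustrated by comparing the exact path difference with the approximation d cos θ. The following is a sketch under an assumed two-dimensional geometry; the function name and the example distances are hypothetical.

```python
import math


def path_difference_error(source_dist, d, theta_deg):
    """Exact path difference minus the plane-wave value d*cos(theta), in metres.

    The two microphones sit at (-d/2, 0) and (+d/2, 0); theta is the angle
    between the microphone axis and the direction from the midpoint to the
    source, and source_dist is measured from the midpoint.
    """
    theta = math.radians(theta_deg)
    sx, sy = source_dist * math.cos(theta), source_dist * math.sin(theta)
    far = math.hypot(sx + d / 2.0, sy)   # distance to the microphone at -d/2
    near = math.hypot(sx - d / 2.0, sy)  # distance to the microphone at +d/2
    return (far - near) - d * math.cos(theta)


# The approximation error shrinks as the source moves away from the pair.
for dist in (0.1, 0.5, 2.0):
    print(dist, path_difference_error(dist, 0.04, 60.0))
```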
In the ICA, however, using two microphones away from each other may be disadvantageous in some cases from the viewpoint of separation accuracy. Such a point will be described below.
Separation based on the ICA in the time-frequency domain is usually realized by forming a null beam (direction in which the gain becomes 0) in each of directions of interference sounds. In the environment of FIG. 1, for example, the separation matrix for separating and extracting the sound source 1 is obtained by forming the null beams in the directions toward the sources 2 to N, which are generating the interference sounds, so that signals in the direction toward the sound source 1, i.e., objective sounds, remain eventually.
At most n−1 null beams (n: the number of microphones) can be formed at lower frequencies. At frequencies above C/(2d) (C: sound speed, and d: interval between the microphones), however, null beams are further formed in directions other than the predetermined ones due to a phenomenon called “spatial aliasing”. Looking at the directivity plot of 6000 Hz in FIG. 4, for example, null beams are formed in oblique directions, such as that indicated by the sounds C, in addition to the direction of the sounds (indicated by B) incoming from the right side in the direction in which the sound collection devices are arrayed at the interval in FIG. 4 (i.e., incoming from the rear side of the directional microphone). A similar phenomenon occurs in the separation matrix as well. As the distance d between the microphones increases, the spatial aliasing starts to occur at a lower frequency. Further, at a higher frequency, plural null beams are formed in directions other than the predetermined one. If any of those additional null-beam directions coincides with the direction of the objective sounds, separation accuracy deteriorates.
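The dependence on the microphone interval can be sketched with a simple stand-in for one row of the separation matrix. The model below is an assumption made for illustration: a two-microphone null beamformer under plane waves whose response is proportional to 1 − exp(j2πf·d·(cos θ0 − cos θ)/C), where θ0 is the intended null direction. Function names are hypothetical.

```python
import math


def spatial_aliasing_onset_hz(d, c=340.0):
    """Frequency C/(2d) above which additional null beams can appear."""
    return c / (2.0 * d)


def null_beam_directions_deg(freq_hz, d, steer_deg, c=340.0):
    """All directions (degrees) nulled by the modeled two-microphone beamformer.

    The response vanishes whenever f*d*(cos(steer) - cos(theta))/c is an
    integer k; k = 0 is the intended null, other k are caused by aliasing.
    """
    cos_steer = math.cos(math.radians(steer_deg))
    span = freq_hz * d / c
    k_min = math.ceil(span * (cos_steer - 1.0) - 1e-12)
    k_max = math.floor(span * (cos_steer + 1.0) + 1e-12)
    directions = []
    for k in range(k_min, k_max + 1):
        cos_theta = max(-1.0, min(1.0, cos_steer - k / span))
        directions.append(math.degrees(math.acos(cos_theta)))
    return sorted(directions)


# A wider interval lowers the onset frequency and adds more unintended nulls.
for interval in (0.04, 0.10, 0.20):
    nulls = null_beam_directions_deg(6000.0, interval, 180.0)
    print(interval, spatial_aliasing_onset_hz(interval), len(nulls))
```

In this model an interval of 0.04 m at 6000 Hz yields one unintended null besides the intended one, and widening the interval increases the count, so the chance that one of them falls on the objective sound grows with d.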
Accordingly, the interval and the arrangement of the microphones used in the ICA are to be determined depending on a level of frequency up to which the separation is to be performed with high accuracy. In other words, the interval and the arrangement of the microphones used in the ICA may be contradictory to the arrangement of the microphones, which is necessary to ensure satisfactory accuracy in the source position estimation.
c) Microphone Changing in its Position
In the DOA estimation and the source position estimation, it is necessary that at least information regarding the relative positional relationship between the microphones is already known. In the source position estimation, absolute coordinates of each microphone are further necessary in addition to the relative position of the sound source with respect to the microphone when absolute coordinates of the sound source with respect to the fixed origin (e.g., the origin set at one corner of a room) are also estimated.
On the other hand, in the separation performed in the ICA, position information of the microphones is not necessary. (Although separation accuracy varies depending on the microphone arrangement, the position information of the microphones is not included in the formulae used for the separation and the learning.) Therefore, the microphones used in the ICA may not be usable in the DOA estimation and the source position estimation in some cases. Assume, for example, the case where the functions of the source separation and the source position estimation are incorporated in a TV set to extract a user's utterance and to estimate its position. In that case, when the source position is to be expressed by using a coordinate system with a certain point of a TV housing (e.g., the screen center) being the origin, it is necessary that the coordinates of each of the microphones used in the source position estimation are known with respect to the origin. For example, if each microphone is fixed to the TV housing, the position of the microphone is known.
Meanwhile, from the viewpoint of source separation, an observation signal easier to separate is obtained by setting a microphone as close as possible to the user. Therefore, it is desired in some cases that the microphone is installed on a remote controller, for example, instead of the TV housing. However, when an absolute position of the microphone on the remote controller is not obtained, a difficulty occurs in determining the source position based on the separation result obtained from the microphone on the remote controller.
As described above, when the ICA (Independent Component Analysis) is performed as the source separation process in the related art, the ICA is sometimes performed in a setting that uses a plurality of directional microphones in the microphone arrangement optimum for the ICA.
However, when the separation results obtained as processing results utilizing directional microphones are projected back to the directional microphones, the problem of distortion of the sounds provided by the separation results occurs because the directivity of each directional microphone differs depending on frequency, as described above with reference to FIG. 4.
Further, the microphone arrangement optimum for the ICA is the optimum arrangement for the source separation, but it may be inappropriate for the DOA estimation and the source position estimation in some cases. Accordingly, when the ICA and the DOA estimation or the source position estimation are performed in a combined manner, processing accuracy may deteriorate in either the source separation process or the DOA estimation or source position estimation process.