1. Field of the Invention
The present invention relates to a sound-source separation system.
2. Description of the Related Art
In order to realize natural human-robot interaction, it is indispensable to allow a user to speak while the robot is speaking (barge-in). When a microphone is attached to the robot, however, the robot's own speech also enters the microphone, so during barge-in the self-speech becomes a major impediment to recognizing the other party's speech.
Therefore, an adaptive filter having the structure shown in FIG. 4 is used. Removal of self-speech is treated as a problem of estimating a filter $\hat{\mathbf{h}}$ that approximates the transmission system $\mathbf{h}$ from a loudspeaker S to a microphone M. An estimated signal $\hat{y}(k)$ is subtracted from the observed signal $y(k)$ input from the microphone M to extract the other party's speech.
The NLMS (Normalized Least Mean Squares) method has been proposed as one such adaptive filter. According to the NLMS method, the signal $y(k)$ observed in the time domain through a linear time-invariant transmission system is expressed by Equation (1) as the convolution of an original signal vector $\mathbf{x}(k) = {}^{t}(x(k), x(k-1), \ldots, x(k-N+1))$ (where $N$ is the filter length and $t$ denotes transpose) with the impulse response $\mathbf{h} = {}^{t}(h_1, h_2, \ldots, h_N)$ of the transmission system.
$$y(k) = {}^{t}\mathbf{x}(k)\,\mathbf{h} \tag{1}$$
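The estimated filter $\hat{\mathbf{h}} = {}^{t}(\hat{h}_1, \hat{h}_2, \ldots, \hat{h}_N)$ is obtained by minimizing the mean square of the error $e(k)$ between the observed signal and the estimated signal, expressed by Equation (2). An online algorithm for determining the estimated filter $\hat{\mathbf{h}}$ is expressed by Equation (3), where $\mu_{NLMS}$ is the learning coefficient and $\delta$ is a small positive value for regularization. Note that the LMS (Least Mean Squares) method corresponds to the case in which the learning coefficient in Equation (3) is not normalized by $\|\mathbf{x}(k)\|^{2} + \delta$.
$$e(k) = y(k) - {}^{t}\mathbf{x}(k)\,\hat{\mathbf{h}} \tag{2}$$
$$\hat{\mathbf{h}}(k) = \hat{\mathbf{h}}(k-1) + \mu_{NLMS}\,\frac{\mathbf{x}(k)\,e(k)}{\|\mathbf{x}(k)\|^{2} + \delta} \tag{3}$$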
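As an illustration only, the online update of Equations (2) and (3) can be sketched in Python with NumPy as follows. The function name nlms_filter, its parameter names, and the default values of mu and delta are assumptions made for this sketch and are not taken from the methods described here.

```python
import numpy as np

def nlms_filter(x, y, N, mu=0.5, delta=1e-6):
    """Sketch of the NLMS update: x is the known original signal,
    y is the observed signal, N is the filter length."""
    h_hat = np.zeros(N)        # estimated filter h^
    x_vec = np.zeros(N)        # (x(k), x(k-1), ..., x(k-N+1))
    e = np.zeros(len(y))       # error signal e(k)
    for k in range(len(y)):
        # Shift the newest sample into the front of the signal vector.
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x[k]
        # Equation (2): error between observed and estimated signal.
        e[k] = y[k] - x_vec @ h_hat
        # Equation (3): update normalized by ||x(k)||^2 + delta.
        h_hat = h_hat + mu * x_vec * e[k] / (x_vec @ x_vec + delta)
    return h_hat, e
```

In this sketch the returned e plays the role of the extracted signal, that is, the observed signal with the estimated self-speech removed. Dropping the division by the normalization term would give the plain LMS update.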
An ICA (Independent Component Analysis) method has also been proposed. Since the ICA method is designed to assume noise, it has the advantage that detection of noise in a self-speech section is unnecessary and noise is separable even if it exists. Therefore, the ICA method is suitable for addressing the barge-in problem. For example, a time-domain ICA method has been proposed (see J. Yang et al., “A New Adaptive Filter Algorithm for System Identification Using Independent Component Analysis,” Proc. ICASSP2007, 2007, pp. 1341-1344). A mixing process of sound sources is expressed by Equation (4) using noise $n(k)$ and an $(N+1)$th-order square matrix $A$:
$${}^{t}(y(k), {}^{t}\mathbf{x}(k)) = A\,{}^{t}(n(k), {}^{t}\mathbf{x}(k)), \quad A_{ii} = 1\ (i = 1, \ldots, N+1), \quad A_{1j} = h_{j-1}\ (j = 2, \ldots, N+1), \quad A_{ik} = 0\ (k \neq i) \tag{4}$$
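According to the ICA method, the unmixing matrix $W$ in Equation (5) is estimated:
$${}^{t}(e(k), {}^{t}\mathbf{x}(k)) = W\,{}^{t}(y(k), {}^{t}\mathbf{x}(k)), \quad W_{11} = a, \quad W_{ii} = 1\ (i = 2, \ldots, N+1), \quad W_{1j} = h_{j}\ (j = 2, \ldots, N+1), \quad W_{ik} = 0\ (k \neq i) \tag{5}$$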
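The case in which the element $W_{11}$ in the first row and first column of the unmixing matrix $W$ is fixed at $a = 1$ corresponds to the conventional adaptive filter model, and this is the largest difference between the conventional model and the ICA method. Kullback-Leibler (K-L) information is minimized using a natural gradient method to obtain the optimum separation filter according to Equations (6) and (7), which represent the online algorithm.
$$\hat{\mathbf{h}}(k+1) = \hat{\mathbf{h}}(k) + \mu_{1}\left[\{1 - \varphi(e(k))\,e(k)\}\,\hat{\mathbf{h}}(k) - \varphi(e(k))\,\mathbf{x}(k)\right] \tag{6}$$
$$a(k+1) = a(k) + \mu_{2}\left[1 - \varphi(e(k))\,e(k)\right]a(k) \tag{7}$$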
The function $\varphi$ is defined by Equation (8) using the density function $p_x(x)$ of the random variable $e$.
$$\varphi(x) = -\frac{d}{dx}\log p_x(x) \tag{8}$$
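A minimal sketch of one way the online updates of Equations (6) and (7) might be coded is given below. It assumes the nonlinearity is the hyperbolic tangent, which is what Equation (8) yields for a density proportional to 1/cosh(x); that choice, the function names ica_step and ica_filter, the initial values, and the step sizes are all assumptions of this sketch rather than details taken from the cited paper. The error e(k) is formed from the first row of the unmixing matrix exactly as Equation (5) is written.

```python
import numpy as np

def ica_step(h_hat, a, x_vec, y_k, mu1=1e-3, mu2=1e-3, phi=np.tanh):
    """One online update of the time-domain ICA filter (sketch)."""
    # First row of Equation (5): e(k) = a*y(k) + sum_j h_j x_j.
    e_k = a * y_k + h_hat @ x_vec
    g = 1.0 - phi(e_k) * e_k
    # Equation (6): update of the separation filter.
    h_next = h_hat + mu1 * (g * h_hat - phi(e_k) * x_vec)
    # Equation (7): update of the scaling element a = W11.
    a_next = a + mu2 * g * a
    return h_next, a_next, e_k

def ica_filter(x, y, N, **kwargs):
    """Run ica_step over a known signal x and an observed signal y."""
    h_hat, a = np.zeros(N), 1.0     # assumed initial values
    x_vec = np.zeros(N)             # (x(k), x(k-1), ..., x(k-N+1))
    e = np.zeros(len(y))
    for k in range(len(y)):
        x_vec = np.roll(x_vec, 1)
        x_vec[0] = x[k]
        h_hat, a, e[k] = ica_step(h_hat, a, x_vec, y[k], **kwargs)
    return h_hat, a, e
```

Under the sign convention of Equation (5) as written, perfect separation in this sketch would mean that the filter equals the negative of a times the transmission system, so an estimate of the transmission system would be read off as the negated filter divided by a. This is an observation about the sketch, not a statement from the cited paper.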
Further, a frequency-domain ICA method has been proposed (see S. Miyabe et al., “Double-Talk Free Spoken Dialogue Interface Combining Sound Field Control with Semi-Blind Source Separation,” Proc. ICASSP2006, 2006, pp. 809-812). In general, since a convolutive mixture can be treated as an instantaneous mixture in the frequency domain, the frequency-domain ICA method has better convergence than the time-domain ICA method. According to this method, short-time Fourier analysis is performed with window length $T$ and shift length $U$ to obtain signals in the time-frequency domain. The original signal $x(t)$ and the observed signal $y(t)$ are represented as $X(\omega,f)$ and $Y(\omega,f)$, respectively, using frame $f$ and frequency $\omega$ as parameters. A separation process of the observed signal vector $\mathbf{Y}(\omega,f) = {}^{t}(Y(\omega,f), X(\omega,f))$ is expressed by Equation (9) using an estimated original signal vector $\hat{\mathbf{Y}}(\omega,f) = {}^{t}(E(\omega,f), X(\omega,f))$.
$$\hat{\mathbf{Y}}(\omega,f) = W(\omega)\,\mathbf{Y}(\omega,f), \quad W_{21}(\omega) = 0, \quad W_{22}(\omega) = 1 \tag{9}$$
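For illustration, the short-time Fourier analysis with window length T and shift length U might be sketched as follows. The Hann window and the function name stft are assumptions of this sketch; the window shape is not specified above.

```python
import numpy as np

def stft(signal, T, U):
    """Short-time Fourier analysis with window length T and shift U.
    Returns an array indexed as [frequency bin, frame]."""
    window = np.hanning(T)                      # assumed window shape
    n_frames = 1 + (len(signal) - T) // U       # frames that fit fully
    frames = np.stack([signal[f * U : f * U + T] * window
                       for f in range(n_frames)])
    # One FFT per frame; transpose so rows are frequencies, columns frames.
    return np.fft.rfft(frames, axis=1).T
```

Applying the same function to the original signal and to the observed signal would give the two time-frequency representations that are stacked, per frequency, into the observed signal vector of Equation (9).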
The learning of the unmixing matrix is accomplished independently for each frequency. The learning complies with an iterative learning rule expressed by Equation (10) based on minimization of K-L information with a nonholonomic constraint (see Sawada et al., “Polar Coordinate based Nonlinear Function for Frequency-Domain Blind Source Separation,” IEICE Trans., Fundamentals, Vol. E-86A, No. 3, March 2003, pp. 590-595).
$$W^{(j+1)}(\omega) = W^{(j)}(\omega) - \alpha\,\{\mathrm{off\text{-}diag}\,\langle \varphi(\hat{\mathbf{Y}})\,\hat{\mathbf{Y}}^{H} \rangle\}\,W^{(j)}(\omega) \tag{10}$$
where $\alpha$ is the learning coefficient, $j$ is the number of updates, $\langle\cdot\rangle$ denotes an average value, the operation $\mathrm{off\text{-}diag}\,X$ replaces each diagonal element of matrix $X$ with zero, and the nonlinear function $\varphi(y)$ is defined by Equation (11).
$$\varphi(y_i) = \tanh(|y_i|)\,\exp(i\,\theta(y_i)) \tag{11}$$
Since the transfer characteristic from existing sound source to existing sound source is represented by a constant, only the elements in the first row of the unmixing matrix W are updated.
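The iterative rule of Equations (10) and (11), restricted to the first row of the unmixing matrix, can be sketched for a single frequency as follows. The sketch assumes that θ(y) is the phase (argument) of the complex value y and that the average is taken over frames; the function names phi, off_diag, and update_unmixing and the default learning coefficient are hypothetical.

```python
import numpy as np

def phi(y):
    """Equation (11): tanh of the magnitude, with the phase kept."""
    return np.tanh(np.abs(y)) * np.exp(1j * np.angle(y))

def off_diag(m):
    """Replace each diagonal element of m with zero."""
    return m - np.diag(np.diag(m))

def update_unmixing(W, Y, X, alpha=0.1):
    """One update of the 2x2 unmixing matrix W for one frequency.
    Y and X hold the observed and original signals over frames."""
    obs = np.vstack([Y, X])                 # observed vector per frame
    est = W @ obs                           # Equation (9): rows E and X
    # Average of phi(Y^) Y^H over frames.
    corr = (phi(est) @ est.conj().T) / obs.shape[1]
    W_full = W - alpha * off_diag(corr) @ W     # Equation (10)
    W_next = W.copy()
    W_next[0, :] = W_full[0, :]             # only the first row is updated
    return W_next
```

A caller would start from a complex identity matrix, for example np.eye(2, dtype=complex), and repeat update_unmixing for each frequency independently until the first row stops changing.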
However, the conventional frequency-domain ICA method has the following problems. The first problem is that the window length T must be made longer to cope with reverberation, which results in processing delay and degraded separation performance. The second problem is that the window length T must be changed depending on the environment, which makes it complicated to combine the method with other noise suppression techniques.
Therefore, it is an object of the present invention to provide a system capable of reducing the influence of sound reverberation or reflection to improve the accuracy of sound source separation.