1. Field of the Invention
The present invention relates to an audio signal separation device and a method thereof, which separate plural signals mixed in an audio signal from one another by independent component analysis (ICA).
2. Description of the Related Art
In the field of signal processing, attention has been paid to independent component analysis, a method by which plural original signals that have been linearly mixed with unknown coefficients are separated and restored. If this independent component analysis is applied to audio signals, for example, voices spoken simultaneously by plural speakers can be observed by plural microphones, and the observed voices can then be separated speaker by speaker, or into noise and voices.
Referring to FIG. 1, a description will now be made of a case of separating respective signals from an audio signal in which plural signals are mixed, by use of the independent component analysis in a time-frequency domain. The independent component analysis in a time-frequency domain is a method in which signals observed by plural microphones are transformed into signals in a time-frequency domain (spectrograms) by short-time Fourier transformation, and separation is conducted in the time-frequency domain (see Non-Patent Document 1: “Guide/independent Component Analysis” written by Noboru Murata, Tokyo Denki University Press).
Suppose that there are n original signals s1 to sn which are generated by n sound sources and are independent from one another, and that s is a vector with these signals as elements thereof. Each observation signal observed by a microphone is a mixture of the plural original signals. Suppose that x1 to xn are signals observed by n microphones and x is a vector with these observation signals as elements thereof. FIG. 2A shows an example of an observation signal x where the number n of microphones is two, i.e., the number of channels is two. Next, short-time Fourier transformation is performed on the observation signal x to obtain an observation signal X in a time-frequency domain. The elements Xk(ω, t) of X are complex numbers. A graph expressing the absolute values |Xk(ω, t)| of Xk(ω, t) by color shading is called a spectrogram. FIG. 2B shows an example of the spectrogram of the observation signal X. In this figure, t indicates the frame number (1≦t≦T), and ω indicates the frequency bin number (1≦ω≦M). Subsequently, each frequency bin of the signal X is multiplied by a separation matrix W(ω) to obtain a separate signal Y′. FIG. 2C shows an example of a spectrogram of a separate signal Y′.
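By way of illustration only, the transformation and per-bin multiplication described above may be sketched as follows. This is a minimal sketch under stated assumptions and not the implementation of any particular device: the function names stft and separate_per_bin, the frame length, the hop size, and the Hann window are hypothetical choices; the toy mixture is instantaneous (a single mixing matrix A); and the separation matrix W(ω) is simply taken as the inverse of A rather than being learned by independent component analysis.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time Fourier transform of a multichannel signal.

    x: real array of shape (n_channels, n_samples).
    Returns X of shape (n_channels, M, T), with M frequency bins and T frames.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (x.shape[1] - frame_len) // hop
    frames = np.stack(
        [x[:, t * hop : t * hop + frame_len] * window for t in range(n_frames)],
        axis=-1,
    )  # shape (n_channels, frame_len, T)
    return np.fft.rfft(frames, axis=1)

def separate_per_bin(X, W):
    """Compute Y'(w, t) = W(w) X(w, t) independently for every frequency bin.

    X: (n, M, T) observation spectrograms; W: (M, n, n) separation matrices.
    """
    return np.einsum("wij,jwt->iwt", W, X)

# Toy two-channel example (all values are assumptions for illustration).
rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 4096))          # original signals s1, s2
A = np.array([[1.0, 0.6], [0.4, 1.0]])   # assumed instantaneous mixing matrix
x = A @ s                                # observation signals x1, x2

X = stft(x)                              # observation signal in time-frequency domain
M = X.shape[1]
W = np.broadcast_to(np.linalg.inv(A), (M, 2, 2))  # W(w), assumed known here
Y_prime = separate_per_bin(X, W)         # separate signal Y'
spectrogram = np.abs(Y_prime)            # |Y'k(w, t)|, the quantity shown by shading
```

In this sketch every bin uses the same W(ω) because the toy mixture does not depend on frequency; in an actual reverberant recording each bin would in general require its own matrix.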
According to the independent component analysis in a time-frequency domain as described above, signal separation processing is performed for each frequency bin. No consideration is given to the relationship among the frequency bins. Therefore, separation destinations are often inconsistent even when the separation itself is completed successfully. The inconsistent separation destinations appear, for example, as a phenomenon that a signal caused by s1 appears as Y1 where ω=1 while a signal caused by s2 appears as Y1 where ω=2. This phenomenon is also called permutation.
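The permutation phenomenon may be reproduced with a small numerical example. The following is a hedged sketch only: the source spectrograms S, the per-bin mixing matrices A, and the array sizes are hypothetical values invented for illustration. The rows of the separation matrix are exchanged in the second frequency bin, which leaves the separation within that bin equally valid but sends the signals to the opposite output channels.

```python
import numpy as np

rng = np.random.default_rng(1)
n, M, T = 2, 4, 50  # channels, frequency bins, frames (illustrative sizes)

# Hypothetical source spectrograms S(w, t) with complex-valued elements.
S = rng.normal(size=(n, M, T)) + 1j * rng.normal(size=(n, M, T))

# Hypothetical mixing matrix A(w) for each bin, kept close to the identity
# so that every A(w) is safely invertible.
A = np.eye(n) + 0.3 * (rng.normal(size=(M, n, n)) + 1j * rng.normal(size=(M, n, n)))
X = np.einsum("wij,jwt->iwt", A, S)  # observation X(w, t) = A(w) S(w, t)

# Ideal separation matrices W(w) = A(w)^-1, computed bin by bin.
W = np.linalg.inv(A)

# Exchange the two rows of W in the second bin (array index 1). Each bin is
# processed in isolation, so nothing ties the row order of one bin to that of
# another; both orders separate the bin equally well.
W_perm = W.copy()
W_perm[1] = W[1][::-1]

Y = np.einsum("wij,jwt->iwt", W_perm, X)

# First bin: output channel 1 carries source 1.
first_bin_consistent = np.allclose(Y[0, 0], S[0, 0])
# Second bin: output channel 1 carries source 2 instead.
second_bin_swapped = np.allclose(Y[0, 1], S[1, 1])
```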
The problem of this permutation is solved by postprocessing in which signals are exchanged with one another for each frequency bin, so that the separation destinations are rearranged consistently. FIG. 2D shows an example of a spectrogram of a separate signal Y in which the problem of permutation has been solved. Finally, the separate signal Y is subjected to inverse Fourier transformation, to obtain a separate signal Y in time domain as shown in FIG. 2E.
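The description above does not state the criterion by which the signals are exchanged. One commonly used heuristic, shown here purely as an assumed example and not as the method of the cited document, is to compare amplitude envelopes across frequency bins and to select, for each bin, the channel order whose envelopes correlate best with a running reference. The function name align_permutations and the synthetic test data are hypothetical, and the inverse Fourier transformation is not covered by this sketch.

```python
import itertools
import numpy as np

def align_permutations(Y_prime):
    """Reorder the channels in each frequency bin so that they are consistent.

    Y_prime: complex array of shape (n, M, T) holding the per-bin separation
    results. Returns an array of the same shape with channels rearranged.
    Assumed heuristic: maximize envelope correlation with a running reference.
    """
    n, M, T = Y_prime.shape
    Y = Y_prime.copy()
    ref = np.abs(Y[:, 0, :])  # the first bin defines the channel order
    for w in range(1, M):
        env = np.abs(Y[:, w, :])
        best_perm, best_score = None, -np.inf
        for perm in itertools.permutations(range(n)):
            score = sum(
                np.corrcoef(ref[i], env[p])[0, 1] for i, p in enumerate(perm)
            )
            if score > best_score:
                best_perm, best_score = perm, score
        Y[:, w, :] = Y[list(best_perm), w, :]
        ref = ref + np.abs(Y[:, w, :])  # accumulate the aligned envelopes
    return Y

# Synthetic check (hypothetical data): source 1 is loud in the first half of
# the frames and source 2 in the second half, in every frequency bin.
rng = np.random.default_rng(2)
n, M, T = 2, 16, 200
t = np.arange(T)
envelope = np.stack([0.1 + (t < T // 2), 0.1 + (t >= T // 2)])  # shape (n, T)
S = envelope[:, None, :] * (
    rng.normal(size=(n, M, T)) + 1j * rng.normal(size=(n, M, T))
)

# Simulate permutation: exchange the two channels in every odd-indexed bin.
Y_prime = S.copy()
Y_prime[:, 1::2, :] = S[::-1, 1::2, :]

Y = align_permutations(Y_prime)
```

The exhaustive search over channel orders grows factorially with the number of channels, so this sketch is practical only for a small number of channels such as the two-channel case of FIG. 2A.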