1. Field of the Invention
The present invention relates to a method for recovering target speech by extracting estimated spectra of the target speech, while resolving permutation ambiguity based on shapes of amplitude distributions of split spectra that are obtained by use of the Independent Component Analysis (ICA).
2. Description of the Related Art
A number of methods have been proposed for separating noise from a speech signal by blind signal separation using the ICA. (See, for example, “Adaptive Blind Signal and Image Processing” by A. Cichocki and S. Amari, first edition, USA, John Wiley, 2002; and “Independent Component Analysis: Algorithms and Applications” by A. Hyvarinen and E. Oja, Neural Networks, USA, Pergamon Press, June 2000, Vol. 13, No. 4-5, pp. 411-430.) The frequency-domain ICA has the advantage of better convergence than the time-domain ICA. However, in the frequency-domain ICA, the scaling and permutation ambiguities inherent to the ICA arise independently at each frequency bin of the separated signals, and all of them must be resolved in the frequency domain.
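The scaling and permutation ambiguity at a single frequency bin can be illustrated with a minimal numerical sketch. The mixing matrix `A`, the synthetic sources `s`, and the particular permutation `P` and scaling `D` below are assumptions chosen purely for illustration and do not come from the cited references. The sketch shows that if `W` separates the mixture, then `P @ D @ W` also yields mutually independent outputs, only reordered and rescaled, so an independence criterion alone cannot tell the two apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent complex-valued "sources" at one frequency bin (synthetic data).
s = rng.laplace(size=(2, 1000)) + 1j * rng.laplace(size=(2, 1000))

# Hypothetical mixing matrix for this bin, and the observed mixture.
A = np.array([[1.0, 0.6], [0.4, 1.0]], dtype=complex)
x = A @ s

# An ideal separating matrix recovers the sources exactly.
W = np.linalg.inv(A)
y = W @ x

# A permuted and rescaled separating matrix is an equally valid ICA solution.
P = np.array([[0, 1], [1, 0]], dtype=complex)   # swaps the two outputs
D = np.diag([2.0, 0.5 * np.exp(0.3j)])          # arbitrary complex gains
W_alt = P @ D @ W
y_alt = W_alt @ x

# y_alt[0] is a scaled copy of source 1, y_alt[1] a scaled copy of source 0.
print(np.allclose(y_alt[0], 0.5 * np.exp(0.3j) * s[1]))
print(np.allclose(y_alt[1], 2.0 * s[0]))
```

Because a different `P` and `D` may arise at every frequency bin, the separated outputs cannot simply be concatenated across bins without first resolving both ambiguities.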
Examples addressing the above issues include a method wherein the scaling problems are resolved by use of split spectra and the permutation problems are resolved by analyzing the envelope curve of a split spectrum series at each frequency. This is referred to as the envelope method. (See, for example, “An Approach to Blind Source Separation based on Temporal Structure of Speech Signals” by N. Murata, S. Ikeda, and A. Ziehe, Neurocomputing, USA, Elsevier, October 2001, Vol. 41, No. 1-4, pp. 1-24.)
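The general idea of resolving permutation by comparing envelopes can be sketched as follows. This is a simplified illustration for two sources and is not the procedure of the cited reference: the function names `align_permutation_by_envelope` and `_corr`, the array layout, and the use of a running reference envelope seeded with the first bin are all assumptions made for this example. At each frequency bin, the amplitude envelopes of the two outputs are correlated with the reference envelopes in both orderings, and the outputs are swapped when the swapped ordering correlates better.

```python
import numpy as np


def _corr(a, b):
    """Normalized correlation of two real sequences (0.0 if either is constant)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def align_permutation_by_envelope(Y):
    """Align the output order across frequency bins for two sources.

    Y: complex array of shape (n_bins, 2, n_frames) holding the separated
       spectra per bin (hypothetical layout for this sketch).
    Returns (aligned copy of Y, boolean array marking the bins that were swapped).
    """
    n_bins, n_src, _ = Y.shape
    assert n_src == 2, "this sketch handles the two-source case only"
    env = np.abs(Y)                      # amplitude envelope per bin and output
    aligned = Y.copy()
    swapped = np.zeros(n_bins, dtype=bool)
    ref = env[0].copy()                  # reference envelopes, seeded with bin 0
    for f in range(1, n_bins):
        e = env[f]
        keep = _corr(e[0], ref[0]) + _corr(e[1], ref[1])
        swap = _corr(e[1], ref[0]) + _corr(e[0], ref[1])
        if swap > keep:
            aligned[f] = Y[f, ::-1]      # exchange the two outputs at this bin
            e = e[::-1]
            swapped[f] = True
        ref += e                         # accumulate the aligned envelopes
    return aligned, swapped
```

A sketch of this kind relies on the envelopes of the two sources being clearly distinguishable at every bin, and it only makes the ordering consistent across bins; it does not indicate which output is the speech and which is the noise.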
However, the envelope method is often ineffective, depending on the sound collection conditions. Also, the correspondence between the separated signals and the sound sources (speech and noise) is ambiguous in this method; therefore, it is difficult to identify which of the resultant split spectra after permutation correction corresponds to the target speech and which to the noise. For this reason, specific judgment criteria need to be defined in order to extract the estimated spectra for the target speech as well as for the noise from the split spectra.