1. Field of the Invention
The present invention relates to a speech recognition system and a target speech extraction method of the system, and more particularly, to a microphone-array-based speech recognition system using a blind source separation (BSS) scheme and a target speech extraction method in the system.
This work was supported by the IT R&D program of MIC/IITA [2006-S-036-02, Development of large vocabulary/interactive distributed/embedded VUI for new growth engine industries].
2. Description of the Related Art
Current speech recognition technology can recognize tens of thousands of words at a recognition rate of 95% or more in a quiet environment. However, in an actual environment where various types of noise exist, the recognition rate decreases drastically. Therefore, for commercialization of speech recognition technology, a higher recognition rate must be obtained even in an actual environment.
Various noise processing methods for the pre-processing, recognition, and post-processing stages of speech recognition have been researched and developed. However, no method that handles all kinds of noise has yet been developed.
In addition, microphone-array-based blind source separation (hereinafter referred to as BSS) methods, which separate speech signals by using two or more microphones, have been actively researched. An important method among the BSS methods is independent component analysis (hereinafter referred to as ICA). With the ICA technology, interference signals or noise originating from a neighboring speaker, a TV, or a radio near a speech input apparatus such as a speech recognition unit, a telephone, or a mobile phone can be effectively reduced or removed. That is, in a case where N sound sources including the input speech and M microphones exist, if the numbers M and N are approximately equal to each other, the N sound-source signals can be recovered from the M microphone input signals.
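As an illustration only, the recovery of N sound-source signals from M microphone signals can be sketched for the smallest case, M = N = 2. The sketch below is not the method of the present invention. It assumes synthetic non-Gaussian sources, an instantaneous mixing matrix chosen arbitrarily, and a simple kurtosis-based contrast searched over rotation angles in place of a production ICA algorithm. All function and variable names are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center x (channels x samples) and decorrelate it to roughly unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    # Rows of the whitening matrix are eigenvectors scaled by 1/sqrt(eigenvalue).
    return (eigvecs / np.sqrt(eigvals)).T @ x

def excess_kurtosis(y):
    """Excess kurtosis of each row of y (rows assumed zero-mean, unit-variance)."""
    return (y ** 4).mean(axis=1) - 3.0

def separate_two_sources(x, n_angles=360):
    """Toy ICA for M = N = 2: whiten, then keep the rotation whose
    outputs are the most non-Gaussian (largest summed squared kurtosis)."""
    z = whiten(x)
    best_y, best_score = None, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles, endpoint=False):
        c, s = np.cos(theta), np.sin(theta)
        y = np.array([[c, s], [-s, c]]) @ z
        score = np.sum(excess_kurtosis(y) ** 2)
        if score > best_score:
            best_y, best_score = y, score
    return best_y

rng = np.random.default_rng(0)
n = 20000
sources = np.vstack([
    rng.uniform(-1, 1, n),  # hypothetical stand-in for the target speech
    rng.laplace(0, 1, n),   # hypothetical stand-in for an interfering source
])
mixing = np.array([[1.0, 0.6], [0.4, 1.0]])  # assumed; unknown to the separator
mics = mixing @ sources

recovered = separate_two_sources(mics)

# Absolute correlation between each recovered channel and each true source.
corr = np.abs(np.corrcoef(np.vstack([recovered, sources]))[:2, 2:])
```

Each row of `corr` should be dominated by a single source, although which source ends up in which output channel, and with which sign and scale, is not controlled by the separation itself.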
However, the ICA technology has a problem in that the order in which the N separated sound-source signals are output is arbitrary.
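The ordering problem can be illustrated with a small numerical sketch. It assumes three synthetic sources and a fixed, invertible mixing matrix, both hypothetical. If an unmixing matrix separates the signals, then any row permutation of that matrix separates them equally well, so a criterion based only on statistical independence cannot prefer one output order over another.

```python
import numpy as np

rng = np.random.default_rng(1)
sources = rng.uniform(-1, 1, size=(3, 1000))  # N = 3 hypothetical sources

# Assumed mixing matrix (strictly diagonally dominant, hence invertible).
mixing = np.array([[1.0, 0.5, 0.2],
                   [0.3, 1.0, 0.4],
                   [0.2, 0.6, 1.0]])
mics = mixing @ sources

unmixing = np.linalg.inv(mixing)   # one valid separating matrix
perm = np.eye(3)[[2, 0, 1]]        # permutation matrix that reorders rows

out_a = unmixing @ mics            # sources in their original order
out_b = (perm @ unmixing) @ mics   # the same sources, in a different order

# Both outputs contain exactly the same separated signals.
same_signals = np.allclose(out_b, out_a[[2, 0, 1]])
```

Here `out_a` and `out_b` are both complete separations; only the channel index of each source differs, which is why the target speech cannot be located by its output position alone.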
In the conventional ICA technology, the mixed signals are generated by multiplying the sound-source signals by arbitrary weighting factors in the time domain and adding the weighted sound-source signals, and speech recognition is performed by extracting the original sound-source signals from the mixed signals. More recently, owing to the development of the ICA technology, the original sound-source signals can be extracted even in a case where reverberation actually exists in a room.
However, even in the recent ICA technology, no method has been developed for automatically identifying the source from which each separated sound-source signal originates. Therefore, in order to perform the speech recognition, the target speech to be input to the speech recognition system needs to be identified automatically.
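The two mixing situations described above can be sketched as follows. The first function is the weighted-sum (instantaneous) model. The second models reverberation as convolutive mixing, in which each source reaches each microphone through a short impulse response. The filter taps, weights, and function names are hypothetical values chosen for illustration and are not taken from the present invention.

```python
import numpy as np

def mix_instantaneous(sources, weights):
    """Weighted sum per microphone: x_m[t] = sum_n weights[m, n] * s_n[t]."""
    return weights @ sources

def mix_convolutive(sources, filters):
    """Reverberant mixing: filters[m][n] is the (hypothetical) impulse
    response from source n to microphone m; outputs are truncated to
    the input length."""
    n_samples = sources.shape[1]
    mics = np.zeros((len(filters), n_samples))
    for m in range(len(filters)):
        for n, s in enumerate(sources):
            mics[m] += np.convolve(s, filters[m][n])[:n_samples]
    return mics

rng = np.random.default_rng(2)
sources = rng.uniform(-1, 1, size=(2, 8))
weights = np.array([[1.0, 0.5], [0.3, 1.0]])  # assumed weighting factors

x_inst = mix_instantaneous(sources, weights)

# One-tap filters reduce the convolutive model to the instantaneous one.
one_tap = [[np.array([weights[m, n]]) for n in range(2)] for m in range(2)]
x_conv = mix_convolutive(sources, one_tap)

# Adding a single echo (delay of 2 samples, gain 0.4) to every path.
echo = [[np.array([weights[m, n], 0.0, 0.4 * weights[m, n]])
         for n in range(2)] for m in range(2)]
x_echo = mix_convolutive(sources, echo)
```

With one-tap filters the two models coincide. With the echo filters, each microphone signal additionally contains a delayed, attenuated copy of the instantaneous mixture, which is the simplest form of the reverberant case.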