The present invention relates to a noise suppression processing apparatus for suppressing noise and extracting target speech using a plurality of microphones.
Since there are various noise sources in noisy environments, it is difficult to avoid noise which gets mixed from surrounding noise sources upon receiving a speech signal by a microphone. However, when a speech signal mixed with noise is reproduced, the speech becomes hard to discern. Therefore, a processing for reducing noise components is required.
As a conventional noise reduction processing technique for suppressing noise mixed in speech, a technique for suppressing noise using a plurality of microphones is known. Such microphone processing techniques have been studied and developed by many researchers for the purpose of speech input in a speech recognition apparatus, teleconference apparatus, and the like. Of these techniques, for a microphone array using an adaptive beam former processing technique, which can obtain great effects with a smaller number of microphones, various methods such as a generalized sidelobe canceller (GSC), Frost-type beam former, reference signal method, and the like are available, as described in reference 1 (The Institute of Electronics, Information and Communication Engineers (ed.), “Acoustic System and Digital Processing”) or reference 2 (Haykin, “Adaptive Filter Theory” (Prentice Hall)).
Note that the adaptive beam former processing suppresses noise by a filter which makes a dead angle with the arrival direction of noise.
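The dead angle described above can be illustrated for a single frequency and two microphones. The following is a minimal sketch, not the adaptive filter of the cited references: it sets the weights analytically instead of adapting them, and the microphone spacing, sound speed, far-field plane-wave model, and all function names are assumptions made for illustration.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s (assumed)
MIC_SPACING = 0.1    # m between the two microphones (assumed)


def steering_vector(angle_deg, freq_hz):
    """Relative phases at the two microphones for a plane wave from angle_deg."""
    delay = MIC_SPACING * math.sin(math.radians(angle_deg)) / SOUND_SPEED
    return [1.0 + 0j, cmath.exp(-2j * math.pi * freq_hz * delay)]


def null_weights(null_deg, freq_hz):
    """Weights w such that w[0]*x[0] + w[1]*x[1] is zero for a wave from null_deg."""
    a = steering_vector(null_deg, freq_hz)
    return [1.0 + 0j, -1.0 / a[1]]


def response(weights, angle_deg, freq_hz):
    """Magnitude of the filter output for a unit plane wave from angle_deg."""
    a = steering_vector(angle_deg, freq_hz)
    return abs(weights[0] * a[0] + weights[1] * a[1])


w = null_weights(40.0, 1000.0)        # place the dead angle at 40 degrees
print(response(w, 40.0, 1000.0))      # noise direction: essentially zero
print(response(w, 0.0, 1000.0))       # frontal direction: clearly non-zero
```

An adaptive beam former arrives at weights of this kind by minimizing output power rather than by being told the noise direction in advance.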
However, in this adaptive beam former processing technique, if the arrival direction of an actual target signal does not coincide with the assumed arrival direction, that target signal is determined as noise and removed, thus deteriorating performance.
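Why a direction mismatch causes the target to be treated as noise can be seen in a small numerical sketch. It assumes a two-microphone arrangement in which the second channel is phase-aligned to the assumed direction and subtracted from the first, so that a target from exactly the assumed direction cancels; whatever remains is what an adaptive filter would regard as noise. The geometry, constants, and function names are illustrative assumptions, not taken from the references.

```python
import cmath
import math

SOUND_SPEED = 340.0  # m/s (assumed)
MIC_SPACING = 0.1    # m (assumed)


def inter_mic_delay(angle_deg):
    """Arrival-time difference between the microphones for a far-field source."""
    return MIC_SPACING * math.sin(math.radians(angle_deg)) / SOUND_SPEED


def target_leakage(assumed_deg, actual_deg, freq_hz):
    """Residual target magnitude after aligning to assumed_deg and subtracting."""
    x1 = 1.0 + 0j
    x2 = cmath.exp(-2j * math.pi * freq_hz * inter_mic_delay(actual_deg))
    # Undo the delay that a source at the assumed direction would have.
    aligned_x2 = x2 * cmath.exp(2j * math.pi * freq_hz * inter_mic_delay(assumed_deg))
    return abs(x1 - aligned_x2)


# Assumed direction 0 degrees; the actual speaker drifts away from it.
for actual in (0.0, 5.0, 10.0):
    print(actual, round(target_leakage(0.0, actual, 1000.0), 3))
```

The residual is zero only when the assumed and actual directions coincide and grows as the speaker moves away, which is the component an adaptive canceller would then remove from the output.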
To solve this problem, a technique which allows certain offset between the assumed and actual arrival directions has been developed, as disclosed in reference 3 (Hojuzan et al., “Robust Global Sidelobe Canceller using Leak Adaptive Filter in Blocking Matrix”, Journal of The Institute of Electronics, Information and Communication Engineers A, Vol. J79-A, No. 9, pp. 1516 to 1524 (1996. 9)). However, in this case, removal of a target signal can be suppressed, but the target signal may be distorted due to the offset between the assumed and actual arrival directions.
By contrast, a method of tracking the direction of a speaker and reducing distortion of a target signal by detecting the speaker direction as needed and correcting the input direction of a beam former in the detected direction using a plurality of beam formers has been disclosed in, e.g., Jpn. Pat. Appln. KOKAI Publication No. 9-9794.
However, since the method disclosed in Jpn. Pat. Appln. KOKAI Publication No. 9-9794 executes adaptive filter processing in the time domain, the filter coefficients in the time domain must be converted into those in the frequency domain upon estimating the speaker direction on the basis of the filter coefficients, resulting in a large computation amount.
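The extra conversion step can be made concrete with a sketch. It assumes one plausible way of estimating a direction from filter coefficients, namely scanning candidate angles and picking the one with the lowest summed sensitivity; the cited publication may proceed differently. The filter taps, sampling rate, geometry, and function names are hypothetical. Note that `dft` must run on the time-domain taps before any scan is possible, whereas a beam former operating in the frequency domain would already hold per-bin weights.

```python
import cmath
import math

SOUND_SPEED = 340.0    # m/s (assumed)
MIC_SPACING = 0.1      # m (assumed)
SAMPLE_RATE = 16000.0  # Hz (assumed)


def dft(coeffs):
    """Naive DFT of FIR taps: the time-to-frequency conversion step."""
    n = len(coeffs)
    return [sum(c * cmath.exp(-2j * math.pi * k * i / n)
                for i, c in enumerate(coeffs))
            for k in range(n)]


def sensitivity(weights_per_bin, angle_deg, n_taps):
    """Squared response to a plane wave from angle_deg, summed over bins 1..n/2-1."""
    delay = MIC_SPACING * math.sin(math.radians(angle_deg)) / SOUND_SPEED
    total = 0.0
    for k in range(1, n_taps // 2):
        freq = k * SAMPLE_RATE / n_taps
        w1, w2 = weights_per_bin[0][k], weights_per_bin[1][k]
        total += abs(w1 + w2 * cmath.exp(-2j * math.pi * freq * delay)) ** 2
    return total


def estimate_null_direction(h1, h2):
    """Scan -90..90 degrees in 1-degree steps for the least sensitive direction."""
    weights = [dft(h1), dft(h2)]  # needed only because the taps are time-domain
    return min(range(-90, 91), key=lambda a: sensitivity(weights, a, len(h1)))


# Hypothetical 16-tap filters implementing y[n] = x1[n-2] - x2[n],
# which cancels a wave reaching microphone 2 two samples after microphone 1.
h1 = [0.0, 0.0, 1.0] + [0.0] * 13
h2 = [-1.0] + [0.0] * 15
print(estimate_null_direction(h1, h2))
```

With these assumed values a two-sample delay corresponds to sin θ = 0.425, so the scan returns 25, the grid point nearest to about 25.15 degrees.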
As a technique for suppressing noise mixed in speech, an adaptive beam former processing technique is known which receives speech or an utterance of a speaker using a plurality of microphones, and suppresses noise components by filtering the received speech using a filter which makes a dead angle with the arrival direction of noise.
In the adaptive beam former processing technique, when the arrival direction of an actual target signal, i.e., the direction where a speaker is present, is different from the assumed arrival direction, the target signal is determined as noise and is removed.
To solve this problem, a technique which allows certain offset between the assumed and actual arrival directions has been developed. However, in this case, removal of a target signal can be suppressed, but the target signal may be distorted due to the offset between the assumed and actual arrival directions. Hence, a problem which pertains to the quality of the obtained speech remains unsolved.
Also, a method of tracking the direction of a speaker and reducing distortion of a target signal by sequentially detecting the speaker direction and correcting the input direction of a beam former to make it coincide with the detected direction using a plurality of beam formers has been proposed. However, since this method executes adaptive filter processing in the time domain, the filter coefficients in the time domain must be converted into those in the frequency domain upon estimating the speaker direction on the basis of the filter coefficients, resulting in a large computation amount.
Therefore, the conventional techniques have both merits and demerits, and there has been a demand for a beam former processing technique which can collect a high-quality target signal while shortening the processing time.
It is an object of the present invention to provide a noise suppression processing apparatus and method which can greatly reduce the computation amount using a beam former which operates in the frequency domain.
According to the first aspect of the present invention, there is provided a noise suppression apparatus for independently outputting speech frequency components and noise frequency components, comprising a speech input section which receives speech uttered by a speaker at different positions and generates speech signals corresponding to the different positions, a frequency analyzer section which frequency-analyzes the speech signals in units of channels of the speech signals to output frequency components for a plurality of channels, a first beam former processor section which suppresses arrival noise other than a target speech by adaptive filtering using the frequency components for the plurality of channels to output the target speech, a second beam former processor section which suppresses the target speech by adaptive filtering using the frequency components for the plurality of channels to output noise, a noise direction estimating section which estimates a noise direction from filter coefficients calculated by the first beam former processor section, a target speech direction estimating section which estimates a target speech direction from filter coefficients calculated by the second beam former processor section, a target speech direction correcting section which corrects a first input direction as an arrival direction of the target speech to be input in the first beam former processor section on the basis of the target speech direction estimated by the target speech direction estimating section, as needed, and a noise direction correcting section which corrects a second input direction as an arrival direction of noise to be input in the second beam former processor section on the basis of the noise direction estimated by the noise direction estimating section, as needed.
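The cross-coupling between the two beam former processor sections in this aspect can be pictured with a toy data-flow sketch. It is not an implementation of the apparatus: the adaptive filtering and the direction estimation are replaced by a single idealized stand-in, `nulled_direction`, which simply assumes that each beam former ends up with its dead angle on the source farthest from its own input direction. All names, the initial input directions, and the frame data are hypothetical.

```python
def nulled_direction(look_deg, source_dirs):
    """Idealized stand-in for one beam former plus its direction estimator.

    Returns the source direction farthest from the input (look) direction,
    i.e. the direction on which an ideal adaptive filter would place its null.
    """
    return max(source_dirs, key=lambda s: abs(s - look_deg))


def track(frames, target_look=0.0, noise_look=-90.0):
    """Run the cross-coupled correction over (target_dir, noise_dir) frames."""
    history = []
    for target_dir, noise_dir in frames:
        sources = [target_dir, noise_dir]
        # First beam former passes target_look, so its null marks the noise.
        noise_est = nulled_direction(target_look, sources)
        # Second beam former passes noise_look, so its null marks the target.
        target_est = nulled_direction(noise_look, sources)
        # Each estimate corrects the input direction of the other beam former.
        target_look, noise_look = target_est, noise_est
        history.append((target_look, noise_look))
    return history


# Hypothetical scenario: the speaker drifts from 12 to 25 degrees.
frames = [(12.0, -50.0), (12.0, -50.0), (20.0, -50.0), (25.0, -40.0)]
for looks in track(frames):
    print(looks)
```

In this idealized setting both input directions follow the sources frame by frame; a real adaptive filter would converge more gradually and the estimates would be noisy.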
According to the second aspect of the present invention, there is provided a noise suppression apparatus for independently outputting speech frequency components and noise frequency components, comprising a speech input section which receives speech uttered by a speaker at least at two different positions and generates speech signals corresponding to the speech receiving positions in units of channels, a frequency analyzer section which frequency-analyzes the speech signals and outputs frequency components for a plurality of channels, a first beam former processor section which executes arrival noise suppression processing for suppressing speech components other than speech from a speaker direction to obtain a target speech component, the noise suppression processing being performed by adaptive filtering of the frequency components for the plurality of channels obtained by the frequency analyzer section, using filter coefficients which are calculated to decrease sensitivity levels in directions other than a desired direction, a second beam former processor section which executes first speech suppression processing for suppressing the speech from the speaker direction to obtain a first noise component, the first speech suppression processing being performed by adaptive filtering of the frequency components for the plurality of channels obtained by the frequency analyzer section, using filter coefficients which are calculated to decrease sensitivity levels in directions other than a desired direction, a third beam former processor section which executes second speech suppression processing for suppressing the speech from the speaker direction to obtain a second noise component, the second speech suppression processing being performed by adaptive filtering of the frequency components for the plurality of channels obtained by the frequency analyzer section, using filter coefficients which are calculated to decrease sensitivity levels in directions other than a desired direction, a noise direction estimating section which estimates a noise direction from the filter coefficients calculated by the first beam former processor section, a first target speech direction estimating section which estimates a first target speech direction from the filter coefficients calculated by the second beam former processor section, a second target speech direction estimating section which estimates a second target speech direction from the filter coefficients calculated by the third beam former processor section, a first input direction correcting section which corrects a first input direction as an arrival direction of target speech to be input in the first beam former processor section on the basis of at least one of the first target speech direction estimated by the first target speech direction estimating section and the second target speech direction estimated by the second target speech direction estimating section, as needed, a second input direction correcting section which, when the noise direction estimated by the noise direction estimating section falls within a predetermined first range, corrects a second input direction as an arrival direction of noise to be input in the second beam former processor section on the basis of the noise direction, as needed, a third input direction correcting section which, when the noise direction estimated by the noise direction estimating section falls within a predetermined second range, corrects a second input direction as an arrival direction of noise to be input in the third beam former processor section on the basis of the noise direction, as needed, and an effective noise determination section which determines one of the first and second output noise components as true noise output components on the basis of whether the noise direction estimated by the noise direction estimating section falls within the predetermined first or second range and outputs the determined output noise component, and at the same time, determines which estimation result of the first and second speech direction estimating sections is effective and outputs the determined speech direction estimation result to the first input direction correcting section.
According to the third aspect of the present invention, there is provided a noise suppression method for independently outputting speech frequency components and noise frequency components, as needed, comprising the steps of receiving speech uttered by a speaker at different positions to obtain speech signals of different channels, frequency-analyzing the speech signals in units of channels to obtain frequency spectrum components in units of channels, suppressing arrival noise other than a target speech by adaptive filtering using the frequency spectrum components in units of channels obtained in the frequency analyzing step, to output the target speech, suppressing the target speech by adaptive filtering using the frequency components in units of channels to obtain noise components, estimating a noise direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing arrival noise, estimating a target speech direction from filter coefficients used in adaptive filtering and calculated in the step of suppressing the target speech, correcting a first input direction as an arrival direction of the target speech to be input in the step of suppressing arrival noise on the basis of the target speech direction estimated in the step of estimating a target speech direction, as needed, and correcting a second input direction as an arrival direction of noise to be input in the step of suppressing the target speech on the basis of the noise direction estimated by the step of estimating a noise direction, as needed.
According to the fourth aspect of the present invention, there is provided a noise suppression method comprising the steps of receiving speech uttered by a speaker at different positions to obtain speech signals of different channels, frequency-analyzing speech signals in units of channels to obtain frequency spectrum components in units of channels, executing arrival noise suppression processing for suppressing speech components other than speech from a speaker direction to obtain target speech components, the arrival noise suppression processing being performed by adaptive filtering of the frequency spectrum components for the plurality of channels obtained in units of channels in the frequency analyzing step, using filter coefficients which are calculated to decrease sensitivity levels in directions other than a desired direction, executing first speech suppression processing for suppressing the speech from the speaker direction to obtain first noise components, the first speech suppression processing being performed by adaptive filtering of the frequency components for the plurality of channels using the frequency components obtained in units of channels in the frequency analyzing step, using filter coefficients which are calculated to decrease sensitivity levels in directions other than a desired direction, executing second speech suppression processing for suppressing the speech from the speaker direction to obtain second noise components, the second speech suppression processing being performed by adaptive filtering of the frequency spectrum components for the plurality of channels obtained in units of channels in the frequency analyzing step, using filter coefficients which are calculated to decrease sensitivity levels in directions other than a desired direction, estimating a noise direction from the filter coefficients calculated in the step of executing arrival noise suppression processing, estimating a first target speech direction from the filter coefficients calculated in the step of executing first speech suppression processing, estimating a second target speech direction from the filter coefficients calculated in the step of executing second speech suppression processing, correcting a first input direction as an arrival direction of target speech to be input in the step of executing arrival noise suppression processing on the basis of at least one of the first target speech direction and the second target speech direction, as needed, correcting a second input direction as an arrival direction of noise to be input in the step of executing first speech suppression processing on the basis of the noise direction estimated in the noise direction estimating step, as needed, when the noise direction falls within a predetermined first range, correcting a second input direction as an arrival direction of noise to be input in the step of executing second speech suppression processing on the basis of the noise direction, as needed, when the noise direction falls within a predetermined second range, and determining one of the first and second output noise components as true noise output components on the basis of whether the noise direction estimated in the noise direction estimating step falls within the predetermined first or second range and outputting the determined output noise component, and at the same time, determining which estimation result of the first and second speech direction estimating steps is effective and outputting the determined speech direction estimation result as a speech direction estimation result to be used in the first input direction correcting step.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.