The present invention relates to speech recognition, particularly in environments such as automobile interiors or inside homes, where ambient or environmental noise (such as from sound-generating electronic devices) may present a problem for speech recognition.
Speech recognition can be employed to perform various non-critical tasks inside an automobile or at home. For example, in either environment, speech recognition could be used to increase or decrease the volume of a music system, tune to a radio channel, or dial a phone number by voice command. However, the performance of speech recognizers in such situations is usually limited by several factors. Chiefly, because it is generally inconvenient to place a microphone very close to the mouth of the person whose speech is to be recognized, the microphone will tend to pick up ambient sounds as well. These sounds could come from any of a wide variety of sources, such as music from a car radio, cassette player, or CD player within the car, or from a television in the home.
Accordingly, a need has often been recognized for suppressing or removing ambient sounds from speech that is to be recognized, thereby enhancing the performance of the speech recognizer that processes the speech input. Previously, microphone arrays (such as those manufactured by Andrea Electronics of Melville, N.Y.) have been used toward this purpose, enhancing speech input and suppressing ambient noise. A general discussion of the function of microphone arrays can be found in R. A. Monzingo and T. W. Miller, Introduction to Adaptive Arrays (John Wiley and Sons, New York; Wiley Interscience Publications, 1980). However, it has been found that the effectiveness of such arrangements is often limited. Thus, a need has also been recognized for improving upon the performance of such microphone arrays.
The present invention, in accordance with at least one presently preferred embodiment, is directed towards removing ambient noise from speech signals that are typically acquired through a microphone.
In one aspect, the invention involves:
(1) Capturing the speech signal through a microphone, and optionally converting it to digital form using an A/D converter.
(2) Capturing the signals of the unwanted noise or music sources (which are also picked up by the microphone) in their pure form, and optionally converting them to digital form using an A/D converter.
(3) Applying a filter to each of the unwanted signals, to obtain an estimate of each unwanted signal as it would be picked up by the microphone.
(4) Subtracting the estimates of the unwanted signals from the microphone signal, to obtain a clean speech signal containing almost none of the unwanted signals.
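Steps (3) and (4) amount to filtering each pure-form source signal and subtracting the result from the microphone signal. The following is a minimal Python sketch, assuming that each source-to-microphone path is modelled as a causal FIR filter whose taps are already known, and that all signals are sampled, time-aligned lists of equal length; the function names `fir_filter` and `clean_speech` are hypothetical.

```python
from typing import List, Sequence


def fir_filter(signal: Sequence[float], taps: Sequence[float]) -> List[float]:
    """Causal FIR filter: y[n] = sum_k taps[k] * signal[n - k] (assumed path model)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out


def clean_speech(mic: Sequence[float],
                 references: Sequence[Sequence[float]],
                 filters: Sequence[Sequence[float]]) -> List[float]:
    """Subtract the filtered estimate of every unwanted source from the mic signal."""
    cleaned = list(mic)
    for ref, taps in zip(references, filters):
        estimate = fir_filter(ref, taps)   # step (3): source as heard at the microphone
        for n in range(len(cleaned)):
            cleaned[n] -= estimate[n]      # step (4): remove it from the pickup
    return cleaned
```

With exact filters the output equals the speech component up to floating-point rounding; in practice the filters are only estimates, so some residual noise remains.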
(It should be noted that since, in speech recognition, the software [or other medium, such as an electronic chip] usually analyzes not the speech signal itself but certain "features" or "parameters" of the speech signal, it is conceivable to provide a scheme in which, instead of applying step 3 above, one would transform the original speech into features [such as filterbank energies] and then apply step 4 in the transformed feature space.)
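The feature-space variant can be sketched as follows. This is a minimal Python sketch under several hypothetical assumptions: equal-width rectangular bands over a naive DFT power spectrum stand in for a real filterbank, a known per-band power gain `band_gains` stands in for the source-to-microphone filter, and speech and noise are taken to be uncorrelated so that their band energies approximately add. The subtraction is clamped to a small fraction of the microphone energy, a common spectral-subtraction safeguard, so that no band goes negative.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for k = 0..N/2, via a naive O(N^2) DFT (adequate for a sketch)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def filterbank_energies(frame: Sequence[float], num_bands: int) -> List[float]:
    """Sum power over equal-width rectangular bands (a hypothetical stand-in filterbank)."""
    spec = power_spectrum(frame)
    width = len(spec) / num_bands
    energies = [0.0] * num_bands
    for k, p in enumerate(spec):
        energies[min(int(k / width), num_bands - 1)] += p
    return energies


def subtract_in_feature_space(mic_frame: Sequence[float], ref_frame: Sequence[float],
                              band_gains: Sequence[float], floor: float = 1e-3) -> List[float]:
    """Step 4 applied to band energies: mic energy minus gain-scaled reference energy."""
    mic_e = filterbank_energies(mic_frame, len(band_gains))
    ref_e = filterbank_energies(ref_frame, len(band_gains))
    # Clamp each band to a small fraction of the mic energy so it never goes negative.
    return [max(m - g * r, floor * m) for m, g, r in zip(mic_e, band_gains, ref_e)]
```

Because band energies discard phase, this variant removes only the expected noise energy in each band rather than cancelling the noise waveform itself.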
In another aspect, the step of applying a filter to each of the unwanted signals may comprise the steps of:
(1) Artificially creating an environment in which only one of the unwanted sources is present and there is no speech. Both the microphone signal and the source signal are captured and stored for a certain length of time. This process is repeated for each potential source (for example, each of the four speakers of the car stereo system). If the nature of the noise source can be controlled (e.g., if it is played through a loudspeaker), white noise is preferred as the source signal.
(2) Using adaptive filter estimation techniques, such as Least Mean Squares (LMS), Recursive Least Squares (RLS), or their variants such as normalized LMS (NLMS) or sub-band LMS, to estimate the filter parameters for each of the noise sources.
(3) Optionally, incrementally updating all the filter parameters while the system is operational and removing noise from the microphone pickup.
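The estimation procedure can be sketched with NLMS, one of the variants named in step (2). This is a minimal Python sketch under assumptions not stated in the text: a causal FIR model with a fixed number of taps, the step size `mu` and regularizer `eps`, and the names `NLMSCanceller` and `calibrate`, all of which are hypothetical. Calibration runs on a speech-free recording of one source and is repeated per source, as in step (1); the value returned by `step` is the microphone signal minus the current noise estimate, so the same method can serve as the noise-removal output while adaptation continues in step (3).

```python
import random
from typing import List, Sequence


class NLMSCanceller:
    """Adaptive FIR estimate of the path from one noise source to the microphone."""

    def __init__(self, num_taps: int, mu: float = 0.5, eps: float = 1e-8) -> None:
        self.w: List[float] = [0.0] * num_taps    # filter taps (hypothetical length)
        self.buf: List[float] = [0.0] * num_taps  # recent reference samples, newest first
        self.mu = mu                              # step size, 0 < mu < 2 for stability
        self.eps = eps                            # guards against division by zero

    def step(self, ref_sample: float, mic_sample: float, adapt: bool = True) -> float:
        """Consume one sample pair; return mic minus the filtered reference (cleaned output)."""
        self.buf = [ref_sample] + self.buf[:-1]
        estimate = sum(w * x for w, x in zip(self.w, self.buf))
        error = mic_sample - estimate
        if adapt:
            # NLMS update: w <- w + mu * e * x / (eps + ||x||^2)
            gain = self.mu * error / (self.eps + sum(x * x for x in self.buf))
            self.w = [w + gain * x for w, x in zip(self.w, self.buf)]
        return error


def calibrate(ref: Sequence[float], mic: Sequence[float],
              num_taps: int, mu: float = 0.5) -> NLMSCanceller:
    """Step (2): estimate one source's filter from a speech-free recording."""
    canceller = NLMSCanceller(num_taps, mu)
    for r, m in zip(ref, mic):
        canceller.step(r, m)
    return canceller


if __name__ == "__main__":
    rng = random.Random(0)
    true_h = [0.6, -0.3, 0.1, 0.05]                   # hypothetical loudspeaker-to-mic path
    ref = [rng.gauss(0.0, 1.0) for _ in range(4000)]  # white-noise excitation, as in step (1)
    mic = [sum(h * ref[n - k] for k, h in enumerate(true_h) if n - k >= 0)
           for n in range(len(ref))]
    c = calibrate(ref, mic, num_taps=len(true_h))
    print("estimated taps:", [round(w, 3) for w in c.w])
```

An RLS or sub-band LMS estimator would replace the update inside `step`. During operation the speech itself enters the error term and disturbs adaptation, so a smaller `mu` or pausing adaptation while speech is present is a common choice; both are assumptions here, not part of the described method.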
In one aspect, the present invention provides an apparatus for providing speech recognition, the apparatus comprising: a first input medium which receives speech input; at least one second input medium which receives ambient input from at least one source separate from the speech input; and an arrangement for reconciling the speech input with the ambient input so as to provide clean speech output.
In another aspect, the present invention provides a method of providing speech recognition, the method comprising the steps of: receiving speech input; receiving ambient input from at least one source separate from the speech input; and reconciling the speech input with the ambient input so as to provide clean speech output.
Furthermore, in another aspect, the present invention provides a program storage device readable by machine, tangibly embodying a program of instructions executable by the machine to perform method steps for providing speech recognition, the method comprising the steps of: receiving speech input; receiving ambient input from at least one source separate from the speech input; and reconciling the speech input with the ambient input so as to provide clean speech output.
For a better understanding of the present invention, together with other and further features and advantages thereof, reference is made to the following description, taken in conjunction with the accompanying drawings, and the scope of the invention will be pointed out in the appended claims.