1. Field of Invention
The invention relates generally to detecting and processing acoustic signal data and more specifically to reducing noise in acoustic systems.
2. Art Background
Acoustic systems employ acoustic sensors such as microphones to receive audio signals. Often, these systems are used in real-world environments which present desired audio and undesired audio (also referred to as noise) to a receiving microphone simultaneously. Such receiving microphones are part of a variety of systems such as a mobile phone, a handheld microphone, a hearing aid, etc. These systems often perform speech recognition processing on the received acoustic signals. Simultaneous reception of desired audio and undesired audio has a negative impact on the quality of the desired audio. Degradation of the quality of the desired audio can result in desired audio which is output to a user and is hard for the user to understand. Degraded desired audio used by an algorithm such as in speech recognition (SR) or Automatic Speech Recognition (ASR) can result in an increased error rate which can render the reconstructed speech hard to understand. Either outcome presents a problem.
Undesired audio (noise) can originate from a variety of sources, which are not the source of the desired audio. Thus, the sources of undesired audio are statistically uncorrelated with the desired audio. The sources can be of a non-stationary origin or of a stationary origin. Stationary applies to time and space where amplitude, frequency, and direction of an acoustic signal do not vary appreciably. For example, in an automobile environment, engine noise at constant speed is stationary, as is road noise or wind noise, etc. In the case of a non-stationary signal, noise amplitude, frequency distribution, and direction of the acoustic signal vary as a function of time and/or space. Non-stationary noise originates, for example, from a car stereo, from a transient such as a bump or a door opening or closing, from conversation in the background such as chit chat in a back seat of a vehicle, etc. Stationary and non-stationary sources of undesired audio exist in office environments, concert halls, football stadiums, airplane cabins, and everywhere else that a user will go with an acoustic system (e.g., a mobile phone, a tablet computer, etc. equipped with a microphone, a headset, an ear bud microphone, etc.). At times the environment the acoustic system is used in is reverberant, thereby causing the noise to reverberate within the environment, with multiple paths of undesired audio arriving at the microphone location. Either source of noise, i.e., non-stationary or stationary undesired audio, increases the error rate of speech recognition algorithms such as SR or ASR or can simply make it difficult for a system to output desired audio to a user which can be understood. All of this can present a problem.
Various noise cancellation approaches have been employed to reduce noise from stationary and non-stationary sources. Existing noise cancellation approaches work better in environments where the magnitude of the noise is less than the magnitude of the desired audio, e.g., in relatively low noise environments. Spectral subtraction is used to reduce noise in speech recognition algorithms and in various acoustic systems such as in hearing aids. Systems employing Spectral Subtraction do not produce acceptable error rates when used in Automatic Speech Recognition (ASR) applications when a magnitude of the undesired audio becomes large. This can present a problem.
In addition, existing algorithms, such as Spectral Subtraction, etc., employ non-linear treatment of an acoustic signal. Non-linear treatment of an acoustic signal results in an output that is not proportionally related to the input. Speech Recognition (SR) algorithms are developed using voice signals recorded in a quiet environment without noise. Thus, speech recognition algorithms (developed in a quiet environment without noise) produce a high error rate when non-linear distortion is introduced in the speech process through non-linear signal processing. Non-linear treatment of acoustic signals can result in non-linear distortion of the desired audio, which disrupts the feature extraction that is necessary for speech recognition and thereby results in a high error rate. All of this can present a problem.
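The non-proportional relationship between input and output described above can be illustrated with a minimal sketch of magnitude spectral subtraction on a single frame. The sketch is illustrative only and is not drawn from any particular prior system: the function names, the 16-sample frame, the test tone, and the flat noise magnitude estimate of 2.0 are all assumptions chosen for the example.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (adequate for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def spectral_subtract(frame, noise_mag):
    """Subtract an assumed noise magnitude from each bin, keeping the phase."""
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_mag):
        # Flooring at zero is the non-linear step.
        mag = max(abs(bin_value) - noise, 0.0)
        cleaned.append(cmath.rect(mag, cmath.phase(bin_value)))
    return idft(cleaned)

N = 16                      # hypothetical frame length
NOISE_MAG = [2.0] * N       # hypothetical flat noise magnitude estimate

# A unit-amplitude tone completing two cycles in the frame, and the same tone doubled.
tone = [math.sin(2 * math.pi * 2 * t / N) for t in range(N)]
tone_doubled = [2.0 * s for s in tone]

peak_single = max(spectral_subtract(tone, NOISE_MAG))
peak_doubled = max(spectral_subtract(tone_doubled, NOISE_MAG))

print(round(peak_single, 3))   # 0.75
print(round(peak_doubled, 3))  # 1.75
```

Doubling the input raises the output peak from 0.75 to 1.75, a factor of about 2.33 rather than 2, so the output is not proportionally related to the input.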
Various methods have been used to try to suppress or remove undesired audio from acoustic systems, such as in Speech Recognition (SR) or Automatic Speech Recognition (ASR) applications for example. One approach is known as a Voice Activity Detector (VAD). A VAD attempts to detect when desired speech is present and when undesired speech is present, thereby accepting only the desired speech and treating the undesired speech as noise by not transmitting it. Traditional voice activity detection only works well for a single sound source or a stationary noise (undesired audio) whose magnitude is small relative to the magnitude of the desired audio. Therefore, a traditional VAD is a poor performer in a noisy environment. Additionally, using a VAD to remove undesired audio does not work well when the desired audio and the undesired audio are arriving simultaneously at a receive microphone. This can present a problem.
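The limitation described above can be illustrated with a minimal sketch of a frame-energy detector, which is assumed here as a stand-in for a traditional VAD. The frame length, the threshold, the tone frequencies, and the use of pure tones for speech and noise are all hypothetical simplifications made for the example.

```python
import math

FRAME_LEN = 80    # hypothetical frame length in samples
THRESHOLD = 0.1   # hypothetical energy threshold

def tone(amplitude, cycles, n=FRAME_LEN):
    """One frame of a sine tone completing `cycles` whole periods."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def frame_energy(frame):
    """Mean squared sample value of a frame."""
    return sum(s * s for s in frame) / len(frame)

def energy_vad(frames, threshold):
    """Flag each frame as speech when its energy exceeds the threshold."""
    return [frame_energy(f) > threshold for f in frames]

def make_frames(noise_amplitude):
    """Three frames: noise only, speech plus noise, noise only."""
    speech = tone(1.0, 5)
    noise = tone(noise_amplitude, 13)
    mixed = [s + n for s, n in zip(speech, noise)]
    return [noise, mixed, noise]

quiet_flags = energy_vad(make_frames(0.1), THRESHOLD)
loud_flags = energy_vad(make_frames(1.5), THRESHOLD)

print(quiet_flags)  # [False, True, False]
print(loud_flags)   # [True, True, True]
```

With low-level noise, the detector isolates the frame that contains speech. With noise louder than the speech, every frame is flagged, and the frame in which speech and noise arrive together is passed along with the noise still in it.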
Acoustic systems used in noisy environments with a single microphone present a problem in that desired audio and undesired audio are received simultaneously on a single channel. Undesired audio can make the desired audio unintelligible to either a human user or to an algorithm designed to use received speech such as a Speech Recognition (SR) or an Automatic Speech Recognition (ASR) algorithm. This can present a problem. Multiple channels have been employed to address the problem of the simultaneous reception of desired and undesired audio. Thus, on one channel, desired audio and undesired audio are received and on the other channel an acoustic signal is received which also contains undesired audio and desired audio. Over time the sensitivity of the individual channels can drift which results in the undesired audio becoming unbalanced between the channels. Drifting channel sensitivities can lead to inaccurate removal of undesired audio from desired audio. Non-linear distortion of the original desired audio signal can result from processing acoustic signals obtained from channels whose sensitivities drift over time. This can present a problem.
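The effect of drifting channel sensitivity can be illustrated with a minimal two-channel sketch. For simplicity the sketch assumes that the second channel carries only the undesired audio, which is a stronger assumption than the arrangement described above, and that noise is removed by direct subtraction. The function names, the tone signals, and the gain values are hypothetical.

```python
import math

FRAME_LEN = 160  # hypothetical frame length in samples

def tone(amplitude, cycles, n=FRAME_LEN):
    """One frame of a sine tone completing `cycles` whole periods."""
    return [amplitude * math.sin(2 * math.pi * cycles * t / n) for t in range(n)]

def rms(x):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def cancel(main, reference):
    """Remove the reference channel from the main channel by subtraction."""
    return [m - r for m, r in zip(main, reference)]

def noise_suppression_db(reference_gain):
    """Noise suppression achieved when the reference sensitivity is scaled
    by `reference_gain` (1.0 would mean perfectly matched channels)."""
    desired = tone(1.0, 5)
    noise = tone(1.0, 13)
    main = [d + n for d, n in zip(desired, noise)]
    reference = [reference_gain * n for n in noise]
    output = cancel(main, reference)
    residual = [o - d for o, d in zip(output, desired)]
    return 20 * math.log10(rms(noise) / rms(residual))

print(round(noise_suppression_db(0.99), 1))  # 40.0
print(round(noise_suppression_db(0.9), 1))   # 20.0
```

Under these assumptions, a 1% sensitivity mismatch limits noise suppression to about 40 dB, and a 10% mismatch limits it to about 20 dB, which illustrates how drift between channels leaves undesired audio in the output.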