1. Field
This disclosure relates to audio signal processing.
2. Background
Many activities that were previously performed in quiet office or home environments are being performed today in acoustically variable situations like a car, a street, or a café. For example, a person may desire to communicate with another person using a voice communication channel. The channel may be provided, for example, by a mobile wireless handset or headset, a walkie-talkie, a two-way radio, a car-kit, or another communications device. Consequently, a substantial amount of voice communication is taking place using portable audio sensing devices (e.g., smartphones, handsets, and/or headsets) in environments where users are surrounded by other people, with the kind of noise content that is typically encountered where people tend to gather. Such noise tends to distract or annoy a user at the far end of a telephone conversation. Moreover, many standard automated business transactions (e.g., account balance or stock quote checks) employ voice-recognition-based data inquiry, and the accuracy of these systems may be significantly impeded by interfering noise.
For applications in which communication occurs in noisy environments, it may be desirable to separate a desired speech signal from background noise. Noise may be defined as the combination of all signals interfering with or otherwise degrading the desired signal. Background noise may include numerous noise signals generated within the acoustic environment, such as background conversations of other people, as well as reflections and reverberation generated from the desired signal and/or any of the other signals. Unless the desired speech signal is separated from the background noise, it may be difficult to make reliable and efficient use of it. In one particular example, a speech signal is generated in a noisy environment, and speech processing methods are used to separate the speech signal from the environmental noise.
Noise encountered in a mobile environment may include a variety of different components, such as competing talkers, music, babble, street noise, and/or airport noise. As the signature of such noise is typically nonstationary and close to the user's own frequency signature, the noise may be hard to model using traditional single microphone or fixed beamforming type methods. Single-microphone noise reduction techniques typically require significant parameter tuning to achieve optimal performance. For example, a suitable noise reference may not be directly available in such cases, and it may be necessary to derive a noise reference indirectly. Therefore multiple-microphone based advanced signal processing may be desirable to support the use of mobile devices for voice communications in noisy environments.