This section is intended to introduce the reader to various aspects of the art, which may be related to various aspects of the present disclosure that are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light, and not as admissions of prior art.
An audio communication, especially a wireless communication, may take place in a noisy environment, for example, on a street with heavy traffic or in a bar. In this case, it is often very difficult for one party to the communication to understand the speech because of the background noise. It is therefore an important topic in audio communication to suppress the undesirable background noise while preserving the target speech, which improves speech intelligibility.
Noise suppression may be implemented at the far end, that is, on the communication device of the listening person, or at the near end, that is, on the communication device of the speaking person. The communication device of either the listening or the speaking person can be a smart phone, a tablet, etc. From a commercial point of view, the far-end implementation is more attractive.
The prior art comprises a number of known solutions that provide noise suppression for an audio communication.
One of the known solutions in this respect is called speech enhancement. One exemplary method is discussed in Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean square error short-time spectral amplitude estimator,” IEEE Trans. Acoust. Speech Signal Process., vol. 32, pp. 1109-1121, 1984 (hereinafter referred to as reference 1). However, such speech enhancement solutions have a significant disadvantage: they only suppress backgrounds that can be represented as stationary noise, i.e., noisy sounds with time-invariant spectral characteristics.
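The stationarity assumption mentioned above can be illustrated with a minimal sketch. The code below is not the estimator of reference 1; it is a simpler spectral-subtraction-style gain, chosen only for illustration. The function names, frame sizes, the number of leading noise-only frames and the gain floor are all assumptions. The noise spectrum is estimated once from the first frames and then held fixed, which is reasonable only if the background keeps the same spectral characteristics over time.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time spectrum of x, one Hann-windowed frame per row."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def stationary_noise_gain(spec, noise_frames=20, floor=0.05):
    """Per-bin suppression gain using a fixed noise power estimate.

    Assumption: the first `noise_frames` frames contain background only,
    and the background spectrum does not change afterwards.
    """
    power = np.abs(spec) ** 2
    noise_psd = power[:noise_frames].mean(axis=0)  # estimated once, then fixed
    gain = 1.0 - noise_psd / np.maximum(power, 1e-12)
    return np.maximum(gain, floor)  # the floor limits over-suppression

# Synthetic example (hypothetical signal): white noise throughout,
# with a 1 kHz tone standing in for speech during the second half.
rng = np.random.default_rng(0)
fs = 8000
t = np.arange(fs) / fs
x = 0.1 * rng.standard_normal(fs)
x[fs // 2:] += np.sin(2 * np.pi * 1000 * t[fs // 2:])

spec = stft(x)
gain = stationary_noise_gain(spec)
enhanced_spec = gain * spec  # attenuated spectrum, frame by frame
```

In this sketch the gain stays close to one in the bin carrying the tone and drops toward the floor in noise-dominated bins. If the background spectrum changed after the first frames, the fixed estimate `noise_psd` would no longer match it, which is the limitation noted above.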
Another known solution is called online source separation. One exemplary method is discussed in L. S. R. Simon and E. Vincent, “A general framework for online audio source separation,” in International Conference on Latent Variable Analysis and Signal Separation, Tel-Aviv, Israel, March 2012 (hereinafter referred to as reference 2). Online source separation can deal with non-stationary backgrounds and is normally based on advanced spectral models of both sources: the speech and the background. However, online source separation depends strongly on how well the source models represent the actual sources to be separated.
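The dependence on the source models can be illustrated with a minimal sketch. The code below is not the framework of reference 2; it assumes, purely for illustration, that each source is modeled by a fixed nonnegative spectral dictionary (hypothetical matrices `W_speech` and `W_noise`), fits the activations of one frame with multiplicative updates for the Euclidean cost, and derives a Wiener-like speech mask.

```python
import numpy as np

def speech_mask(v, W_speech, W_noise, n_iter=50, eps=1e-12):
    """Speech mask for one frame with power spectrum v.

    Assumption: W_speech and W_noise are pre-trained nonnegative
    spectral dictionaries (bins x components) and are kept fixed.
    """
    W = np.hstack([W_speech, W_noise])
    h = np.ones(W.shape[1])
    for _ in range(n_iter):
        # Multiplicative update of the activations with W held fixed
        h *= (W.T @ v) / (W.T @ (W @ h) + eps)
    k = W_speech.shape[1]
    speech = W_speech @ h[:k]
    noise = W_noise @ h[k:]
    return speech / (speech + noise + eps)

# Toy models over four frequency bins (hypothetical values):
# speech occupies bins 0-1, background occupies bins 2-3.
W_speech = np.array([[1.0], [1.0], [0.0], [0.0]])
W_noise = np.array([[0.0], [0.0], [1.0], [1.0]])

# Frame that matches the models: speech and background are separated.
matched = speech_mask(np.array([3.0, 3.0, 2.0, 2.0]), W_speech, W_noise)

# Background-only frame whose energy falls where the speech model lives:
# the mask attributes it to speech because the background model is wrong.
mismatched = speech_mask(np.array([2.0, 2.0, 0.0, 0.0]), W_speech, W_noise)
```

In the matched case the mask is close to one in the speech bins and zero in the background bins. In the mismatched case the background passes through as if it were speech, which is the model-dependence issue described above.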
Consequently, there remains a need for improved noise suppression in an audio communication that separates the speech data from the background data of the audio communication, so that the speech quality can be improved.