Noise
Removing additive noise from acoustic signals, such as speech, has a number of applications in telephony, audio voice recording, and electronic voice communication. Noise is pervasive in urban environments, factories, airplanes, vehicles, and the like.
It is particularly difficult to remove time-varying noise, which reflects real environmental noise more accurately than stationary noise does. Typically, non-stationary noise cannot be cancelled by suppression techniques that use a static noise model. Conventional approaches such as spectral subtraction and Wiener filtering have traditionally used static or slowly-varying noise estimates, and have therefore been restricted to stationary or quasi-stationary noise.
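To make the limitation concrete, the following is a minimal sketch of magnitude spectral subtraction with a static noise estimate. It is an illustration under stated assumptions, not a method taken from this document: the function name `spectral_subtraction`, the `noise_frames` parameter (the number of leading frames assumed to contain noise only), and the `floor` parameter are all hypothetical.

```python
import numpy as np

def spectral_subtraction(mag, noise_frames=10, floor=0.01):
    """Subtract a static noise estimate from a magnitude spectrogram.

    mag:          non-negative array of shape (freq_bins, frames).
    noise_frames: number of leading frames assumed to be noise only
                  (an assumption made for this sketch).
    floor:        fraction of the input magnitude kept as a lower bound.
    """
    mag = np.asarray(mag, dtype=float)
    # Static noise model: one average spectrum, estimated once from the
    # leading frames and never updated afterwards.
    noise = mag[:, :noise_frames].mean(axis=1, keepdims=True)
    # Subtract the fixed estimate from every frame and clamp the result
    # so that it stays non-negative.
    return np.maximum(mag - noise, floor * mag)
```

Because the noise spectrum is estimated once, any change in the noise after the first `noise_frames` frames is either left in the output or subtracted from the speech, which is the restriction to stationary or quasi-stationary noise described above.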
Non-Negative Matrix Factorization
Non-negative matrix factorization (NMF) optimally solves an equation V ≈ WH.
The conventional formulation of the NMF is defined as follows. Starting with a non-negative M×N matrix V, the goal is to approximate the matrix V as a product of two non-negative matrices W and H. The matrices W and H are selected to minimize the error between the matrix V and its approximate reconstruction, the product WH. This provides a way of decomposing a signal V into an additive combination of non-negative components.
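The formulation above can be sketched in a few lines of NumPy. The text does not specify the error measure or the optimization procedure, so this sketch assumes the squared Euclidean (Frobenius) error and the well-known multiplicative update rules of Lee and Seung. The function name `nmf` and the parameters `rank` (the inner dimension R, so that W is M×R and H is R×N), `n_iter`, `eps`, and `seed` are hypothetical.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative M x N matrix V as W @ H.

    Assumes the squared Euclidean error and multiplicative updates;
    both are choices made for this sketch, not taken from the text.
    Returns W (M x rank) and H (rank x N), both non-negative.
    """
    V = np.asarray(V, dtype=float)
    rng = np.random.default_rng(seed)
    M, N = V.shape
    # Random non-negative initialization.
    W = rng.random((M, rank)) + eps
    H = rng.random((rank, N)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # remain non-negative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

A call such as `W, H = nmf(V, rank=2)` returns factors whose product `W @ H` approximates `V`. The result depends on the random initialization, and the updates generally reach a local rather than a global minimum of the error.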
When the signal V is a spectrogram and the matrix W is a set of spectral shapes, the NMF can separate single-channel mixtures of sounds by associating different columns of the matrix W with different sound sources; see U.S. Patent Application 20050222840, “Method and system for separating multiple sound sources from monophonic input with non-negative matrix factor deconvolution,” by Smaragdis et al. on Oct. 6, 2005, incorporated herein by reference.
NMF works well for separating sounds when the spectrograms for different acoustic signals are sufficiently distinct. For example, if one source, such as a flute, generates only harmonic sounds and another source, such as a snare drum, generates only non-harmonic sounds, the spectrogram for one source is distinct from the spectrogram of the other source.
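The association of columns with sound sources described above can be sketched as follows. The sketch assumes that the factors of the mixture spectrogram are already available, with the matrix of spectral shapes called `W` and the matrix of activations called `H`, and that the assignment of columns to sources is already known. The function name `split_sources`, the `groups` argument, and the source names in the usage note are hypothetical.

```python
import numpy as np

def split_sources(W, H, groups):
    """Reconstruct one spectrogram per source from a factorization W @ H.

    W:      (freq_bins, rank) non-negative spectral shapes.
    H:      (rank, frames) non-negative activations.
    groups: dict mapping a source name to the list of column indices
            of W assigned to that source (assumed known in advance).
    """
    # Each source uses only its own columns of W and the matching rows
    # of H, so the per-source spectrograms add up to W @ H whenever
    # every column belongs to exactly one group.
    return {name: W[:, idx] @ H[idx, :] for name, idx in groups.items()}
```

For example, `split_sources(W, H, {"flute": [0, 1], "drum": [2]})` returns two spectrograms of the same shape as the mixture. When the spectral shapes of the sources overlap, a column of `W` can explain energy from more than one source, and the separation degrades accordingly.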
Speech
Speech includes harmonic and non-harmonic sounds. The harmonic sounds can have different fundamental frequencies at different times. Speech can have energy across a wide range of frequencies. The spectra of non-stationary noise can be similar to those of speech. Therefore, in a speech denoising application, where one “source” is speech and the other “source” is additive noise, the overlap between the speech and noise models degrades the performance of the denoising.
Therefore, it is desired to adapt non-negative matrix factorization to the problem of denoising speech with additive non-stationary noise.