In an audio or video signal, background noise is an unwanted signal that is transmitted together with the wanted audio or video signal. Conventional speech denoising and speech enhancement techniques use source separation to isolate this unwanted signal. These techniques model the noise signal with a single spectral profile that is estimated beforehand from several clean noise signal frames. However, when the background noise is non-stationary (i.e., its spectrum fluctuates or changes significantly and rapidly over time), these conventional techniques perform poorly, because non-stationary noises, such as keyboard noises or sirens, cannot be modeled well by a single spectrum.
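The single-profile noise model described above can be sketched as follows. This is a minimal illustration, assuming magnitude spectrograms stored as NumPy arrays of shape (frames, frequency bins) and using plain spectral subtraction as a stand-in for the conventional techniques. The function names `estimate_noise_profile` and `spectral_subtract` are hypothetical and are not taken from any particular system.

```python
import numpy as np

def estimate_noise_profile(noise_frames):
    """Average the magnitude spectra of noise-only frames.

    noise_frames: array of shape (n_frames, n_bins), assumed to hold
    clean noise with no speech present (an assumption of this sketch).
    Returns a single spectral profile of shape (n_bins,).
    """
    return np.mean(noise_frames, axis=0)

def spectral_subtract(noisy_mag, noise_profile, floor=0.0):
    """Subtract the same fixed profile from every frame.

    noisy_mag: array of shape (n_frames, n_bins).
    Values are clamped at `floor` so magnitudes stay nonnegative.
    """
    return np.maximum(noisy_mag - noise_profile, floor)
```

Because the same profile is subtracted from every frame, a noise whose spectrum changes from frame to frame (a keystroke, a siren sweep) is under-subtracted in some frames and over-subtracted in others.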
Probabilistic formulations of nonnegative matrix factorization (NMF) models have been used for audio source separation. In particular, Bayesian approaches have been used to handle uncertainty, enable hierarchical formulations, deal with hyperparameter learning, and automatically discover how many latent components are needed to model the data. These approaches use a dictionary comprising signal-related information such as representative signals and model parameters. The dictionary must be appropriately sized: it must be large enough for source separation algorithms to function properly. One issue with the NMF approaches is that they require an appropriate number of dictionary elements to be chosen a priori, before data modeling is performed.
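The a priori choice of dictionary size can be seen in a basic NMF routine. The following is a minimal sketch, assuming the standard multiplicative updates for the Euclidean cost rather than any of the probabilistic or Bayesian formulations mentioned above; the function name `nmf` and its parameters are illustrative only. Note that `n_components`, the number of dictionary elements, must be supplied by the caller before any data are modeled.

```python
import numpy as np

def nmf(V, n_components, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (n_bins x n_frames) as W @ H.

    W (n_bins x n_components) is the dictionary of spectral templates;
    H (n_components x n_frames) holds their activations over time.
    n_components is fixed in advance; nothing here adapts it to the data.
    """
    rng = np.random.default_rng(seed)
    n_bins, n_frames = V.shape
    # Random nonnegative initialization (an arbitrary choice for this sketch).
    W = rng.random((n_bins, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

If `n_components` is set too small, the dictionary cannot represent the sources; if it is set too large, computation is wasted and the factorization may overfit, which is the issue that motivates inferring the size from the data.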
Bayesian nonparametric (BNP) models have been used to address the requirement of choosing an appropriate number of dictionary elements a priori in NMF source separation models. For example, BNP models fit by Markov chain Monte Carlo (MCMC) sampling have been proposed. However, using such models can be computationally expensive and inefficient. Attempts to improve efficiency by fitting BNP models with variational inference algorithms have generally been unsatisfactory, because the variational algorithms have involved breaking dependencies between model parameters and variables. Breaking such dependencies can introduce local optima and thus reduce the accuracy of a source separation algorithm. Thus, prior techniques have gained efficiency only at a significant cost in accuracy.