Speech that is electronically captured may be noisy in the sense that the perceived quality of the speech is adversely affected by noise. For example, the perceived quality can be affected by background sounds, a low-quality microphone, transmission over various communications channels, and so on. If the perceived quality of speech is low, then it may be difficult for a person to understand the speech or for the speech to be further processed electronically, for example, using speech recognition techniques. To improve the perceived quality of such speech, various speech enhancement techniques have been employed, such as filtering techniques (e.g., Wiener filtering), spectral restoration, and so on.
Sparsity is an important property of data that is exploited in a variety of signal-processing problems. In the context of non-negative matrix factorization (“NMF”), sparsity allows for the controlling of the uniqueness of signal representation. NMF may be used to factor amplitude spectrograms $V_{M \times T}$ representing speech into a product of dictionary atoms $W_{M \times K}$ and activations $H_{K \times T}$, where M represents the number of frequency ranges, T represents the number of frames or windows, and K represents the number of dictionary atoms. In the cases of $K < M$ and $K > M$ (denoting under- and over-complete representations, respectively), there are many possible solutions. In such cases, the non-uniqueness of the solutions found using NMF can be limited to a certain degree by imposing constraints on the sparsity using a regularization term. Unfortunately, solutions to such a non-convex problem are found through iterative updates that guarantee only local minima. Because the problem is solved via iterative updates, the quality of the factorization depends heavily on the initialization strategy.
When used for speech enhancement, NMF allows latent structures (speech is sparse and noise is not) in noisy speech signals to be inferred by factorizing their amplitude spectrograms V into a linear combination of basis functions W that define a convex cone as represented by Equation 1:
$$V \approx WH = \arg\min_{\underline{W},\,\underline{H}} \left[ D\left(V \,\|\, \underline{W}\,\underline{H}\right) + v\,\|\underline{H}\|_{1} \right] \qquad (1)$$
where v controls the sparsity weight and D represents one of many possible divergence metrics. Since Equation 1 has no closed-form solution, algorithms to solve it may use multiplicative updates to obtain the best approximation. The update algorithms start with an initial seed for W and H and refine the estimates iteratively until they reach the desired level of error convergence. Although NMF has been used for speech enhancement, the quality of NMF factorizations is sensitive to initialization.
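By way of illustration only, the multiplicative updates described above may be sketched as follows. The sketch assumes that D is the squared Euclidean distance, which is only one of the many possible divergence metrics, and that the dictionary atoms are rescaled to unit norm after each iteration, a common heuristic that is not stated above. The function names `sparse_nmf` and `objective`, the default parameter values, and the synthetic test matrix are hypothetical.

```python
import numpy as np

def sparse_nmf(V, K, v=0.1, n_iter=200, seed=0, eps=1e-12):
    """Factor a non-negative M x T matrix V into W (M x K) and H (K x T).

    Assumption: D is the squared Euclidean distance, so the objective is
    0.5 * ||V - WH||^2 + v * sum(H).
    """
    rng = np.random.default_rng(seed)
    M, T = V.shape
    # Random non-negative initial seed for the dictionary and activations.
    W = rng.random((M, K)) + eps
    H = rng.random((K, T)) + eps
    for _ in range(n_iter):
        # Activation update; the sparsity weight v enters the denominator.
        H *= (W.T @ V) / (W.T @ (W @ H) + v + eps)
        # Dictionary update (no sparsity term applies to W).
        W *= (V @ H.T) / ((W @ H) @ H.T + eps)
        # Heuristic (assumed): rescale atoms to unit L2 norm and compensate
        # in H so that the product WH is unchanged.
        norms = np.linalg.norm(W, axis=0) + eps
        W /= norms
        H *= norms[:, None]
    return W, H

def objective(V, W, H, v):
    """Value of the assumed objective for a given factorization."""
    return 0.5 * np.sum((V - W @ H) ** 2) + v * np.sum(H)

# Synthetic stand-in for an amplitude spectrogram (M=20, T=30).
data_rng = np.random.default_rng(1)
V = data_rng.random((20, 4)) @ data_rng.random((4, 30))

# Different initial seeds may converge to different local minima.
for seed in range(3):
    W, H = sparse_nmf(V, K=4, v=0.01, seed=seed)
    print(seed, objective(V, W, H, 0.01))
```

Running the loop with several seeds gives a simple way to observe how much the final objective value varies with the initialization.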
Unfortunately, because globally optimal solutions cannot be guaranteed, it is likely that some random initialization will beat the best proposed strategy, which is why a random initialization strategy is often employed. With NMF, as the number of atoms in the dictionary increases, the factorizations become consistent across different initializations. However, the computational resources required (e.g., the amount of memory and the number of floating-point operations) tend to increase as O(K). Thus, higher-order factorizations (with a larger number of atoms) lead to higher computational costs and are not desirable.
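The linear growth in K can be made concrete with a rough count. The following sketch assumes Euclidean multiplicative updates in which every matrix product is ordered so that it costs M·K·T multiply-adds (for example, computing W.T @ (W @ H) rather than (W.T @ W) @ H); other divergences or orderings would change the constant factor. The function name `nmf_footprint` and the sizes used are hypothetical.

```python
def nmf_footprint(M, T, K):
    """Return (stored_values, multiply_adds_per_iteration) for rank K.

    Assumption: six M*K*T matrix products per iteration, namely W@H,
    W.T@V and W.T@(WH) for the activation update, then W@H, V@H.T and
    (WH)@H.T for the dictionary update.
    """
    stored = M * K + K * T      # entries of W plus entries of H
    products_per_iteration = 6  # assumed count, see docstring
    return stored, products_per_iteration * M * K * T

# Illustrative sizes only: 257 frequency ranges and 100 frames.
for K in (20, 40, 80):
    print(K, nmf_footprint(257, 100, K))
```

Under these assumptions, doubling the number of atoms doubles both the storage for W and H and the work per iteration.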