In signal processing applications, it is common to decompose a signal into parts or components and to use all or a subset of these components to perform one or more operations on the original signal. In other words, decomposition techniques extract components from signals or signal mixtures. Some or all of the components can then be combined to produce desired output signals. Factorization can be considered a subset of the general decomposition framework and generally refers to the decomposition of a first signal into other signals whose product represents the first signal or an approximation of the first signal.
Signal decomposition is often required for signal processing tasks including but not limited to source separation, signal restoration, signal enhancement, noise removal, un-mixing, up-mixing and re-mixing. As a result, successful signal decomposition may dramatically improve the performance of several processing applications. Therefore, there is a great need for new and improved signal decomposition methods and systems.
Since signal decomposition is often used to perform processing tasks by combining decomposed signal parts, there are many methods for automatic or user-assisted selection, categorization and/or sorting of said parts. By exploiting such selection, categorization and/or sorting procedures, an algorithm or a user can produce useful output signals. Therefore, there is a need for new and improved techniques for selecting, categorizing and/or sorting decomposed signal parts. In addition, there is a great need for methods that provide a human user with means of combining such decomposed signal parts.
Source separation is an exemplary technique that is mostly based on signal decomposition and requires the extraction of desired signals from a mixture of sources. Since the sources and the mixing processes are usually unknown, source separation is a major signal processing challenge and has received significant attention from the research community over recent decades. Due to the inherent complexity of the source separation task, a global solution to the source separation problem cannot be found, and therefore there is a great need for new and improved source separation methods and systems.
A relatively recent development in source separation is the use of non-negative matrix factorization (NMF). The performance of NMF methods depends on the application field and also on the specific details of the problem under examination. In principle, NMF is a signal decomposition approach that attempts to approximate a non-negative matrix V as a product of two non-negative matrices W (the basis matrix) and H (the weight matrix). To achieve said approximation, a distance or error function between V and WH is constructed and minimized. In some cases, the matrices W and H are randomly initialized. In other cases, to improve performance and ensure convergence to a meaningful and useful factorization, a training step can be employed (see for example Schmidt, M., & Olsson, R. (2006), “Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization,” Proceedings of Interspeech, pp. 2614-2617, and Wilson, K. W., Raj, B., Smaragdis, P. & Divakaran, A. (2008), “Speech denoising using nonnegative matrix factorization with priors,” IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4029-4032). Methods that include a training step are referred to as supervised or semi-supervised NMF. Such training methods typically search for an appropriate initialization of the matrix W in the frequency domain. There is also, however, an opportunity to train in the time domain. In addition, conventional NMF methods typically initialize the matrix H with random signal values (see for example Frederic, J. (2008), “Examination of Initialization Techniques for Nonnegative Matrix Factorization,” Mathematics Theses, Georgia State University). There is also an opportunity for initialization of H using multichannel information or energy ratios.
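For illustration only, the approximation of V by the product WH described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of any cited reference: it assumes the Euclidean (Frobenius) distance as the error function, the widely used multiplicative update rules, and random initialization of both W and H. The function name `nmf`, its parameters, and the synthetic test matrix are hypothetical and chosen only for this example.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a non-negative matrix V as W @ H (illustrative sketch).

    Assumptions: Euclidean error ||V - WH||, multiplicative updates,
    random initialization of W (basis matrix) and H (weight matrix).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    # Random non-negative initialization of both factors.
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative;
        # eps guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        # Track the error after each iteration.
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors


# Hypothetical example: a synthetic non-negative matrix of rank 2,
# standing in for, e.g., a magnitude spectrogram.
demo_rng = np.random.default_rng(1)
V = demo_rng.random((6, 2)) @ demo_rng.random((2, 8))
W, H, errors = nmf(V, rank=2)
print("initial error:", errors[0], "final error:", errors[-1])
```

In a supervised or semi-supervised variant, W would instead be initialized (or held fixed) from a prior training step rather than drawn at random; that step is not shown in this sketch.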
Therefore, there is overall a great need for new and improved NMF training methods for decomposition tasks and an opportunity to improve initialization techniques using time domain and/or multichannel information and energy ratios.
Source separation techniques are particularly important for speech and music applications. In modern live sound reinforcement and recording, multiple sound sources are simultaneously active and their sound is captured by a number of microphones. Ideally, each microphone should capture the sound of just one sound source. However, sound sources interfere with each other, and in practice it is not possible for a microphone to capture just one sound source. Therefore, there is a great need for new and improved source separation techniques for speech and music applications.