Source separation has attracted attention in a variety of applications where it may be desirable to extract a set of original source signals from a set of mixed signal observations.
Source separation may find use in a wide variety of signal processing applications, such as audio signal processing, optical signal processing, speech separation, neural imaging, stock market prediction, telecommunication systems, facial recognition, and more. Where knowledge of the mixing process of original signals that produces the mixed signals is not known, the problem has commonly been referred to as blind source separation (BSS).
Independent component analysis (ICA) is an approach to the source separation problem that models the mixing process as linear mixtures of original source signals, and applies a demixing operation that attempts to reverse the mixing process to produce a set of estimated signals corresponding to the original source signals. Basic ICA assumes linear instantaneous mixtures of non-Gaussian source signals, with the number of mixtures equal to the number of source signals. Because the original source signals are assumed to be independent, ICA estimates the original source signals by using statistical methods extract a set of independent (or at least maximally independent) signals from the mixtures.
While conventional ICA approaches for simplified, instantaneous mixtures in the absence of noise can give very good results, real world source separation applications often need to account for a more complex mixing process created by real world environments. A common example of the source separation problem as it applies to speech separation is demonstrated by the well-known “cocktail party problem,” in which several persons are speaking in a room and an array of microphones are used to detect speech signals from the separate speakers. The goal of ICA would be to extract the individual speech signals of the speakers from the mixed observations detected by the microphones. The mixing process may be mathematically represented by a mixing matrix in the ICA process. However, the mixing process may be complicated by a variety of factors, including noises, music, moving sources, room reverberations, echoes, and the like. In this manner, each microphone in the array may detect a unique mixed signal that contains a mixture of the original source signals (i.e. the mixed signal that is detected by each microphone in the array includes a mixture of the separate speakers' speech), but the mixed signals may not be simple instantaneous mixtures of just the sources. Rather, the mixtures can be convolutive mixtures, resulting from room reverberations and echoes (e.g. speech signals bouncing off room walls), and may include any of the complications to the mixing process mentioned above.
Mixed signals to be used for source separation can initially be time domain representations of the mixed observations (e.g. in the cocktail party problem mentioned above, they would be mixed audio signals as functions of time). ICA processes have been developed to perform the source separation on time-domain signals from convolutive mixed signals and can give good results; however, the separation of convolutive mixtures of time domain signals can be very computationally intensive, requiring lots of time and processing resources and thus prohibiting its effective utilization in many common real world ICA applications.
A much more computationally efficient algorithm can be implemented by extracting frequency data from the observed time domain signals. In doing this, the convolutive operation in the time domain is replaced by a more computationally efficient multiplication operation in the frequency domain. A Fourier-related transform, such as a short-time Fourier transform (STFT), can be performed on the time-domain data in order to generate frequency representations of the observed mixed signals and load frequency bins, whereby the STFT converts the time domain signals into the time-frequency domain. A STFT can generate a spectrogram for each time segment analyzed, providing information about the intensity of each frequency bin at each time instant in a given time segment.
Traditional approaches to frequency domain ICA involve performing the independent component analysis at each frequency bin (i.e. independence of the same frequency bin between different signals will be maximized) without any constraints derived from prior information. Unfortunately, this approach inherently suffers from a well-known permutation problem, which can cause estimated frequency bin data of the source signals to be grouped in incorrect sources. As such, when resulting time domain signals are reproduced from the frequency domain signals (such as by an inverse STFT), each estimated time domain signal that is produced from the separation process may contain frequency data from incorrect sources. Furthermore, traditional approaches typically rely on unconstrained models that fail to account for additional information regarding the source signals. However, in many real world applications, additional information can be utilized to improve the separation process, and traditional ICA techniques generally fail to appreciate ways in which the complexity of the underlying processing operations can be simplified utilizing prior information regarding the sources.
Various approaches to solving the misalignment of frequency bins in source separation by frequency domain ICA have been proposed. However, to date none of these approaches achieve high enough performance in real world noisy environments to make them an attractive solution for acoustic source separation applications.
Conventional approaches include performing frequency domain ICA at each frequency bin as described above and applying post-processing that involves correcting the alignment of frequency bins by various methods. However, these approaches can suffer from inaccuracies and poor performance in the correcting step. Additionally, because these processes require an additional processing step after the initial ICA separation, processing time and computing resources required to produce the estimated source signals are greatly increased.
To date, known approaches to frequency domain ICA suffer from one or more of the following drawbacks: inability to accurately align frequency bins with the appropriate source, requirement of a post-processing that requires extra time and processing resources, poor performance (i.e. poor signal to noise ratio), inability to efficiently analyze multi-source speech, complex optimization functions that consume processing resources, and a requirement for a limited time frame to be analyzed.
For the foregoing reasons, there is a need for methods and apparatus that can efficiently implement frequency domain independent component analysis to produce estimated source signals from a set of mixed signals without the aforementioned drawbacks. It is within this context that a need for the present invention arises.