Source separation has attracted attention in a variety of applications where it may be desirable to extract a set of original source signals from a set of mixed signal observations.
Source separation may find use in a wide variety of signal processing applications, such as audio signal processing, optical signal processing, speech separation, neural imaging, stock market prediction, telecommunication systems, facial recognition, and more. Where knowledge of the mixing process of original signals that produces the mixed signals is not known, the problem has commonly been referred to as blind source separation (BSS).
Independent component analysis (ICA) is an approach to the source separation problem that models the mixing process as linear mixtures of original source signals, and applies a de-mixing operation that attempts to reverse the mixing process to produce a set of estimated signals corresponding to the original source signals. Basic ICA assumes linear instantaneous mixtures of non-Gaussian source signals, with the number of mixtures equal to the number of source signals. Because the original source signals are assumed to be independent, ICA estimates the original source signals by using statistical methods extract a set of independent (or at least maximally independent) signals from the mixtures.
While conventional ICA approaches for simplified, instantaneous mixtures in the absence of noise can give very good results, real world source separation applications often need to account for a more complex mixing process created by real world environments. A common example of the source separation problem as it applies to speech separation is demonstrated by the well-known “cocktail party problem,” in which several persons are speaking in a room and an array of microphones are used to detect speech signals from the separate speakers. The goal of ICA would be to extract the individual speech signals of the speakers from the mixed observations detected by the microphones; however, the mixing process may be complicated by a variety of factors, including noises, music, moving sources, room reverberations, echoes, and the like. In this manner, each microphone in the array may detect a unique mixed signal that contains a mixture of the original source signals (i.e. the mixed signal that is detected by each microphone in the array includes a mixture of the separate speakers' speech), but the mixed signals may not be simple instantaneous mixtures of just the sources. Rather, the mixtures can be convolutive mixtures, resulting from room reverberations and echoes (e.g. speech signals bouncing off room walls), and may include any of the complications to the mixing process mentioned above.
Mixed signals to be used for source separation can initially be time domain representations of the mixed observations (e.g. in the cocktail party problem mentioned above, they would be mixed audio signals as functions of time). ICA processes have been developed to perform the source separation on time-domain signals from convolutive mixed signals and can give good results; however, the separation of convolutive mixtures of time domain signals can be very computationally intensive, requiring lots of time and processing resources and thus prohibiting its effective utilization in many common real world ICA applications.
A much more computationally efficient algorithm can be implemented by extracting frequency data from the observed time domain signals. In doing this, the convolutive operation in the time domain is replaced by a more computationally efficient multiplication operation in the frequency domain. A Fourier-related transform, such as a short-time Fourier transform (STFT), can be performed on the time-domain data in order to generate frequency representations of the observed mixed signals and load frequency bins, whereby the STFT converts the time domain signals into the time-frequency domain. A STFT can generate a spectrogram for each time segment analyzed, providing information about the intensity of each frequency bin at each time instant in a given time segment.
Traditional approaches to frequency domain ICA involve performing the independent component analysis at each frequency bin (i.e. independence of the same frequency bin between different signals will be maximized) without any constraints derived from prior information. Unfortunately, this approach inherently suffers from a well-known permutation problem, which can cause estimated frequency bin data of the source signals to be grouped in incorrect sources. As such, when resulting time domain signals are reproduced from the frequency domain signals (such as by an inverse STFT), each estimated time domain signal that is produced from the separation process may contain frequency data from incorrect sources.
Various approaches to solving the misalignment of frequency bins in source separation by frequency domain ICA have been proposed. However, to date none of these approaches achieve high enough performance in real world noisy environments to make them an attractive solution for acoustic source separation applications.
Conventional approaches include performing frequency domain ICA at each frequency bin as described above and applying post-processing that involves correcting the alignment of frequency bins by various methods. However, these approaches can suffer from inaccuracies and poor performance in the correcting step. Additionally, because these processes require an additional processing step after the initial ICA separation, processing time and computing resources required to produce the estimated source signals are greatly increased.
Moreover, moving sources can especially complicate source separation because the movements alter the mixing process that mixes the separate source signals before being observed, causing the underlying mixing models used in the separation process to change over time. As such, the source separation process has to account for new mixing models, and utilizing ICA for source separation of moving sources typically requires estimating new mixing models each time any of the sources change position. When using this approach without any further constraints, extremely large amounts of data are needed to produce accurate source separation models from real-time data, rendering the source separation process inefficient and impractical.
To date, known approaches to frequency domain ICA suffer from one or more of the following drawbacks: inability to accurately align frequency bins with the appropriate source, requirement of a post-processing that requires extra time and processing resources, poor performance (i.e. poor signal to noise ratio), inability to efficiently analyze multi-source speech, complex optimization functions that consume processing resources, and a requirement for a limited time frame to be analyzed.
For the foregoing reasons, there is a need for methods and apparatus that can efficiently implement frequency domain independent component analysis to produce estimated source signals from a set of mixed signals without the aforementioned drawbacks. It is within this context that a need for the present invention arises.