This specification relates to signal processing, and, more particularly, to systems and methods for concurrent signal recognition.
In most applications, any given signal may be treated as a mixture of signals from various sources. In the field of audio processing, for example, recorded music typically includes a mixture of overlapping parts played on different instruments. Likewise, in social environments, multiple people often speak concurrently, a situation referred to as the “cocktail party effect.” In fact, even signals from so-called single sources can be modeled as a mixture of signal and noise.
Recognizing concurrent, superimposed, or otherwise overlapping signals is a difficult task. Current models for signal recognition cannot easily be extended to handle additive interference, and they often need to be complemented by a source separation algorithm that preprocesses the data before recognition takes place. This combination is risky because the output of a separation algorithm is not guaranteed to be recognizable, at least not by typical recognition systems.
A different, temporally sensitive approach characterizes signals from concurrent sources with Hidden Markov Models (HMMs). The summed signal (for example, the sum of concurrent speech) is then characterized by a factorial HMM, which is essentially a product of the HMMs representing the individual sources. Inference can be run on the factorial HMM to determine what was emitted by each individual source. Still, this approach involves source separation and computationally intensive operations.
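The factorial HMM construction described above can be sketched as follows. This is a minimal illustration of the general technique, not the method of this specification. It assumes two discrete-state sources, a hypothetical additive emission model in which each observation equals the sum of the two sources' state means plus Gaussian noise, and hypothetical helper names (`kron`, `factorial_forward`, `source_marginals`). The joint transition matrix is the Kronecker product of the per-source matrices, so the joint state space has N1·N2 states and exact forward inference costs O(T·(N1·N2)²), which illustrates why the approach is computationally intensive.

```python
import math


def kron(a, b):
    """Kronecker product of two square matrices (lists of lists).

    Joint state k encodes the source-state pair (k // m, k % m), where m = len(b).
    """
    n, m = len(a), len(b)
    return [[a[i // m][j // m] * b[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]


def gaussian_pdf(x, mean, std):
    """Density of a scalar Gaussian observation."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def factorial_forward(obs, pi1, A1, mu1, pi2, A2, mu2, std):
    """Scaled forward algorithm on a two-chain factorial HMM.

    Hypothetical emission model: each observation is mu1[s1] + mu2[s2] plus
    Gaussian noise with standard deviation std. Returns the log-likelihood of
    obs and the filtered posterior over joint states at the last time step.
    """
    n, m = len(pi1), len(pi2)
    size = n * m                      # joint state space grows multiplicatively
    A = kron(A1, A2)                  # joint transition = A1[s1][t1] * A2[s2][t2]
    pi = [pi1[k // m] * pi2[k % m] for k in range(size)]
    means = [mu1[k // m] + mu2[k % m] for k in range(size)]

    log_lik = 0.0
    alpha = pi
    for t, y in enumerate(obs):
        if t > 0:
            # Predict step: O(size**2) work per time step.
            alpha = [sum(alpha[j] * A[j][k] for j in range(size))
                     for k in range(size)]
        # Update step: weight each joint state by the observation density.
        alpha = [alpha[k] * gaussian_pdf(y, means[k], std) for k in range(size)]
        c = sum(alpha)                # c = p(y_t | y_1, ..., y_{t-1})
        log_lik += math.log(c)
        alpha = [a / c for a in alpha]
    return log_lik, alpha


def source_marginals(joint_post, n, m):
    """Marginalize a joint-state posterior into per-source state posteriors."""
    p1 = [sum(joint_post[i * m + j] for j in range(m)) for i in range(n)]
    p2 = [sum(joint_post[i * m + j] for i in range(n)) for j in range(m)]
    return p1, p2


if __name__ == "__main__":
    # Illustrative parameters only; not taken from any real recording.
    A1 = [[0.9, 0.1], [0.2, 0.8]]
    A2 = [[0.7, 0.3], [0.4, 0.6]]
    ll, post = factorial_forward([0.1, 1.2, 2.9],
                                 [0.6, 0.4], A1, [0.0, 2.0],
                                 [0.5, 0.5], A2, [0.0, 1.0], std=0.5)
    p1, p2 = source_marginals(post, 2, 2)
    print("log-likelihood:", ll)
    print("source 1 posterior:", p1)
    print("source 2 posterior:", p2)
```

Inference here runs on the joint state space and only afterwards marginalizes to per-source posteriors, which is one way to read "determine what was emitted by individual sources." With K sources of N states each, the joint space has N^K states, so exact inference quickly becomes impractical.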