This specification relates to signal processing, and, more particularly, to systems and methods for language-informed source separation.
Statistical signal modeling is a challenging technical field, particularly when it deals with mixed signals, i.e., signals produced by two or more sources.
In audio processing, most sounds may be treated as a mixture of various sound sources. For example, recorded music typically includes a mixture of overlapping parts played with different instruments. Also, in social environments, multiple people often speak concurrently, a situation referred to as the “cocktail party effect.” In fact, even so-called single sources can actually be modeled as a mixture of sound and noise.
The human auditory system has an extraordinary ability to differentiate between constituent sound sources. This basic human skill remains, however, a difficult problem for computers.