1. Technical Field
The invention relates to the processing of audio signals. More particularly, the invention relates to a method and apparatus for removing or isolating voice or instruments on stereo recordings, and further relates to a training-based method and apparatus for removing or isolating voice or instruments on stereo recordings.
2. Description of the Prior Art
There are many situations in which we have a stereo recording of many voice or music sources mixed together, but we wish to listen to only one of those sources. In music, we may want to remove the lead vocals to perform karaoke, so that a new singer can seem to sing with the rest of the instruments or voices on the recording. In speech, we may have a recording that contains many independent or quasi-independent voices, and we wish to listen to only one. Separating this one voice out has been called the cocktail party problem.
Existing systems that try to solve these problems suffer from several shortcomings:
There is much musical noise in the output, which sounds like beeps and tones added to the desired sound.
Sources that are much louder in one channel than the other are difficult to separate, and may not even be separable.
Some systems require that sources be independent, i.e. random relative to each other, and they thus perform poorly on music because musicians play together.
Some systems produce monophonic output, which is perceptually less desirable than stereo output. By producing monophonic output, they discard panning (loudness) and phase (delay) information between channels. Some systems discard both, some discard one, and some discard neither.
Some systems do not use all of the information between the two stereo channels, preventing them from achieving the best results.
We have several reasons to extract any voice or instrument on a recording. For example, we may then listen to it separately, re-mix or re-pan it on another recording, or analyze it using speech or pitch recognition software.
In the source separation community, the demixing of N sources from two-channel (stereo) input is termed underdetermined source separation when N>2.
Underdetermined sound source separation systems perform four steps in general: front end processing, mixing parameter estimation, demixing, and front end inversion.
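For orientation only, the four steps can be sketched in a few lines of Python. This is a generic illustration, not the method of the invention. It assumes a framewise discrete Fourier transform as the front end, panning-only mixing with known gains, and a simple binary mask that assigns each frequency bin to the source with the nearest pan value. The names `separate`, `tone` and `PANS` are hypothetical, and the mixing parameter estimation step is stubbed by supplying the pan values directly.

```python
import cmath
import math


def dft(frame):
    """Forward DFT of one frame (front end processing)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT of one frame (front end inversion); returns the real part."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def separate(left, right, pans, target, frame_len=32):
    """Keep only the bins whose pan index is nearest to pans[target].

    Assumption: each source is mixed with gains (1 - pan) on the left and
    pan on the right. Samples beyond the last full frame are dropped.
    """
    out_l, out_r = [], []
    for start in range(0, len(left) - frame_len + 1, frame_len):
        # Step 1: front end processing (non-overlapping rectangular frames).
        spec_l = dft(left[start:start + frame_len])
        spec_r = dft(right[start:start + frame_len])
        keep_l, keep_r = [], []
        for bin_l, bin_r in zip(spec_l, spec_r):
            total = abs(bin_l) + abs(bin_r)
            keep = False
            if total > 1e-9:  # ignore bins that are numerically silent
                # Step 3: demixing by a binary mask on the pan index.
                pan = abs(bin_r) / total
                nearest = min(range(len(pans)), key=lambda i: abs(pans[i] - pan))
                keep = nearest == target
            keep_l.append(bin_l if keep else 0j)
            keep_r.append(bin_r if keep else 0j)
        # Step 4: front end inversion, one frame at a time (stereo output).
        out_l.extend(idft(keep_l))
        out_r.extend(idft(keep_r))
    return out_l, out_r


def tone(bin_index, n_samples, frame_len=32):
    """Sine tone centred on one DFT bin, so it does not leak into other bins."""
    return [math.sin(2 * math.pi * bin_index * t / frame_len) for t in range(n_samples)]


# Step 2 (mixing parameter estimation) is stubbed: the pans are assumed known.
PANS = [0.1, 0.5, 0.8]
sources = [tone(2, 64), tone(5, 64), tone(9, 64)]  # N = 3 sources, 2 channels
left = [sum((1 - p) * s[t] for p, s in zip(PANS, sources)) for t in range(64)]
right = [sum(p * s[t] for p, s in zip(PANS, sources)) for t in range(64)]

centre_l, centre_r = separate(left, right, PANS, target=1)
```

In this toy case the three tones never share a frequency bin, so the mask recovers the centre-panned source essentially exactly. Real recordings overlap in time and frequency, which is where a hard mask of this kind tends to produce the musical noise described above.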
The invention herein disclosed focuses primarily on the demixing stage, which occurs after the mixing parameters are known.