Known systems configured to extract a specific voice track from an audio mix depend on special properties of the original mix. For example, they require the voice to be panned to the center. Systems that cancel or fade out the main vocals from a stereo mix are based on the premise that, in most popular music, the voice is panned to the center. Such systems fail, however, for vocals that are not panned to the center, and they cannot remove instruments.
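The center-panning premise can be illustrated with a minimal sketch. It assumes the common channel-subtraction approach (left minus right), which is one way such cancellation may be done and is not necessarily how any particular known system works. The signals, gains, and function names (`cancel_center`, `mix`, `rms`) are hypothetical and chosen only for illustration.

```python
import math

SAMPLE_RATE = 8000          # hypothetical sample rate in Hz
N = SAMPLE_RATE // 10       # 0.1 s of audio

# Hypothetical test signals: a "vocal" tone and a "guitar" tone.
vocal = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(N)]
guitar = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(N)]


def mix(vocal_left_gain, vocal_right_gain):
    """Build a stereo mix: guitar hard left, vocal with the given channel gains."""
    left = [vocal_left_gain * v + g for v, g in zip(vocal, guitar)]
    right = [vocal_right_gain * v for v in vocal]
    return left, right


def cancel_center(left, right):
    """Subtract the channels; content with equal gain in both channels cancels."""
    return [l - r for l, r in zip(left, right)]


def rms(signal):
    """Root-mean-square level of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


# Case 1: vocal panned to the center (equal gain in both channels).
left, right = mix(0.5, 0.5)
centered_out = cancel_center(left, right)

# Case 2: vocal panned off-center (unequal gains).
left, right = mix(0.8, 0.2)
off_center_out = cancel_center(left, right)

# Vocal energy left in each output after cancellation.
vocal_left_centered = rms([o - g for o, g in zip(centered_out, guitar)])
vocal_left_off_center = rms([o - g for o, g in zip(off_center_out, guitar)])
```

In this sketch the centered vocal cancels almost exactly, while the off-center vocal survives at a reduced level. The subtraction also offers no way to target the guitar, which matches the limitation described above.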
Systems are also known which extract data concerning notes, force, instruments, and duration from an audio mix. This data is then used to resynthesize the audio mix. Such systems do not output the exact music played in the original mix but a resynthesized version of it, which may degrade sound quality and lead to a loss of the original timbre of the instruments.