Conventional reproduction of stereophonic audio on two loudspeakers dates back to the 1930s, with the invention of Blumlein stereo (British Pat. No. GB394,325). In accordance with the teachings of Blumlein, an audio signal is recorded and transmitted as a set of two channels, allowing each of two synchronized loudspeakers to reproduce a different audio signal, where the phase differences and amplitude differences between the two signals create imaginary sound-source locations at the listener's ears. These imaginary sound sources are referred to in the art as ‘phantom images’. The totality of phantom images is commonly referred to as the ‘stereo image’.
The invention of stereo and phantom images revolutionized audio reproduction technologies. For example, by maintaining certain relations between the signals in the two stereo channels, the perceived direction of each phantom image can be set such that it closely corresponds to the direction of the real source in a recorded acoustic environment, as long as that direction is not to the left of the leftmost loudspeaker or to the right of the rightmost loudspeaker. Using stereo-related technology it is also possible to generate a stereo signal from a mono signal (one channel), such that the mono sound source appears as a phantom image in a desired direction, by simply routing the mono signal into both channels of the stereo and manipulating the relative amplitudes of the channels or their relative delays. This method is commonly referred to as ‘panning’ and is described in greater detail in Griesinger D., Stereo and Surround panning in practice, 112th Audio Engineering Society Convention, Germany, 2002 (hereinafter “Griesinger”).
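By way of illustration only, amplitude panning of a mono signal may be sketched as follows. The constant-power (sine/cosine) gain law, the function name pan_mono and the position parameter are assumptions made for this example; they are not taken from Griesinger or from the description above, and delay-based panning is not shown.

```python
import math


def pan_mono(samples, position):
    """Route a mono signal into two channels with different gains.

    position is a hypothetical control in [-1.0, 1.0]:
    -1.0 = fully left, 0.0 = center, +1.0 = fully right.
    A constant-power pan law is assumed here for illustration.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must lie in [-1.0, 1.0]")
    # Map the position onto an angle between 0 and pi/2.
    angle = (position + 1.0) * math.pi / 4.0
    gain_left = math.cos(angle)
    gain_right = math.sin(angle)
    left = [gain_left * s for s in samples]
    right = [gain_right * s for s in samples]
    return left, right


# A source panned to the center is fed equally to both channels.
left, right = pan_mono([1.0, -0.5, 0.25], 0.0)
```

Under this assumed gain law, a position of 0.0 gives both channels the same gain, which is the situation in which a phantom image is expected midway between the loudspeakers, while a position of -1.0 leaves the right channel silent.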
In conventional stereo, the perceived direction of a phantom image in steady-state sound is determined by the phase difference between the channels at low frequencies, and by the amplitude difference between the channels at high frequencies, as is described in greater detail in Bernfeld B., Attempts for better understanding of the directional stereophonic listening mechanism, 44th Audio Engineering Society Convention, February 1973 (hereinafter “Bernfeld”). For transient sounds, if there is a delay difference between the transients in the two channels, then the inter-channel delay and the Haas effect are also involved in the perceived phantom direction, as is described in greater detail in Gardner M. B., Historical background of the Haas and/or precedence effect, J. of Acoustical Society of America, No. 43, 1968 (hereinafter “Gardner M. B”).
Many alternative audio reproduction methods using two or more loudspeakers have been proposed in the prior art. Still, conventional stereo remains the most popular method. In conventional stereo reproduction, three main procedures (or combinations of them) are used to obtain a stereophonic audio signal: (1) Stereo is recorded as two channels via a stereophonic microphone technique, (2) A single mono channel is recorded and stereo is generated from the mono channel by amplitude panning as described above, and (3) A single mono channel is recorded and artificial effects (such as artificial reverberation, delay effects, phase effects, HRTF (“Head Related Transfer Functions”) filters) are used to generate artificial two-channel stereo. Other methods also exist. For all three procedures described here, sounds may appear to the listener to arrive from the center position between the loudspeakers. This effect is called “phantom center” and is generally perceived only when the two channels of the stereo signal contain a part which relates to “direct sound” and that part is identical or almost identical in the two channels (see Bernfeld, referenced above). Phantom center differs from “hard-center”, which is an attempt to reproduce sound arriving from the center using an additional (typically a 3rd) loudspeaker positioned in between the left and right frontal speakers and substantially in front of the listener.
A stereo signal may contain a mixture of many sound sources whose phantom images may appear to arrive from many directions: center, sides, and in between. In many applications it is important to separate the sources which generate the center phantom images. For example, standard surround sound reproduction formats typically use 3 frontal loudspeakers, including a “hard-center” loudspeaker. If a two-channel stereo recording is reproduced on a surround loudspeaker system, the center channel needs to be generated artificially by extracting audio from the two-channel input signal, such as in matrix surround decoders (for example, see U.S. Pat. No. 4,799,260 to Mandell, et al.). In cinema applications it is important to separate dialogue audio, usually residing in the center direction, from the rest of the audio mixture, in order to make the dialogue clearer and more intelligible without substantially affecting the background and music. Further by way of example, in karaoke applications one of the desired features is the ability to take an ordinary song and remove from it the lead vocals, which usually reside in the center direction.
In other applications, when applying an artificial sound effect to a stereo audio signal, the effect to be applied sometimes needs to change in accordance with the sound direction (that is, according to the phantom image direction). Such is the case, for example, when applying artificial acoustic filters to the audio (early reflections, Doppler effects), when applying stereo widening effects (width matrix), or when applying virtualization effects (such as cross-talk cancellation, HRTF filters, dipole processing). In such cases, one may need to decompose the mixture into all of its individual sound sources and apply the desired effect separately to each. This task is considered difficult and, in practice, virtually impossible. While some blind-source-separation (“BSS”) methods, which attempt to “guess” the sound sources, do function in rather non-reverberant and well-defined acoustics, modern stereo music uses a complex mixture of microphone techniques, reverberant spaces, panning techniques, and a great number of effects (linear and non-linear) that make BSS practically impossible. For this application too, a more practical approach would be to separate only center sources from side sources. Since applying a sound effect which is designed for the sides to the center sound sources (usually consisting primarily of vocals) would introduce audible artifacts to those center sources, separation of just the center sources may be effective for eliminating the artifacts. In the same manner, one may also apply an effect to the center sources only.
For many of the applications described above, a system that separates the center sources from the stereo mixture may usefully be required to satisfy two conditions, or “ideals”, in order to keep the sound reproduction of the processed stereo substantially faithful to the original stereo. For convenience, a mathematical representation of the two conditions is now provided by way of example. For an input stereo pair L (left) and R (right), and for the three separated channels denoted Cx (center), Lx (left) and Rx (right):
    (1) Condition C1: L=Lx+g*Cx and R=Rx+g*Cx, with a gain g. A common value used for g is g=sqrt(½), so that the original stereo is reproduced by splitting the center energy between the left and the right channels.
    (2) Condition C2: The stereo channel pair Lx,Rx, when reproduced separately from Cx, should sound to the ears of a human listener close to the original stereo pair L,R from which the sounds arriving from the center have been omitted. Since this is considered virtually or practically impossible, the requirement is instead stated as:
        a. For any individual sound source in the center of the stereo reproduction of L,R, that is, when L=R, it is expected that Cx=g1*L, where g1 is a gain, and that Lx=0 and Rx=0.
        b. For any individual sound source fully panned to one of the sides, that is, L≠0 and R=0 or vice versa, it is expected that Cx=0, Lx=L and Rx=R.
Condition (C1) is important even when the summation does not happen in the music production and the separated center sound channel Cx is transmitted as an individual channel. Note that when the 3-channel audio output is played back on a 2-channel system, which many homes still have, conventional surround receivers and DVD players tend to mix the center channel back into the left and right channels. In surround sound this quality is usually called “stereo mix-down compatibility”. The reproduction still needs to preserve the exact (or close to the exact) original stereo signal when the channels are summed back together. Also, in other applications as described above, when using center separation to apply an audio effect only to the sides or only to the center, it may be important to maintain a quality referred to herein as “transparency”. Transparency essentially means that as the sound effect is minimized, the audio signal becomes as close as desired to the original.
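The two conditions above may be illustrated with a short sketch for the two idealized cases of condition C2. The function names below are hypothetical, and the choice g1=1/g is an inference rather than a value stated above: it is the value for which case (a) of condition C2, with Lx=0 and Rx=0, also satisfies condition C1, since then L=g*Cx=g*g1*L.

```python
import math

G = math.sqrt(0.5)  # the common value of g mentioned for condition C1


def mix_down(lx, rx, cx, g=G):
    """Condition C1: rebuild the stereo pair as L=Lx+g*Cx, R=Rx+g*Cx."""
    left = [l + g * c for l, c in zip(lx, cx)]
    right = [r + g * c for r, c in zip(rx, cx)]
    return left, right


def ideal_split_center(source, g=G):
    """Condition C2(a): a pure center source (L=R) goes only to Cx.

    g1 = 1/g is assumed so that the result also satisfies C1.
    """
    g1 = 1.0 / g
    silence = [0.0] * len(source)
    return silence, list(silence), [g1 * s for s in source]


def ideal_split_side(left, right):
    """Condition C2(b): a fully panned source leaves Cx silent."""
    return list(left), list(right), [0.0] * len(left)


# A center source: mixing the ideal split back down restores L and R.
center = [0.5, -0.25, 0.125]
lx, rx, cx = ideal_split_center(center)
left, right = mix_down(lx, rx, cx)
```

The sketch only checks the two limiting cases; it does not show how a separation system would handle a mixture of center and side sources, which is the difficult part of the problem.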
Motivated by the applications described above, some prior art methods (such as that disclosed, for example, in U.S. Pat. No. 4,748,669 to Klayman) separate the stereo signal into the scaled sum M=(L+R)/2 and the scaled difference S=(L−R)/2, apply some desired effect only to the scaled sum (M) or only to the scaled difference (S), and then regenerate the stereo through the inverse transformation L=M+S and R=M−S. However, it should be noted that taking the scaled sum of the left and right stereo channels, that is, the signal M=(L+R)/2, does not extract the center sound sources from the stereo mix. For example, if a sound source was placed at the far left using amplitude panning, then the right channel is zero and M=L/2, so M also contains half of the left-panned sound source. Likewise, if one attempts to derive the separated Lx and Rx from the difference signal S, it becomes apparent that condition (C2) does not hold for this approach.
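The limitation described above can be checked numerically with a minimal sketch of the sum/difference transformation. The function names are hypothetical and the sample values are arbitrary; the formulas are those given above.

```python
def mid_side(left, right):
    """Scaled sum M=(L+R)/2 and scaled difference S=(L-R)/2."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def inverse_mid_side(mid, side):
    """Inverse transformation L=M+S and R=M-S."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# A source panned fully to the left: the right channel is silent.
left = [1.0, -0.5, 0.25]
right = [0.0, 0.0, 0.0]

mid, side = mid_side(left, right)
# mid equals left/2, so the sum channel is not silent even though
# the mix contains no center source.
```

Because M is non-zero for a fully left-panned source, treating M as the center channel Cx would conflict with case (b) of condition C2, which expects Cx=0 for such a source.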
Other systems that attempt to separate the center sound are surround matrix decoders. In such systems, it is typically assumed that most of the input stereo signals have been pre-encoded into the stereo to localize particular sound events in a surround multi-channel system. As a result of this assumption, and of the requirement that the output of the decoders preserve the directions of the original surround, the matrix decoders must localize the sound, at each instant in time, to only one given direction. It follows that for the common case of a 2-channel stereo input containing a complex mixture of events and directions, condition (C1) does not hold.