There are many devices with two transducers on the market, such as laptops, tablet computers, mobile phones, and smartphones, as well as iPod or smartphone docking stations and soundbars for TVs. Unlike in a conventional stereo system with two discrete loudspeakers, the two transducers of such devices are located in a single cabinet or enclosure and are typically placed very close to each other (due to the size of the device, they are usually spaced only a few centimeters apart, between 2 and cm for mobile devices such as smartphones or tablets). For typical listening distances, the loudspeaker span angle θ as illustrated in FIG. 1a is small, i.e., smaller than the 60 degrees recommended for stereo playback in ITU Recommendation BS.775-3, “Multichannel stereophonic sound system with and without accompanying picture”, ITU-R, 2012.
This results in sound reproduction which is narrow, almost “mono-like”. When a stereo recording is played on such devices, all sound sources are perceived as being centered; any spatial information that would localize sound sources, for example, on the left or on the right side of the listener is missing. Even worse, multi-channel signals intended to create a surround effect with sources placed all around the listener cannot be reproduced using single-cabinet loudspeakers.
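The relation between transducer spacing, listening distance and span angle can be illustrated with a short calculation. The following is a minimal sketch assuming a listener centered in front of the two transducers and treating θ as the full angle between them; the function name and the example dimensions are hypothetical and not taken from the text.

```python
import math

def span_angle_deg(speaker_spacing_m, listening_distance_m):
    """Full angle (degrees) subtended by two speakers at a centered listener."""
    half_angle = math.atan2(speaker_spacing_m / 2.0, listening_distance_m)
    return math.degrees(2.0 * half_angle)

# Hypothetical example geometries (assumed values, for illustration only).
examples = [
    ("single-cabinet device", 0.10, 0.40),          # 10 cm spacing, 40 cm away
    ("discrete stereo pair", 2.00, math.sqrt(3)),   # equilateral-triangle setup
]
for name, spacing, distance in examples:
    print(f"{name}: {span_angle_deg(spacing, distance):.1f} degrees")
```

Under these assumptions the discrete pair reaches the 60 degree span, while the single-cabinet geometry stays far below it.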
A typical approach to increasing the spatial effect of such single-cabinet devices is to use crosstalk cancellation techniques as described by Bauer, B. B., “Stereophonic earphones and binaural loudspeakers”, Journal Audio Engineering Society 9, 148-151, 1961. The general goal of crosstalk cancellation is to attenuate crosstalk. Crosstalk refers to the undesired signal path C between a speaker, e.g. a loudspeaker 105, 107 of a mobile device 103 as depicted in FIG. 1, and the contra-lateral ear, i.e., the path between the right speaker R 107 and the left ear l and the path between the left speaker L 105 and the right ear r as shown in FIG. 1b. As a result of cancelling crosstalk, it is possible to present binaural signals to the listener's ears, which allows acoustic sources 109 to be positioned virtually in an area 111 all around the listener and a stereo widening or virtual surround effect to be obtained, as illustrated in FIG. 1c.
In practice, crosstalk cancellation may be implemented using filter inversion techniques. Channel separation is achieved by means of destructive wave interference at the position of the listener's ears. Intuitively speaking, each desired signal intended for the ipsi-lateral ear and produced by one speaker is output a second time (delayed and phase-inverted) by the other speaker in order to obtain the desired cancellation at the position of the contra-lateral ear. As a result, high signal amplitudes and sound pressure levels must be produced by the speakers only to be cancelled later at the listener's ears. This effect reduces the efficiency of the electro-acoustic system; it may lead to distortions as well as a reduced dynamic range and a reduced maximum output level.
The applicability of crosstalk cancellation systems for creating enhanced spatial effects in mobile devices is limited by the high load they typically put on the electro-acoustic system consisting of amplifiers and speakers.
The performance of crosstalk cancellation based on filter inversion techniques or first-order directivity processing shows strong frequency dependence. In particular for low frequencies, the difference Δl between the direct path D and the crosstalk path C is very small (relative to the wavelength). In this case, the required delay
$$\tau_c = \frac{\Delta l}{c_s}$$
(with the speed of sound $c_s \approx 340$ m/s) is very small, which results in ipsi- and contra-lateral signals being very similar. FIG. 2 shows an example frequency response 200 of a typical crosstalk cancellation filter. Obviously, in particular for low frequencies a large gain is required.
In fact, for small ωτc, a desired attenuation of the contra-lateral signal induces an undesired attenuation of the ipsi-lateral signal. To compensate for this attenuation of the ipsi-lateral signal, a high amplification of certain frequencies is required. In particular for systems with loudspeakers exhibiting small span angles θ (see FIG. 1a), low frequencies need to be amplified significantly and high sound pressure levels need to be produced by the speakers (only to be cancelled later by destructive wave interference at the listener's ears). This results in a significant loss of gain, severely constrains the maximum output level, and limits the dynamic range of the system. Overall, this characteristic, which is common to all crosstalk cancellation techniques, limits the crosstalk cancellation efficiency (i.e., the ratio of the sound pressure of the desired signal at the position of the listener's ears to the overall sound pressure produced by the speakers). In other words, crosstalk cancellation places a high effort on the speakers.
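The frequency dependence described above can be made concrete with a simplified model. The following is a minimal sketch, not the filter of FIG. 2: it assumes free-field point sources, no head shadowing, a symmetric setup, and a crosstalk path that is a delayed copy of the direct path scaled by g = D/C. In that model the ideal canceller applies a gain of 1/|1 − g·e^(−jωτc)| to out-of-phase (side) content. The geometry values, the ear spacing of 0.16 m and all function names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 340.0  # m/s, the value used in the text

def path_lengths(speaker_spacing, distance, ear_spacing):
    """Direct (D) and crosstalk (C) path lengths in metres, centered listener."""
    direct = math.hypot((speaker_spacing - ear_spacing) / 2.0, distance)
    cross = math.hypot((speaker_spacing + ear_spacing) / 2.0, distance)
    return direct, cross

def side_gain_db(freq_hz, direct, cross):
    """Gain (dB) the ideal canceller applies to out-of-phase content."""
    tau_c = (cross - direct) / SPEED_OF_SOUND   # tau_c = delta_l / c_s
    g = direct / cross                          # assumed 1/r level ratio
    omega = 2.0 * math.pi * freq_hz
    # Inverting the symmetric 2x2 path matrix yields the factor
    # 1 / (1 - g * exp(-j * omega * tau_c)) for out-of-phase input.
    return -20.0 * math.log10(abs(1.0 - g * cmath.exp(-1j * omega * tau_c)))

# Assumed example: 10 cm speaker spacing, 40 cm distance, 16 cm ear spacing.
d, c = path_lengths(0.10, 0.40, 0.16)
print(f"delta_l = {1000 * (c - d):.1f} mm, tau_c = {1e6 * (c - d) / SPEED_OF_SOUND:.1f} us")
for f in (100.0, 1000.0, 5000.0):
    print(f"{f:6.0f} Hz: {side_gain_db(f, d, c):6.1f} dB")
```

With these assumed values the required gain grows steadily towards low frequencies, which is the behaviour the text attributes to small ωτc.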
This problem becomes particularly severe for applications of crosstalk cancellation in mobile devices. Such devices typically are equipped with very small speakers and low-power amplifiers. Furthermore, the speakers are placed at small loudspeaker span angles. As the ability to produce high sound pressure levels (in particular for low frequencies) is limited using such small transducers and low-power amplifiers, any further amplification required by the crosstalk cancellation system typically results in inadequately low sound pressure levels, drastically reduced dynamic range, and even distortions resulting from overloading the loudspeakers and amplifiers, as well as saturating the digital signal processing equipment.
Several solutions to this problem exist; they either require an adaptive placement of the speakers in terms of span angle or use regularization to restrict the maximum amplification level.
Regularization (constant-parameter and frequency-dependent regularization) can be used to reduce the loss of dynamic range caused by the system inversion. Regularization constrains the additional amplification introduced by the crosstalk cancellation system. In turn, however, it also constrains the ability of the system to cancel crosstalk and therefore constitutes a means of controlling the unavoidable trade-off between accepted loss of dynamic range and desired attenuation of crosstalk. High dynamic range and high crosstalk attenuation for creating a large spatial effect cannot be achieved simultaneously.
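The trade-off can be illustrated numerically. The sketch below assumes a constant-parameter (Tikhonov-style) regularized inverse of the form (HᴴH + βI)⁻¹Hᴴ applied to a symmetric 2×2 path matrix with unit direct paths and crosstalk paths g·e^(−jωτc). This is one common way to regularize an inversion and is not necessarily the scheme used in any particular prior-art system; the function name and parameter values are hypothetical.

```python
import cmath
import math

def regularized_canceller(omega_tau, g, beta):
    """Return (maximum filter gain in dB, channel separation in dB)."""
    cross = g * cmath.exp(-1j * omega_tau)
    # Eigenvalues of [[1, cross], [cross, 1]]: in-phase and out-of-phase modes.
    modes = (1.0 + cross, 1.0 - cross)
    # Regularized inverse per mode: conj(lam) / (|lam|^2 + beta).
    inv = [lam.conjugate() / (abs(lam) ** 2 + beta) for lam in modes]
    # Overall response per mode after cancellation (1.0 = perfect inversion).
    res = [lam * c for lam, c in zip(modes, inv)]
    direct = abs(res[0] + res[1]) / 2.0      # speaker signal -> intended ear
    crosstalk = abs(res[0] - res[1]) / 2.0   # speaker signal -> opposite ear
    gain_db = 20.0 * math.log10(max(abs(c) for c in inv))
    sep_db = 20.0 * math.log10(direct / crosstalk) if crosstalk > 0 else math.inf
    return gain_db, sep_db

# Assumed low-frequency operating point: small omega*tau_c, g close to 1.
for beta in (0.0, 0.001, 0.01):
    gain, sep = regularized_canceller(0.036, 0.954, beta)
    print(f"beta={beta:<6} max gain {gain:5.1f} dB, separation {sep:6.1f} dB")
```

In this model, raising β lowers the maximum gain and the achieved channel separation together, so neither can be improved without giving up the other.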
Optimal Source Distribution is a technique which reduces the loss of dynamic range by continuously varying the loudspeaker span angle as a function of frequency. For high frequencies, a small loudspeaker span angle is used; for low frequencies, the span angle is progressively increased, resulting in larger ωτc values. This technique requires several loudspeakers (more than two) which span up to 180°. For each frequency range, the loudspeakers which require the least effort, i.e., need to emit the smallest output power, are used. For mobile devices, this solution is not applicable because all speakers are placed in a single (typically small) enclosure, which limits the achievable span angles.
The main advantage of using crosstalk cancellation techniques is that binaural signals can be presented to the listener, which opens the possibility of placing acoustic sources virtually all around the listener's head, spanning the entire 360° azimuth as well as the elevation range, as illustrated in FIG. 3. A number of factors affect the spatial aspects of how a sound is perceived; mainly interaural time difference and interaural level difference cues are relevant for azimuth localization of sound sources.
The separation of an audio signal into frontal and surrounding sources is a well-studied problem in the field of 2-to-3 or 2-to-5 channel up-mixing, see Vickers, E.; “Frequency-Domain Two- To Three-Channel Upmix for Center Channel Derivation and Speech Enhancement,” Audio Engineering Society Convention 127, 2009 and Irwan, R., Aarts, R. M., “Two-to-Five Channel Sound Processing”, JASA 50(11), 2002. Here, given a conventional stereo recording (consisting of 2 channels left L and right R), the goal is to derive additional channels to obtain an additional center channel or 5.1 multi-channel surround sound signal for improved playback using 5.1 speaker setups.
For extracting a center channel, the goal is to decompose a stereo signal by first extracting any information common to the left and right inputs L, R and assigning this to the center channel, and then assigning the residual signal energy to the left and right channels (see FIG. 5a). The same principle can be used for separating the stereo signal into frontal sources and surrounding sources. Here, information common to the left and right channels corresponds to frontal sources M; any residual audio energy is assigned to the left side surrounding SL or right side surrounding SR sources (see FIG. 5b).
The separation may be based on the following signal model as described by Vickers, E.; “Frequency-Domain Two- To Three-Channel Upmix for Center Channel Derivation and Speech Enhancement,” Audio Engineering Society Convention 127, 2009:
$$L = 0.5\,M + S_L, \qquad R = 0.5\,M + S_R,$$
where M corresponds to the common signal parts which are the same in L and R, and SL and SR correspond to the residual side signal parts. The basic assumption is that there is a primary or dominant source P which can be observed in a framed subband representation of the signal. P is assumed to be panned somewhere between the left and the right channel of the input signal. For the separation into common and surrounding signal parts, the idea is to represent P using a mid component M and a side component SL (in the case where P points further to the left side) or a side component SR (in the case where P points further to the right side), see FIG. 5.
As described in Irwan, R., Aarts, R. M., “Two-to-Five Channel Sound Processing”, JASA 50(11), 2002, see FIG. 4, the separation unit 400 may perform PCA (Principal Component Analysis) 403 on framed sub-bands 404 in frequency domain obtained by FFT transform and subband decomposition 401 to derive the signals M, SL, and SR 406, according to the following instructions:
Compute the rotation angle between left and right input channels 404 using PCA (Principal Component Analysis) 403 which corresponds to the direction of the dominant source P in the respective framed sub-band;
Derive M as the projection of the dominant source onto the frontal direction; S represents the remaining parts of the stereo content;
SL and SR can be obtained by mapping S to the more pronounced channel depending on the contribution of L and R to S;
M, SL and SR 406 may be transformed into the time domain 408 by using an IFFT 405.
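The steps above can be sketched for a single frame of one sub-band. The following is a simplified illustration and not the method of the cited references: it assumes real-valued, zero-mean sub-band samples, estimates the rotation angle from the second moments of L and R, and takes as the common part the share of the dominant source P that is present in both channels, so that L = 0.5M + SL and R = 0.5M + SR hold by construction. The function names and the split rule are assumptions.

```python
import math

def pca_angle(left, right):
    """Rotation angle (radians) of the principal axis in the L/R plane."""
    c_ll = sum(l * l for l in left)
    c_rr = sum(r * r for r in right)
    c_lr = sum(l * r for l, r in zip(left, right))
    return 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)

def separate(left, right):
    """Split one frame into (M, SL, SR) with L = 0.5*M + SL, R = 0.5*M + SR."""
    theta = pca_angle(left, right)
    c, s = math.cos(theta), math.sin(theta)
    # Share of the dominant source that appears in both channels (assumed
    # rule); zero if the channels are anti-correlated (theta < 0).
    common = max(0.0, min(c, s))
    mid, side_l, side_r = [], [], []
    for l, r in zip(left, right):
        p = l * c + r * s            # dominant source P along the principal axis
        m = 2.0 * common * p         # chosen so that 0.5 * M is the common part
        mid.append(m)
        side_l.append(l - 0.5 * m)   # residual remaining in the left channel
        side_r.append(r - 0.5 * m)   # residual remaining in the right channel
    return mid, side_l, side_r

# Hypothetical test frame: one source panned 30 degrees towards the left.
src = [math.sin(0.3 * n) for n in range(64)]
phi = math.radians(30.0)
L = [math.cos(phi) * x for x in src]
R = [math.sin(phi) * x for x in src]
M, SL, SR = separate(L, R)
print(f"estimated angle: {math.degrees(pca_angle(L, R)):.1f} degrees")
```

For a single amplitude-panned source, the residual in this sketch ends up entirely in the more pronounced channel (here SL), while a source panned to the center yields SL = SR = 0.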
Many different solutions may be applied to obtain the desired separation and different terms may be used for the different components, e.g., common or centered or frontal parts are equivalent terms, also surrounding or side or ambient parts are equivalent terms.
The mid signal M contains all frontal sources; the side signals SL and SR contain the surrounding sources. For widening the stereo signal when playing on mobile devices with small loudspeaker span angles, stereo widening using crosstalk cancellation is only required for processing the surrounding signals SL and SR. The mid signal M containing the frontal sources can be reproduced using conventional amplitude panning.
Applications of crosstalk cancellation techniques as described above in mobile devices with the goal to create an enhanced spatial effect (stereo widening, virtual surround playback, binaural reproduction) suffer from either low channel separation (low attenuation of crosstalk) or low dynamic range and limited maximum output level when achieving high attenuation of crosstalk. Prior art solutions only provide a means for controlling the unavoidable trade-off between the two contradicting aspects.