Multi-channel audio material is becoming increasingly popular in the consumer home environment as well. This is mainly because movies on DVD offer 5.1 multi-channel sound, so that even home users frequently install audio playback systems that are capable of reproducing multi-channel audio. Such a setup consists, e.g., of three speakers L, C, R in the front, two speakers Ls, Rs in the back and a low frequency enhancement channel LFE, and provides several well-known advantages over 2-channel stereo reproduction, e.g.:
improved front image stability even outside of the optimal central listening position due to the Center channel (larger “sweet-spot”=optimum listening position);
increased sense of listener “involvement” created by the rear speakers.
Nevertheless, there exists a huge amount of legacy audio content, which consists only of two (“stereo”) audio channels, e.g. on Compact Discs (CDs).
To play back two-channel legacy audio material over a 5.1 multi-channel setup there are two basic options:
1. Reproduce the left and right channel stereo signals over the L and R speakers, respectively, i.e., play the material back in the legacy way. This solution does not take advantage of the extended loudspeaker setup (Center and rear loudspeakers).
2. Use a method to convert the two channels of the content material to a multi-channel signal (this may happen “on the fly” or by means of preprocessing) that makes use of all the 5.1 speakers and in this way benefits from the previously discussed advantages of the multi-channel setup.
Solution #2 clearly has advantages over #1, but also poses some problems, especially with respect to the conversion of the two front channels (Left and Right=LR) to three front channels (Multi-channel Left, Center and Right=L′C′R′).
A good LR to L′C′R′ conversion solution should fulfill the following requirements:
1) To recreate a similar, but more stable front image in the L′C′R′ case than in the LR playback case, the Center channel shall reproduce all the sound events which usually are perceived to come from the middle between the Left and Right loudspeaker, if the listener is in the “sweet spot”. Furthermore, signals in left front positions shall be reproduced by L′C′, and signals in the right front positions shall be reproduced by R′C′, respectively (see J. M. Jot and C. Avendano, “Spatial Enhancement of Audio Recordings”, AES 23rd Conference, Copenhagen, 2003).
2) The sum of the acoustical energy emitted by the channels L′C′R′ should be equal to the sum of the acoustical energy of the source channels LR in order to achieve an equally loud sound impression for L′C′R′ as for LR. Assuming equal characteristics in all reproduction channels, this translates into “the sum of the electrical energy of the channels L′C′R′ should be equal to the sum of the electrical energy of the source channels LR.”
Due to requirement #1, the signals of the Left and Right channels may be mixed into one (single) Center channel. This is particularly true if the Left and the Right channel signals are nearly identical, i.e., they represent a phantom sound source in the middle of the front sound stage. This phantom image is now replaced by a “real” image generated by the Center speaker. Due to requirement #2, this Center signal shall carry the sum of the Left and the Right energy. If the level of the Left or the Right channel signal is close to the maximum amplitude that can be transmitted by the channel (=0 dBFS; dBFS=dB Full Scale), the sum of the levels of both channels will exceed the maximum level which can be represented by the channel/system. This usually results in the undesirable effect of “clipping”.
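The conflict between requirement #2 and the 0 dBFS limit can be illustrated with a minimal Python sketch. It is not a conversion method from the cited literature. It only covers the special case of identical Left and Right signals, and it assumes floating-point samples with a full scale of ±1.0. The function names energy and center_for_identical_channels are hypothetical.

```python
import math

def energy(samples):
    """Sum of squared sample values (electrical energy up to a constant factor)."""
    return sum(s * s for s in samples)

def center_for_identical_channels(left, right):
    """Route a centered phantom source entirely to the Center channel.

    Assumption: left and right are identical. Only then does the gain
    1/sqrt(2) applied to the sum make the Center energy equal to the
    sum of the Left and Right energies.
    """
    return [(l + r) / math.sqrt(2.0) for l, r in zip(left, right)]

# One period of a sine close to full scale (0.9) in both channels.
left = [0.9 * math.sin(2.0 * math.pi * k / 32) for k in range(32)]
right = list(left)
center = center_for_identical_channels(left, right)

print(energy(left) + energy(right), energy(center))  # the two energies agree
print(max(abs(s) for s in center))  # about 1.27, i.e. above the full scale of 1.0
```

In this sketch the energy is preserved, but the Center peak is larger than the input peak by a factor of the square root of 2, so any input peak above roughly 0.707 of full scale no longer fits into the channel.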
The clipping situation is shown in FIG. 6. FIG. 6 illustrates a time waveform of a signal 60 processed by a processor having a maximum positive threshold 61a and a maximum negative threshold 61b. Depending on the capability of the digital processor processing the digital signal, the maximum positive threshold and the maximum negative threshold may be +1 and −1. Alternatively, when a digital processor is used representing the numbers as integers, the maximum positive threshold will be 32768 corresponding to 2^15, and the maximum negative threshold will be −32768 corresponding to −2^15.
Since a time waveform signal is represented by a sequence of samples, each sample being a digital number between −32768 and +32768, it is clear that larger numbers can result when, at a certain time instant, the first channel and the second channel both have quite high values and these values are added together. Theoretically, the maximum number obtained by adding two channels together can be 65536. However, the digital signal processor is not able to represent this high number. Instead, the digital processor will only represent numbers between the maximum negative threshold and the maximum positive threshold. Therefore, the digital signal processor performs clipping in that a number greater than or equal to the maximum positive threshold is replaced by a number equal to the maximum positive threshold, and a number less than or equal to the maximum negative threshold is replaced by a number equal to the maximum negative threshold, so that the situation illustrated in FIG. 6 appears. Within a clipping time portion 62, the waveform 60 does not have its natural (sine) shape, but is flattened or clipped. When this clipped waveform is evaluated from a spectral point of view, it becomes clear that this time domain clipping results in strong harmonic components caused by a high gradient magnitude at the beginning and the end of the clipping time portion 62.
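The clipping behavior described above may be sketched as follows. The sketch assumes integer samples and uses the thresholds +32768 and −32768 named in the text as defaults (a real 16-bit two's-complement format has +32767 as its largest positive value). The function names clip and mix_and_clip are hypothetical.

```python
import math

def clip(sample, max_pos=32768, max_neg=-32768):
    """Replace any value beyond a threshold by the threshold itself."""
    return max(max_neg, min(max_pos, sample))

def mix_and_clip(left, right):
    """Add two channels sample by sample and clip the sum."""
    return [clip(l + r) for l, r in zip(left, right)]

# One period of a sine with amplitude 30000 in both channels.
left = [round(30000 * math.sin(2.0 * math.pi * k / 16)) for k in range(16)]
right = list(left)
mixed = mix_and_clip(left, right)

# Around the positive and negative peaks the sum would reach +/-60000,
# so those samples are flattened to the thresholds.
print(mixed)
```

Samples whose sum stays inside the thresholds pass unchanged, while the samples around each peak are flattened. This corresponds to the clipping time portion 62 of FIG. 6.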
This “digital clipping” is not related to the replay setup, i.e., the amplifier and the loudspeakers used for rendering the audio signal. However, each amplifier/loudspeaker combination also has only a limited linear range, and, when this linear range is exceeded by a processed signal, also a kind of clipping takes place, which can be avoided using the inventive concept.
In any case, the occurrence of clipping introduces heavy distortions in the audio signal, which severely degrade the perceived sound quality. Thus, the occurrence of clipping has to be avoided. This is all the more important because the sound improvement obtained by rendering a stereo signal over a multi-channel setup such as a 5.1 speaker system is small compared to the very annoying clipping distortions. Therefore, when one cannot guarantee that clipping does not occur, one would prefer to use only the left and the right speakers of a multi-channel setup for rendering a stereo signal.
There exist prior art solutions to overcome this clipping problem.
A simple solution to overcome this problem is to scale down all channels equally to a level where none of the channel signals (especially the Center signal) exceeds the 0 dBFS limit. This can be done statically by a predefined fixed value. In this case the fixed value must also be valid for worst case situations, where the Left and Right channels have maximum levels. For the average LR to L′C′R′ conversion this leads to a significantly quieter L′C′R′ version than the original stereo LR, which is undesirable, especially when users are switching between stereo and multi-channel reproduction. This behavior can be observed in commercially available matrix decoders (Dolby ProLogicII and Logic7 Decoder) that can be used as LR to L′C′R′ converters. See Dolby Publication: “Dolby Surround Pro Logic II Decoder—Principles of Operation”, http://www.dolby.com/assets/pdf/tech_library/209_Dolby_Surround_Pro_Logic_II_Decoder_Principles_of_Operation.pdf or Griesinger, D.: “Multichannel Matrix Surround Decoders for Two-Eared Listeners”, 101st AES Convention, Los Angeles, USA, 1996, Preprint 4402.
Another simple solution is to use dynamic range compression in order to dynamically (depending on the signal) limit the peak signal, sometimes also called a “limiter”. A disadvantage of this approach is that the true dynamic range of the audio program is not reproduced but subjected to compression (see Digital Audio Effects DAFX; Udo Zölzer, Editor; 2002; Wiley & Sons; p. 99ff: “Limiter”).
The downscaling solution is undesirable, since it reduces the level or volume of a sound signal compared to the level of the original signal. To rule out even the theoretical occurrence of clipping, one would have to downscale all channels by a scaling factor equal to 0.5. This results in a strongly reduced output level of the multi-channel signal compared to the original signal. When one only listens to this downscaled multi-channel signal, one can compensate for this level reduction by increasing the amplification of the sound amplifier. However, when one switches between several sources, the (legacy) stereo signal will appear very loud to a listener when it is replayed using the same amplification setting of the amplifier as set for the multi-channel reproduction.
Thus, a user would have to think about reducing the amplification setting of his or her amplifier before switching from a multi-channel representation of a stereo signal to a true stereo representation of the stereo signal in order not to damage his or her ears or equipment.
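The static downscaling discussed above can be sketched in a few lines of Python. The sketch makes simplifying assumptions: floating-point samples with a full scale of ±1.0, a Center signal formed as the plain sum of Left and Right, and the worst-case scaling factor 0.5 named in the text. The function names downscale and level_change_db are hypothetical and do not describe any commercial decoder.

```python
import math

def downscale(channels, gain=0.5):
    """Scale every sample of every channel by the same fixed gain."""
    return [[gain * s for s in channel] for channel in channels]

def level_change_db(gain):
    """Level change in dB caused by a linear gain factor."""
    return 20.0 * math.log10(gain)

# Worst case: both source channels at full scale with equal sign.
left = [1.0, -1.0, 1.0]
right = [1.0, -1.0, 1.0]
center = [l + r for l, r in zip(left, right)]  # peaks at 2.0, beyond full scale

scaled_left, scaled_center, scaled_right = downscale([left, center, right])
print(max(abs(s) for s in scaled_center))  # back at full scale, no clipping
print(level_change_db(0.5))                # roughly -6 dB
```

The fixed gain keeps the worst case inside the representable range, but it lowers every program, including quiet ones that would never have clipped, by about 6 dB.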
The other prior art method using dynamic range compression effectively avoids clipping. However, the audio signal itself is changed. Thus, the dynamic compression leads to a non-authentic audio signal, which, even when the introduced artifacts are not too annoying, is questionable from the authenticity point of view.
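Why a limiter changes the audio signal itself can be seen from a minimal Python sketch of a peak limiter. It is an illustrative design only, not the limiter described in the cited DAFX book. It assumes floating-point samples, an instantaneous attack and an exponential gain recovery. The function name limit and the parameters threshold and release are hypothetical.

```python
def limit(samples, threshold=1.0, release=0.999):
    """Very simple peak limiter: instant attack, slow exponential release."""
    gain = 1.0
    out = []
    for s in samples:
        peak = abs(s)
        # Gain that would keep this sample at or below the threshold.
        target = threshold / peak if peak > threshold else 1.0
        if target < gain:
            gain = target  # attack: reduce the gain immediately
        else:
            # release: move the gain slowly back toward the target
            gain = target + (gain - target) * release
        out.append(s * gain)
    return out

# A quiet sample, an over-range peak, then the same quiet sample again.
limited = limit([0.5, 1.6, 0.5])
print(limited)
```

The peak is held at the threshold, so no clipping occurs, but the quiet sample following the peak is reproduced at a lower level than the identical quiet sample preceding it. This signal-dependent gain is the change in dynamics that makes the result questionable from the authenticity point of view.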