Multi-channel audio material is increasing in popularity. This has resulted in many end users now possessing multi-channel reproduction systems. This can mainly be attributed to the fact that DVDs are increasing in popularity and that many users of DVDs are now in the possession of 5.1 multi-channel equipment. Reproduction systems of this kind generally include three loudspeakers L (left), C (center) and R (right) which are typically arranged in front of the user, and two loudspeakers Ls and Rs arranged behind the user, and typically one LFE channel which is also referred to as low frequency effect channel or subwoofer. Such a channel scenario is indicated in FIGS. 10 and 11. While the positioning of the loudspeakers L, C, R, Ls, Rs with regard to the user is to be performed as indicated in FIG. 10 and FIG. 11 in order for the user to receive the best hearing impression possible, the positioning of the LFE channel (not shown in FIGS. 10 and 11) is not that important since the ear cannot perform localization at such low frequencies and the LFE channel can thus be arranged at any place where it has no disturbing effect due to its considerable size.
Such a multi-channel system produces several advantages compared to a typical stereo reproduction which is a two-channel reproduction, as is exemplarily shown in FIG. 9.
Outside the optimum central hearing position, the center channel also improves the stability of the front hearing impression, which is also referred to as the “front image”. The result is thus a greater “sweet spot”, the “sweet spot” representing the optimum hearing position.
In addition, due to the two back loudspeakers Ls and Rs the listener has an improved sensation of “delving into” the audio scene.
Nevertheless, there is a huge quantity of audio material in the possession of users or generally available which is only present as stereo material which thus only has two channels, namely the left channel and the right channel. Typical sound carriers for stereo pieces of this kind are compact discs.
In order to reproduce such stereo material via a 5.1 multi-channel audio apparatus, there are two options recommended according to the ITU.
The first option is reproducing the left and right channels via the left and right loudspeakers of the multi-channel reproduction system. However, this solution is disadvantageous in that the plurality of loudspeakers already present are not made use of, i.e. that the center loudspeaker and the two back loudspeakers present are not made use of in an advantageous manner.
Another option is converting the two channels to form a multi-channel signal. This may take place during reproduction or by special preprocessing. It makes advantageous use of all six loudspeakers of the 5.1 reproduction system which, for example, is already present, and thus results in an improved hearing impression, provided that upmixing from two channels to five and/or six channels is performed without any errors.
The second option, i.e. using all the loudspeakers of the multi-channel system, is thus of advantage compared to the first option only if no upmixing errors occur. Upmixing errors of this kind can be particularly disturbing when the signals for the back loudspeakers, which are also known as ambience signals, are not generated in an error-free manner.
A way of performing this so-called upmixing process is known under the keyword “direct ambience concept”. The direct sound sources are reproduced by the three front channels present such that they are perceived by the user at the same position as in the original two-channel version. The original two-channel version is illustrated schematically in FIG. 9 using the example of different drum instruments.
FIG. 10 shows an upmix version of the concept in which all the original sound sources, i.e. the drum instruments, are again reproduced by the three front loudspeakers L, C and R, wherein additionally special ambience signals are output by the two back loudspeakers. The term “direct sound source” is thus used to describe a tone coming only and directly from a discrete sound source, such as, for example, a drum instrument or another instrument, or generally, a special audio object, as is schematically illustrated in FIG. 9 using the example of a drum instrument. Any additional sounds, such as, for example, those due to wall reflections, are not present in such a direct sound source. In this scenario, the sound signals emitted by the two back loudspeakers Ls, Rs in FIG. 10 include only ambience signals, which may or may not be present in the original recording. Ambience signals of this kind do not belong to a single sound source, but contribute to the reproduction of the room acoustics of a recording and thus result in the so-called sensation of “delving in” by the listener.
An alternative concept, referred to as the “in-the-band” concept, is illustrated schematically in FIG. 11. Every type of sound, i.e. direct sound sources and ambience-type tones alike, is positioned around the listener. The position of a tone is independent of its characteristic (direct sound source or ambience-type tone) and only depends on the specific design of the algorithm, as is exemplarily illustrated in FIG. 11. Thus, it has been determined in FIG. 11 by the upmix algorithm that the two instruments 1100 and 1102 are positioned laterally with regard to the listener, whereas the two instruments 1104 and 1106 are positioned in front of the user. The result of this is that the two back loudspeakers Ls, Rs also contain portions of the two instruments 1100 and 1102 and no longer only ambience-type tones, as has been the case in FIG. 10 where the same instruments were all positioned in front of the user.
The specialist publication “C. Avendano and J. M. Jot: “Ambience Extraction and Synthesis from Stereo Signals for Multichannel Audio Up-Mix”, IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 02, Orlando, Fla., May 2002” discloses a frequency domain technology for identifying and extracting ambience information in stereo audio signals. This concept is based on calculating an inter-channel coherence and a non-linear mapping function which allows determining those time-frequency regions in the stereo signals which mainly include ambience components. Ambience signals are then synthesized and used to feed the back channels or “surround” channels Ls, Rs (FIGS. 10 and 11) of a multi-channel reproduction system.
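The coherence-based idea described above can be illustrated with a minimal sketch. It is not the implementation of the cited publication: the frame length, the hop size, the recursive smoothing with a forgetting factor, and in particular the sigmoid used as the non-linear mapping are assumptions chosen for illustration, and the function names `coherence_per_bin` and `ambience_weight` are hypothetical. The sketch estimates, per frequency bin, how similar the left and right channels are; bins with low coherence are treated as likely ambience.

```python
import cmath
import math
import random


def dft(frame):
    """Naive DFT of a real frame, returning bins 0..N/2 (slow but dependency-free)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]


def coherence_per_bin(left, right, frame_len=64, hop=32, forget=0.9):
    """Inter-channel coherence per frequency bin, smoothed recursively over frames.

    Assumption: auto- and cross-spectra are smoothed with one forgetting factor.
    """
    bins = frame_len // 2 + 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len)
              for t in range(frame_len)]  # Hann window
    p_ll = [0.0] * bins
    p_rr = [0.0] * bins
    p_lr = [0j] * bins
    for start in range(0, len(left) - frame_len + 1, hop):
        spec_l = dft([w * x for w, x in zip(window, left[start:start + frame_len])])
        spec_r = dft([w * x for w, x in zip(window, right[start:start + frame_len])])
        for k in range(bins):
            p_ll[k] = forget * p_ll[k] + (1 - forget) * abs(spec_l[k]) ** 2
            p_rr[k] = forget * p_rr[k] + (1 - forget) * abs(spec_r[k]) ** 2
            p_lr[k] = forget * p_lr[k] + (1 - forget) * spec_l[k] * spec_r[k].conjugate()
    eps = 1e-12  # guards against division by zero in silent bins
    return [abs(p_lr[k]) / math.sqrt(p_ll[k] * p_rr[k] + eps) for k in range(bins)]


def ambience_weight(coherence, threshold=0.5, slope=10.0):
    """Hypothetical non-linear mapping: low coherence gives a weight near 1."""
    return 1.0 / (1.0 + math.exp(slope * (coherence - threshold)))


# Usage: identical channels (fully correlated) versus independent noise.
random.seed(0)
noise_a = [random.gauss(0.0, 1.0) for _ in range(2048)]
noise_b = [random.gauss(0.0, 1.0) for _ in range(2048)]
coh_same = coherence_per_bin(noise_a, noise_a)
coh_diff = coherence_per_bin(noise_a, noise_b)
weights_diff = [ambience_weight(c) for c in coh_diff]
```

In such a sketch, the weights would be applied to the time-frequency representation of the stereo signal before resynthesis; that step is omitted here because the source does not describe it.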
In the specialist publication “R. Irwan and Ronald M. Aarts: “A method to convert stereo to multi-channel sound”, The proceedings of the AES 19th International Conference, Schloss Elmau, Germany, June 21-24, pages 139-143, 2001”, a method for converting a stereo signal to a multi-channel signal is presented. The signal for the surround channels is calculated using a cross-correlation technique. Principal component analysis (PCA) is used to calculate a vector indicating a direction of the dominant signal. This vector is then mapped from a two-channel representation to a three-channel representation to produce the three front channels.
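The two quantities named above, a dominant-direction vector obtained by PCA and a cross-correlation between the channels, can be sketched as follows. This is a simplified illustration under stated assumptions, not the method of the cited publication: the signals are assumed to be zero-mean, the whole signal is analyzed at once rather than adaptively, the cross-correlation is taken at zero lag only, and the function names `dominant_direction` and `cross_correlation` are hypothetical. The subsequent mapping of the vector to three front channels is not shown, since the source gives no details of it.

```python
import math


def dominant_direction(left, right):
    """Unit vector along the first principal component of the (left, right) samples.

    Assumption: zero-mean signals, so the 2x2 covariance is computed without
    mean removal. For a symmetric matrix [[a, b], [b, c]] the principal axis
    has the angle 0.5 * atan2(2b, a - c).
    """
    n = len(left)
    c_ll = sum(x * x for x in left) / n
    c_rr = sum(y * y for y in right) / n
    c_lr = sum(x * y for x, y in zip(left, right)) / n
    angle = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    return math.cos(angle), math.sin(angle)


def cross_correlation(left, right):
    """Normalized zero-lag cross-correlation of the two channels, in [-1, 1]."""
    e_l = sum(x * x for x in left)
    e_r = sum(y * y for y in right)
    if e_l == 0.0 or e_r == 0.0:
        return 0.0  # undefined for a silent channel; 0.0 chosen by convention here
    return sum(x * y for x, y in zip(left, right)) / math.sqrt(e_l * e_r)


# Usage: one source panned with gain 0.6 to the left and 0.8 to the right.
source = [math.sin(0.1 * t) for t in range(1000)]
left = [0.6 * s for s in source]
right = [0.8 * s for s in source]
direction = dominant_direction(left, right)
correlation = cross_correlation(left, right)
```

For a single panned source as in the usage lines, the direction vector reproduces the panning gains and the correlation is close to 1; a low correlation would instead suggest a larger share of diffuse, ambience-type content.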
The specialist publication “G. Soulodre, “Ambience-Based Up-mixing”, Workshop “Spatial Coding of Surround Sound: A Progress Report”, 117th AES Convention, San Francisco, Calif., USA, 2004” discloses a system producing a multi-channel signal from a stereo signal. The signal is broken down into so-called individual source streams and ambience streams. Based on these streams, a so-called “esthetics processor” synthesizes the multi-channel output signal.
All known technologies try, in different ways, to extract the ambience signals from the original stereo signal or even to synthesize them from noise and/or further information, wherein information which is not in the stereo signal may also be used for synthesizing the ambience signals. In the end, however, it is all about extracting information from the stereo signal and/or feeding information to a reproduction scenario, the information not being present explicitly, since typically only a two-channel stereo signal and, possibly, additional information and/or meta information are available.
From that point of view, extracting, or partly extracting and partly synthesizing, such ambience signals is a risky matter, since a user would find it disturbing if the ambience channels contained information from sound sources which the user identifies as coming directly from the front, i.e. from the left channel, center channel and right channel. For this reason, ambience signals tend to be produced very “defensively” in order to ensure that no artifacts perceived by the user as being disturbing are produced. The other extreme, when acting too defensively, is an extracted ambience signal which is very faint or hardly perceivable, or an ambience signal which comprises only noise but no specific information, so that it contributes very little to the hearing pleasure and in this case could just as well be omitted completely.
The problem when producing the ambience signal is that, on the one hand, the ambience signal should include information going beyond normal noise but, on the other hand, must not result in audible artifacts, i.e. an appropriate balance between audibility and information content must be maintained.