The present invention generally relates to an apparatus and a method for generating an ambient signal from an audio signal, to an apparatus and a method for deriving a multi-channel audio signal from an audio signal, and to a computer program. Specifically, the present invention relates to a method and concept for calculating an ambient signal from an audio signal for upmixing mono audio signals for playback on multi-channel systems.
In the following, the motivation underlying the present invention will be discussed. Currently, multi-channel audio material is experiencing increasing popularity in consumer home environments as well. The main reason for this is that films on DVD media often offer 5.1 multi-channel sound. For this reason, even home users frequently install audio playback systems capable of reproducing multi-channel audio signals.
A corresponding setup may, for example, consist of three loudspeakers (designated L, C and R) arranged in the front, two loudspeakers (designated LS and RS) arranged behind the listener, and one low-frequency effects channel (also referred to as LFE). The three loudspeakers arranged in the front (L, C, R) are in the following also referred to as front loudspeakers. The loudspeakers arranged behind the listener (LS, RS) are in the following also referred to as back loudspeakers.
In addition, it is to be noted that for reasons of convenience, the following details and explanations refer to 5.1 systems. The following details may, of course, also be applied to other multi-channel systems, with only small modifications to be made.
Multi-channel systems (such as a 5.1 multi-channel audio system) provide several well-known advantages over two-channel stereo reproduction, for example:
Advantage 1: Improved front image stability, even away from the optimal (central) listening position. The “sweet spot” is enlarged by means of the center channel. The term “sweet spot” denotes an area of listening positions where an optimal sound impression may be perceived (by a listener).
Advantage 2: A better approximation of a concert hall impression or experience. An increased experience of “envelopment” and spaciousness is obtained by the rear-channel loudspeakers (back channel loudspeakers).
Nevertheless, there is still a large amount of legacy audio content consisting of only two (“stereo”) audio channels, such as on compact discs. Even very old recordings and old films and TV series that are available in mono quality only, i.e. with a one-channel (“mono”) audio signal, are sold on CDs and/or DVDs.
Therefore, there are several options for the playback of mono legacy audio material via a 5.1 multi-channel setup:
Option 1: Reproduction or playback of the mono channel through the center loudspeaker so as to obtain a true mono source.
Option 2: Reproduction or playback of the mono signal over the L and R loudspeakers (i.e. over the front left loudspeaker and the front right loudspeaker). This approach produces a phantom mono source having a wider perceived source width than a true mono source, but the phantom source tends towards the loudspeaker closest to the listener when the listener is not seated in the sweet spot. This method may also be used if only a two-channel playback system is available, and it makes no use of the extended loudspeaker setup (such as a loudspeaker setup with 5 or 6 loudspeakers). The C loudspeaker or center loudspeaker, the LS loudspeaker or rear left loudspeaker, the RS loudspeaker or rear right loudspeaker and the LFE loudspeaker or low-frequency effects channel loudspeaker remain unused.
Option 3: A method may be employed for converting the channel of the mono signal to a multi-channel signal using all of the 5.1 loudspeakers (i.e. all six loudspeakers used in a 5.1 multi-channel system). In this manner, the multi-channel signal benefits from the previously discussed advantages of the multi-channel setup. The method may be employed in real time (“on the fly”) or by means of preprocessing and is referred to as upmix process or “upmixing”.
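For illustration only, Options 1 and 2 can be sketched as a simple routing of one mono sample to the six channels of a 5.1 setup. The function name `route_mono`, the channel ordering and the constant-power gain of 1/sqrt(2) for the phantom source are assumptions made for this sketch and are not taken from the text above.

```python
import math

# Hypothetical channel labels for a 5.1 setup (ordering is an assumption).
CHANNELS = ("L", "R", "C", "LFE", "LS", "RS")


def route_mono(sample, option):
    """Route one mono sample to the 5.1 channels for Option 1 or Option 2."""
    out = dict.fromkeys(CHANNELS, 0.0)
    if option == 1:
        # Option 1: true mono source through the center loudspeaker only.
        out["C"] = sample
    elif option == 2:
        # Option 2: phantom mono source over L and R.
        # Assumption: constant-power split, so each channel gets 1/sqrt(2).
        gain = 1.0 / math.sqrt(2.0)
        out["L"] = gain * sample
        out["R"] = gain * sample
    else:
        raise ValueError("only options 1 and 2 are plain routings")
    return out
```

In both cases the back loudspeakers and the LFE channel stay silent, which is the limitation that Option 3 (upmixing) is meant to overcome.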
With respect to audio quality or sound quality, option 3 provides advantages over option 1 and option 2. Particularly with respect to the signal generated for feeding the rear loudspeakers, however, the signal processing required is not obvious.
In the literature, two different concepts for an upmix method or upmix process are described. These concepts are the “Direct/Ambient Concept” and the “In-the-band Concept”. The two concepts stated will be described in the following.
Direct/Ambient Concept
The “direct sound sources” are reproduced or played back through the three front channels such that they are perceived at the same position as in the original two-channel version. The term “direct sound source” is used here so as to describe sound coming solely and directly from one discrete sound source (e.g. an instrument) and exhibiting little or no additional sound, for example due to reflections from the walls.
In this scenario, the sound or the noise fed to the rear loudspeakers should only consist of ambience-like sound or ambience-like noise (that may or may not be present in the original recording). Ambience-like sound or ambience-like noise is not associated with one single sound source or noise source but contributes to the reproduction or playback of the acoustical environment (room acoustics) of a recording or to the so-called “envelopment feeling” of the listener. Ambience-like sound or ambience-like noise is further sound or noise from the audience at live performances (such as applause) or environmental sound or environmental noise added by artistic intent (such as recording noise, birdsong, cricket chirping sounds).
For illustration, FIG. 7 represents the original two-channel version (of an audio recording). FIG. 8 shows an upmixed rendition using the Direct/Ambient Concept.
In-the-Band Concept
Following the surrounding concept, often referred to as “In-the-band Concept”, each sound or noise (direct sound as well as ambient noise) may be completely and/or arbitrarily positioned around the listener. The position of the noise or sound is independent of its properties (direct sound or direct noise or ambient sound or ambient noise) and depends on the specific design of the algorithm and its parameter settings only.
FIG. 9 represents the surrounding concept.
Summing up, FIGS. 7, 8 and 9 show several playback concepts. Here, FIGS. 7, 8 and 9 show where the listener perceives the origin of the sound (plotted as a dark area). FIG. 7 describes the acoustical perception during stereo playback. FIG. 8 describes the acoustical perception and/or sound localization using the Direct/Ambient Concept. FIG. 9 describes the sound perception and/or sound localization using the surrounding concept.
The following section gives an overview of the conventional approaches regarding upmixing a one-channel or two-channel signal to form a multi-channel version. The literature teaches several methods for upmixing one-channel signals and multi-channel signals.
Non-Signaladaptive Methods
Most methods for generating a so-called “pseudo stereophonic” signal are non-signaladaptive. This means that they process any mono signal in the same manner, irrespective of the contents of the signal. These systems often operate with simple filter structures and/or time delays so as to decorrelate the generated signals. An overall survey of such systems may be found, for example, in [1].
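As a minimal sketch of such a non-signaladaptive structure, the following example derives two decorrelated channels from a mono signal with one fixed delay and a complementary pair of comb filters. The particular scheme, the function name `pseudo_stereo` and the default delay and gain are assumptions chosen for illustration; they are not taken from [1].

```python
def pseudo_stereo(mono, delay=4, gain=0.7):
    """Derive two decorrelated channels from a mono signal.

    The same fixed processing is applied to any input, which is what
    makes the method non-signaladaptive. `delay` is in samples.
    """
    left, right = [], []
    for n, x in enumerate(mono):
        # Delayed copy of the input; zero before the delay line fills.
        delayed = mono[n - delay] if n >= delay else 0.0
        # Complementary comb filters: the delayed copy is added on one
        # side and subtracted on the other.
        left.append(x + gain * delayed)
        right.append(x - gain * delayed)
    return left, right
```

Because the delayed copy enters the two channels with opposite sign, summing the channels cancels it and returns twice the original mono signal, so the sketch remains mono-compatible.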
Signaladaptive Methods
Matrix decoders (such as the Dolby Pro Logic II decoder, described in [2], the DTS NEO:6 decoder, described, for example, in [3] or the Harman Kardon/Lexicon Logic 7 decoder, described, for example, in [4]) are contained in almost every audio/video receiver currently sold. As a by-product of their actual or intended function, these matrix decoders are capable of performing blind upmixing.
The decoders mentioned use inter-channel differences and signaladaptive steering mechanisms so as to create multi-channel output signals.
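The role of inter-channel differences can be illustrated with a passive matrix, in which a center signal is formed from the sum of the two input channels and a surround signal from their difference. This is only a sketch under stated assumptions: the function name `passive_matrix_decode` and the 1/sqrt(2) gains are illustrative, and the signaladaptive steering used by the decoders cited above is not shown.

```python
import math


def passive_matrix_decode(lt, rt):
    """Passive (non-steered) sum/difference decode of a two-channel signal.

    Assumption: equal gains of 1/sqrt(2) for the sum and difference paths.
    """
    g = 1.0 / math.sqrt(2.0)
    # Content common to both channels ends up in the center signal.
    center = [g * (l + r) for l, r in zip(lt, rt)]
    # Content that differs between the channels ends up in the surround signal.
    surround = [g * (l - r) for l, r in zip(lt, rt)]
    return {"L": list(lt), "R": list(rt), "C": center, "S": surround}
```

With identical input channels the surround signal is zero, and with phase-inverted input channels the center signal is zero.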
Ambience Extraction and Synthesis from Stereo Signals for Multi-Channel Audio Upmixing
Avendano and Jot propose a frequency-domain technique so as to identify and extract the ambience information in stereo audio signals (see [5]).
The method is based on calculating an inter-channel-coherence index and a non-linear mapping function that is to enable the determination of time-frequency regions mainly consisting of ambience components or ambience portions in the two-channel signal. Then, ambience signals are synthesized and used to feed the surround channels of a multi-channel playback system.
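The idea of a coherence-based ambience index can be sketched as follows, assuming the two channels are already available as short-time spectra (one list of complex bins per frame). The recursive smoothing factor `lam`, the `threshold` and `slope` parameters and the tanh-shaped mapping are illustrative assumptions and are not values or formulas taken from [5].

```python
import math


def ambience_masks(left_frames, right_frames, lam=0.9, threshold=0.5, slope=10.0):
    """Return one ambience mask (values in 0..1) per frame and frequency bin."""
    n_bins = len(left_frames[0])
    p_ll = [0.0] * n_bins  # smoothed auto-spectrum, left
    p_rr = [0.0] * n_bins  # smoothed auto-spectrum, right
    p_lr = [0j] * n_bins   # smoothed cross-spectrum
    masks = []
    for lf, rf in zip(left_frames, right_frames):
        mask = []
        for k in range(n_bins):
            # Recursive (first-order) smoothing over time.
            p_ll[k] = lam * p_ll[k] + (1.0 - lam) * abs(lf[k]) ** 2
            p_rr[k] = lam * p_rr[k] + (1.0 - lam) * abs(rf[k]) ** 2
            p_lr[k] = lam * p_lr[k] + (1.0 - lam) * lf[k] * rf[k].conjugate()
            denom = math.sqrt(p_ll[k] * p_rr[k])
            # Inter-channel coherence in 0..1; silent bins count as coherent.
            coherence = min(1.0, abs(p_lr[k]) / denom) if denom > 0.0 else 1.0
            ambience_index = 1.0 - coherence
            # Non-linear mapping (assumed shape): soft threshold via tanh.
            mask.append(0.5 * (1.0 + math.tanh(slope * (ambience_index - threshold))))
        masks.append(mask)
    return masks
```

Multiplying each channel's spectrum by its mask and transforming back to the time domain would then yield signals dominated by weakly coherent, ambience-like regions; that synthesis step is not shown here.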
A Method for Converting Stereo Sound to Multi-Channel Sound
Irwan and Aarts show a method for converting a signal from a stereo representation to a multi-channel representation (see [6]). The signal for the surround channels is calculated using a cross-correlation technique. A principal component analysis (PCA) is used for calculating a vector indicating the direction of the dominant signal. This vector is then mapped from a two-channel representation to a three-channel representation so as to generate the three front channels.
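The principal component step can be sketched for the two-channel case, where the dominant direction follows in closed form from the 2x2 matrix of second moments. The function name `dominant_direction` and the use of raw second moments without mean removal are assumptions for this sketch; the mapping to three front channels and the cross-correlation step of [6] are not shown.

```python
import math


def dominant_direction(left, right):
    """Return the unit vector (w_left, w_right) of the dominant signal direction."""
    # Entries of the symmetric 2x2 matrix [[c_ll, c_lr], [c_lr, c_rr]].
    c_ll = sum(l * l for l in left)
    c_rr = sum(r * r for r in right)
    c_lr = sum(l * r for l, r in zip(left, right))
    # Angle of the principal eigenvector of a symmetric 2x2 matrix.
    angle = 0.5 * math.atan2(2.0 * c_lr, c_ll - c_rr)
    return math.cos(angle), math.sin(angle)
```

A signal present equally and in phase in both channels gives a direction of 45 degrees, i.e. equal weights, whereas a signal present in the left channel only gives the direction (1, 0).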
Ambience-Based Upmixing
Soulodre shows a system that generates a multi-channel signal from a stereo signal (see [7]). The signal is decomposed into so-called “individual source streams” and “ambience streams”. Based on these streams, a so-called “aesthetic engine” synthesizes the multi-channel output. However, no further technical details regarding the decomposition step and the synthesis step are given.
Pseudostereophony Based on Spatial Cues
A quasi-signaladaptive pseudo-stereophonic process is described by Faller in [1]. This method uses a mono signal and given stereo recordings of the same signal. Additional spatial information or spatial cues are extracted from the stereo signal and used to convert the mono signal to a stereo signal.