Audio processing has advanced in many ways; in particular, surround sound systems have become increasingly important. However, most music recordings are still encoded and transmitted as stereo signals rather than as multi-channel signals. Since surround systems comprise a plurality of loudspeakers, e.g. four or five, many studies have addressed which signals should be fed to these loudspeakers when only two input signals are available.
In this context, format conversion of stereo signals for playback over surround sound systems, i.e. upmixing, plays an important role. The term “m-to-n upmixing” describes the conversion of an m-channel audio signal into an n-channel audio signal, where n > m. Two concepts of upmixing are widely known: guided upmixing, in which additional information steers the upmix process, and unguided (“blind”) upmixing, which uses no side information and is the focus here.
The literature reports two different approaches to upmixing: the direct/ambience approach and the “in-the-band” approach. The core component of direct/ambience-based techniques is the extraction of an ambient signal, which is fed into the rear channels of a multi-channel surround sound signal. Ambient sounds are those that form an impression of a (virtual) listening environment, including room reverberation, audience sounds (e.g. applause), environmental sounds (e.g. rain), artistically intended effect sounds (e.g. vinyl crackling) and background noise. Reproducing the ambience over the rear channels gives the listener an impression of envelopment (being “immersed in sound”). In addition, the direct sound sources are distributed among the front channels according to their positions in the stereo panorama.
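The routing step of a direct/ambience upmix can be sketched per short-time Fourier transform (STFT) frame as follows. This is a minimal illustration under stated assumptions, not a method from the cited literature: the function name `direct_ambient_upmix` is hypothetical, the per-bin ambient weights are assumed to have been estimated elsewhere, each bin is split as ambient = w·X and direct = X − ambient, and only a 2-to-4 layout (front pair plus rear pair) is shown, so the distribution of direct sound onto a center channel is omitted.

```python
def direct_ambient_upmix(left, right, weights):
    """Minimal 2-to-4 direct/ambience upmix of one STFT frame.

    left, right: complex STFT coefficients of the stereo input (one frame).
    weights: hypothetical per-bin ambient weights in [0, 1], assumed to be
             estimated elsewhere (1.0 means the bin is entirely ambient).
    Returns (front_left, front_right, rear_left, rear_right) coefficient lists.
    """
    front_l, front_r, rear_l, rear_r = [], [], [], []
    for l, r, w in zip(left, right, weights):
        amb_l, amb_r = w * l, w * r      # ambient part is routed to the rear
        front_l.append(l - amb_l)        # remaining direct part stays in front
        front_r.append(r - amb_r)
        rear_l.append(amb_l)
        rear_r.append(amb_r)
    return front_l, front_r, rear_l, rear_r
```

Because the direct and ambient parts of each bin sum to the input bin, front plus rear reproduces the original stereo coefficients; the perceptual effect depends entirely on how well the weights separate ambience from direct sound.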
The “in-the-band” approach aims at positioning all sounds (direct sounds as well as ambient sounds) around the listener using all available loudspeakers. Ideally, the perceived positions of the sound sources in the upmixed reproduction are a function of their perceived positions in the stereo input signal. This approach can be implemented using the proposed signal processing.
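One way to make “a function of their perceived positions” concrete is to map a stereo position estimate onto an azimuth around the listener and then pan between the two nearest loudspeakers. The sketch below is hypothetical throughout: the linear mapping, the names `stereo_position_to_azimuth` and `pan_gains`, the five-loudspeaker layout (nominal ITU-R BS.775 angles, with our own convention that negative angles are on the left) and the use of simple pairwise constant-power (sine/cosine) panning rather than, e.g., vector base amplitude panning are all assumptions for illustration.

```python
import math

# Assumed five-loudspeaker layout (degrees): nominal ITU-R BS.775 angles,
# sign convention (negative = left) chosen for this sketch.
SPEAKERS = [("Ls", -110.0), ("L", -30.0), ("C", 0.0), ("R", 30.0), ("Rs", 110.0)]


def stereo_position_to_azimuth(p, max_azimuth=110.0):
    """Hypothetical linear map of a stereo position p in [-1, 1] to an azimuth.

    p = -1 is hard left, 0 is center, +1 is hard right in the stereo input.
    """
    return max(-1.0, min(1.0, p)) * max_azimuth


def pan_gains(azimuth):
    """Constant-power pairwise panning between the two adjacent loudspeakers."""
    gains = {name: 0.0 for name, _ in SPEAKERS}
    for (name_a, a), (name_b, b) in zip(SPEAKERS, SPEAKERS[1:]):
        if a <= azimuth <= b:
            t = (azimuth - a) / (b - a)          # 0 at speaker a, 1 at speaker b
            gains[name_a] = math.cos(t * math.pi / 2)
            gains[name_b] = math.sin(t * math.pi / 2)
            break
    return gains
```

For example, `pan_gains(stereo_position_to_azimuth(0.0))` puts a centered stereo source entirely into `"C"`, while a hard-right source (`p = 1.0`) ends up in `"Rs"`; for any `p` in [-1, 1] the squared gains sum to one.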
Various approaches to upmixing in the frequency domain have been developed in the past [9, 10]. They attempt both a decomposition of the input signal into direct and ambient signal components and a decomposition based on the spatial positions of the sound sources. Ambient signal components are identified using measures of the inter-channel coherence between the left and right channels. The direction-based decomposition is achieved by evaluating the similarity of the magnitudes of the spectral coefficients of the two channels. The patent application US 2009/0080666 describes a method for extracting an ambient signal using spectral weighting.
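The two cues mentioned above can be illustrated for a single frequency bin. The following is a sketch under assumptions: coherence is estimated as the normalized cross-correlation over a block of STFT frames (practical systems typically use recursive averaging instead), the spectral weight `1 - coherence` is a deliberately simple hypothetical choice rather than the weighting of US 2009/0080666, and the magnitude similarity and signed panning index follow one common formulation (as used by Avendano and Jot); the works cited as [9, 10] may define these measures differently. All function names are hypothetical.

```python
import math


def coherence(left, right):
    """Inter-channel coherence of one frequency bin over a block of STFT frames.

    left, right: sequences of complex STFT coefficients of the same bin.
    Returns a value in [0, 1]; 1 means the two channels are fully coherent.
    """
    cross = sum(l * r.conjugate() for l, r in zip(left, right))
    p_l = sum(abs(l) ** 2 for l in left)
    p_r = sum(abs(r) ** 2 for r in right)
    if p_l == 0.0 or p_r == 0.0:
        return 0.0
    # Clamp to guard against rounding slightly above 1.
    return min(1.0, abs(cross) / math.sqrt(p_l * p_r))


def ambient_weight(left, right):
    """Hypothetical spectral weight: low coherence means mostly ambient."""
    return 1.0 - coherence(left, right)


def similarity(l, r):
    """Magnitude similarity of one time-frequency bin, in [0, 1]."""
    denom = abs(l) ** 2 + abs(r) ** 2
    return 0.0 if denom == 0.0 else 2.0 * abs(l) * abs(r) / denom


def panning_index(l, r):
    """Signed direction cue in [-1, 1]: -1 hard left, 0 center, +1 hard right."""
    psi = 1.0 - similarity(l, r)
    if abs(r) > abs(l):
        return psi
    if abs(r) < abs(l):
        return -psi
    return 0.0
```

Applying the weight to a bin, e.g. `w = ambient_weight(left, right)` followed by `[w * l for l in left]`, yields an ambient estimate for that bin; the panning index can then drive a position-dependent distribution of the direct part.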
US 2010/0030563 describes a method for extracting an ambient signal for upmixing applications. The method uses spectral subtraction: the time-frequency-domain representation of the ambient signal is obtained from the difference between the time-frequency-domain representation of the input signal and a compressed version of it, advantageously computed using non-negative matrix factorization.
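The subtraction idea can be sketched on a magnitude spectrogram (frequency bins × frames). This is an illustration under assumptions, not the procedure of US 2010/0030563: the “compressed version” is taken to be a low-rank NMF approximation computed with the standard Lee–Seung multiplicative updates for the Euclidean cost, the difference is floored at zero, and the rank, iteration count and the names `nmf` and `ambient_magnitude` are hypothetical choices. Plain Python lists are used so the sketch has no dependencies.

```python
import random


def matmul(a, b):
    """Plain-Python product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(r) for r in zip(*m)]


def nmf(v, rank, iters=200, eps=1e-9, seed=0):
    """Rank-`rank` factorization V ~ W H via Lee-Seung multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


def ambient_magnitude(v, rank=1, iters=200):
    """Spectral subtraction: |X| minus its low-rank NMF approximation, floored at 0."""
    w, h = nmf(v, rank, iters)
    approx = matmul(w, h)
    return [[max(x - a, 0.0) for x, a in zip(row_v, row_a)]
            for row_v, row_a in zip(v, approx)]
```

The intuition is that the low-rank model captures the repetitive, strongly structured part of the spectrogram, so what it fails to explain is treated as ambience; for a magnitude spectrogram that is exactly rank one, the ambient estimate is therefore (numerically) zero.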
US 2010/0296672 describes a frequency-domain upmix method using a vector-based signal decomposition. The decomposition aims at extracting a center channel, in contrast to a direct/ambient signal decomposition [13]. An output signal for the center channel is computed that contains all information common to the left and right input channel signals. The left and right output channel signals are then computed as the residuals of the input signals after the center channel signal has been removed.
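The center/residual structure can be illustrated per time-frequency bin with a deliberately simple rule; it is not the vector-based decomposition of US 2010/0296672. In this hypothetical sketch (`extract_center` is an illustrative name), the center takes the magnitude both channels share, min(|L|, |R|), with the phase of L + R, and the left and right outputs are the residuals L − C and R − C.

```python
def extract_center(l, r):
    """Split one complex time-frequency bin into (left_residual, center, right_residual).

    Hypothetical rule: the center gets the magnitude common to both channels,
    min(|L|, |R|), with the phase of L + R; the residuals are L - C and R - C.
    """
    s = l + r
    if abs(s) == 0.0:
        c = 0j  # fully out-of-phase bin: no common component
    else:
        c = min(abs(l), abs(r)) * s / abs(s)
    return l - c, c, r - c
```

With this rule, a source panned to the stereo center (L = R) moves entirely into the center channel, a hard-panned source (R = 0) stays entirely in its side channel, and in every case residual plus center reconstructs the corresponding input bin.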