Embodiments according to the invention are related to an apparatus for generating an enhanced downmix signal, to a method for generating an enhanced downmix signal and to a computer program for generating an enhanced downmix signal.
An embodiment according to the invention is related to an enhanced downmix computation for spatial audio microphones.
Recording surround sound with a small microphone configuration remains a challenge. One of the most widely known such configurations is the Soundfield microphone with its corresponding surround decoders (see, for example, reference [3]), which filter and combine the four nearly coincident microphone capsule signals to generate the surround sound output channels. While high single-channel signal fidelity is maintained, the weakness of this approach is its limited channel separation, which is related to the limited directivity of first-order microphone directional responses.
Alternatively, techniques based on a parametric representation of the observed sound field can be applied. In reference [2], a method has been proposed that uses conventional coincident stereo microphone pairs to record surround sound. It was shown how to estimate the spatial cue parameters, namely the direct-to-diffuse-sound ratios and the directions-of-arrival of sound, from these directional microphone signals, and how to apply this information to drive a spatial audio coding synthesis to generate surround sound. Reference [2] also discusses how the parametric information, i.e., the direction-of-arrival (DOA) of sound and the diffuse-sound-ratio (DSR) of the sound field, can be used to directly compute the specific spatial parameters that are used in the MPEG Surround (MPS) coding scheme (see, for example, reference [6]).
MPEG Surround is a parametric representation of multi-channel audio signals and represents an efficient approach to high-quality spatial audio coding. MPS exploits the fact that, from a perceptual point of view, multi-channel audio signals contain significant redundancy with respect to the different loudspeaker channels. The MPS encoder takes multiple loudspeaker signals as input, where the corresponding spatial configuration of the loudspeakers has to be known in advance. Based on these input signals, the MPS encoder computes spatial parameters in frequency subbands, such as channel level differences (CLD) between two channels and inter-channel correlation (ICC) between two channels. The actual MPS side information is then derived from these spatial parameters. Furthermore, the encoder computes a downmix signal, which may consist of one or more audio channels.
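For illustration only, the two spatial parameters named above can be sketched for a single frequency subband and time tile. The sketch assumes the commonly used definitions, i.e., the CLD as the power ratio of the two channels in dB and the ICC as the normalized real part of their cross-correlation. The function names cld_icc and mono_downmix, the regularization constant eps and the equal-weight downmix are hypothetical choices made for this example; they are not taken from the MPEG Surround specification and do not reproduce the quantization or the derivation of the actual MPS side information.

```python
import math


def cld_icc(x1, x2, eps=1e-12):
    """Return (CLD in dB, ICC) for one subband/time tile of two channels.

    x1, x2: equal-length sequences of real or complex subband samples.
    eps:    hypothetical small constant that avoids division by zero.
    """
    # Subband powers of the two channels.
    p1 = sum(abs(a) ** 2 for a in x1)
    p2 = sum(abs(b) ** 2 for b in x2)
    # Cross-correlation between the two channels.
    cross = sum(a * complex(b).conjugate() for a, b in zip(x1, x2))
    # Channel level difference: power ratio expressed in dB.
    cld_db = 10.0 * math.log10((p1 + eps) / (p2 + eps))
    # Inter-channel correlation: normalized real part of the cross-correlation.
    icc = cross.real / math.sqrt((p1 + eps) * (p2 + eps))
    return cld_db, icc


def mono_downmix(x1, x2):
    """Equal-weight mono downmix (an assumption made for this sketch)."""
    return [0.5 * (a + b) for a, b in zip(x1, x2)]


if __name__ == "__main__":
    left = [1.0, -1.0, 1.0, -1.0]
    right = [0.5 * v for v in left]  # same signal, attenuated by a factor of 2
    print(cld_icc(left, right))
    print(mono_downmix(left, right))
```

In this usage example the right channel is a scaled copy of the left channel, so the CLD is about 6.02 dB and the ICC is close to 1. Two channels without any common signal component would instead yield an ICC close to 0.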
It has been found that the stereo microphone input signals are well suited for estimating the spatial cue parameters. However, it has also been found that the unprocessed stereo microphone input signal is, in general, not well suited to be used directly as the corresponding MPEG Surround downmix signal. In many cases, the crosstalk between the left and right channels is too high, resulting in poor channel separation in the MPEG Surround decoded signals.
In view of this situation, there is a need for a concept for generating an enhanced downmix signal on the basis of a multi-channel microphone signal, such that the enhanced downmix signal leads to a sufficiently good spatial audio quality and localization property after MPEG Surround decoding.