The present invention relates to the field of audio encoding/decoding, especially to spatial audio coding and spatial audio object coding, e.g. the field of 3D audio codec systems. Embodiments of the invention relate to a method for processing an audio signal in accordance with a room impulse response, to a signal processing unit, a binaural renderer, an audio encoder and an audio decoder.
Spatial audio coding tools are well-known in the art and are standardized, for example, in the MPEG-surround standard. Spatial audio coding starts from a plurality of original input, e.g., five or seven input channels, which are identified by their placement in a reproduction setup, e.g., as a left channel, a center channel, a right channel, a left surround channel, a right surround channel and a low frequency enhancement channel. A spatial audio encoder may derive one or more downmix channels from the original channels and, additionally, may derive parametric data relating to spatial cues such as interchannel level differences in the channel coherence values, interchannel phase differences, interchannel time differences, etc. The one or more downmix channels are transmitted together with the parametric side information indicating the spatial cues to a spatial audio decoder for decoding the downmix channels and the associated parametric data in order to finally obtain output channels which are an approximated version of the original input channels. The placement of the channels in the output setup may be fixed, e.g., a 5.1 format, a 7.1 format, etc.
Also, spatial audio object coding tools are well-known in the art and are standardized, for example, in the MPEG SAOC standard (SAOC=spatial audio object coding). In contrast to spatial audio coding starting from original channels, spatial audio object coding starts from audio objects which are not automatically dedicated for a certain rendering reproduction setup. Rather, the placement of the audio objects in the reproduction scene is flexible and may be set by a user, e.g., by inputting certain rendering information into a spatial audio object coding decoder. Alternatively or additionally, rendering information may be transmitted as additional side information or metadata; rendering information may include information at which position in the reproduction setup a certain audio object is to be placed (e.g. over time). In order to obtain a certain data compression, a number of audio objects is encoded using an SAOC encoder which calculates, from the input objects, one or more transport channels by downmixing the objects in accordance with certain downmixing information. Furthermore, the SAOC encoder calculates parametric side information representing inter-object cues such as object level differences (OLD), object coherence values, etc. As in SAC (SAC=Spatial Audio Coding), the inter object parametric data is calculated for individual time/frequency tiles. For a certain frame (for example, 1024 or 2048 samples) of the audio signal a plurality of frequency bands (for example 24, 32, or 64 bands) are considered so that parametric data is provided for each frame and each frequency band. For example, when an audio piece has 20 frames and when each frame is subdivided into 32 frequency bands, the number of time/frequency tiles is 640.
In 3D audio systems it may be desired to provide a spatial impression of an audio signal as if the audio signal is listened to in a specific room. In such a situation, a room impulse response of the specific room is provided, for example on the basis of a measurement thereof, and is used for processing the audio signal upon presenting it to a listener. It may be desired to process the direct sound and early reflections in such a presentation separated from the late reverberation.
It is the object underlying the present invention to provide an approved approach for separately processing the audio signal with an early part and a late reverberation of the room impulse response allowing to achieve a result being perceptually as far as possible identical to the result of a convolution of the audio signal with the complete impulse response.