The present invention relates to audio signal encoding and decoding and, in particular, to audio signal processing using parallel frequency domain and time domain encoder/decoder processors.
The perceptual coding of audio signals for the purpose of data reduction for efficient storage or transmission of these signals is a widely used practice. In particular when lowest bit rates are to be achieved, the employed coding leads to a reduction of audio quality that often is primarily caused by a limitation at the encoder side of the audio signal bandwidth to be transmitted. Here, typically the audio signal is low-pass filtered such that no spectral waveform content remains above a certain pre-determined cut-off frequency.
In contemporary codecs well-known methods exist for the decoder-side signal restoration through audio signal Bandwidth Extension (BWE), e.g. Spectral Band Replication (SBR) that operates in frequency domain or so-called Time Domain Bandwidth Extension (TD-BWE) being is a post-processor in speech coders that operates in time domain.
Additionally, several combined time domain/frequency domain coding concepts exist such as concepts known under the term AMR-WB+ or USAC.
All these combined time domain/coding concepts have in common that the frequency domain coder relies on bandwidth extension technologies which incur a band limitation into the input audio signal and the portion above a cross-over frequency or border frequency is encoded with a low resolution coding concept and synthesized on the decoder-side. Hence, such concepts mainly rely on a pre-processor technology on the encoder side and a corresponding post-processing functionality on the decoder-side.
Typically, the time domain encoder is selected for useful signals to be encoded in the time domain such as speech signals and the frequency domain encoder is selected for non-speech signals, music signals, etc. However, specifically for non-speech signals having prominent harmonics in the high frequency band, the known frequency domain encoders have a reduced accuracy and, therefore, a reduced audio quality due to the fact that such prominent harmonics can only be separately parametrically encoded or are eliminated at all in the encoding/decoding process.
Furthermore, concepts exist in which the time domain encoding/decoding branch additionally relies on the bandwidth extension which also parametrically encodes an upper frequency range while a lower frequency range is typically encoded using an ACELP or any other CELP related coder, for example a speech coder. This bandwidth extension functionality increases the bitrate efficiency but, on the other hand, introduces further inflexibility due to the fact that both encoding branches, i.e., the frequency domain encoding branch and the time domain encoding branch are band limited due to the bandwidth extension procedure or spectral band replication procedure operating above a certain crossover frequency substantially lower than the maximum frequency included in the input audio signal
Relevant topics in the state-of-art comprise                SBR as a post-processor to waveform decoding [1-3]        MPEG-D USAC core switching [4]        MPEG-H 3D IGF [5]        
The following papers and patents describe methods that are considered to constitute known technology for the application:    [1] M. Dietz, L. Liljeryd, K. Kjörling and O. Kunz, “Spectral Band Replication, a novel approach in audio coding,” in 112th AES Convention, Munich, Germany, 2002.    [2] S. Meltzer, R. Böhm and F. Henn, “SBR enhanced audio codecs for digital broadcasting such as “Digital Radio Mondiale” (DRM),” in 112th AES Convention, Munich, Germany, 2002.    [3] T. Ziegler, A. Ehret, P. Ekstrand and M. Lutzky, “Enhancing mp3 with SBR: Features and Capabilities of the new mp3PRO Algorithm,” in 112th AES Convention, Munich, Germany, 2002.    [4] MPEG-D USAC Standard.    [5] PCT/EP2014/065109.
In MPEG-D USAC, a switchable core coder is described. However, in USAC, the band-limited core is restricted to transmit a low-pass filtered signal. Therefore, certain music signals that contain prominent high frequency content e.g. full-band sweeps, triangle sounds, etc. cannot be reproduced faithfully.