The present application is concerned with frequency-domain audio coding supporting transform length switching.
Modern frequency-domain speech/audio coding systems such as the Opus/Celt codec of the IETF [1], MPEG-4 (HE-)AAC [2] or, in particular, MPEG-D xHE-AAC (USAC) [3], offer means to code audio frames using either one long transform—a long block—or eight sequential short transforms—short blocks—depending on the temporal stationarity of the signal.
For certain audio signals such as rain or applause of a large audience, neither long nor short block coding yields satisfactory quality at low bitrates. This can be explained by the density of prominent transients in such recordings; coding only with long blocks can cause frequent and audible time-smearing of the coding error, also known as pre-echo, whereas coding only with short blocks is generally inefficient due to increased data overhead, leading to spectral holes.
Accordingly, it would be favorable to have a frequency-domain audio coding concept at hand which supports transform lengths that are also suitable for the just-outlined kinds of audio signals. Naturally, it would be feasible to build a new frequency-domain audio codec supporting switching between a set of transform lengths which, inter alia, encompasses a certain wanted transform length suitable for a certain kind of audio signal.
However, it is not an easy task to get a new frequency-domain audio codec adopted in the market. Well-known codecs are already available and widely used. Accordingly, it would be favorable to have a concept at hand which enables existing frequency-domain audio codecs to be extended so as to additionally support a wanted, new transform length while nevertheless maintaining backward compatibility with existing encoders and decoders.