In recent years there has been an increasing demand for transmitting and storing encoded audio information. There is also an increasing demand for encoding and decoding audio signals comprising both speech and general audio (for example, music, background noise, and the like).
In order to improve the coding quality and also the bitrate efficiency, switched (or switching) audio codecs have been introduced which switch between different coding schemes, such that, for example, a first audio frame is encoded using a first coding concept (for example, a CELP-based coding concept), and a subsequent second audio frame is encoded using a different second coding concept (for example, an MDCT-based coding concept). In other words, there may be a switching between an encoding in a linear-prediction-coding domain (for example, using a CELP-based coding concept) and a coding in a frequency domain (for example, a coding which is based on a time-domain-to-frequency-domain transform or a frequency-domain-to-time-domain transform, such as an FFT, an inverse FFT, an MDCT or an inverse MDCT). For example, the first coding concept may be a CELP-based coding concept, an ACELP-based coding concept, a transform-coded-excitation-linear-prediction-domain based coding concept, or the like. The second coding concept may, for example, be an FFT-based coding concept, an MDCT-based coding concept, an AAC-based coding concept or a coding concept which can be considered a successor of the AAC-based coding concept.
In the following, some examples of conventional audio coders (encoders and/or decoders) will be described.
Switched audio codecs, like, for example, MPEG USAC, are based on two main audio coding schemes. One coding scheme is, for example, a CELP codec, targeted for speech signals. The other coding scheme is, for example, an MDCT-based codec (simply called MDCT in the following), targeted for all other audio signals (for example, music, background noise). On mixed content signals (for example, speech over music), the encoder (and consequently also the decoder) often switches between the two encoding schemes. It is then necessary to avoid any artifacts (for example, a click due to a discontinuity) when switching from one mode (or encoding scheme) to another.
Switched audio codecs may, for example, exhibit problems which are caused by CELP-to-MDCT transitions.
CELP-to-MDCT transitions generally introduce two problems. First, aliasing can be introduced due to the missing previous MDCT frame. Second, a discontinuity can be introduced at the border between the CELP frame and the MDCT frame, due to the imperfect waveform coding nature of the two coding schemes operating at low/medium bitrates.
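The aliasing problem can be illustrated with a minimal sketch. It assumes a textbook MDCT with a sine window and 50% overlap, with no quantization; the frame size `N`, the test signal and all function names are hypothetical and are not taken from any of the codecs discussed here. Two overlapping frames are coded; the region covered by both frames is reconstructed exactly, whereas the first half of the first frame, which has no previous MDCT frame to overlap with, retains time-domain aliasing.

```python
import math

def sine_window(two_n):
    # Sine window; satisfies w[n]^2 + w[n + N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]

def mdct(frame):
    # Forward MDCT: 2N windowed samples -> N coefficients.
    n_half = len(frame) // 2
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half))
        for k in range(n_half)
    ]

def imdct(coeffs):
    # Inverse MDCT: N coefficients -> 2N time samples that still contain aliasing.
    n_half = len(coeffs)
    return [
        (2.0 / n_half) * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half))
        for n in range(2 * n_half)
    ]

N = 8  # hypothetical frame advance (hop size)
signal = [1.0 + 0.5 * math.sin(0.4 * n) for n in range(3 * N)]
window = sine_window(2 * N)

def code_frame(start):
    # Window, transform, inverse transform, window again (no quantization).
    windowed = [signal[start + n] * window[n] for n in range(2 * N)]
    decoded = imdct(mdct(windowed))
    return [decoded[n] * window[n] for n in range(2 * N)]

# Overlap-add two frames: samples 0..2N-1 and samples N..3N-1.
output = [0.0] * (3 * N)
for start in (0, N):
    for n, value in enumerate(code_frame(start)):
        output[start + n] += value

# First half of the first frame: no previous MDCT frame, aliasing remains.
error_first = max(abs(output[n] - signal[n]) for n in range(N))
# Region covered by both frames: aliasing cancels in the overlap-add.
error_overlap = max(abs(output[n] - signal[n]) for n in range(N, 2 * N))
```

In this sketch `error_overlap` is at the level of floating-point rounding, while `error_first` is large. At a CELP-to-MDCT transition, the first MDCT frame is in the same situation as the first frame here.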
Several approaches already exist to solve the problems introduced by the CELP-to-MDCT transitions, and will be discussed in the following.
A possible approach is described in the article “Efficient cross-fade windows for transitions between LPC-based and non-LPC based audio coding” by Jeremie Lecomte, Philippe Gournay, Ralf Geiger, Bruno Bessette and Max Neuendorf (presented at the 126th AES Convention, May 2009, paper 771). This article describes an approach in section 4.4.2 “ACELP to non-LPD mode”. Reference is also made, for example, to FIG. 8 of said article. The aliasing problem is solved first by increasing the MDCT length (here from 1024 to 1152) such that the MDCT left folding point is moved to the left of the border between the CELP and the MDCT frames, then by changing the left part of the MDCT window such that the overlap is reduced, and finally by artificially introducing the missing aliasing using the CELP signal and an overlap-and-add operation. The discontinuity problem is solved at the same time by the overlap-and-add operation.
This approach works well but has the disadvantage of introducing a delay in the CELP decoder, the delay being equal to the overlap length (here: 128 samples).
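The idea of artificially introducing the missing aliasing from the CELP signal can be sketched as follows. This is a simplified time-domain model, not the procedure of the cited article: it assumes a sine-shaped overlap window, an unquantized MDCT, and an overlap of only four samples. The function names and the helper `aliased_mdct_part`, which models what an MDCT decoder outputs over the overlap when no previous MDCT frame is available, are hypothetical.

```python
import math

def overlap_windows(length):
    # Rising and falling halves of a sine window over the overlap region.
    rise = [math.sin(math.pi * (n + 0.5) / (2 * length)) for n in range(length)]
    fall = rise[::-1]
    return rise, fall

def aliased_mdct_part(signal):
    # Model of the windowed MDCT decoder output over the overlap when the
    # previous MDCT frame is missing: wanted signal minus a mirrored alias.
    length = len(signal)
    rise, fall = overlap_windows(length)
    return [rise[n] * (rise[n] * signal[n] - fall[n] * signal[length - 1 - n])
            for n in range(length)]

def add_artificial_aliasing(mdct_part, celp_part):
    # Fold and window the CELP signal the way a previous MDCT frame would
    # have contributed, then overlap-add it to the MDCT output.
    length = len(mdct_part)
    rise, fall = overlap_windows(length)
    restored = []
    for n in range(length):
        folded = fall[n] * celp_part[n] + rise[n] * celp_part[length - 1 - n]
        restored.append(mdct_part[n] + fall[n] * folded)
    return restored

# Hypothetical overlap-region samples; the CELP output is assumed to equal
# the original signal so that the cancellation is exact.
original = [0.3, -0.1, 0.8, 0.5]
restored = add_artificial_aliasing(aliased_mdct_part(original), original)
```

With a perfectly coded CELP signal, `restored` equals `original` up to rounding. In a real codec the CELP signal is only an approximation, so the cancellation is approximate as well, and the same overlap-add also smooths the transition between the two decoded signals.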
Another approach is described in U.S. Pat. No. 8,725,503 B2, dated May 13, 2014 and titled “Forward time domain aliasing cancellation with application in weighted or original signal domain” by Bruno Bessette.
In this approach, neither the MDCT length nor the MDCT window shape is changed. The aliasing problem is solved here by encoding the aliasing correction signal with a separate transform-based encoder. Additional side-information bits are sent in the bitstream. The decoder reconstructs the aliasing correction signal and adds it to the decoded MDCT frame. Additionally, the zero input response (ZIR) of the CELP synthesis filter is used to reduce the amplitude of the aliasing correction signal and to improve the coding efficiency. The ZIR also helps to significantly reduce the discontinuity problem.
This approach also works well, but the disadvantage is that it requires a significant amount of additional side-information, and the number of bits required is generally variable, which is not suitable for a constant-bitrate codec.
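For illustration, the zero input response of an all-pole synthesis filter 1/A(z) can be computed as in the following minimal sketch. It assumes A(z) = 1 + a_1 z^-1 + ... + a_p z^-p; the function name, the coefficient values and the memory values are hypothetical and are not taken from the cited patent.

```python
def zero_input_response(lpc_coeffs, filter_memory, length):
    # lpc_coeffs: [a_1, ..., a_p] of A(z) = 1 + sum_k a_k z^-k (assumed convention).
    # filter_memory: at least p past output samples, most recent sample last.
    order = len(lpc_coeffs)
    if len(filter_memory) < order:
        raise ValueError("filter memory must hold at least one sample per coefficient")
    history = list(filter_memory)
    response = []
    for _ in range(length):
        # With zero excitation, each output depends only on past outputs.
        sample = -sum(lpc_coeffs[k] * history[-1 - k] for k in range(order))
        response.append(sample)
        history.append(sample)
    return response

# First-order example: y[n] = 0.9 * y[n-1], last CELP output sample equal to 1.0.
zir = zero_input_response([-0.9], [1.0], 3)
```

Because the ZIR continues the CELP output from the filter state at the frame border, it extends the previous frame smoothly into the next one, which is consistent with its use for reducing the correction signal amplitude and the discontinuity.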
Another approach is described in US patent application US 2013/0289981 A1 dated Oct. 31, 2013 and titled “Low-delay sound-encoding alternating between predictive encoding and transform encoding” by Stephane Ragot, Balazs Kovesi and Pierre Berthet. According to said approach, the MDCT is not changed, but the left part of the MDCT window is changed in order to reduce the overlap length. To solve the aliasing problem, the beginning of the MDCT frame is coded using a CELP codec, and then the CELP signal is used to cancel the aliasing, either by completely replacing the MDCT signal or by artificially introducing the missing aliasing component (similarly to the above-mentioned article by Jeremie Lecomte et al.). The discontinuity problem is solved by the overlap-add operation if an approach similar to the article by Jeremie Lecomte et al. is used; otherwise it is solved by a simple cross-fade operation between the CELP signal and the MDCT signal.
Similarly to U.S. Pat. No. 8,725,503 B2, this approach generally works well, but the disadvantage is that it requires a significant amount of side-information, introduced by the additional CELP.
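A simple cross-fade between a CELP signal and an MDCT signal may, for example, look like the following sketch. A linear fade is assumed here purely for illustration; the cited application does not necessarily use this fade shape, and the function name is hypothetical.

```python
def cross_fade(celp_signal, mdct_signal):
    # Fade out the CELP signal while fading in the MDCT signal over the
    # same sample range (linear gains that sum to one at every sample).
    length = len(celp_signal)
    if length != len(mdct_signal):
        raise ValueError("both signals must cover the same number of samples")
    mixed = []
    for n in range(length):
        gain = (n + 0.5) / length  # MDCT gain rises from near 0 to near 1
        mixed.append((1.0 - gain) * celp_signal[n] + gain * mdct_signal[n])
    return mixed

# Hypothetical transition region of four samples.
faded = cross_fade([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])
```

Where both decoded signals agree, the cross-fade leaves them unchanged; where they differ, the output moves gradually from one to the other instead of jumping at the frame border.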
In view of the above-described conventional solutions, there is a desire for a concept which provides improved characteristics (for example, an improved tradeoff between bitrate overhead, delay and complexity) for switching between different coding modes.