The present invention is concerned with a codec supporting a time-domain aliasing cancellation transform coding mode and a time-domain coding mode as well as forward aliasing cancellation for switching between both modes.
It is favorable to mix different coding modes in order to code general audio signals representing a mix of audio signals of different types such as speech, music or the like. The individual coding modes may be adapted for particular audio types, and thus, a multi-mode audio encoder may take advantage of changing the coding mode over time in accordance with changes in the audio content type. In other words, the multi-mode audio encoder may decide, for example, to encode portions of the audio signal having speech content using a coding mode especially dedicated to coding speech, and to use another coding mode in order to encode other portions of the audio signal representing non-speech content such as music. Time-domain coding modes, such as codebook excitation linear prediction coding modes, tend to be more suitable for coding speech content, whereas transform coding modes tend to outperform time-domain coding modes as far as the coding of music is concerned, for example.
There have already been solutions for addressing the problem of coping with the coexistence of different audio types within one audio signal. The currently emerging USAC, for example, suggests switching between a frequency domain (FD) coding mode largely complying with the AAC standard, and two further linear prediction modes similar to sub-frame modes of the AMR-WB+ standard, namely an MDCT (Modified Discrete Cosine Transformation) based variant of the TCX (TCX=transform coded excitation) mode and an ACELP (adaptive codebook excitation linear prediction) mode. To be more precise, in the AMR-WB+ standard, TCX is based on a DFT transform, whereas in USAC, TCX is based on an MDCT transform. A certain framing structure is used in order to switch between the FD coding domain similar to AAC and the linear prediction domain similar to AMR-WB+. The AMR-WB+ standard itself uses its own framing structure, which forms a sub-framing structure relative to the USAC standard. The AMR-WB+ standard allows for a certain sub-division configuration sub-dividing the AMR-WB+ frames into smaller TCX and/or ACELP frames. Similarly, the AAC standard uses a basic framing structure, but allows for the use of different window lengths in order to transform code the frame content. For example, either a long window with an associated long transform length may be used, or eight short windows with associated transformations of shorter length.
MDCT causes aliasing. This is, thus, also true at TCX and FD frame boundaries. In other words, just as in any frequency domain coder using MDCT, aliasing occurs in the window overlap regions, and this aliasing is cancelled with the help of the neighbouring frames. That is, for any transition between two FD frames or between two TCX (MDCT) frames, or for any transition from FD to TCX or from TCX to FD, there is an implicit aliasing cancellation by the overlap/add procedure within the reconstruction at the decoding side. No aliasing remains after the overlap/add. However, in case of transitions involving ACELP, there is no inherent aliasing cancellation. Therefore, a new tool has to be introduced, which may be called FAC (forward aliasing cancellation). FAC serves to cancel the aliasing stemming from the neighbouring frames if they are coded in a mode other than ACELP.
In other words, aliasing cancellation problems occur whenever transitions between a transform coding mode and a time-domain coding mode, such as ACELP, occur. In order to perform the transformation from the time domain to the spectral domain as efficiently as possible, time-domain aliasing cancellation transform coding is used, such as MDCT, i.e. a coding mode using an overlapped transform where overlapping windowed portions of a signal are transformed using a transform according to which the number of transform coefficients per portion is less than the number of samples per portion, so that aliasing occurs as far as the individual portions are concerned, with this aliasing being cancelled by time-domain aliasing cancellation, i.e. by adding the overlapping aliasing portions of neighboring re-transformed signal portions. MDCT is such a time-domain aliasing cancellation transform. Disadvantageously, TDAC (time-domain aliasing cancellation) is not available at transitions between the transform coding (TC) mode and the time-domain coding mode.
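The time-domain aliasing cancellation described above may be illustrated by the following minimal sketch. It is not taken from any standard: the block size N, the sine window, the test signal and all function names are assumptions chosen for illustration only. Each portion of 2N windowed samples is transformed into only N coefficients, so a single re-transformed portion is corrupted by aliasing, while adding the overlapping halves of two neighboring portions restores the signal.

```python
import math

def sine_window(N):
    # Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block, N):
    # 2N input samples -> N coefficients (fewer coefficients than samples).
    n0 = 0.5 + N / 2
    return [sum(block[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                for n in range(2 * N))
            for k in range(N)]

def imdct(coeffs, N):
    # N coefficients -> 2N time-aliased samples (scaling 2/N assumed here).
    n0 = 0.5 + N / 2
    return [(2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                            for k in range(N))
            for n in range(2 * N)]

def code_portion(signal, start, N, w):
    # Window, transform, re-transform and window again one portion of 2N samples.
    windowed = [w[n] * signal[start + n] for n in range(2 * N)]
    y = imdct(mdct(windowed, N), N)
    return [w[n] * y[n] for n in range(2 * N)]

N = 8  # hypothetical transform length
signal = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(3 * N)]
w = sine_window(N)

a = code_portion(signal, 0, N, w)   # covers samples 0 .. 2N-1
b = code_portion(signal, N, N, w)   # covers samples N .. 3N-1
target = signal[N:2 * N]            # region where both portions overlap

# One portion alone still contains aliasing in the overlap region.
alone_err = max(abs(a[N + i] - target[i]) for i in range(N))

# Overlap/add of both portions cancels the aliasing.
tdac = [a[N + i] + b[i] for i in range(N)]
tdac_err = max(abs(tdac[i] - target[i]) for i in range(N))

print("error without neighbor:", alone_err)
print("error after overlap/add:", tdac_err)
```

If the second portion is replaced by a frame of a time-domain coding mode, the term b[i] is missing and the aliasing in a[N + i] remains uncancelled, which corresponds to the situation the FAC data is meant to address.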
In order to solve this problem, forward aliasing cancellation (FAC) may be used, according to which the encoder signals additional FAC data within the data stream, within a current frame, whenever a change in the coding mode from transform coding to time-domain coding occurs. This, however, requires the decoder to compare the coding modes of consecutive frames in order to ascertain whether or not the currently decoded frame comprises FAC data within its syntax. This, in turn, means that there may be frames for which the decoder cannot be sure whether it has to read or parse FAC data from the current frame or not. In other words, in case one or more frames were lost during transmission, the decoder does not know, for the immediately succeeding (received) frames, whether a coding mode change occurred or not, and whether the bit stream of the current frame contains FAC data or not. Accordingly, the decoder has to discard the current frame and wait for the next frame. Alternatively, the decoder may parse the current frame by performing two decoding trials, one assuming that FAC data is present and another assuming that FAC data is not present, and subsequently determining whether one of the two alternatives fails. The decoding process would most likely make the decoder crash in one of the two conditions. That is, in reality, the latter possibility is not a feasible approach. The decoder should at any time know how to interpret the data and should not have to rely on its own speculation on how to treat the data.
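The parsing ambiguity described above may be sketched with a toy frame syntax. This syntax is purely hypothetical and does not reproduce the USAC bit stream: the one-byte mode field, the fixed FAC length FAC_LEN and the function names are assumptions made for illustration. In the sketch, FAC data is present only if the preceding frame was transform coded and the current frame is time-domain coded, so the parser needs the mode of the preceding frame.

```python
TC, TD = 0, 1   # hypothetical mode codes: transform coding, time-domain coding
FAC_LEN = 2     # hypothetical fixed size of the FAC data in bytes

def has_fac(prev_mode, mode):
    # FAC data is present only at a change from transform to time-domain coding.
    return prev_mode == TC and mode == TD

def encode_frame(prev_mode, mode, payload, fac=b""):
    # Toy layout: [mode byte][FAC data, only if has_fac][payload].
    if has_fac(prev_mode, mode):
        assert len(fac) == FAC_LEN
        return bytes([mode]) + fac + payload
    return bytes([mode]) + payload

def parse_frame(data, prev_mode):
    # The frame itself does not say whether FAC data is present,
    # so parsing depends on the mode of the preceding frame.
    mode = data[0]
    if prev_mode is None:
        raise ValueError("preceding frame lost: presence of FAC data unknown")
    fac = data[1:1 + FAC_LEN] if has_fac(prev_mode, mode) else b""
    payload = data[1 + len(fac):]
    return mode, fac, payload

# A time-domain coded frame that follows a transform coded frame.
frame = encode_frame(TC, TD, b"\x10\x20\x30", fac=b"\xaa\xbb")

# With the preceding frame received, parsing is unambiguous.
print(parse_frame(frame, TC))

# With the preceding frame lost, both hypotheses parse without error
# but yield different payloads; the parser cannot tell which is right.
with_fac = parse_frame(frame, TC)
without_fac = parse_frame(frame, TD)
print(with_fac[2], without_fac[2])
```

In this toy syntax neither hypothesis fails outright; the wrong one silently misinterprets FAC bytes as payload. In a real bit stream a wrong hypothesis may instead lead to a decoding failure, as noted above.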