The term “audio transcoding” usually denotes the derivation of a bit stream that represents an audio signal in a specific audio coding format from another bit stream that is organized according to a different audio coding format. In this sense, “transcoding” denotes the full procedure of obtaining e.g. an MPEG AAC compliant bit stream from an MPEG-1 layer III (mp3) compliant bit stream.
In this document, however, the term “audio transcoding” is used in a more technical sense to describe the conversion of the audio signal from one sub-band or transform domain to another. That is, the term describes just one principal step in the conversion from one representation to another one, instead of the full procedure.
The basic principle of generic perceptual audio encoding as known from the literature (T. Painter and A. Spanias (2000): Perceptual Coding of Digital Audio, Proceedings of the IEEE, vol. 88) is shown in FIG. 1.
Today's compression methods and formats for audio signals generally use a time-frequency analysis 102, i.e. a filter bank or a transform, to obtain parameters 110 that represent the audio signal 107. These parameters are subject to quantization and encoding 104, entropy coding 105 and bit stream operations 106; all of these steps are controlled by a psychoacoustic analysis 101 of the input audio signal. FIG. 2 shows a corresponding generic perceptual audio decoder with bit stream operations 201, entropy decoding 202, bit allocation 203, decoding and de-quantization 204 and finally time-frequency synthesis 205, which generates the time domain signal 214 from parameters 212, 213.
FIGS. 1 and 2 illustrate and exemplify the basic principle of perceptual audio codecs. Although particular implementations may differ to a certain extent, they usually employ a time-frequency analysis and its inverse, the time-frequency synthesis.
Focusing now on the time-frequency analysis and synthesis, the intermediate encoding and decoding steps will not be considered further.
For the time-frequency analysis 102, numerous different algorithms are used in today's audio codecs. For example, the MPEG audio codec standards include the MPEG-1 layer I and II codecs, which use a 32-band pseudo-QMF (quadrature mirror filter) filter bank, and MPEG-1 layer III (mp3), which employs a hybrid filter bank, namely a cascade of a 32-band pseudo-QMF filter bank followed by an MDCT (modified DCT) filter bank. The MDCT filtering (by default 18 bins per sub-band, reduced to 6 bins for transients) leads to an overall spectral resolution of 32 × 18 = 576 or 32 × 6 = 192 bins, respectively. The MPEG AAC codec and derivatives thereof use a full-band MDCT approach with a default resolution of 1024 bins (reduced to 256 bins for transients). Audio frames often overlap temporally to a certain extent, e.g. by 50%, which defines the so-called frame advance as (100% − overlap) × frame_size.
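The arithmetic behind these figures can be illustrated with a minimal sketch. The function names `hybrid_resolution` and `frame_advance` are hypothetical helpers introduced here for illustration only, and the 2048-sample frame size in the usage lines is an assumed example value rather than one taken from the text above.

```python
def hybrid_resolution(qmf_bands: int, mdct_bins_per_band: int) -> int:
    """Overall number of spectral bins of a cascaded (hybrid) filter bank.

    Each of the qmf_bands sub-bands is split further into
    mdct_bins_per_band bins, so the resolutions multiply.
    """
    return qmf_bands * mdct_bins_per_band


def frame_advance(frame_size: int, overlap: float) -> int:
    """Frame advance in samples: (100% - overlap) * frame_size.

    overlap is given as a fraction in [0, 1), e.g. 0.5 for 50%.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be a fraction in [0, 1)")
    return round((1.0 - overlap) * frame_size)


# Hybrid filter bank figures quoted in the text (32 sub-bands).
long_block_bins = hybrid_resolution(32, 18)   # default case
short_block_bins = hybrid_resolution(32, 6)   # transient case

# Assumed example: a 2048-sample frame with 50% overlap.
advance = frame_advance(2048, 0.5)
```

With 50% overlap, each new frame therefore contributes only half a frame of new samples, so the assumed 2048-sample frame advances by 1024 samples.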
In the following, the domain between the output of the time-frequency analysis 102 and the input of the time-frequency synthesis 205 (wherein the output signal 116 of the encoder is input 206 to the decoder) will be denoted as “frequency domain” or “parameter domain”, regardless of whether the specific audio coding format uses a filter bank or a block transform for the time-frequency analysis.
Owing to the ever increasing number of existing and emerging audio formats, there is a rising need for algorithms for transcoding audio content from one bit stream format to another. FIG. 3 shows an approach to audio transcoding that is typically used today, because it involves only available standard modules already described in FIGS. 1 and 2. The input bit stream encoded in a source format is decoded DEC_A into the continuous time domain PCM signal TD. An independent encoder ENC_B then produces a new bit stream according to the target format. The only interface between the signal processing blocks is the time domain audio signal TD that is passed from the decoder to the encoder.
Although this approach is simple to use, it has the following problems. First, since the two blocks DEC_A, ENC_B have no knowledge of each other, the time-frequency analysis procedures may be desynchronized; more generally, the chain of decoding (de-quantization) and encoding (quantization) operations degrades the signal quality, producing so-called tandem errors. Second, the computational complexity of the approach is high, so that it is desirable to reduce it significantly.
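The effect of cascaded quantization can be illustrated with a toy sketch. It assumes a plain uniform scalar quantizer and arbitrarily chosen step sizes and input value; real codecs use more elaborate quantizers, and the function name `quantize` is a hypothetical helper introduced for illustration only.

```python
def quantize(value: float, step: float) -> float:
    """Uniform scalar quantizer (assumed for illustration):
    round to the nearest multiple of step."""
    return round(value / step) * step


x = 1.4        # original parameter value (arbitrary example)
step_a = 1.0   # quantizer step of the source format (assumed)
step_b = 0.8   # quantizer step of the target format (assumed)

# Direct encoding of the original value in the target format.
direct = quantize(x, step_b)
direct_error = abs(direct - x)

# Tandem coding: quantize in the source format, reconstruct,
# then quantize the reconstructed value in the target format.
tandem = quantize(quantize(x, step_a), step_b)
tandem_error = abs(tandem - x)
```

In this example the tandem path ends up farther from the original value than direct encoding does, because the second quantizer operates on an already distorted value.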
A better transcoding result can be obtained if some side information that is to a certain extent common to source and target formats is extracted by the decoder and reused in the encoder. FIG. 4a) shows an example for this approach, which can be used e.g. for transcoding from the Dolby AC-3 to the BSAC (Bit Sliced Arithmetic Coding) format (Kyoung Ho Bang, Young Cheol Park, and Dae Hee Youn (2006): Audio Transcoding Algorithm for Mobile Multimedia Application, Proc. of ICASSP, vol. 3). In this particular example, the AC-3 bit allocation can be re-used to derive and control a new bit allocation 403 within the BSAC encoder. Besides re-using side information SI from the source bit stream, the time-frequency synthesis and analysis procedures are temporally synchronized. For this case, the advanced concept of FIG. 4a) reduces computational complexity as compared to the previously described transcoding scheme, and may lead to a better quality of the target signal.
If (and only if) the codec formats of source and target bit stream are identical in terms of their time-frequency analysis domain, i.e. the analysis and synthesis blocks are fully complementary (e.g. transcoding of an mp3 bit stream from a given data rate to a lower one), the transcoding can be further simplified as shown in FIG. 4b): the time-frequency analysis and synthesis procedures can be omitted, so that the data rate modification takes place directly in the parameter domain PD, e.g. by re-quantizing certain parameters. It is also beneficial to reuse the side information, e.g. the bit allocation, from the source bit stream.
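Re-quantization in the parameter domain can be sketched as follows. The sketch assumes a simple uniform quantizer with a single step size per block, which is a simplification and not the quantizer of any particular codec; the function name `requantize` and the step sizes are hypothetical and chosen for illustration only.

```python
from typing import List


def requantize(indices: List[int], src_step: float, dst_step: float) -> List[int]:
    """Map quantization indices from a fine step size to a coarser one.

    Each index is first de-quantized (index * src_step) and the
    reconstructed value is then quantized again with dst_step.
    No time-frequency synthesis or analysis is involved.
    """
    if src_step <= 0 or dst_step <= 0:
        raise ValueError("step sizes must be positive")
    return [round(i * src_step / dst_step) for i in indices]


# Assumed example: three quantized parameters of the source bit stream.
source_indices = [4, -7, 10]
target_indices = requantize(source_indices, src_step=0.5, dst_step=1.5)
```

A coarser target step yields smaller index magnitudes, which generally need fewer bits after entropy coding, at the price of a larger quantization error.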