It is generally known to encode audio and/or video signals by using an encoding method to obtain an encoded version of the original time signal, wherein the encoded version should differ from the original time signal essentially in that the amount of data of the encoded signal is smaller than the amount of data of the original time signal. In such a case, the encoding algorithm for obtaining the encoded signal from the original signal, and also the decoding algorithm, which is essentially a reversal of the encoding algorithm, are referred to as data-reducing encoding algorithms.
Different encoding algorithms exist for the data reduction of audio signals, and these are the subject of a series of international standards. The encoding algorithm MPEG-2 AAC, for example, is described in detail in the international standard ISO/IEC 13818-7.
In the following, reference will be made to FIG. 8, which shows a block diagram of an MPEG audio encoding method. Such an audio encoder typically comprises an audio input 70, into which a stream of time discrete samples is fed, for example PCM samples with a width of 16 bits. In an analysis filter bank 71, the stream of time discrete audio samples is divided into encoding blocks or frames of samples, windowed by using a respective window function, and then converted into a spectral representation, for example by a filter bank or by a Fourier transform or a variation of the Fourier transform, such as a modified discrete cosine transform (MDCT). Thus, successive encoding blocks or frames of spectral coefficients are present at the output of the analysis filter bank 71, wherein a block of spectral coefficients is the spectrum of an encoding block of audio samples. Often, a 50% overlap of successive encoding blocks is used, so that a window of, for example, 2048 audio samples is considered per block, and each such processing step generates 1024 new spectral coefficients.
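By way of illustration only, the block division with 50% overlap, the windowing and the MDCT described above can be sketched as follows. The sine window, the direct (slow) evaluation of the transform and all function names are assumptions made for this sketch. They are not taken from the standard or from FIG. 8.

```python
import math

def sine_window(length):
    """Sine window (an assumed choice; the text does not name a window)."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]

def mdct(block):
    """Direct O(N^2) MDCT: 2N windowed samples in, N spectral coefficients out."""
    two_n = len(block)
    n = two_n // 2
    phase = n / 2 + 0.5
    return [
        sum(block[t] * math.cos(math.pi / n * (t + phase) * (k + 0.5))
            for t in range(two_n))
        for k in range(n)
    ]

def analysis_filter_bank(samples, window_len):
    """Split samples into 50%-overlapping blocks, window and transform each."""
    hop = window_len // 2  # 50% overlap: advance by half a window per block
    win = sine_window(window_len)
    frames = []
    for start in range(0, len(samples) - window_len + 1, hop):
        block = [samples[start + i] * win[i] for i in range(window_len)]
        frames.append(mdct(block))
    return frames
```

With a window length of 2048, each call to `mdct` receives 2048 windowed samples and returns 1024 coefficients, which corresponds to the figures given above.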
To obtain a data reduction, the time discrete audio signal at input 70 is further fed into a psychoacoustic model 72, which, as is known, calculates the masking threshold of the audio signal as a function of frequency. The quantization of the spectral coefficients, which is performed in a block 73 denoted with quantization and encoding, depends on this masking threshold.
In other words, the spectral coefficients are quantized as coarsely as possible while the quantization noise thereby introduced still lies below the psychoacoustic masking threshold calculated by the psychoacoustic model 72, so that the quantization noise is ideally inaudible. As a result, a certain number of spectral coefficients that are nonzero at the output of the analysis filter bank 71 are typically set to 0 by the quantization, since the psychoacoustic model 72 has established that they are masked by adjacent spectral coefficients and are therefore inaudible.
After quantization, a spectral representation of the encoding block of time discrete samples is present, wherein the quantization noise is, if possible, below the psychoacoustic masking threshold. Depending on the encoder used, these data reduced quantized spectral values can then be encoded without any loss by using an entropy encoding, for example a Huffman encoding. Thereby, a stream of code words is obtained, to which side information needed by a decoder is added in a bit stream multiplexer 74, such as information regarding the analysis filter bank, information regarding the quantization, such as scale factors, or side information regarding further function blocks. In MPEG-2 AAC, such further function blocks are, for example, TNS processing, intensity stereo processing, mid/side stereo processing or a prediction from spectrum to spectrum.
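The relationship between masking threshold and quantization step width can be illustrated with a strongly simplified sketch. It assumes a uniform quantizer, for which the noise power is approximately the squared step width divided by 12, and a hypothetical per-coefficient allowed noise power supplied by a psychoacoustic model. The quantizer of an actual MPEG encoder is more elaborate and is not reproduced here.

```python
import math

def quantize_block(coeffs, allowed_noise):
    """Quantize each coefficient with a step sized to its allowed noise power.

    allowed_noise[k] is a hypothetical noise power permitted for coefficient k.
    A uniform quantizer with step d has a noise power of roughly d*d/12,
    so the coarsest admissible step is d = sqrt(12 * allowed_noise[k]).
    """
    steps = [math.sqrt(12.0 * p) for p in allowed_noise]
    indices = [int(round(c / d)) for c, d in zip(coeffs, steps)]
    return indices, steps
```

For an allowed noise power of 0.75, the step width is 3.0, so that a coefficient of 1.0 is quantized to the index 0, whereas a coefficient of 10.0 is quantized to the index 3. Small coefficients lying under the threshold thus vanish, as described above.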
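The lossless entropy encoding of the quantized spectral values can be sketched with a generic Huffman construction. This is an illustration under assumptions only: the code table is built here from the data itself, whereas a standardized encoder would typically use predefined tables, and the function names are hypothetical.

```python
import heapq
from collections import Counter

def build_huffman_code(symbols):
    """Return a dict mapping each distinct symbol to a prefix-free bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: partial code word}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the code words of all symbols into one bit string."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Recover the symbols by matching prefix-free code words bit by bit."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out
```

Since many quantized spectral values are 0, the value 0 receives a short code word, and decoding the bit string with the same table returns the quantized values exactly.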
At an output 75 of the encoder, also referred to as bit stream output, the signal encoded according to the encoding algorithm shown in FIG. 8 will be present blockwise.
On the decoder side, the encoded signal present at the output 75 of the encoder shown in FIG. 8 is fed into a bit stream input 80 of a decoder shown in FIG. 9. The decoder first carries out a bit stream demultiplexing operation in a block 81, denoted as a bit stream demultiplexer, to separate the spectral data from the side information. At the output of block 81, the code words representing the individual spectral coefficients are then present again. By using a respective table, the code words are decoded to obtain quantized spectral values. These quantized spectral values are then processed in a block 82 denoted with “inverse quantization” to reverse the quantization introduced in block 73 (FIG. 8). At the output of block 82, dequantized spectral coefficients are present, which are then converted into the time domain by a synthesis filter bank 83, which operates inversely to the analysis filter bank 71 (FIG. 8), to obtain the decoded signal at an audio output 84.
When considering the encoding/decoding concept illustrated in FIGS. 8 and 9, it becomes clear that this is a block-oriented method, wherein the block generation is caused by the analysis filter bank block 71 of FIG. 8, and wherein the block forming will only be cancelled at the audio output 84 of the decoder shown in FIG. 9.
It further becomes clear that this is a lossy encoding concept, since the decoded signal present at the audio output 84 generally comprises less information than the original time signal present at the audio input 70. The quantizer 73, controlled by the psychoacoustic model 72, removes information from the original time signal present at the audio input 70, and this information is not added again in the decoder but is discarded. Since the psychoacoustic model 72 is adapted to the properties of human hearing, this loss of information ideally does not lead to any subjective loss of quality, but merely to the desired data compression.
Here, it should be noted that the encoding concept described in FIG. 8 and FIG. 9 with the example of an audio signal is used correspondingly for image or video signals, wherein a video signal is present instead of the temporal audio signal, and wherein the spectral representation is not an audio frequency spectrum but a spatial frequency spectrum. Otherwise, an analysis filter bank or transform, respectively, a psycho-optical model, a quantization controlled thereby, and an entropy encoding also take place in video signal compression, wherein the whole encoding/decoding concept also runs blockwise.
The decoded signal (in the example of FIG. 9 the decoded audio signal at the audio output 84) is typically again a stream of time discrete samples based on an encoding block raster that is, however, generally, not visible in the decoded signal, except when special measures are taken.
While the process of the decoding is the normal case in the application, namely the transmission and storage of audio and/or image signals, there are, however, cases where it is of interest to “retranslate” a given decoded signal into a bit stream representation. This is especially of interest in the following cases, when only the decoded signal is available.
On the one hand, there is often a need to examine encoding systems with reference to the signals that are encoded and re-decoded by them, for example to find out why a still unknown encoder sounds so good.
On the other hand, there is a need in the area of copyright protection to prove beyond doubt that a piece of music or an image has been encoded originally with a certain encoder.
Finally, there is a need in the area of transmission, for example via several networks, to encode a decoded signal again. In this case, the encoder/decoder concept shown in FIG. 8 and FIG. 9 is performed several times in succession on an original audio time signal. This is problematic in that so-called tandem encoding distortions are introduced by the successive codec stages.
In the specialist publication “NMR Measurements on Multiple Generations Audio Coding”, Michael Keyhl, Jürgen Herre, Christian Schmidmer, 96th AES-Convention, 26th Feb. to 1st Mar. 1994, Amsterdam, Preprint 3803, it is suggested to introduce an identification mark into a decoded signal in order to reduce the tandem encoding distortions. Subsequent encoder stages can access this mark and base their encoding block division of the signal to be encoded/decoded again on it, such that all codec stages in a chain of codec stages use the same encoding block raster.
Although this method reduces the tandem encoding distortions significantly, it is still disadvantageous, in that the identification mark has to be introduced by a decoder and has to be extracted again and interpreted by a subsequent encoder. Therefore, changes both at a decoder and at an encoder are necessary. Further, this concept is of course only applicable for a tandem encoding of decoded signals having this identification mark for the encoding block raster. For signals that do not have this identification mark, a codec stage in a chain of codec stages can, of course, not access an identification mark.
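Why a common encoding block raster reduces tandem encoding distortions can be illustrated with a toy model. The model is an assumption made purely for illustration: it uses non-overlapping blocks, an orthonormal DCT and a fixed uniform quantizer instead of an actual MPEG codec, and all names are hypothetical.

```python
import math

def dct(block):
    """Orthonormal DCT-II of one block."""
    n = len(block)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(block[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of the orthonormal DCT-II above."""
    n = len(coeffs)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * coeffs[k]
            * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
        for t in range(n)
    ]

def codec_pass(signal, block_len, step, offset=0):
    """One encode/decode stage; the block raster starts at the given offset."""
    out = list(signal)
    for start in range(offset, len(signal) - block_len + 1, block_len):
        coeffs = dct(signal[start:start + block_len])
        quantized = [round(c / step) * step for c in coeffs]
        out[start:start + block_len] = idct(quantized)
    return out

def added_error(before, after):
    """Squared error that a further codec stage adds to its input signal."""
    return sum((a - b) ** 2 for a, b in zip(before, after))
```

In this toy model, a second pass of `codec_pass` with the same offset reproduces its input up to rounding, since the coefficients already lie on the quantizer grid, whereas a second pass with a shifted offset quantizes anew and adds further error.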
Similar problems or restrictions of the flexibility occur also with the MOLE concept, described in “ISO/MPEG Layer 2—Optimum re-Encoding of Decoded Audio using a MOLE-Signal”, John Fletcher, 104th AES-Convention, 16th to 19th May 1998, Preprint No. 4706. Generally, additional data describing in detail, in which way the present decoded audio signal has been encoded and decoded, are introduced into the decoded audio signal. These data are referred to as MOLE signal. When the decoded audio signal has to be re-encoded, a specially designed encoder will extract this MOLE signal from the signal to be encoded, and perform the individual encoding steps based on this signal.
Similar to the concept of the identification mark, there is also the disadvantage that the decoder decoding an originally encoded time signal for the first time has to introduce the MOLE signal into the decoded audio signal. Such a decoder thus differs from common standard decoders. Further, an encoder re-encoding a decoded signal has to extract the determination signal in order to work correspondingly. This so-called second encoder also has to be modified such that it can read and interpret the determination signal. Finally, this concept is disadvantageously also only applicable to decoded signals having such a determination signal, but not to signals that do not have such a determination signal.
For the quantization (block 73 in FIG. 8), a significant effort is made in the calculation of scale factors, for example so that the quantization noise introduced by quantization in a psychoacoustic encoder lies below the psychoacoustic masking threshold of the audio signal at the audio input 70. At the same time, it has to be taken into consideration that a certain bit rate may be required at the output. Finally, there is also the general aim to compress the audio signal as strongly as possible, essentially without deterioration of the audio quality.
In the international standard MPEG-2 AAC already mentioned at the beginning, one possible quantization method is described in paragraph B.2.7, wherein a computationally expensive iterative method with an outer iteration loop and an inner iteration loop is used to calculate optimum scale factors for each scale factor band, and thus the optimum quantization step width with regard to all three of the above requirements.
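The general structure of such a two-loop search can be sketched as follows. This is a strongly simplified illustration and not the method of paragraph B.2.7: the uniform quantizer, the bit cost estimate, the step ratio and all names are assumptions made for this sketch.

```python
def quantize(band, step):
    """Uniform quantization of one scale factor band (simplifying assumption)."""
    return [int(round(c / step)) for c in band]

def estimate_bits(q_bands):
    """Hypothetical cost model: one bit per value plus two per magnitude bit."""
    return sum(1 + 2 * abs(q).bit_length() for band in q_bands for q in band)

def band_noise(band, q_band, step):
    """Quantization noise energy of one band."""
    return sum((c - q * step) ** 2 for c, q in zip(band, q_band))

def two_loop_search(bands, allowed_noise, bit_budget, max_outer=32):
    """Find per-band step widths that respect a bit budget and noise limits."""
    if bit_budget < sum(len(b) for b in bands):
        raise ValueError("bit budget too small for this cost model")
    ratio = 2 ** 0.25                   # assumed step increment per iteration
    gains = [1.0] * len(bands)          # per-band amplification (scale factor role)
    steps, quantized = [], []
    for _ in range(max_outer):
        # Inner loop: coarsen the common step until the bit budget is met.
        common = 0.01
        while True:
            steps = [common / g for g in gains]
            quantized = [quantize(b, s) for b, s in zip(bands, steps)]
            if estimate_bits(quantized) <= bit_budget:
                break
            common *= ratio
        # Outer loop: amplify bands whose noise exceeds the allowed level.
        noisy = [i for i, b in enumerate(bands)
                 if band_noise(b, quantized[i], steps[i]) > allowed_noise[i]]
        if not noisy or len(noisy) == len(bands):
            break
        for i in noisy:
            gains[i] *= ratio
    return steps, quantized
```

Even in this reduced form, every outer iteration repeats the complete inner loop, which indicates why the determination of the quantization step width accounts for a considerable share of the computing effort.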
Thus, calculating the iteration loops for determining the quantization step width takes up a significant part of the computing effort when encoding an audio signal.
When, for example, in the case of tandem encoding, a signal has been encoded and re-decoded and is to be re-encoded, the full quantization normally has to be calculated again by using the psychoacoustic model, the inner iteration loop and the outer iteration loop, even when the encoding block raster underlying the signal to be processed is known. This is unsatisfactory insofar as the quantization parameters have already been calculated in the earlier encoding of the original time signal. There are, however, no explicit references in the re-decoded signal that could be used in a further encoding in order to do without the expensive calculation of the scale factors and thus of the quantization step width.