1. Field of the Invention
The present invention relates to the encoding and decoding, respectively, of information signals and in particular to scalable encoders and scalable decoders, respectively.
2. Description of the Prior Art
Scalable encoders are shown in EP-0 846 375 B1. Scalability is generally understood as the possibility to decode a subset of a bitstream representing an encoded data signal, such as an audio signal or a video signal, into a useful signal. This property is particularly desirable when, for example, a data transmission channel does not provide the required full bandwidth for a transmission of a full bitstream. On the other hand, an incomplete decoding on a decoder with lower complexity and thus reduced costs is possible. Generally, various discrete scalability layers are defined in practice.
FIG. 4 shows the basic structure of a scalable encoder. An information signal, such as an audio signal and/or a video signal, is provided at an input 100. A first encoder 102 encodes the input signal to generate a first scaling layer at an output 103 of the first encoder 102. The encoded signal of the first encoder is further made available to a first decoder 104, which is formed to reverse the encoding in order to obtain an encoded/decoded information signal at the output of the first decoder 104. Optionally, as illustrated by the dashed border of a block 106, the encoded/decoded information signal can be converted into a spectral representation, for example, by means of an MDCT (MDCT=modified discrete cosine transform). A dashed detour line 108 in FIG. 4 indicates that the encoded/decoded information signal can also be provided directly to a comparator 110. The comparator 110 includes a first input 110a and a second input 110b. The information signal is also fed into the second input 110b, a detour line 112 being used if the detour line 108 was also active in the first branch. However, if a spectral representation is fed into the first input 110a, the information signal is also converted into a spectral representation by means of, for example, an MDCT 114 so that the same conditions are present at both inputs 110a, 110b of the comparator 110.
First, the comparator 110 forms the difference between the signals at the first input 110a and at the second input 110b. It then determines whether this difference signal can be encoded more efficiently than the original information signal or the converted information signal, respectively, at the second input 110b. If this is the case, it is more favorable to encode the difference signal in a second encoder 114 to obtain a second scaling layer at an output 116 of the second encoder. However, if the difference signal needs more bits for encoding than the original signal, processing the difference signal in the second encoder would yield a lower encoding efficiency than necessary. For this reason, the information signal is passed directly through to the second encoder 114 in this latter case.
In the art, the mode of operation of the scalable encoder of FIG. 4 in which the second encoder encodes the difference signal between the encoded/decoded information signal and the original information signal is referred to as difference operation. The mode of operation in which the difference signal is less favorable to encode than the original information signal, so that the original information signal is encoded instead, is referred to as simulcast operation. A simulcast operation is required, for example, when the information signal has highly transient properties and the first encoder is not suited to transient signals, so that a very large encoding error is produced. As a result, the difference signal determined by the comparator 110 can need more bits for encoding than the original information signal.
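The choice between difference operation and simulcast operation described above can be illustrated with a minimal sketch. It is not the decision rule of any standard. It assumes that both comparator inputs are already time-aligned and in the same domain, that signal energy serves as a rough proxy for the number of bits the second encoder needs, and that the difference is formed as original minus encoded/decoded signal. The names `energy` and `choose_mode` are hypothetical.

```python
def energy(signal):
    """Sum of squared samples, used here as a rough proxy for bit demand."""
    return sum(s * s for s in signal)


def choose_mode(original, core_decoded):
    """Return (mode, second_encoder_input) for one block of samples.

    Assumption: both blocks are time-aligned and have the same length.
    Assumption: the sign convention is original minus encoded/decoded signal.
    """
    if len(original) != len(core_decoded):
        raise ValueError("inputs must have the same length")
    difference = [o - c for o, c in zip(original, core_decoded)]
    # Difference operation only pays off if the difference is "cheaper"
    # than the original; otherwise fall back to simulcast operation.
    if energy(difference) < energy(original):
        return "difference", difference
    return "simulcast", list(original)
```

With a core signal that tracks the original closely, `choose_mode` selects difference operation; with a core signal that bears little resemblance to the original, as may happen for transient input, it selects simulcast operation and passes the original through.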
Such a scalable encoder as shown in FIG. 4 is, for example, defined in the MPEG-4 standard (ISO/IEC 14496-3:1999 subpart 4). For example, an MPEG CELP encoder can be used as first encoder or core encoder. The second encoder is an AAC encoder providing a high-quality audio encoding and being defined in the standard MPEG-2 AAC (ISO/IEC 13818). If the first encoder 102 is a CELP encoder, a downsampling stage is provided before the first encoder and an upsampling stage is provided after the first decoder. Further, a stage 105 with variable delay may be provided in the second branch, both prior to the first encoder and prior to the MDCT block 114, to delay time signals. This compensates the delay introduced by the first encoder and the first decoder, so that signals corresponding to each other are compared at the first input 110a and at the second input 110b in the comparator.
The downsampling stage prior to the first encoder and the upsampling stage after the first decoder in the case of a CELP-encoder as core encoder serve to adjust the sampling rate of the information signal at the input 100 to the sampling rate required by the CELP encoder and vice versa.
Both the first scaling layer at the output 103 of the first encoder and the second scaling layer at the output 116 of the second encoder are fed to a bitstream multiplexer (not shown in FIG. 4) writing a bitstream according to a format that is also specified in the MPEG-4 standard.
It will be appreciated that the scalability does not only work for two scaling layers but that in principle any number of scaling layers may be provided. In this case, an individual encoder must be present in the scalable encoder for each scaling layer, and further comparators are provided, each of which forms a difference from two “channels” to be compared in order to provide an input signal to an encoder for a next higher scaling layer.
The decoder for decoding a scaled data stream first includes a bitstream demultiplexer to extract the first scaling layer and to further extract the second scaling layer. The first scaling layer is fed into the first decoder to obtain a decoded first scaling layer. The second scaling layer is fed into a second decoder to obtain a decoded second scaling layer. Depending on the implementation, the two decoded scaling layers can then be combined either in the time domain or in the frequency domain to obtain a decoded audio signal. When the combination is performed in the frequency domain, the decoded audio signal further has to be converted into the time domain to provide a time-decoded information signal.
Depending on the delay between the first decoder and the second decoder, delay stages are also provided in the decoder so that corresponding signals can be combined by the combiner.
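The delay compensation and combination in the decoder can be sketched as follows for the time-domain case. This is an illustration under stated assumptions: the text above does not specify how the combiner works, so simple sample-wise addition (matching difference operation) is assumed, and the delay is assumed to be applied to the decoded first scaling layer. The names `delay`, `combine_layers` and `core_delay` are hypothetical.

```python
def delay(signal, num_samples):
    """Delay a block by prepending zeros while keeping its original length."""
    if num_samples < 0:
        raise ValueError("delay must be non-negative")
    padded = [0.0] * num_samples + list(signal)
    return padded[:len(signal)]


def combine_layers(core_decoded, enhancement_decoded, core_delay=0):
    """Align the decoded first layer and add the decoded second layer.

    Assumption: difference operation, so the layers add sample by sample.
    Assumption: the first (core) layer is the one that needs to be delayed.
    """
    aligned_core = delay(core_decoded, core_delay)
    return [c + e for c, e in zip(aligned_core, enhancement_decoded)]
```

A real decoder would carry the delayed samples over from block to block instead of discarding them at the block boundary; the sketch omits this for brevity.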
In principle, any encoder can be employed as first encoder. This is a substantial feature and substantial advantage, respectively, of the concept of the scalable encoder, which is versatile in that the different encoders can be selected independently of one another, because the relation between them is established by the comparator. Thus, various encoders exist that can be employed as first encoder and that involve filtering the input signal. A simple voice encoder processing, for example, only a bandwidth of 0 to 4 kHz can comprise a low-pass filter on the input side allowing only frequency portions of the information signal between 0 and 4 kHz to pass. Further, various encoders require a high-pass filter ensuring that no d.c. components of the information signal are fed into the encoder. Such a high-pass filter has a very low cutoff frequency, which is chosen so that the d.c. component is blocked while the entire rest of the spectral content of the information signal passes the filter, or so that spectral components above, for example, 20 Hz pass the input high-pass filter in the case of an audio signal.
Filtering with a frequency-selective filter can introduce a non-linear frequency-dependent phase shift in the pass band of the filter. In the ideal case, the filter leaves the magnitude of the signal in the pass band untouched. However, each filter also has a frequency response with respect to the phase. As is known, an exemplary first-order low-pass filter has a phase response in which d.c. components are almost not phase-shifted, while frequency components of the information signal are shifted increasingly towards negative phases with increasing frequency, being phase-shifted by −45° at the cutoff frequency of the low-pass filter and by up to −90° in the reject band of the low-pass filter. Thus, generally speaking, various encoders which can, in principle, be employed as first encoder include filters and other components, respectively, which introduce a non-linear frequency-dependent phase shift into the information signal processed by the first encoder.
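The phase response just described can be reproduced numerically. The sketch below assumes the textbook first-order low-pass transfer function H(f) = 1 / (1 + j f / fc), whose phase is −arctan(f / fc). The function name `lowpass_phase_deg` and the example cutoff of 4 kHz are illustrative choices, not taken from any particular encoder.

```python
import math


def lowpass_phase_deg(freq_hz, cutoff_hz):
    """Phase in degrees of H(f) = 1 / (1 + j f / fc), i.e. -atan(f / fc)."""
    return -math.degrees(math.atan(freq_hz / cutoff_hz))


if __name__ == "__main__":
    cutoff = 4000.0  # assumed cutoff frequency in Hz, for illustration only
    for freq in (0.0, 400.0, 4000.0, 40000.0, 4000000.0):
        print(freq, lowpass_phase_deg(freq, cutoff))
```

The phase is zero at d.c., reaches −45° at the cutoff frequency, and approaches but never reaches −90° far in the reject band. Because the phase is not proportional to frequency, it cannot be compensated by a simple time delay.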
When looking at FIG. 4, it can be seen that in the case where the first encoder involves a non-linear frequency-dependent phase shift, signals corresponding to each other with regard to time are compared in the comparator, provided that respective time-delay stages exist, but that the signal fed in at the first input 110a is phase-shifted in a non-linear, frequency-dependent manner in comparison to the signal fed in at the second input 110b. The effect of this phase shift is that the difference signal which the comparator 110 calculates is increased, because in the other branch, i.e. the branch connected to the second input 110b, there is no frequency-dependent phase shift or, very likely, a different one.
Thus, the difference signal is greater than it actually should be, which decreases the total encoding efficiency, since the second encoder 114 typically requires more bits for encoding a signal having more energy. In particular, the comparator will trigger a simulcast operation more often due to the increased difference signal, which is disadvantageous for encoding efficiency. Even if no simulcast operation is triggered, because the difference signal is still smaller than the information signal itself, the difference signal is still more bit-intensive in encoding than it actually should be.
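The size of this effect can be estimated for a single sinusoid. The sketch below is a simplified model under stated assumptions: the first branch is represented only by a first-order low-pass filter H(f) = 1 / (1 + j f / fc), coding errors are ignored, and the input is a unit-amplitude sinusoid, so that the original has the phasor 1 and the filtered signal has the phasor H(f). The function names are hypothetical.

```python
def first_order_lowpass(freq_hz, cutoff_hz):
    """Complex frequency response H(f) = 1 / (1 + j f / fc)."""
    return 1.0 / complex(1.0, freq_hz / cutoff_hz)


def residual_amplitudes(freq_hz, cutoff_hz):
    """Amplitude of the difference signal for a unit-amplitude sinusoid.

    Returns (with_phase, magnitude_only):
      with_phase     -- |1 - H|, the difference actually seen by the comparator
      magnitude_only -- 1 - |H|, the difference if the phase shift were absent
                        or identically present in both branches
    """
    h = first_order_lowpass(freq_hz, cutoff_hz)
    with_phase = abs(1.0 - h)
    magnitude_only = 1.0 - abs(h)
    return with_phase, magnitude_only
```

For a 2 kHz sinusoid and an assumed 4 kHz cutoff, `residual_amplitudes(2000.0, 4000.0)` gives a difference amplitude of roughly 0.45 with the phase shift against roughly 0.11 without it, so in this model most of the difference signal is caused by the phase mismatch rather than by the attenuation.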