The present invention relates to methods of and apparatus for coding discrete signals and for decoding coded discrete signals, and in particular to an efficient implementation of differential coding for scalable audio coders.
Scalable audio coders are coders of modular construction. Efforts are being made to employ existing speech coders that process signals sampled at, e.g., 8 kHz and output data rates of, for example, 4.8 to 8 kilobits per second. These known coders, such as the G.729, G.723, FS1016 and CELP coders familiar to those skilled in the art, serve mainly for coding speech signals. They are generally unsuitable for coding higher-quality music signals, because they are usually designed for signals sampled at 8 kHz and can therefore code an audio bandwidth of at most 4 kHz. On the other hand, they generally operate fast and require little computation.
For coding music signals at, for example, hi-fi or CD quality, a scalable coder therefore combines a speech coder with an audio coder capable of coding signals at a higher sampling rate, e.g. 48 kHz. The speech coder may of course also be replaced by a different coder, for example a music/audio coder according to the MPEG1, MPEG2 or MPEG3 standards.
Such a cascade of a speech coder and a higher-grade audio coder usually employs differential coding in the time domain. An input signal sampled at, e.g., 48 kHz is downsampled by a downsampling filter to the sampling frequency suitable for the speech coder. The downsampled signal is then coded. The coded signal can be fed directly to a bit stream formatting means for transmission; however, it contains only signal components up to a bandwidth of, e.g., 4 kHz. The coded signal is furthermore decoded again and upsampled by an upsampling filter. Because of the downsampling filter, the signal thus obtained contains useful information only up to a bandwidth of, e.g., 4 kHz. Moreover, the spectral content of the upsampled coded/decoded signal in the lower band up to 4 kHz does not correspond exactly to the first 4 kHz band of the input signal sampled at 48 kHz, since coders in general introduce coding errors (cf. "First Ideas on Scalable Audio Coding", K. Brandenburg, B. Grill, 97th AES Convention, San Francisco, 1994, Preprint 3924).
As already pointed out, a scalable coder comprises both a generally known speech coder and an audio coder capable of processing signals with higher sampling rates. In order to transmit components of the input signal with frequencies above 4 kHz, the difference between the input signal at 48 kHz and the coded/decoded, upsampled output signal of the speech coder is formed sample by sample. This difference may then be quantized and coded by a known audio coder. Note that the differential signal fed into the audio coder is, apart from the coding errors of the speech coder, substantially zero in the lower frequency range. In the spectral range above the bandwidth of the upsampled coded/decoded output signal of the speech coder, the differential signal corresponds substantially to the true input signal at 48 kHz.
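The conventional time-domain scheme described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation. The 97-tap Hamming-windowed sinc filters, the factor of 6 between 48 kHz and 8 kHz, and the uniform quantizer standing in for a real speech coder such as G.729 are all hypothetical choices. The residual returned by `time_domain_residual` is what would be handed to the audio coder.

```python
import math

RATIO = 6            # 48 kHz input, 8 kHz core coder
TAPS = 97            # assumed FIR length (odd, so linear phase with integer delay)
CORE_STEP = 1 / 64   # hypothetical step of the stand-in core coder


def lowpass_fir(num_taps, cutoff):
    """Hamming-windowed sinc low-pass; cutoff is a fraction of the sampling rate."""
    mid = (num_taps - 1) // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * cutoff if k == 0 else math.sin(2 * math.pi * cutoff * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def filter_same(x, h):
    """Zero-padded convolution aligned with x (removes the (len(h) - 1) / 2 delay)."""
    mid = (len(h) - 1) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            j = i + mid - k
            if 0 <= j < len(x):
                acc += hk * x[j]
        out.append(acc)
    return out


def core_code_decode(samples):
    """Hypothetical stand-in for a speech coder: a plain uniform quantizer."""
    return [CORE_STEP * round(s / CORE_STEP) for s in samples]


def time_domain_residual(x):
    h = lowpass_fir(TAPS, 0.5 / RATIO)           # 4 kHz cutoff at 48 kHz
    core_in = filter_same(x, h)[::RATIO]         # decimation filter, keep every 6th sample
    core_out = core_code_decode(core_in)         # coded/decoded core signal at 8 kHz
    stuffed = [0.0] * len(x)
    for m, v in enumerate(core_out):
        stuffed[m * RATIO] = RATIO * v           # zero stuffing with gain RATIO
    upsampled = filter_same(stuffed, h)          # interpolation (upsampling) filter
    return [a - b for a, b in zip(x, upsampled)] # residual for the audio coder
```

For a tone above 4 kHz the core branch carries essentially nothing, so away from the signal edges the residual equals the input; for a tone well below 4 kHz the residual shrinks to roughly the core coder's quantization error. In this direct form the interpolation filter costs `TAPS` multiplications per 48 kHz output sample.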
In the first stage, i.e. the speech coder stage, a coder with a low sampling frequency is thus mostly used, since a very low bit rate of the coded signal is generally desired. Several coders, including those mentioned above, currently operate at bit rates of a few kilobits per second (two to eight kilobits per second, or somewhat higher). These coders permit a maximum sampling frequency of 8 kHz, since a greater audio bandwidth is not achievable at such low bit rates anyway and since coding at a low sampling frequency requires less computation. The maximum possible audio bandwidth is therefore 4 kHz (half the sampling frequency), and in practice it is restricted to about 3.5 kHz. If the additional stage, i.e. the stage containing the audio coder, is to improve the bandwidth, it must operate at a higher sampling frequency.
To match the sampling frequencies, decimation and interpolation filters are used for downsampling and upsampling, respectively. Since FIR (finite impulse response) filters are generally used to obtain a favorable (linear) phase response, matching e.g. from 8 kHz to 48 kHz can require filters with several hundred coefficients or "taps".
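Why several hundred taps can arise is easiest to see from Kaiser's well-known empirical estimate of FIR filter length. The sketch below assumes, purely for illustration, a stopband attenuation of 80 dB and a 500 Hz transition band at 48 kHz; neither figure comes from the text.

```python
import math


def kaiser_fir_taps(atten_db: float, transition_hz: float, sample_rate_hz: float) -> int:
    """Kaiser's empirical estimate of the number of FIR taps for a given stopband
    attenuation (dB) and transition width; intended for atten_db above about 21."""
    delta_f = transition_hz / sample_rate_hz          # normalized transition width
    order = (atten_db - 7.95) / (14.36 * delta_f)     # estimated filter order
    return math.ceil(order) + 1                       # taps = order + 1


# Assumed interpolation spec for 8 kHz -> 48 kHz (illustrative only).
print(kaiser_fir_taps(80.0, 500.0, 48_000.0))  # 483
```

Because the estimated length is inversely proportional to the transition width, halving the assumed transition band roughly doubles the number of taps.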
It is the object of the present invention to provide methods of and apparatus for coding discrete signals and for decoding coded discrete signals that can operate without complex upsampling filters.
This object is met by a method of coding according to claim 1, a method of decoding according to claim 13, an apparatus for coding according to claim 14, and an apparatus for decoding according to claim 15.
In accordance with a first aspect of the present invention, the object is met by a method of coding discrete first time signals sampled at a first sampling rate, comprising: first, generating from the first time signals second time signals having a bandwidth corresponding to a second sampling rate lower than the first sampling rate; second, coding the second time signals in accordance with a first coding algorithm to obtain coded second signals; third, decoding the coded second signals in accordance with the first coding algorithm to obtain coded/decoded second time signals having a bandwidth corresponding to the second sampling rate; fourth, transforming the first time signals to the frequency domain to obtain first spectral values; fifth, generating from the coded/decoded second time signals second spectral values, which represent the coded/decoded second time signals in the frequency domain and have substantially the same time and frequency resolution as the first spectral values; sixth, weighting the first spectral values with the second spectral values to obtain weighted spectral values equal in number to the first spectral values; and seventh, coding the weighted spectral values in accordance with a second coding algorithm to obtain coded weighted spectral values.
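The coding method can be illustrated for a single frame with a plain O(N²) MDCT in Python. This sketch rests on assumptions the text does not state: a sine window, an orthonormal MDCT, a rate ratio of 6, and a hypothetical `second_spectral_values` helper. That helper transforms the coded/decoded 8 kHz frame with an MDCT one sixth as long, scales it by √6 and zero-pads it to the line count of the 48 kHz transform. The sketch ignores the alignment of the two filter banks' time grids and delays, which a real coder would have to handle, and it omits the second coding algorithm.

```python
import math
from typing import List


def sine_window(length: int) -> List[float]:
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(frame: List[float]) -> List[float]:
    """Orthonormal MDCT of one 2N-sample frame with a sine window (O(N^2))."""
    n_lines = len(frame) // 2
    w = sine_window(len(frame))
    scale = math.sqrt(2.0 / n_lines)
    return [
        scale * sum(
            w[n] * frame[n]
            * math.cos(math.pi / n_lines * (n + 0.5 + n_lines / 2) * (k + 0.5))
            for n in range(len(frame))
        )
        for k in range(n_lines)
    ]


def second_spectral_values(core_frame: List[float], ratio: int) -> List[float]:
    """Hypothetical: low-rate MDCT, assumed gain sqrt(ratio), zero-padded upper band."""
    low = mdct(core_frame)                      # N / ratio lines covering 0 .. f2 / 2
    gain = math.sqrt(ratio)                     # assumed level matching between banks
    return [gain * c for c in low] + [0.0] * (len(low) * (ratio - 1))


def code_frame(frame: List[float], core_frame: List[float], ratio: int = 6) -> List[float]:
    """Steps four to six: first spectral values, second spectral values, weighting."""
    first = mdct(frame)
    second = second_spectral_values(core_frame, ratio)
    if len(second) != len(first):
        raise ValueError("core frame must be 1/ratio the length of the full frame")
    return [a - b for a, b in zip(first, second)]   # weighting by subtraction


# Usage: one 96-sample frame at 48 kHz (N = 48 lines).
frame = [math.sin(2 * math.pi * 1000 * n / 48_000) for n in range(96)]
core = frame[::6]                  # stand-in for the coded/decoded 8 kHz frame (16 samples)
weighted = code_frame(frame, core) # 48 weighted spectral values
```

Only the lowest N/6 lines are modified by the subtraction; the upper lines pass through unchanged and carry the components above the core bandwidth.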
Weighting the first spectral values with the second spectral values comprises subtracting the second spectral values from the first spectral values in order to obtain differential spectral values.
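Written per spectral line, the weighting by subtraction reads as follows. The notation is introduced here for illustration only: $X_1(k)$ are the $N$ first spectral values, $X_2(k)$ the second spectral values, $D(k)$ the differential spectral values, and $f_1 > f_2$ the first and second sampling rates. The second line states an idealization: the coded/decoded second time signals are assumed to have no content above $f_2/2$. In the lowest $K$ lines, $D(k)$ then mainly reflects the coding error of the first coder.

```latex
\begin{aligned}
D(k) &= X_1(k) - X_2(k), && k = 0, 1, \dots, N-1, \\
D(k) &\approx X_1(k),    && K \le k \le N-1, \quad K = N \, \frac{f_2}{f_1}.
\end{aligned}
```

For $f_1 = 48$ kHz and $f_2 = 8$ kHz this gives $K = N/6$.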
In accordance with a second aspect of the present invention, the above object is met by a method of decoding a coded discrete signal, comprising: first, decoding coded second signals with a first coding algorithm to obtain coded/decoded second discrete time signals; second, decoding coded weighted spectral values with a second coding algorithm to obtain weighted spectral values; third, transforming the coded/decoded second discrete time signals to the frequency domain to obtain second spectral values; fourth, inversely weighting the weighted spectral values with the second spectral values to obtain first spectral values; and fifth, retransforming the first spectral values to the time domain to obtain first discrete time signals.
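The decoding method can be sketched in Python as follows. When the weighting is a subtraction, the inverse weighting is an addition, and the retransformation can be an inverse MDCT followed by overlap-add. The sketch assumes a sine-windowed orthonormal MDCT with frames of 2N samples and hop size N. It takes already-decoded weighted and second spectral values as input and does not model the two decoding algorithms themselves. The forward `mdct` is included only so that the round trip can be checked.

```python
import math
from typing import List, Sequence


def sine_window(length: int) -> List[float]:
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _kernel(n: int, k: int, n_lines: int) -> float:
    return math.cos(math.pi / n_lines * (n + 0.5 + n_lines / 2) * (k + 0.5))


def mdct(frame: Sequence[float]) -> List[float]:
    """Orthonormal sine-windowed MDCT: 2N samples -> N lines."""
    n_lines = len(frame) // 2
    w = sine_window(len(frame))
    scale = math.sqrt(2.0 / n_lines)
    return [scale * sum(w[n] * frame[n] * _kernel(n, k, n_lines) for n in range(len(frame)))
            for k in range(n_lines)]


def imdct(lines: Sequence[float]) -> List[float]:
    """Inverse MDCT with the same window and scaling: N lines -> 2N samples."""
    n_lines = len(lines)
    w = sine_window(2 * n_lines)
    scale = math.sqrt(2.0 / n_lines)
    return [scale * w[n] * sum(c * _kernel(n, k, n_lines) for k, c in enumerate(lines))
            for n in range(2 * n_lines)]


def decode(weighted_frames, second_frames) -> List[float]:
    """Inverse weighting (addition) per frame, IMDCT, then overlap-add with hop N."""
    hop = len(weighted_frames[0])
    out = [0.0] * (hop * (len(weighted_frames) + 1))
    for i, (d, s) in enumerate(zip(weighted_frames, second_frames)):
        first = [a + b for a, b in zip(d, s)]   # first spectral values
        for n, v in enumerate(imdct(first)):
            out[i * hop + n] += v               # time-domain aliasing cancels in the overlap
    return out
```

With unquantized values, as in this sketch, the round trip is exact up to floating-point rounding wherever two frames overlap. Once the second coding algorithm quantizes the weighted spectral values, the reconstruction error would instead be determined by that quantization.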
In accordance with a third aspect of the present invention, the above object is met by an apparatus for coding discrete first time signals sampled at a first sampling rate. The apparatus comprises: a generating device for generating, from the first time signals, second time signals having a bandwidth corresponding to a second sampling rate lower than the first sampling rate; a first coder for coding the second time signals in accordance with a first coding algorithm to obtain coded second signals; a decoder for decoding the coded second signals in accordance with the first coding algorithm to obtain coded/decoded second time signals having a bandwidth corresponding to the second sampling rate; a transforming device for transforming the first time signals to the frequency domain to obtain first spectral values; a generating device for generating, from the coded/decoded second time signals, second spectral values which represent the coded/decoded second time signals in the frequency domain and have substantially the same time and frequency resolution as the first spectral values; a weighting device for weighting the first spectral values with the second spectral values to obtain weighted spectral values equal in number to the first spectral values; and a second coder for coding the weighted spectral values in accordance with a second coding algorithm to obtain coded weighted spectral values.
In accordance with a fourth aspect of the present invention, the above object is met by an apparatus for decoding a coded time-discrete signal, comprising: a first decoder for decoding coded second signals by means of a first coding algorithm to obtain coded/decoded second discrete time signals; a second decoder for decoding coded weighted spectral values by means of a second coding algorithm to obtain weighted spectral values; a transforming device for transforming the coded/decoded second discrete time signals to the frequency domain to obtain second spectral values; a weighting device for inversely weighting the weighted spectral values with the second spectral values to obtain first spectral values; and a retransforming device for transforming the first spectral values to the time domain to obtain first discrete time signals.
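The two apparatus aspects can be read as a wiring diagram of interchangeable components. The Python sketch below makes that explicit with dataclasses whose fields correspond to the devices named above. All field and helper names are hypothetical. The toy stand-ins in the usage lines only demonstrate the data flow: a 2:1 decimator without filtering, sample repetition in place of a filter bank, and raw float packing in place of real coders.

```python
import struct
from dataclasses import dataclass
from typing import Callable, List, Tuple

Signal = List[float]


def pack_floats(values: Signal) -> bytes:
    """Stand-in 'coder': stores values losslessly as big-endian doubles."""
    return struct.pack(f">{len(values)}d", *values)


def unpack_floats(data: bytes) -> Signal:
    return list(struct.unpack(f">{len(data) // 8}d", data))


@dataclass
class ScalableEncoder:
    band_limit: Callable[[Signal], Signal]       # generating device: second time signals
    core_encode: Callable[[Signal], bytes]       # first coder
    core_decode: Callable[[bytes], Signal]       # decoder for the first coding algorithm
    analyze_first: Callable[[Signal], Signal]    # transforming device (first sampling rate)
    analyze_second: Callable[[Signal], Signal]   # generating device: second spectral values
    residual_encode: Callable[[Signal], bytes]   # second coder

    def encode(self, x: Signal) -> Tuple[bytes, bytes]:
        core_bits = self.core_encode(self.band_limit(x))
        first = self.analyze_first(x)
        second = self.analyze_second(self.core_decode(core_bits))
        weighted = [a - b for a, b in zip(first, second)]   # weighting device
        return core_bits, self.residual_encode(weighted)


@dataclass
class ScalableDecoder:
    core_decode: Callable[[bytes], Signal]       # first decoder
    residual_decode: Callable[[bytes], Signal]   # second decoder
    analyze_second: Callable[[Signal], Signal]   # transforming device: second spectral values
    synthesize: Callable[[Signal], Signal]       # retransforming device

    def decode(self, core_bits: bytes, residual_bits: bytes) -> Signal:
        second = self.analyze_second(self.core_decode(core_bits))
        weighted = self.residual_decode(residual_bits)
        first = [d + s for d, s in zip(weighted, second)]   # inverse weighting
        return self.synthesize(first)


def halve(x: Signal) -> Signal:
    return x[::2]                                # toy 2:1 "decimator", no filtering


def repeat2(c: Signal) -> Signal:
    return [v for v in c for _ in range(2)]      # toy stand-in for a filter bank


encoder = ScalableEncoder(band_limit=halve, core_encode=pack_floats, core_decode=unpack_floats,
                          analyze_first=list, analyze_second=repeat2, residual_encode=pack_floats)
decoder = ScalableDecoder(core_decode=unpack_floats, residual_decode=unpack_floats,
                          analyze_second=repeat2, synthesize=list)
core_bits, residual_bits = encoder.encode([1.0, 2.0, 3.0, 5.0])
print(decoder.decode(core_bits, residual_bits))  # [1.0, 2.0, 3.0, 5.0]
```

Both sides use the same `core_decode` and `analyze_second` components. The decoder therefore reproduces exactly the second spectral values that the encoder subtracted, and this is what allows the inverse weighting to work.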
An advantage of the present invention is that, in the coding apparatus according to the invention (scalable audio coder), which comprises at least two separate coders, the second coder can operate optimally with respect to the psychoacoustic model.
The invention is based on the realization that the computationally expensive upsampling filter can be dispensed with under two conditions. First, the audio coder or decoder employed must code or decode in the spectral domain. Second, the difference (at the coder) or the inverse difference (at the decoder) between the coded/decoded output signal of the lower-order coder or decoder and the original input signal, or the spectral representation of a signal based thereon, must be formed in the frequency domain at the high sampling frequency. It is therefore no longer necessary to upsample the coded/decoded output signal of the lower-order coder with a conventional upsampling filter. Only two filter banks are needed: one for the coded/decoded output signal of the lower-order coder alone, and one for the original input signal at the high sampling frequency.
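Why the two filter banks can share a grid is a matter of simple arithmetic, sketched below in Python. The sketch assumes, for illustration, MDCT-type filter banks in which an N-line transform at sampling rate f spans 2N samples and places line k at (k + 1/2)·f/(2N). Under that assumption, a low-rate filter bank with N·f2/f1 lines has the same frame duration and line spacing as the high-rate one, so its lines coincide with the lowest lines of the high-rate transform. The function names are hypothetical.

```python
from fractions import Fraction


def line_center_hz(k: int, sample_rate: int, n_lines: int) -> Fraction:
    """Centre frequency (k + 1/2) * f / (2N) of line k of an N-line MDCT-type bank."""
    return Fraction(2 * k + 1, 4 * n_lines) * sample_rate


def frame_duration_s(sample_rate: int, n_lines: int) -> Fraction:
    """A frame of an N-line MDCT spans 2N samples."""
    return Fraction(2 * n_lines, sample_rate)


def low_rate_lines(n_lines: int, rate_high: int, rate_low: int) -> int:
    """Number of lines the low-rate filter bank needs for an identical grid."""
    m = Fraction(n_lines * rate_low, rate_high)
    if m.denominator != 1:
        raise ValueError("n_lines must be a multiple of rate_high / rate_low")
    return int(m)


N = 48                                  # lines of the 48 kHz transform (assumed)
M = low_rate_lines(N, 48_000, 8_000)    # lines of the 8 kHz transform
print(M, frame_duration_s(48_000, N) == frame_duration_s(8_000, M))  # 8 True
print(all(line_center_hz(k, 8_000, M) == line_center_hz(k, 48_000, N) for k in range(M)))  # True
```

Under this sketch's integer-line assumption, the error branch makes explicit that N has to be a multiple of f1/f2.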
Both filter banks deliver spectral values as output signals. A suitable weighting means, preferably a subtracting means, weights these spectral values to form weighted spectral values. A quantizer and coder can then quantize and code the weighted spectral values, taking a psychoacoustic model into account. The resulting data can be fed, preferably together with the coded signals of the lower-order coder, to a bit stream formatting means, which multiplexes them suitably for transmission or storage.
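The quantization and bit stream formatting steps can be sketched in Python, with heavy simplifications. The sketch assumes that a psychoacoustic model (not shown) supplies an allowed noise power per spectral line. It then chooses each uniform quantizer step so that the step's noise power, step²/12, equals that allowance. The frame layout in `pack_frame` is hypothetical (two 16-bit lengths, the core payload, then 16-bit quantizer indices) and uses no entropy coding.

```python
import math
import struct
from typing import List, Tuple


def step_for(allowed_noise_power: float) -> float:
    """Uniform quantizer step whose noise power step**2 / 12 equals the allowance."""
    return math.sqrt(12.0 * allowed_noise_power)


def quantize(values: List[float], allowed: List[float]) -> List[int]:
    return [round(v / step_for(a)) for v, a in zip(values, allowed)]


def dequantize(indices: List[int], allowed: List[float]) -> List[float]:
    return [q * step_for(a) for q, a in zip(indices, allowed)]


def pack_frame(core_payload: bytes, indices: List[int]) -> bytes:
    """Hypothetical layout: u16 core length, u16 index count, core bytes, i16 indices."""
    header = struct.pack(">HH", len(core_payload), len(indices))
    return header + core_payload + struct.pack(f">{len(indices)}h", *indices)


def unpack_frame(frame: bytes) -> Tuple[bytes, List[int]]:
    core_len, count = struct.unpack_from(">HH", frame, 0)
    core = frame[4:4 + core_len]
    indices = list(struct.unpack_from(f">{count}h", frame, 4 + core_len))
    return core, indices


weighted = [0.5, -1.2, 3.3]         # example weighted spectral values
allowed = [0.01, 0.01, 0.1]         # assumed allowed noise power per line
q = quantize(weighted, allowed)     # [1, -3, 3]
frame = pack_frame(b"\x12\x34", q)  # b"\x12\x34" is a placeholder core-coder payload
```

Lines with a larger noise allowance get a coarser step and thus smaller indices, which is where a real coder would save bits.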
The savings in computation time are in fact considerable. Take the above example, in which the speech coder processes signals sampled at 8 kHz and signals sampled at 48 kHz are to be coded. There, an upsampling FIR filter requires more than 100 multiplications per sample. A filter bank, by contrast, which can be implemented as an MDCT (modified discrete cosine transform) as known to those skilled in the art, requires only ten to a few tens (e.g. about 30) multiplications per sample.
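The per-sample figures quoted above can be converted to multiplications per second. The text does not state at which rate they are counted, so the conversion below assumes, for illustration, that both refer to 48 kHz samples. Under that assumption the filter bank needs less than a third of the multiplications of the upsampling FIR filter.

```latex
\begin{aligned}
C_{\text{FIR}} &> 100 \times 48\,000\ \text{s}^{-1} = 4.8 \times 10^{6}\ \text{multiplications/s}, \\
C_{\text{FB}}  &\approx 30 \times 48\,000\ \text{s}^{-1} = 1.44 \times 10^{6}\ \text{multiplications/s}.
\end{aligned}
```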
It should be pointed out that, in a scalable audio coder according to the present invention, the speech coder may also be replaced by any coder according to the MPEG1 to MPEG3 standards, as long as the coders of the first and second stages are designed for two different sampling frequencies.