The present invention relates to scalable audio coders and in particular to methods of and apparatus for coding a time-discrete stereo signal.
Scalable audio coders are coders of modular construction. There are endeavors to employ existing voice coders capable of processing signals, which are sampled e.g. with 8 kHz, and of outputting data rates of, for example, 4.8 to 8 kilobit per second. These known coders, such as e.g. the coders G.729, G.723, FS1016 and CELP known to experts or parametric models of MPEG-4-Audio-VM, serve mainly for coding speech signals and in general are not suitable for coding higher-quality music signals since they are usually designed for signals sampled with 8 kHz, so that they can code only an audio bandwidth of 4 kHz at maximum. However, in general they exhibit fast operation and low arithmetic expenditure.
For audio coding of music signals, in order to obtain for example HIFI quality or CD quality, a scalable coder thus employs a combination of a voice coder and an audio coder that is capable of coding signals with a higher sampling rate, such as e.g. 48 kHz. It is of course also possible to replace the above-mentioned voice coder by a different coder, for example a music/audio coder according to the standards MPEG1, MPEG2 or MPEG3.
Such a cascade connection of a voice coder with a higher-grade audio coder usually employs the method of differential coding in the time domain. An input signal having e.g. a sampling rate of 48 kHz is downsampled to the sampling frequency suitable for the voice coder by means of a downsampling filter. The downsampled signal is then coded. The coded signal can be fed directly to a bit stream formatting means for transmission thereof. However, it contains only signals with a bandwidth of e.g. 4 kHz at maximum. The coded signal, furthermore, is decoded again and upsampled by means of an upsampling filter. However, due to the downsampling filter, the signal then obtained contains only useful information with a bandwidth of e.g. 4 kHz. Furthermore, it is to be noted that the spectral content of the upsampled coded/decoded signal in the lower band range up to 4 kHz does not correspond exactly to the first 4 kHz band of the input signal sampled with 48 kHz, since coders in general introduce coding errors.
As was already pointed out, a scalable coder comprises both a generally known voice coder and an audio coder that is capable of processing signals with higher sampling rates. In order to be able to transmit signal components of the input signal whose frequencies are above 4 kHz, a difference is formed between the input signal sampled with e.g. 48 kHz and the coded/decoded upsampled output signal of the voice coder for each individual time-discrete sampling value. This difference then may be quantized and coded by means of a known audio coder, as known to experts. It is to be noted here that the differential signal fed into the audio coder capable of coding signals with higher sampling rates is substantially equal to zero in the lower frequency range, apart from the coding errors of the voice coder. In the spectral range above the bandwidth of the upsampled coded/decoded output signal of the voice coder, the differential signal substantially corresponds to the true input signal sampled with e.g. 48 kHz.
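The time-domain differential scheme described above may be illustrated by the following minimal sketch. It is not the implementation of any particular known coder: the function names, the block-averaging decimation, the sample-and-hold interpolation and the uniform quantizer standing in for the voice coder are all assumptions chosen for brevity, as is the factor of 6 between the exemplary sampling rates of 48 kHz and 8 kHz.

```python
FACTOR = 6    # assumed ratio 48 kHz / 8 kHz
STEP = 0.05   # hypothetical quantizer step of the stand-in voice coder


def downsample(x, factor):
    """Crude decimation: average each group of `factor` samples."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]


def upsample(x, factor):
    """Crude interpolation: repeat each sample `factor` times."""
    return [v for v in x for _ in range(factor)]


def core_encode(x):
    """Stand-in for the voice coder: uniform quantization to integer codes."""
    return [round(v / STEP) for v in x]


def core_decode(codes):
    """Stand-in for the voice decoder: map integer codes back to amplitudes."""
    return [c * STEP for c in codes]


def scalable_encode(x48):
    """Return the first-stage codes and the sample-wise differential signal."""
    core_codes = core_encode(downsample(x48, FACTOR))
    decoded_up = upsample(core_decode(core_codes), FACTOR)
    residual = [a - b for a, b in zip(x48, decoded_up)]
    return core_codes, residual


def scalable_decode(core_codes, residual):
    """Add the differential signal back onto the upsampled first-stage output."""
    decoded_up = upsample(core_decode(core_codes), FACTOR)
    return [a + b for a, b in zip(decoded_up, residual)]
```

In this sketch the differential signal is passed on unquantized, so that adding it to the upsampled first-stage output restores the input. In a real scalable coder the differential signal would itself be quantized and coded by the audio coder of the additional stage.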
In the first stage, i.e. the stage of the voice coder, a coder with low sampling frequency is thus used mostly, since in general a very low bit rate of the coded signal is aimed at. At present, there are several coders, including the coders mentioned, operating with bit rates of a few kilobit per second (two to eight kilobit per second or also above). The same coders, furthermore, permit a maximum sampling frequency of 8 kHz, since a greater audio bandwidth is not possible anyway with such a low bit rate and since coding with a low sampling frequency is more advantageous as regards the arithmetic expenditure. The maximum possible audio bandwidth is 4 kHz and in practical application is restricted to about 3.5 kHz. If a bandwidth improvement is now to be achieved in the additional stage, i.e. in the stage including the audio coder, this additional stage will have to operate with a higher sampling frequency. For matching the sampling frequencies, decimation and interpolation filters are used for downsampling and upsampling, respectively.
So far, however, only scalable coders for mono signals are known or implemented. It would be desirable to have a conception for scalable audio coders having joint-stereo capabilities. "Joint-stereo" is understood as stereo coding techniques, such as e.g. mid/side coding (M/S coding) or intensity-stereo coding (IS coding). When a separate scalable mono audio coder is simply employed for each of the left-hand (L) and right-hand (R) channels of a stereo signal, coding of a stereo signal is indeed possible, but such coding does not take any account of joint-stereo techniques, which may open up extensive saving possibilities in bit-saving coding of stereo signals.
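The saving potential of mid/side coding may be illustrated by the following minimal sketch. The function names and the unscaled sum/difference convention (M = L + R, S = L - R) are assumptions made for illustration; other scalings of M and S are equally common. For strongly correlated channels the S signal carries far less energy than either channel, which is why it can be coded with fewer bits.

```python
def ms_encode(left, right):
    """Form the mid (sum) and side (difference) signals sample by sample."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Recover L and R; with the unscaled convention L = (M + S) / 2."""
    left = [(m + s) / 2 for m, s in zip(mid, side)]
    right = [(m - s) / 2 for m, s in zip(mid, side)]
    return left, right


def energy(x):
    """Sum of squared samples, used here as a rough measure of coding cost."""
    return sum(v * v for v in x)
```

For example, with the nearly identical channels L = [100, 102, 98, 101] and R = [99, 101, 99, 100], the S signal consists only of the values 1 and -1, while each individual channel has values near 100.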
It is the object of the present invention to make available a method of and an apparatus for coding a time-discrete stereo signal, which permit the utilization of joint-stereo techniques.
In accordance with a first aspect of the present invention, this object is met by a method of coding a time-discrete stereo signal, with the stereo signal having a first and a second channel, said method comprising the following steps: forming a mono signal from the stereo signal; coding the mono signal and transmitting the coded mono signal to a bit stream; decoding the coded mono signal; forming stereo information on the basis of the coded/decoded mono signal and the first and second channels; and coding the stereo information and transmitting the same to the bit stream.
In accordance with a second aspect of the present invention, this object is met by an apparatus for coding a time-discrete stereo signal, the stereo signal having a first and a second channel, said apparatus comprising: a device for forming a mono signal from the stereo signal; a mono coder for coding the mono signal and transmitting the coded mono signal to a bit stream; a mono decoder for decoding the coded mono signal; a device for forming stereo information on the basis of the coded/decoded mono signal and the first and second channels; and a stereo coder for coding the stereo information and for transmitting the same to the bit stream.
The present invention is based on the realization that a combination of joint-stereo techniques with the principle of scalability can be obtained when a mono signal is first formed from the left-hand and right-hand channels of a stereo signal, which preferably can take place by summation. The mono signal is coded by means of a first coder, whereupon the signal resulting therefrom is fed to a bit stream multiplexer. The coded mono signal furthermore is decoded again in order to obtain a coded/decoded mono signal which differs from the original mono signal in that it has coding errors introduced by the first coder. From this coded/decoded mono signal and the left-hand and right-hand channels of the time-discrete stereo signal, items of stereo information can be produced which, for example, may be mid/side (M/S) information or intensity-stereo (IS) information or, under certain circumstances, also the original left-hand channel or the original right-hand channel. As will become apparent in the following, the coded/decoded mono signal itself or the difference of the original mono signal from the coded/decoded mono signal can also be used as stereo information in order to provide directly mid/side coding, together with the difference of the left-hand and right-hand channels, which is also referred to as S signal. The stereo information can now be coded by way of a second coder, which may have the same construction as the first coder or a different construction, and can also be fed to a bit stream multiplexer generating a bit stream from the coded mono signal and the coded stereo information as well as from the side information necessary for subsequent decoding.
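The sequence of steps set forth above may be sketched as follows for the variant in which the stereo information consists of the difference of the original mono signal from the coded/decoded mono signal together with the S signal. This is a minimal sketch under stated assumptions, not the coder of the invention itself: the uniform quantizer standing in for the first coder, the step size, all function names and the use of integer sample values are hypothetical, and the second coder is omitted, so that the stereo information is passed on uncoded.

```python
STEP = 8  # hypothetical quantizer step of the stand-in first coder


def mono_encode(mono):
    """Stand-in for the first coder: coarse uniform quantization."""
    return [round(v / STEP) for v in mono]


def mono_decode(codes):
    """Stand-in for the mono decoder."""
    return [c * STEP for c in codes]


def encode(left, right):
    """Form and code the mono signal, then derive the stereo information."""
    mono = [l + r for l, r in zip(left, right)]        # formed by summation
    coded_mono = mono_encode(mono)                     # first scalability layer
    decoded_mono = mono_decode(coded_mono)             # contains coding errors
    mid_residual = [m - d for m, d in zip(mono, decoded_mono)]
    side = [l - r for l, r in zip(left, right)]        # the S signal
    return coded_mono, (mid_residual, side)


def decode_mono_only(coded_mono):
    """A decoder using only the first layer obtains an approximate mono signal."""
    return mono_decode(coded_mono)


def decode_stereo(coded_mono, stereo_info):
    """A decoder using both layers recovers the left and right channels."""
    mid_residual, side = stereo_info
    mono = [d + e for d, e in zip(mono_decode(coded_mono), mid_residual)]
    left = [(m + s) // 2 for m, s in zip(mono, side)]  # M + S = 2L, always even
    right = [(m - s) // 2 for m, s in zip(mono, side)] # M - S = 2R, always even
    return left, right
```

The sketch shows the scalability of the bit stream: the coded mono signal alone yields a usable mono signal, whereas the coded mono signal together with the stereo information yields both channels.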
The formation of the mono signal and coding thereof can take place in the time domain, when e.g. a voice coder is used as first coder or core coder. The formation and coding of stereo information preferably takes place in the frequency domain as recourse can then be taken to powerful coders operating in accordance with the psychoacoustic model.
However, it is also possible, prior to further processing, to transform the right-hand and left-hand channels to the frequency domain, with the result that a frequency domain coder, which is capable of coding in as distortion-free a manner as possible using the psychoacoustic model, can also be employed for coding the mono signal.
If for the first coder, i.e. for the coder for the mono signal, a coder is employed having a lower sampling rate than the time-discrete stereo signal to be coded, the mono signal formed from summation of the left-hand and right-hand channels must first be transformed to the lower sampling frequency, which is also referred to as downsampling. The mono signal transformed to the lower sampling frequency then is coded and decoded again, with the coded/decoded mono signal also having the lower sampling frequency. The coded/decoded mono signal, for permitting correlation thereof with the left-hand and right-hand channels sampled at a higher rate so as to provide stereo information, must be converted again to the sampling frequency of the time-discrete stereo signal, which is also referred to as upsampling. If this coded/decoded mono signal obtained by upsampling is subjected to frequency domain transformation, which preferably may be implemented as MDCT (MDCT=modified discrete cosine transformation), the resulting transformed coded/decoded mono signal has the same time and frequency resolution as the original time-discrete stereo signal, i.e. the left-hand (L) and the right-hand (R) channel.
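The frequency domain transformation mentioned above may be illustrated by the following minimal sketch of a direct-form MDCT with 50 % block overlap. The sine window, the block length, the normalization and all function names are assumptions chosen for illustration; a practical coder would use a fast algorithm and possibly other windows. Each block of 2N time samples yields N spectral values, so that signals having the same sampling frequency and the same block length, such as the upsampled coded/decoded mono signal and the L and R channels, obtain the same time and frequency resolution.

```python
import math


def sine_window(two_n):
    """Sine window, which satisfies the condition for perfect reconstruction."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(block):
    """Windowed MDCT: 2N time samples in, N spectral values out."""
    two_n = len(block)
    n_half = two_n // 2
    w = sine_window(two_n)
    return [
        sum(w[n] * block[n]
            * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Windowed inverse MDCT: N spectral values in, 2N aliased samples out."""
    n_half = len(coeffs)
    two_n = 2 * n_half
    w = sine_window(two_n)
    return [
        (2.0 / n_half) * w[n]
        * sum(coeffs[k]
              * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(two_n)
    ]


def analysis_synthesis(signal, n_half):
    """Transform overlapping blocks and overlap-add the inverse transforms."""
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n_half + 1, n_half):
        rec = imdct(mdct(signal[start:start + 2 * n_half]))
        for i, v in enumerate(rec):
            out[start + i] += v
    return out
```

In this sketch the time-domain aliasing of adjacent blocks cancels in the overlap-add, so that all samples covered by two blocks are restored; only the first and last N samples of the signal, which are covered by a single block, are not.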
If, in contrast thereto, the first coder is operated with the same sampling rate as that of the time-discrete stereo signal, downsampling and upsampling of course can be dispensed with.